Funder
Bill & Melinda Gates Foundation
Duration
Keywords (Technologies and Domain)
Agricultural sciences
Support for building the Luganda Common Voice community by Mozilla Foundation
Common Voice is an open-source crowdsourcing platform for collecting text and audio data, enabling individuals to help build their language’s digital footprint. Through the platform, users can donate voice recordings or text, or access the crowdsourced data to train and fine-tune Natural Language Processing (NLP) models for various applications. With funding from the Lacuna Fund, this project develops NLP text and speech datasets for low-resource languages in East Africa. It uses the Common Voice platform to gather contributions from diverse communities: anyone with internet access and a web browser can participate. The current focus is Luganda, following its recent inclusion on the platform. The datasets will support the development of key NLP technologies, including Automatic Speech Recognition (ASR) models for local languages, and contribute to the broader goal of advancing the Sustainable Development Goals (SDGs). In addition to speech data, the project is developing monolingual and parallel text corpora to support further NLP applications.
Outputs (Datasets, publications, models)
NLP Text and Speech Datasets