Funder
Duration
Keywords (Technologies and Domain)
Language technology, Informal speech, Machine translation
Improving machine translation for informal usage of Ugandan languages
Current machine translation systems, such as Google Translate, often struggle to capture the nuances of informal Ugandan speech, which is characterized by slang, idioms, proverbs, and code-switching. A key limitation is the lack of datasets that reflect the linguistic complexity and diversity of informal communication in Ugandan languages. This project addresses this gap with a machine translation training dataset developed specifically to support the translation of informal sentences between English and Ugandan local languages, including Luganda, Lusoga, Kakwa, Acholi, Lugbara, and Runyakitara (a language cluster comprising Runyankole, Runyoro, Rutooro, and Rufumbira).
Outputs (Datasets, publications, models)
Informal speech text dataset