Funder

Google

Duration

Lorem ipsum dolor

Keywords (Technologies and Domain)

Language technology, Informal speech, Machine translation

Improving machine translation for informal usage of Ugandan languages

Current machine translation systems, such as Google Translate, often struggle to capture the nuances of informal Ugandan speech, which is characterized by slang, idioms, proverbs, and code-switching. A key limitation is the lack of datasets that represent the linguistic complexity and diversity of informal communication in Ugandan languages. This project seeks to address this gap by presenting a machine translation training dataset developed specifically to support the translation of informal sentences between English and Ugandan local languages, including Luganda, Lusoga, Kakwa, Acholi, Lugbara, and Runyakitara (a language cluster comprising Runyankole, Runyoro, Rutooro, and Rufumbira).

Outputs (Datasets, publications, models)

Informal speech text dataset