Named Entity Recognition (NER)

Joyce Nakatumba-Nabende (PI), Jonathan Mukiibi, Eric Peter Kigaye, Maurice Katusiime, Deborah Nabagereka

Funder: Lacuna Fund – A Collaborative Fund between the Rockefeller Foundation, Google.org, and Canada’s International Development Research Centre

  • 2022
  • ongoing

Named Entity Recognition (NER) is a classification task that identifies the words in a text that refer to entities and assigns each one to a predefined category (such as date, person, organization, or location).

Example of NER: Mukiibi [PERSON] asomera Makerere University [ORGANISATION] e Kampala [LOCATION] buli Lwakubiri [DATE].
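In practice, NER is often framed as labeling every token rather than whole phrases. The sketch below shows one common way to encode the example above, the BIO scheme (B = beginning of an entity, I = inside an entity, O = outside any entity). The tokenization, the span indices, and the helper name `spans_to_bio` are illustrative assumptions, not necessarily the format used in the project's datasets.

```python
# Tokens of the Luganda example sentence (tokenization is an assumption).
tokens = ["Mukiibi", "asomera", "Makerere", "University", "e",
          "Kampala", "buli", "Lwakubiri", "."]

# Entity spans as (start, end_exclusive, label), read off the example.
spans = [
    (0, 1, "PERSON"),
    (2, 4, "ORGANISATION"),
    (5, 6, "LOCATION"),
    (7, 8, "DATE"),
]


def spans_to_bio(n_tokens, spans):
    """Hypothetical helper: convert entity spans to per-token BIO tags."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return tags


bio_tags = spans_to_bio(len(tokens), spans)
for token, tag in zip(tokens, bio_tags):
    print(f"{token}\t{tag}")
```

Under this encoding, a multi-word entity such as "Makerere University" becomes `B-ORGANISATION` followed by `I-ORGANISATION`, so a model only has to predict one label per token.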

Together with Masakhane, we collaborated on creating Named Entity Recognition models for ten African languages. Our focus was on Luganda, a Bantu language that is the most widely spoken in Uganda, with over 15 million speakers, the majority of them in Central Uganda.

In this research, we take an initial step towards improving the representation of African languages in the NER task, making the following contributions:

(i) We bring together language speakers, dataset curators, NLP practitioners, and evaluation experts to address the challenges facing NER for African languages. Based on the availability of online news corpora and language annotators, we develop NER datasets, models, and evaluations covering ten widely spoken African languages.

(ii) We curate NER datasets from local sources to ensure the relevance of future research for native speakers of the respective languages.

(iii) We train and evaluate multiple NER models for all ten languages. Our experiments provide insights into the transfer across languages and highlight open challenges.

(iv) We release the datasets, code, and models to facilitate future research on the specific challenges raised by NER for African languages.
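The contributions above mention evaluating NER models, but this page does not say which metric was used. A common choice for NER is span-level F1, where a predicted entity counts as correct only if both its boundaries and its label match the reference exactly. The following is a minimal sketch under that assumption, with BIO-tagged sequences as input; the function names `extract_spans` and `span_f1` and the sample predictions are hypothetical.

```python
def extract_spans(tags):
    """Collect (start, end_exclusive, label) spans from a BIO tag sequence."""
    spans = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def span_f1(gold_tags, pred_tags):
    """Exact-match span F1 between reference and predicted tag sequences."""
    gold, pred = extract_spans(gold_tags), extract_spans(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reference tags for the example sentence, and a made-up prediction that
# truncates the organisation span to its first token.
gold = ["B-PERSON", "O", "B-ORGANISATION", "I-ORGANISATION", "O",
        "B-LOCATION", "O", "B-DATE", "O"]
pred = ["B-PERSON", "O", "B-ORGANISATION", "O", "O",
        "B-LOCATION", "O", "B-DATE", "O"]

score = span_f1(gold, pred)
print(f"span-level F1: {score:.2f}")
```

Because matching is exact, the truncated organisation span earns no partial credit: three of the four predicted spans are correct and three of the four reference spans are found.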

Datasets

Library

Models