Publications

At Mak-CAD, we produce cutting-edge research in AI and data science, driving innovation across health, agriculture, and language technologies. Our publications highlight groundbreaking discoveries and contributions that shape the future of technology and its real-world applications.

Building text and speech datasets for low resourced languages: A case of languages in East Africa

Journal: 3rd Workshop on African Natural Language Processing. 2022.

This paper addresses the lack of natural language processing resources for African languages and the challenges of obtaining high-quality speech and text data. It details the curation and annotation process for five East African languages—Luganda, Runyankore-Rukiga, Acholi, Lumasaba, and Swahili. Baseline models were developed for machine translation, topic modeling, classification, sentiment classification, and automatic speech recognition. The paper also highlights key experiences, challenges, and lessons learned in building these datasets.

The Makerere Radio Speech Corpus: A Luganda Radio Corpus for Automatic Speech Recognition

Journal: arXiv preprint 

Building an automatic speech recognition (ASR) system for under-resourced languages is crucial, especially in societies where radio is the primary medium of communication. In Uganda, efforts to understand rural perspectives are hindered by the lack of transcribed speech datasets. To address this, the Makerere AI research lab has released a 155-hour Luganda radio speech corpus, the first publicly available radio dataset in sub-Saharan Africa. This paper details the corpus development and presents baseline ASR performance results using the Coqui STT toolkit.

Scoring Root Necrosis in Cassava Using Semantic Segmentation

Journal: arXiv preprint 

Cassava, a major food crop in Africa, is severely affected by Cassava Brown Streak Disease (CBSD), which causes necrosis in starch-bearing tissues. Breeders currently rely on subjective visual inspection for scoring necrosis. This paper presents an automated approach using deep convolutional neural networks with semantic segmentation. Our experiments show that the UNet model performs this task with high accuracy, achieving a mean Intersection over Union (IoU) of 0.90 on the test set. This method provides a quantitative measure for necrosis scoring on root cross-sections. This is done by segmenting and classifying the necrotized and non-necrotized pixels of cassava root cross-sections without any additional feature engineering.
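
As a minimal sketch of the metric named above, the following Python computes per-class IoU and mean IoU for two label masks, plus the fraction of necrotized pixels that a quantitative score could be based on. The label convention (0 = non-necrotized, 1 = necrotized), the function names, and the toy masks are assumptions made for illustration; they are not taken from the paper's code.

```python
def iou(pred, target, cls):
    """Intersection over Union for one class over two equally sized 2-D masks."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            in_pred, in_target = p == cls, t == cls
            inter += in_pred and in_target  # pixel is cls in both masks
            union += in_pred or in_target   # pixel is cls in at least one mask
    return inter / union if union else float("nan")


def mean_iou(pred, target, classes=(0, 1)):
    """Average the per-class IoU, skipping classes absent from both masks."""
    scores = [iou(pred, target, c) for c in classes]
    valid = [s for s in scores if s == s]  # NaN is the only value != itself
    return sum(valid) / len(valid) if valid else float("nan")


def necrosis_fraction(mask, necrotic=1):
    """Share of pixels labelled necrotized (assumed label: 1)."""
    pixels = [p for row in mask for p in row]
    return sum(p == necrotic for p in pixels) / len(pixels)


# Toy 2x3 masks (hypothetical data): 1 = necrotized, 0 = non-necrotized.
predicted = [[1, 1, 0], [0, 0, 0]]
annotated = [[1, 0, 0], [0, 0, 0]]
print(mean_iou(predicted, annotated))
print(necrosis_fraction(predicted))
```

In practice the masks would come from the segmentation model and the annotators, and the necrotized fraction would be taken over root pixels only rather than over the whole image.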

Improving in-field cassava whitefly pest surveillance with machine learning

Journal: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 2020.

Whiteflies are key vectors of cassava diseases, and monitoring their numbers is crucial for disease control. The current manual counting method is tedious and time-consuming. This paper proposes an automated approach using computer vision techniques. Images of infested cassava leaves were collected, and a detector was trained using Haar Cascade and Deep Learning methods to identify and count whiteflies. Results show that this approach achieves high precision. The method can also be adapted for similar object detection tasks with minor modifications.

Machine Translation For African Languages: Community Creation Of Datasets And Models In Uganda

Journal: 3rd Workshop on African Natural Language Processing. 2022.

Machine translation systems are limited by the lack of training and evaluation data for many languages. This study presents the creation of a parallel text corpus, SALT, for five Ugandan languages (Luganda, Runyankole, Acholi, Lugbara, and Ateso) by local NLP teams. Various methods were used to train and evaluate translation models, which proved effective even for previously unsupported languages, achieving mean BLEU scores of 26.2 (to English) and 19.9 (from English). The SALT dataset and models are publicly available on GitHub.

Interpretable Machine Learning-Based Triage For Decision Support in Emergency Care

Journal: 7th International Conference on Trends in Electronics and Informatics (ICOEI). IEEE, 2023.

Triage in medicine prioritizes patients based on urgency, but traditional nurse-led evaluations are time-consuming and prone to human error. Mis-triage can delay critical care, while the absence of triage can overwhelm hospital resources. This research explores Explainable AI (XAI) for machine learning-based triage, using classifiers such as Decision Trees, Random Forest, XGBoost, and Histogram-Based Gradient Boosting. The best-performing model, Histogram-Based Gradient Boosting, achieved a 91% AUC score and 70% F1 score. XAI techniques like LIME and SHAP were applied to enhance model transparency and trustworthiness for intelligent healthcare.

Gender Bias Evaluation in Luganda-English Machine Translation

Journal: Proceedings of the 15th biennial conference of the Association for Machine Translation in the Americas

In this paper, we explore the evaluation of gender bias in Luganda-English machine translation, an area that remains underexplored due to limited explicit text data. We build machine translation models using transfer learning with a pre-trained Marian MT model for English-Luganda and Luganda-English. To assess gender bias, we apply the Word Embeddings Fairness Evaluation Framework (WEFE), focusing on Luganda’s gender-neutral pronouns. A small set of trusted gendered examples is used to measure bias, with results validated through human evaluation. Additionally, we introduce a modified Translation Gender Bias Index (TGBI) to account for Luganda’s grammatical structure.

Using machine learning for image-based analysis of sweetpotato root sensory attributes

Journal: Smart Agricultural Technology.

In this paper, we present a machine learning-based approach to predicting sweetpotato sensory attributes, specifically flesh color and mealiness, to improve the breeding process. Traditional methods rely on trained human panels, which are costly and time-consuming, limiting throughput. Our approach uses image-based analysis with the DigiEye imaging system to capture and process sweetpotato cross-section images, extract features, and train predictive models. The Linear Regression and Random Forest Regression models achieved high accuracy for flesh color prediction (R² = 0.92 and 0.87, respectively), while the Random Forest and Gradient Boosting models performed well for mealiness prediction (R² = 0.85 and 0.80, respectively). The models were successfully tested by the sweetpotato breeding team at the International Potato Center in Uganda, demonstrating their potential to automate and accelerate the evaluation process. This method could enhance the selection of promising sweetpotato varieties for breeding and increase adoption by consumers.

Misinformation detection in Luganda-English code-mixed social media text

Journal: arXiv preprint 

In this paper, we address the lack of misinformation detection tools for Uganda’s 40 indigenous languages by developing a dataset and classification models for detecting misinformation in code-mixed Luganda-English social media messages. The dataset was sourced from Facebook and Twitter, and various machine learning methods were applied for classification. A 10-fold cross-validation experiment showed that the Discriminative Multinomial Naive Bayes (DMNB) model performed best, achieving an accuracy of 78.19% and an F-measure of 77.90%. Support Vector Machine and Bagging ensemble models also produced comparable results. These findings demonstrate the potential of machine learning-based approaches for misinformation detection in under-resourced languages using n-gram features.
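
To illustrate the general idea of classifying text from n-gram features, here is a small sketch of a plain multinomial Naive Bayes classifier with Laplace smoothing over word unigrams and bigrams. It is not the Discriminative Multinomial Naive Bayes (DMNB) model reported above, and the class name, labels, and English toy messages are hypothetical stand-ins rather than samples from the paper's dataset.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Word n-grams of length 1..n_max from a lower-cased, whitespace-split text."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feat_counts[label].update(ngrams(text))
        self.vocab = {f for counts in self.feat_counts.values() for f in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feat_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for feat in ngrams(text):
                if feat in self.vocab:  # ignore n-grams never seen in training
                    score += math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy messages, for illustration only.
train_texts = [
    "drink hot water to cure the virus",
    "hot water cure confirmed by nobody",
    "ministry of health confirms new cases",
    "ministry confirms vaccination schedule",
]
train_labels = ["misinfo", "misinfo", "reliable", "reliable"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("hot water will cure you"))
```

Whitespace tokenization is a simplification; code-mixed social media text would normally need more careful normalization before feature extraction.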

Keyword Spotter Model for Crop Pest and Disease Monitoring from Community Radio Data

Journal: arXiv preprint 

In this paper, we explore the use of machine learning-based speech keyword spotting techniques to analyze community radio data in rural Uganda, where radio remains a dominant means of communication. Unlike urban areas with widespread internet access, rural communities rely on radio talk shows for news and discussions. We develop models to identify keywords related to agriculture from radio audio streams, providing a cost-efficient method for monitoring food security concerns such as crop diseases, pests, drought, and famine. This approach supports early warning systems for policymakers and stakeholders, enhancing agricultural and economic resilience in rural areas.

A Comparison of Topic Modeling and Classification Machine Learning Algorithms on Luganda Data

Journal: AfricaNLP workshop.

This paper discusses the use of topic modeling and classification techniques on Luganda text data to automatically extract functional themes and topics. The authors employed Non-negative Matrix Factorization (NMF) for topic modeling, an unsupervised algorithm that identifies hidden patterns in text, and various approaches for topic classification, including classical methods, neural networks, and pretrained algorithms. The Bidirectional Encoder Representations from Transformers (BERT) and Support Vector Machine (SVM) algorithms yielded the best results for topic classification. The study found that both topic modeling and classification produced similar results when trained on a balanced dataset.

Enhanced Infield Agriculture with Interpretable Machine Learning Approaches for Crop Classification

Journal: arXiv preprint 

This research explores various approaches for crop classification using Artificial Intelligence (AI), particularly in the agricultural sector. Four techniques were evaluated: traditional machine learning with handcrafted feature extraction (SIFT, ORB, and Color Histogram), custom-designed CNN and deep learning architectures like AlexNet, transfer learning with pre-trained models (EfficientNetV2, ResNet152V2, Xception, Inception-ResNetV2, MobileNetV3), and cutting-edge models like YOLOv8 and DINOv2. Among them, Xception outperformed others, achieving 98% accuracy with a model size of 80.03 MB and a prediction time of 0.0633 seconds. The research also emphasized the importance of model explainability using tools like LIME, SHAP, and GradCAM, ensuring transparency in AI predictions. The study highlights the significance of selecting the right model based on specific tasks and the role of explainability in improving AI-driven crop management strategies.

Building a Luganda Text-to-Speech Model From Crowdsourced Data

Journal: arXiv preprint 

This paper addresses the limitations of text-to-speech (TTS) development for Luganda, a language with scarce high-quality, single-speaker recordings. Previous models using Luganda Common Voice recordings have generated intelligible but low-quality speech due to insufficient preprocessing, varying intonations, and background noise. The authors improve TTS quality by training on recordings from six female speakers with similar intonations, applying a pre-trained speech enhancement model to reduce noise, and keeping only recordings with a Mean Opinion Score (MOS) above 3.5. The resulting TTS model achieved a MOS of 3.55, significantly outperforming the existing model (2.5 MOS) and models trained on fewer speakers, demonstrating the effectiveness of using multiple speakers with close intonation to enhance TTS quality.
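
The selection step described above can be sketched as a simple filter. The `Recording` fields, the speaker identifiers, and the file names below are hypothetical, and the sketch assumes each recording already carries a MOS value; only the 3.5 threshold comes from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Recording:
    path: str      # hypothetical file name
    speaker: str   # hypothetical speaker identifier
    mos: float     # MOS assumed to be available for this clip


def select_recordings(recordings, speakers, min_mos=3.5):
    """Keep clips from the chosen speakers whose MOS is above min_mos."""
    return [r for r in recordings if r.speaker in speakers and r.mos > min_mos]


clips = [
    Recording("clip_001.wav", "spk_a", 3.9),
    Recording("clip_002.wav", "spk_a", 3.1),  # dropped: MOS too low
    Recording("clip_003.wav", "spk_z", 4.2),  # dropped: speaker not selected
    Recording("clip_004.wav", "spk_b", 3.6),
]
kept = select_recordings(clips, speakers={"spk_a", "spk_b"})
print([r.path for r in kept])
```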

Machine Learning Analysis of Radio Data to Uncover Community Perceptions on the Ebola Outbreak in Uganda

Journal: ACM Journal on Computing and Sustainable Societies

This study used machine learning to analyze English and Luganda radio broadcasts to understand public perceptions of the Ebola outbreak in Uganda. The analysis identified three main speaker categories: media personalities, community guests and listeners, and government officials, with the government playing the most significant role in public education. The findings revealed that the community was hesitant to use Ebola vaccines, citing concerns about their untested status in other populations, and expressed worries about COVID-19 lockdown measures. Additionally, differences were noted in the timing and content of conversations between male and female speakers. These insights can help inform population-specific policies for managing current and future pandemics.

Predicting Sweetpotato Sensory Attributes Using DigiEye and Image Analysis as a Breeding Tool

Journal: RTBFoods

This study aimed to develop and evaluate a color and mealiness classification model for sweetpotato roots using images. A total of 3018 images were collected from 950 samples across various sites in Uganda and Kenya between October 2021 and November 2022. Sensory panel data were used for calibration, with up to twelve cooked roots per genotype evaluated per session. Linear regression models showed strong performance, particularly for predicting orange color intensity (R² = 0.92, MSE = 0.58), indicating suitability for field application. The best model for mealiness showed a Mean Absolute Error (MAE) of 2.16 for mealiness-by-hand and 9.01 for positive area.
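
For readers unfamiliar with the metrics quoted above, the sketch below shows how R², MSE, and MAE are computed from paired observed and predicted scores. The function names and the toy values are illustrative assumptions and do not reproduce the study's data.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical panel scores versus model predictions.
panel_scores = [1.0, 2.0, 3.0, 4.0]
model_scores = [1.0, 2.0, 3.0, 5.0]
print(mse(panel_scores, model_scores))
print(mae(panel_scores, model_scores))
print(r2(panel_scores, model_scores))
```

MSE and MAE are expressed in the units of the sensory scale (squared units for MSE), so they are only comparable across attributes measured on the same scale, whereas R² is unitless.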

Standard operating procedure for image capture in sweetpotato and potato, and sensory attribute prediction. Work package 3

Journal: RTBFoods

This paper highlights the use of computer vision technology, specifically the DigiEye system, for evaluating important crop traits to enhance breeding programs. The DigiEye system, which measures color and appearance, is a fast, non-destructive, high-throughput tool for acquiring crop traits on a large scale. It is particularly useful in capturing data related to color and texture, which are linked to the chemical composition and sensory properties of food. The paper outlines a Standard Operating Procedure (SOP) for using the DigiEye system to capture images of sweet potato and potato, and to predict color and mealiness, providing a step-by-step guide for replicating the process.

Leveraging Edge Computing and Deep Learning for the Real-Time Identification of Bean Plant Pathologies

Journal: Smart Agricultural Technology

This work proposes a deep learning-based approach to identify diseases in bean plants, specifically Angular Leaf Spot (ALS) and bean rust, which are common in Uganda. The study evaluates image classification and object detection models using Convolutional Neural Network (CNN) architectures. The Makerere University beans image dataset, consisting of 15,335 images (ALS, bean rust, and healthy), was used for training. The dataset was expanded with an additional “unknown class” of 2,800 images to account for unrelated images. Adversarial training and Out-of-Distribution (OOD) detection techniques improved model robustness. The custom CNN, BeanWatchNet, achieved 90% accuracy for the three target classes, while EfficientNet v2 B0 and BeanWatchNet performed at 91% and 90% accuracy, respectively, for a four-class classification task. YOLO v8 was superior in object detection, achieving an mAP@50 of 87.6. The models were deployed on smartphones and Raspberry Pi for in-field disease detection, with the code and models available on GitHub.

Assessing the contribution of the Adhoc crop health surveillance tool on the food security and livelihoods of smallholder farmers in Uganda.

Journal: The Electronic Journal of Information Systems in Developing Countries

This study assesses the impact of Adsurv, a mobile phone crowdsourcing tool used by smallholder farmers in Uganda for crop health surveillance and pest management. While previous research focused on the effects of mobile technologies on food security or livelihoods separately, this study provides a holistic evaluation of both aspects. Using the Sustainable Livelihood Framework (SLF) and the Analytic Hierarchy Process (AHP), the study found that Adsurv contributed more to food availability than to food access or utilization. The main benefits were in enhancing human assets by empowering farmers with skills, which improved other livelihood assets. The findings suggest that further research is needed to promote the nutritional value of food in farming practices for long-term sustainable livelihoods.

PaliGemma-CXR: A Multi-task Multimodal Model for TB Chest X-ray Interpretation

Journal: arXiv preprint

This study presents PaliGemma-CXR, a multi-task multimodal model designed to address the challenges of TB diagnosis using chest X-rays. TB is a global health issue, and while X-rays are standard for screening, the shortage of radiologists is a major concern. The model aims to automate TB diagnosis, object detection, segmentation, report generation, and visual question answering (VQA). Using a multimodal dataset, the model was fine-tuned and data sampling methods were applied to improve performance. The results show impressive performance across tasks: 90.32% accuracy on TB diagnosis, 98.95% on close-ended VQA, 41.3 BLEU score for report generation, and mAP scores of 19.4 and 16.0 for object detection and segmentation, respectively. PaliGemma-CXR demonstrates the effective use of multi-task learning to improve image interpretation for TB detection.

Machine Translation For African Languages: Community Creation Of Datasets And Models In Uganda.

Journal: 3rd Workshop on African Natural Language Processing

This case study focuses on creating resources for machine translation systems for underrepresented languages. A parallel text corpus, SALT, was developed for five Ugandan languages (Luganda, Runyankole, Acholi, Lugbara, and Ateso) to address the shortage of training and evaluation data. The study explored various methods to train and evaluate translation models, which proved effective for practical applications. The resulting models achieved a mean BLEU score of 26.2 for translations into English and 19.9 for translations from English. The SALT dataset and models are publicly available for use.