The KaraAgroAI Cocoa Dataset

Akogo Darlington; Nakatumba-Nabende Joyce; Christabel Acquaye; Emmanuel Amoako; Jerry Buaba; Issah Samori; Tusubira Francis Jeremy; Namanya Gloria

About the Dataset

The dataset was created to provide open, accessible, well-labeled, and sufficiently curated Cocoa crop imagery for data scientists, researchers, the wider machine learning community, and social entrepreneurs in sub-Saharan Africa and worldwide. It is intended for machine learning experiments that build solutions for in-field Cocoa crop disease diagnosis and spatial analysis. The images were collected across three classes: Healthy, Cocoa Swollen Shoot Virus Disease (CSSVD), and Anthracnose.


Although the agricultural sector is a national economic development priority in sub-Saharan Africa, crop pests and diseases remain a challenge for major food security crops like cocoa. Cocoa Swollen Shoot Virus Disease (CSSVD) can reduce yield by about 70% and can kill cocoa trees within 2–3 years of infection, at any stage of cocoa growth. It is one of the major diseases affecting cocoa production in West Africa, especially in Ghana, Côte d’Ivoire, Nigeria, and Togo. Anthracnose, caused by Colletotrichum lupini, is the world's most important lupin disease. Data collection and crop pest and disease diagnosis are currently transitioning from identification by visible symptoms to data-driven solutions that apply machine learning and computer vision techniques. The image data previously collected is biased and not reproducible. It has also not been sufficiently curated, prepared, and shared with the wider community.

Usage Information

    CC0 1.0