Advancing automatic speech recognition for low-resource Ghanaian languages: Audio datasets for Akan, Ewe, Dagbani, Dagaare, and Ikposo

dc.contributor.author: Wiafe, I.
dc.contributor.author: Ekpezu, A. O.
dc.contributor.author: Helegah, R. D.
dc.contributor.author: et al.
dc.date.accessioned: 2025-12-02T18:01:10Z
dc.date.issued: 2025
dc.description: Research Article
dc.description.abstract: Audio datasets are fundamental to the development of automatic speech recognition (ASR) systems. However, the availability of large audio corpora in low-resource languages (LRLs) is limited. This study addresses this gap by introducing audio speech datasets for five low-resource languages spoken in Ghana and parts of Togo. Specifically, it presents a 5000-hour speech corpus in Akan, Ewe, Dagbani, Dagaare, and Ikposo. Each language corpus includes 1000 h of validated audio speech recorded by indigenous speakers of that language. These audio recordings are spoken descriptions of 1000 culturally relevant images, collected using a custom Android mobile application. To enhance the dataset's utility in ASR and linguistic research, 10% of the audio recordings for each language were randomly selected and transcribed, resulting in approximately 100 h of transcription per language. This dataset represents a critical resource for preserving and documenting Ghanaian languages. It holds the potential for advancing speech and language technologies in these languages. Creating this audio dataset is the first step towards bridging the technological gap between high- and low-resource languages. Ethical guidelines were strictly followed throughout the data collection process, and participants were given incentives for lending their voices to this study.
dc.identifier.issn: https://doi.org/10.1016/j.dib.2025.111880
dc.identifier.uri: https://ugspace.ug.edu.gh/handle/123456789/44208
dc.language.iso: en
dc.publisher: Data in Brief
dc.subject: Speech-to-text
dc.subject: Speech synthesis
dc.subject: Natural language processing
dc.title: Advancing automatic speech recognition for low-resource Ghanaian languages: Audio datasets for Akan, Ewe, Dagbani, Dagaare, and Ikposo
dc.type: Article

Files

Original bundle

Name:
1-s2.0-S2352340925006043-main.pdf
Size:
2.2 MB
Format:
Adobe Portable Document Format

License bundle

Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed upon to submission