Google, in partnership with African universities and research organisations, has launched WAXAL, a large-scale open speech dataset designed to strengthen artificial intelligence tools for African languages and expand access to voice-based technologies across the continent.
The dataset covers 21 Sub-Saharan African languages, among them Hausa, Yoruba, Igbo, Luganda, Swahili and Acholi.
According to Google, WAXAL is intended to support more than 100 million speakers who have largely been excluded from modern voice technologies due to the scarcity of high-quality language data.
“The ultimate impact of WAXAL is the empowerment of people in Africa. This dataset provides the critical foundation for students, researchers, and entrepreneurs to build technology on their own terms, in their own languages, finally reaching over 100 million people,” says Aisha Walcott-Bryant, Head of Google Research Africa.
Bridging Africa’s speech data gap
Voice assistants, speech-to-text tools and audio-based services are now embedded in everyday life in many parts of the world. Yet Africa’s linguistic diversity, spanning more than 2,000 languages, has remained significantly underrepresented in artificial intelligence systems, limiting the usefulness of these tools for education, healthcare delivery, accessibility and commerce.
The absence of robust speech datasets has been one of the biggest barriers to building reliable AI systems for African languages. Many global AI models are trained primarily on English, Mandarin and a small number of European languages, leaving African languages poorly recognised or entirely unsupported.
WAXAL seeks to address this imbalance. Developed over a three-year period with funding from Google, the dataset contains 1,250 hours of transcribed natural speech alongside more than 20 hours of high-quality studio recordings. These studio recordings can be used to create realistic synthetic voices, a key component of modern voice assistants and accessibility tools.
By making the dataset openly available, Google and its partners hope to lower the entry barrier for African developers, researchers and startups seeking to build speech-powered applications tailored to local contexts.
Community-led data collection
A defining feature of the WAXAL initiative is its emphasis on local ownership and community participation. African universities and organisations, including Makerere University in Uganda, the University of Ghana, and Digital Umuganda in Rwanda, led the data collection process, working closely with Google researchers.
Rather than extracting data for external use, the project was structured so that partner institutions retain ownership of the datasets. This approach is intended to ensure that African researchers and students can develop their own tools and applications without dependence on foreign companies or platforms.
At the University of Ghana alone, more than 7,000 volunteers contributed their voices to the project. According to Isaac Wiafe, an Associate Professor at the institution, the initiative is already opening new pathways for innovation in health, education and agriculture.
Enabling inclusive AI development
The release of WAXAL comes amid growing debate about bias, representation and equity in artificial intelligence. Language exclusion not only limits access to technology but can also reinforce existing social and economic inequalities.
By supporting African languages at scale, the dataset could enable new tools for voice-based learning, local-language health information systems, farmer advisory services and accessibility technologies for people with disabilities.
The dataset is now publicly available, allowing developers, startups and researchers to integrate African languages more effectively into speech recognition, voice assistants and transcription systems.
While WAXAL alone will not solve Africa’s AI inclusion challenges, experts say it represents a significant step towards ensuring that the continent’s languages and the people who speak them are not left behind as voice-driven technologies become increasingly central to the digital economy.
Talking Points
At Techparley, we see Google’s WAXAL dataset as a foundational intervention in Africa’s long-standing exclusion from voice-based artificial intelligence.
Despite rapid growth in mobile and digital services, most African languages remain invisible in global AI systems, limiting how people access education, healthcare information and digital tools. WAXAL directly targets that structural gap.
What stands out is the decision to build the dataset with African universities and allow local institutions to retain ownership of the data. This shifts African researchers from being data contributors to technology builders.
We also see implications beyond consumer tech. Voice AI in local languages could improve agricultural extension services, accessibility tools and public-sector digital services where literacy or language barriers remain high.
However, datasets alone are not enough. The long-term impact of WAXAL will depend on sustained investment, open access, and whether African developers are supported to turn this data into widely adopted products.

