Google Launches WAXAL Dataset to Make Voice AI Accessible to Over 100 Million Africans

By Quadri Adejumo
Senior Journalist and Analyst
Quadri Adejumo is a senior journalist and analyst at Techparley, where he leads coverage on innovation, startups, artificial intelligence, digital transformation, and policy developments shaping Africa’s...

Google, in partnership with African universities and research organisations, has launched WAXAL, a large-scale open speech dataset designed to strengthen artificial intelligence tools for African languages and expand access to voice-based technologies across the continent.

The dataset covers 21 Sub-Saharan African languages, among them Hausa, Yoruba, Igbo, Luganda, Swahili and Acholi.

According to Google, WAXAL is intended to support more than 100 million speakers who have largely been excluded from modern voice technologies due to the scarcity of high-quality language data.

“The ultimate impact of WAXAL is the empowerment of people in Africa. This dataset provides the critical foundation for students, researchers, and entrepreneurs to build technology on their own terms, in their own languages, finally reaching over 100 million people,” Aisha Walcott-Bryant, Head of Google Research Africa, says.

Bridging Africa’s speech data gap

Voice assistants, speech-to-text tools and audio-based services are now embedded in everyday life in many parts of the world. Yet Africa’s linguistic diversity, spanning more than 2,000 languages, has remained significantly underrepresented in artificial intelligence systems, limiting the usefulness of these tools for education, healthcare delivery, accessibility and commerce.

The absence of robust speech datasets has been one of the biggest barriers to building reliable AI systems for African languages. Many global AI models are trained primarily on English, Mandarin and a small number of European languages, leaving African languages poorly recognised or entirely unsupported.

WAXAL seeks to address this imbalance. Developed over a three-year period with funding from Google, the dataset contains 1,250 hours of transcribed natural speech alongside more than 20 hours of high-quality studio recordings. These studio recordings can be used to create realistic synthetic voices, a key component of modern voice assistants and accessibility tools.

By making the dataset openly available, Google and its partners hope to lower the entry barrier for African developers, researchers and startups seeking to build speech-powered applications tailored to local contexts.

Community-led data collection

A defining feature of the WAXAL initiative is its emphasis on local ownership and community participation. African universities and organisations, including Makerere University in Uganda, the University of Ghana, and Digital Umuganda in Rwanda, led the data collection process, working closely with Google researchers.

Rather than extracting data for external use, the project was structured so that partner institutions retain ownership of the datasets. This approach is intended to ensure that African researchers and students can develop their own tools and applications without dependence on foreign companies or platforms.

At the University of Ghana alone, more than 7,000 volunteers contributed their voices to the project. According to Isaac Wiafe, an Associate Professor at the institution, the initiative is already opening new pathways for innovation in health, education and agriculture.

Enabling inclusive AI development

The release of WAXAL comes amid growing debate about bias, representation and equity in artificial intelligence. Language exclusion not only limits access to technology but can also reinforce existing social and economic inequalities.

By supporting African languages at scale, the dataset could enable new tools for voice-based learning, local-language health information systems, farmer advisory services and accessibility technologies for people with disabilities.

The dataset is now publicly available, allowing developers, startups and researchers to integrate African languages more effectively into speech recognition, voice assistants and transcription systems.

While WAXAL alone will not solve Africa’s AI inclusion challenges, experts say it represents a significant step towards ensuring that the continent’s languages and the people who speak them are not left behind as voice-driven technologies become increasingly central to the digital economy.

Talking Points

At Techparley, we see Google’s WAXAL dataset as a foundational intervention in Africa’s long-standing exclusion from voice-based artificial intelligence.

Despite rapid growth in mobile and digital services, most African languages remain invisible in global AI systems, limiting how people access education, healthcare information and digital tools. WAXAL directly targets that structural gap.

What stands out is the decision to build the dataset with African universities and allow local institutions to retain ownership of the data. This shifts African researchers from being data contributors to technology builders.

We also see implications beyond consumer tech. Voice AI in local languages could improve agricultural extension services, accessibility tools and public-sector digital services where literacy or language barriers remain high.

However, datasets alone are not enough. The long-term impact of WAXAL will depend on sustained investment, open access, and whether African developers are supported to turn this data into widely adopted products.
