The field of natural language processing (NLP) has advanced furthest in the most widely used languages, such as English and Russian. But an emerging body of research is focused on training AI models using African languages.
Thanks to such efforts, the dream of an African language chatbot is edging closer to reality.
Chatbot Research Dominated by English Language
Natural language processing and the large language models that power chatbots like ChatGPT are still relatively new technologies. And to date, research and development has focused on the most widely spoken languages.
For example, ChatGPT is available in English, Spanish, French, German, Portuguese, Italian, Dutch, Russian, Arabic, and Chinese.
The tendency toward language dominance in AI research is largely driven by data availability.
It is estimated that over half of all written content available online is in English. Accordingly, of the datasets needed to train language models, the largest and most readily available are in English, followed by the other most popular languages.
African Languages Pose a Challenge for AI Researchers
Currently, the world’s largest AI firms are battling it out to build the most advanced chatbots for a handful of languages. But another sphere of research is looking to develop AI tools for less widely used languages.
For African languages, the limited availability of training data presents a significant challenge for AI developers.
The linguistic diversity
Author: James Morales