When Western-Trained AI Meets African Healthcare
Artificial intelligence (AI) is increasingly stepping into the world of medicine. Large language models (LLMs) can summarize clinical notes, answer health questions, and assist physicians in diagnosing disease. In places where doctors are scarce and healthcare infrastructure is limited, these tools promise to expand access to care.
But there is a problem hiding inside the data.
Most medical AI systems learn from training material drawn largely from Western countries. Their medical knowledge often reflects U.S. or European clinical practice, datasets written in English, and cultural assumptions embedded in those systems. When these models encounter patients, diseases, or treatment traditions outside that environment, their performance can change in unexpected ways.
For Georgia Tech Ph.D. student Charles Nimo (College of Computing, Institute for People and Technology), this raises a central question: What happens when AI trained on Western medicine meets the realities of African healthcare?
Nimo, a third-year Ph.D. student in computer science and a graduate research assistant in Professor Michael Best’s Technologies and International Development Lab, was recently selected as the newest recipient of the Roland Ewubare Fellowship in Societal Engagement and Impact, a philanthropic program supporting underrepresented graduate students whose research advances community-focused, socially relevant scholarship.
His research explores that tension. By building new datasets and evaluating modern language models against them, Nimo and his collaborators are uncovering where medical AI succeeds, where it fails, and what it will take to make these systems work across cultures.
From Enterprise Systems to Global Health AI
Nimo did not begin his career in healthcare.
He studied electrical engineering at Virginia Commonwealth University and later worked as a software engineer at Dell, where he helped build enterprise systems used to monitor large-scale computing infrastructure. The work focused on the technical backbone of modern data centers.
“I had spent years building large computing systems,” Nimo says. “But I became more interested in what those systems could actually do in the real world.”
Around 2020, he decided to return to school to deepen his understanding of AI and machine learning.
At the University of Texas at Austin, he joined a lab led by Professor Ying Ding that explored applications of AI in healthcare. The lab focused on developing efficient machine learning models capable of running in resource-constrained environments. For his master’s thesis, Nimo began applying that work to healthcare challenges in African contexts.
That experience reshaped the questions he wanted to pursue.
“Healthcare looks very different depending on where you are in the world,” Nimo says.
After completing his master’s degree, he moved to Atlanta to pursue a Ph.D. in computer science at Georgia Tech. There he began collaborating with researchers, including Professors Michael Best and Irfan Essa, to explore how AI systems perform in African contexts.
Building a Benchmark for AI in Medicine Across the African Continent
One of the first problems Nimo encountered was surprisingly simple.
Most medical AI systems had never been rigorously tested on African clinical knowledge.
“If you look at how we evaluate medical AI today, almost all of the benchmarks come from Western exams,” Nimo says.
Many of those benchmarks rely on datasets derived from medical licensing exams such as the United States Medical Licensing Examination. These collections contain thousands of questions about diagnosis, treatment, and clinical reasoning. Models that perform well on them are often considered medically capable.
But success on a U.S. exam does not necessarily translate to other regions.
Together with collaborators from multiple institutions, Nimo helped develop a new benchmark called AfriMed-QA. The dataset brings together more than 15,000 medical questions drawn from across the African continent, including material from more than 60 medical schools and spanning 32 specialties.
The questions range from clinician-written exam problems to consumer health queries that reflect how patients might ask about symptoms or treatments.
To assemble the dataset, the research team worked with clinicians, trainees, and contributors across multiple African countries, creating the largest study on LLMs in African healthcare with support from multiple organizations including Google, the Gates Foundation, and PATH. The goal was to capture the diversity of medical knowledge, health conditions, and patient experiences present across the continent.
When the researchers tested modern Western LLMs against this dataset, a clear pattern emerged. Many models that had performed well on Western medical benchmarks showed noticeable drops in accuracy when answering medical questions relevant to practice across the African continent.
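This kind of benchmark comparison boils down to scoring a model's answers and grouping accuracy by region or specialty to surface where performance drops. The sketch below is illustrative only, not the AfriMed-QA evaluation code; the sample questions, options, and the stand-in `model_answer` function are all hypothetical placeholders for a real LLM call.

```python
# Illustrative sketch: multiple-choice accuracy on a medical QA
# benchmark, grouped by specialty. All entries are hypothetical.
from collections import defaultdict

benchmark = [
    {"question": "First-line therapy for uncomplicated malaria?",
     "options": ["A", "B", "C", "D"], "answer": "A",
     "specialty": "infectious disease"},
    {"question": "Most likely diagnosis for this pediatric case?",
     "options": ["A", "B", "C", "D"], "answer": "C",
     "specialty": "pediatrics"},
]

def model_answer(question, options):
    # Stand-in for a real LLM call; this toy always picks the first option.
    return options[0]

def accuracy_by_group(items):
    # Tally correct answers per specialty, then compute per-group accuracy.
    correct, total = defaultdict(int), defaultdict(int)
    for item in items:
        pred = model_answer(item["question"], item["options"])
        total[item["specialty"]] += 1
        if pred == item["answer"]:
            correct[item["specialty"]] += 1
    return {group: correct[group] / total[group] for group in total}

print(accuracy_by_group(benchmark))
```

Disaggregating scores this way, rather than reporting a single overall number, is what lets a gap between Western-exam performance and African-context performance show up at all.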
“Medicine isn’t practiced in a vacuum,” Nimo says. “Disease patterns, available treatments, and even when patients show up for care can be very different depending on where you are.”
This gap in accuracy highlighted an important reality. Medical knowledge may be global, but the context in which it is practiced is not.
For example, some questions required familiarity with diseases more common in tropical climates. Others reflected differences in healthcare access, diagnostic resources, or the timing of treatment.
Models trained primarily on Western data struggled to navigate those differences.
The Cultural Layer of Medicine
If the AfriMed-QA project exposed performance gaps, Nimo’s second major study looked deeper at why those gaps appear.
The research, titled Africa Health Check, examines cultural bias inside medical language models. The study focuses on how AI systems respond when presented with treatments rooted in traditional medicinal practices across Africa.
Across the continent, traditional herbal medicine remains a central component of healthcare. Estimates suggest that roughly 80 percent of people in Africa rely on these remedies for primary care.
Yet most modern medical AI systems rarely mention them.
“A lot of people assume medicine is universal,” Nimo says. “But culture shapes how people understand illness and treatment.”
To study that dynamic, Nimo and his collaborators built a dataset that pairs African medicinal plants with the health conditions they address. The dataset includes more than one hundred remedies and over 130 country-specific treatment pairs drawn from peer-reviewed literature.
Researchers then asked language models to choose between different treatment options or complete prompts describing medical scenarios.
The results revealed a consistent pattern. When given little contextual information, models tended to default to conventional Western treatments even when traditional remedies were relevant and widely used in local healthcare systems.
The study also introduced new techniques to analyze why models make these choices. One method measures how strongly a model prefers one treatment over another. Another traces which words in a prompt influence the model’s response.
Together, these tools allow researchers to see both what a model recommends and how it arrived at that decision.
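One common way to quantify such a preference, sketched below under assumptions of my own (this is not the Africa Health Check code), is to compare the log-probability a model assigns to each candidate completion: the gap between the two scores becomes a signed preference measure. The `log_prob` lookup here is a toy stand-in for a real model's scoring API, and the example remedies are hypothetical.

```python
# Illustrative sketch: measuring how strongly a model prefers one
# treatment completion over another via log-probability differences.

def log_prob(prompt, completion):
    # Stand-in for a real model's log-probability API; toy values only.
    toy_scores = {
        ("For malaria, a common remedy is", "artemisinin"): -2.0,
        ("For malaria, a common remedy is", "neem leaf infusion"): -6.5,
    }
    return toy_scores.get((prompt, completion), -10.0)

def preference_score(prompt, option_a, option_b):
    # Positive values mean the model leans toward option_a;
    # negative values mean it leans toward option_b.
    return log_prob(prompt, option_a) - log_prob(prompt, option_b)

score = preference_score("For malaria, a common remedy is",
                         "artemisinin", "neem leaf infusion")
print(score)  # positive here: the toy model prefers the conventional drug
```

Aggregating such scores across many remedy pairs is one way to turn a model's quiet defaults into a measurable bias, which is the kind of pattern the study reports.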
The findings suggest that bias does not always come from explicit errors. Often it emerges quietly from the distribution of training data. If a model learns mostly from Western-centric medical literature, it will naturally prioritize the treatments it encounters most often.
Toward Context-Aware Health AI
For Nimo, these studies are not just about identifying limitations. They are also about building better tools.
Healthcare systems in low- and middle-income countries face persistent shortages of physicians and specialists. AI systems have the potential to assist clinicians, provide decision support, and answer patient questions in environments where medical expertise is scarce.
But those tools must reflect the communities they serve.
Future versions of the AfriMed-QA dataset aim to expand beyond English and include additional languages spoken across Africa. The research team also hopes to incorporate multimodal data such as medical images and recorded speech.
These additions matter because healthcare knowledge is not only written in textbooks. It exists in conversations, local languages, clinical images, and cultural practices. AI systems designed for global use must learn to understand all of them.
The challenge Nimo is considering sits at the intersection of technology and health equity. AI may transform healthcare, but only if it recognizes the diversity of medical practice around the world.