SwahiliVQA: A Dataset for Visual Question Answering in Swahili Language
Mbwana Francis Stephen
This paper introduces the first Swahili Visual Question Answering (SwahiliVQA) dataset, addressing the critical underrepresentation of African languages in vision-language research. Swahili, spoken by over 100 million people across East Africa, remains severely underserved in AI development despite its widespread use. Our dataset comprises 10,000 images paired with 41,448 question-answer combinations, translated from English resources and validated by native Swahili speakers to ensure linguistic authenticity. We establish baseline performance metrics using various model architectures and propose a novel multimodal approach that combines CLIP's vision encoder with a Swahili-specific RoBERTa model, achieving 38.38% accuracy. This work provides essential resources for Swahili-language AI development, establishes a methodological framework for creating VQA datasets in other underrepresented languages, and contributes to more equitable artificial intelligence that serves diverse linguistic communities across East Africa and beyond.
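The abstract does not say how the CLIP image features and the Swahili RoBERTa text features are combined. As an illustration only, the following minimal sketch assumes late fusion by concatenation followed by a linear classifier over a fixed answer vocabulary. The random vectors are hypothetical stand-ins for real encoder outputs, and every function name, dimension, and the three-answer vocabulary is invented for this example rather than taken from the paper.

```python
import math
import random


def concat_fusion(image_vec, text_vec):
    """Late fusion by concatenation: join the two embeddings into one vector."""
    return list(image_vec) + list(text_vec)


def linear(weights, bias, x):
    """One logit per candidate answer: W @ x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over the answer logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_answer(image_vec, text_vec, weights, bias, answers):
    """Fuse both embeddings, score every answer, return the top one."""
    fused = concat_fusion(image_vec, text_vec)
    probs = softmax(linear(weights, bias, fused))
    best = max(range(len(answers)), key=probs.__getitem__)
    return answers[best], probs


# Hypothetical stand-ins: in a real system image_vec would come from a
# CLIP vision encoder and text_vec from a Swahili RoBERTa model.
rng = random.Random(0)
image_vec = [rng.gauss(0, 1) for _ in range(4)]
text_vec = [rng.gauss(0, 1) for _ in range(3)]

# Assumed toy answer vocabulary ("yes", "no", "two" in Swahili).
answers = ["ndiyo", "hapana", "mbili"]

# Untrained random weights; a real classifier head would be learned.
weights = [[rng.gauss(0, 0.1) for _ in range(7)] for _ in answers]
bias = [0.0] * len(answers)

answer, probs = predict_answer(image_vec, text_vec, weights, bias, answers)
print(answer, [round(p, 3) for p in probs])
```

Treating VQA as classification over a closed answer set is a common simplification. Because the weights here are untrained, the predicted answer is arbitrary; the sketch only shows the data flow from two embeddings to an answer distribution.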
Related papers
AnoResLSTM: A Hybrid Deep Learning Framework for Real-Time Cheating Detection in Online Exams
Bùi Quốc Huy
RL-HCR: A Reinforcement Learning Based Adaptive Leader Selection Framework for Energy-Efficient WSNs
Trần Huy Long
A Dual-Path approach for Time Series Anomaly Detection in Building Environmental Sensors
Nguyễn Chí Minh Hiếu
Refurbished Smartphones in the Circular Economy: Insights From Environmental and Consumer Perspectives
Nguyễn Đình Sơn