Lanfrica Talks #16 | AfroLID: A Neural Language Identification Tool for African Languages

Lanfrica Talks

22-03-2023 • 29 mins

In this episode, Ife Adebara takes us through the AfroLID toolkit project, a neural LID toolkit for 517 African languages and varieties. Language identification (LID) is a crucial precursor for NLP, especially for mining web data. Problematically, most of the world’s 7000+ languages today are not covered by LID technologies.


AfroLID exploits a multi-domain web dataset manually curated from across 14 language families utilizing five orthographic systems. When evaluated on our blind Test set, AfroLID achieves an F1-score of 95.89. We also compare AfroLID to five existing LID tools that each cover a small number of African languages, finding that it outperforms them on most languages. We further show the utility of AfroLID in the wild by testing it on the acutely under-served Twitter domain. Finally, we offer a number of controlled case studies and perform a linguistically motivated error analysis that allow us to showcase both AfroLID’s powerful capabilities and its limitations.

