Meta Expands AI Speech Recognition to 1,600+ Languages


Omnilingual Automatic Speech Recognition can transcribe speech in over 1,600 languages — including 500 low-resource languages.

Written by

TechRepublic Staff

Reviewed by:

Antony Peyton

Nov 11, 2025


Willkommen. Bienvenue. Welcome. C’mon in. Meta has unveiled Omnilingual Automatic Speech Recognition (ASR), an AI system that can transcribe speech in over 1,600 languages — including 500 low-resource languages that have never been handled by AI before.

The project represents the latest development from Meta’s Fundamental AI Research (FAIR) team and signals a major shift toward making speech technology open to all linguistic communities. Critics might argue that promoting a global language (i.e., English) would be a better use of our time, and perhaps lead to world harmony and more technological innovations.

Anyway, according to the announcement, the company has also open-sourced several key assets alongside the system: Omnilingual wav2vec 2.0, a seven-billion-parameter self-supervised multilingual speech model, and the Omnilingual ASR Corpus, a collection of transcribed speech in 350 underserved languages.

All models are released under the Apache 2.0 license, while datasets are licensed under CC-BY, ensuring they are freely usable and modifiable by the global AI community. The framework is built atop fairseq2 and fully compatible with the PyTorch ecosystem.


Tackling the digital language divide

ASR systems have historically performed well only for a handful of high-resource languages like English, Spanish, and Mandarin, which dominate the internet and benefit from large labeled datasets. Low-resource languages, often spoken by millions globally, have remained excluded from digital systems — a gap that Meta reckons perpetuates inequalities in education, access, and digital participation.

Meta’s Omnilingual ASR is designed to close that gap by reducing the data and expertise required to build functioning ASR models. Its architecture introduces two decoder variants — one based on the traditional connectionist temporal classification (CTC) framework, and another using a transformer-based LLM decoder.
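The announcement doesn't show how either decoder works internally. As a rough illustration of the CTC side only, here is a minimal sketch of greedy (best-path) CTC decoding, the textbook way to turn a CTC model's per-frame scores into text. It assumes the blank token sits at index 0. The toy vocabulary and the function names `best_path` and `ctc_greedy_decode` are hypothetical and are not part of Omnilingual ASR's API.

```python
from itertools import groupby

BLANK_ID = 0  # assumption: the CTC blank token sits at index 0 of the vocabulary


def best_path(frame_scores: list[list[float]]) -> list[int]:
    """Pick the highest-scoring label for every audio frame."""
    return [max(range(len(row)), key=row.__getitem__) for row in frame_scores]


def ctc_greedy_decode(frame_ids: list[int], vocab: list[str], blank_id: int = BLANK_ID) -> str:
    """Turn a per-frame label path into text using the CTC collapse rule."""
    # Step 1: merge runs of identical labels ("h h e" -> "h e").
    merged = [label for label, _run in groupby(frame_ids)]
    # Step 2: drop blanks; a blank between two identical labels keeps both ("l _ l" -> "ll").
    return "".join(vocab[label] for label in merged if label != blank_id)


vocab = ["<blank>", "h", "e", "l", "o"]  # hypothetical toy vocabulary
frames = [1, 1, 0, 2, 3, 3, 0, 3, 4, 4, 0]  # hypothetical per-frame best labels
print(ctc_greedy_decode(frames, vocab))  # hello
```

Because CTC emits one label per audio frame, the collapse step is what lets a speaker's stretched-out sounds map back to a single character. The LLM-style decoder variant instead generates text token by token, so it does not need this step.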

Scaling ASR to global coverage

Meta reports that its largest model, the 7B-LLM-ASR, achieves a character error rate (CER) below 10 for nearly 80% of the more than 1,600 languages it supports.
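Character error rate is the character-level edit distance between a model's transcript and a reference transcript, divided by the length of the reference. The sketch below assumes CER is expressed as a percentage, so "below 10" is read as under 10%. It is not Meta's evaluation code. The function names are hypothetical, and the per-language scores in the usage line are invented for illustration, not reported results.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Character-level Levenshtein distance (substitutions, insertions, deletions)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            current.append(min(
                previous[j] + 1,                           # delete ref_char
                current[j - 1] + 1,                        # insert hyp_char
                previous[j - 1] + (ref_char != hyp_char),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate as a percentage of the reference length."""
    if not reference:
        raise ValueError("reference transcript must be non-empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


def share_below(cer_by_language: dict[str, float], threshold: float = 10.0) -> float:
    """Fraction of languages whose CER falls under the threshold."""
    hits = sum(1 for value in cer_by_language.values() if value < threshold)
    return hits / len(cer_by_language)


print(round(cer("omnilingual", "omnilingal"), 2))  # 9.09
# Hypothetical per-language scores, invented purely for illustration:
print(share_below({"lang_a": 4.2, "lang_b": 12.5, "lang_c": 8.0, "lang_d": 9.9, "lang_e": 15.0}))  # 0.6
```

A single dropped letter in an 11-character word already yields a CER of about 9%, which gives a sense of how demanding a sub-10 threshold is across whole sentences.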

The scale of the project also highlights the progress in AI architectures capable of learning from untranscribed or raw speech. By scaling up wav2vec 2.0 to seven billion parameters, Meta’s engineers have built a model that learns generalized speech representations without requiring huge labeled datasets, making it easier to extend to previously unsupported languages.

Bring your own language

Omnilingual ASR can learn new languages from only a few examples. Traditionally, adding a new language to an ASR system required extensive fine-tuning by experts, an expensive and highly technical process. In contrast, Meta says its system can adapt to a new language simply by processing a few paired audio-text samples, a technique borrowed from in-context learning in LLMs.

This approach means speakers of underrepresented languages can contribute to the inclusion of their language without access to high-end computing or massive datasets. While performance may not initially match that of fully trained models, the scalability and accessibility of this method could redefine how languages enter the digital sphere.

Partnerships and data sourcing

To build its training data, Meta worked with local partners and linguistic organizations around the world. Many of these collaborations involved recruiting and compensating native speakers to record speech in their own languages, often in remote or digitally underserved areas.

Through the Language Technology Partner Program, Meta collaborated with groups like Mozilla Foundation’s Common Voice and Lanfrica/NaijaVoices.

The company is releasing the commissioned portions of this training data publicly as the Omnilingual ASR Corpus, which Meta describes as the world’s “largest ultra-low-resource spontaneous ASR dataset.”

Broader implications

The release of Omnilingual ASR could have implications beyond research. For education, it could support transcription and translation of oral traditions or lectures in native languages. For governments and NGOs, it could make voice interfaces and documentation tools accessible to marginalized groups. And for the AI industry at large, it demonstrates that global-scale AI systems can be built on open, community-driven foundations.
