


Meta uses the Bible to train a massively multilingual speech model: speech recognition for 1,107 languages, language identification for 4,017
The Bible tells the story of the Tower of Babel: humanity united to build a tower reaching to heaven, but God confused their language and the plan failed. Today, AI technology promises to tear down the barriers between human languages and help humanity build a Tower of Babel of its own.
A recent study by Meta takes an important step in this direction. The newly proposed method, called Massively Multilingual Speech (MMS), uses the Bible as part of its training data and achieves the following results:
- A multilingual speech recognition model with 1 billion parameters, trained with wav2vec 2.0 on 1,107 languages. Compared with OpenAI's Whisper model, its error rate is reduced by more than 50%.
- A single speech synthesis model that supports text-to-speech (TTS) for these 1,107 languages.
- A language identification classifier capable of recognizing 4,017 languages.
How does Meta deal with data scarcity for so many low-resource languages? The approach is interesting: religious corpora, because texts like the Bible come with the most readily "aligned" speech data. Although this dataset is skewed toward religious content and mostly male voices, the paper shows that the models also perform well on other domains and on female voices; this is emergent behavior of the base model, and it is genuinely impressive. Even better, Meta has released all of the newly developed models (speech recognition, TTS, and language identification) for free; a minimal usage sketch follows the links below.
- Model download: https://github.com/facebookresearch/fairseq/tree/main/examples/mms
- Paper address: https://research.facebook.com/publications/scaling-speech-technology-to-1000-languages/
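For readers who want to try the released speech recognition model, the sketch below shows a minimal greedy CTC transcription loop. It assumes the MMS checkpoints are also mirrored on the Hugging Face Hub under the name `facebook/mms-1b-all` and that the `transformers` wav2vec 2.0 CTC API (including the per-language adapter calls) applies; both assumptions should be verified against the repository linked above.

```python
# Minimal ASR sketch. The checkpoint name "facebook/mms-1b-all" and the language-adapter
# calls are assumptions to verify against the official MMS repository; replace the random
# placeholder waveform with real 16 kHz speech.
import torch
from transformers import AutoProcessor, Wav2Vec2ForCTC

model_id = "facebook/mms-1b-all"
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# MMS keeps one shared backbone plus small per-language adapters; here we assume English ("eng").
processor.tokenizer.set_target_lang("eng")
model.load_adapter("eng")

waveform = torch.randn(16_000)                                   # placeholder: 1 s of 16 kHz audio
inputs = processor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits                              # (batch, frames, vocab) CTC emissions

pred_ids = torch.argmax(logits, dim=-1)                          # greedy CTC decoding
print(processor.batch_decode(pred_ids)[0])
```

On real audio, a beam-search decoder with a language model would typically lower the error rate further; greedy decoding just keeps the example short.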
The newly proposed method
To create a speech model that can recognize thousands of languages, the first challenge is collecting audio data across languages: the largest existing speech datasets cover at most about 100 languages. To overcome this, Meta's researchers turned to religious texts such as the Bible, which have been translated into many different languages and whose translations have been studied extensively. These translations come with publicly available audio recordings of people reading them in different languages. From these recordings, the researchers built a dataset of readings of the New Testament in more than 1,100 languages, with an average of 32 hours of audio per language.
By adding unlabeled recordings of various other Christian readings, they increased the number of available languages to more than 4,000. Although this data comes from a single domain and is mostly read by male speakers, analysis shows that the new models perform equally well on female voices and are not noticeably biased toward producing religious language. According to the blog post, this is largely because they use a Connectionist Temporal Classification (CTC) approach, which is far more constrained than large language model (LLM) or sequence-to-sequence speech recognition models.
Analysis of potential gender bias: on the FLEURS benchmark, automatic speech recognition models trained on the Massively Multilingual Speech (MMS) data show similar error rates for male and female voices.
To make the data usable for machine learning, they also applied several preprocessing steps. First, they trained an alignment model on existing data in more than 100 languages and paired it with an efficient forced-alignment algorithm that can handle very long recordings of 20 minutes or more. After several rounds of alignment, a final cross-validation filtering step removes potentially misaligned data based on model accuracy. To make it easier for other researchers to create new speech datasets, Meta added the alignment algorithm to PyTorch and released the alignment model.
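To make the forced-alignment idea concrete, here is a schematic CTC forced aligner written from scratch. It is not Meta's released implementation (in practice one would use the alignment tooling they contributed to PyTorch); the toy log-probabilities, vocabulary size, and blank index are assumptions for illustration only.

```python
# Schematic CTC forced alignment (Viterbi over the blank-interleaved transcript).
# An illustration of the idea, not Meta's released aligner; the toy inputs, vocabulary
# size, and blank id (0) are assumptions for the example.
import torch

def ctc_forced_align(log_probs: torch.Tensor, targets: torch.Tensor, blank: int = 0) -> torch.Tensor:
    """log_probs: (T, C) per-frame log-softmax over tokens; targets: (L,) transcript token ids.
    Returns a length-T tensor giving the token (or blank) assigned to every frame."""
    T, _ = log_probs.shape
    L = targets.numel()
    ext = torch.full((2 * L + 1,), blank, dtype=torch.long)      # blank, y1, blank, ..., yL, blank
    ext[1::2] = targets
    S = ext.numel()

    alpha = torch.full((T, S), float("-inf"))                    # best log-score ending in state s at frame t
    back = torch.zeros((T, S), dtype=torch.long)                 # backpointers for the Viterbi path
    alpha[0, 0] = log_probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = log_probs[0, ext[1]]

    for t in range(1, T):
        for s in range(S):
            # Allowed predecessors: stay, advance one state, or skip a blank between different labels.
            cands, idxs = [alpha[t - 1, s]], [s]
            if s >= 1:
                cands.append(alpha[t - 1, s - 1]); idxs.append(s - 1)
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[t - 1, s - 2]); idxs.append(s - 2)
            best = int(torch.stack(cands).argmax())
            alpha[t, s] = cands[best] + log_probs[t, ext[s]]
            back[t, s] = idxs[best]

    # A valid path must end in the final blank or the final label; backtrack from the better one.
    s = S - 1 if alpha[T - 1, S - 1] >= alpha[T - 1, S - 2] else S - 2
    frames = torch.empty(T, dtype=torch.long)
    for t in range(T - 1, -1, -1):
        frames[t] = ext[s]
        s = int(back[t, s])
    return frames

# Toy usage: 20 frames, a 6-token vocabulary (0 = blank), transcript token ids [3, 1, 4].
torch.manual_seed(0)
emissions = torch.randn(20, 6).log_softmax(dim=-1)
print(ctc_forced_align(emissions, torch.tensor([3, 1, 4])))
```

The quadratic-time Python loop is fine for a toy example; the released tooling handles recordings of 20 minutes and longer far more efficiently.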
Around 32 hours of data per language is not enough to train a conventional supervised speech recognition model. The models are therefore built on wav2vec 2.0, Meta's earlier work on self-supervised speech representation learning, which greatly reduces the amount of labeled data needed for training. Specifically, the researchers trained self-supervised models on roughly 500,000 hours of speech in more than 1,400 languages, about five times more languages than any previous work. They then fine-tuned the resulting models for specific speech tasks, such as multilingual speech recognition or language identification.
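As a rough illustration of the fine-tuning stage, the snippet below runs a single supervised CTC training step on top of a pretrained wav2vec 2.0 checkpoint via `transformers`. The checkpoint name (a generic public wav2vec 2.0 model, not MMS), placeholder audio, label ids, and learning rate are all assumptions; this is not the MMS training recipe.

```python
# One schematic CTC fine-tuning step on a pretrained wav2vec 2.0 encoder.
# Checkpoint name, label ids, and hyper-parameters are placeholders, not the MMS recipe.
import torch
from transformers import Wav2Vec2ForCTC

model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")   # illustrative starting point
model.freeze_feature_encoder()                                          # convolutional front-end stays frozen
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

input_values = torch.randn(1, 16_000)                                   # placeholder: 1 s of 16 kHz speech
labels = torch.tensor([[11, 4, 27, 4, 9]])                              # placeholder transcript as token ids

loss = model(input_values=input_values, labels=labels).loss             # CTC loss against the transcript
loss.backward()
optimizer.step()
optimizer.zero_grad()
print(f"CTC loss: {loss.item():.3f}")
```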
Results
The researchers evaluated the newly developed model on some existing benchmarks.
The multilingual speech recognition model was trained by fine-tuning a wav2vec 2.0 model with 1 billion parameters on more than 1,100 languages. Performance does degrade as the number of languages grows, but only slightly: going from 61 to 1,107 languages increases the character error rate by only about 0.4%, while language coverage grows more than 18-fold.
Character error rate on the 61 FLEURS benchmark languages as the number of training languages increases; higher error rates indicate worse models.
In a comparison with OpenAI's Whisper, the researchers found that their model achieves half the word error rate while covering 11 times as many languages, demonstrating the strength of the new approach.
Comparison of word error rates between OpenAI Whisper and MMS on the 54 FLEURS languages that allow a direct comparison.
Next, using existing datasets (such as FLEURS and CommonVoice) together with the new data, the researchers also trained a language identification (LID) model and evaluated it on the FLEURS LID task. The results show that the new model not only performs well but also supports 40 times as many languages.
Previous work supported just over 100 languages on the VoxLingua-107 benchmark, while MMS supports more than 4,000.
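A hedged sketch of running such a language identification classifier is shown below. It assumes the LID model is published on the Hugging Face Hub as `facebook/mms-lid-4017` and exposed through a wav2vec 2.0 sequence-classification head; both the name and the head are assumptions to verify against the linked repository.

```python
# Spoken language identification sketch. The Hub checkpoint name "facebook/mms-lid-4017"
# and the sequence-classification head are assumptions; replace the random waveform with
# real 16 kHz speech.
import torch
from transformers import AutoFeatureExtractor, Wav2Vec2ForSequenceClassification

model_id = "facebook/mms-lid-4017"
extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_id)

waveform = torch.randn(16_000)                                   # placeholder: 1 s of 16 kHz audio
inputs = extractor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits                              # one score per supported language

lang_id = int(logits.argmax(dim=-1))
print(model.config.id2label[lang_id])                            # predicted language label (typically an ISO code)
```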
In addition, Meta built text-to-speech systems covering more than 1,100 languages. Current text-to-speech models are usually trained on speech from a single speaker, and one limitation of the MMS data is that many languages have only a small number of speakers, often just one. For text-to-speech, however, this turned out to be an advantage, which is what made a TTS system for more than 1,100 languages feasible. The researchers say the quality of the generated speech is actually quite good, and several examples are given below.
Demo of the MMS text-to-speech model for Yoruba, Iloko, and Maithili.
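To give a feel for the TTS side, here is a hedged synthesis sketch. It assumes per-language MMS TTS checkpoints on the Hugging Face Hub (e.g. `facebook/mms-tts-eng`) exposed through a VITS model class in `transformers`; both names are assumptions to check against the linked repository.

```python
# Text-to-speech sketch. The checkpoint name "facebook/mms-tts-eng" and the VITS model class
# are assumptions to verify; MMS ships one TTS checkpoint per language.
import torch
import scipy.io.wavfile
from transformers import AutoTokenizer, VitsModel

model_id = "facebook/mms-tts-eng"                                # assumed English TTS checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = VitsModel.from_pretrained(model_id)

inputs = tokenizer("Speech technology for more than one thousand languages.", return_tensors="pt")
with torch.no_grad():
    waveform = model(**inputs).waveform                          # (batch, samples) synthesized audio

# Save the result at the sampling rate reported by the model config.
scipy.io.wavfile.write("mms_tts_demo.wav", rate=model.config.sampling_rate,
                       data=waveform.squeeze().numpy())
```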
That said, the researchers note that AI technology is not perfect, and MMS is no exception. For example, MMS may mistranscribe certain words or phrases during speech-to-text, which could produce offensive or inaccurate language in the output. They emphasize the importance of working with the broader AI community to develop these technologies responsibly.
The value of supporting a thousand languages with a single model
Many of the world's languages are endangered, and the limitations of current speech recognition and speech generation technology will only accelerate this trend. In the blog post, the researchers imagine a different outcome: technology could encourage people to keep their languages alive, because with good tools they can access information and use technology in the language they prefer.
They see the MMS project as an important step in that direction, and say development will continue, adding support for more languages and eventually tackling dialects and accents as well.