All of Douyin is speaking local dialects: two key technologies help you "understand" local dialects


PHPz

Oct 12, 2023 pm 08:13 PM

volcano engine

During the National Day holiday, Douyin's "A dialect proves you are an authentic hometown native" activity attracted enthusiastic participation from netizens all over the country. The topic topped the Douyin challenge list, with more than 50 million views.

The "Local Dialect Awards" quickly went viral, thanks in large part to Douyin's newly launched automatic dialect translation feature. When creators record short videos in their native dialect, they can turn on "automatic subtitles" and select "convert to Mandarin subtitles"; the dialect speech in the video is then recognized automatically and converted into Mandarin subtitles. This lets netizens from other regions easily understand all kinds of "encrypted Mandarin". Netizens in Fujian who tested it said that even the Minnan (Southern Fujian) dialect, known for its "different pronunciation" from place to place, can be translated accurately, exclaiming, "Gone are the days of doing whatever you want in Minnan on Douyin."


As we all know, training models for speech recognition and machine translation requires a large amount of training data, but dialects are passed on mainly as spoken languages, so very little dialect data is available for model training. So how did the Volcano Engine technical team, which provided the technical support for this feature, make a breakthrough?

Dialect recognition stage

For a long time, the Huoshan Voice team has provided intelligent video subtitle solutions based on speech recognition technology for popular video platforms. Simply put, these solutions automatically convert the speech and lyrics in a video into text to assist video creation.

In the process, the technical team found that traditional supervised learning relies heavily on manually labeled data, which is a problem both for the continuous optimization of high-resource languages and for the cold start of low-resource languages. Take high-resource languages such as Mandarin Chinese and English: although the video platform provides abundant speech data for business scenarios, once the supervised data reaches a certain scale, the return on further annotation becomes very low. The team therefore had to think about how to make effective use of millions of hours of unlabeled data to further improve speech recognition for high-resource languages.

For relatively niche languages or dialects, data labeling is expensive because of limited resources and manpower. When there is very little labeled data (on the order of 10 hours), supervised training performs very poorly and may even fail to converge; purchased data, meanwhile, often does not match the target scenario and cannot meet business needs.

To address this, the team adopted the following solutions:

Low resource dialect self-supervision

Building on the Wav2vec 2.0 self-supervised learning technique, the team proposed Efficient Wav2vec to achieve dialect ASR with very little labeled data. To address the slow training and unstable results of Wav2vec 2.0, the team made improvements in two areas. First, filterbank features are used instead of the raw waveform, which reduces the amount of computation, shortens the sequence length, and lowers the frame rate at the same time, doubling training efficiency. Second, equal-length data streams and adaptive continuous masks greatly improve the stability and quality of training.
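To make the sequence-length argument concrete, here is a minimal sketch of how moving from waveform samples to filterbank frames, and then lowering the frame rate, shortens the input. The article does not give the frame shift, feature dimension, or how the frame rate is reduced; the 10 ms shift, 80 mel bins, stacking factor of 4, and the `stack_frames` helper are all assumptions for illustration, not the team's actual configuration.

```python
def stack_frames(frames, factor):
    """Concatenate every `factor` consecutive feature frames into one frame.

    frames: list of equal-length feature vectors (e.g. filterbank frames).
    Trailing frames that do not fill a complete group are dropped.
    """
    usable = len(frames) - len(frames) % factor
    stacked = []
    for start in range(0, usable, factor):
        merged = []
        for frame in frames[start:start + factor]:
            merged.extend(frame)
        stacked.append(merged)
    return stacked


# One second of 16 kHz audio is 16,000 waveform samples.
# With an assumed 10 ms frame shift it becomes 100 filterbank frames.
frames = [[0.0] * 80 for _ in range(100)]  # 100 frames x 80 mel bins (assumed sizes)
low_rate = stack_frames(frames, factor=4)  # assumed stacking factor
print(len(frames), "->", len(low_rate), "frames, each of dimension", len(low_rate[0]))
```

Under these assumptions the encoder sees 25 steps per second of audio instead of thousands of raw samples, which is where the savings in computation come from.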

The experiment was carried out on Cantonese, using 50,000 hours of unlabeled speech and 10 hours of labeled speech. The results are shown in the table below. Compared with Wav2vec 2.0, Efficient Wav2vec (w2v-e) achieves a 5% relative reduction in CER with both the 100M and 300M parameter models, while the training overhead is halved.

Table: CER and training overhead of Wav2vec 2.0 versus Efficient Wav2vec (w2v-e)
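For readers unfamiliar with the metric, the following sketch shows how a character error rate (CER) and a relative reduction are typically computed. It uses the standard Levenshtein edit distance; the example sentences and the 20.0 and 19.0 figures are made up for illustration and are not results from the article.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def relative_reduction(baseline, improved):
    """Relative improvement of `improved` over `baseline` (0.05 means 5%)."""
    return (baseline - improved) / baseline


print(cer("今天天气很好", "今天天汽很好"))   # one wrong character out of six
print(relative_reduction(20.0, 19.0))       # hypothetical CERs, a 5% relative drop
```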

Further, the team used the CTC model obtained by fine-tuning the self-supervised pre-trained model as a seed model to pseudo-label the unlabeled data, and then used that data to train an end-to-end LAS model with fewer parameters. This not only transfers knowledge across model structures but also reduces the amount of inference computation, and the resulting model can be deployed directly on a mature end-to-end inference engine. The technique has been applied successfully to two low-resource dialects, achieving character error rates (CER) below 20% with only 10 hours of annotated data.
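As an illustration of how a CTC seed model's frame-level output can be turned into a pseudo-label, here is a minimal sketch of greedy (best-path) CTC decoding: merge consecutive repeats, then drop blanks. The article does not say which decoding strategy the team used, so greedy decoding and the `_` blank symbol are assumptions.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol for this sketch


def ctc_collapse(frame_labels, blank=BLANK):
    """Turn per-frame best labels into a transcript.

    Step 1: merge runs of identical consecutive labels.
    Step 2: remove blank symbols.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != blank)


# Per-frame argmax labels from a (hypothetical) CTC seed model.
frames = list("__hh_e_ll_ll_oo__")
print(ctc_collapse(frames))
```

The blank between the two `l` runs is what lets CTC emit a doubled letter; without it the two runs would merge into one.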

Figure: Comparison of model parameters and CER

Figure: Implementation process of ASR based on unsupervised training

Large-scale pre-training plus fine-tuning for dialects

Once supervised data has been annotated, continuously optimizing the ASR model becomes an important research direction. Semi-supervised and unsupervised learning have been very popular in recent years. The main idea of unsupervised pre-training is to make full use of unlabeled data to expand the labeled data set, so that better recognition results can be achieved even when only a small amount of labeled data is available. The algorithm proceeds as follows:

(1) First, train a seed model on the manually annotated supervised data.

(2) Then, use this model to pseudo-label the unlabeled data. Not every prediction will be accurate, so some strategy is needed to filter out low-value data.

(3) Next, combine the generated pseudo-labels with the original labeled data and train jointly on the merged data.

(4) Because a large amount of unsupervised data is added during training, more general representations can often be obtained, even though the pseudo-labels are of lower quality than the supervised labels. The pre-trained model trained on this large data set is then fine-tuned on the manually refined dialect data. This retains the strong generalization of the pre-trained model while improving the model's recognition of dialects.
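The pseudo-labeling and merging steps above can be sketched as follows. This is a minimal illustration, not the team's pipeline: the article does not specify the filtering strategy, so a simple confidence threshold is assumed, and `seed_model`, `toy_seed_model`, `min_confidence`, and the file names are hypothetical.

```python
def build_training_set(labeled, unlabeled, seed_model, min_confidence=0.9):
    """Pseudo-label unlabeled audio and merge it with the labeled set.

    labeled:     list of (audio, transcript) pairs annotated by hand
    unlabeled:   list of audio items without transcripts
    seed_model:  callable returning (transcript, confidence) for one audio item
    """
    pseudo = []
    for audio in unlabeled:
        text, confidence = seed_model(audio)
        # Assumed filtering strategy: drop low-confidence predictions.
        if confidence >= min_confidence:
            pseudo.append((audio, text))
    return labeled + pseudo


def toy_seed_model(audio):
    """Stand-in for a trained seed model; returns canned predictions."""
    predictions = {
        "utt1.wav": ("你好", 0.97),
        "utt2.wav": ("早晨", 0.42),
    }
    return predictions[audio]


labeled = [("utt0.wav", "多謝")]
merged = build_training_set(labeled, ["utt1.wav", "utt2.wav"], toy_seed_model)
print(merged)
```

In this toy run the low-confidence prediction for `utt2.wav` is discarded, and the merged set would then be used for joint training before the final fine-tuning on refined dialect data.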

The average CER (character error rate) of the five dialects was reduced from 35.3% to 17.21%.

Training mode	Average CER (%)	Cantonese	Minnan	Wu	Central Plains Mandarin	Southwest Mandarin
Single dialect	35.3	14.05	48.87	41.29	61.56	10.7
100wh pre-training + mixed-dialect fine-tuning	17.21	13.14	22.84	19.60	19.50	10.95
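As a quick sanity check on the averages quoted above, the sketch below recomputes them from the ten per-dialect CER values that appear in the table (including the 61.56 entry). The relative reduction it prints is derived here and is not a figure stated in the article.

```python
# Per-dialect CER values (%) as listed in the table, one list per training mode.
single_dialect = [14.05, 48.87, 41.29, 61.56, 10.7]
mixed_finetune = [13.14, 22.84, 19.60, 19.50, 10.95]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


avg_single = mean(single_dialect)
avg_mixed = mean(mixed_finetune)
relative = (avg_single - avg_mixed) / avg_single

print(f"average CER: {avg_single:.2f} -> {avg_mixed:.2f}")
print(f"relative reduction: {relative:.1%}")
```

The two means agree with the reported 35.3% and 17.21% after rounding, which also confirms that the stray 61.56 value belongs to the single-dialect row.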

Dialect translation stage

Normally, training a machine translation model requires a large corpus. However, dialects are usually passed on in spoken form, and the number of dialect speakers is decreasing year by year. Both factors make dialect data harder to collect, which in turn makes it hard to improve dialect machine translation.

To address the shortage of dialect data, the Huoshan Translation team proposed the multilingual translation models mRASP (multilingual Random Aligned Substitution Pre-training) and mRASP2. mRASP2 introduces contrastive learning, supplemented by an alignment augmentation method, to bring monolingual and bilingual corpora under a unified training framework and make full use of the available corpora to learn better language-independent representations, thereby improving multilingual translation performance.

Paper address: https://arxiv.org/abs/2105.09501

The contrastive learning task is built on a classic assumption: the encoded representations of synonymous sentences in different languages should lie close to each other in high-dimensional space, because such sentences carry the same meaning, that is, the output of the "encoding" process should be the same. For example, "早上好" and "Good morning" mean the same thing to anyone who understands both Chinese and English, which corresponds to their encoded representations being adjacent in high-dimensional space.

Redesigned training objective

On top of the traditional cross-entropy loss, mRASP2 adds a contrastive loss and trains in a multi-task fashion. In the figure, the orange arrows mark the part trained with the traditional cross-entropy loss (CE loss) for machine translation, and the black part corresponds to the contrastive loss (CTR loss).
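To show what combining the two losses can look like, here is a minimal sketch of a contrastive loss over sentence vectors plus a weighted sum with a cross-entropy value. It assumes each sentence has already been encoded into a fixed-size vector; the temperature, the weight, the toy vectors, and the function names are hypothetical, and the exact formulation used in mRASP2 should be taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """Pull the anchor toward its translation, push it away from other sentences."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


def total_loss(ce_loss, ctr_loss, weight=1.0):
    """Multi-task objective: cross-entropy plus weighted contrastive loss."""
    return ce_loss + weight * ctr_loss


# Toy sentence vectors: a source sentence, its translation, and an unrelated sentence.
source = [1.0, 0.0]
translation = [0.9, 0.1]
unrelated = [0.0, 1.0]

aligned = contrastive_loss(source, translation, [unrelated])
misaligned = contrastive_loss(source, unrelated, [translation])
print(aligned < misaligned)
print(total_loss(ce_loss=2.0, ctr_loss=aligned, weight=0.5))
```

The loss is small when the translation is the nearest neighbor of the source and large otherwise, which is the behavior the "adjacent positions" assumption calls for.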

The word alignment data augmentation method, also known as Aligned Augmentation (AA), is developed from the Random Aligned Substitution (RAS) method of mRASP.

As the figure shows, panel (a) illustrates the augmentation of parallel corpora and panel (b) the augmentation of monolingual corpora. In (a), original English words are replaced with the corresponding Chinese words; in (b), original Chinese words are replaced with English, French, Arabic, and German words. mRASP's RAS is equivalent to the first replacement method and only needs a bilingual synonym dictionary, whereas the second method needs a synonym dictionary covering multiple languages. Note that when using alignment augmentation, you can choose to use only the method in (a) or only the method in (b).
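The word substitution described above can be sketched as a dictionary lookup with random replacement. This is only an illustration of the idea: the toy dictionary, the replacement probability, and the `aligned_substitution` helper are hypothetical, and real systems operate on tokenized corpora with much larger synonym dictionaries.

```python
import random


def aligned_substitution(tokens, dictionary, prob, rng):
    """Randomly replace tokens with a synonym in another language.

    dictionary: maps a token to a list of translations (possibly several languages)
    prob:       chance of replacing a token that has a dictionary entry
    rng:        random.Random instance, passed in for reproducibility
    """
    augmented = []
    for token in tokens:
        candidates = dictionary.get(token)
        if candidates and rng.random() < prob:
            augmented.append(rng.choice(candidates))
        else:
            augmented.append(token)
    return augmented


# Toy multilingual synonym dictionary (assumed entries).
toy_dictionary = {"love": ["爱", "aime"], "singing": ["唱歌"]}

sentence = "I love singing".split()
print(aligned_substitution(sentence, toy_dictionary, prob=0.5, rng=random.Random(0)))
```

With a bilingual dictionary this corresponds to the first replacement method; giving each entry translations in several languages corresponds to the second.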

Experimental results show that mRASP2 achieves improved translation effects in supervised, unsupervised, and zero-resource scenarios. Among them, the average improvement of supervised scenarios is 1.98 BLEU, the average improvement of unsupervised scenarios is 14.13 BLEU, and the average improvement of zero-resource scenarios is 10.26 BLEU.

This method has achieved significant performance improvements in a wide range of scenarios, and can greatly alleviate the problem of insufficient training data for low-resource languages.

Final words

Dialects and Mandarin complement each other, and both are important expressions of traditional Chinese culture. Dialect, as a way of expression, carries Chinese people's emotions and ties to their hometowns. Short videos combined with dialect translation can help users appreciate the cultures of different regions across the country without barriers.

Currently, Douyin's "Dialect Translation" function supports Cantonese, Min, Wu, Southwest Mandarin (Sichuan), Central Plains Mandarin (Shaanxi, Henan), and others. More dialects are said to be on the way, so let's wait and see.


Statement

This article is reproduced from 51CTO.COM. If there is any infringement, please contact admin@php.cn to have it removed.
