Ximalaya Tackles the Overlapping Speech Problem and Wins First Place in an International Conference Challenge, Accelerating AI Innovation

Recently, the Multi-Channel Multi-Party Meeting Transcription Challenge (M2MeT2.0), part of ASRU 2023 (the IEEE Automatic Speech Recognition and Understanding Workshop), a leading international speech conference, came to a close, and Ximalaya's Everest Laboratory won first place.

ASRU is the flagship technical event of the IEEE Speech and Language Processing Technical Committee (SLTC). Held every two years, it brings together leading experts and researchers from academia and industry to discuss a wide range of problems in speech recognition and understanding. The M2MeT2.0 Challenge is a key competition at ASRU 2023. Its goal is to solve the problem of transcribing overlapping speech in in-person meeting rooms. As a typical "cocktail party" scenario in which many people talk freely, meetings have long been both a difficulty and a focus in speech recognition. The challenge is therefore significant for developing speech AI for meeting scenarios and for exploring industrial-grade solutions to the related problems.

Notably, this is not the first time Ximalaya has taken part in the M2MeT Challenge. In the first M2MeT Challenge, Ximalaya worked with the University of Science and Technology of China and won third place in the speaker diarization track, with a diarization error rate (DER) of only 4.05%. In that inaugural challenge, the transcription evaluation used character error rate (CER) as its metric, and audio was only transcribed to text without considering speaker labels. Building on the success of the first edition, the M2MeT2.0 Challenge focuses on speaker-attributed evaluation to make multi-speaker speech recognition systems more practical, and it has two sub-tracks: limited training data and open training data.
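For readers unfamiliar with the CER metric mentioned above, the following is a minimal sketch of how a character error rate is commonly computed: the edit distance between the reference and the recognized text, divided by the number of reference characters. The function names and the sample sentences are illustrative assumptions, not the challenge's official scoring tool, and speaker labels are ignored, as described for the first challenge.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from the reference
                            curr[j - 1] + 1,      # insertion into the hypothesis
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """CER = edit distance / number of reference characters (spaces ignored)."""
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    # Hypothetical example: one deleted character and one substituted character
    # against a 9-character reference.
    cer = character_error_rate("今天的会议九点开始", "今天会议九点开使")
    print(f"CER = {cer:.2%}")  # CER = 22.22%
```

Because this plain CER compares only the text, it cannot penalize a system that assigns the right words to the wrong speaker, which is the gap that speaker-related evaluation is meant to close.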

To meet this challenge, Ximalaya's Everest Laboratory started from its basic speech recognition framework and explored overlapped speech detection and speaker diarization techniques. Ximalaya took first place in both the limited-data and open-data sub-tracks of the M2MeT2.0 Challenge.

This year's M2MeT2.0 Challenge dataset contains real, multi-scenario, multi-modal, large-scale data. It covers conference rooms of different sizes and layouts with varied furniture, regular meetings on different topics, and a variety of indoor noises. Overlapping sounds such as human voices, TV sound, fans and air conditioners, keyboards, doors opening and closing, and bubble noise increase the difficulty of the competition. Each meeting is recorded simultaneously with a microphone array for far-field audio and headset microphones for near-field audio, which ensures that each speaker's speech can be transcribed accurately. The dataset is of considerable academic value for research on multi-speaker speech recognition and overlapping speech, and it provides real, diverse data for developing industrial-grade solutions.

All speakers in the M2MeT2.0 Challenge dataset are native Chinese speakers. Ximalaya takes an active part through collaboration among industry, academia and research, and is committed to contributing to the development of China's domestic speech recognition technology. In the M2MeT2.0 Challenge, Ximalaya demonstrated strong speaker recognition and automatic speech recognition (ASR) performance. Its Everest Laboratory team used self-developed speaker recognition, speech enhancement and speech recognition modules. Through optimization and accumulated experience, the team made significant progress on overlapping speech and multi-speaker environments. By combining deep learning and neural network models, Ximalaya's Everest Laboratory is able to accurately identify, separate and transcribe the speech of multiple speakers in real time.

Ximalaya's technologies have not only been validated in the ASRU 2023 M2MeT2.0 Challenge but have also been applied to Ximalaya's AIGC content production. Currently, Ximalaya's automatic speech recognition (ASR) technology is widely used in the AI script feature of the Ximalaya app. It transcribes audio content on the Ximalaya platform that has no script and outputs the corresponding text, making the content easier for listeners to follow. For audio content that does have an original script, the AI script feature uses ultra-long audio-text alignment to timestamp the audio against the script, so that the corresponding text is highlighted in sync with playback and users can conveniently listen and read at the same time.

In addition to ASR, Ximalaya's TTS (speech synthesis) technology is also at the forefront of the industry and is widely used to produce storytelling, news, novels and other content. Using the HiTTS technology framework, the voice of the storyteller Shan Tianfang has been faithfully reproduced. According to reports, Ximalaya has launched more than 100 albums produced with the AI-synthesized Shan Tianfang voice, with cumulative plays exceeding 100 million.

For many years, Ximalaya has conducted in-depth research in AI voice technology. Its Everest Laboratory has long focused on research and innovation in speech synthesis, emotion analysis, speech recognition and related fields. By taking part in the ASRU 2023 M2MeT2.0 Challenge and winning first place, Ximalaya further consolidated its leading position in voice technology and demonstrated its ability to handle complex speech scenarios.

As an online audio platform popular with users, Ximalaya has always adhered to the idea of empowering culture with technology, continually bringing technology together with creators and users to improve content production efficiency and deliver an excellent content experience. Through technological empowerment and collaboration among industry, academia and research, Ximalaya will continue to apply advanced, intelligent voice technology to audio content and provide users with high-quality voice technology products and services.

Statement
This article is reproduced from Sohu (搜狐). If there is any infringement, please contact admin@php.cn to have it deleted.