Siri is becoming more and more "popular". What breakthroughs will there be in smart voice in the future?
In human-computer interaction, giving machines a good ear has been a persistent goal of AI research in recent years. Around 2009, deep learning models began to move out of academia, and intelligent speech technologies such as wake-word detection, speech recognition, speech enhancement, and speech synthesis gradually matured.
A typical early example is the launch of Siri in 2011. Intelligent voice marked a leap in how humans and machines communicate and interact. After more than ten years of development, "Hey, Siri"-style human-machine question answering is no longer limited to mobile devices. It has entered ordinary households and is widely used across scenarios: smart speakers as home companions, Tmall Genie for convenient online shopping, simultaneous translation at meetings, in-car voice navigation assistants, and more.
As more Internet companies and upstream manufacturers invest in intelligent voice, products such as intelligent voice customer service, conversational AI applications, and AI virtual assistants have achieved great success. As quality has improved further, their spoken responses sound more natural, their understanding of questions is more accurate, and they even show "little emotions" of their own.
In the digital era, the trend toward the interconnection of everything is unstoppable. Intelligent voice, as a key interface for human-computer interaction, is now being deeply integrated with the real economy. As application scenarios develop and expand, many challenging problems have emerged and become current research hotspots, such as how to identify the speaker, how to recognize dialects, and how to resolve ambiguity.
A maturing technology often still holds untapped potential, both in how it can be applied in innovative ways and in the directions it may evolve. Looking ahead, intelligent voice technology will see new trends. For example: can deeply integrated AI voice chips replace the approach of running models in the cloud? Can research on multi-modal fusion, unsupervised learning, and cross-disciplinary work with brain science achieve breakthroughs? We'll see.
So, what real production problems have major enterprises encountered in their practical exploration of intelligent voice technology? How were they solved? What progress has been made? What has changed in the industry? What are the next development trends? The intelligent voice technology special session of the "AISummit Global Artificial Intelligence Technology Conference" will offer in-depth answers.
On August 7th, the intelligent voice special session of the "AISummit Global Artificial Intelligence Technology Conference", organized by 51CTO, will take place.
1. Exploration of speech recognition technology: Sharing speech recognition techniques for large-scale production scenarios, such as end-to-end modeling and efficient use of data, and presenting a hot-word solution based on prefix automata.
2. Speech evaluation technology practice: For pronunciation error correction, and given Zuoyebang's high-concurrency scenario, a multi-task knowledge transfer and multi-modal feature fusion solution is proposed. It significantly improves the model's phoneme discrimination ability and its error detection ability in noisy environments. To address the difficulty of deploying speech evaluation, a high-performance cloud-based integrated evaluation technology is proposed.
3. Speech synthesis technology framework: Sharing Zuoyebang's thinking and practice on further improving its existing speech technology framework for small amounts of data.
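To make the idea of a prefix-automaton hot-word solution more concrete, here is a minimal sketch. It does not reproduce the method presented at the session, whose details are not given here. It assumes hot words are stored in a prefix tree (trie) over tokens and that each recognition hypothesis gains a fixed bonus whenever it completes a hot word. The names `HotWordTrie`, `biased_score`, and `rerank`, and the `bonus` parameter, are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_end = False  # True when a complete hot word ends at this node


class HotWordTrie:
    """Prefix tree over tokenized hot words (hypothetical helper)."""

    def __init__(self, hot_words: Sequence[Sequence[str]]) -> None:
        self.root = TrieNode()
        for word in hot_words:
            node = self.root
            for token in word:
                node = node.children.setdefault(token, TrieNode())
            node.is_end = True

    def step(self, state: TrieNode, token: str) -> TrieNode:
        """Advance the automaton by one token."""
        nxt = state.children.get(token)
        if nxt is not None:
            return nxt
        # Mismatch: restart from the root, letting this token begin a new
        # hot word. This is a simplification; it has no failure links, so
        # some overlapping matches can be missed.
        return self.root.children.get(token, self.root)


def biased_score(trie: HotWordTrie, tokens: Sequence[str],
                 base_score: float, bonus: float = 2.0) -> float:
    """Add `bonus` to the base score each time a hot word is completed."""
    state = trie.root
    score = base_score
    for token in tokens:
        state = trie.step(state, token)
        if state.is_end:
            score += bonus
    return score


def rerank(trie: HotWordTrie,
           hypotheses: List[Tuple[List[str], float]],
           bonus: float = 2.0) -> List[str]:
    """Return the token sequence with the highest biased score."""
    best = max(hypotheses, key=lambda h: biased_score(trie, h[0], h[1], bonus))
    return best[0]


if __name__ == "__main__":
    trie = HotWordTrie([["zuo", "ye", "bang"], ["we", "net"]])
    candidates = [(["we", "nat"], -4.0), (["we", "net"], -5.0)]
    print(rerank(trie, candidates))
```

In this sketch the bonus is applied when rescoring finished hypotheses. A production system would more likely apply it inside beam search as each token is emitted, but that design choice is outside what the agenda describes.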
1. Application of speech recognition technology in office scenarios: voice input for office email and instant messaging, office voice assistants, real-time subtitles, and post-meeting transcription.
2. Solution thinking: Making meetings intelligent and improving efficiency.
3. Challenges and opportunities: Challenges of the speech recognition task itself, challenges brought by downstream tasks, and the additional information that meetings provide.
4. Introduction to key algorithm work (end-to-end speech recognition system): Transducer & CIF, dynamic and static hot words, Context-aware.
1. Background introduction and problem analysis of high-level speech synthesis system.
2. Design thinking and implementation of high-level speech synthesis system.
3. Experimental evaluation.
4. Future work prospects.
1. End-to-end speech recognition in SOUL social metaverse scenarios
2. Construction route of multi-modal speech synthesis technology
3. Application in business scenarios such as voice security and voice interaction
1. Application scenarios of speech recognition in 58.com: AI intelligent voice application, speech recognition link introduction, challenges and technical routes
2. Model optimization work based on WeNet: semi-supervised training, Efficient Conformer, model compression
3. End-to-end speech recognition deployment plan: self-developed engine architecture, WeNet decoding service deployment, and streaming/non-streaming decoding performance testing
Who are the featured speakers?
Song Yang worked at Baidu for 7 years on algorithm research and development. He joined Zuoyebang in 2015 as head of the intelligent middle office department, which provides middle office technical capabilities, including data mining, NLP, and voice, to the company's various businesses. He has been responsible for search and Q&A, personalized recommendation, intelligent quality inspection, voice evaluation, intelligent service scheduling, and other areas.
Before joining Zuoyebang, Wang Qiangqiang worked in the Speech Processing and Machine Intelligence Laboratory of the Department of Electronic Engineering, Tsinghua University, where he was responsible for implementing speech recognition algorithms and building industrial-grade solutions. He joined Zuoyebang in 2018 and is responsible for the research and implementation of speech-related algorithms. He has led the deployment of speech recognition, evaluation, synthesis, and other algorithms at Zuoyebang, providing the company with a complete set of voice technology solutions.
Zhang Jun has long been engaged in the research and application of speech algorithms such as speech recognition and voice wake-up, and has rich experience in the field. In 2018, he joined the ByteDance AI Lab intelligent voice team, where he is currently mainly responsible for building voice technology solutions for intelligent office, intelligent hardware, and intelligent customer service.
Tan Xu's research fields include deep learning, natural language/speech/music, and AI content generation. The machine translation and speech synthesis systems he developed have won multiple competition championships and reached human-level performance on academic evaluation sets. His research, including the pre-trained language model MASS, the speech synthesis models FastSpeech and NaturalSpeech, and the AI music project Muzic, has received widespread attention in the industry.
Liu Zhongliang graduated from the Graduate School of the Chinese Academy of Sciences with a master's degree. He currently serves as head of speech algorithms at SOUL, and previously worked in the Sogou AI Interaction Department and the Momo Big Data Department. Over the past 10 years, he has mainly been engaged in the research and development of speech technology systems such as voice wake-up, speech recognition, speech synthesis, and audio and music understanding. His work is mainly applied in voice interaction and speech understanding scenarios such as input methods, mobile assistants, smart hardware, and voice security. He is committed to creating the best deployable voice technology.
Zhou Wei is head of the speech algorithm department and an algorithm architect at 58.com AI Lab, responsible for speech recognition and speech synthesis algorithm development. He graduated with a master's degree from the University of Chinese Academy of Sciences in 2016, and after graduation joined a startup working on conversational AI products. In May 2018, he joined 58.com and has participated in the research and development of NLP algorithms for AI projects such as intelligent customer service, intelligent outbound calls, and intelligent writing. In 2019, he began to focus on speech algorithms and led the team in independently developing the speech algorithms in the 58.com speech processing engine from scratch.
In addition to talks on practical innovation from leading AI technology experts, the AISummit Global Artificial Intelligence Technology Conference has also prepared a wealth of pre-event and on-site interactive benefits for attendees. Join the event to expand your technical skills and professional network, and take home surprise gifts at the same time!
The event includes four interactive games, among them "Don't give in", "Work with luck", and "Wise and share the same goals". There is sure to be a gift to surprise you! And what will the mysterious grand prize be? Come and find out on site! (PS: Word is that the earlier you register, the higher your chance of winning the grand prize!)
Click to enter the official website of the AISummit Global Artificial Intelligence Technology Conference, then follow the prompts to fill in and submit all the required information to complete registration. Scan the QR code to join the official conference group and take part in the lottery for gifts such as Sony speakers, Bing Dwen Dwen merchandise, AI technology books, and red envelopes.