Siri is becoming more and more 'popular'. What breakthroughs will there be in smart voice in the future?

王林

May 06, 2023 pm 01:07 PM

In human-computer interaction, giving machines good hearing has been a persistent goal of AI research in recent years. Around 2009, deep learning models began to move out of academia, and intelligent speech technologies such as wake-word detection, speech recognition, speech enhancement, and speech synthesis gradually matured.

A typical early example is the launch of Siri in 2011. Intelligent voice marked a leap forward in how humans and machines communicate and interact. After more than ten years of development, "Hey, Siri"-style question answering is no longer limited to mobile devices. It has entered countless households and is used in many scenarios: smart speakers for companionship at home, Tmall Genie for convenient online shopping, simultaneous interpretation at meetings, in-car voice navigation assistants, and more.

As more Internet companies and upstream manufacturers invest in intelligent voice, products such as voice customer service, conversational AI applications, and AI virtual assistants have achieved great success. As quality improves further, responses sound more natural, questions are understood more accurately, and the assistants even show "little emotions" of their own.

In the digital era, the trend toward connecting everything is unstoppable. Intelligent voice, as a key interface for human-computer interaction, is now being deeply integrated with the real economy. As application scenarios develop and expand, many challenging problems have emerged and become research hotspots, such as how to identify the speaker, how to recognize dialects, and how to resolve ambiguity.

Behind a maturing technology there is often further potential, both in how it is applied in practice and in the directions it may evolve. Looking to the next stage, intelligent voice technology will see new trends as well. For example: can deeply integrated AI voice chips replace running models in the cloud? Can research on multi-modal fusion, unsupervised learning, and cross-disciplinary work with brain science achieve breakthroughs? We'll see.

So, what real production problems have major enterprises encountered in putting intelligent voice technology into practice? How were they solved? What progress has been made? What has changed in the industry? What are the next development trends? The intelligent voice technology session of the "AISummit Global Artificial Intelligence Technology Conference" will bring you in-depth thinking!

On August 7th, the intelligent voice session of the "AISummit Global Artificial Intelligence Technology Conference", organized by 51CTO, is coming!

What special topics are you interested in?

Topic 1: Zuoyebang Speech Technology Practice

1. Exploration of speech recognition technology: sharing speech recognition techniques for large-scale production scenarios, such as end-to-end modeling and efficient use of data, and presenting a hot-word solution based on prefix automata.

2. Speech evaluation technology practice: for pronunciation error correction in Zuoyebang's high-concurrency scenarios, a multi-task knowledge transfer and multi-modal feature fusion solution is proposed, which significantly improves the model's phoneme discrimination and its error detection in noisy environments. To address the difficulty of deploying speech evaluation, a high-performance, cloud-integrated evaluation technology is proposed.

3. Speech synthesis technology framework: Zuoyebang's thinking and practice on further improving its existing small-data speech synthesis framework.
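The agenda does not describe how Zuoyebang's prefix-automaton hot-word solution works. As a rough illustration of the general idea only, the sketch below builds a trie (a prefix automaton) over a hot-word list and uses it to find hot words in decoded text. The class name `HotWordTrie` and its methods are hypothetical, and a production recognizer would more likely carry the automaton state inside each beam-search hypothesis to boost scores rather than scan finished text.

```python
from typing import Dict, List, Tuple


class HotWordTrie:
    """Hypothetical prefix automaton (trie) over hot-word character sequences."""

    def __init__(self, hot_words: List[str]) -> None:
        # children[i] maps a character to the index of the child of node i.
        self.children: List[Dict[str, int]] = [{}]
        # terminal[i] is True when node i ends a complete hot word.
        self.terminal: List[bool] = [False]
        for word in hot_words:
            self._insert(word)

    def _insert(self, word: str) -> None:
        node = 0
        for ch in word:
            if ch not in self.children[node]:
                self.children[node][ch] = len(self.children)
                self.children.append({})
                self.terminal.append(False)
            node = self.children[node][ch]
        self.terminal[node] = True

    def step(self, node: int, ch: str) -> int:
        """Advance from `node` on `ch`; return -1 if no hot word has this prefix."""
        return self.children[node].get(ch, -1)

    def find_all(self, text: str) -> List[Tuple[int, int]]:
        """Return (start, end) spans of every hot word occurring in `text`."""
        spans = []
        for start in range(len(text)):
            node = 0
            for end in range(start, len(text)):
                node = self.step(node, text[end])
                if node == -1:
                    break
                if self.terminal[node]:
                    spans.append((start, end + 1))
        return spans


trie = HotWordTrie(["Zuoyebang", "Feishu", "WeNet"])
spans = trie.find_all("WeNet powers Feishu subtitles")
print(spans)
```

The `step` method is the part a decoder would reuse: each partial hypothesis keeps its current node, and a return value of -1 means the hypothesis has left every hot-word prefix.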

Topic 2: Application of ByteDance speech recognition technology in Feishu

1. Applications of speech recognition in office scenarios: voice input for office email and instant messaging, office voice assistants, real-time subtitles, and post-meeting transcription.

2. Solution thinking: Make meetings intelligent and improve efficiency.

3. Challenges and opportunities: challenges of the speech recognition task itself, challenges posed by downstream tasks, and the additional information that meetings provide.

4. Introduction to key algorithm work (end-to-end speech recognition system): Transducer & CIF, dynamic and static hot words, Context-aware.

Topic 3: Practice of building a high-level speech synthesis system

1. Background introduction and problem analysis of high-level speech synthesis system.

2. Design thinking and implementation of high-level speech synthesis system.

3. Experimental evaluation.

4. Future work prospects.

1. End-to-end speech recognition in SOUL social metaverse scenarios

2. Construction route of multi-modal speech synthesis technology

3. Application in business scenarios such as voice security and voice interaction

Topic 5: Exploration and practice of end-to-end speech recognition technology at 58.com

1. Application scenarios of speech recognition at 58.com: intelligent voice applications, an overview of the speech recognition pipeline, challenges, and technical routes

2. Model optimization work based on WeNet: semi-supervised training, Efficient Conformer, model compression

3. End-to-end speech recognition deployment plan: self-developed engine architecture, WeNet decoding service deployment, and streaming/non-streaming decoding performance testing

Who are the featured guests?

1. Song Yang, chief algorithm expert and head of the intelligent middle office at Zuoyebang, and producer of this session

Song Yang worked at Baidu for seven years on algorithm research and development. He joined Zuoyebang in 2015 as head of the intelligent middle office department, which provides technical capabilities including data mining, NLP, and speech to the company's businesses. He has been responsible for search and Q&A, personalized recommendation, intelligent quality inspection, voice evaluation, intelligent service scheduling, and other areas.

2. Wang Qiangqiang, head of the speech technology team of Zuoyebang

Before joining Zuoyebang, Wang Qiangqiang worked in the Speech Processing and Machine Intelligence Laboratory of the Department of Electronic Engineering at Tsinghua University, where he was responsible for implementing speech recognition algorithms and building industrial-grade solutions. He joined Zuoyebang in 2018 and is responsible for the research and deployment of speech-related algorithms. He has led the rollout of speech recognition, evaluation, synthesis, and other algorithms at Zuoyebang, providing the company with a complete set of voice technology solutions.

3. Zhang Jun, speech recognition algorithm researcher at ByteDance AI Lab

Zhang Jun has long worked on the research and application of speech algorithms such as speech recognition and voice wake-up, and has extensive experience. He joined the ByteDance AI Lab intelligent voice team in 2018 and is currently mainly responsible for building voice technology solutions for intelligent office, intelligent hardware, and intelligent customer service.

4. Tan Xu, Researcher in Charge of Microsoft Research Asia

Tan Xu's research fields include deep learning, natural language/speech/music, and AI content generation. The machine translation and speech synthesis systems he developed have won multiple competition championships and reached human level on academic evaluation sets. His research, including the pre-trained language model MASS, the speech synthesis models FastSpeech and NaturalSpeech, and the AI music project Muzic, has received wide attention in the industry.

5. Liu Zhongliang, head of SOUL speech algorithm

Liu Zhongliang graduated from the Graduate School of the Chinese Academy of Sciences with a master's degree and currently serves as head of speech algorithms at SOUL. He previously worked in the Sogou AI Interaction Department and the Momo Big Data Department. Over the past 10 years he has mainly worked on the research and development of speech technology systems such as voice wake-up, speech recognition, speech synthesis, and audio and music understanding, applied mainly in voice interaction and speech understanding scenarios such as input methods, mobile assistants, smart hardware, and voice security. He is committed to creating the best deployable voice technology.

6. Zhou Wei, head of the speech algorithm department and algorithm architect of 58.com AI Lab

Zhou Wei is responsible for speech recognition and speech synthesis algorithm development. He graduated with a master's degree from the University of Chinese Academy of Sciences in 2016, then joined a startup working on conversational AI products. In May 2018 he joined 58.com, where he has taken part in NLP algorithm development for AI projects such as intelligent customer service, intelligent outbound calls, and intelligent writing. In 2019 he began to focus on speech algorithms and led the team in developing the speech algorithms of the 58.com speech processing engine from scratch.

What other exciting activities are there?

In addition to talks on practical innovation from leading AI technology experts, the AISummit Global Artificial Intelligence Technology Conference has prepared a wealth of pre-event and on-site interactive benefits for attendees. Join the event to expand your technical skills and your network, and take home surprise gifts at the same time!

The event includes four interactive games, such as "Don't give in", "Work with luck", and "Wise and share the same goals". There will always be an exquisite gift to surprise you! And what will the legendary, mysterious grand prize be? It is waiting for technology lovers to come and find out on site! (PS: We hear that the earlier you register, the higher your chance of winning the grand prize!)

How do I register quickly?

Click to enter the official website of the AISummit Global Artificial Intelligence Technology Conference, then follow the prompts to fill in and submit your information to complete registration. Scan the QR code to join the official conference group, take part in the lottery, and win gifts such as SONY speakers, Bing Dwen Dwen, and AI technology books, as well as red envelopes.

Statement

This article is reproduced from 51CTO.COM. If there is any infringement, please contact admin@php.cn for removal.
