Deepening AI voice and multimodal technology to deliver a localized intelligent interactive experience

王林

Sep 17, 2023 pm 01:21 PM

With the development of 5G and artificial intelligence, intelligent voice technology has entered people's daily lives through a wide range of smart terminal products, bringing greater convenience and new possibilities. As a provider of smart terminal products and mobile Internet services in emerging markets, Transsion continues to innovate in artificial intelligence, advancing the research and application of AI voice technology, exploring localized user scenarios, and bringing a full-scenario intelligent interactive experience to users in emerging markets.

Transsion has built its own foundational AI voice capabilities in speech recognition, semantic understanding, speech synthesis, natural language processing, and knowledge graphs. It has established an advantage in voice data for less widely spoken languages and has made major breakthroughs in multilingual voice assistants, digital humans, and voice forgery detection. Since the beginning of 2023, Transsion's AI Technology Department has placed highly in the ICASSP 2023 Spoken Language Understanding (SLU) Challenge and the IJCAI 2023 Audio Deepfake Detection (ADD) Challenge, and has published a paper on multimodal digital human interaction at ICME 2023, a flagship international multimedia conference.

Building a multilingual voice assistant for a local voice-interaction content ecosystem

The voice assistant is a standard smartphone application. Its core technologies are voice interaction and natural language understanding, and its purpose is to help users complete tasks more quickly and efficiently. To meet the demand for local-language voice interaction in emerging markets, Transsion has long invested in multilingual voice assistant technology, concentrating on understanding what local users need and turning that understanding into technical solutions. Through this exploration and development work it has built up substantial technical capability and practical experience.

At ICASSP 2023, a leading international conference, Transsion's AI Technology Department took part in the Spoken Language Understanding (SLU) Challenge. On the strength of its speech recognition and semantic understanding, the team won first place in the offline voice assistant sub-track with an accuracy of 71.97%. Its entry paper, "A Two-Stage System for Spoken Language Understanding", was also included by the IEEE (Institute of Electrical and Electronics Engineers).

Colleagues from Transsion’s AI technology department shared research results at ICASSP 2023

Today's voice assistants mainly target mainstream languages and offer little coverage of less widely spoken languages, specific user groups, and other niche segments. To serve the local accents and minority languages of users in emerging markets such as Africa and South Asia, Transsion has used its large base of mobile phone users to build a localized, low-cost, high-quality corpus production system that addresses the scarcity of corpus data for minority languages. On this basis, Transsion develops multilingual voice assistants adapted to the languages and cultures of local users, so that they can more conveniently interact with their phones by voice in their own languages. Transsion's multilingual voice assistant technology currently supports voice interaction and natural language understanding in English, French, Hausa, Arabic, Swahili, and other languages, covering more than 100 usage scenarios such as calling contacts, quick app launch, music playback, and WhatsApp messaging and chat.
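
To make the idea of usage scenarios concrete, here is a minimal sketch of the language-understanding step: mapping an already transcribed English utterance to an intent and its slots. The article does not describe Transsion's implementation, so everything here is an assumption for illustration: the `Intent` class, the `PATTERNS` table, and the four intent names are hypothetical, and a production system would use trained models rather than regular expressions.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Intent:
    """A recognized user goal plus the values extracted from the utterance."""
    name: str
    slots: dict = field(default_factory=dict)


# Hypothetical rule table, loosely mirroring the scenarios named in the article
# (calls, app launch, music playback, messaging). Not Transsion's actual rules.
PATTERNS = [
    ("call_contact", re.compile(r"^(?:call|dial)\s+(?P<contact>.+)$", re.I)),
    ("launch_app", re.compile(r"^(?:open|launch)\s+(?P<app>.+)$", re.I)),
    ("play_music", re.compile(r"^play\s+(?P<track>.+)$", re.I)),
    ("send_message", re.compile(r"^(?:send|text)\s+(?P<contact>\w+)\s+(?P<body>.+)$", re.I)),
]


def parse_intent(transcript: str) -> Intent:
    """Return the first matching intent, or an 'unknown' intent if none match."""
    text = transcript.strip()
    for name, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return Intent(name, match.groupdict())
    return Intent("unknown")


if __name__ == "__main__":
    for utterance in ["Call Amina", "open WhatsApp", "send Amina hello there"]:
        print(utterance, "->", parse_intent(utterance))
```

Supporting another language in a rule-based sketch like this would mean adding a pattern table per language, which is one reason corpus data for each language matters so much in practice.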

To meet local users' needs for everyday services, Transsion will continue to extend its multilingual AI voice assistant technology to more daily-life, travel, study, and work scenarios. The aim is a cross-language AI content service ecosystem in which intelligent voice services reach every aspect of local life and benefit more speakers of less widely spoken languages.

AI digital human technology empowers Transsion’s multi-scenario business

As interactive intelligence technology develops faster, digital humans are moving from technological innovation to industrial application in entertainment, education, healthcare, and other fields. Transsion has embraced this opportunity, invested early in digital human technology, and established in-house technology and engineering capabilities across the full pipeline. Its digital human system includes both 2D real-person and 3D realistic digital humans, backed by data resources for multilingual speech recognition, speech synthesis, voice wake-up, natural language understanding, and digital human capabilities. In multilingual voice dialogue, character design and appearance, and intelligent scene interaction, the system has developed distinct localized characteristics. In January 2023, Transsion's digital human system received the authoritative digital human standard certification issued by the China Academy of Information and Communications Technology. It is also the only digital human system from a Chinese mobile phone manufacturer to pass the Academy's evaluation in the "interactive dialogue" category.

To make virtual avatars more lifelike and to synthesize realistic, expressive digital human video, Transsion's AI Technology Department developed its own end-to-end technology. While optimizing video generation quality, the team proposed a new framework built on a densely connected U-Net and introduced the CLIP encoder so that text semantic information can improve the animation of the digital human's mouth. The approach also adds a probability density map of facial keypoints, which gives the network an additional input modality and improves generation quality. As a result, the digital human's face looks more realistic and detailed, and speech and lip shape are more consistent. The team describes the generation quality as academically leading. The related paper, "CPNet: Exploiting CLIP-based Attention Condenser and Probability Map Guidance for High-fidelity Talking Face Generation", was accepted by ICME 2023 (IEEE International Conference on Multimedia and Expo), a flagship international multimedia conference.
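
The idea of a probability density map over facial keypoints can be illustrated with a short sketch. The article does not say how the map is constructed, so this is only one common way to do it, assumed here for illustration: place an isotropic Gaussian at each keypoint and normalize the grid so it sums to 1. The function name, the `sigma` parameter, and the sample coordinates are all hypothetical.

```python
import math


def keypoint_probability_map(keypoints, height, width, sigma=2.0):
    """Render (x, y) keypoints as a height x width grid that sums to 1.

    Each keypoint contributes an isotropic Gaussian bump; the Gaussian form
    and sigma are illustrative assumptions, not details from the paper.
    """
    grid = [[0.0] * width for _ in range(height)]
    for kx, ky in keypoints:
        for y in range(height):
            for x in range(width):
                dist_sq = (x - kx) ** 2 + (y - ky) ** 2
                grid[y][x] += math.exp(-dist_sq / (2.0 * sigma ** 2))
    total = sum(sum(row) for row in grid)
    if total == 0.0:
        return grid  # no keypoints: return the all-zero grid unchanged
    return [[value / total for value in row] for row in grid]


if __name__ == "__main__":
    # Two made-up mouth-corner keypoints on a small 10 x 12 grid.
    prob_map = keypoint_probability_map([(3, 2), (8, 6)], height=10, width=12)
    print(round(sum(sum(row) for row in prob_map), 6))
```

A map like this can be stacked with the image channels as an extra input, which is one way a network could receive keypoint locations as an additional modality.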

Transsion's digital human system is already used in multiple business scenarios. It serves as a smart shopping guide in overseas mobile phone stores, helping customers choose a phone, and it can provide smart voice assistant functions for various smart terminal products to improve the user experience. In the future, Transsion will apply its AI digital human technology to more business scenarios, explore new forms such as digital human voice assistants and customer service systems, and bring users a new intelligent interactive experience.

Continuing to build foundational AI voice capabilities

With AI technology developing rapidly, algorithmically generated and tampered audio can now pass for real recordings, and ordinary users find it very hard to tell genuine audio from fake. To preserve the credibility of information and protect public safety, voice forgery detection has become crucial and has emerged as a new research direction in artificial intelligence. Focusing on the business scenarios of its smart terminal products and guided by local user needs, Transsion continues to extend its foundational AI voice capabilities, move into new technology areas, and make major breakthroughs in voice forgery detection.

Transsion's AI Technology Department won second place in the Manipulation Region Location track of the Second Audio Deepfake Detection Challenge (ADD), held at IJCAI 2023 (the 32nd International Joint Conference on Artificial Intelligence). For the competition, the team independently developed AI models and algorithms that can accurately identify and locate tampered speech within an audio recording, helping to ensure the originality and authenticity of digital audio and offering new ideas for AI applications and information security. The related paper was published at the IJCAI 2023 Workshop on Deepfake Audio Detection and Analysis (DADA 2023).
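
Locating a tampered region, as opposed to labeling a whole clip as fake, means producing time spans. The sketch below shows only a generic post-processing step under stated assumptions: some detector (not shown, and not Transsion's) has already produced a fake-probability score per audio frame, and consecutive frames above a threshold are merged into spans. The function name, the 0.5 threshold, and the 20 ms frame length are hypothetical.

```python
def locate_manipulated_regions(frame_scores, threshold=0.5, frame_seconds=0.02):
    """Merge consecutive frames scored >= threshold into (start, end) spans in seconds.

    frame_scores is assumed to hold one fake-probability per fixed-length frame;
    how those scores are produced is outside the scope of this sketch.
    """
    regions = []
    start = None
    for index, score in enumerate(frame_scores):
        if score >= threshold and start is None:
            start = index  # a suspicious run begins here
        elif score < threshold and start is not None:
            regions.append((start * frame_seconds, index * frame_seconds))
            start = None
    if start is not None:
        # The run extends to the end of the recording.
        regions.append((start * frame_seconds, len(frame_scores) * frame_seconds))
    return regions


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.8, 0.2, 0.7]
    print(locate_manipulated_regions(scores, frame_seconds=1.0))
```

In practice the per-frame scores would usually be smoothed before thresholding so that a single noisy frame does not split one region into two.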

Next, Transsion's AI Technology Department will continue to explore applications of audio deepfake detection in Transsion's smart terminal products, such as call fraud checks that protect user privacy and security, and will keep improving the user experience.

Looking ahead, Transsion will continue to invest in AI voice and multimodal technology. Focusing on its core businesses of mobile phones, mobile Internet services, and home appliances and digital accessories, and drawing on its deep insight into emerging markets and local consumers, it aims to give users smart-life experiences that meet their needs and to form a localized AI content service ecosystem that serves multilingual, multi-scenario, personalized, and intelligent applications.

Source: reproduced from Sohu (搜狐).
