China makes important progress in formulating the AVS3 real-time voice standard; Tencent's solution selected
According to an official announcement from the New Generation Artificial Intelligence Alliance, reported by this site on December 14, the AVS3P10 real-time speech coding standard has recently made important progress.
On December 14, 2023, the 87th AVS Working Group meeting opened in Chengdu. At the meeting, WD 1.0 of "Intelligent Media Coding Part 10: Real-time Speech" (hereinafter referred to as AVS3P10) was reviewed at the plenary session, and the technical solution submitted by Tencent was selected as the RM0 baseline of AVS3P10 real-time speech coding.
Real-time voice communication technology (this site's note: RTC, Real-time Communication) is widely used in collaborative office work, interactive entertainment, social networking, and other fields. These diverse application scenarios pose a variety of technical challenges. Among them, speech coding that is high-quality, low-latency, low-bandwidth, and highly robust is a very important part.
At bit rates of 16-20 kbps, traditional speech codecs, such as those in the AVS and ITU-T standards, can produce high-quality wideband speech. At 30-35 kbps, they can produce high-quality ultra-wideband and even full-band speech. However, when the bit rate is reduced further (for example, below 10 kbps), the reconstruction quality of traditional speech codecs drops significantly, which hurts the user experience.
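To make these bit rates concrete, the following minimal sketch converts a bit rate into the number of bits available per frame. The 20 ms frame length is an assumption chosen for illustration only; the article does not state the frame size used by AVS3P10, and `bits_per_frame` is a hypothetical helper name.

```python
def bits_per_frame(bitrate_kbps: float, frame_ms: float = 20.0) -> int:
    """Return the number of bits available for one frame at the given bit rate.

    1 kbps equals 1 bit per millisecond, so bits = kbps * ms.
    The 20 ms default frame length is an illustrative assumption.
    """
    return int(round(bitrate_kbps * frame_ms))


if __name__ == "__main__":
    # Bit rates mentioned in the article, from lowest to highest.
    for rate in (5.9, 10, 16, 20, 35):
        print(f"{rate} kbps -> {bits_per_frame(rate)} bits per 20 ms frame")
```

Under this assumption, a codec running below 10 kbps has fewer than 200 bits to describe each frame, which is why the reconstruction quality of traditional designs becomes hard to maintain.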
In response to these application demands, at the 84th AVS meeting in March this year, Tencent proposed that the AVS audio group launch a low-bit-rate, high-quality speech coding project for real-time voice communication scenarios. After a requirements analysis, AVS officially initiated the AVS3P10 real-time speech coding project at the 85th AVS meeting and issued a call for technical proposals through the AVS audio group. The AVS3P10 real-time speech coding project will be promoted and maintained by Xiao Wei of Tencent Meeting Tianlai Lab.
At the 86th AVS meeting, the audio group reviewed proposal M7886, "AVS3P10 Speech Coding Reference Model Candidate Technical Solution", submitted by Tencent Meeting Tianlai Lab.
The review found that the solution has the following four features:
It deeply integrates classic signal processing with artificial intelligence techniques such as deep neural networks, making it an AI codec;
It supports low-bit-rate, high-quality coding, real-time encoding and decoding, and multi-rate coding;
It is built on a sub-band, multi-mode coding architecture: features of the low-frequency signal are extracted by a deep neural network, features of the high-frequency signal are extracted by a bandwidth extension scheme, and the features are compressed using scalar quantization combined with entropy coding;
It has an open neural network coding architecture, so the coding neural network can be modified and optimized while the bitstream remains forward compatible.
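The third feature mentions scalar quantization combined with entropy coding. The sketch below illustrates those two generic ideas only; it is not the AVS3P10 RM0 algorithm, whose details the article does not give. The feature values, the step size, and all function names are hypothetical.

```python
import math
from collections import Counter


def quantize(features, step):
    """Uniform scalar quantization: map each value to the index of the nearest multiple of step."""
    return [round(x / step) for x in features]


def dequantize(indices, step):
    """Reconstruct approximate feature values from quantization indices."""
    return [i * step for i in indices]


def empirical_entropy_bits(indices):
    """Shannon entropy of the index distribution, in bits per symbol.

    This approximates the average code length an ideal entropy coder
    could reach for symbols drawn from this empirical distribution.
    """
    counts = Counter(indices)
    total = len(indices)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    features = [0.12, -0.48, 0.51, 0.07, -0.02, 0.49]  # hypothetical feature values
    step = 0.25                                        # hypothetical step size
    indices = quantize(features, step)
    print("indices:", indices)
    print("reconstructed:", dequantize(indices, step))
    print("entropy (bits/symbol):", empirical_entropy_bits(indices))
```

A larger step size yields fewer distinct indices and therefore lower entropy (fewer bits), at the cost of a larger reconstruction error, which is bounded by half the step size for uniform quantization.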
The China Electronics Technology Standardization Institute and Huawei conducted subjective testing and cross-validation, respectively. The cross-validation aimed to be comprehensive and was based on the ITU-T P.800 DCR subjective quality evaluation method. The subjective tests covered clean speech, speech with packet loss, mixed speech, and other scenarios at different bandwidths. For the first time, test scenarios with 3A processing (echo cancellation, noise suppression, and gain control) were introduced into source codec testing, in order to evaluate the new generation of AI codec technology under conditions close to real use.
In these test scenarios, AVS3P10 RM0 showed a clear quality advantage. Subjective test results show that AVS3P10 RM0 achieved MOS scores above 4.0 in several major test scenarios, including wideband and ultra-wideband, with the lowest bit rate reaching 5.9 kbps. Because AVS3P10 RM0 uses deep neural network technology, it has built-in resilience to packet loss, which effectively improves codec quality under poor network conditions.
In addition, AVS3P10 RM0 also showed significant advantages in the ITU-T P.863 objective quality evaluation. At all eight tested bit rates, the MOS value of AVS3P10 RM0 exceeded 4.0, reaching a maximum of 4.45. Its quality is comparable to that of traditional signal-processing codecs such as OPUS and EVS at medium and high bit rates, reaching carrier-grade quality. Compared with other AI codecs, AVS3P10 RM0 has a quality advantage of more than 0.6 MOS at similar bit rates. These test results indicate that AVS3P10 RM0 represents the highest level among current AI codecs.
The New Generation Artificial Intelligence Alliance stated that AVS3P10 real-time speech coding, as a new-generation speech codec standard, is an important addition to the AVS family of standards.
Going forward, the AVS3P10 real-time speech coding project will proceed according to the established plan and is expected to complete standardization in mid-2024.