Wang Wenbing, head of Rokid algorithm: 'Sound' under AR is in a 'wonderful' state

Sound is ubiquitous and indispensable in daily life, and the same is true in the metaverse. Achieving full immersion in metaverse scenes requires the continued development of a range of sound technologies. At the "AISummit Global Artificial Intelligence Technology Conference" recently held by 51CTO, Wang Wenbing, the head of Rokid algorithm, gave a keynote speech titled "Sound in AR under "Wonderful" Land". He introduced the concept of Rokid's self-developed 6DoF spatial sound field, its main technical modules and technical difficulties, the trend of combining it with AR, and the original intention behind developing the technology, and explained the important role spatial sound field technology plays in the metaverse.

The speech content is now organized as follows:

What is the 6dof spatial sound field?

To answer this question, first set aside technical limitations and imagine how sound should be presented in AR. Most of the TVs and mobile phones we use today are two-channel stereo. Home theaters already use multi-channel audio, and professional venues such as movie theaters also place speakers throughout the space.

How should it be presented in AR? Imagine a scene such as an online meeting or an online class, both very popular now. If you see a digital human on your right talking in the metaverse, but the voice comes from your left, doesn't that feel strange?

Next, imagine an AR game. In earlier 2D visuals, sound could follow the visual focus. In a 360-degree 3D scene, however, human eyes cannot take in the whole scene at once, while sound has global focus. This is why players in many games turn their view toward a sound. From this we can see some characteristics that sound in AR needs: it must satisfy people's high sensitivity to sound, the global focus of sound, and the demand for realism.

Next, we will introduce how the form of sound has developed along three dimensions.

First, the spatial expression dimension. Sound has progressed from mono/stereo, to planar multi-channel layouts such as 5.1/7.1/9.1, to spatial multi-channel layouts such as 5.1.x/7.1.x. The number of speakers keeps growing, and their placement has expanded from a plane into three-dimensional space.

Second, the encoding dimension. Encoding began as channel-based, where each channel carries a mix of sounds, as in the familiar left and right channels. It then moved to object-based encoding, where individual sound objects are encoded, as in the Dolby Atmos films shown in cinemas. For example, when a cannonball falls, that cannonball is encoded as its own object, its trajectory is recorded in the metadata, and playback is rendered according to the corresponding speaker positions. Our ultimate goal, however, is a fully scene-based result, similar to panoramic sound methods such as HOA: not just the cannonball, but every flower, blade of grass, and falling leaf should have a sense of space.
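As a rough illustration of the scene-based idea, the sketch below encodes one mono sample into first-order ambisonics, the lowest order of the family that HOA extends. It is only a sketch under stated assumptions: the function name is hypothetical, the channel order and normalization follow the common ACN/SN3D convention, and nothing here is taken from Rokid's implementation.

```python
import math

def encode_first_order(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample into first-order ambisonics (ACN order, SN3D).

    Assumed convention: azimuth is measured counter-clockwise from straight
    ahead, so +90 degrees is directly to the listener's left.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample                                 # omnidirectional component
    y = sample * math.sin(az) * math.cos(el)   # left-right component
    z = sample * math.sin(el)                  # up-down component
    x = sample * math.cos(az) * math.cos(el)   # front-back component
    return (w, y, z, x)

# A source straight ahead excites only W and X;
# a source on the left excites only W and Y.
front = encode_first_order(1.0, 0.0)
left = encode_first_order(1.0, 90.0)
```

Unlike the channel-based and object-based forms, the four output channels describe the sound field around the listener rather than any particular speaker or object, which is why a decoder can later render them for any playback layout.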

Third, the XR experience dimension. In the past, virtual sound was separated from the real world. In XR, and especially in AR, what we have been working on is the fusion of the virtual and the real.

People can localize sound in such fine detail because of binaural hearing. Technically, this relies on ITD and ILD: the interaural time difference and the interaural level (sound intensity) difference between the two ears. These two differences help us quickly locate the direction a sound comes from.
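To give a feel for the size of these cues, here is a small sketch. It assumes Woodworth's rigid-sphere approximation for the ITD of a distant source, ITD = (r / c) * (theta + sin(theta)), with an assumed head radius of 8.75 cm and a speed of sound of 343 m/s. The ILD helper simply compares the RMS levels of two ear signals. All names are illustrative and not part of any Rokid API.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, air at roughly 20 degrees Celsius (assumed)
HEAD_RADIUS = 0.0875     # m, assumed average head radius

def itd_woodworth(azimuth_deg, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Approximate ITD in seconds; 0 = straight ahead, 90 = directly to one side."""
    theta = math.radians(azimuth_deg)
    return radius / c * (theta + math.sin(theta))

def rms(signal):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def ild_db(near_ear, far_ear):
    """ILD in decibels between the signals at the two ears."""
    return 20.0 * math.log10(rms(near_ear) / rms(far_ear))

side_itd = itd_woodworth(90.0)                      # about 0.66 milliseconds
half_level = ild_db([1.0, -1.0], [0.5, -0.5])       # about 6 dB
```

Under these assumptions the largest time difference is well under a millisecond, which gives a sense of how fine the timing is that binaural rendering has to reproduce.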

So how can 3D sound be made popular? How can venue limitations be overcome? How can costs for users be reduced? How can everyone enjoy the technology? Rokid's self-developed 6dof spatial sound field is meant to help solve these problems.

The name "6dof spatial sound field" has two parts: 6dof and spatial sound field. 6dof stands for six degrees of freedom: the gyroscope provides rotation about the X, Y, and Z axes, and the accelerometer provides acceleration along the X, Y, and Z axes, from which translation can be tracked.
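The following sketch shows, under stated assumptions, how a six-degree-of-freedom head pose could be used to find where a fixed virtual source sits relative to the listener. The `Pose6DoF` type, the axis convention (X forward, Y left, Z up), and the yaw-pitch-roll rotation order are assumptions made for illustration. The sketch also assumes the pose has already been estimated from the sensors and does not cover sensor fusion.

```python
import math
from dataclasses import dataclass

@dataclass
class Pose6DoF:
    # Head position in metres (assumed world frame: X forward, Y left, Z up).
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0
    # Right-handed rotations in degrees about Z (yaw), Y (pitch), X (roll).
    yaw: float = 0.0
    pitch: float = 0.0
    roll: float = 0.0

def head_to_world(pose):
    """Rotation matrix R = Rz(yaw) * Ry(pitch) * Rx(roll)."""
    yw, pt, rl = (math.radians(a) for a in (pose.yaw, pose.pitch, pose.roll))
    cy, sy = math.cos(yw), math.sin(yw)
    cp, sp = math.cos(pt), math.sin(pt)
    cr, sr = math.cos(rl), math.sin(rl)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]

def direction_in_head_frame(pose, source):
    """Return (azimuth_deg, elevation_deg, distance) of a world-space source."""
    R = head_to_world(pose)
    d = (source[0] - pose.x, source[1] - pose.y, source[2] - pose.z)
    # Multiplying by the transpose of R maps world axes back to head axes.
    lx, ly, lz = (sum(R[j][i] * d[j] for j in range(3)) for i in range(3))
    azimuth = math.degrees(math.atan2(ly, lx))
    elevation = math.degrees(math.atan2(lz, math.hypot(lx, ly)))
    distance = math.sqrt(lx * lx + ly * ly + lz * lz)
    return azimuth, elevation, distance

# Head turned 90 degrees to the left: a source that was straight ahead
# now lies at an azimuth of about -90 degrees, i.e. on the listener's right.
az, el, dist = direction_in_head_frame(Pose6DoF(yaw=90.0), (1.0, 0.0, 0.0))
```

The three rotations change the direction the sound should appear to come from, and the three translations change both direction and distance, which is what lets a virtual source stay fixed in the room as the wearer moves.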

6dof spatial sound field involves the generation, dissemination, rendering, encoding and decoding of sound, as well as the fusion and interaction of virtual and real sounds throughout the process.

The main technology of 6dof spatial sound field

The main technical modules of the 6dof spatial sound field are HRTFs, sound field rendering, and sound effects. HRTFs (head-related transfer functions) describe how sound from a source in the free field is transformed by the time it reaches the eardrum; they capture how sound arriving from every direction is transmitted to the human ear in a simulated anechoic chamber environment. Sound field rendering lets people tell where a sound is located just by listening, and blends virtual and real objects so that the effect of real objects on virtual sound sources is handled properly. The sound effects module enriches sound quality, using open speakers with a privacy-oriented design that reduces sound leakage while maintaining volume.
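In the time domain, applying an HRTF amounts to convolving the source signal with a pair of head-related impulse responses (HRIRs), one per ear. The sketch below illustrates that step only. The HRIR values are toy numbers invented for the example (a delayed, quieter copy at the far ear), not measured data, and the function names are hypothetical.

```python
def convolve(signal, kernel):
    """Direct-form convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

def render_binaural(mono, hrir_left, hrir_right):
    """Filter a mono signal with the HRIR of each ear."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Toy HRIR pair for a source on the listener's left: the right ear
# receives the sound two samples later and at half the amplitude.
hrir_left = [1.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.5]

left, right = render_binaural([1.0, 0.5], hrir_left, hrir_right)
# left  == [1.0, 0.5, 0.0, 0.0]
# right == [0.0, 0.0, 0.5, 0.25]
```

A real renderer would select or interpolate the HRIR pair for the current source direction and would usually use FFT-based convolution for speed. The direct form is used here only to keep the example short.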

The SDK at the top of the architecture diagram exposes the modules externally, namely the spatial engine interface and the speech engine interface. Spatial information can be acquired and modeled, which helps integrate the digital and physical worlds.

In addition, we have made some modifications to Room Effect. Its overall framework is similar to the classic network structure: first the network is constructed and a theoretically lossless network is generated, and then attenuation and loss settings are applied on top of it, including absorption, occlusion, and reflection. Our purpose is not to produce a wide variety of sound effects. We simply provide effects that fit the product's usage scenarios, such as theater or music, so that users get a good audio-visual experience. These can be experienced on the next-generation AR glasses, Rokid Max.
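The talk does not name the network. The description (a lossless network first, with attenuation added afterwards) resembles a feedback delay network (FDN) reverberator, so the sketch below assumes that structure. The delay lengths, the Hadamard feedback matrix, and the single `decay` gain are illustrative choices, not Rokid's design.

```python
def fdn_reverb(signal, delays=(149, 211, 263, 293), decay=0.85, tail=2000):
    """Minimal 4-line feedback delay network (assumed structure).

    With decay = 1.0 the feedback loop is lossless, because the scaled
    Hadamard matrix is orthogonal. A decay below 1.0 stands in for the
    absorption and other losses layered on top of the lossless prototype.
    """
    hadamard = [
        [1, 1, 1, 1],
        [1, -1, 1, -1],
        [1, 1, -1, -1],
        [1, -1, -1, 1],
    ]
    lines = [[0.0] * d for d in delays]   # circular buffers, one per delay line
    idx = [0] * len(delays)
    out = []
    for n in range(len(signal) + tail):
        x = signal[n] if n < len(signal) else 0.0
        # Oldest sample in each delay line.
        taps = [lines[i][idx[i]] for i in range(4)]
        out.append(sum(taps))
        for i in range(4):
            # The 0.5 factor makes the 4x4 Hadamard matrix orthogonal.
            feedback = 0.5 * sum(hadamard[i][j] * taps[j] for j in range(4))
            lines[i][idx[i]] = x + decay * feedback
            idx[i] = (idx[i] + 1) % delays[i]
    return out

impulse_response = fdn_reverb([1.0])
```

In a fuller design, the single `decay` gain would typically be replaced by frequency-dependent filters per delay line, which is one way the absorption, occlusion, and reflection settings mentioned above could be expressed.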

6dof spatial sound field comparison. On the left is the result from a third-party SDK: when rotating from 0 degrees to 90 degrees, the change in each frequency band is not smooth, dropping sharply at first and then changing very little. On the right is Rokid's 6dof spatial sound field, which shows clear changes across frequency bands as your position changes. The figure shows the amplitude at different angles and in different frequency bands.

The development trend of 6dof spatial sound field

With the arrival of the metaverse era and the rise of AR and VR technologies, the development of spatial sound fields also faces new opportunities.

The development trend of spatial sound fields is mainly reflected in three aspects:

First, immersion. Virtual sound should respond to the real world, so that the virtual and the real are better fused and can interact, achieving a truly immersive experience. No sound in the virtual world should be exempt from the influence of objects in the real world, because otherwise people will still perceive the two as separate. Beyond fusion, interaction is also required. For example, you can interact with the augmented sound on an AR device through voice, gestures, and other means, to pause, play, switch between windows at different levels and viewpoints, or pick out the sounds you are interested in.

The second is refinement, which involves detailed exploration and practice in areas such as HRTF, resolution, test methods, and customization. The hardest part to refine is the HRTF, because generating it is time-consuming and laborious: sound must be played from every point at different distances across the entire sphere, and then sampled at the ear canal. Some researchers are currently studying how to reach the same level of refinement with fewer sampling points, and how to achieve higher accuracy through interpolation or other techniques. In the longer term, the limit of refinement is per-user customization.
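As a simple illustration of filling in directions that were never measured, the sketch below linearly interpolates between the two nearest measured impulse responses on a single azimuth ring. This is a deliberately basic scheme chosen for clarity, not the method described in the talk. The data layout (a dictionary from azimuth in degrees to a list of samples) and the function name are assumptions.

```python
import bisect

def interpolate_hrir(measured, azimuth_deg):
    """Linearly interpolate an HRIR at an unmeasured azimuth.

    measured: dict mapping azimuth in degrees to a list of samples,
    with every list the same length (hypothetical layout).
    """
    angles = sorted(measured)
    # Clamp to the measured range instead of extrapolating.
    if azimuth_deg <= angles[0]:
        return list(measured[angles[0]])
    if azimuth_deg >= angles[-1]:
        return list(measured[angles[-1]])
    hi = bisect.bisect_right(angles, azimuth_deg)
    a0, a1 = angles[hi - 1], angles[hi]
    w = (azimuth_deg - a0) / (a1 - a0)   # 0 at a0, 1 at a1
    return [(1.0 - w) * h0 + w * h1
            for h0, h1 in zip(measured[a0], measured[a1])]

# Toy measurements at 0 and 30 degrees; the value at 15 degrees is the average.
measured = {0: [1.0, 0.0], 30: [0.0, 1.0]}
halfway = interpolate_hrir(measured, 15)   # [0.5, 0.5]
```

Plain sample-wise blending like this tends to smear the arrival-time difference between neighbouring measurements, which is one reason more careful interpolation schemes are an active research topic.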

The third is privacy and sound effects: enjoying the auditory feast that sounds in different frequency bands bring. Different harmonics and different frequency bands give us different sensations. For example, severe reverberation impairs hearing, while appropriate reverberation enriches the perceived sound quality. Early reverberation in particular is often used to judge timbre; reverberation and lateral reflections below 3 kHz help create a better sense of space and depth, while high-frequency components help create a sense of surround.

The original intention of exploring spatial sound fields

Why does Rokid create spatial sound fields? There are three main reasons:

First, immersion. We have been pursuing the fusion of the digital world and the physical world, such as vividness in games and realism in online meetings or online education.

Second, virtual and real interaction. We believe the future world will be a fusion of the virtual and the real. On top of that fusion, many interactions become possible, including spatial perception and the interaction of subjective behavior. Spatial perception refers to sensing aspects of the real world such as the size of objects, the size of the space, and materials; this perception then shapes virtual sounds. The interaction of subjective behavior is people intervening in, selecting, and communicating with sounds in the digital world.

Third, ultimate quality. AR Glass is different from mobile phones, tablets, TVs, and similar products. On a mobile phone, a dropped connection or lag is tolerable, but AR Glass worn in front of your eyes has very high real-time requirements. How can such requirements be met? It takes overall optimization across algorithms, engineering, systems, hardware, and applications.

These are the missions we have been pursuing. Rokid hopes to bring these capabilities directly to the public through its AR Glass products. At the same time, we hope to release these technologies as basic capabilities of our Yoda OS, so that developers can build on them, indirectly benefiting users and empowering all walks of life.

The conference speech replay and PPT are now online. Visit the official website to view them (https://www.php.cn/link/53253027fef2ab5162a602f2acfed431).

Statement
This article is reproduced from 51CTO.COM. If there is any infringement, please contact admin@php.cn to request removal.