


Brain-computer interfaces (BCIs) have recently attracted widespread attention in both scientific research and practical applications, and there is broad curiosity about where the technology might be applied.
Aphasia caused by neurological damage not only severely hinders patients' daily lives but can also limit their careers and social activities. With the rapid progress of deep learning and brain-computer interface technology, researchers are working toward restoring communication to people with aphasia through neural speech prostheses.
Research on decoding signals from the human brain has produced a series of exciting advances, with breakthroughs in decoding speech, movement, and other signals. Notably, Elon Musk's Neuralink has also made breakthrough progress in this field with its brain-interface technology.
The company successfully implanted electrodes in the brain of a test subject, allowing him to type, play games, and perform other tasks through simple cursor control. This marks another step toward more complex neural speech and motor decoding. Compared with other brain-computer interface tasks, neural speech decoding is more complex, and its development relies mainly on a special data source: electrocorticography (ECoG).
ECoG data are typically recorded from patients with implanted electrodes during the course of their clinical care, and the researchers used these electrodes to collect brain activity during speech production. These data have high temporal and spatial resolution and have already driven remarkable results in speech decoding research, greatly advancing brain-computer interface technology. With the help of these techniques, more people with neurological disorders may regain the freedom to communicate in the future.
A recent study published in Nature achieved a breakthrough: in a patient with an implanted device, it used quantized HuBERT features as an intermediate representation and a pre-trained speech synthesizer to convert those features into speech, an approach that improves the naturalness of the speech while maintaining high accuracy.
However, HuBERT features cannot capture the speaker's unique acoustic characteristics, and the generated audio typically sounds like a single generic voice, so an additional model is still needed to convert this generic voice into the specific patient's voice.
Another noteworthy point is that this study, like most previous attempts, used a non-causal architecture, which may limit its practical use in brain-computer interface applications that require causal operation.
On April 8, 2024, the Video Lab and Flinker Lab at New York University jointly published breakthrough research in the journal Nature Machine Intelligence.
Paper link: https://www.nature.com/articles/s42256-024-00824-8
The open-source code is available at https://github.com/flinkerlab/neural_speech_decoding
More generated speech examples are at: https://xc1490.github.io/nsd/
This research titled "A neural speech decoding framework leveraging deep learning and speech synthesis" introduces an innovative differentiable speech synthesizer.
The synthesizer uses a lightweight convolutional neural network to encode speech into a series of interpretable speech parameters, such as pitch, loudness, and formant frequencies, and then re-synthesizes the speech through a differentiable process.
By mapping neural signals to these specific speech parameters, the study constructed a neural speech decoding system that is highly interpretable and applicable to small data sets. The system can not only reconstruct high-fidelity, natural-sounding speech, but also provides an empirical basis for high-accuracy brain-computer interface applications in the future.
The research team collected data from a total of 48 subjects and attempted speech decoding on this basis, laying a solid foundation for the practical application and development of high-precision brain-computer interface technology.
Turing Award winner Yann LeCun also shared the research.
Research status
Current research on decoding speech from neural signals faces two core challenges.
The first is limited data: to train a personalized neural-to-speech decoding model, the total data available for each patient is usually only about ten minutes, which is a significant constraint for deep learning models that normally rely on large amounts of training data.
Second, the high variability of human speech increases modeling complexity: even when the same person repeats the same word, speaking rate, intonation, and pitch may change, adding further difficulty to model construction.
In early attempts, researchers mainly used linear models to decode neural signals into speech. This type of model does not require the support of a huge data set and has strong interpretability, but its accuracy is usually low.
More recently, with advances in deep learning, especially the application of convolutional neural networks (CNNs) and recurrent neural networks (RNNs), researchers have made extensive attempts to model intermediate latent representations of speech and improve the quality of synthesized speech.
For example, some studies decode cerebral cortex activity into mouth movements and then convert those movements into speech. Although this approach offers stronger decoding performance, the reconstructed voice often does not sound natural enough.
In addition, some newer methods use a WaveNet vocoder or generative adversarial networks (GANs) to reconstruct natural-sounding speech; although these methods improve naturalness, their accuracy remains limited.
Main model framework
In this study, the research team demonstrated an innovative decoding framework from electrocorticography (ECoG) signals to speech. They constructed a low-dimensional latent representation space generated by a lightweight speech encoding-decoding model trained only on speech signals.
The framework contains two core parts. The first is the ECoG decoder, which converts the ECoG signal into a series of interpretable acoustic speech parameters, such as pitch, voicing, loudness, and formant frequencies; the second is the speech synthesizer, which converts these parameters into spectrograms.
By building a differentiable speech synthesizer, the researchers could train the ECoG decoder and optimize the speech synthesizer jointly to reduce the spectrogram reconstruction error. The strong interpretability of this low-dimensional latent space, combined with reference speech parameters generated by a lightweight pre-trained speech encoder, makes the entire neural speech decoding framework efficient and adaptable, effectively addressing the field's data-scarcity problem.
In addition, the framework can not only generate natural speech that is very close to the speaker's own voice, but also allows multiple deep learning architectures to be plugged into the ECoG decoder and supports causal operation.
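To make the architecture concrete, below is a minimal sketch of how such a two-stage pipeline could be wired together, assuming PyTorch; the module names, layer sizes, and parameter list are illustrative assumptions, not the authors' implementation. The key point is that, because the synthesizer is differentiable, the spectrogram reconstruction loss can be backpropagated through it into the ECoG decoder:

```python
# Minimal sketch of a two-stage ECoG-to-speech pipeline (illustrative only).
import torch
import torch.nn as nn

N_PARAMS = 6  # e.g. pitch (f0), voicing, loudness, a few formants (assumed subset)

class ECoGDecoder(nn.Module):
    """Maps an ECoG window (batch, electrodes, time) to per-frame speech parameters."""
    def __init__(self, n_electrodes=64, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_electrodes, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden, N_PARAMS, kernel_size=5, padding=2),
        )

    def forward(self, ecog):
        return self.net(ecog)  # (batch, N_PARAMS, time)

class DifferentiableSynthesizer(nn.Module):
    """Stand-in for the differentiable synthesizer: speech parameters -> spectrogram."""
    def __init__(self, n_mels=80):
        super().__init__()
        self.proj = nn.Conv1d(N_PARAMS, n_mels, kernel_size=1)  # placeholder mapping

    def forward(self, params):
        return self.proj(params)  # (batch, n_mels, time)

decoder, synth = ECoGDecoder(), DifferentiableSynthesizer()
ecog = torch.randn(2, 64, 100)          # fake ECoG: 2 trials, 64 electrodes, 100 frames
target_spec = torch.randn(2, 80, 100)   # reference spectrogram from the spoken audio

# The spectrogram loss trains the ECoG decoder end to end through the synthesizer.
pred_spec = synth(decoder(ecog))
loss = nn.functional.mse_loss(pred_spec, target_spec)
loss.backward()
```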
The research team processed ECoG data from 48 neurosurgery patients and used a variety of deep learning architectures (including convolutional networks, recurrent neural networks, and Transformers) as ECoG decoders.
These models achieved high accuracy in experiments, especially those using the ResNet convolutional architecture. The framework achieves high accuracy under causal operation and a relatively low spatial sampling density (10 mm electrode spacing), and it also demonstrates effective speech decoding from both the left and right hemispheres, extending the scope of neural speech decoding to the right hemisphere.
One of the core innovations of this research is the development of a differentiable speech synthesizer, which greatly improves the efficiency of speech re-synthesis and can synthesize high-fidelity audio close to the original voice.
The design of the speech synthesizer is inspired by the human vocal system and divides speech into two parts: a voiced component (mainly used to model vowels) and an unvoiced component (mainly used to model consonants).
In the voiced part, the fundamental frequency signal is first used to generate harmonics, which are then passed through a filter composed of formants F1 to F6 to obtain the spectral characteristics of vowels.
For the unvoiced part, the corresponding spectrum is generated by applying a specific filter to white noise. A learnable parameter controls the mixing ratio of the two parts at each time point.
Finally, the speech spectrogram is generated by applying the loudness signal and adding background noise.
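The source-filter idea behind this voiced/unvoiced split can be illustrated with a toy example. The sketch below, assuming NumPy and SciPy, builds a voiced source from harmonics of the fundamental frequency, shapes it with simple formant resonators, mixes in filtered white noise as the unvoiced source, and scales the result by a loudness value; the formant values, filter design, and mixing weights are simplified assumptions, not the paper's synthesizer:

```python
# Toy source-filter illustration of the voiced/unvoiced mixing idea (not the paper's model).
import numpy as np
from scipy.signal import lfilter

sr, dur, f0 = 16000, 0.5, 120.0
t = np.arange(int(sr * dur)) / sr

# Voiced source: sum of harmonics of the fundamental frequency f0
voiced = sum(np.sin(2 * np.pi * f0 * k * t) / k for k in range(1, 20))

def resonator(x, freq, bw=100.0):
    """Simple two-pole resonance at `freq`, standing in for one formant filter."""
    r = np.exp(-np.pi * bw / sr)
    theta = 2 * np.pi * freq / sr
    return lfilter([1 - r], [1, -2 * r * np.cos(theta), r * r], x)

voiced = resonator(resonator(voiced, 700.0), 1200.0)  # rough /a/-like formants (assumed)

# Unvoiced source: filtered white noise
noise = np.random.randn(len(t))
unvoiced = lfilter([1, -0.95], [1], noise)

# Mixing ratio and loudness; in the model these are decoded, time-varying signals
alpha, loudness = 0.8, 0.1
speech = loudness * (alpha * voiced + (1 - alpha) * unvoiced)
```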
Based on this speech synthesizer, the research team designed an efficient speech re-synthesis framework and a neural speech decoding framework. For the detailed framework structure, refer to Figure 6 of the original article.
Research results
1. Speech decoding results with temporal causality
In this study, the researchers first directly compared different model architectures, including convolutional networks (ResNet), recurrent neural networks (LSTM), and Transformer architectures (3D Swin), to evaluate their differences in speech decoding performance.
It is worth noting that these models can perform non-causal or causal operations on time series.
Whether a decoding model is causal has important implications for brain-computer interface (BCI) applications: causal models use only past and current neural signals to generate speech, whereas non-causal models also draw on future neural signals, which is not feasible in practice.
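The difference between causal and non-causal operation can be illustrated with a one-dimensional temporal convolution: a standard convolution pads symmetrically, so each output frame also sees future frames, while a causal convolution pads only on the past side. The sketch below (PyTorch, illustrative only) shows both variants:

```python
# Causal vs. non-causal temporal convolution over neural features (illustrative only).
import torch
import torch.nn as nn

k = 5
x = torch.randn(1, 16, 100)  # (batch, channels, time) fake neural features

# Non-causal: symmetric padding, so the output at time t depends on frames t-2 .. t+2
noncausal = nn.Conv1d(16, 16, kernel_size=k, padding=k // 2)

class CausalConv1d(nn.Module):
    """Causal variant: left-pad the time axis so time t only sees frames t-4 .. t."""
    def __init__(self, c_in, c_out, kernel_size):
        super().__init__()
        self.pad = kernel_size - 1
        self.conv = nn.Conv1d(c_in, c_out, kernel_size)

    def forward(self, x):
        return self.conv(nn.functional.pad(x, (self.pad, 0)))

causal = CausalConv1d(16, 16, k)
print(noncausal(x).shape, causal(x).shape)  # both (1, 16, 100)
```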
Therefore, the focus of the study was to compare the same model under causal and non-causal operation. The results show that even the causal version of the ResNet model performs comparably to the non-causal version, with no significant difference between the two.
Similarly, the causal and non-causal versions of the Swin model perform similarly, but the causal version of the LSTM performs significantly worse than its non-causal version. The study also reported the average decoding accuracy (across all 48 subjects) for several key speech parameters, including voice weight (the parameter that distinguishes vowels from consonants), loudness, fundamental frequency f0, first formant f1, and second formant f2.
Accurate reconstruction of these speech parameters, particularly the fundamental frequency, voice weight, and the first two formants, is critical for accurate speech decoding and natural reproduction of the participant's voice.
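The article does not spell out the accuracy metric used for these parameters; a common choice in this literature is the correlation between decoded and reference parameter trajectories. The sketch below scores each parameter with a frame-wise Pearson correlation under that assumption; the data here are random placeholders, and the paper's exact evaluation may differ:

```python
# Hedged sketch: scoring decoded speech-parameter trajectories against reference ones.
import numpy as np

def pearson(a, b):
    a, b = a - a.mean(), b - b.mean()
    return float((a * b).sum() / (np.sqrt((a**2).sum() * (b**2).sum()) + 1e-8))

params = ["voice_weight", "loudness", "f0", "f1", "f2"]
T = 200
reference = {p: np.random.rand(T) for p in params}                      # from the speech encoder
decoded = {p: reference[p] + 0.1 * np.random.randn(T) for p in params}  # from the ECoG decoder

for p in params:
    print(f"{p}: r = {pearson(decoded[p], reference[p]):.3f}")
```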
The results show that both non-causal and causal models can provide reasonable decoding, which is an encouraging sign for future research and applications.
2. Research on speech decoding and spatial sampling rate of left and right brain neural signals
The researchers further explored differences in speech decoding performance between the left and right brain hemispheres.
Traditionally, most research has focused on the left hemisphere, which is closely related to speech and language functions.
However, our knowledge of the right hemisphere's ability to decode speech information remains very limited. To explore this, the research team compared the decoding performance of participants' left and right hemispheres to verify the feasibility of using the right hemisphere for speech recovery.
Of the 48 subjects in the study, 16 had ECoG signals recorded from the right hemisphere. By comparing the performance of the ResNet and Swin decoders, the researchers found that the right hemisphere can also decode speech effectively, with performance similar to that of the left hemisphere. This finding offers a possible route to language restoration for patients who have lost language function due to left-hemisphere damage.
The research also examined the effect of electrode sampling density on speech decoding. Previous studies mostly used higher-density electrode grids (0.4 mm), whereas the electrode grids commonly used in clinical practice are lower density (1 cm).
Five participants in this study used hybrid (HB) electrode grids, which are primarily low density but include some additional electrodes; the remaining 43 participants used low-density sampling.
The results show that the decoding performance of these hybrid-sampling (HB) grids is similar to that of traditional low-density (LD) sampling, indicating that the model can learn speech information from cortical electrode grids of different densities. This finding suggests that the electrode sampling densities commonly used in clinical settings may be sufficient to support future brain-computer interface applications.
3. Research on the contribution of different brain areas of the left and right brain to speech decoding
The researchers also explored the role of speech-related brain regions in the speech decoding process, which has important implications for the possible future implantation of speech restoration devices in the left and right hemispheres. To evaluate the contribution of different brain regions to speech decoding, the research team used occlusion analysis.
By comparing the causal and non-causal versions of the ResNet and Swin decoders, the study found that the auditory cortex plays a more significant role in the non-causal models. This result underscores the need to use causal models in real-time speech decoding applications, which cannot rely on future neural feedback signals.
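Occlusion analysis estimates a region's contribution by masking the electrodes belonging to that region and measuring how much the decoding score drops. The sketch below illustrates the idea; the decode_score function and the electrode-to-region mapping are hypothetical placeholders, not the authors' code:

```python
# Minimal sketch of occlusion analysis over anatomical regions (illustrative placeholders).
import numpy as np

def decode_score(ecog):
    """Placeholder for 'run the trained decoder and score the reconstructed speech'."""
    return float(np.abs(ecog).mean())  # stand-in metric

ecog = np.random.randn(64, 1000)                   # electrodes x time
regions = {"auditory": range(0, 16),
           "sensorimotor": range(16, 40),
           "other": range(40, 64)}                 # assumed electrode-to-region map

baseline = decode_score(ecog)
for name, idx in regions.items():
    occluded = ecog.copy()
    occluded[list(idx), :] = 0.0                   # occlude this region's electrodes
    drop = baseline - decode_score(occluded)
    print(f"{name}: score drop = {drop:.4f}")
```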
In addition, the research shows that the contribution of the sensorimotor cortex, especially its ventral portion, to speech decoding is similar in the left and right hemispheres. This finding suggests that implanting a neural prosthesis in the right hemisphere to restore speech may be a viable option, providing important insight for future treatment strategies.
Conclusion and outlook
The research team developed a new differentiable speech synthesizer that uses a lightweight convolutional neural network to encode speech into a series of interpretable parameters, such as pitch, loudness, and formant frequencies, and re-synthesizes the speech using the same differentiable synthesizer.
By mapping neural signals to these parameters, the researchers built a neural speech decoding system that is highly interpretable, applicable to small data sets, and capable of generating natural-sounding speech.
The system showed a high degree of reproducibility across 48 participants, handled data with different spatial sampling densities, and processed electrical signals from both the left and right hemispheres, demonstrating strong potential for speech decoding.
Despite this significant progress, the researchers also noted some current limitations of the model, such as the fact that the decoding process relies on speech training data paired with ECoG recordings, which may not be available for people with aphasia.
In the future, the research team hopes to develop model architectures that can handle non-grid data and make more effective use of multi-patient, multi-modal EEG data. Brain-computer interface research is still in its early stages, but with continued advances in hardware and the rapid development of deep learning, the brain-computer interface visions of science fiction will gradually become reality.
Reference:
https://www.nature.com/articles/s42256-024-00824-8
The first author of this article: Xupeng Chen (xc1490@nyu.edu), Ran Wang, corresponding author: Adeen Flinker
For more discussion of causality in neural speech decoding, see another paper by the authors:
https://www.pnas.org/doi/10.1073/pnas.2300255120