The non-invasive multi-modal learning model developed by the Institute of Automation realizes brain signal decoding and semantic analysis
For the first time, this work combines brain, visual, and linguistic knowledge through multi-modal learning to decode novel visual categories from human brain activity records in a zero-shot setting. The work also contributes three "brain-image-text" tri-modal matching datasets. The experimental results yield several interesting conclusions and cognitive insights: 1) decoding novel visual categories from human brain activity is achievable with high accuracy; 2) decoding models that combine visual and linguistic features perform better than models using either alone; 3) visual perception may be accompanied by linguistic influences that represent the semantics of visual stimuli. These findings not only shed light on the human visual system, but also suggest new directions for future brain-computer interface technology. The code and datasets for this study are open source.
Research background
Current neural decoding methods struggle to generalize to novel categories outside the training data, for two main reasons: first, existing methods do not fully exploit the multi-modal semantic knowledge behind neural data; second, paired (stimulus-brain response) training data are scarce. Research shows that human perception and recognition of visual stimuli are influenced both by visual features and by prior experience. For example, when we see a familiar object, our brain naturally retrieves knowledge related to that object. As shown in Figure 1 below, the dual coding theory in cognitive neuroscience [9] holds that concrete concepts are encoded in the brain both visually and linguistically, with language acting as an effective prior experience that helps shape the representations produced by vision.
Therefore, the authors argue that to better decode recorded brain signals, decoding should rely not only on the semantic features of the actually presented visual stimuli but also on richer linguistic semantic features related to the visual target object, combining the two.
Figure 1. Dual coding of knowledge in the human brain. When we see a picture of an elephant, we naturally retrieve elephant-related knowledge in our minds (such as its long trunk, long tusks, and big ears). At this point, the concept of an elephant is encoded in the brain in both visual and verbal form, with language serving as an effective prior experience that helps shape the representation produced by vision.

As shown in Figure 2 below, because collecting human brain activity for a wide range of visual categories is very expensive, researchers usually have brain activity for only a very limited set of visual categories. Image and text data, however, are abundant and can provide additional useful information.
The method in this article can make full use of all types of data (trimodal, bimodal and unimodal) to improve the generalization ability of neural decoding.
Figure 2. Image stimuli, elicited brain activities, and their corresponding text data. We can only collect brain activity data for a few categories, but image and/or text data can easily be collected for almost all categories. Therefore, for known categories, we assume that brain activity, visual images, and corresponding text descriptions are all available for training, whereas for new categories, only visual images and text descriptions are available for training. The test data is brain activity data from new categories.
As shown in Figure 3A below, the key to this method is to align the distribution learned for each modality into a shared latent space that contains the essential multi-modal information relevant to the novel categories.
Specifically, the authors propose a multi-modal auto-encoding variational Bayesian learning framework in which a Mixture-of-Products-of-Experts (MoPoE) model infers a latent code that enables joint generation of all three modalities. To learn more relevant joint representations and improve data efficiency when brain activity data are limited, the authors further introduce intra-modality and inter-modality mutual information regularization terms. Moreover, the BraVL model can be trained under various semi-supervised learning scenarios to incorporate additional visual and textual features from large-scale image categories.
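The description above is conceptual; the sketch below illustrates, under stated assumptions, how a Mixture-of-Products-of-Experts posterior over three modality encoders can be formed. Feature dimensions, network sizes, the prior expert, and the uniform mixture weights are illustrative choices and are not taken from the paper.

```python
# Hypothetical sketch of a trimodal MoPoE-style posterior, loosely in the spirit of BraVL.
import itertools
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps one modality's features to the parameters of a Gaussian posterior."""
    def __init__(self, in_dim, latent_dim=32, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)

    def forward(self, x):
        h = self.net(x)
        return self.mu(h), self.logvar(h)

def product_of_experts(mus, logvars):
    """Combine Gaussian experts by precision weighting (with a standard-normal prior expert,
    a common choice in multi-modal VAEs)."""
    mus = torch.stack(mus + [torch.zeros_like(mus[0])])
    logvars = torch.stack(logvars + [torch.zeros_like(logvars[0])])
    precision = torch.exp(-logvars)
    joint_var = 1.0 / precision.sum(0)
    joint_mu = (mus * precision).sum(0) * joint_var
    return joint_mu, torch.log(joint_var)

def mopoe_posterior(mu_lv_per_modality):
    """Mixture of Products of Experts: one PoE posterior per non-empty subset of the
    available modalities, mixed with uniform weights during training."""
    subsets = []
    idx = range(len(mu_lv_per_modality))
    for r in range(1, len(mu_lv_per_modality) + 1):
        for comb in itertools.combinations(idx, r):
            mus = [mu_lv_per_modality[i][0] for i in comb]
            lvs = [mu_lv_per_modality[i][1] for i in comb]
            subsets.append(product_of_experts(mus, lvs))
    return subsets  # a trainer would sample a latent per subset and average the ELBOs

# Example: brain (fMRI voxels), visual features, text features (toy dimensions).
enc_brain, enc_img, enc_txt = Encoder(3000), Encoder(2048), Encoder(768)
x_b, x_v, x_t = torch.randn(8, 3000), torch.randn(8, 2048), torch.randn(8, 768)
posteriors = mopoe_posterior([enc_brain(x_b), enc_img(x_v), enc_txt(x_t)])
print(len(posteriors))  # 7 subset posteriors for 3 modalities
```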
In Figure 3B, the authors train an SVM classifier on the latent representations of the visual and textual features of the novel categories. Note that the encoders E_v and E_t are frozen in this step and only the SVM classifier (the gray module) is optimized.
At inference time, as shown in Figure 3C, the method takes only the brain signals of the novel categories as input and requires no other data, so it can readily be applied to most neural decoding scenarios. The SVM classifier can generalize from (B) to (C) because the latent representations of the three modalities were already aligned in (A).
Figure 3. The "brain-image-text" tri-modal joint learning framework proposed in this paper, referred to as BraVL.
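To make stages (B) and (C) concrete, here is a minimal continuation of the previous sketch, reusing its encoders: the frozen image and text encoders produce latent codes of the novel categories for training an SVM, and at test time brain activity is mapped through the frozen brain encoder. All data shapes, labels, and the linear kernel are made up for illustration.

```python
# Hypothetical illustration of stages (B) and (C), continuing the sketch above.
import numpy as np
import torch
from sklearn.svm import SVC

@torch.no_grad()
def latent_mean(encoder, features):
    """Posterior mean as the latent representation; the encoder stays frozen."""
    mu, _ = encoder(torch.as_tensor(features, dtype=torch.float32))
    return mu.numpy()

# Stand-in data for the novel categories (image/text features only, no brain data).
img_features_new = np.random.randn(50, 2048).astype("float32")
txt_features_new = np.random.randn(50, 768).astype("float32")
labels_new = np.repeat(np.arange(10), 5)            # 10 novel classes, 5 samples each
fmri_test = np.random.randn(20, 3000).astype("float32")

# (B) Only the classifier is optimized, on aligned latent codes of images and texts.
z_train = np.concatenate([latent_mean(enc_img, img_features_new),
                          latent_mean(enc_txt, txt_features_new)])
y_train = np.concatenate([labels_new, labels_new])
clf = SVC(kernel="linear").fit(z_train, y_train)

# (C) Zero-shot test: brain activity of novel categories through the frozen brain encoder.
pred = clf.predict(latent_mean(enc_brain, fmri_test))
print(pred.shape)                                    # (20,) predicted category labels
```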
In addition, brain signals vary from trial to trial, even for the same visual stimulus. To improve the stability of neural decoding, the authors applied a stability selection method to the fMRI data. The stability scores of all voxels are shown in Figure 4 below. The authors selected the top 15% most stable voxels to participate in neural decoding. This effectively reduces the dimensionality of the fMRI data and suppresses interference from noisy voxels without seriously affecting the discriminative power of the brain features.
Figure 4. Voxel activity stability score map of the visual cortex of the brain.
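As a rough illustration of the voxel-selection step, the sketch below scores each voxel by how consistently it responds across repeated presentations of the same stimuli and keeps the top 15%. The exact stability metric used by the authors may differ; the average pairwise correlation across repetitions, and all data shapes, are assumptions.

```python
# Minimal sketch of voxel stability selection on toy fMRI data of shape
# (n_repetitions, n_stimuli, n_voxels).
import itertools
import numpy as np

def stability_scores(responses):
    """responses: (n_reps, n_stimuli, n_voxels) -> per-voxel stability score."""
    n_reps, _, n_vox = responses.shape
    scores = np.zeros(n_vox)
    for i, j in itertools.combinations(range(n_reps), 2):
        for v in range(n_vox):
            scores[v] += np.corrcoef(responses[i, :, v], responses[j, :, v])[0, 1]
    return scores / (n_reps * (n_reps - 1) / 2)

responses = np.random.randn(3, 100, 500)   # toy data: 3 repetitions, 100 stimuli, 500 voxels
scores = stability_scores(responses)
k = int(0.15 * scores.size)                # keep the top 15% most stable voxels
selected = np.argsort(scores)[-k:]
reduced = responses[..., selected]         # dimensionality-reduced fMRI data
print(reduced.shape)                       # (3, 100, 75)
```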
Existing neural encoding and decoding datasets typically contain only image stimuli and brain responses. To obtain linguistic descriptions corresponding to the visual concepts, the authors adopted a semi-automatic Wikipedia article extraction method. Specifically, they first automatically match ImageNet classes to their corresponding Wikipedia pages, based on the similarity between each ImageNet class's synset words (and their parent categories) and the Wikipedia titles. As shown in Figure 5 below, this matching can unfortunately produce occasional false positives, because similarly named classes may denote very different concepts. To ensure high-quality matching between visual and linguistic features when constructing the tri-modal dataset, the authors manually removed mismatched articles.
Figure 5. Semi-automatic visual concept description acquisition
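The automatic matching step could, for instance, be approximated with simple string similarity between synset words and Wikipedia titles, as in the toy sketch below. The real pipeline also uses parent categories and ends with manual removal of mismatches; the similarity threshold and example titles here are arbitrary.

```python
# Illustrative class-to-article matching using plain string similarity.
from difflib import SequenceMatcher

def best_wiki_match(synset_words, wiki_titles, threshold=0.8):
    """Return the Wikipedia title most similar to any of the class's synset words,
    or None if no title clears the threshold."""
    best_title, best_score = None, 0.0
    for word in synset_words:
        for title in wiki_titles:
            score = SequenceMatcher(None, word.lower(), title.lower()).ratio()
            if score > best_score:
                best_title, best_score = title, score
    return best_title if best_score >= threshold else None

wiki_titles = ["African elephant", "Elephant shrew", "Elephant seal"]
print(best_wiki_match(["African elephant", "Loxodonta africana"], wiki_titles))
# -> "African elephant"; near-misses like "Elephant shrew" show why manual checks are needed
```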
The authors conducted extensive zero-shot neural decoding experiments on multiple "brain-image-text" tri-modal matching datasets; the results are shown in the table below. As can be seen, models using a combination of visual and textual features (V&T) perform much better than models using either alone. Notably, BraVL based on V&T features significantly improves the average Top-5 accuracy on both datasets. These results suggest that, although the stimuli presented to subjects contain only visual information, subjects may subconsciously invoke appropriate linguistic representations that in turn affect visual processing.
For each visual concept category, the authors also show the gain in neural decoding accuracy after adding text features, as shown in Figure 6 below. For most test classes, adding text features has a positive effect, with the average Top-1 decoding accuracy increasing by about 6%.
Figure 6. Neural decoding accuracy gain after adding text features
In addition to the neural decoding analysis, the authors analyzed the contribution of text features in voxel-level neural encoding (predicting voxel activity from visual or textual features); the results are shown in Figure 7. For most of the higher visual cortex (HVC, e.g., FFA, LOC, and IT), fusing text features with visual features improves the prediction accuracy of brain activity, whereas for most of the lower visual cortex (LVC, e.g., V1, V2, and V3), fusing text features is not beneficial and can even be harmful.
From a cognitive neuroscience perspective, these results are reasonable: HVC is generally thought to process higher-level semantic information such as the category and motion of objects, while LVC processes low-level information such as orientation and contour. In addition, a recent neuroscience study found that visual and linguistic semantic representations are aligned at the boundary of the human visual cortex (the "semantic alignment hypothesis") [10], and the authors' experimental results also support this hypothesis.
Figure 7. Projection of text feature contributions to visual cortex
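For readers who want to reproduce this kind of voxel-wise encoding comparison, a minimal sketch is given below: ridge regression predicts each voxel's response from visual features alone and from concatenated visual-and-text features, and the difference in prediction correlation indicates where text features help. The regression model, regularization strength, and data shapes are assumptions, not the authors' exact setup.

```python
# Toy voxel-wise encoding comparison: visual features vs. visual+text features.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
visual = rng.standard_normal((200, 100))     # stimulus-level visual features
text = rng.standard_normal((200, 50))        # stimulus-level text features
voxels = rng.standard_normal((200, 300))     # fMRI responses (n_stimuli, n_voxels)

def encoding_score(features, voxels):
    """Mean per-voxel correlation between predicted and measured responses."""
    X_tr, X_te, Y_tr, Y_te = train_test_split(features, voxels, test_size=0.25, random_state=0)
    pred = Ridge(alpha=10.0).fit(X_tr, Y_tr).predict(X_te)
    corr = [np.corrcoef(pred[:, v], Y_te[:, v])[0, 1] for v in range(voxels.shape[1])]
    return np.nanmean(corr)

score_v = encoding_score(visual, voxels)
score_vt = encoding_score(np.hstack([visual, text]), voxels)
print(f"visual only: {score_v:.3f}, visual+text: {score_vt:.3f}")  # compare per ROI in practice
```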
For more experimental results, please see the original text.
Overall, this paper draws several interesting conclusions and cognitive insights: 1) decoding novel visual categories from human brain activity is achievable with high accuracy; 2) decoding models using a combination of visual and linguistic features perform much better than models using either alone; 3) visual perception may be accompanied by linguistic influences that represent the semantics of visual stimuli; 4) using natural language as the concept description yields higher neural decoding performance than using class names; 5) additional uni-modal and bi-modal data can significantly improve decoding accuracy.
Du Changde, the first author of the paper and a special research assistant at the Institute of Automation, Chinese Academy of Sciences, said: "This work confirms that features extracted from brain activity, visual images, and textual descriptions are effective for decoding neural signals. However, the extracted visual features may not accurately reflect all stages of human visual processing, and better feature sets would help with these tasks; for example, larger pre-trained language models (such as GPT-3) could be used to extract text features with stronger zero-shot generalization. In addition, although Wikipedia articles contain rich visual information, this information is easily obscured by large numbers of non-visual sentences. This problem could be addressed by visual sentence extraction or by collecting more accurate and richer visual descriptions with models such as ChatGPT and GPT-4. Finally, although this study used relatively more tri-modal data than related studies, larger and more diverse datasets would be more beneficial. We leave these aspects to future research."
He Huiguang, the corresponding author and a researcher at the Institute of Automation, Chinese Academy of Sciences, pointed out: "The method proposed in this paper has three potential applications. 1) As a neural semantic decoding tool, it could play an important role in developing a new type of neuroprosthetic device that reads semantic information from the human brain; although this application is not yet mature, our method provides a technical foundation for it. 2) By inferring brain activity across modalities, the method can also serve as a neural encoding tool for studying how visual and linguistic features are represented in the human cerebral cortex, revealing which brain regions have multi-modal properties (i.e., are sensitive to both visual and linguistic features). 3) The neural decodability of an AI model's internal representations can be regarded as an indicator of how brain-like the model is. Our method can therefore also be used as a brain-likeness evaluation tool to test which model's (visual or linguistic) representations are closer to human brain activity, motivating researchers to design more brain-like computational models."
Neural information encoding and decoding is a core problem in the field of brain-computer interfaces and an effective way to explore the principles behind the complex functions of the human brain and to advance brain-inspired intelligence. The neural computing and brain-computer interaction research team at the Institute of Automation has worked in this field for many years and has produced a series of studies published in TPAMI 2023, TMI 2023, TNNLS 2022/2019, TMM 2021, Info. Fusion 2021, AAAI 2020, and elsewhere. Their earlier work was featured as a headline story in MIT Technology Review and won the ICME 2019 Best Paper Runner-up Award.
This research was supported by the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" major project, National Natural Science Foundation projects, the Institute of Automation 2035 Project, the CAAI-Huawei MindSpore Academic Award Fund, and the "Intelligent Base" program, among other projects.
First author: Du Changde, special research assistant at the Institute of Automation, Chinese Academy of Sciences, works on brain cognition and artificial intelligence. He has published more than 40 papers on visual neural information encoding and decoding, multi-modal neural computing, and related topics in venues including TPAMI, TNNLS, AAAI, KDD, and ACM MM. He received the 2019 IEEE ICME Best Paper Runner-up Award and was named one of the 2021 Top 100 Chinese AI Rising Stars. He has undertaken a number of research projects for the Ministry of Science and Technology, the National Natural Science Foundation, and the Chinese Academy of Sciences, and his research results were featured as a headline story in MIT Technology Review.
Personal homepage: https://changdedu.github.io/
Corresponding author: He Huiguang, researcher at the Institute of Automation, Chinese Academy of Sciences, doctoral supervisor, professor at the University of Chinese Academy of Sciences, distinguished professor at Shanghai University of Science and Technology, outstanding member of the Youth Innovation Promotion Association of the Chinese Academy of Sciences, and recipient of the commemorative medal for the 70th anniversary of the founding of the People's Republic of China. He has undertaken seven National Natural Science Foundation projects (including key and international cooperation projects), two 863 projects, and national key R&D program projects. He has won two National Science and Technology Progress Awards (second class, ranked second and third respectively), two Beijing Science and Technology Progress Awards, a first-class Science and Technology Progress Award from the Ministry of Education, the first Outstanding Doctoral Thesis Award of the Chinese Academy of Sciences, the Beijing Science and Technology Rising Star award, the Chinese Academy of Sciences "Lu Jiaxi Young Talent Award", and the Fujian Province "Minjiang Scholar" chair professorship. His research fields include artificial intelligence, brain-computer interfaces, and medical image analysis. In the past five years, he has published more than 80 papers in journals and conferences such as IEEE TPAMI, TNNLS, and ICML. He serves on the editorial boards of IEEE TCDS, Acta Automatica Sinica, and other journals, and is a distinguished member of CCF and a distinguished member of CSIG.