AI mind-reading goes viral! Stable Diffusion realistically reconstructs images from brain scans
Even without the magic of Hogwarts, you can still see what others are thinking!
The method is simple: it reconstructs visual images from brain activity using Stable Diffusion.
For example, the bears, airplanes, and trains you see look like this.
And when the AI reads the corresponding brain signals, the images it generates look like this: the key elements are all there.
This AI mind-reading work has just been accepted to CVPR 2023, instantly sending fans into raptures.
Too wild! Forget prompt engineering; now you just need to "think" of the picture in your head.
Using Stable Diffusion to reconstruct visual images from fMRI data could pave the way for future non-invasive brain-computer interfaces.
AI could bypass human language entirely and perceive what the human brain is thinking.
By then, even Musk's Neuralink would have to race to catch up with this AI ceiling.
So how does the AI actually read the brain?
The latest research comes from a research team at Osaka University in Japan.
Paper address: https://www.php.cn/link/0424d20160a6a558e5bf86a7bc9b67f0
Researchers at Osaka University Graduate School of Frontier Biosciences and CiNet at NICT in Japan reconstructed visual experience from fMRI data based on a latent diffusion model (LDM), more specifically, via Stable Diffusion.
The overall framework is also very simple: an image encoder, an image decoder, and a semantic decoder.
By doing this, the team eliminated the need to train and fine-tune complex AI models.
All that needs to be trained are simple linear models that map fMRI signals from early and higher visual brain regions to individual Stable Diffusion components.
Specifically, the researchers mapped brain regions as inputs to the image and text encoders: early visual areas are mapped to the image encoder, and higher visual areas to the text encoder. This allows the system to use both image composition and semantic content for reconstruction.
The first step is the decoding analysis. The LDM used in the study consists of an image encoder ε, an image decoder D, and a text encoder τ.
The researchers decoded the latent representation z of the presented image and the associated text representation c from fMRI signals of the early and higher visual cortex, respectively, and used them as inputs to generate the reconstructed image Xzc via the autoencoder.
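To make the pipeline concrete, here is a minimal sketch of what such a decoding setup could look like, assuming ridge regression as the linear model and Stable Diffusion v1's latent and CLIP-embedding shapes; the variable names, shapes, and placeholder data are illustrative and not taken from the authors' code.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Random placeholders standing in for the real data (all shapes are illustrative):
#   X_early  -- fMRI responses from early visual cortex   (trials, voxels)
#   X_higher -- fMRI responses from higher visual cortex  (trials, voxels)
#   Z_train  -- Stable Diffusion image latents of the seen images, flattened (4*64*64)
#   C_train  -- CLIP text embeddings of the images' captions, flattened (77*768)
rng = np.random.default_rng(0)
X_early, X_higher = rng.standard_normal((300, 500)), rng.standard_normal((300, 400))
Z_train = rng.standard_normal((300, 4 * 64 * 64))
C_train = rng.standard_normal((300, 77 * 768))

# One simple linear (ridge) model per Stable Diffusion component
z_decoder = Ridge(alpha=1e4).fit(X_early, Z_train)    # early visual cortex  -> image latent z
c_decoder = Ridge(alpha=1e4).fit(X_higher, C_train)   # higher visual cortex -> text embedding c

def decode_trial(x_early, x_higher):
    """Map one test trial's fMRI patterns to the two Stable Diffusion inputs."""
    z = z_decoder.predict(x_early[None]).reshape(1, 4, 64, 64)
    c = c_decoder.predict(x_higher[None]).reshape(1, 77, 768)
    # In the full pipeline, z would be partially noised, denoised by the U-Net under
    # the conditioning c, and the final latent passed through the VAE decoder D to
    # produce the reconstructed image Xzc.
    return z, c

z, c = decode_trial(X_early[0], X_higher[0])
```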
The researchers then built an encoding model that predicts fMRI signals from the different components of the LDM, in order to explore the model's inner workings.
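The encoding analysis is essentially the reverse mapping. A minimal sketch of how it could be scored, assuming ridge regression and per-voxel Pearson correlation as the accuracy measure (a common choice, though the exact metric here is an assumption):

```python
import numpy as np
from sklearn.linear_model import Ridge

def encoding_accuracy(feat_train, y_train, feat_test, y_test, alpha=1e3):
    """Fit a linear encoding model from LDM features (e.g. z, c, or zc of each
    stimulus) to fMRI responses, and return per-voxel prediction accuracy as the
    Pearson correlation between predicted and measured test responses."""
    model = Ridge(alpha=alpha).fit(feat_train, y_train)
    y_pred = model.predict(feat_test)
    return np.array([np.corrcoef(y_pred[:, v], y_test[:, v])[0, 1]
                     for v in range(y_test.shape[1])])

# Illustrative call on random placeholder data (real inputs would be LDM features
# and measured voxel responses for training and held-out test stimuli):
rng = np.random.default_rng(0)
acc = encoding_accuracy(rng.standard_normal((500, 256)), rng.standard_normal((500, 100)),
                        rng.standard_normal((100, 256)), rng.standard_normal((100, 100)))
```

Projecting each voxel's prediction accuracy back onto the cortical surface would then yield brain maps like those discussed next.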
The researchers conducted experiments using fMRI images from the Natural Scenes Dataset (NSD) and tested whether Stable Diffusion could be used to reconstruct what the subjects saw.
Comparing the prediction accuracy of encoding models built from the different LDM latents, the model based on the final latent produces the highest prediction accuracy in the visual cortex at the back of the brain.
The visual reconstruction results for one subject show that an image reconstructed from z alone is visually consistent with the original image but fails to capture its semantic content.
An image reconstructed from c alone has better semantic fidelity but poor visual consistency, while the image reconstructed from zc achieves both high semantic fidelity and high resolution.
Reconstruction results from all subjects for the same image show that, although the reconstructions vary somewhat between subjects, they are stable and relatively accurate.
The differences in specific details may stem from differences in individual perception experience or data quality, rather than errors in the reconstruction process.
Finally, the results of the quantitative evaluation were plotted.
Various results show that the method used in the study can not only capture the low-level visual appearance, but also capture the high-level semantic content of the original stimulus.
From this point of view, the experiments show that the combination of image and text decoding provides accurate reconstruction.
There were differences in accuracy between subjects, but the researchers said these were related to the quality of the fMRI images. According to the team, the reconstruction quality is comparable to current SOTA methods, but does not require training the AI models involved.
At the same time, the team also used the models derived from fMRI data to study the various building blocks of Stable Diffusion, such as how semantic content is generated during the reverse diffusion process, or what happens inside the U-Net.
In the early stages of the denoising process, U-Net's bottleneck layer (orange) produces the highest prediction performance; as denoising proceeds, the early layers (blue) come to predict activity in early visual cortex, while the bottleneck layer shifts to higher-level visual cortex.
This means that at the beginning of the diffusion process, image information is compressed in the bottleneck layer, and as denoising proceeds, a functional separation between U-Net layers emerges within the visual cortex.
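This kind of analysis relies on reading out features from specific U-Net layers at specific denoising steps. A rough, simplified sketch of how such layer-wise features could be collected with a recent version of the diffusers library is shown below; the checkpoint name, the prompt-driven generation (the study works from noised latents of the stimulus images instead), and the "early"/"bottleneck" labels are all illustrative assumptions, not the authors' code.

```python
import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
# Any Stable Diffusion v1.x checkpoint would do for this sketch
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to(device)

features = {}          # {(layer_name, step): flattened feature vector}
state = {"step": 0}    # current denoising step, advanced by the callback below

def make_hook(name):
    def hook(module, inputs, output):
        out = output[0] if isinstance(output, tuple) else output
        # With classifier-free guidance this tensor stacks the unconditional
        # and conditional passes along the batch dimension.
        features[(name, state["step"])] = out.detach().float().flatten().cpu().numpy()
    return hook

# Hook an early down-block and the bottleneck (mid) block of the U-Net
handles = [
    pipe.unet.down_blocks[0].register_forward_hook(make_hook("early")),
    pipe.unet.mid_block.register_forward_hook(make_hook("bottleneck")),
]

def on_step_end(pipeline, step, timestep, callback_kwargs):
    state["step"] = step + 1   # advance once per denoising iteration
    return callback_kwargs

_ = pipe("a photograph of a bear", num_inference_steps=10,
         callback_on_step_end=on_step_end)

for h in handles:
    h.remove()

# features[("early", t)] and features[("bottleneck", t)] can now serve as the
# regressors of separate encoding models of fMRI activity at each step t,
# in the spirit of the analysis described above.
```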
Additionally, the team is developing a quantitative interpretation of the image transformations at different stages of diffusion. In this way, the researchers aim to contribute, from a biological perspective, to a better understanding of diffusion models, which are widely used yet still poorly understood.
Has AI already decoded human brain images?
For years, researchers have been using artificial intelligence models to decode information from the human brain.
At the core of most methods, pre-recorded fMRI images are used as input to generative AI models of text or images.
For example, in early 2018, a team of researchers from Japan showed how a neural network could reconstruct images from fMRI recordings.
In 2019, a group reconstructed images from the activity of monkey neurons, and Meta's research group, led by Jean-Rémi King, has since published new work such as deriving text from fMRI data.
In October 2022, a team at the University of Texas at Austin showed that a GPT model can infer, from fMRI scans, text describing the semantic content of what a person sees in a video.
In November 2022, researchers from the National University of Singapore, the Chinese University of Hong Kong, and Stanford University used the MinD-Vis diffusion model to reconstruct images from fMRI scans with significantly higher accuracy than methods available at the time.
Going further back, some netizens pointed out: "Generating images based on brain waves has been around since at least 2008. Yes, it is simply ridiculous to imply that Stable Diffusion can read people's minds in some way."
That paper, published in Nature by researchers at the University of California, Berkeley, stated that a person's brain activity can be converted into images using a visual decoder.
Tracing the history back even further, some people dug up a 1999 study, co-authored by Stanford's Fei-Fei Li, on reconstructing images from the cerebral cortex.
Fei-Fei Li also reposted it, commenting that she was still an undergraduate intern at the time. And in 2011, a UC Berkeley study used functional magnetic resonance imaging (fMRI) and computational models to achieve a preliminary reconstruction of the brain's "dynamic visual images".
In other words, they recreated clips that people had seen.
But compared with the latest research, this reconstruction cannot be called "high definition" at all and is almost unrecognizable.
About the authors
Yu Takagi
Yu Takagi is an assistant professor at Osaka University. His research interests are at the intersection of computational neuroscience and artificial intelligence.
During his PhD, he studied techniques for predicting individual differences from whole-brain functional connectivity using functional magnetic resonance imaging (fMRI) in the ATR Brain Information Communication Research Laboratory.
Most recently, he has used machine learning techniques to understand dynamic computations in complex decision-making tasks at the Oxford Centre for Human Brain Activity at the University of Oxford and the Department of Psychology at the University of Tokyo.
Shinji Nishimoto
Shinji Nishimoto is a professor at Osaka University. His research focuses on the quantitative understanding of visual and cognitive processing in the brain.
More specifically, his team focuses on building predictive models of brain activity evoked under natural perceptual and cognitive conditions, in order to understand neural processing and representation.
Some netizens asked the authors whether this research could be used to interpret dreams.
"It is possible to apply the same technology to brain activity during sleep, but the accuracy of such an application is unclear."
After reading this research: Legilimency, unlocked.
References:
https://www.php.cn/link/0424d20160a6a558e5bf86a7bc9b67f0
https://www.php.cn/link/02d72b702eed900577b953ef7a9c1182