Use machine learning to decode a brain that has been "voiceless" for 15 years and let it "speak"
Produced by Big Data Digest
Author: Miggy
For paralyzed patients, the greatest pain is often the inability to communicate with the outside world. The brain remains active and wants to express itself, but without the ability to drive the vocal muscles, such patients' speech mechanisms gradually deteriorate.
Edward Chang, chief of neurosurgery at the University of California, San Francisco, is developing brain-computer interface technology for people who have lost the ability to speak. His laboratory works on decoding the brain signals that command the vocal tract and, through neural implants, routing the brain's language output to a computer. The project requires not only the best neurotechnology hardware available today but also powerful machine learning models.
Recently, the technology made a major advance: a patient who had been "voiceless" for 15 years because of paralysis began communicating with the outside world through a computer. Chang described the technical journey in an account published in IEEE Spectrum.
Let's take a look.
The computer screen displays the question "Do you want to drink some water?" Below it, three small dots flash, and then a line of text appears: "No, I am not thirsty."
Brain activity made that exchange possible. Remarkably, the brain that produced it belongs to a man who has not spoken for more than 15 years. Fifteen years ago a stroke left his brain "disconnected" from the rest of his body, and his communication with the outside world ceased. He has tried many new technologies to communicate; most recently, he used a pointer attached to a baseball cap to type words on a touchscreen, a method that works but is slow.
Recently, this patient volunteered to participate in my research group's clinical trial at the University of California, San Francisco, hoping to find a faster way to communicate. So far, he has been able to use this brain-to-text system only during research sessions, but he hopes to help develop the technology into something that people like him can use in their daily lives.
In our pilot study, the surface of this volunteer's brain is covered with an array of thin, flexible electrodes. The electrodes record neural signals and send them to a speech decoder, which translates the signals into what he wants to say. This is the first time researchers have used neurotechnology to "broadcast" entire words, rather than just letters, from the brain of a paralyzed person who cannot speak.
The trial is the culmination of more than a decade of research into the underlying brain mechanisms that govern speech, and it has produced our best results to date. I am extremely proud of what we have accomplished so far, but we are just getting started. My lab at UCSF is working with colleagues around the world to make this technology safe, stable, and reliable enough for daily use at home. We are still working to improve the system's performance, and it is worth the effort.
The first version of the brain-computer interface gave volunteers a vocabulary of 50 practical words.
Neural implant technology has come a long way over the past two decades. Prosthetic implants for hearing have advanced the furthest; they are designed to interface with the cochlear nerve of the inner ear or directly with the auditory brainstem. There is also considerable research into retinal and brain implants, as well as efforts to give prosthetic hands a sense of touch. All of these sensory prostheses take information from the outside world, convert it into electrical signals, and feed it into the brain's processing centers.
Last week, Digest Magazine also reported a case of using an implant to help patients with smell loss regain their sense of taste.
Another class of neuroprosthetics records the brain's electrical activity and converts it into signals that control something in the outside world, such as a robotic arm, a video game controller, or a cursor on a computer screen. This last form of control has been used by groups such as the BrainGate consortium to allow paralyzed people to type words, sometimes one letter at a time, sometimes with an autocomplete feature to speed things up.
Typing through the brain in this way is not new, but in such systems the implants are typically placed in the motor cortex, the part of the brain that controls movement. The user then imagines certain physical movements to steer a cursor across a virtual keyboard. Another approach, pioneered by some of my collaborators in a 2021 paper, lets a user imagine holding a pen to paper and writing letters, generating signals in the motor cortex that are translated into text. That method set a new record for typing speed, allowing volunteers to write about 18 words per minute.
In our latest study, we took a more efficient approach. Instead of decoding the user's intent to move a cursor or a pen, we decode the intent to control the vocal tract, which comprises dozens of muscles governing the larynx (often called the voice box), the tongue, and the lips.
For a paralyzed man, a seemingly simple conversation setup is enabled by sophisticated neurotech hardware and machine learning systems that decode his brain signals.
I started working in this field more than ten years ago. As a neurosurgeon, I often see patients with severe injuries that leave them unable to speak. To my surprise, in many cases, the location of the brain damage did not match the syndromes I learned about in medical school, and I realized that we still have a lot to learn about how the brain processes language. I decided to study the underlying neurobiology of language and, if possible, develop a brain-machine interface (BMI) to restore communication to people who have lost language. In addition to my neurosurgery background, my team has expertise in linguistics, electrical engineering, computer science, bioengineering, and medicine.
Language is one of the abilities that make humans unique. Many other species make sounds, but only humans combine a set of sounds in countless different ways to express themselves. It is also an extremely complex motor behavior; some experts consider it the most complex motor behavior people perform. Speech is the product of modulated airflow through the vocal tract; we shape that airflow by creating audible vibrations in the vocal folds of the larynx and by changing the shape of the lips, jaw, and tongue.
Many of the muscles of the vocal tract are quite different from joint-based muscles such as those in the arms and legs, which can move in only a few prescribed ways. The muscles that control the lips, for example, are sphincters, while the muscles that make up the tongue are governed more by hydraulics: the tongue is largely a fixed volume of muscle tissue, so moving one part of it changes its shape elsewhere. The physics governing these muscles' movements are completely different from those of the biceps or hamstrings.
Because so many muscles are involved, and each has so many degrees of freedom, there is essentially an infinite number of possible configurations. Yet when people speak, it turns out they use a relatively small set of core movements (which differ across languages). For example, when English speakers make the "d" sound, they put their tongue behind their teeth; when they make the "k" sound, the back of the tongue rises to touch the roof of the mouth. Few people realize the precise, complex, and coordinated muscle movements required to speak even the simplest word.
Team member David Moses looks at a readout of a patient's brain waves [left screen] and a display of the decoding system's activity [right screen].
My research group focuses on the parts of the brain's motor cortex that send movement commands to the muscles of the face, throat, mouth, and tongue. These brain regions are multitaskers: they manage the muscle movements that produce speech, as well as movements of the same muscles for swallowing, smiling, and kissing.
Studying neural activity in these areas requires millimeter-level spatial resolution and millisecond-level temporal resolution. Historically, noninvasive imaging systems have been able to provide one or the other, but not both. When we began this study, we found that there was very little data on how patterns of brain activity relate to the simplest components of speech: phonemes and syllables.
Here we owe a debt to our volunteers. At the UCSF Epilepsy Center, patients preparing for surgery often have electrodes placed on the surface of their brains for several days so we can map the regions involved in their seizures. During those days of wired-up downtime, many patients volunteer for neurological research experiments that make use of the electrodes already recording from their brains, which lets us study patterns of neural activity while they speak.
The hardware involved is called electrocorticography (ECoG). The electrodes in an ECoG system do not penetrate the brain but sit on its surface. Our arrays can contain hundreds of electrode sensors, each recording from thousands of neurons. So far we have used an array with 256 channels. Our goal in these early studies was to discover the patterns of cortical activity when people speak simple syllables. We asked volunteers to say specific sounds and words while we recorded their neural patterns and tracked the movements of their tongues and mouths. Sometimes we did this by having them wear colorful face paint and using a computer vision system to extract movement gestures; other times, we used an ultrasound machine placed under the jaw to image the moving tongue.
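To make this concrete, here is a minimal Python sketch of how multichannel ECoG recordings might be paired, time step by time step, with tracked articulator trajectories. The 256-channel count comes from the text above; the sampling rates, the high-gamma frequency band, and the window averaging are illustrative assumptions, not details of the team's actual pipeline.

```python
# Hypothetical sketch: pairing ECoG features with tracked articulator motion.
# The 256-channel array is from the article; sampling rates, the high-gamma
# band, and the averaging window are illustrative assumptions.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

FS_NEURAL = 1000      # Hz, assumed ECoG sampling rate
FS_KINEMATIC = 100    # Hz, assumed frame rate of the movement tracking

def high_gamma_envelope(ecog, fs=FS_NEURAL, band=(70.0, 150.0)):
    """Band-pass each channel and take the analytic amplitude envelope."""
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, ecog, axis=0)           # shape: (samples, channels)
    return np.abs(hilbert(filtered, axis=0))

def align_to_kinematics(envelope, n_kinematic_frames):
    """Average neural features so each row lines up with one tracking frame."""
    step = envelope.shape[0] // n_kinematic_frames
    trimmed = envelope[: step * n_kinematic_frames]
    return trimmed.reshape(n_kinematic_frames, step, -1).mean(axis=1)

# Example with synthetic data: 10 s of 256-channel ECoG and articulator traces.
ecog = np.random.randn(10 * FS_NEURAL, 256)
articulators = np.random.randn(10 * FS_KINEMATIC, 12)   # e.g. 12 tracked landmarks
features = align_to_kinematics(high_gamma_envelope(ecog), len(articulators))
print(features.shape, articulators.shape)               # (1000, 256) (1000, 12)
```

Once the two streams share a common time base like this, each moment of neural activity can be matched against the articulator positions recorded at that moment.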
The system starts with a flexible electrode array that is draped over the patient's brain to pick up signals from the motor cortex. The array specifically captures movement commands intended for the patient's vocal tract. A port fixed to the skull connects to wires leading to a computer system, which decodes the brain signals and translates them into what the patient wants to say. His answers then appear on a display screen.
We used these systems to match neural patterns to movements of the vocal tract. At first, we had many questions about the neural code. One possibility was that neural activity encoded directions for particular muscles, with the brain essentially turning those muscles on and off as if pressing keys on a keyboard; another was that the code determined how fast the muscles contract; yet another was that neural activity corresponded to coordinated patterns of muscle contractions used to produce a particular sound. (For example, to make the "aaah" sound, both the tongue and the jaw need to drop.) We found that there is a map of representations controlling different parts of the vocal tract, and that the different brain regions work together in a coordinated way to give rise to fluent speech.
Our work depends on the advances in artificial intelligence of the past decade. We can feed the data we collected on neural activity and speech kinematics into a neural network, let the machine learning algorithm find patterns in the correlations between the two datasets, and thereby build a link between neural activity and the speech produced, then use that model to generate computer-synthesized speech or text. But this technique alone cannot train an algorithm for paralyzed people, because we would be missing half the data: we would have the neural patterns but not the corresponding muscle movements.
We realized that a smarter way to use machine learning was to break the problem into two steps. First, a decoder translates signals from the brain into intended movements of the muscles of the vocal tract; then, a second stage translates those intended movements into synthesized speech or text.
We call it a bionic approach because it replicates biological movement patterns: in the body, neural activity is directly responsible for the movements of the vocal tract and only indirectly responsible for the sounds produced. The big advantage of this approach lies in the second step, training the decoder to convert muscle movements into sounds. Because the relationship between vocal tract movements and sound is broadly shared across speakers, we were able to train that part of the decoder on large datasets from people who are not paralyzed.
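As a rough illustration of the two-step idea, the sketch below chains two small recurrent networks: one mapping neural features to intended vocal-tract movements, and one mapping those movements to acoustic features. The layer sizes, feature dimensions, and GRU architecture are assumptions made for the sketch; they are not the team's actual models.

```python
# Illustrative two-stage decoder in the spirit of the "bionic" approach described
# above. All dimensions and architectural choices are assumptions for the sketch.
import torch
import torch.nn as nn

class BrainToArticulators(nn.Module):
    """Stage 1: neural features at each time step -> intended vocal-tract movements."""
    def __init__(self, n_channels=256, n_articulators=12, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(n_channels, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_articulators)

    def forward(self, neural):                 # (batch, time, channels)
        h, _ = self.rnn(neural)
        return self.out(h)                     # (batch, time, articulators)

class ArticulatorsToSpeech(nn.Module):
    """Stage 2: vocal-tract movements -> acoustic features (or text logits).
    This stage could be pre-trained on recordings from non-paralyzed speakers."""
    def __init__(self, n_articulators=12, n_acoustic=80, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(n_articulators, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_acoustic)

    def forward(self, movements):
        h, _ = self.rnn(movements)
        return self.out(h)

# Chained inference: brain signals in, synthesized-speech features out.
stage1, stage2 = BrainToArticulators(), ArticulatorsToSpeech()
neural = torch.randn(1, 200, 256)              # 1 trial, 200 time steps, 256 channels
speech_features = stage2(stage1(neural))
print(speech_features.shape)                   # torch.Size([1, 200, 80])
```

The point of the split is visible in the second class: its inputs and outputs do not depend on the implant at all, which is what makes training it on data from non-paralyzed speakers possible.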
The next big challenge is bringing technology to the people who can actually benefit from it.
The National Institutes of Health (NIH) is funding our pilot trial, which began in 2021. Two paralyzed volunteers have already been implanted with ECoG arrays, and we hope to recruit more in the coming years. The primary goal is to improve their communication, and we measure performance in words per minute. An average adult typing on a full keyboard manages about 40 words per minute, and the fastest typists exceed 80.
We believe that speaking through the vocal system can do even better. Humans speak much faster than they type: English speakers can easily produce 150 words per minute. We want paralyzed people to be able to communicate at 100 words per minute. To reach that goal, we still have a lot of work to do.
The implantation procedure is similar to other implants. First, the surgeon removes a small portion of the skull; next, the flexible ECoG array is gently placed on the cortical surface. A small port is then secured to the skull and exits through a separate opening in the scalp. We currently need this port, which connects to external wires to transmit data from the electrodes, but we hope to make the system wireless in the future.
We considered penetrating microelectrodes because they can record smaller neural populations and therefore provide more detail about neural activity. But current hardware is not as powerful and safe as ECoG for clinical use.
Another consideration is that penetrating electrodes often require daily recalibration to turn neural signals into clear commands, and research on neural devices shows that speed of setup and reliability of performance are key to whether people keep using a technology. That is why we prioritize stability as we work toward a "plug and play" system for long-term use. We conducted a study of how our volunteers' neural signals change over time and found that the decoder performs better when it draws on data patterns from multiple sessions and multiple days. In machine learning terms, we say the decoder's "weights" are carried over across sessions, yielding consolidated neural signals.
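A minimal sketch of what carrying weights over across sessions might look like in practice: instead of recalibrating from scratch each day, yesterday's decoder is warm-started and fine-tuned on today's recordings. The optimizer, learning rate, toy linear decoder, and file names below are illustrative assumptions, not the team's implementation.

```python
# Hypothetical sketch of warm-starting a decoder across sessions rather than
# recalibrating from scratch each day.
import torch
import torch.nn as nn

def fine_tune_session(decoder, session_data, lr=1e-4, epochs=3):
    """Update an existing decoder with one session's (features, labels) batches."""
    optimizer = torch.optim.Adam(decoder.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for features, labels in session_data:
            optimizer.zero_grad()
            loss = loss_fn(decoder(features), labels)
            loss.backward()
            optimizer.step()
    return decoder

# Toy example: a linear decoder over 256-channel features and a 50-word vocabulary.
decoder = nn.Linear(256, 50)
todays_recordings = [(torch.randn(8, 256), torch.randint(0, 50, (8,)))]
decoder = fine_tune_session(decoder, todays_recordings)

# Across days, weights are carried over instead of being reset:
# torch.save(decoder.state_dict(), "decoder_day_42.pt")     # end of today
# decoder.load_state_dict(torch.load("decoder_day_42.pt"))  # start of tomorrow
```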
Because our paralyzed volunteer could not speak while we observed his brain patterns, we asked him to try two different approaches. He started with a list of 50 words convenient for everyday life, such as "hungry," "thirsty," "please," "help," and "computer." Over 48 sessions spread across several months, we sometimes asked him merely to imagine saying each word on the list, and sometimes asked him to overtly attempt to say it. We found that attempting to speak produced clearer brain signals, sufficient to train the decoding algorithm. The volunteer could then use words from the list to generate sentences of his own choosing, such as "No, I am not thirsty."
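The article does not describe how individual word predictions are assembled into sentences, but one common way such a system might do it is to combine the classifier's per-word probabilities with a simple prior over word sequences. The sketch below, with a tiny stand-in vocabulary, a uniform bigram prior, and a Viterbi search, is purely an illustrative assumption, not the team's published method.

```python
# Hypothetical illustration of turning per-attempt word probabilities into a
# sentence. The 50-word vocabulary and attempted-speech decoding come from the
# article; the bigram prior and Viterbi search are illustrative assumptions.
import numpy as np

VOCAB = ["i", "am", "not", "thirsty", "no"]      # tiny stand-in for the 50-word list

def viterbi_decode(word_probs, bigram, vocab=VOCAB):
    """word_probs: (n_attempts, n_words) classifier outputs per attempted word.
    bigram: (n_words, n_words) prior over word-to-word transitions."""
    n_steps, n_words = word_probs.shape
    log_score = np.log(word_probs[0] + 1e-12)
    backptr = np.zeros((n_steps, n_words), dtype=int)
    for t in range(1, n_steps):
        trans = log_score[:, None] + np.log(bigram + 1e-12)
        backptr[t] = trans.argmax(axis=0)
        log_score = trans.max(axis=0) + np.log(word_probs[t] + 1e-12)
    path = [int(log_score.argmax())]
    for t in range(n_steps - 1, 0, -1):
        path.append(int(backptr[t][path[-1]]))
    return [vocab[i] for i in reversed(path)]

# Toy usage: three attempted words, a mildly noisy classifier, a flat bigram prior.
probs = np.array([[0.10, 0.10, 0.10, 0.10, 0.60],
                  [0.50, 0.10, 0.30, 0.05, 0.05],
                  [0.05, 0.10, 0.15, 0.60, 0.10]])
bigram = np.full((5, 5), 0.2)
print(viterbi_decode(probs, bigram))             # ['no', 'i', 'thirsty']
```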
We are currently working on expanding the vocabulary. To get there, we need to keep improving the current algorithms and interfaces, but I am confident those improvements will come in the months and years ahead. Now that the proof of principle has been established, the goal is optimization. We can focus on making the system faster, more accurate, and, most importantly, safer and more reliable. Things should move quickly now.
The biggest breakthroughs may come if we can better understand the brain systems we are trying to decode, and how paralysis changes their activity. We have realized that the neural patterns of paralyzed patients who are unable to send commands to their vocal tract muscles are very different from those of epileptic patients who are able to send commands. We're attempting an ambitious feat of BMI engineering, and there's still a lot to learn about the underlying neuroscience. We believe it all comes together to give our patients the ability to communicate.
Source of material: https://spectrum.ieee.org/brain-computer-interface-speech