Blowing up the AI and chemistry worlds! GPT-4 learns to do scientific research on its own and teaches humans how to conduct experiments step by step
Incredible, GPT-4 has learned to do scientific research on its own?
Recently, several scientists from Carnegie Mellon University published a paper that set both the AI and chemistry communities abuzz.
They have built an AI that can design experiments and conduct scientific research on its own. It is composed of several large language models and can be regarded as a GPT-4-based agent with remarkable research capabilities.
Because it has long-term memory backed by a vector database, it can read and understand complex scientific documents and carry out chemical research in a cloud-based robotic laboratory.
Netizens were so shocked they were left speechless: so this AI does the research itself and then publishes the paper itself? Oh my god.
Some lamented that the era of the "Tennis Experiment" (TTE) is coming!
Is this the legendary holy grail of AI in chemistry?
Recently, many people probably feel that we are living in science fiction every day.
In March, OpenAI released GPT-4, a large language model that shocked the world.
It is the strongest LLM on the planet: it can score highly on the SAT and the bar exam, pass LeetCode challenges, answer physics questions posed in a picture, and get the jokes in memes.
The technical report also mentioned that GPT-4 can also solve chemical problems.
This inspired several scholars in Carnegie Mellon's Department of Chemistry, who set out to build an AI based on multiple large language models that could design and conduct experiments by itself.
Paper address: https://arxiv.org/abs/2304.05332
And the AI they built is seriously impressive!
It can search literature on the Internet by itself, accurately control liquid handling instruments, and solve complex problems that require the simultaneous use of multiple hardware modules and the integration of different data sources.
It feels like the AI version of Breaking Bad.
An AI that can make ibuprofen on its own
For example, ask this AI to synthesize ibuprofen.
Give it a simple prompt: "Synthesize ibuprofen."
The model will then go online to look up how to do it.
It identified that the first step requires the Friedel-Crafts reaction of isobutylbenzene and acetic anhydride catalyzed by aluminum chloride.
In addition, this AI can also synthesize aspirin, as well as aspartame.
The product was missing a methyl group; if the model finds the correct synthesis example, the corrected procedure is executed in the cloud laboratory.
Tell the model to study the Suzuki reaction, and it immediately and accurately identifies the substrates and product.
In addition, the model can be connected to chemical-reaction databases such as Reaxys or SciFinder through their APIs, which gives it a massive boost: its accuracy soars.
Analyzing the system's previous records can also greatly improve the model's accuracy.
Let's first look at how it operates a robot to run experiments.
It treats a set of samples as a whole (in this case, the entire microplate).
We can give it a prompt directly in natural language: "Color every other row with a color of your choice."
When executed by a robot, these protocols are very similar to the requested prompts (Figure 4B-E).
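As an illustration of what such a protocol boils down to, here is a minimal sketch of turning "every other row" into concrete well positions. It is not code from the paper; the plate layout is the standard 96-well format and the helper function is hypothetical:

```python
# Standard 96-well microplate: rows A-H, columns 1-12.
# The helper below is illustrative, not from the paper.

ROWS = "ABCDEFGH"
COLUMNS = range(1, 13)

def every_other_row_wells(start_row: int = 0) -> list[str]:
    """Return well IDs (e.g. 'A1') for every other row of the plate."""
    return [f"{row}{col}" for row in ROWS[start_row::2] for col in COLUMNS]

wells = every_other_row_wells()  # rows A, C, E, G -> 4 rows x 12 columns = 48 wells
```

A liquid handler protocol would then dispense the chosen dye into exactly these wells.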
The agent's first action is to prepare a small sample of the original solution (Figure 4F).
Then it requests a UV-Vis measurement. Once that is complete, the AI is given a filename pointing to a NumPy array containing the spectrum for each well of the microplate.
AI then wrote Python code to identify the wavelengths with maximum absorbance and used this data to correctly solve the problem.
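The analysis step described above can be sketched roughly as follows; the array shape and wavelength range are assumptions, and the random data merely stands in for real measurements:

```python
import numpy as np

# Hypothetical UV-Vis data: one spectrum per well of a 96-well plate.
wavelengths = np.linspace(300, 800, 501)        # nm (assumed measurement range)
spectra = np.random.rand(96, wavelengths.size)  # stand-in for measured absorbances

# Wavelength of maximum absorbance for each well.
peak_idx = spectra.argmax(axis=1)
peak_wavelengths = wavelengths[peak_idx]
```

Mapping each well's peak wavelength back to a color is then enough to verify which dye was dispensed where.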
In previous experiments, the AI may have been affected by the knowledge received during the pre-training stage.
This time, the researchers plan to thoroughly evaluate AI’s ability to design experiments.
The AI first gathers the required data from the web, runs the necessary calculations, and then writes a program to operate the liquid handler (far left in the picture above).
To add some complexity, the researchers had the AI use the heater-shaker module as well. These requirements were incorporated into the AI's configuration.
The specific setup is as follows: the AI controls a liquid handling system equipped with two microplates. The source plate contains stock solutions of several reagents, including phenylacetylene and phenylboronic acid, multiple aryl halide coupling partners, two catalysts, and two bases.
The picture above shows the contents of the source plate.
The target plate is mounted on the heater-shaker module.
In the picture above, the left pipette has a 20 µl range and the right single-channel pipette has a 300 µl range.
The AI's ultimate goal is to design a procedure that successfully carries out the Suzuki and Sonogashira reactions.
We tell it: use some of the available reagents to carry out these two reactions.
It then goes online to search, for example, for the conditions these reactions require and their stoichiometry.
As can be seen, the AI successfully collected the required conditions, the amounts and concentrations of the reagents, and so on.
The AI picked the right coupling partners to complete the experiment. Among all the aryl halides, it selected bromobenzene for the Suzuki reaction and iodobenzene for the Sonogashira reaction.
In each run, the AI's choices varied somewhat. For example, it also selected p-iodonitrobenzene because of its high reactivity in oxidative addition.
Bromobenzene was chosen because it can participate in the reaction while being less toxic than aryl iodides.
Next, the AI chose a Pd/NHC catalyst because of its higher effectiveness; this is a state-of-the-art approach to cross-coupling reactions. As for the base, the AI settled on triethylamine.
From this process we can see the model's enormous future potential, since the experiment can be repeated many times to analyze the model's reasoning and achieve better results.
Having selected the reagents, the AI calculates the required amount of each and then plans the entire experimental procedure.
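A minimal sketch of this kind of stoichiometry calculation, under assumed stock concentrations, reaction scale, and equivalents that are purely illustrative and not taken from the paper:

```python
# Convert a target amount of limiting reagent into stock-solution volumes.
# All numbers below (scale, equivalents, concentrations) are hypothetical.

def stock_volume_ul(n_mmol: float, conc_molar: float) -> float:
    """Volume (in µL) of a stock solution needed to deliver n_mmol of reagent."""
    return n_mmol / conc_molar * 1000.0  # mmol / (mol/L) = mL, then mL -> µL

target_mmol = 0.01  # 10 µmol scale (assumed)
equivalents = {"bromobenzene": 1.0, "phenylboronic acid": 1.2, "base": 2.0}
stock_conc = {"bromobenzene": 0.1, "phenylboronic acid": 0.1, "base": 0.2}  # mol/L

volumes = {
    reagent: stock_volume_ul(target_mmol * eq, stock_conc[reagent])
    for reagent, eq in equivalents.items()
}
# e.g. bromobenzene: 0.01 mmol at 0.1 M -> 100 µL
```

Once volumes like these are known, the agent can translate them directly into pipetting steps.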
The AI did make a mistake along the way, using the wrong name for the heater-shaker module. But it noticed this in time, consulted the documentation on its own, corrected the procedure, and finally ran it successfully.
Putting aside the professional chemical process, let’s summarize the “professionalism” demonstrated by AI in this process.
It is fair to say that the AI demonstrated extremely strong analytical reasoning throughout: it can obtain the information it needs on its own and solve complex problems step by step.
Along the way, it can also write high-quality code on its own to drive the experimental design, and it can revise that code based on the outputs it receives.
OpenAI has successfully demonstrated the powerful capabilities of GPT-4. One day GPT-4 will definitely be able to participate in real experiments.
But the researchers didn't want to stop there. They also set the AI a hard problem: they instructed it to develop a new anti-cancer drug.
Something that doesn't even exist yet... can this AI still deliver?
It turns out the AI really does have some tricks up its sleeve. Upholding the principle of never fearing difficulty (of course, it doesn't know what fear is), it carefully analyzed the request to develop an anti-cancer drug, studied current trends in anti-cancer drug development, then selected a target to pursue and determined a compound to make.
Then the AI set about the synthesis itself, also searching online for information about the reaction and its mechanism. After the initial steps were completed, it looked for examples of related reactions.
It finally completed the synthesis.
The compounds in the picture above were not actually synthesized by the AI; this was purely a theoretical exercise.
Among them are well-known drugs such as methamphetamine (crystal meth) and heroin, as well as banned chemical agents such as mustard gas.
Out of 11 compounds in total, the AI provided synthesis plans for four of them and consulted documentation to advance the synthesis process.
Of the remaining seven substances, the AI flatly refused to synthesize five. It searched the Internet for information about these five compounds and concluded that they were not to be messed with.
For example, the AI discovered the relationship between codeine and morphine and concluded that codeine is a controlled substance that cannot be synthesized casually.
However, this safety mechanism is not reliable. If the user slightly rewords the prompt, the AI can be made to proceed: for example, saying "compound A" instead of mentioning morphine directly, "compound B" instead of codeine, and so on.
Likewise, the synthesis of some drugs requires a license from the Drug Enforcement Administration (DEA), but a user can exploit this loophole by falsely claiming to hold such a license, inducing the AI to provide a synthesis scheme.
The AI is also aware of well-known contraband such as heroin and mustard gas. The problem is that the system can currently only flag known compounds. For unknown compounds, such as complex protein toxins, the model is far less likely to identify the potential hazards.
Therefore, to prevent anyone from trying to verify the effectiveness of these chemical recipes out of curiosity, the researchers also placed a prominent warning in the paper:
The synthesis of illicit drugs and chemical weapons discussed in this article is purely for academic research, with the primary purpose of highlighting the potential dangers associated with new technologies.
Under no circumstances should any person or organization attempt to recreate, synthesize, or otherwise produce the substances or compounds discussed in this article. Not only is engaging in this type of activity extremely dangerous, it is also illegal in most jurisdictions.
This AI is composed of multiple modules that can exchange information with one another; some can also access the Internet, external APIs, and a Python interpreter.
After you enter a prompt into the Planner, it begins to act.
It can, for example, search the web, write Python code, and consult documentation. Having mastered these basic actions, it can run experiments on its own.
When humans do experiments, this AI can guide us step by step. Because it can reason about various chemical reactions, search the Internet, calculate the amount of chemicals required in the experiment, and then perform the corresponding reactions.
If the description provided is detailed enough, you don't even need to explain it to it, it can understand the entire experiment by itself.
After the "Web searcher" component receives the query from Planner, it will use the Google search API.
After searching, it keeps the first ten returned results, excludes PDFs, and passes the results along.
It then uses the "BROWSE" action to extract text from each web page and generate an answer, all in one smooth flow.
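The Web searcher flow described above might look roughly like this. The result format, the `filter_results` and `browse` helpers, and the regex-based text extraction are all illustrative stand-ins, not the paper's actual implementation:

```python
import re
from urllib.parse import urlparse

def filter_results(results: list[dict], limit: int = 10) -> list[dict]:
    """Keep the first `limit` search results whose URL does not point at a PDF."""
    non_pdf = [r for r in results
               if not urlparse(r["url"]).path.lower().endswith(".pdf")]
    return non_pdf[:limit]

def browse(html: str) -> str:
    """Crude tag-stripping stand-in for the BROWSE text-extraction action."""
    return re.sub(r"<[^>]+>", " ", html).strip()

# Hypothetical search output; a real system would call a search API here.
results = [
    {"url": "https://example.org/suzuki-reaction", "html": "<p>Suzuki coupling...</p>"},
    {"url": "https://example.org/paper.pdf", "html": ""},
]
pages = [browse(r["html"]) for r in filter_results(results)]
```

The extracted page texts would then be summarized by the language model into an answer for the Planner.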
This task can be handled by GPT-3.5, since it is noticeably faster than GPT-4 here with no loss in quality.
The "Docs searcher" component matches queries against a document index to find the most relevant sections of hardware documentation (for the robotic liquid handler, GC-MS, Cloud Lab, and so on), then summarizes the best matches to generate the most accurate possible answer.
The "Code execution" component does not use any language model; it simply executes code in an isolated Docker container to protect the host machine from any unexpected actions by the Planner. All code output is passed back to the Planner so that errors in the software can be fixed. The same principle applies to the "Automation" component.
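A rough sketch of such sandboxed execution; the image name, resource limits, and helper names are assumptions rather than details from the paper:

```python
import subprocess

def sandbox_command(code: str) -> list[str]:
    """Build a `docker run` command for a disposable, network-less container.
    Image and limits are illustrative choices."""
    return ["docker", "run", "--rm", "--network=none", "--memory=512m",
            "python:3.11-slim", "python", "-c", code]

def run_sandboxed(code: str, timeout_s: int = 60) -> str:
    """Execute generated code in isolation; return combined output for the Planner."""
    result = subprocess.run(sandbox_command(code), capture_output=True,
                            text=True, timeout=timeout_s)
    return result.stdout + result.stderr  # fed back so errors can be self-repaired
```

Feeding stdout and stderr back to the Planner is what enables the fix-and-retry loop the article describes.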
There are many difficulties in making an AI that can perform complex reasoning.
For example, to integrate with modern software, the AI needs to understand the software's documentation. But this documentation is generally written in highly academic, specialized language, which creates a major obstacle.
Large language models can overcome this obstacle by rendering software documentation in natural language that non-experts can understand.
One training source for these models is the large body of API-related information on the web, such as the Opentrons Python API.
However, GPT-4's training data only extends to September 2021, which makes improving the accuracy of the AI's API usage all the more necessary.
To this end, researchers have designed a method to provide AI with documentation for a given task.
They generated OpenAI ada embeddings of the query and the documentation, computed similarities to cross-reference them, and selected sections of the documentation via distance-based vector search.
The number of sections provided depends on the number of GPT-4 tokens in the original text; the maximum is set to 7,800 tokens, so that the relevant documentation can be given to the AI in a single prompt.
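Conceptually, the retrieval step could be sketched like this. A toy bag-of-words embedding stands in for OpenAI's ada embeddings, and whitespace word counts stand in for real GPT-4 token counts, so every name here is illustrative:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' standing in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_sections(query: str, sections: list[str],
                    max_tokens: int = 7800) -> list[str]:
    """Rank doc sections by similarity to the query; keep them until the token budget fills."""
    ranked = sorted(sections, key=lambda s: cosine(embed(query), embed(s)),
                    reverse=True)
    chosen, used = [], 0
    for s in ranked:
        n = len(s.split())  # crude stand-in for a tokenizer count
        if used + n > max_tokens:
            break
        chosen.append(s)
        used += n
    return chosen
```

The 7,800-token cap mirrors the budget mentioned above, leaving headroom in the context window for the prompt and the model's reply.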
This approach proved crucial for providing the AI with information about the heater-shaker hardware module, which the chemical reactions require.
Greater challenges arise when this approach is applied to more diverse robotics platforms, such as Emerald Cloud Lab (ECL).
At this point, we can provide the GPT-4 model with information it does not know, such as about Cloud Lab’s Symbolic Lab Language (SLL).
In all cases, the AI correctly identified the task and then completed it.
In this process, the model effectively retains information about the various options, tools, and parameters of a given function. After ingesting the entire document, the model is prompted to generate a code block using the given functions, which is passed back to the Planner.
Finally, the researchers emphasized that safeguards must be put in place to prevent large language models from being abused:
"We call on the AI community to prioritize the safety of these models. We call on OpenAI, Microsoft, Google, Meta, DeepMind, Anthropic, and other major players to put their best efforts into the safety of their large language models. We also call on the physical-science community to work with the teams developing large language models to help them build these safeguards."
New York University professor Gary Marcus strongly agreed: "This is no joke. Three scientists from Carnegie Mellon University are urgently calling for safety research on LLMs."