Machine learning model classifies organic reaction mechanisms with outstanding accuracy
The discovery of chemical reactions is affected not only by how quickly experimental data can be obtained, but also by how easy it is for chemists to understand that data. Uncovering the mechanistic basis of new catalytic reactions is a particularly complex problem that often requires expertise in computational and physical organic chemistry. However, it is important to study catalytic reactions because they represent the most efficient chemical processes.
Recently, Burés and Larrosa from the Department of Chemistry at the University of Manchester (UoM) in the UK reported that a deep neural network can be trained to analyze ordinary kinetic data and automatically elucidate the corresponding mechanistic class without any additional user input. The model identifies a wide range of mechanism classes with excellent accuracy.
The findings demonstrate that AI-guided mechanism classification is a powerful new tool that can simplify and automate mechanism elucidation. This work is expected to further advance the fully automated discovery and development of organic reactions.
The research is titled "Organic reaction mechanism classification using machine learning" and was published in Nature on January 25, 2023.
Paper link: https://www.nature.com/articles/s41586-022-05639-4
Determining the exact sequence of elementary steps involved in converting substrates into products is critical for rationally improving synthesis methods, designing new catalysts, and safely scaling up industrial processes. To elucidate a reaction mechanism, multiple kinetic curves need to be collected, and human experts must perform kinetic analysis on the data. Although reaction monitoring technology has improved significantly over the past few decades, to the point where kinetic data collection can be fully automated, the theoretical framework underlying mechanistic elucidation has not evolved at the same pace.
The current kinetic analysis pipeline consists of three main steps: extracting kinetic properties from experimental data, predicting kinetic properties for all possible mechanisms, and comparing the experimentally extracted properties with the predicted ones.
For more than a century, chemists have been extracting mechanistic information from reaction rates. One method still used today is to evaluate the initial rate of a reaction, focusing on the consumption of the first few percent of the starting material. This method is popular because in most cases the change in reactant concentration over time is linear at the beginning of the reaction and is therefore simple to analyze. Although insightful, this technique ignores changes in reaction rates and concentrations that occur over much of the time course.
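The initial-rate method described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the first-order reaction A -> P, the rate constant, the starting concentration, and the function name `initial_rate` are all assumptions chosen for illustration. The sketch fits a straight line to the first few percent of conversion and reads the initial rate from the slope.

```python
import math


def initial_rate(times, concentrations):
    """Least-squares slope of concentration vs. time, sign-flipped.

    For a reactant the slope is negative, so the returned rate is positive.
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_c = sum(concentrations) / n
    sxy = sum((t - mean_t) * (c - mean_c) for t, c in zip(times, concentrations))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return -sxy / sxx


# Hypothetical first-order decay A -> P (assumed values, not from the paper).
k = 0.05   # rate constant in 1/s
a0 = 1.0   # initial concentration of A in mol/L

# Sample only the first second, where less than 5% of A has been consumed.
times = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
conc = [a0 * math.exp(-k * t) for t in times]

rate0 = initial_rate(times, conc)
print(f"fitted initial rate: {rate0:.4f} mol/(L*s), true value: {k * a0:.4f}")
```

The fitted value lands slightly below the true initial rate because the curve is already bending within the sampled window. That small bias is a reminder that the method discards everything the rest of the time course could reveal.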
Over the past few decades, more advanced methods have been developed to evaluate the concentrations of reaction components throughout the reaction process. These methods are further facilitated by mathematical techniques that reveal the number of components participating in a reaction step (also known as the order in each reaction component) from kinetic plots. These techniques will certainly continue to provide insights into chemical reactivity, but they are limited to analyzing the orders of reaction components rather than providing a more comprehensive mechanistic hypothesis describing the kinetic behavior of a catalytic system.
Figure 1: Relevance and state-of-the-art techniques for kinetic analysis. (Source: paper)
Machine learning is revolutionizing the way chemists solve problems, from designing molecules and the routes to synthesize them to understanding reaction mechanisms. Burés and Larrosa are now bringing this revolution to kinetic analysis by using machine learning models to classify reactions based on their simulated kinetic characteristics.
Here, the researchers demonstrate that a deep learning model trained on simulated kinetic data is able to correctly elucidate various mechanisms from concentration-time profiles. The machine learning model simplifies kinetic analysis by eliminating the need for rate law derivation and for kinetic property extraction and prediction, greatly facilitating the elucidation of reaction mechanisms in all synthesis laboratories.
Because it analyzes all available kinetic data holistically, this method improves the ability to interrogate reaction profiles, eliminates potential human error during kinetic analysis, and expands the range of kinetics that can be analyzed to include non-steady-state regimes (including activation and deactivation processes) and reversible reactions. This approach complements currently available kinetic analysis methods and should be particularly useful in the most challenging cases.
The researchers defined 20 categories of reaction mechanisms and formulated rate laws for each category. Each mechanism is described mathematically by a set of ordinary differential equations (ODEs) that are functions of the kinetic constants (k1, …, kn) and the concentrations of the chemical species. They then solved these equations, generating millions of simulations describing the decay of reactants and the formation of products. These simulated kinetic data are used to train learning algorithms to identify the characteristic signatures of each mechanistic class. The resulting classification model takes kinetic curves as input, including initial concentrations and concentration-time data, and outputs the mechanistic class of the reaction.
Figure 2: Mechanistic scope and data composition. (Source: Paper)
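To make the idea of simulating a mechanism from its ODEs concrete, here is a minimal Python sketch. It is not the authors' code: the mechanism (a generic catalytic cycle S + cat <-> cat.S -> P + cat), the rate constants, the fixed-step Runge-Kutta integrator, and all function names are assumptions chosen for illustration. No steady-state approximation is applied; the catalyst-substrate intermediate is integrated explicitly.

```python
def rhs(state, k1, k_1, k2):
    """Time derivatives for S + cat <-> cat.S -> P + cat (assumed mechanism)."""
    s, cat, cat_s, p = state
    bind = k1 * s * cat       # S + cat -> cat.S
    release = k_1 * cat_s     # cat.S -> S + cat
    turnover = k2 * cat_s     # cat.S -> P + cat
    return (
        -bind + release,
        -bind + release + turnover,
        bind - release - turnover,
        turnover,
    )


def rk4_step(state, dt, k1, k_1, k2):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(slope, h):
        return tuple(y + h * dy for y, dy in zip(state, slope))

    a = rhs(state, k1, k_1, k2)
    b = rhs(shifted(a, dt / 2), k1, k_1, k2)
    c = rhs(shifted(b, dt / 2), k1, k_1, k2)
    d = rhs(shifted(c, dt), k1, k_1, k2)
    return tuple(
        y + dt / 6 * (ya + 2 * yb + 2 * yc + yd)
        for y, ya, yb, yc, yd in zip(state, a, b, c, d)
    )


def simulate(s0, cat0, k1, k_1, k2, t_end, n_steps):
    """Return a list of (time, (S, cat, cat.S, P)) tuples."""
    state = (s0, cat0, 0.0, 0.0)
    dt = t_end / n_steps
    curve = [(0.0, state)]
    for i in range(1, n_steps + 1):
        state = rk4_step(state, dt, k1, k_1, k2)
        curve.append((i * dt, state))
    return curve


# Hypothetical initial concentrations and rate constants (arbitrary units).
curve = simulate(s0=1.0, cat0=0.05, k1=10.0, k_1=1.0, k2=5.0,
                 t_end=20.0, n_steps=2000)
t_final, (s, cat, cat_s, p) = curve[-1]
print(f"t = {t_final:.1f}: [S] = {s:.4f}, [P] = {p:.4f}")
```

Repeating such a simulation with many randomly drawn rate constants and initial concentrations, for each mechanistic class, is one plausible way to build a labeled training set of the kind the article describes. A production workflow would more likely use an adaptive or stiff ODE solver than a fixed-step integrator.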
Training of deep learning models often requires large amounts of data, which can pose considerable challenges when this data must be collected experimentally.
Burés and Larrosa's approach to training the algorithm avoids the bottleneck of generating large amounts of experimental kinetic data. In this case, the researchers numerically solved the sets of ODEs, without using steady-state approximations, to generate 5 million kinetic samples for model training and validation.
The model contains 576,000 trainable parameters and combines two types of neural network: (1) a long short-term memory (LSTM) network, which processes temporal data sequences (i.e., the concentration-time data); and (2) a fully connected network, which processes non-temporal data (i.e., the initial catalyst concentration in each kinetic run and the features extracted by the LSTM). The model outputs a probability for each mechanism, and these probabilities sum to 1.
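A set of class probabilities that sums to 1 is commonly obtained by passing the network's raw output scores through a softmax function. The article does not say which output function the authors used, so the short Python sketch below is an assumption for illustration only, and the four scores are made up (the real model distinguishes 20 classes).

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(x - shift) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for four mechanism classes.
scores = [2.0, 1.0, 0.1, -1.0]
probs = softmax(scores)

# The predicted class is the one with the highest probability.
best = max(range(len(probs)), key=probs.__getitem__)
print([round(p, 3) for p in probs], "-> predicted class index", best)
```

Reading the full probability vector rather than only the top class is what allows a user to notice when several mechanisms are nearly equally likely.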
The researchers evaluated the trained model using a test set of simulated kinetic curves and demonstrated that it correctly assigned these curves to mechanistic classes with 92.6% accuracy.
Figure 3: Performance of the machine learning model on the test set, with six time points per kinetic curve. (Source: paper)
The model performs well even when "noisy" data is intentionally introduced, which means it can be used to classify experimental data.
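A test-set accuracy such as the figure quoted above is simply the fraction of curves assigned to their true class. The Python sketch below shows that calculation together with a tally of which classes get confused with one another. The class labels "A", "B", and "C" and the ten predictions are invented for illustration and are not data from the paper.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted class matches the true class."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one label is required")
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


def confusion_counts(true_labels, predicted_labels):
    """Count each (true class, predicted class) pair."""
    return Counter(zip(true_labels, predicted_labels))


# Hypothetical labels for ten simulated kinetic curves.
true_labels = ["A", "A", "B", "B", "C", "C", "C", "A", "B", "C"]
predicted = ["A", "A", "B", "C", "C", "C", "C", "A", "B", "B"]

acc = accuracy(true_labels, predicted)
confusion = confusion_counts(true_labels, predicted)
print(f"accuracy = {acc:.1%}")
for (true_cls, pred_cls), count in sorted(confusion.items()):
    print(f"true {true_cls} -> predicted {pred_cls}: {count}")
```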
Figure 4: The impact of error and number of data points on machine learning model performance. (Source: paper)
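One simple way to mimic experimental error is to add random Gaussian noise to each point of a simulated curve. The article does not describe the noise model the authors used, so the sketch below is an assumption: the noise is scaled to the largest concentration in the curve, negative values are clipped to zero, and the decay curve and the function name `add_noise` are hypothetical.

```python
import math
import random


def add_noise(concentrations, relative_sigma, rng):
    """Add Gaussian noise scaled to the largest concentration in the curve."""
    scale = max(concentrations)
    noisy = []
    for c in concentrations:
        value = c + rng.gauss(0.0, relative_sigma * scale)
        noisy.append(max(0.0, value))  # concentrations cannot be negative
    return noisy


# Hypothetical clean curve sampled at six time points.
times = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
clean = [math.exp(-0.3 * t) for t in times]

rng = random.Random(0)  # fixed seed so the example is reproducible
noisy = add_noise(clean, relative_sigma=0.05, rng=rng)

for t, c, n in zip(times, clean, noisy):
    print(f"t = {t:4.1f}  clean = {c:.3f}  noisy = {n:.3f}")
```

Evaluating a classifier on curves perturbed this way, at increasing noise levels and with different numbers of time points, gives a rough picture of how robust it would be on real measurements.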
Finally, the researchers benchmarked their model using several previously reported experimental kinetic curves. The predicted mechanism is in good agreement with the conclusions of earlier kinetic studies. In some cases, the model also identified mechanistic details that were not detected in the original work. For a challenging reaction, the model proposes three very similar mechanistic categories. However, the authors correctly state that this result is not a bug but a feature of their model, as it suggests that further specific experiments are needed to explore the mechanism.
Figure 5: Case study with experimental kinetic data. (Source: Paper)
In summary, Burés and Larrosa have developed a method that not only automates the long process of deriving mechanistic hypotheses from kinetic studies but also enables kinetic analysis of challenging reaction mechanisms. As with any technological advance in data analysis, the resulting mechanistic classifications should be viewed as hypotheses requiring further experimental support. There is always a risk of misinterpreting kinetic data, but the algorithm's ability to identify the correct reaction path with high accuracy based on a small number of experiments could convince more researchers to try kinetic analysis.
Thus, this approach could popularize and facilitate the incorporation of kinetic analysis into reaction development processes, especially as chemists become more familiar with machine learning algorithms.