The prediction accuracy is as high as 0.98. Tsinghua University, Shenzhen Technology and others proposed a multifunctional prediction framework for MOF materials based on Transformer.
Traditional simulation methods, such as molecular dynamics, are highly accurate in simulating system behavior, but they are complex and computationally demanding. In contrast, machine learning methods based on feature engineering perform better when dealing with complex systems. However, because labeled data are scarce, these methods are prone to overfitting. Furthermore, they are usually designed to solve a single task and lack support for multi-task learning. Choosing an appropriate method therefore means weighing accuracy, data requirements, and task complexity to find the solution that best fits the specific problem.
To address these challenges, a multi-institutional team from Tsinghua University, University of California, Sun Yat-sen University, Suzhou University, Shenzhen Technology and AI for Science Institute (AISI, Beijing) jointly presented Uni-MOF, an innovative framework for large-scale three-dimensional MOF representation learning, designed for multi-purpose gas adsorption prediction. Uni-MOF is suitable for both scientific research and practical applications.
Uni-MOF can be regarded as a multifunctional gas adsorption predictor for MOF materials. It shows excellent prediction accuracy on simulated data, marking an important application of machine learning in gas adsorption research.
The study was titled "A comprehensive transformer-based approach for high-accuracy gas adsorption predictions in metal-organic frameworks" and was published in "Nature Communications" on March 1, 2024.
Paper link: https://www.nature.com/articles/s41467-024-46276-x
The need for a unified adsorption framework
Metal-organic frameworks (MOFs) are widely used in gas separation and other fields because of their adjustable structural properties and chemical composition.
Although MOFs have great potential for gas adsorption, accurately predicting their adsorption capacity remains a challenge.
Computational methods such as molecular dynamics and Monte Carlo (MC) simulation are computationally expensive and complex to implement, which limits their use in large-scale, multi-gas and high-throughput calculations. In addition, gas adsorption takes place over a wide range of operating conditions, which makes prediction more difficult.
Graph neural networks and Transformers have been shown to successfully predict MOF properties.
Although existing models for predicting adsorption properties perform well and have strong predictive capabilities, they are usually designed for a single task, specifically predicting the adsorption uptake of a specific gas under specific conditions. The datasets available for these single-task predictions are often limited, which hinders the generalizability of the models.
On the other hand, combining labeled data for various adsorbed gases at different temperatures and pressures can create large data sets suitable for training across the entire range of operating conditions. The increased amount of data can also enhance the model's generalization capabilities and improve its practical industrial use. Therefore, a unified adsorption framework is needed to advance these models.
In addition, ensemble representation learning, or pre-training, on large-scale unlabeled MOF structures can further improve model performance and representation capabilities.
Uni-MOF Framework: Suitable for both scientific research and practical applications
Inspired by this, the research team proposed the Uni-MOF framework as a multi-purpose solution that uses structural representation learning to predict gas adsorption in MOFs under different conditions.
Compared with other Transformer-based models (such as MOFormer and MOFTransformer), Uni-MOF not only identifies and restores the three-dimensional structure of nanoporous materials during pre-training, which greatly improves the robustness of the representations learned for nanoporous materials, but also takes operating conditions such as temperature, pressure and different gas molecules into account in the fine-tuning task. This makes Uni-MOF suitable for both scientific research and practical applications.
As a comprehensive gas adsorption estimator for MOF materials, Uni-MOF needs only the crystal information file (CIF) of the MOF and the related gas, temperature and pressure parameters to predict the gas adsorption characteristics of nanoporous materials under a wide range of operating conditions. The Uni-MOF framework is easy to use and allows module selection.
In addition, the problem of overfitting is effectively addressed by combining various cross-system labeled adsorption data with representation learning on a large amount of unlabeled structural data. This compensates for the shortage of high-quality data, ultimately improving the accuracy of gas adsorption predictions.
The Uni-MOF framework enables atomic-level material identification accuracy, while integrated models make Uni-MOF more applicable to engineering problems. There is no doubt that achieving truly unified models is the future direction of the materials field, rather than just focusing on specialized fields. Uni-MOF is a pioneering practice of machine learning in the field of gas adsorption.
Uni-MOF Framework Overview
The Uni-MOF framework includes pre-training of three-dimensional nanoporous crystals and fine-tuning of multi-task predictions for downstream applications.
Figure 1: Schematic of the Uni-MOF framework. (Source: Paper)
Pre-training on 3D crystalline materials significantly enhances the prediction performance for downstream tasks, especially for large-scale unlabeled data.
To address the shortage of labeled training data, the researchers collected a large dataset of MOF structures and generated more than 300,000 MOFs using ToBaCCo 3.0. High-throughput construction of COFs based on materials genome strategies and the Quasi-Reactive Assembly Algorithm (QReaxAA) makes it feasible to establish a comprehensive COF library. From the spatial configuration of a material, Uni-MOF is able to learn its structural properties well, most importantly its chemical bond information.
To enable Uni-MOF to learn from a more diverse range of materials, and thus generalize to a wider range of materials, both computationally generated and experimental MOFs and COFs were included in the pre-training process. Similar to the masked token task in BERT and Uni-Mol, Uni-MOF adopts a masked-atom prediction task, which helps the pre-trained model gain an in-depth understanding of the material's spatial structure.
To enhance the robustness of pre-training and generalize the learned representations, the researchers added noise to the original coordinates of the MOFs. Two tasks are designed for the pre-training phase: (1) reconstructing the original 3D positions from the noisy data, and (2) predicting the masked atoms. These tasks can enhance model robustness and improve downstream predictive performance.
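The two pre-training tasks can be illustrated with a minimal sketch of how one training example might be corrupted. This is not the authors' implementation: the function name, the `[MASK]` placeholder, the 15% mask ratio, the uniform noise and the choice to perturb every atom are all assumptions made for illustration.

```python
import random

MASK_TOKEN = "[MASK]"  # hypothetical placeholder symbol for a hidden atom


def corrupt_structure(elements, coords, mask_ratio=0.15, noise_scale=0.5, seed=0):
    """Build one corrupted pre-training example.

    elements: list of element symbols, one per atom
    coords:   list of (x, y, z) tuples, same length as elements
    Returns (masked_elements, noisy_coords, masked_indices).
    The model would be trained to recover `elements` at `masked_indices`
    and the original `coords` from the noisy ones.
    """
    if len(elements) != len(coords):
        raise ValueError("elements and coords must have the same length")
    rng = random.Random(seed)
    n = len(elements)
    n_mask = max(1, round(mask_ratio * n))  # assumed ratio, at least one atom
    masked_indices = sorted(rng.sample(range(n), n_mask))
    masked_set = set(masked_indices)
    masked_elements = [
        MASK_TOKEN if i in masked_set else e for i, e in enumerate(elements)
    ]
    # Assumption: uniform noise within +/- noise_scale on every coordinate.
    noisy_coords = [
        tuple(x + rng.uniform(-noise_scale, noise_scale) for x in xyz)
        for xyz in coords
    ]
    return masked_elements, noisy_coords, masked_indices
```

The original element symbols and coordinates serve as the training targets, so no adsorption labels are needed at this stage, which is what allows pre-training on unlabeled structures.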
In addition to diverse spatial configurations, a comprehensive set of material property data points is also critical for model training. To enrich the dataset, the researchers established a custom data generation process (shown in Figure 1b).
Fine-tuning of Uni-MOF builds on the representations obtained through pre-training and on large datasets generated and collected with in-house workflows. During fine-tuning, approximately 3,000,000 labeled data points under various adsorption conditions for MOFs and COFs were used to train the model, enabling accurate prediction of adsorption capacity.
With a diverse database of cross-system target data, Uni-MOF is fine-tuned to predict the multi-system adsorption properties of MOFs under arbitrary conditions. Therefore, Uni-MOF is a unified and easy-to-use framework for predicting the adsorption performance of MOF adsorbents.
Most importantly, Uni-MOF requires no additional effort to hand-craft structural features. Instead, the CIF of the MOF and the associated gas, temperature and pressure parameters are sufficient. The self-supervised learning strategy and rich database ensure that Uni-MOF is able to predict the gas adsorption properties of nanoporous materials under various operating parameters, making it a proficient estimator of gas adsorption for MOF materials.
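To make the input side concrete, the sketch below shows one way a prediction request could be described and its operating conditions turned into numbers. It is purely illustrative: `AdsorptionQuery`, `encode_conditions`, the gas list, the reference temperature and the log-pressure scaling are hypothetical names and choices, not part of Uni-MOF's actual interface.

```python
import math
from dataclasses import dataclass

# Hypothetical gas vocabulary for illustration; not the list used in the paper.
GASES = ("CO2", "CH4", "N2", "H2")


@dataclass(frozen=True)
class AdsorptionQuery:
    cif_path: str          # path to the MOF's crystal information file
    gas: str               # adsorbate name, must appear in GASES
    temperature_k: float   # temperature in kelvin
    pressure_bar: float    # pressure in bar


def encode_conditions(query):
    """Encode gas, temperature and pressure as a flat list of floats."""
    if query.gas not in GASES:
        raise ValueError(f"unknown gas: {query.gas}")
    if query.temperature_k <= 0 or query.pressure_bar <= 0:
        raise ValueError("temperature and pressure must be positive")
    one_hot = [1.0 if g == query.gas else 0.0 for g in GASES]
    # Assumed scaling: temperature relative to 298.15 K, pressure on a log10 scale.
    return one_hot + [query.temperature_k / 298.15, math.log10(query.pressure_bar)]
```

In a real model these condition features would be combined with the structure representation learned from the CIF; how Uni-MOF does this internally is not described in this article.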
Prediction accuracy up to 0.98, with cross-system prediction
The study performed self-supervised learning on a database of more than 631,000 MOFs and COFs, with prediction accuracy up to 0.98. This shows that the representation learning framework based on 3D pre-training effectively learns the complex structural information of MOF while avoiding overfitting.
Uni-MOF was applied to predict the gas adsorption performance of three major databases (hMOF_MOFX-DB, CoRE_MOFX-DB and CoRE_MAP_DB), and a prediction accuracy of up to 0.98 was achieved in databases with sufficient data.
Figure 2: Overall performance of Uni-MOF in large-scale databases. (Source: paper)
When the data set is fully sampled, Uni-MOF not only maintains a prediction accuracy of more than 0.83, but can also accurately select high-performing adsorbents at high pressure by predicting adsorption at low pressure only, consistent with the experimental screening results. Uni-MOF therefore represents a major breakthrough in the application of machine learning techniques in the field of materials science.
Figure 3: Adsorption isotherms based on low pressure predictions and high pressure experimental values, each curve represents a Langmuir fit. (Source: paper)
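The idea of screening at high pressure from low-pressure predictions can be sketched with a Langmuir isotherm, q = q_max * K * P / (1 + K * P). The example below fits the two parameters to low-pressure points and extrapolates to a higher pressure. The linearized least-squares fit (P/q against P) is a simplification chosen here so the code needs only the standard library; the article does not say which fitting procedure the authors used, and the function names are illustrative.

```python
def langmuir(pressure, q_max, k):
    """Langmuir uptake at a given pressure."""
    return q_max * k * pressure / (1.0 + k * pressure)


def fit_langmuir_linearized(pressures, uptakes):
    """Fit q_max and K from the linear form P/q = P/q_max + 1/(K*q_max).

    Requires at least two distinct positive pressures with positive uptakes.
    """
    if len(pressures) != len(uptakes) or len(pressures) < 2:
        raise ValueError("need at least two (pressure, uptake) pairs")
    if any(p <= 0 for p in pressures) or any(q <= 0 for q in uptakes):
        raise ValueError("pressures and uptakes must be positive")
    xs = list(pressures)
    ys = [p / q for p, q in zip(pressures, uptakes)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("pressures must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                  # equals 1 / q_max
    intercept = mean_y - slope * mean_x  # equals 1 / (K * q_max)
    if slope <= 0 or intercept <= 0:
        raise ValueError("data are not consistent with a Langmuir isotherm")
    return 1.0 / slope, slope / intercept


# Usage: fit on low-pressure points, then estimate uptake at high pressure.
low_p = [0.1, 0.2, 0.5, 1.0]
low_q = [langmuir(p, 10.0, 0.5) for p in low_p]  # synthetic, noise-free data
q_max, k = fit_langmuir_linearized(low_p, low_q)
high_pressure_uptake = langmuir(50.0, q_max, k)
```

Candidate adsorbents could then be ranked by their extrapolated high-pressure uptake. With noisy data, a nonlinear fit is usually preferable because the linearized form distorts the error weighting.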
In addition, compared with single-system tasks, the Uni-MOF framework shows superior performance on cross-system data sets and can accurately predict the adsorption characteristics of unknown gases with a prediction accuracy as high as 0.85, demonstrating its strong predictive power and versatility.
Figure 4: Uni-MOF cross-system prediction case. (Source: paper)
The research shows that the self-supervised pre-training strategy can effectively improve the robustness and downstream prediction performance of Uni-MOF.
Figure 5: Comparison of Uni-MOF and Uni-MOF without pre-training. (Source: paper)
Through extensive pre-training on three-dimensional structures, Uni-MOF effectively learns the structural features of MOFs and achieves a high coefficient of determination of 0.99 for hMOFs.
Figure 6: Prediction and analysis of structural characteristics. (Source: paper)
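For readers unfamiliar with the metric, the coefficient of determination (R²) quoted above compares the model's squared errors with those of a baseline that always predicts the mean. A minimal sketch follows; the function name and the sample numbers are made up for illustration and are not data from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


# Illustrative values only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
score = r_squared(observed, predicted)  # close to 0.98 for these numbers
```

A value of 1 means perfect agreement, while 0 means the model does no better than predicting the mean of the observed values.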
In addition, t-SNE (t-distributed stochastic neighbor embedding) analysis confirmed that the fine-tuning stage further learns structural features and can clearly distinguish structures with different adsorption behaviors, which shows that there is a strong correlation between the learned representation and the gas adsorption target.
Figure 7: Visualization of MOF structural representation in hMOF and CoRE_MOF datasets, low-dimensional embeddings computed by t-SNE method. (Source: paper)
In short, the Uni-MOF framework, as a multi-functional prediction platform for MOF materials, acts as a gas adsorption estimator for MOFs and has high accuracy in predicting gas adsorption under different operating conditions. It has broad application prospects in the field of materials science.