Introducing ImageMol, the world's first molecular image generation framework based on self-supervised learning

A molecule is the smallest unit of a substance that retains its chemical properties. The study of molecules is fundamental to many scientific fields, including pharmacy, materials science, biology, and chemistry.

Molecular representation learning has been a very popular research direction in recent years, and current approaches fall into several schools:

  • Computational pharmacologists say: molecules can be represented as fingerprints or descriptors. AttentiveFP, proposed by the Shanghai Institute of Materia Medica, is an outstanding representative of this approach.
  • NLP researchers say: molecules can be expressed as SMILES strings (sequences) and then processed like natural language. Baidu's X-Mol is an outstanding representative of this approach.
  • Graph neural network researchers say: molecules can be represented as a graph, that is, an adjacency matrix, and then processed with graph neural networks. Tencent's GROVER, MIT's DMPNN, and CMU's MolCLR are outstanding representatives of this approach.
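
To make the three schools concrete, here is a minimal hand-written sketch of the same molecule (ethanol) in each view. Everything in it is illustrative: the three-bit "fingerprint" is a hypothetical toy key, not a real fingerprint such as MACCS, and a real pipeline would derive all three views with a cheminformatics toolkit rather than typing them in.

```python
# Toy illustration: three views of ethanol, written out by hand.

SMILES = "CCO"  # ethanol as a SMILES string (sequence view)

# Sequence view: character-level tokens, as an NLP-style model might consume them.
tokens = list(SMILES)

# Graph view: heavy atoms as nodes, bonds as undirected edges.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]


def adjacency_matrix(n_atoms, bond_list):
    """Build a symmetric 0/1 adjacency matrix from a list of bonds."""
    matrix = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in bond_list:
        matrix[i][j] = 1
        matrix[j][i] = 1
    return matrix


def toy_fingerprint(atom_list, bond_list):
    """Hypothetical 3-bit key: [has oxygen, has nitrogen, has a C-O bond]."""
    has_oxygen = int("O" in atom_list)
    has_nitrogen = int("N" in atom_list)
    has_c_o_bond = int(
        any({atom_list[i], atom_list[j]} == {"C", "O"} for i, j in bond_list)
    )
    return [has_oxygen, has_nitrogen, has_c_o_bond]


adjacency = adjacency_matrix(len(atoms), bonds)
fingerprint = toy_fingerprint(atoms, bonds)

print(tokens)       # sequence view
print(adjacency)    # graph view
print(fingerprint)  # fingerprint/descriptor view
```

The sequence view carries no explicit bond structure, while the graph view does; that difference is what the next paragraph refers to.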

However, current representation methods still have limitations. For example, sequence representations lack explicit structural information about molecules, and the expressive power of existing graph neural networks is still limited in many ways (Shen Huawei of the Institute of Computing Technology, Chinese Academy of Sciences, has discussed this; see his report "The Expression Ability of Graph Neural Networks").

Interestingly, when we study molecules in high school chemistry, what we see are images of molecules. When chemists design molecules, they also observe and reason from molecular images. A natural idea follows: "Why not use molecular images directly to represent molecules?" If images can represent molecules directly, couldn't the whole arsenal of computer vision (CV) techniques be used to study them?

So let's do it. With so many models available in CV, why not use them to learn about molecules? Not so fast: there is another important issue, namely data, and labeled data in particular. In CV, data annotation is comparatively easy. For classic CV and NLP problems such as image recognition or sentiment classification, one person can annotate 800 items on average. In the molecular field, however, molecular properties can only be measured through wet-lab experiments and clinical trials, so labeled data are very scarce.

For this reason, researchers from Hunan University proposed ImageMol, the world's first unsupervised learning framework for molecular images, which is pre-trained without labels on large-scale unlabeled molecular image data. It offers a new paradigm for understanding molecular properties and drug targets, and shows that molecular images have great potential in intelligent drug research and development. The work was published in the journal "Nature Machine Intelligence" under the title "Accurate prediction of molecular properties and drug targets using a self-supervised image representation learning framework". This success at the intersection of computer vision and molecular science demonstrates the potential of computer vision techniques for understanding molecular properties and drug target mechanisms, and opens new opportunities for research in the molecular field.

Paper link: https://www.nature.com/articles/s42256-022-00557-6.pdf

ImageMol model structure

The overall structure of ImageMol is shown in the figure below, which is divided into three parts:


(1) A molecular encoder, ResNet18 (light blue), is designed to extract latent features from about 10 million molecular images (a).

(2) To take into account the chemical knowledge and structural information in molecular images, five pre-training strategies (MG3C, MRD, JPP, MCL, MIR) are used to optimize the latent representation of the molecular encoder (b). Specifically:

① MG3C (Multi-granularity chemical clusters classification): the structure classifier (dark blue) predicts the chemical structure information in molecular images;

② MRD (Molecular rationality discrimination): the rationality classifier (green) distinguishes between rational and irrational molecules;

③ JPP (Jigsaw puzzle prediction): the jigsaw classifier (light gray) predicts the correct arrangement of the pieces of a molecular image;

④ MCL (MASK-based contrastive learning): the contrastive classifier (dark gray) maximizes the similarity between the original image and the masked image;

⑤ MIR (Molecular image reconstruction): the generator (yellow) restores latent features to a molecular image, and the discriminator (purple) distinguishes real molecular images from fake ones produced by the generator.

(3) The pre-trained molecular encoder is fine-tuned on downstream tasks to further improve model performance (c).
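
Two of the pretext tasks above, JPP and MCL, derive their training signal from the image alone, which is why no labels are needed. The sketch below shows one common way to build such samples; it is not the authors' implementation. The 4x4 integer grid stands in for a rendered molecular image, the 2x2 patch grid and the function names are assumptions made for illustration, and the "encoder" is just pixel flattening where a trained network would be used.

```python
import itertools
import math
import random


def split_into_patches(image, grid):
    """Split a square 2D list into grid x grid patches, in row-major order."""
    size = len(image) // grid
    patches = []
    for pr in range(grid):
        for pc in range(grid):
            patch = [row[pc * size:(pc + 1) * size]
                     for row in image[pr * size:(pr + 1) * size]]
            patches.append(patch)
    return patches


def make_jigsaw_sample(image, grid, permutations, rng):
    """Shuffle patches with a random permutation; the label is its index."""
    label = rng.randrange(len(permutations))
    patches = split_into_patches(image, grid)
    shuffled = [patches[i] for i in permutations[label]]
    return shuffled, label


def mask_region(image, top, left, height, width, fill=0):
    """Return a copy of the image with a rectangular region blanked out."""
    masked = [row[:] for row in image]
    for r in range(top, top + height):
        for c in range(left, left + width):
            masked[r][c] = fill
    return masked


def flatten(image):
    """Stand-in encoder: flatten pixels into a vector."""
    return [value for row in image for value in row]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


image = [[r * 4 + c + 1 for c in range(4)] for r in range(4)]
permutations = list(itertools.permutations(range(4)))  # all orders of 4 patches
rng = random.Random(0)

# Jigsaw-style sample: the classifier would be trained to recover `label`.
shuffled, label = make_jigsaw_sample(image, 2, permutations, rng)

# Mask-style pair: training would push this similarity towards 1.
masked = mask_region(image, 0, 0, 2, 2)
similarity = cosine_similarity(flatten(image), flatten(masked))
```

In both cases the target (the permutation index, or the pairing of an image with its own masked copy) is generated by the data pipeline, so the encoder can be trained on unlabeled images.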

Benchmark Evaluation

The authors first evaluated ImageMol on 8 drug discovery benchmark datasets, using the two most popular splitting strategies (scaffold split and random scaffold split) on all of them. For classification tasks, performance is measured with the receiver operating characteristic (ROC) curve and the area under the curve (AUC). The experimental results show that ImageMol obtains higher AUC values (Figure a).
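
For readers unfamiliar with the metric, the AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The following minimal sketch computes it directly from that definition; the labels and scores are made-up values, and a real evaluation would normally use an established library routine instead.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example: 3 active molecules (label 1) and 2 inactive ones (label 0).
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.8]
auc = roc_auc(labels, scores)  # 5 of the 6 pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect one, which gives a scale for the values reported below.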

On HIV and Tox21, ImageMol achieves higher AUC values than Chemception, a classic convolutional neural network framework for prediction from molecular images (Figure b). The paper further evaluates ImageMol's performance in predicting drug metabolism by five major metabolizing enzymes: CYP1A2, CYP2C9, CYP2C19, CYP2D6 and CYP3A4. Figure c shows that, in predicting inhibitors versus non-inhibitors of these five enzymes, ImageMol achieves higher AUC values (ranging from 0.799 to 0.893) than three state-of-the-art molecular image-based representation models (Chemception, ADMET-CNN and QSAR-CNN).

The paper further compares ImageMol with three types of state-of-the-art molecular representation models, as shown in Figures d and e. Under a random scaffold split, ImageMol outperforms fingerprint-based models (such as AttentiveFP), sequence-based models (such as TF_Robust), and graph-based models (such as N-GRAM, GROVER, and MPG). Furthermore, ImageMol achieves higher AUC values on CYP1A2, CYP2C9, CYP2C19, CYP2D6 and CYP3A4 than traditional MACCS-based and FP4-based methods (Figure f).
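
The scaffold-based splits used in these comparisons keep molecules that share a core structure in the same subset, so the test set contains scaffolds the model has not seen. The sketch below is a generic illustration under stated assumptions, not the benchmark's exact code: scaffold identifiers are assumed to be precomputed strings, the function name and the 80/10/10 fractions are hypothetical, and passing a seed switches from the deterministic variant (largest groups first) to a random one (groups shuffled).

```python
import random
from collections import defaultdict


def scaffold_split(scaffolds, frac_train=0.8, frac_valid=0.1, shuffle_seed=None):
    """Split molecule indices so that no scaffold is shared across subsets.

    scaffolds: one scaffold identifier per molecule (assumed precomputed).
    shuffle_seed: None gives a deterministic split (largest groups first);
                  an int gives a random scaffold split (groups shuffled).
    """
    groups = defaultdict(list)
    for index, scaffold in enumerate(scaffolds):
        groups[scaffold].append(index)

    group_list = list(groups.values())
    if shuffle_seed is None:
        group_list.sort(key=len, reverse=True)
    else:
        random.Random(shuffle_seed).shuffle(group_list)

    n_total = len(scaffolds)
    train_cap = frac_train * n_total
    valid_cap = (frac_train + frac_valid) * n_total
    train, valid, test = [], [], []
    for group in group_list:
        # Each scaffold group goes into exactly one subset, never split up.
        if len(train) + len(group) <= train_cap:
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cap:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


# Made-up scaffold labels for 10 molecules.
scaffolds = ["benzene"] * 5 + ["pyridine"] * 3 + ["indole", "furan"]
train, valid, test = scaffold_split(scaffolds)
r_train, r_valid, r_test = scaffold_split(scaffolds, shuffle_seed=42)
```

Because whole groups are assigned at once, the realized subset sizes only approximate the requested fractions.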

Figure g shows that ImageMol also achieves better AUC performance on CYP1A2, CYP2C9, CYP2C19, CYP2D6, and CYP3A4 than sequence-based models (including RNN_LR, TRFM_LR, RNN_MLP, TRFM_MLP, RNN_RF, TRFM_RF, and CHEM-BERT) and graph-based models (including MolCLRGIN, MolCLRGCN, and GROVER).

In the above comparison between ImageMol and other advanced models, we can see the superiority of ImageMol.

Since the outbreak of COVID-19, effective treatment strategies have been urgently needed, so the authors also evaluated ImageMol on this problem.

Prediction of 13 SARS-CoV-2 targets

ImageMol was evaluated on 13 SARS-CoV-2 targets that are of concern today. On the SARS-CoV-2 bioassay datasets, ImageMol achieved high AUC values of 72.6% to 83.7%. Panel a shows the latent features identified by ImageMol, which cluster active and inactive anti-SARS-CoV-2 compounds well across the 13 targets or endpoints. Its AUC values are more than 12% higher than those of another model, Jure's GNN, reflecting the model's high accuracy and strong generalization.

Identification of anti-SARS-CoV-2 inhibitors

Now comes the experiment most directly related to drug research: using ImageMol to identify inhibitor molecules directly. Using ImageMol's molecular image representations of inhibitors and non-inhibitors of 3CL protease (which has been shown to be a promising therapeutic target for the treatment of COVID-19), the study found that 3CL inhibitors and non-inhibitors are well separated in the t-SNE plot, as shown in Figure b below.

In addition, ImageMol identified 10 of the 16 known 3CL protease inhibitors (a success rate of 62.5%) and visualized these 10 drugs in the embedding space in the figure, indicating high generalization ability in anti-SARS-CoV-2 drug discovery. When predicting anti-SARS-CoV-2 repurposed drugs with the HEK293 assay, ImageMol successfully predicted 42 out of 70 drugs (a 60% success rate), indicating that it also generalizes well when inferring potential drug candidates in the HEK293 assay. Figure c below shows the drugs that ImageMol identified as potential 3CL inhibitors in the DrugBank dataset. Panel d shows the molecular structures of the 3CL inhibitors discovered by ImageMol.
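
The two success rates quoted above are simple recovery fractions: the number of known actives the model identified, divided by the number of known actives. A short check of the arithmetic, with a hypothetical helper name:

```python
def success_rate(hits, total):
    """Percentage of known actives that the model recovered."""
    return 100.0 * hits / total


protease_rate = success_rate(10, 16)  # known 3CL protease inhibitors
assay_rate = success_rate(42, 70)     # repurposed drugs in the cell-based assay

print(f"{protease_rate:.1f}% and {assay_rate:.1f}%")
```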

Attention Visualization

ImageMol can acquire prior knowledge of chemical information from molecular image representations, including =O bonds, -OH bonds, -NH3 bonds and benzene rings. Panels b and c show 12 example molecules visualized with Grad-CAM on ImageMol. They show that ImageMol accurately attends to both global (b) and local (c) structural information at the same time. These results allow researchers to understand visually how molecular structure affects properties and targets.
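
Grad-CAM produces such heatmaps by weighting each feature map of a convolutional layer by the average gradient of the target score with respect to that map, summing the weighted maps, and keeping only the positive part. The sketch below shows just this combination step on tiny made-up arrays. In practice the feature maps and gradients would come from the trained encoder through a deep learning framework's automatic differentiation, and the final scaling to [0, 1] is a display convenience rather than part of the method.

```python
def grad_cam(feature_maps, gradients):
    """Combine K feature maps (each H x W) into one heatmap.

    feature_maps[k][i][j]: activation of channel k at position (i, j).
    gradients[k][i][j]: gradient of the class score w.r.t. that activation.
    """
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])

    # Channel weights: global average of each channel's gradients.
    weights = [sum(sum(row) for row in grad) / (height * width)
               for grad in gradients]

    # Weighted sum over channels.
    heatmap = [[0.0] * width for _ in range(height)]
    for weight, fmap in zip(weights, feature_maps):
        for i in range(height):
            for j in range(width):
                heatmap[i][j] += weight * fmap[i][j]

    # ReLU: keep only regions that support the target class.
    heatmap = [[max(0.0, value) for value in row] for row in heatmap]

    # Scale to [0, 1] for display, if anything is non-zero.
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[value / peak for value in row] for row in heatmap]
    return heatmap


# Made-up 2-channel, 2x2 example.
feature_maps = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
gradients = [
    [[0.5, 0.5], [0.5, 0.5]],      # channel 0 supports the class
    [[-1.0, -1.0], [-1.0, -1.0]],  # channel 1 counts against it
]
heatmap = grad_cam(feature_maps, gradients)
```

Overlaying the resulting heatmap on the input image highlights the regions, such as a functional group or a ring, that contributed most to the prediction.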

Statement
This article is reproduced from 51CTO.COM. In case of any infringement, please contact admin@php.cn to request deletion.