Introducing ImageMol, the world's first molecular image generation framework based on self-supervised learning

A molecule is the smallest unit that maintains the chemical stability of a substance. The study of molecules is a fundamental problem in many scientific fields, such as pharmacy, materials science, biology, and chemistry.

Molecular representation learning has been a very popular research direction in recent years, and it can currently be divided into several schools of thought:

  • Computational pharmacologists say: molecules can be represented as fingerprints or descriptors. AttentiveFP, proposed by the Shanghai Institute of Materia Medica, is an outstanding representative of this approach.
  • NLP researchers say: molecules can be expressed as SMILES strings (sequences) and then processed like natural language. Baidu's X-Mol is an outstanding representative of this approach.
  • Graph neural network researchers say: molecules can be represented as a graph, that is, an adjacency matrix, and then processed with graph neural networks. Tencent's GROVER, MIT's DMPNN, and CMU's MolCLR are outstanding representatives of this approach.
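
To make the sequence and graph views concrete, here is a minimal sketch for a single molecule. Ethanol, the hand-written bond list, and the helper `adjacency_matrix` are illustrative assumptions, not taken from any of the models named above; real pipelines would normally derive these from a cheminformatics toolkit.

```python
# Ethanol (CH3-CH2-OH) written as a SMILES string; hydrogens are implicit.
smiles = "CCO"

# Sequence view: character-level tokens. This only works here because every
# atom has a one-letter symbol; real tokenizers must handle "Cl", "Br",
# brackets, ring-closure digits, and so on.
tokens = list(smiles)

# Graph view: heavy atoms are nodes, bonds are undirected edges.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]


def adjacency_matrix(num_atoms, bonds):
    """Build a symmetric 0/1 adjacency matrix from a list of bonds."""
    matrix = [[0] * num_atoms for _ in range(num_atoms)]
    for i, j in bonds:
        matrix[i][j] = 1
        matrix[j][i] = 1
    return matrix


adjacency = adjacency_matrix(len(atoms), bonds)
for row in adjacency:
    print(row)
```

Neither view stores 2D layout information explicitly, which is the gap the image-based representation discussed next is meant to address.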

However, current representation methods still have limitations. For example, sequence representations lack explicit structural information about the molecule, and the expressive power of existing graph neural networks is still limited (Shen Huawei of the Institute of Computing Technology, Chinese Academy of Sciences, discusses this in his talk "The Expression Ability of Graph Neural Networks").

Interestingly, when we study molecules in high school chemistry, what we see are images of molecules. When chemists design molecules, they also observe and reason from molecular images. A natural idea follows: "Why not use molecular images directly to represent molecules?" If images can represent molecules directly, couldn't the entire toolkit of CV (computer vision) be applied to studying molecules?

So why not just do it? CV has plenty of models; why not use them to learn molecules? Not so fast: there is another important issue, data, and especially labeled data. In CV, data annotation is comparatively easy. For classic CV and NLP problems such as image recognition or sentiment classification, one person can annotate 800 items on average. In the molecular field, however, molecular properties can only be assessed through wet-lab experiments and clinical trials, so labeled data are very scarce.

Based on this, researchers from Hunan University proposed ImageMol, the world's first unsupervised learning framework for molecular images, which uses large-scale unlabeled molecular image data for unsupervised pre-training. It provides a new paradigm for understanding molecular properties and drug targets and shows that molecular images have great potential in intelligent drug research and development. The work was published in the international journal "Nature Machine Intelligence" under the title "Accurate prediction of molecular properties and drug targets using a self-supervised image representation learning framework". This result at the intersection of computer vision and molecular science demonstrates the potential of computer vision techniques for understanding molecular properties and drug target mechanisms, and opens new opportunities for research in the molecular field.

Paper link: https://www.nature.com/articles/s42256-022-00557-6.pdf

ImageMol model structure

The overall structure of ImageMol, shown in the figure below, is divided into three parts:

(1) A molecular encoder, ResNet18 (light blue), is designed to extract latent features from about 10 million molecular images (a).

(2) Considering the chemical knowledge and structural information in the molecular image, five pre-training strategies (MG3C, MRD, JPP, MCL, MIR) are used to optimize the latent representation of the molecular encoder (b). Specifically:

① MG3C (Multi-granularity chemical clusters classification): the structure classifier (dark blue) is used to predict the chemical structure information in molecular images;

② MRD (Molecular rationality discrimination): the rationality classifier (green) is used to distinguish between reasonable and unreasonable molecules;

③ JPP (Jigsaw puzzle prediction): the jigsaw classifier (light gray) is used to predict the reasonable arrangement of a molecule's image patches;

④ MCL (MASK-based contrastive learning): the contrastive classifier (dark gray) is used to maximize the similarity between the original image and the masked image;

⑤ MIR (Molecular image reconstruction): the generator (yellow) is used to restore latent features to a molecular image, and the discriminator (purple) is used to distinguish real molecular images from fake ones produced by the generator.

(3) The pre-trained molecular encoder is fine-tuned on downstream tasks to further improve model performance (c).
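
Two of the pretext tasks above, JPP and MCL, need no labels because the training signal comes from how the input image is altered. The sketch below shows only that data-preparation step on a toy grid of numbers. The grid size, the permutation set, the mask size, and all function names are hypothetical; the article does not give ImageMol's actual settings.

```python
import random


def split_into_patches(image, grid):
    """Split a square 2D image (list of rows) into grid*grid patches, row-major."""
    size = len(image) // grid
    patches = []
    for pr in range(grid):
        for pc in range(grid):
            patch = [row[pc * size:(pc + 1) * size]
                     for row in image[pr * size:(pr + 1) * size]]
            patches.append(patch)
    return patches


def assemble(patches, grid):
    """Inverse of split_into_patches: stitch patches back into one image."""
    size = len(patches[0])
    image = []
    for pr in range(grid):
        for r in range(size):
            row = []
            for pc in range(grid):
                row.extend(patches[pr * grid + pc][r])
            image.append(row)
    return image


def jigsaw_sample(image, grid, permutations, rng):
    """Shuffle patches with a randomly chosen permutation; its index is the label."""
    label = rng.randrange(len(permutations))
    patches = split_into_patches(image, grid)
    shuffled = [patches[i] for i in permutations[label]]
    return assemble(shuffled, grid), label


def mask_image(image, top, left, size, fill=0):
    """Return a copy of the image with one size*size square overwritten by fill."""
    masked = [row[:] for row in image]
    for r in range(top, top + size):
        for c in range(left, left + size):
            masked[r][c] = fill
    return masked


# Toy 4x4 "image" with distinct pixel values, cut into a 2x2 grid of patches.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
permutations = [(0, 1, 2, 3), (1, 0, 3, 2), (3, 2, 1, 0)]  # hypothetical set
rng = random.Random(0)
shuffled, label = jigsaw_sample(image, 2, permutations, rng)
masked = mask_image(image, top=1, left=1, size=2)
```

In a full pipeline, a jigsaw classifier would be trained to predict `label` from `shuffled`, and a contrastive objective would pull the encoder's embeddings of `image` and `masked` together.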

Benchmark Evaluation

The authors first evaluated ImageMol on 8 drug discovery benchmark datasets, using the two most popular splitting strategies (scaffold split and random scaffold split) on all of them. For classification tasks, the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC) are used as evaluation metrics. The experimental results show that ImageMol obtains higher AUC values (Figure a).
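
For readers unfamiliar with the metric, the AUC of a binary classifier equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way on made-up labels and scores. The function name `roc_auc` and the data are illustrative, and in practice a library routine would normally be used instead.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example: 3 active molecules (1) and 2 inactive ones (0).
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.6]
auc = roc_auc(labels, scores)
print(round(auc, 3))  # 4 of the 6 positive/negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is the scale on which the AUC values in this article should be read.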

On HIV and Tox21, ImageMol achieves higher AUC values than Chemception, a classic convolutional neural network framework for molecular images (Figure b). The paper further evaluates ImageMol on predicting drug metabolism by five major metabolizing enzymes: CYP1A2, CYP2C9, CYP2C19, CYP2D6 and CYP3A4. Figure c shows that, in predicting inhibitors versus non-inhibitors of these five enzymes, ImageMol achieves higher AUC values (ranging from 0.799 to 0.893) than three state-of-the-art molecular image-based representation models (Chemception, ADMET-CNN and QSAR-CNN).

The paper further compares ImageMol with three types of state-of-the-art molecular representation models (Figures d and e). Under random scaffold split, ImageMol outperforms fingerprint-based models (such as AttentiveFP), sequence-based models (such as TF_Robust), and graph-based models (such as N-GRAM, GROVER, and MPG). Furthermore, ImageMol achieves higher AUC values on CYP1A2, CYP2C9, CYP2C19, CYP2D6 and CYP3A4 than traditional MACCS-based and FP4-based methods (Figure f).
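
A scaffold-based split keeps all molecules that share a core structure in the same subset, so the test set contains scaffolds the model never saw in training. The following is a minimal sketch of the random variant, assuming scaffold identifiers have already been computed (in the literature these are typically Bemis-Murcko scaffolds). The function name, the 80/10/10 fractions, and the scaffold labels are illustrative assumptions, not the paper's code.

```python
import random
from collections import defaultdict


def random_scaffold_split(scaffolds, frac_train=0.8, frac_valid=0.1, seed=0):
    """Split molecule indices so that no scaffold is shared between subsets.

    scaffolds: one scaffold identifier per molecule (assumed precomputed).
    """
    groups = defaultdict(list)
    for index, scaffold in enumerate(scaffolds):
        groups[scaffold].append(index)

    group_list = list(groups.values())
    # Shuffling the scaffold groups is what makes this the "random" variant;
    # a plain scaffold split would use a fixed ordering instead.
    random.Random(seed).shuffle(group_list)

    train_cap = frac_train * len(scaffolds)
    valid_cap = frac_valid * len(scaffolds)
    train, valid, test = [], [], []
    for group in group_list:
        if len(train) + len(group) <= train_cap:
            train.extend(group)
        elif len(valid) + len(group) <= valid_cap:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


# Hypothetical scaffold labels for ten molecules.
scaffolds = ["benzene", "benzene", "pyridine", "indole", "benzene",
             "pyridine", "furan", "indole", "thiophene", "benzene"]
train, valid, test = random_scaffold_split(scaffolds)
```

Because whole groups are assigned at once, the resulting subset sizes only approximate the requested fractions, especially on small datasets.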

Figure g shows that ImageMol also achieves better AUC performance on CYP1A2, CYP2C9, CYP2C19, CYP2D6, and CYP3A4 than sequence-based models (including RNN_LR, TRFM_LR, RNN_MLP, TRFM_MLP, RNN_RF, TRFM_RF, and CHEM-BERT) and graph-based models (including MolCLRGIN, MolCLRGCN, and GROVER).

These comparisons with other advanced models show the superiority of ImageMol.

Since the outbreak of COVID-19, effective treatment strategies have been urgently needed, so the authors also evaluated ImageMol in this setting.

Prediction of 13 SARS-CoV-2 targets

ImageMol was evaluated on 13 SARS-CoV-2 targets of current interest. On the SARS-CoV-2 bioassay datasets, ImageMol achieved high AUC values of 72.6% to 83.7%. Panel a shows the latent features identified by ImageMol, which cluster active and inactive anti-SARS-CoV-2 compounds well across the 13 targets or endpoints. Its AUC values are more than 12% higher than those of another model, Jure's GNN, reflecting the model's high accuracy and strong generalization.

Identification of anti-SARS-CoV-2 inhibitors

The experiment most directly related to drug molecules is using ImageMol to identify inhibitor molecules directly. Using ImageMol's molecular image representations of inhibitors and non-inhibitors of 3CL protease (which has been shown to be a promising therapeutic target for the treatment of COVID-19), the study found that 3CL inhibitors and non-inhibitors are well separated in the t-SNE plot, as shown in Figure b below.

In addition, ImageMol identified 10 of the 16 known 3CL protease inhibitors (a success rate of 62.5%) and visualized these 10 drugs in the embedding space in the figure, indicating high generalization ability in anti-SARS-CoV-2 drug discovery. When predicting anti-SARS-CoV-2 repurposed drugs with the HEK293 assay, ImageMol successfully predicted 42 out of 70 drugs (a 60% success rate), indicating that it generalizes well to inferring potential drug candidates in this assay too. Figure c below shows the drugs in the DrugBank dataset that ImageMol identified as potential 3CL inhibitors. Panel d shows the molecular structures of the 3CL inhibitors discovered by ImageMol.
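
The two success rates quoted above follow directly from the reported counts. A short check, with the helper name `success_rate` chosen purely for illustration:

```python
def success_rate(hits, total):
    """Share of known actives that were recovered, as a percentage."""
    return 100.0 * hits / total


rate_3cl = success_rate(10, 16)         # known 3CL protease inhibitors
rate_repurposed = success_rate(42, 70)  # repurposed drugs in the cell assay
print(f"{rate_3cl:.1f}% {rate_repurposed:.1f}%")  # prints "62.5% 60.0%"
```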

Attention Visualization

ImageMol can acquire prior knowledge of chemical information from molecular image representations, including =O bonds, -OH bonds, -NH3 bonds and benzene rings. Panels b and c show Grad-CAM visualizations of 12 example molecules. They indicate that ImageMol attends to both global (b) and local (c) structural information at the same time. These results allow researchers to see visually how molecular structure affects properties and targets.
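
Grad-CAM builds its heatmap by weighting each feature map of a convolutional layer with the spatially averaged gradient of the class score, summing the weighted maps, and applying a ReLU. The sketch below implements just that arithmetic on tiny hand-made arrays. The inputs are hypothetical; in a real model they would come from the encoder's last convolutional layer and a backward pass.

```python
def grad_cam(feature_maps, gradients):
    """Compute a Grad-CAM heatmap from one layer's activations and gradients.

    feature_maps, gradients: lists of K channels, each an H x W list of floats.
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    # Channel weight = spatial average of the gradient of the class score.
    weights = [sum(sum(row) for row in grad) / (height * width)
               for grad in gradients]
    heatmap = [[0.0] * width for _ in range(height)]
    for weight, fmap in zip(weights, feature_maps):
        for r in range(height):
            for c in range(width):
                heatmap[r][c] += weight * fmap[r][c]
    # ReLU keeps only the regions that push the class score up.
    return [[max(0.0, value) for value in row] for row in heatmap]


# Two hypothetical 2x2 channels and their gradients.
feature_maps = [[[1.0, 0.0], [0.0, 2.0]],
                [[0.0, 3.0], [1.0, 0.0]]]
gradients = [[[0.5, 0.5], [0.5, 0.5]],
             [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(feature_maps, gradients)
print(heatmap)  # [[0.5, 0.0], [0.0, 1.0]]
```

The heatmap is then upsampled to the input resolution and overlaid on the molecular image, which is how highlighted regions such as functional groups or rings become visible.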

Statement
This article is reproduced from 51CTO.COM.