Achieving high versatility with small amounts of data, KAIST develops new framework for 3D molecule generation for drug design

PHPz (reposted), 2024-04-02 21:30:01

Editor | Radish Skin

Deep generative models have great potential to accelerate drug design. However, existing generative models often face generalization challenges due to limited data, resulting in less innovative designs.

To address these issues, researchers at KAIST in South Korea proposed an interaction-aware 3D molecular generative framework that enables interaction-guided drug design within the target binding pocket. By using universal patterns of protein-ligand interactions as prior knowledge, the model can achieve high generalizability with limited experimental data. Because these interaction patterns also serve as explicit design conditions, the model is intended to balance generality with target specificity, which makes the resulting designs more controllable.

The model's performance was comprehensively evaluated by analyzing ligands generated for unseen targets in terms of binding pose stability, affinity, diversity and novelty. Furthermore, the efficient design of potential mutant-selective inhibitors demonstrates the applicability of this approach to structure-based drug design.

The study was titled "3D molecular generative framework for interaction-guided drug design" and was published in "Nature Communications" on March 27, 2024.

In data-scarce scientific problems, building appropriate prior knowledge into deep learning models is crucial to developing generalizable models. For example, AlphaFold successfully predicts protein structures by leveraging co-evolutionary information and residue pair representations. Deep generative models are changing the drug design paradigm, but their performance is limited by the lack of activity data on drug molecules, which leads to poor generalization. Improving these models therefore requires suitable prior knowledge that lets them generalize beyond the scarce activity data available, which is critical when the target compounds or structures are challenging.

Recent generative models improve their generalization ability by using the three-dimensional structure of the binding site for structure-based ligand design, without relying on activity data. A well-generalized model should understand the universal properties of protein-ligand interactions, including hydrogen bonds, salt bridges, hydrophobic interactions, and π-π stacking. These interactions are essential for forming a stable binding structure and maintaining high affinity, and such ubiquitous interaction patterns are the basis for designing potent drugs.

Against this background, KAIST researchers proposed an interaction-aware 3D molecular generative framework. The framework exploits the universal nature of protein-ligand interactions to guide structure-based drug design. It consists of two main stages: (1) interaction-aware condition setting and (2) interaction-aware 3D molecular generation.

Illustration: Concept of the framework. (Source: paper)

The first stage of the framework sets the interaction condition I by examining the protein atoms of a given binding site P. The researchers considered four types of protein-ligand interactions: hydrogen bonds, salt bridges, hydrophobic interactions, and π-π stacking. They restricted themselves to these four because they are the most dominant interaction types in the Protein Data Bank (PDB), and the model was trained on the PDBbind 2020 dataset, which is derived from the PDB.

At the same time, the team developed a protein atom-wise interaction-aware conditioning strategy. The interaction condition is defined as an additional one-hot vector of interaction type for each protein atom, indicating whether the atom can participate in a specific interaction and what role it plays in it.

Protein atoms are classified into one of seven categories: anions, cations, hydrogen bond donors and acceptors, aromatic, hydrophobic and non-interacting atoms. Instead of representing the entire interaction information as a single interaction fingerprint, the team's strategy aims to establish interaction conditions locally.
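The per-atom one-hot condition described above can be illustrated with a minimal sketch. The category names follow the seven classes listed in the article, but their ordering, the string labels, and the function name `one_hot_conditions` are assumptions made for illustration, not the authors' implementation.

```python
# Seven per-atom categories named in the article; the ordering is an assumption.
CATEGORIES = [
    "anion",
    "cation",
    "hbond_donor",
    "hbond_acceptor",
    "aromatic",
    "hydrophobic",
    "non_interacting",
]


def one_hot_conditions(labels):
    """Return one 7-element one-hot row per protein atom label."""
    index = {name: i for i, name in enumerate(CATEGORIES)}
    rows = []
    for label in labels:
        row = [0] * len(CATEGORIES)
        row[index[label]] = 1  # exactly one category is active per atom
        rows.append(row)
    return rows


# Hypothetical labels for four pocket atoms.
pocket_labels = ["hbond_donor", "hydrophobic", "anion", "non_interacting"]
condition_I = one_hot_conditions(pocket_labels)
```

Because each atom carries its own row, the condition stays local to individual pocket atoms instead of being summarized into one fingerprint for the whole pocket.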

In this work, the researchers determined the interaction categories of pocket atoms mainly through two strategies.

In the generation phase, information about protein-ligand interactions is not always available, so criteria for the interaction categories are predefined and the interaction condition is specified by analyzing each protein atom. A condition set this way is called a reference-free interaction condition.

During the training phase, the ground-truth structures of protein-ligand complexes are used to extract interaction conditions.
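The two strategies can be contrasted in a rough sketch. Everything concrete here is an assumption for illustration: the lookup table entries, the complementary-role pairing, the single 4.0 Å distance cutoff, and the function names. The article only states that the criteria are predefined at generation time and that conditions are extracted from ground-truth complexes at training time.

```python
import math

# Hypothetical predefined criteria: (residue, atom name) -> category.
REFERENCE_FREE_RULES = {
    ("ASP", "OD1"): "anion",
    ("LYS", "NZ"): "cation",
    ("SER", "OG"): "hbond_donor",
    ("GLY", "O"): "hbond_acceptor",
    ("PHE", "CZ"): "aromatic",
    ("LEU", "CD1"): "hydrophobic",
}

# Assumed complementary ligand role for each protein category.
PARTNER = {
    "anion": "cation",
    "cation": "anion",
    "hbond_donor": "hbond_acceptor",
    "hbond_acceptor": "hbond_donor",
    "aromatic": "aromatic",
    "hydrophobic": "hydrophobic",
}


def reference_free(atom):
    """Generation phase: label an atom from predefined criteria only."""
    key = (atom["res"], atom["name"])
    return REFERENCE_FREE_RULES.get(key, "non_interacting")


def reference_based(atom, ligand_atoms, cutoff=4.0):
    """Training phase: keep a category only if the bound ligand realizes it."""
    candidate = reference_free(atom)
    partner = PARTNER.get(candidate)  # None for non-interacting atoms
    for lig in ligand_atoms:
        close = math.dist(atom["xyz"], lig["xyz"]) <= cutoff
        if partner is not None and lig["role"] == partner and close:
            return candidate
    return "non_interacting"


pocket = [
    {"res": "ASP", "name": "OD1", "xyz": (0.0, 0.0, 0.0)},
    {"res": "LEU", "name": "CD1", "xyz": (10.0, 0.0, 0.0)},
]
ligand = [{"role": "cation", "xyz": (3.0, 0.0, 0.0)}]

free_labels = [reference_free(a) for a in pocket]
based_labels = [reference_based(a, ligand) for a in pocket]
```

In this toy case the leucine carbon is labeled hydrophobic without a reference ligand, but becomes non-interacting once the reference complex shows no hydrophobic ligand atom nearby. Real interaction detection uses type-specific geometric criteria rather than one shared cutoff.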

The researchers also proposed a deep generative model called DeepICL for the inverse design of ligands. It generates ligand atoms sequentially, based on the three-dimensional environment of the pocket and the interaction conditions from the first stage.
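The atom-by-atom generation described above follows a generic sequential loop, sketched below. This is not DeepICL itself: `toy_proposer` is a hand-written stand-in for the learned model, and the function names, the 3.0 Å offset, and the role pairing are all assumptions made so that the loop can run.

```python
# Assumed complementary ligand role for each protein category.
PARTNER = {
    "anion": "cation",
    "cation": "anion",
    "hbond_donor": "hbond_acceptor",
    "hbond_acceptor": "hbond_donor",
    "aromatic": "aromatic",
    "hydrophobic": "hydrophobic",
}


def generate_ligand(pocket_xyz, conditions, propose_atom, max_atoms=50):
    """Add ligand atoms one at a time until the proposer signals a stop."""
    ligand = []
    while len(ligand) < max_atoms:
        atom = propose_atom(pocket_xyz, conditions, ligand)
        if atom is None:  # stop signal from the proposer
            break
        ligand.append(atom)
    return ligand


def toy_proposer(pocket_xyz, conditions, ligand):
    """Stand-in for a learned model: answer each interacting pocket atom once."""
    targets = [
        (xyz, cond)
        for xyz, cond in zip(pocket_xyz, conditions)
        if cond in PARTNER
    ]
    if len(ligand) >= len(targets):
        return None
    (x, y, z), cond = targets[len(ligand)]
    # Place the complementary atom 3.0 units above the pocket atom (arbitrary).
    return {"role": PARTNER[cond], "xyz": (x, y, z + 3.0)}


pocket_xyz = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
conditions = ["anion", "non_interacting", "hbond_donor"]
ligand = generate_ligand(pocket_xyz, conditions, toy_proposer)
```

The point of the sketch is the control flow: each new atom is proposed with access to the pocket, the conditions, and the partial ligand built so far.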

A target pocket can form different combinations of protein-ligand interaction types depending on the bound ligand and its binding pose. The team's goal was to inversely design ligands that satisfy a specified combination of interactions, using the 3D conditional generative model DeepICL, which can be applied to any type of protein. The researchers use local interaction conditions in the subpockets where the ligand should bind, rather than the entire interaction information, to prevent undesirable biases toward specific pockets or ligand structures.
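One way to picture local conditioning is to keep interaction conditions only for pocket atoms near a chosen subpocket and blank out the rest. The sketch below assumes a spherical subpocket defined by a center and a 6.0 Å radius; that definition, the function name, and the labels are illustrative assumptions, not details from the paper.

```python
import math


def local_conditions(pocket_xyz, conditions, center, radius=6.0):
    """Keep conditions for atoms within `radius` of `center`; blank the others."""
    return [
        cond if math.dist(xyz, center) <= radius else "non_interacting"
        for xyz, cond in zip(pocket_xyz, conditions)
    ]


pocket_xyz = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
conditions = ["anion", "hbond_donor", "aromatic"]

# Hypothetical subpocket centered near the first two atoms.
subpocket = local_conditions(pocket_xyz, conditions, center=(2.0, 0.0, 0.0))
```

Here the distant aromatic atom no longer constrains generation, so the design is steered only by the interactions available in the targeted subpocket.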

Illustration: Example of interaction-aware conditional ligand elaboration. (Source: Paper)

To demonstrate the framework's ability to perform general structure-based drug design, the researchers did not use typical benchmarks consisting of 10^5 to 10^7 computer-generated protein-ligand binding structures. Instead, they used only approximately 10^4 real crystal structures selected from the PDBbind database, on the grounds that a model that generalizes well can extract appropriate features even from small-scale data.

Illustration: Generalizability of the generative framework. (Source: Paper)

The researchers evaluated their model by analyzing several properties of the ligands generated for unseen targets: binding pose stability, affinity, geometric patterns, diversity, and novelty.

Illustration: Ligand design with selectivity controlled through site-specific interaction conditions. (Source: Paper)

The researchers used the model to solve practical problems where specific interaction sites play a critical role, demonstrating the applicability of their approach to structure-based drug design.

Paper link: https://www.nature.com/articles/s41467-024-47011-2

Statement:
This article is reproduced from jiqizhixin.com. If there is any infringement, please contact admin@php.cn for removal.