With accuracy comparable to AlphaFold, EPFL's AI method matches protein interactions from sequences

WBOY
2024-07-16 01:18:30

1. The importance of protein interactions

Proteins are the cornerstone of life and participate in almost all biological processes. Understanding how proteins interact is critical to explaining the complexity of cellular function.

2. New method: pairing interacting protein sequences

Anne-Florence Bitbol's team at École Polytechnique Fédérale de Lausanne (EPFL) has proposed a method for pairing interacting protein sequences. The method exploits protein language models trained on multiple sequence alignments.

3. Method advantages

The method performs well on small data sets and can improve the prediction of protein complex structures by supervised methods.

4. Research results published

The study, titled "Pairing interacting protein sequences using masked language modeling", was published in PNAS on June 24, 2024.

Prediction of protein-protein interactions

Protein-protein interactions are crucial for cellular function: they ensure the specificity of signaling and underlie the formation of multi-protein complexes such as molecular motors and receptors. Predicting which proteins interact, and the structures of the complexes they form, is an important problem in computational biology and biophysics.

Although deep learning methods such as AlphaFold have made significant progress in predicting the structures of protein monomers, predictions for complexes remain less accurate and more variable in quality. AlphaFold starts by building a multiple sequence alignment (MSA) of homologs of the query sequence, and the quality of this MSA is critical to prediction accuracy.

Paired MSAs for heteromultimers

For protein complexes involving several different chains (heteromultimers), a paired MSA can provide coevolutionary information between interaction partners and help infer inter-chain contacts. Constructing a correctly paired MSA is challenging, however, especially in eukaryotes, where many homologous proteins coexist and pairing cannot rely on genomic proximity.
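To make the idea of a paired MSA concrete, here is a minimal Python sketch. It assumes each MSA is a list of `(species, aligned_sequence)` records and that a pairing is already known; the function names, the species labels, and the toy sequences are all hypothetical and chosen only for illustration. The second function counts how many one-to-one pairings are possible, which hints at why the problem becomes hard when a species has many homologs.

```python
import math
from collections import defaultdict


def group_by_species(msa):
    """Group aligned sequences by species label, keeping input order."""
    groups = defaultdict(list)
    for species, aligned_seq in msa:
        groups[species].append(aligned_seq)
    return groups


def build_paired_msa(msa_a, msa_b, pairing):
    """Concatenate matched rows of two MSAs into one paired MSA.

    `pairing` maps each species to a list of (index_in_a, index_in_b) tuples.
    It is given as input here; finding it is the hard part in practice.
    """
    groups_a, groups_b = group_by_species(msa_a), group_by_species(msa_b)
    paired = []
    for species, matches in pairing.items():
        for i, j in matches:
            paired.append((species, groups_a[species][i] + groups_b[species][j]))
    return paired


def count_one_to_one_pairings(msa_a, msa_b):
    """Count candidate one-to-one pairings across all shared species."""
    groups_a, groups_b = group_by_species(msa_a), group_by_species(msa_b)
    total = 1
    for species in groups_a.keys() & groups_b.keys():
        n_a, n_b = len(groups_a[species]), len(groups_b[species])
        # Ways to give every sequence of the smaller set a distinct partner.
        total *= math.perm(max(n_a, n_b), min(n_a, n_b))
    return total


# Toy data (made up): two families, two species.
msa_a = [("species_1", "MK-A"), ("species_1", "MR-A"), ("species_2", "MKQA")]
msa_b = [("species_1", "LLG"), ("species_1", "LIG"), ("species_2", "LVG")]
pairing = {"species_1": [(0, 1), (1, 0)], "species_2": [(0, 0)]}

paired_msa = build_paired_msa(msa_a, msa_b, pairing)
n_candidates = count_one_to_one_pairings(msa_a, msa_b)
```

With ten homologs per family in a single species, the same count would already be 10! = 3,628,800 candidate pairings for that species alone.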

Co-evolution method

At present, this challenge is addressed by combining genomic proximity, approximate homology, phylogeny-based methods, and coevolution strategies. Although coevolution methods require large amounts of data, they still show potential for optimizing pairings and predicting complex structures, in particular by matching homologous proteins so as to maximize the coevolutionary signal.
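As a rough illustration of what matching by maximizing a coevolutionary signal means within one species, the sketch below takes a table of pairwise scores and searches all one-to-one matchings for the one with the highest total. The score values and the name `best_matching` are made up for illustration; real methods derive their scores from the alignments and avoid brute-force enumeration, whose cost grows factorially.

```python
from itertools import permutations


def best_matching(score):
    """Return the one-to-one matching with the highest total score.

    score[i][j] is a (hypothetical) coevolution score between sequence i of
    family A and sequence j of family B in the same species. Brute force is
    only feasible for a handful of sequences.
    """
    n = len(score)
    best_perm, best_total = None, float("-inf")
    for perm in permutations(range(n)):
        total = sum(score[i][perm[i]] for i in range(n))
        if total > best_total:
            best_perm, best_total = perm, total
    return best_perm, best_total


# Made-up scores for three sequences per family.
score = [
    [0.9, 0.1, 0.2],
    [0.3, 0.2, 0.8],
    [0.1, 0.7, 0.3],
]
best_perm, best_total = best_matching(score)
```

Here `best_perm[i]` gives the index of the family B sequence matched to sequence `i` of family A.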

DiffPALM: A Differentiable Pairing Method

Anne-Florence Bitbol's team at EPFL has developed a method for pairing interacting protein sequences that harnesses the power of protein language models trained on multiple sequence alignments (MSAs), such as MSA Transformer and AlphaFold's EvoFormer module. This allows it to capture and predict complex interactions between proteins with a high degree of accuracy.

Building on these models, the researchers proposed Differentiable Pairing using Alignment-based Language Models (DiffPALM), a differentiable method that predicts matches between homologous proteins using masked language modeling (MLM).
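The article does not describe how a discrete pairing is made differentiable. One standard relaxation, shown here purely as an assumption and not as the authors' implementation, is Sinkhorn normalization: a score matrix is turned into a "soft permutation" whose rows and columns each sum to one, so that gradients can flow through it. The function name, the scores, and the parameter values below are hypothetical.

```python
import math


def sinkhorn(scores, temperature=1.0, n_iters=200):
    """Relax a square score matrix into an approximately doubly stochastic one.

    Alternately normalizes rows and columns of exp(scores / temperature).
    Lower temperatures give sharper, more permutation-like matrices but
    need more iterations to converge.
    """
    n = len(scores)
    top = max(max(row) for row in scores)  # subtract max for numerical stability
    p = [[math.exp((s - top) / temperature) for s in row] for row in scores]
    for _ in range(n_iters):
        p = [[v / sum(row) for v in row] for row in p]  # rows sum to 1
        col_sums = [sum(p[i][j] for i in range(n)) for j in range(n)]
        p = [[p[i][j] / col_sums[j] for j in range(n)] for i in range(n)]  # columns sum to 1
    return p


# Made-up pairing scores between three sequences of each family.
scores = [
    [2.0, 0.5, 0.1],
    [0.2, 0.3, 1.8],
    [0.4, 1.5, 0.2],
]
soft = sinkhorn(scores, temperature=0.5)

# Read off a hard pairing by taking the largest entry in each row.
hard = [max(range(len(row)), key=row.__getitem__) for row in soft]
```

In a trainable setting the scores would be parameters updated by gradient descent on some loss, such as a masked language modeling loss computed on the paired alignment; that part is omitted here.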

Figure: DiffPALM performance on a small HK-RR (histidine kinase and response regulator) MSA. (Source: paper)

DiffPALM outperforms existing coevolution methods by a large margin on a difficult benchmark of shallow MSAs extracted from data sets of ubiquitous prokaryotic proteins. Its performance improves further, and rapidly, when known interacting pairs are provided as examples.

Coevolution-based pairing methods exploit the way protein sequences evolve together over time when they interact closely: a change in one protein may lead to changes in its interaction partner. This signal is of great importance in molecular and cell biology, and protein language models trained on MSAs capture it well.
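One classical way to quantify such a coevolutionary signal is the mutual information between an alignment column of one protein and a column of its partner. The sketch below uses made-up columns and is only meant to show that correctly paired rows carry signal while mismatched rows do not; it is not the scoring used by DiffPALM, which relies on a language model.

```python
import math
from collections import Counter


def mutual_information(col_x, col_y):
    """Mutual information (in nats) between two equally long alignment columns."""
    n = len(col_x)
    count_x, count_y = Counter(col_x), Counter(col_y)
    count_xy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi


# Made-up columns: residue in protein A, and residue in its partner B.
col_a = list("KKEEKKEE")
col_b_paired = list("DDRRDDRR")    # rows correctly paired: K goes with D, E with R
col_b_shuffled = list("DRDRDRDR")  # rows mismatched: no association left

mi_paired = mutual_information(col_a, col_b_paired)
mi_shuffled = mutual_information(col_a, col_b_shuffled)
```

In this toy case the correctly paired columns reach the maximum possible value for two equally frequent residues, log 2, while the mismatched columns give zero.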

Figure: AlphaFold-Multimer (AFM) performance using different pairing methods. (Source: paper)

The team then applied DiffPALM to the difficult problem of matching homologs in eukaryotic protein complexes. To do this, the researchers used sequences paired by DiffPALM as input to AlphaFold-Multimer (AFM). For some of the complexes tested, using DiffPALM significantly improved AFM's structure predictions. It also achieved performance comparable to ortholog-based pairing.

Figure: Impact of positive examples and MSA depth, and extension to another pair of protein families. (Source: paper)

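The applications of DiffPALM in basic protein biology are evident, but its reach extends further: it has the potential to become a powerful tool for medical research and drug development. For example, accurately predicting protein interactions can help in understanding disease mechanisms and developing targeted therapies.

The researchers have made DiffPALM freely available, hoping that the scientific community will adopt it widely to further advance computational biology and enable researchers to explore the complexity of protein interactions.

By combining advanced machine learning techniques with efficient handling of complex biological data, DiffPALM marks a significant step forward for computational biology.

It not only enhances scientists' understanding of protein interactions but also opens new avenues for medical research, with the potential to bring breakthroughs in disease treatment and drug development.

Paper link: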
https://www.pnas.org/doi/10.1073/pnas.2311887121

Related coverage:
https://phys.org/news/2024-06-ai-based-approach-protein-interaction.html
