With accuracy comparable to AlphaFold, EPFL's AI method matches protein interactions from sequences
Proteins are the cornerstone of life and participate in almost all biological processes. Understanding how proteins interact is critical to explaining the complexity of cellular function.
1. New method: pairing interacting protein sequences
Anne-Florence Bitbol’s team at École Polytechnique Fédérale de Lausanne (EPFL) has proposed a method for pairing interacting protein sequences. The method exploits the power of protein language models trained on multiple sequence alignments.
2. Method advantages
The method performs well on small datasets and can improve the structure prediction of protein complexes by supervised methods such as AlphaFold-Multimer.
3. Research results published
The research, titled "Pairing interacting protein sequences using masked language modeling", was published in PNAS on June 24, 2024.
Prediction of protein-protein interactions
Protein-protein interactions are crucial for cellular function: they ensure the specificity of signaling and underlie the formation of multi-protein complexes such as molecular motors and receptors. Predicting protein-protein interactions and the structures of the resulting complexes is an important topic in computational biology and biophysics.
Although deep learning methods such as AlphaFold have made major progress in predicting the structures of protein monomers, structure prediction for protein complexes still lags behind monomer prediction, and its accuracy varies widely from one complex to another. AlphaFold first builds a multiple sequence alignment (MSA) of homologs of the query sequence, and the quality of this MSA is critical to prediction accuracy.
Paired MSA of heteromultimers
For protein complexes made of several different chains (heteromultimers), a paired MSA, in which each row concatenates homologs of the interaction partners, can provide coevolutionary information between the partners and help infer inter-chain contacts. Constructing a correctly paired MSA is challenging, however, especially in eukaryotes, where many proteins have several paralogs and interaction partners usually cannot be identified from genomic proximity.
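As an illustration of what "paired" means here, the following minimal Python sketch concatenates rows of two MSAs once a within-species pairing is known. It assumes each MSA is a list of (species, aligned sequence) tuples and that the pairing is given as index pairs per species; the function names and data layout are hypothetical and are not taken from the paper or from AlphaFold. Getting this pairing right is exactly the hard part described above.

```python
from collections import defaultdict


def group_by_species(msa):
    """Group (species, aligned_sequence) rows of an MSA by species, keeping order."""
    groups = defaultdict(list)
    for species, seq in msa:
        groups[species].append(seq)
    return groups


def build_paired_msa(msa_a, msa_b, pairing):
    """Concatenate rows of two MSAs into one paired MSA.

    msa_a, msa_b: lists of (species, aligned_sequence) tuples (hypothetical layout).
    pairing: dict mapping species -> list of (i, j) tuples, meaning that the
        i-th family-A sequence of that species is paired with the j-th
        family-B sequence of the same species.
    """
    groups_a = group_by_species(msa_a)
    groups_b = group_by_species(msa_b)
    paired = []
    for species in sorted(pairing):
        for i, j in pairing[species]:
            # Each paired row is the A sequence followed by its B partner.
            paired.append(groups_a[species][i] + groups_b[species][j])
    return paired


if __name__ == "__main__":
    msa_a = [("sp1", "AC-"), ("sp1", "AD-"), ("sp2", "GC-")]
    msa_b = [("sp1", "KL"), ("sp1", "KM"), ("sp2", "RL")]
    pairing = {"sp1": [(0, 1), (1, 0)], "sp2": [(0, 0)]}
    print(build_paired_msa(msa_a, msa_b, pairing))  # ['AC-KM', 'AD-KL', 'GC-RL']
```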
Co-evolution method
Current approaches to this challenge combine genomic proximity, orthology, phylogeny-based methods, and coevolution. Among them, coevolution-based methods require large amounts of data, but they have shown promise for optimizing pairings and predicting complex structures, in particular by matching homologous proteins so as to maximize the coevolutionary signal.
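To show what "maximizing the coevolutionary signal" can mean in practice, here is a toy sketch that scores a paired MSA by the summed mutual information (MI) between columns of the two partners and exhaustively tries every pairing within one species. The MI score, the brute-force search, and all names are illustrative assumptions; published coevolution-based methods use richer statistical models and far more scalable search, since the number of permutations grows factorially.

```python
import math
from collections import Counter
from itertools import permutations


def column_mi(col_x, col_y):
    """Mutual information (in nats) between two alignment columns of equal length."""
    n = len(col_x)
    px, py = Counter(col_x), Counter(col_y)
    pxy = Counter(zip(col_x, col_y))
    # sum of p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def inter_mi_score(rows_a, rows_b):
    """Sum of MI over all (column of A, column of B) pairs; row k of A pairs with row k of B."""
    cols_a = list(zip(*rows_a))
    cols_b = list(zip(*rows_b))
    return sum(column_mi(ca, cb) for ca in cols_a for cb in cols_b)


def best_pairing_in_species(fixed_a, fixed_b, species_a, species_b):
    """Try every pairing of one species' sequences, keeping other pairs fixed.

    fixed_a, fixed_b: already-paired sequences from other species.
    species_a, species_b: unpaired sequences of one species (same count).
    Returns (perm, score), where A row k pairs with B row perm[k].
    """
    best_perm, best_score = None, -math.inf
    for perm in permutations(range(len(species_b))):
        rows_a = fixed_a + species_a
        rows_b = fixed_b + [species_b[k] for k in perm]
        score = inter_mi_score(rows_a, rows_b)
        if score > best_score:
            best_perm, best_score = perm, score
    return best_perm, best_score


if __name__ == "__main__":
    perm, score = best_pairing_in_species(["A", "C", "A", "C"], ["K", "R", "K", "R"],
                                          ["A", "C"], ["R", "K"])
    print(perm, round(score, 4))  # (1, 0) 0.6931
```

In the example, the already-paired rows link A with K and C with R, so the pairing that reproduces this pattern (A row 0 with B row 1, A row 1 with B row 0) yields the highest MI.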
DiffPALM: A Differentiable Pairing Method
Anne-Florence Bitbol’s team at EPFL has developed a method for pairing interacting protein sequences that leverages protein language models trained on multiple sequence alignments (MSAs), such as MSA Transformer and the EvoFormer module of AlphaFold. This enables the method to predict which protein sequences interact with a high degree of accuracy.
Building on these models, the researchers proposed DiffPALM (Differentiable Pairing using Alignment-based Language Models), a differentiable method that uses masked language modeling (MLM) to predict how paralogs should be paired.
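DiffPALM's central idea is to make the search over pairings differentiable, so that it can be optimized with gradients of an MLM loss. A standard building block for such relaxations is the Sinkhorn operator, which turns a square score matrix into an approximately doubly stochastic "soft permutation". The plain-Python sketch below shows only that generic operator plus a brute-force readout of a hard permutation; treat it as an assumption about the kind of relaxation involved, not as DiffPALM's implementation, which also requires MSA Transformer, its MLM loss, and gradient-based optimization.

```python
import math
from itertools import permutations


def sinkhorn(scores, n_iters=50, tau=1.0):
    """Sinkhorn operator: alternately normalize rows and columns of exp(scores / tau).

    scores: square list of lists of real numbers (e.g. learnable parameters).
    tau: temperature; smaller values push the result toward a permutation matrix.
    """
    n = len(scores)
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(max(row) for row in scores)
    m = [[math.exp((x - top) / tau) for x in row] for row in scores]
    for _ in range(n_iters):
        # Normalize each row to sum to 1.
        row_sums = [sum(row) for row in m]
        m = [[v / s for v in row] for row, s in zip(m, row_sums)]
        # Normalize each column to sum to 1.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


def hard_permutation(soft):
    """Return the permutation p maximizing sum_i soft[i][p[i]] (brute force, small n only)."""
    n = len(soft)
    return max(permutations(range(n)), key=lambda p: sum(soft[i][p[i]] for i in range(n)))


if __name__ == "__main__":
    scores = [[2.0, 0.1, 0.0], [0.0, 0.2, 3.0], [0.1, 2.5, 0.0]]
    soft = sinkhorn(scores, tau=0.1)
    print(hard_permutation(soft))  # (0, 2, 1)
```

Lowering `tau` sharpens the soft matrix toward a permutation matrix; for realistic sizes, the brute-force readout would be replaced by an assignment solver such as the Hungarian algorithm.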
Graphic: DiffPALM performance on a small HK-RR MSA. (Source: paper)
DiffPALM outperforms existing coevolution-based methods by a large margin on a difficult benchmark of shallow MSAs extracted from datasets of ubiquitous prokaryotic proteins. Its performance improves further, and rapidly, when known interacting pairs are provided as examples.
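Benchmarks like this are usually summarized by the fraction of predicted pairs that are correct. The sketch below computes such a precision and, as an assumption about how positive examples would be handled, excludes any pairs supplied as known examples from the evaluation so that they do not inflate the score. The identifiers are hypothetical, and the paper's exact metric may differ.

```python
def pairing_precision(predicted, true_pairs, known_examples=()):
    """Fraction of predicted (A, B) pairs that are true interacting pairs.

    Pairs given as known positive examples are excluded from the evaluation.
    """
    evaluated = set(predicted) - set(known_examples)
    if not evaluated:
        return 0.0
    return len(evaluated & set(true_pairs)) / len(evaluated)


if __name__ == "__main__":
    predicted = {("HK1", "RR1"), ("HK2", "RR3"), ("HK3", "RR2"), ("HK4", "RR4")}
    true_pairs = {("HK1", "RR1"), ("HK2", "RR2"), ("HK3", "RR3"), ("HK4", "RR4")}
    print(pairing_precision(predicted, true_pairs))  # 0.5
    print(pairing_precision(predicted, true_pairs, {("HK1", "RR1")}))  # 0.3333333333333333
```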
Coevolution-based pairing methods build on how the sequences of closely interacting proteins evolve together over time: changes in one protein may lead to changes in its interaction partner. This coevolutionary signal, an important topic in molecular and cell biology, is well captured by protein language models trained on MSAs.
Graphic: AFM performance using different pairing methods. (Source: paper)
The team then applied DiffPALM to the challenging problem of pairing paralogs in eukaryotic protein complexes. To do so, the researchers used DiffPALM-paired sequences as input to AlphaFold-Multimer (AFM). For some of the complexes tested, using DiffPALM substantially improved AFM's structure predictions. It also achieved performance comparable to that of orthology-based pairing.
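For context on the orthology-based baseline, a common heuristic is to keep, in each species, the homolog most similar to each query chain and to pair these best hits. The sketch below uses plain sequence identity over aligned positions as the similarity measure; that measure, the data layout, and the function names are assumptions for illustration and do not reproduce AlphaFold-Multimer's actual pairing pipeline.

```python
def identity(seq1, seq2):
    """Fraction of identical positions between two aligned sequences of equal length."""
    return sum(a == b for a, b in zip(seq1, seq2)) / len(seq1)


def best_per_species(query, msa):
    """Keep, for each species, the sequence with the highest identity to the query."""
    best = {}
    for species, seq in msa:
        score = identity(query, seq)
        if species not in best or score > best[species][0]:
            best[species] = (score, seq)
    return {sp: seq for sp, (_, seq) in best.items()}


def best_hit_pairing(query_a, query_b, msa_a, msa_b):
    """Build paired rows from per-species best hits; the paired queries come first.

    msa_a, msa_b: lists of (species, aligned_sequence) tuples (hypothetical layout).
    Species present in only one MSA contribute no paired row.
    """
    best_a = best_per_species(query_a, msa_a)
    best_b = best_per_species(query_b, msa_b)
    rows = [query_a + query_b]
    for species in sorted(best_a.keys() & best_b.keys()):
        rows.append(best_a[species] + best_b[species])
    return rows


if __name__ == "__main__":
    msa_a = [("sp1", "ACDF"), ("sp1", "GGDE"), ("sp2", "ACDE")]
    msa_b = [("sp1", "KLMA"), ("sp1", "KAAA"), ("sp3", "KLMN")]
    print(best_hit_pairing("ACDE", "KLMN", msa_a, msa_b))  # ['ACDEKLMN', 'ACDFKLMA']
```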
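Illustration: Impact of positive examples, MSA depth, and extension to another pair of protein families. (Source: paper)
DiffPALM has obvious applications in fundamental protein biology, but its reach extends further: it has the potential to become a powerful tool for medical research and drug development. For example, accurately predicting protein interactions can help elucidate disease mechanisms and develop targeted therapies.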
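The researchers have made DiffPALM freely available, in the hope that the scientific community will adopt it widely to further advance computational biology and to help researchers explore the complexity of protein interactions.
By combining advanced machine learning techniques with efficient processing of complex biological data, DiffPALM marks a significant step forward for computational biology.
It not only deepens scientists' understanding of protein interactions but also opens new avenues for medical research, potentially leading to breakthroughs in disease treatment and drug development.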
Paper link:
https://www.pnas.org/doi/10.1073/pnas.2311887121
Related coverage:
https://phys.org/news/2024-06-ai-based-approach-protein-interaction.html