
Hong Liang's research group at Shanghai Jiao Tong University and the Shanghai AI Laboratory team release FSFP, a few-shot protein function prediction method based on language models, published in Nature Communications

Wang Lin | Original | 2024-07-11 20:10:28


Editor | ScienceAI

Recently, the research group of Professor Hong Liang of the Institute of Natural Sciences / School of Physics and Astronomy / Zhangjiang Institute for Advanced Study / School of Pharmacy at Shanghai Jiao Tong University, together with young researchers from the Shanghai Artificial Intelligence Laboratory, made an important breakthrough in protein mutation-property prediction.

This work adopts a new training strategy that greatly improves the performance of pre-trained protein language models on mutation-property prediction using very little wet-lab data.

The research results, titled "Enhancing the efficiency of protein language models with minimal wet-lab data through few-shot learning", were published in Nature Communications on July 2, 2024.


Paper link:
  • https://www.nature.com/articles/s41467-024-49798-6

Research background

Enzyme engineering requires mutating and screening proteins to obtain better protein products. Traditional wet-lab methods rely on repeated experimental iterations, which is time-consuming and labor-intensive.

Deep learning methods can accelerate protein engineering, but they require large amounts of protein mutation data for training, and obtaining high-quality mutation data through traditional wet experiments is difficult.

There is an urgent need for a method that can accurately predict mutation effects on protein function without large amounts of wet-lab data.

Research Method

This study proposes FSFP, a method that combines meta-learning, learning to rank, and parameter-efficient fine-tuning to train a pre-trained protein model with only dozens of wet-lab data points, greatly improving mutation-property prediction.

FSFP Method:

  • Use the pre-trained protein model to evaluate the similarity between the target protein and the proteins in ProteinGym.
  • Select the two ProteinGym datasets closest to the target protein as meta-learning auxiliary tasks.
  • Use GEMME's scores for the target protein as the third auxiliary task.
  • Fine-tune the pre-trained protein model on a small amount of wet-lab data, using a learning-to-rank loss function and LoRA.
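The learning-to-rank objective in the last step can be illustrated with a minimal sketch. The paper's exact loss is not reproduced in this article; the hypothetical `pairwise_ranking_loss` below shows the general idea: the model is penalised only for getting the relative order of mutants wrong, not for their absolute scores.

```python
import numpy as np

def pairwise_ranking_loss(pred, target, margin=0.1):
    """Hinge-style pairwise ranking loss: for every pair (i, j) where
    target[i] > target[j], penalise predictions that fail to keep
    pred[i] above pred[j] by at least `margin`."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    diff_t = target[:, None] - target[None, :]  # positive where i should outrank j
    diff_p = pred[:, None] - pred[None, :]
    mask = diff_t > 0                           # pairs with a known ordering
    losses = np.maximum(0.0, margin - diff_p[mask])
    return losses.mean() if mask.any() else 0.0
```

With only ~20 labelled mutants, ranking losses like this extract more signal than regression, because every ordered pair of measurements contributes to training.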

Test results show that even when the model's original prediction correlation is below 0.1, FSFP can raise the correlation above 0.5 after training on only 20 wet-lab data points.
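The correlation used to benchmark such predictors is typically Spearman's rank correlation between predicted and experimentally measured fitness. A minimal pure-Python sketch (assuming no tied values; production code would use a library routine such as `scipy.stats.spearmanr`, which handles ties):

```python
def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.
    Simplified sketch that assumes no ties in x or y."""
    def ranks(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        r = [0] * len(v)
        for rank, idx in enumerate(order):
            r[idx] = rank
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

A value of 1.0 means the model orders all mutants exactly as the experiment does; 0.1 versus 0.5 is therefore a large practical difference in screening usefulness.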


Illustration: FSFP overview. (Source: paper)

Research results
To further examine the effectiveness of FSFP, the authors conducted a wet-lab case study on engineering the protein Phi29. Trained with only 20 wet-lab data points, FSFP improved the positive rate among the top-20 predicted single-point mutations by 25% relative to the original pre-trained model ESM-1v, and identified nearly 10 new positive single-point mutations.
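The positive rate reported here counts how many of the model's top-K ranked mutations turn out to be experimentally beneficial. A small hypothetical helper makes the metric concrete:

```python
def top_k_positive_rate(preds, is_positive, k=20):
    """Fraction of the k highest-scoring mutations that are true positives.

    preds       -- model score per candidate mutation (higher = better)
    is_positive -- 1 if wet-lab experiment confirmed improvement, else 0
    """
    order = sorted(range(len(preds)), key=lambda i: preds[i], reverse=True)
    top = order[:k]
    return sum(is_positive[i] for i in top) / k
```

This is the quantity a protein engineer cares about in practice: of the K variants actually synthesized and tested, how many were worth making.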


Illustration: Engineering Phi29 using FSFP. (Source: paper)

Summary

In this work, the authors propose FSFP, a new fine-tuning method built on pre-trained protein models.

FSFP combines meta-learning, learning to rank, and parameter-efficient fine-tuning to train a pre-trained protein model with only 20 randomly selected wet-lab data points, greatly improving the model's positive rate in single-point mutation prediction.
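LoRA, the parameter-efficient fine-tuning technique used above, freezes the pretrained weight matrix and learns only a low-rank additive update, so few-shot training touches a tiny fraction of the model's parameters. A minimal NumPy sketch (forward pass only; class and parameter names are illustrative, not the paper's implementation):

```python
import numpy as np

class LoRALinear:
    """Frozen pretrained weight W plus a trainable low-rank update
    (alpha / r) * B @ A. Only A and B would be updated during training."""

    def __init__(self, W, r=4, alpha=8, rng=None):
        rng = rng or np.random.default_rng(0)
        d_out, d_in = W.shape
        self.W = W                                        # frozen
        self.A = rng.normal(scale=0.01, size=(r, d_in))   # trainable
        self.B = np.zeros((d_out, r))                     # zero-init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        # Effective weight is W + scaled low-rank correction
        return x @ (self.W + self.scale * (self.B @ self.A)).T
```

Because B starts at zero, the adapted model is initially identical to the pretrained one, and only r * (d_in + d_out) parameters per layer are trained instead of d_in * d_out.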

These results show that FSFP can significantly shorten experimental cycles and reduce experimental costs in protein engineering.

Author information

Professor Hong Liang of the Institute of Natural Sciences / School of Physics and Astronomy / Zhangjiang Institute for Advanced Study and Tan Peng, a young researcher at the Shanghai Artificial Intelligence Laboratory, are the corresponding authors.

Postdoctoral fellow Zhou Ziyi of the School of Physics and Astronomy at Shanghai Jiao Tong University, master's student Zhang Liang, doctoral student Yu Yuanxi, and doctoral student Wu Banghao of the School of Life Science and Technology are the co-first authors.

