New SOTA for protein function prediction, statistics-based AI methods from Shanghai Institute of Technology, Oxford and others, published in Nature sub-journal


PHPz

Aug 22, 2024 pm 04:45 PM

Tags: ai, protein, theory, biology

Editor | KX

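Proteins drive nearly all fundamental biological activities by binding to other molecules. Understanding protein function is therefore essential for understanding health, disease, evolution, and how organisms work at the molecular level.

However, more than 200 million proteins remain uncharacterized, and computational methods rely heavily on protein structural information to predict annotations of varying quality.

Recently, a research team from the University of Oxford, ETH Zurich, the University of Shanghai for Science and Technology, and Beijing Normal University designed a statistics-informed graph network method called PhiGnet to facilitate functional annotation and the identification of functional sites in proteins.

PhiGnet not only outperforms other methods but also narrows the sequence-function gap, even when no structural information is available. The results show that applying deep learning to evolutionary data can highlight functional sites at the residue level, providing valuable support for interpreting and studying both the known properties and the new functions of proteins in biomedicine.

The study, titled "Accurate prediction of protein function using statistics-informed graph networks", was published in Nature Communications on August 4.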


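Paper link: https://www.nature.com/articles/s41467-024-50955-0

Understanding protein function is key to understanding the complex mechanisms behind many major biological activities, and it has broad implications for medicine, biotechnology, and drug development.

To date, more than 356 million proteins have been sequenced in the UniProt database (June 2023), and the vast majority (~80%) have no known functional annotation.

Deep learning methods have achieved remarkable accuracy in predicting protein 3D structures, surpassing classical approaches such as ab initio methods and homology modeling. However, accurately assigning functional annotations to proteins remains difficult, especially compared with experimental assays.

To address this challenge, the researchers hypothesized that the information contained in co-evolving residues can be used to annotate function at the residue level.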
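As a rough illustration of what "co-evolving residues" means, the sketch below scores how strongly two columns of a multiple sequence alignment vary together. The article does not say how the couplings are actually computed, so mutual information is used here only as a simple stand-in; the function name and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    A simple co-variation proxy, not the coupling statistic used in the paper.
    """
    n = len(msa)
    count_i = Counter(seq[i] for seq in msa)
    count_j = Counter(seq[j] for seq in msa)
    count_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * math.log2(p_ab * n * n / (count_i[a] * count_j[b]))
    return mi


# Toy alignment (hypothetical): columns 0 and 2 change together, column 1 is constant.
msa = ["AKD", "AKD", "GKE", "GKE"]
coupled = column_mutual_information(msa, 0, 2)
uncoupled = column_mutual_information(msa, 0, 1)
print(coupled, uncoupled)
```

Columns that always change together score high, while a fully conserved column scores zero against every other column, because it carries no co-variation signal.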

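The University of Oxford team proposes predicting protein function from sequence alone using a statistics-informed graph network. The method is inherently characterized by evolutionary features and allows a quantitative assessment of the importance of the residues that carry out a specific function.

The method uses knowledge derived from evolutionary data to drive two stacked graph convolutional networks. With this knowledge and the designed network architecture, it can accurately assign functional annotations to proteins and, importantly, quantify the importance of each residue with respect to a specific function.

PhiGnet for protein function annotation

PhiGnet uses a statistics-informed graph network to annotate protein function and to identify functional sites across species based on sequence.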


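Illustration: PhiGnet annotates protein function. (Source: paper)

To assimilate knowledge from evolutionary couplings (EVCs) and residue communities (RCs), the researchers designed a dual-channel architecture that uses stacked graph convolutional networks (GCNs). The method is specifically designed to assign functional annotations to proteins, including Enzyme Commission (EC) numbers and Gene Ontology (GO) terms (biological process, BP; cellular component, CC; and molecular function, MF).

Given a protein sequence, the study uses the pre-trained ESM-1b model to derive an embedding of the sequence. The embedding is then fed in as graph nodes, together with the EVCs and RCs as graph edges, to the six graph convolutional layers of the dual stacked GCNs. These layers work together with a block of two fully connected (FC) layers to process the information from the two GCNs and ultimately produce a tensor of probabilities that assesses how likely each functional annotation is to apply to the protein.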
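The data flow described above can be sketched in plain Python. This is a minimal illustration under several assumptions that the article does not confirm: three graph convolutional layers per channel (six in total), the common symmetric adjacency normalization, mean pooling over residues before the FC block, and random weights. All names are hypothetical, and random vectors stand in for the ESM-1b embeddings.

```python
import math
import random


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def edges_to_adjacency(n, edges):
    adj = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    return adj


def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 (an assumed choice)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_stack(x, adj, weights):
    """Apply one graph convolution (A_norm @ H @ W, then ReLU) per weight matrix."""
    a_norm = normalize_adjacency(adj)
    h = x
    for w in weights:
        h = relu(matmul(matmul(a_norm, h), w))
    return h


def dual_channel_forward(x, adj_evc, adj_rc, params):
    h_evc = gcn_stack(x, adj_evc, params["evc"])  # channel 1: EVC edges
    h_rc = gcn_stack(x, adj_rc, params["rc"])     # channel 2: RC edges
    h = [r1 + r2 for r1, r2 in zip(h_evc, h_rc)]  # concatenate per residue
    pooled = [[sum(col) / len(h) for col in zip(*h)]]  # mean over residues (assumed)
    hidden = relu(matmul(pooled, params["fc1"]))  # first FC layer
    logits = matmul(hidden, params["fc2"])[0]     # second FC layer
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]  # per-label probabilities


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_res, d_in, d_hid, n_labels = 5, 4, 3, 2
x = random_matrix(rng, n_res, d_in)  # stand-in for per-residue embeddings
adj_evc = edges_to_adjacency(n_res, [(0, 3), (1, 4), (2, 4)])
adj_rc = edges_to_adjacency(n_res, [(0, 1), (1, 2), (3, 4)])
dims = [(d_in, d_hid), (d_hid, d_hid), (d_hid, d_hid)]
params = {
    "evc": [random_matrix(rng, r, c) for r, c in dims],
    "rc": [random_matrix(rng, r, c) for r, c in dims],
    "fc1": random_matrix(rng, 2 * d_hid, d_hid),
    "fc2": random_matrix(rng, d_hid, n_labels),
}
probs = dual_channel_forward(x, adj_evc, adj_rc, params)
print(probs)
```

Both channels see the same node features and differ only in their edges, which is what lets each channel specialize in one kind of evolutionary signal before the FC block merges them.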

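In addition, activation scores derived with the gradient-weighted class activation mapping (Grad-CAM) method are used to assess the importance of each residue for a specific function. These scores allow PhiGnet to pinpoint functional sites at the level of individual residues.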
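The Grad-CAM idea can be sketched for a one-dimensional residue axis as follows. The sketch assumes that per-residue feature maps and the gradients of one function's score with respect to them are already available (in practice they would come from an automatic differentiation framework), and the rescaling to the range 0 to 1 is an assumption. The function name and toy numbers are hypothetical.

```python
def grad_cam_scores(activations, gradients):
    """Per-residue Grad-CAM scores from L x C activations and gradients.

    Channel weights are the gradients averaged over residues; each residue's
    score is the ReLU of its weighted channel sum, rescaled so the maximum is 1.
    """
    n_res = len(activations)
    n_ch = len(activations[0])
    alphas = [sum(gradients[i][c] for i in range(n_res)) / n_res for c in range(n_ch)]
    raw = [max(0.0, sum(a * v for a, v in zip(alphas, row))) for row in activations]
    top = max(raw)
    return [r / top if top > 0 else 0.0 for r in raw]


# Toy feature maps for three residues and two channels (hypothetical values).
activations = [[2.0, 0.0], [1.0, 0.5], [0.0, 1.0]]
gradients = [[1.0, -1.0], [1.0, -1.0], [1.0, -1.0]]
scores = grad_cam_scores(activations, gradients)
print(scores)  # [1.0, 0.25, 0.0]
```

Residues whose features push the predicted function's score up receive high values, and residues that only contribute through negatively weighted channels are clipped to zero.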

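For example, computing the RCs of serine-aspartate repeat-containing protein D (SdrD) showed that residues at functional sites are conserved through natural evolution. PhiGnet can capture this information, which improves protein function prediction at the residue level even when no structural data are available.

Annotating protein functional sites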

Are computational predictions as accurate as experimentally determined functional annotations? To address this question, the study used activation scores to quantitatively examine the contribution of each amino acid to protein function. The predictive performance of PhiGnet was evaluated and the importance of residues (their contribution to protein function) in nine proteins was assessed.


Illustration: PhiGnet annotates protein function at the residue level. (Source: paper)

Activation scores were calculated for each residue in the nine proteins and compared with residues identified either experimentally or by semi-manual annotation. PhiGnet showed good accuracy (average ≥ 75%) in predicting important sites at the residue level, in good agreement with actual ligand/ion/DNA binding sites. Residues with high activation scores correspond closely to the functionally important residues of the proteins.

Outperforms other state-of-the-art methods

To evaluate the prediction performance of PhiGnet, the method was applied to infer functional annotations (EC numbers and GO terms) of proteins in two benchmark sets. PhiGnet was compared with state-of-the-art methods, including alignment-based and deep learning-based methods, using two metrics: the protein-centric Fmax score and the area under the precision-recall curve (AUPR).

Illustration: Comparison between different methods on GO terms in different ontologies and EC numbers. (Source: paper)

PhiGnet demonstrates the predictive power of assigning functional annotations to proteins in two test sets. It achieves average AUPR of 0.70 and 0.89, and Fmax scores of 0.80 and 0.88 for GO terms and EC numbers, respectively.
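For readers unfamiliar with the protein-centric Fmax score, the sketch below follows the definition commonly used in function prediction benchmarks: at each score threshold, precision is averaged over proteins with at least one predicted term, recall is averaged over all proteins, and Fmax is the best resulting F1 across thresholds. The article does not spell out the exact evaluation protocol, so the threshold grid, the function name, and the toy labels are assumptions.

```python
def fmax(y_true, y_score, thresholds=None):
    """Protein-centric Fmax.

    y_true:  list of non-empty sets of true terms, one set per protein.
    y_score: list of dicts mapping term -> predicted score, one per protein.
    """
    if thresholds is None:
        thresholds = [t / 100 for t in range(1, 100)]  # assumed grid
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for truth, scores in zip(y_true, y_score):
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & truth)
            if predicted:  # precision only counts proteins with predictions
                precisions.append(hits / len(predicted))
            recalls.append(hits / len(truth))
        if not precisions:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best


# Two toy proteins with hypothetical term labels and scores.
y_true = [{"GO:1", "GO:2"}, {"GO:3"}]
y_score = [
    {"GO:1": 0.9, "GO:2": 0.4, "GO:3": 0.2},
    {"GO:3": 0.8, "GO:1": 0.5},
]
result = fmax(y_true, y_score)
print(round(result, 3))  # 0.857
```

Because the threshold is chosen to maximize F1, Fmax rewards a good ranking of terms per protein rather than well-calibrated probabilities, which is why it is usually reported alongside AUPR.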

Overall, PhiGnet significantly outperforms all supervised and unsupervised methods on the benchmark dataset.

Additionally, the robustness of PhiGnet's generalization was demonstrated on test proteins filtered at different sequence identity thresholds relative to the proteins in the training set. Across the maximum sequence identity levels (30%, 40%, 50%, 70%, and 95%), PhiGnet's prediction performance improved as sequence identity increased.

Driven by evolutionary signatures

Evolutionary data plays an important role in PhiGnet and can be used to predict protein functional annotations and identify functional sites. First, ablation experiments were performed to test the contribution of EVCs and RCs to PhiGnet. The experiments show that PhiGnet can accurately assign protein functional annotations. Furthermore, PhiGnet using either EVCs or RCs alone demonstrates a strong ability to learn general sequence-function relationships, often performing as well as or better than other methods.

Second, the ability of PhiGnet to characterize meaningful features from identified functionally relevant residues in residue communities was further investigated. Activation scores of residues were calculated to emphasize their contribution to protein function. Notably, the predicted residues are consistent with those of functional sites determined by experimental assays and are better identified than those in RC.


Illustration: PhiGnet learns evolutionary features to identify protein functional sites. (Source: Paper)

The study shows that evolutionary information, especially the information contained in RCs, is sufficient to specify the function of a protein and to quantitatively characterize the residues of functional sites. RCs also contain higher-order evolutionary knowledge than the lower-order information in EVCs. At the same time, the information contained in RCs plays an important role in enhancing PhiGnet's ability to identify functionally relevant sites at the residue level.

Successes and Limitations

In summary, the better performance of PhiGnet can be attributed to its utilization of evolutionary data of protein sequences and higher-order patterns of the data, allowing for a deeper and more accurate understanding of protein function.

PhiGnet’s primary success is its use of statistics-informed graph convolutional networks to facilitate hierarchical learning of evolutionary data from massive sequence datasets. This approach significantly surpasses existing supervised and unsupervised methods and can be used to guide future biological and clinical experiments.

Limitations of the PhiGnet method include the bias/noise that occurs in protein families with low sequence diversity. Incorporating (co)evolutionary information into PhiGnet may affect the accurate identification of residue communities, especially if the information comes from highly conserved protein families. While integrating physically extracted knowledge into PhiGnet achieves significant improvements over other approaches, significant challenges remain in interpreting the learning mechanisms in PhiGnet.

The synergy between evolutionary data and machine learning will pave the way to accurately determine and engineer the biophysical properties of proteins.
