Chinese Academy of Sciences research team releases two papers: the first cross-species foundation model of life, and a new AI model for cell fate prediction

Author | Chinese Academy of Sciences Multidisciplinary Research Team

Editor | ScienceAI

The Human Genome Project, known as one of the three major scientific projects of the 20th century, kicked off an in-depth analysis of the mysteries of life. Because life processes are multi-dimensional and highly dynamic, traditional experimental methods struggle to systematically and accurately decipher the common laws underlying the genetic code. Powerful computing technology is urgently needed for representation modeling and knowledge discovery on genetic data.

Artificial intelligence technology centered on large models has triggered revolutions in fields such as computer vision and natural language understanding, demonstrating a deep grasp of data and knowledge. It is expected to be applied to life science research to systematically and accurately decipher the common laws underlying the genetic code.

Recently, the Xcompass Consortium, a multidisciplinary research team at the Chinese Academy of Sciences, made an important breakthrough in applying artificial intelligence to life science research: it built GeneCompass, described as the world's first cross-species foundation model of life. The model is trained on transcriptome data from more than 126 million single cells of humans and mice, and integrates four types of prior knowledge, including promoter sequences and gene co-expression relationships. With 130 million parameters, the foundation model learns a panoramic view of gene expression regulation and supports both prediction of cell state changes and accurate analysis of various life processes, demonstrating the potential of artificial intelligence for life science research.

The study is titled "GeneCompass: Deciphering Universal Gene Regulatory Mechanisms with Knowledge-Informed Cross-Species Foundation Model" and was published on bioRxiv.

Paper link: https://www.biorxiv.org/content/10.1101/2023.09.26.559542v1

In addition, the team simultaneously released CellPolaris, a gene regulatory network generation model based on transfer learning, which can accurately identify core factors for cell fate conversion and simulate transcription factor perturbations.

The research is titled "CellPolaris: Decoding Cell Fate through Generalization Transfer Learning of Gene Regulatory Networks" and was published on bioRxiv.

Paper link: https://www.biorxiv.org/content/10.1101/2023.09.25.559244v1

GeneCompass: The first cross-species foundation model of life

Individual mammals typically contain tens of thousands to tens of trillions of cells. Although all cells in an individual contain the same genetic sequence, the fate and function of each cell vary widely with its unique spatiotemporal context. This sophisticated life process is controlled by a complex system of gene expression regulation.

To deepen our understanding of the essential laws of life and to innovate in the diagnosis and treatment of major diseases, the gene regulation mechanisms that are ubiquitous in life need to be explored in more depth. However, traditional research methods have low throughput and are limited to a single model organism, so they cannot reveal complex gene regulatory mechanisms. In recent years, breakthroughs in single-cell omics technology have produced large amounts of gene expression profile data from different cell types, providing a data basis for interpreting gene-gene interactions. At the same time, advances in deep learning, especially the emergence of large generative models, make it possible to learn nonlinear regulatory mechanisms across different cell states from massive data, bringing unprecedented opportunities to life science research.

A cross-species foundation model of life with more than 120 million cells and 130 million parameters

Currently, the single-cell transcriptome data available worldwide for any single species amounts to only tens of millions of cells, which is not enough to fully support training foundation models for analyzing complex life processes.

The team collected open-source single-cell transcriptome data from different species and, through preprocessing steps such as screening, cleaning, and normalization, built scCompass-126M, the largest known high-quality training dataset, containing more than 126 million mouse and human cells. GeneCompass adopts a deep learning architecture based on the Transformer self-attention mechanism, which can capture long-range dynamic correlations between genes in different cellular contexts, and the model has 130 million parameters. To characterize life processes at high resolution, GeneCompass for the first time encodes both gene identifiers and expression levels, enabling effective and sensitive extraction of correlations between genes. This allows GeneCompass to analyze gene-gene interactions more precisely under specific conditions such as cell type and perturbation state.
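The dual encoding of gene identifiers and expression levels can be pictured with a small sketch. This is not the GeneCompass implementation: the gene names, the embedding size, the `log1p` scaling, and the choice to add the two embeddings are all assumptions made for illustration, and the random tables stand in for learned parameters.

```python
import math
import random

random.seed(0)

EMBED_DIM = 4  # toy size chosen for readability (assumption)

def make_table(keys, dim):
    """Build a random lookup table mapping each key to a dim-length vector."""
    return {k: [random.uniform(-1, 1) for _ in range(dim)] for k in keys}

# Hypothetical vocabulary of gene identifiers.
genes = ["GATA1", "SOX2", "MYC"]
gene_table = make_table(genes, EMBED_DIM)

# Hypothetical projection that lifts a scalar expression value to a vector.
value_weights = [random.uniform(-1, 1) for _ in range(EMBED_DIM)]

def encode_cell(expression):
    """Return one token vector per expressed gene.

    Each token is the sum of a gene-identity embedding and an
    expression-value embedding, so the same gene at different
    expression levels yields different tokens.
    """
    tokens = []
    for gene, count in expression.items():
        if count <= 0:
            continue  # skip unexpressed genes
        level = math.log1p(count)  # dampen large counts (assumed scaling)
        value_vec = [w * level for w in value_weights]
        tokens.append([g + v for g, v in zip(gene_table[gene], value_vec)])
    return tokens

cell = {"GATA1": 12, "SOX2": 0, "MYC": 3}
tokens = encode_cell(cell)
```

In this sketch, `tokens` holds one vector for each of the two expressed genes, and a sequence like this is what a Transformer's self-attention layers would then compare pairwise.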

Embedding prior knowledge during pre-training can effectively improve model performance

The model integrates four kinds of biological prior knowledge: promoter sequences, known gene regulatory networks, gene family information, and gene co-expression relationships. Encoding this human-annotated information improves the model's understanding of complex feature correlations in biological data. By training on and integrating data and prior knowledge from different species, GeneCompass is expected to improve the efficiency and accuracy of traditional biological research and offer new entry points for complex life science problems that remain unsolved.

GeneCompass integrates four kinds of biological prior knowledge.
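One simple way to picture knowledge embedding is to attach a vector from each knowledge source to a gene's token before it enters the model. The sketch below concatenates them; the source names, the vector size, the zero-filling of missing sources, and concatenation itself are assumptions for illustration, not the fusion scheme used by GeneCompass.

```python
PRIOR_DIM = 2  # toy size of each prior-knowledge vector (assumption)

# The four knowledge sources named in the article, as hypothetical keys.
PRIOR_SOURCES = ("promoter", "regulatory_network", "gene_family", "co_expression")

def fuse_priors(token, priors):
    """Concatenate a gene token with its four prior-knowledge vectors.

    Sources missing for a gene are zero-filled so that every fused
    vector has the same length.
    """
    fused = list(token)
    for source in PRIOR_SOURCES:
        fused.extend(priors.get(source, [0.0] * PRIOR_DIM))
    return fused

token = [0.1, 0.2, 0.3]
priors = {"promoter": [1.0, 0.0], "gene_family": [0.5, 0.5]}
fused = fuse_priors(token, priors)
```

Here `fused` has length 3 + 4 × 2 = 11, with zeros in the slots for the two sources that were not supplied.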

Scale effects lead model training to capture conserved laws of biological evolution

The team found that models pre-trained on large-scale cross-species data perform better on single-species subtasks, consistent with a scaling law: larger multi-species pre-training datasets produce better pre-trained representations and further improve performance on downstream tasks. This finding suggests that conserved gene regulation patterns exist across species and that pre-trained models can learn them. It also means that model performance is expected to keep improving as more species and data are added.

Increasing the scale of cross-species data can improve model performance

Multi-task performance advantages show the strong generalization ability of the foundation model

As the largest knowledge-embedded cross-species pre-trained foundation model of life to date, GeneCompass supports transfer learning for multiple cross-species downstream tasks and outperforms existing methods in cell type annotation, quantitative gene perturbation prediction, drug sensitivity analysis, and other tasks. This demonstrates the advantage of pre-training on multi-species unlabeled big data and then fine-tuning on data from different subtasks. The approach is expected to become a general solution for analyzing and predicting biological problems related to gene-cell characteristics.

CellPolaris: Transfer learning decodes gene regulatory networks and predicts cell fate changes

Using transfer learning to generate cell-specific gene regulatory networks

The team also developed CellPolaris, an AI model for constructing gene regulatory networks based on generalized transfer learning. The team first curated hundreds of matched sets of transcriptome and chromatin accessibility data from the same cellular contexts to build high-quality gene regulatory networks, and then used a generalized transfer learning model to generate gene regulatory networks for additional cellular contexts from transcriptome data alone. Using the generated high-confidence gene regulatory networks, the team developed a tool for identifying core transcription factors in cell fate transitions and a transcription factor perturbation simulation tool based on a probabilistic graphical model. The model can effectively identify core factors of cell fate conversion and simulate transcription factor perturbations, and has important application value in analyzing gene regulatory mechanisms and discovering disease-causing genes.
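To give a feel for what simulating a transcription factor perturbation on a gene regulatory network involves, here is a deliberately simplified sketch. It uses deterministic linear propagation over signed edge weights, swapped in for illustration; it is not the probabilistic graphical model the paper describes, and the gene names, weights, and step count are all hypothetical.

```python
# Hypothetical toy network: regulator -> {target: signed edge weight}.
grn = {
    "TF_A": {"TF_B": 0.8, "GENE_X": 0.5},
    "TF_B": {"GENE_Y": -0.6},
}

def simulate_knockout(grn, expression, knocked_out, steps=2):
    """Propagate the loss of one regulator through the network.

    Returns the predicted change in each gene's expression relative to
    the starting state. Each step pushes the current change along
    outgoing edges, scaled by the edge weight.
    """
    delta = {gene: 0.0 for gene in expression}
    delta[knocked_out] = -expression[knocked_out]  # expression drops to zero
    frontier = {knocked_out: delta[knocked_out]}
    for _ in range(steps):
        next_frontier = {}
        for regulator, change in frontier.items():
            for target, weight in grn.get(regulator, {}).items():
                effect = weight * change
                delta[target] = delta.get(target, 0.0) + effect
                next_frontier[target] = next_frontier.get(target, 0.0) + effect
        frontier = next_frontier
    return delta

expression = {"TF_A": 1.0, "TF_B": 1.0, "GENE_X": 1.0, "GENE_Y": 1.0}
delta = simulate_knockout(grn, expression, "TF_A")
```

In this toy network, knocking out `TF_A` lowers `TF_B` and `GENE_X` directly, and because `TF_B` represses `GENE_Y`, the second step raises `GENE_Y`. Ranking regulators by how strongly their removal shifts downstream genes is one intuitive reading of "identifying core factors", though the actual tool may work differently.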





Simulating the impact of transcription factor knockout on cell fate during placental development
The gene regulatory networks generated by CellPolaris provide rich molecular interaction information and can serve as prior knowledge for large deep learning models. The low-dimensional embedding vectors generated by large deep learning models will provide important information for analyzing gene regulatory mechanisms and discovering disease-causing genes.

Both studies were completed by the Xcompass Consortium. The consortium currently consists mainly of the Institute of Zoology of the Chinese Academy of Sciences, together with the Computer Network Information Center, the Institute of Automation, the Institute of Computing Technology, the Institute of Mathematics and Systems Science, and other research institutes. Its goal is to establish a new paradigm of life science research driven by digital intelligence and to analyze the essential laws of life.

