Beyond the Nobel Prize? For the first time in the biological world, 'ChatGPT' has synthesized a new protein from scratch, and it has been published in the Nature sub-journal!

The application of artificial intelligence has greatly accelerated research on protein engineering.

Recently, a fledgling startup in Berkeley, California, once again made amazing progress.

Scientists used ProGen, a deep learning language model for protein engineering similar to ChatGPT, to generate functional protein sequences with AI for the first time.

Not only are these proteins very different from known ones, with sequence similarity as low as 31.4%, but they are also as effective as natural proteins.

Now, this work has been officially published in the Nature sub-journal.

Paper address: https://www.nature.com/articles/s41587-022-01618-2

This experiment also shows that although natural language processing was developed for reading and writing language text, it can also learn some basic principles of biology.

Technology comparable to the Nobel Prize

The researchers said that this new technology may become more powerful than directed evolution, the Nobel Prize-winning protein design technology.

"It will revitalize the 50-year-old field of protein engineering by accelerating the development of new proteins that can be used in virtually everything from therapeutics to degrading plastics."

The company is called Profluent. It was founded by the former head of Salesforce AI research and has received US$9 million in start-up funding, which is being used to establish an integrated wet lab and recruit machine learning scientists and biologists.

In the past, mining proteins from nature or adjusting proteins to the required functions was very laborious. Profluent's goal is to make this process effortless.

They did it.

Profluent founder and CEO Ali Madani

Madani said in the interview that Profluent has designed multiple families of proteins. These proteins function like exemplar proteins and are therefore highly active enzymes.

This task is very difficult and was done in a zero-shot manner: there were no multiple rounds of optimization, and the model was given no wet-lab data at all.

The resulting protein is a highly active protein that usually takes hundreds of years to evolve.

ProGen: based on a language model

As a kind of deep neural network, a conditional language model can not only generate semantically and grammatically correct, novel and diverse natural language text, but can also use input control tags to guide style, topic, and more.

Similarly, the researchers developed ProGen, a conditional protein language model with 1.2 billion parameters.

Specifically, ProGen is based on the Transformer architecture, models the interactions between residues through a self-attention mechanism, and can generate diverse artificial protein sequences across protein families based on input control tags.
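
To make the self-attention idea concrete, here is a minimal pure-Python sketch. It is not ProGen's code. It assumes GPT-style left-to-right (causal) attention, reuses the same vectors as queries, keys and values instead of learned projections, and uses made-up 2-dimensional embeddings with a hypothetical control tag named `<lysozyme>`.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(vectors):
    """Scaled dot-product self-attention; position i attends to positions 0..i only.

    Simplification: the same vectors serve as queries, keys and values
    (a real Transformer learns separate projections for each).
    """
    dim = len(vectors[0])
    outputs, weight_rows = [], []
    for i, query in enumerate(vectors):
        scores = [
            sum(q * k for q, k in zip(query, vectors[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        mixed = [
            sum(w * vectors[j][d] for j, w in enumerate(weights))
            for d in range(dim)
        ]
        outputs.append(mixed)
        weight_rows.append(weights)
    return outputs, weight_rows

# Hypothetical embeddings: a control tag followed by three residues.
tokens = ["<lysozyme>", "M", "K", "A"]
embeddings = {
    "<lysozyme>": [1.0, 0.0],
    "M": [0.0, 1.0],
    "K": [1.0, 1.0],
    "A": [0.5, -0.5],
}
outputs, weight_rows = causal_self_attention([embeddings[t] for t in tokens])
for token, weights in zip(tokens, weight_rows):
    print(token, [round(w, 3) for w in weights])
```

Each printed row shows how strongly one position draws on the control tag and on the residues before it, which is the sense in which attention models interactions between residues.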

Generating artificial proteins using conditional language models

To create the model, the researchers fed it the amino acid sequences of 280 million different proteins and let it "digest" them for several weeks.

They then fine-tuned the model using 56,000 sequences from five lysozyme families and information about these proteins.

ProGen's algorithm is similar to GPT-3.5, the model behind ChatGPT. It learns the rules governing the order of amino acids in proteins and how that order relates to protein structure and function.
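
As a rough illustration of learning ordering rules under a control tag, the sketch below trains a count-based next-residue model and generates from it. This is a deliberate simplification: ProGen is a 1.2-billion-parameter Transformer, not a bigram counter, and the family tags (`famA`, `famB`) and short sequences here are invented toy data, not real proteins.

```python
from collections import Counter, defaultdict

START, END = "^", "$"

def train(tagged_sequences):
    """Count which residue follows which, separately for each control tag."""
    counts = defaultdict(Counter)
    for tag, seq in tagged_sequences:
        prev = START
        for residue in seq + END:
            counts[(tag, prev)][residue] += 1
            prev = residue
    return counts

def generate(counts, tag, max_len=20):
    """Greedy generation: always pick the most frequent next residue for this tag."""
    prev, out = START, []
    for _ in range(max_len):
        options = counts.get((tag, prev))
        if not options:
            break
        # sorted() makes tie-breaking deterministic (alphabetical).
        residue = max(sorted(options), key=options.get)
        if residue == END:
            break
        out.append(residue)
        prev = residue
    return "".join(out)

# Toy training data: (control tag, sequence). Entirely hypothetical.
training_data = [
    ("famA", "MKV"), ("famA", "MKV"), ("famA", "MKL"),
    ("famB", "MAG"), ("famB", "MAG"),
]
model = train(training_data)
print(generate(model, "famA"))
print(generate(model, "famB"))
```

Changing only the tag changes what is generated, which is the role the input control tags play in a conditional language model.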

Soon, the model generated a million sequences.

The researchers selected 100 for testing based on their similarity to natural protein sequences and the naturalness of their amino acid "syntax" and "semantics."
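
One way to picture this kind of selection is to rank candidates by how "natural" a model finds them and keep the top few. The sketch below is an assumption about the general idea, not the paper's actual procedure: it scores each sequence by its mean per-residue log-likelihood under a hypothetical residue-frequency table, standing in for the language model's own likelihood.

```python
import math

def mean_log_likelihood(seq, residue_probs, floor=1e-6):
    """Average log-probability per residue; unseen residues get a small floor."""
    return sum(math.log(residue_probs.get(r, floor)) for r in seq) / len(seq)

def select_top(candidates, residue_probs, n):
    """Keep the n candidates the scoring table considers most natural."""
    ranked = sorted(
        candidates,
        key=lambda s: mean_log_likelihood(s, residue_probs),
        reverse=True,
    )
    return ranked[:n]

# Hypothetical residue frequencies and candidate sequences (toy values).
residue_probs = {"A": 0.5, "K": 0.3, "W": 0.2}
candidates = ["WWWW", "AKAK", "KKKK", "AAAA"]

selected = select_top(candidates, residue_probs, n=2)
print(selected)
```

Dividing by sequence length keeps long and short candidates comparable, so a sequence is not penalized simply for being longer.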

Of these, 66 catalyzed chemical reactions similar to those of the natural proteins that destroy bacteria in egg whites and saliva.

In other words, these new proteins generated by AI can also kill bacteria.

The artificial proteins generated are diverse and well expressed in experimental systems

Going a step further, the researchers selected the five proteins that reacted most strongly and added them to samples of E. coli.

Among them, there are two artificial enzymes that can break down the cell wall of bacteria.

Compared with hen egg white lysozyme (HEWL), their activity was found to be equivalent.

The researchers then used X-rays for imaging.

Although the amino acid sequences of the artificial enzymes differ from existing proteins by up to 30%, and the two enzymes share only 18% of their sequence with each other, their shapes differ little from those of natural proteins and their functions are comparable.

Applicability of conditional language modeling to other protein systems

Besides, for a highly evolved natural protein, it may only take a small mutation to stop it from working.

But in another round of screening, the researchers found that even AI-generated enzymes whose sequences were only 31.4% identical to any known protein still showed considerable activity and a similar structure.
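
A figure such as 31.4% is a percent sequence identity. The sketch below shows one simple way to compute it for two sequences that are already aligned. The definition is an assumption: conventions differ on the denominator, and here it is the number of aligned columns without a gap. The example sequences are made up.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free aligned columns where both sequences have the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if a != gap and b != gap
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)

def max_identity(candidate, aligned_naturals):
    """Identity to the closest natural sequence (each given as an aligned pair)."""
    return max(percent_identity(c, n) for c, n in aligned_naturals)

# Hypothetical pre-aligned toy sequences; "-" marks an alignment gap.
artificial = "MKVLA-G"
natural = "MRVLAQG"
identity = percent_identity(artificial, natural)
print(f"{identity:.1f}% identical")
```

In practice the alignment itself would come from a dedicated tool. This sketch covers only the counting step that follows.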

Protein design, entering a new era

As you can see, the way ProGen works is very similar to ChatGPT.

ChatGPT can take MBA and bar exams and write college papers by studying massive data.

And ProGen learned how to generate new proteins by learning the syntax of how amino acids are combined into the 280 million existing proteins.

In the interview, Madani said, "Just as ChatGPT learns human languages such as English, we are learning the language of biology and proteins."

"Artificially designed proteins perform much better than proteins inspired by evolutionary processes," said James Fraser, co-author of the paper and professor of bioengineering and therapeutic sciences at the UCSF School of Pharmacy.

"Language models are learning aspects of evolution, but it is different from the normal evolutionary process. We now have the ability to tune the generation of these properties for specific effects: for example, an enzyme that is incredibly thermally stable, or prefers acidic environments, or doesn't interact with other proteins."

Back in 2020, Salesforce Research developed ProGen. It is based on a kind of natural language processing originally developed to generate English text.

From previous work, researchers know that artificial intelligence systems can teach themselves grammar and word meanings, as well as other basic rules that make writing organized.

"When you train sequence-based models with large amounts of data, they are very powerful at learning structures and rules," said Dr. Nikhil Naik, director of artificial intelligence research at Salesforce Research and senior author of the paper. "They learn which words can appear together and how to combine them."

"Now, we have demonstrated ProGen's ability to generate new proteins and released it publicly, so that everyone can build on our work."

Lysozyme is a very small protein, with up to about 300 amino acids.

But with 20 possible amino acids, there are 20^300 possible combinations.

This is more than all human beings throughout the ages multiplied by the number of grains of sand on the earth, multiplied by the number of atoms in the universe.
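
The size of that number is easy to check with exact integer arithmetic. In the sketch below, the three comparison quantities are only rough orders of magnitude assumed for illustration, not figures from the paper.

```python
import math

sequence_length = 300   # upper end of the lysozyme length quoted above
alphabet_size = 20      # number of possible amino acids at each position

combinations = alphabet_size ** sequence_length   # exact: Python ints are unbounded
digits = len(str(combinations))
exponent = sequence_length * math.log10(alphabet_size)

# Rough orders of magnitude, assumed for illustration only.
humans_ever = 10 ** 11
sand_grains = 10 ** 19
atoms_in_universe = 10 ** 80
comparison = humans_ever * sand_grains * atoms_in_universe   # 10**110

print(f"20^300 has {digits} digits (about 10^{exponent:.1f})")
print("larger than the comparison product:", combinations > comparison)
```

Under these assumed magnitudes the comparison product is around 10^110, while 20^300 is around 10^390, so the gap is itself hundreds of orders of magnitude.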

Given the near-infinite possibilities, it's truly remarkable that ProGen was able to design effective enzymes so easily.

"The ability to generate functional proteins from scratch, right out of the box, shows that we are entering a new era of protein design," said Dr. Ali Madani, founder of Profluent Bio and former research scientist at Salesforce Research.

"This is a versatile new tool available to all protein engineers, and we look forward to seeing it applied to therapeutics."

At the same time, researchers continue to improve ProGen, trying to break through more limitations and challenges.

One of them is that it relies heavily on data.

"We have explored ways to improve sequence design by adding structure-based information," Naik said. "We are also looking at how to improve the model's generation capabilities when you don't have much data for a particular protein family or domain."

It is worth noting that some startups are also trying similar technologies, such as Cradle and Generate Biomedicines, which came out of the biotechnology incubator Flagship Pioneering, but their work has not yet been peer-reviewed.

Statement
This article is reproduced from 51CTO.COM. In case of infringement, please contact admin@php.cn for removal.