Many people know that AlphaGo, which defeated Lee Sedol, Ke Jie, and other top Go players, went through three iterations: the first-generation AlphaGo Lee, which beat Lee Sedol; the second-generation AlphaGo Master, which beat Ke Jie; and the third-generation AlphaGo Zero, which convincingly beat both of its predecessors.
AlphaGo's playing strength grew from generation to generation. Behind this lies a clear trend in AI technology: reinforcement learning has been taking up an ever larger share of the picture.
In recent years, reinforcement learning has undergone another "evolution". People call the "evolved" reinforcement learning deep reinforcement learning.
But the sample efficiency of deep reinforcement learning agents is low, which greatly limits their application in practical problems.
Recently, many model-based methods have been designed to address this problem, and learning inside the "imagination" of a world model is one of the most prominent approaches.
However, while nearly unlimited interaction with a simulated environment sounds appealing, the world model must remain accurate over long periods of time.
Inspired by the success of Transformers in sequence modeling tasks, Vincent Micheli, Eloi Alonso, and François Fleuret of the University of Geneva introduced IRIS, a data-efficient agent that learns inside a world model composed of a discrete autoencoder and an autoregressive Transformer.
On the Atari 100k benchmark, over the equivalent of just two hours of gameplay, IRIS achieved an average human-normalized score of 1.046 and outperformed humans in 10 out of 26 games.
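To make the setup concrete, the sketch below illustrates the general idea of such an agent: a discrete autoencoder compresses each frame into a handful of tokens, and an autoregressive Transformer over those tokens predicts the next frame's tokens along with reward and termination, so the policy can be trained purely "in imagination". The class names, layer sizes, and shapes here are illustrative assumptions, not the authors' actual implementation.

```python
# Illustrative sketch of an IRIS-style world model (not the authors' code):
# a discrete autoencoder tokenizes frames, and an autoregressive Transformer
# models token sequences so the agent can learn in imagination.
import torch
import torch.nn as nn

class DiscreteAutoencoder(nn.Module):
    """Toy VQ-style tokenizer: maps a flattened frame to K discrete tokens."""
    def __init__(self, frame_dim=64 * 64 * 3, num_tokens=16, vocab_size=512, embed_dim=128):
        super().__init__()
        self.num_tokens = num_tokens
        self.encoder = nn.Linear(frame_dim, num_tokens * embed_dim)
        self.codebook = nn.Embedding(vocab_size, embed_dim)
        self.decoder = nn.Linear(num_tokens * embed_dim, frame_dim)

    def encode(self, frames):                              # frames: (B, frame_dim)
        z = self.encoder(frames).view(frames.shape[0], self.num_tokens, -1)
        codes = self.codebook.weight.unsqueeze(0).expand(frames.shape[0], -1, -1)
        return torch.cdist(z, codes).argmin(dim=-1)        # (B, K) nearest-code indices

    def decode(self, token_indices):                       # (B, K) -> reconstructed frames
        z = self.codebook(token_indices).flatten(1)
        return self.decoder(z)

class WorldModelTransformer(nn.Module):
    """Autoregressive Transformer predicting next tokens, reward, and termination."""
    def __init__(self, vocab_size=512, num_actions=18, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size + num_actions, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.next_token_head = nn.Linear(d_model, vocab_size)
        self.reward_head = nn.Linear(d_model, 1)
        self.done_head = nn.Linear(d_model, 1)

    def forward(self, tokens):                             # tokens: (B, T) integer ids
        T = tokens.shape[1]
        mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.backbone(self.embed(tokens), mask=mask)   # causal mask = autoregressive
        return self.next_token_head(h), self.reward_head(h), self.done_head(h)
```

Roughly speaking, the agent alternates between collecting a small amount of real experience, fitting the tokenizer and the Transformer on it, and then improving its policy on long rollouts generated by the world model.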
Previously, LeCun had said that reinforcement learning would lead to a dead end.
Now it seems that Vincent Micheli, Eloi Alonso, François Fleuret and their colleagues are integrating world models with reinforcement learning (more precisely, deep reinforcement learning), and the bridge connecting the two is the Transformer.
What’s different about deep reinforcement learning
When it comes to artificial intelligence, the first thing many people think of is deep learning.
In fact, although deep learning remains very active in the field of AI, it has exposed quite a few problems.
The most commonly used form of deep learning today is supervised learning, which can be understood as "learning with reference answers": data must be labeled before it can be used for training. But a large amount of available data is unlabeled, and labeling is very expensive.
So much so that some people joke that in "artificial intelligence", there is only as much intelligence as there is human labor behind it.
Many researchers, including well-known experts, have been reflecting on whether deep learning has gone "wrong".
Against this backdrop, reinforcement learning began to rise.
Reinforcement learning differs from both supervised and unsupervised learning: an agent learns through continuous trial and error, and the AI is rewarded or punished according to the outcome of each trial. This is the method DeepMind used to build its various board-game, card-game, and video-game AIs. Believers in this path hold that, as long as the reward incentives are set correctly, reinforcement learning will eventually produce a true AGI.
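As a concrete illustration of this trial-and-error loop, here is a minimal tabular Q-learning sketch. It is a generic textbook example, not DeepMind's code, and the environment interface it assumes (reset(), step(), and an actions list) is purely hypothetical.

```python
# Minimal tabular Q-learning: the agent tries actions, gets rewarded or punished,
# and nudges its value estimates toward what it just experienced.
import random

def q_learning(env, num_episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    q = {}  # (state, action) -> estimated long-term value

    for _ in range(num_episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy: usually exploit the best-known action, sometimes explore.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: q.get((state, a), 0.0))

            next_state, reward, done = env.step(action)          # trial ...
            best_next = max(q.get((next_state, a), 0.0) for a in env.actions)
            target = reward + gamma * best_next * (not done)     # ... and error signal
            q[(state, action)] = q.get((state, action), 0.0) + alpha * (target - q.get((state, action), 0.0))
            state = next_state
    return q
```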
But reinforcement learning also has problems. In LeCun’s words, “reinforcement learning requires a huge amount of data to train the model to perform the simplest tasks.”
So reinforcement learning and deep learning were combined to become deep reinforcement learning.
In deep reinforcement learning, reinforcement learning is the skeleton and deep learning is the soul. What does that mean? The main operating mechanism of deep reinforcement learning is essentially the same as that of plain reinforcement learning, except that a deep neural network is used to carry out the process.
In some cases, a deep reinforcement learning algorithm is simply an existing reinforcement learning algorithm with a deep neural network bolted on. The well-known DQN algorithm is a typical example.
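A stripped-down sketch of that idea is shown below: the lookup table of classic Q-learning is replaced by a neural network that maps a state to one Q-value per action, trained on batches sampled from a replay buffer. The network sizes and hyperparameters are illustrative, and the target network shown here is only one of several tricks the full DQN algorithm uses.

```python
# Sketch of a DQN-style update: Q-learning where a neural network replaces the table.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim, num_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, num_actions),
        )

    def forward(self, states):
        return self.net(states)                    # (batch, num_actions) Q-values

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One gradient step on a batch of transitions sampled from a replay buffer."""
    states, actions, rewards, next_states, dones = batch
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # A separate, slowly updated target network keeps the regression target stable.
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * next_q * (1.0 - dones)
    loss = nn.functional.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```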
What’s so magical about Transformers
Transformers first appeared in 2017 and were proposed in Google’s paper “Attention is All You Need”.
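The core of that paper is the attention mechanism. A minimal single-head version of scaled dot-product attention looks roughly like this (real Transformers add multiple heads, learned projections, and masking):

```python
# Scaled dot-product attention, the central operation of the Transformer.
import math
import torch

def scaled_dot_product_attention(query, key, value):
    # query, key, value: (batch, seq_len, d_k)
    d_k = query.shape[-1]
    scores = query @ key.transpose(-2, -1) / math.sqrt(d_k)  # pairwise similarities
    weights = torch.softmax(scores, dim=-1)                   # attention distribution
    return weights @ value                                    # weighted mix of values
```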
Before the emergence of the Transformer, progress on language tasks had lagged behind other areas of AI. "Natural language processing has been somewhat of a latecomer to this deep learning revolution that's happened over the past decade," says Anna Rumshisky, a computer scientist at the University of Massachusetts Lowell. "In a sense, NLP was lagging behind computer vision. The Transformer changed that."
In recent years, the Transformer has become one of the main highlights of advances in deep learning and deep neural networks. It is mainly used for advanced natural language processing applications, and Google uses it to enhance its search results.
Transformers quickly became the leading choice for applications such as word recognition that focus on analyzing and predicting text. They sparked a wave of tools such as OpenAI's GPT-3, which can be trained on hundreds of billions of words and generate coherent new text.
Currently, the Transformer architecture continues to evolve and expand into many different variants, extending from language tasks to other domains. For example, Transformer has been used for time series prediction and is also the key innovation behind DeepMind’s protein structure prediction model AlphaFold.
Transformers have also recently entered the field of computer vision, and they are slowly replacing convolutional neural networks (CNN) in many complex tasks.
World models and Transformers join forces: what do others think?
Regarding this research, one commenter noted: "Note that these two hours are the length of footage collected from the environment; training on a GPU takes about a week."
Others questioned: so the system learns an especially accurate latent world model? And the model requires no pre-training?
In addition, some felt that the results of Vincent Micheli and his co-authors are not a groundbreaking breakthrough: "It seems they just trained the world model, the VQ-VAE and the actor-critic, all from a replay buffer of those 2 hours of experience (and roughly 600 epochs)."
Reference: https://www.reddit.com/r/MachineLearning/comments/x4e4jx/r_transformers_are_sample_efficient_world_models/