


From mice walking in the maze to AlphaGo defeating humans, the development of reinforcement learning
When it comes to reinforcement learning, many researchers' adrenaline surges! It plays a central role in game-playing AI, modern robotics, chip-design systems, and many other applications.
There are many kinds of reinforcement learning algorithms, but they fall mainly into two categories: "model-based" and "model-free".
In a conversation with TechTalks, neuroscientist Daeyeol Lee, author of "The Birth of Intelligence", discusses the different forms of reinforcement learning in humans and animals, the relationship between artificial and natural intelligence, and directions for future research.
Model-free reinforcement learning
In the late 19th century, the "law of effect" proposed by psychologist Edward Thorndike became the basis of model-free reinforcement learning. Thorndike proposed that behaviors that have a positive impact in a specific situation are more likely to happen again in that situation, while behaviors that have a negative impact are less likely to happen again.
Thorndike explored the law of effect in an experiment: he placed a cat in a puzzle box and measured how long it took the cat to escape. To escape, the cat had to operate a series of mechanisms, such as ropes and levers. Thorndike observed that as the cat interacted with the puzzle box, it learned the behaviors that aided its escape, and over time it escaped faster and faster. He concluded that cats learn from the rewards and punishments their behaviors produce. The law of effect later paved the way for behaviorism, the branch of psychology that attempts to explain human and animal behavior in terms of stimuli and responses. It is also the basis of model-free reinforcement learning, in which an agent perceives the world, takes actions, and measures the rewards it receives.
In model-free reinforcement learning, the agent has no direct knowledge of the world and no model of it. It must experience the consequences of each action directly, through trial and error.
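To make the trial-and-error idea concrete, here is a minimal sketch of a classic model-free method, tabular Q-learning, on a hypothetical 5-state corridor invented for this illustration (the states, actions, and rewards are assumptions, not anything from the article). The point to notice is that the learning code never inspects the `step` function; it only sees the transitions it experiences.

```python
import random

# A hypothetical 5-state corridor: start in state 0, reward 1.0 for reaching state 4.
# Actions: 0 = left, 1 = right. The agent never reads these dynamics directly;
# it only observes (next state, reward) after acting -- that is what makes it model-free.
N_STATES, GOAL = 5, 4

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q(state, action) table
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
            a = rng.randrange(2) if rng.random() < eps else max((0, 1), key=lambda x: q[s][x])
            s2, r, done = step(s, a)
            # Trial-and-error update: nudge Q(s,a) toward observed reward + bootstrapped value.
            q[s][a] += alpha * (r + gamma * max(q[s2]) * (not done) - q[s][a])
            s = s2
    return q

q = q_learning()
policy = [max((0, 1), key=lambda a: q[s][a]) for s in range(N_STATES)]
print(policy)  # greedy policy after training; states 0..3 should prefer "right"
```

The learned table encodes only state-action values, not the corridor's structure; the agent has no way to answer "what would happen if I went left?" without actually trying it.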
Model-based reinforcement learning
Thorndike's law of effect remained the dominant view until the 1930s, when another psychologist, Edward Tolman, uncovered an important insight while exploring how rats learned to navigate mazes: animals can learn about their environment even without reinforcement.
For example, when a rat is released into a maze, it will freely explore the tunnels and gradually learn the structure of its environment. If the rat is then reintroduced to the same maze and given a reinforcement signal, such as food to find or an exit to reach, it gets to the goal faster than an animal that has never explored the maze. Tolman called this "latent learning", and it became the basis of model-based reinforcement learning. Latent learning allows animals and humans to form a mental representation of their world, simulate hypothetical scenarios in their minds, and predict outcomes.
The advantage of model-based reinforcement learning is that it spares the agent much of the trial and error in the real environment. Notably, model-based reinforcement learning has been particularly successful in AI systems that master board games such as chess and Go, possibly because the environments of these games are deterministic.
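A minimal sketch of the contrast: if the agent is handed a model of a hypothetical 5-state corridor (the same kind of toy task one would use for model-free learning; the dynamics here are an assumption for illustration), it can plan entirely by "mental simulation" with value iteration, never acting in the real environment at all.

```python
# A hypothetical 5-state corridor with a KNOWN model: the agent can query the
# dynamics directly, so planning replaces real-world trial and error.
N_STATES, GOAL, GAMMA = 5, 4, 0.9

def model(state, action):
    """Known dynamics: action 0 moves left, action 1 moves right; reward 1.0 at the goal."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)

def value_iteration(sweeps=100):
    v = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(N_STATES):
            if s == GOAL:
                continue  # terminal state keeps value 0
            # "Mental simulation": back up values by querying the model, not the world.
            v[s] = max(r + GAMMA * v[nxt] for nxt, r in (model(s, a) for a in (0, 1)))
    return v

v = value_iteration()
plan = [max((0, 1), key=lambda a: model(s, a)[1] + GAMMA * v[model(s, a)[0]])
        for s in range(GOAL)]
print(plan)  # planned actions for states 0..3: all move right toward the goal
```

Not a single environment step is taken before the plan is ready, which is exactly the latent-learning advantage Tolman observed: the structure of the world is learned (or here, given) separately from any reward.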
Model-based vs. model-free
Generally speaking, model-based reinforcement learning is very time-consuming, and when decisions are extremely time-sensitive that delay can be fatal. "Computationally, model-based reinforcement learning is a lot more elaborate," Lee said. "First you have to obtain the model and run a mental simulation, then you have to find the trajectory of that process and finally take the action. However, model-based RL is not necessarily more complicated than model-free RL." When the environment is very complex, if it can be captured by a relatively simple model that can be obtained quickly, then simulating in that model is far simpler and more cost-effective.
Multiple learning modes
In fact, neither model-based nor model-free reinforcement learning is a perfect solution. Wherever you see a reinforcement learning system solving a complex problem, it is likely using both model-based and model-free RL, and possibly other forms of learning as well. Research in neuroscience shows that both humans and animals have multiple modes of learning, and that the brain is constantly switching among them. In recent years there has been growing interest in creating AI systems that combine multiple reinforcement learning models. Recent research by scientists at UC San Diego shows that combining model-free and model-based reinforcement learning achieves superior performance in control tasks. "If you look at a complex algorithm like AlphaGo, it has both model-free RL elements and model-based RL elements," Lee said. "It learns state values from board configurations, which is basically model-free RL, but it also performs model-based forward search."
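A toy sketch of the hybrid recipe Lee describes, loosely in the AlphaGo spirit but in no way its actual code: a learned (model-free-style) value table provides leaf evaluations, and a short model-based forward search refines the action choice. Everything here, the 5-state corridor, its dynamics, and the deliberately noisy value table, is a hypothetical example constructed for illustration.

```python
# Hybrid sketch: learned values + model-based lookahead on a hypothetical corridor.
GOAL, GAMMA = 4, 0.9

def model(state, action):
    """Assumed-known dynamics of a 5-state corridor; reward 1.0 for reaching the goal."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)

# Pretend these values came from model-free learning; they are noisy on purpose
# (v[0] is too high, v[1] too low), so raw greedy action choice would be unreliable.
learned_v = [0.70, 0.60, 0.95, 1.00, 0.0]

def search(state, depth):
    """Depth-limited forward search; falls back on the learned value at the leaves."""
    if state == GOAL:
        return 0.0
    if depth == 0:
        return learned_v[state]  # model-free estimate evaluates the leaf
    return max(r + GAMMA * search(nxt, depth - 1)
               for nxt, r in (model(state, a) for a in (0, 1)))

def act(state, depth=3):
    def backed_up(a):
        nxt, r = model(state, a)  # model-based: simulate the action, don't take it
        return r + GAMMA * search(nxt, depth - 1)
    return max((0, 1), key=backed_up)

print([act(s) for s in range(GOAL)])  # search-corrected choices for states 0..3
```

The search reaches far enough ahead that real simulated rewards override the noise in the learned estimates, which is the basic division of labor in such hybrids: the model-free component supplies cheap evaluations, the model-based component supplies accuracy where it matters.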
Despite significant achievements, progress in reinforcement learning remains slow. Once an RL model faces a complex and unpredictable environment, its performance begins to degrade.
Lee said: "I think our brain is a complex world of learning algorithms that have evolved to handle many different situations."
Beyond constantly switching between these learning modes, the brain also manages to maintain and update all of them, even the ones not actively involved in decision-making.
Psychologist Daniel Kahneman said: "Maintaining different learning modules and updating them simultaneously can help improve the efficiency and accuracy of artificial intelligence systems."
There is another thing we need to understand: how to apply the right inductive biases in AI systems, so that they learn the right things in a cost-effective way. Billions of years of evolution have given humans and animals the inductive biases needed to learn effectively from as little data as possible. An inductive bias can be understood as a set of assumptions, distilled from regularities observed in the real world, that constrains the model; it acts as a form of model selection, picking out from the hypothesis space the models most consistent with the true underlying rules. "We get very little information from the environment. Using that information, we have to generalize," Lee said. "The reason is that the brain has inductive biases, biases toward generalizing from a small set of examples. That's a product of evolution, and more and more neuroscientists are interested in this." However, while inductive bias is easy to understand in object-recognition tasks, it becomes obscure in abstract problems such as building social relationships. There is still a great deal left to learn.
Reference materials:
https://thenextweb.com/news/everything-you-need-to-know-about-model-free-and-model-based-reinforcement-learning
