From mice walking in the maze to AlphaGo defeating humans, the development of reinforcement learning
Reinforcement learning gets many researchers' adrenaline pumping: it plays an important role in game AI systems, modern robots, chip design systems, and other applications.
There are many types of reinforcement learning algorithms, but they fall mainly into two categories: "model-based" and "model-free".
In a conversation with TechTalks, neuroscientist Daeyeol Lee, author of "The Birth of Intelligence", discusses the different modes of reinforcement learning in humans and animals, the relationship between artificial and natural intelligence, and future research directions.
In the late 19th century, psychologist Edward Thorndike proposed the "law of effect", which later became the basis of model-free reinforcement learning. According to Thorndike, behaviors that produce a positive effect in a particular situation are more likely to be repeated in that situation, while behaviors that produce a negative effect are less likely to recur.
Thorndike explored this law of effect in an experiment: he placed a cat in a puzzle box and measured how long it took the cat to escape. To get out, the cat had to operate a series of gadgets, such as ropes and levers. Thorndike observed that as the cat interacted with the puzzle box, it learned the behaviors that aided its escape, and over time it escaped the box faster and faster. He concluded that cats learn from the rewards and punishments their behaviors produce. The law of effect later paved the way for behaviorism, a branch of psychology that attempts to explain human and animal behavior in terms of stimuli and responses. It is also the foundation of model-free reinforcement learning, in which an agent perceives the world, takes actions, and measures the rewards that follow.
In model-free reinforcement learning, the agent has no prior knowledge of the environment and no model of the world; it must experience the consequences of each action directly, through trial and error.
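As a rough illustration of the model-free idea, here is a minimal sketch of tabular Q-learning in Python. The environment interface (`env.reset`, `env.step`, `env.actions`) is a hypothetical stand-in, not something from the article; the point is that the agent learns action values purely from experienced rewards, with no model of how the world works.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning: learn action values from experienced rewards alone,
    with no model of the environment's dynamics. `env` is an assumed interface."""
    q = defaultdict(float)  # (state, action) -> estimated value

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy: explore occasionally, otherwise exploit current estimates.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: q[(state, a)])

            # Trial and error: act in the world and observe what actually happens.
            next_state, reward, done = env.step(action)

            # Move the estimate toward the observed reward plus the bootstrapped next-state value.
            best_next = max(q[(next_state, a)] for a in env.actions)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q
```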
Thorndike’s “Law of Effect” remained popular until the 1930s. Another psychologist at the time, Edward Tolman, discovered an important insight while exploring how rats quickly learned to navigate mazes. During his experiments, Tolman realized that animals could learn about their environment without reinforcement.
For example, when a rat is released into a maze, it freely explores the tunnels and gradually learns the structure of the environment. If the rat is then reintroduced to the same environment and given a reinforcing signal, such as food to find or an exit to reach, it reaches the goal faster than an animal that has not explored the maze. Tolman called this "latent learning", and it became the basis of model-based reinforcement learning. Latent learning allows animals and humans to form a mental representation of their world, simulate hypothetical scenarios in their minds, and predict outcomes.
The advantage of model-based reinforcement learning is that it spares the agent much of the trial and error in the environment. It is worth emphasizing that model-based reinforcement learning has been particularly successful in developing artificial intelligence systems capable of mastering board games such as chess and Go, possibly because the environments of these games are deterministic.
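By contrast with the model-free sketch above, a model-based agent plans by simulating inside its own model of the world. The sketch below is a hypothetical, deterministic depth-limited lookahead (the `model.actions`, `model.reward`, and `model.next_state` interface is assumed for illustration, not any specific published algorithm); it shows how an agent can evaluate actions "in its head" before acting.

```python
def plan_with_model(model, state, depth=3, gamma=0.99):
    """Depth-limited lookahead: simulate the consequences of each action inside
    the agent's (assumed deterministic) model of the world and pick the best one."""
    def value(s, d):
        if d == 0:
            return 0.0
        # Evaluate every action by "imagining" its outcome with the model.
        return max(
            model.reward(s, a) + gamma * value(model.next_state(s, a), d - 1)
            for a in model.actions(s)
        )

    return max(
        model.actions(state),
        key=lambda a: model.reward(state, a) + gamma * value(model.next_state(state, a), depth - 1),
    )
```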
Generally speaking, model-based reinforcement learning can be very time-consuming, and in extremely time-sensitive situations that delay can be fatal. "Computationally, model-based reinforcement learning is much more complex," Lee said. "First you have to obtain the model and run a mental simulation, then you have to find the trajectory of the neural process and take action. However, model-based reinforcement learning is not necessarily more complicated than model-free RL." When the environment is very complex but can be captured by a relatively simple model, one that can be obtained quickly, the simulation becomes much simpler and more cost-effective.
In fact, neither model-based nor model-free reinforcement learning is a perfect solution. Wherever you see a reinforcement learning system solving a complex problem, it likely uses both, and possibly other forms of learning as well. Research in neuroscience shows that humans and animals have multiple ways of learning, and that the brain constantly switches between these modes. In recent years, there has been growing interest in creating artificial intelligence systems that combine multiple reinforcement learning models. Recent research by scientists at UC San Diego shows that combining model-free and model-based reinforcement learning achieves superior performance in control tasks. "If you look at a complex algorithm like AlphaGo, it has both model-free RL elements and model-based RL elements," Lee said. "It learns state values based on the board configuration; that is basically model-free RL, but it also performs model-based forward search."
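To make the hybrid idea concrete, here is a toy sketch, not AlphaGo's actual algorithm, of how a value function learned model-free can serve as the leaf evaluator inside a model-based forward search. All names here (`model`, `value_fn`) are illustrative assumptions.

```python
def hybrid_search(model, value_fn, state, depth=2, gamma=1.0):
    """Model-based forward search that bottoms out in a learned, model-free
    value estimate instead of simulating all the way to the end of the game."""
    if depth == 0 or model.is_terminal(state):
        return value_fn(state)  # learned estimate stands in for deeper search
    return max(
        model.reward(state, a)
        + gamma * hybrid_search(model, value_fn, model.next_state(state, a), depth - 1, gamma)
        for a in model.actions(state)
    )

def choose_move(model, value_fn, state, depth=2):
    # Pick the action whose simulated outcome looks best under the learned values.
    return max(
        model.actions(state),
        key=lambda a: hybrid_search(model, value_fn, model.next_state(state, a), depth - 1),
    )
```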
Despite significant achievements, progress in reinforcement learning remains slow. Once an RL model faces a complex and unpredictable environment, its performance begins to degrade.
Lee said: "I think our brain is a complex world of learning algorithms that have evolved to handle many different situations."
In addition to constantly switching between these learning modes, the brain also manages to maintain and update all of them, even when they are not actively involved in decision-making.
Psychologist Daniel Kahneman said: "Maintaining different learning modules and updating them simultaneously can help improve the efficiency and accuracy of artificial intelligence systems."
We also need to understand another thing: how to apply the right inductive biases in AI systems so that they learn the right things in a cost-effective way. Billions of years of evolution have given humans and animals the inductive biases needed to learn effectively from as little data as possible. An inductive bias can be understood as a set of rules distilled from regularities observed in the real world and imposed on the model as constraints; it plays the role of model selection, picking from the hypothesis space the model that best matches the real rules. "We get very little information from the environment, and from that information we have to generalize," Lee said. "The reason is that the brain has an inductive bias, a bias toward generalizing from a small set of examples. That's a product of evolution, and more and more neuroscientists are interested in this." However, while inductive bias is easy to understand in object-recognition tasks, it becomes obscure in abstract problems such as building social relationships. In the future, there is still a lot we need to learn.
https://thenextweb.com/news/everything-you-need-to-know-about-model-free-and-model-based-reinforcement-learning