GPT-4 teaches a robot to spin a pen, and it's silky smooth!
Recently, GPT-4, which inspired mathematician Terence Tao in conversation, has started teaching robots how to spin pens.
The project, an agent called Eureka, was jointly developed by NVIDIA, the University of Pennsylvania, the California Institute of Technology, and the University of Texas at Austin. The research combines GPT-4's capabilities with the strengths of reinforcement learning, allowing Eureka to design sophisticated reward functions.
GPT-4's programming abilities give Eureka strong reward-function design skills. In most tasks, Eureka's own reward schemes are even better than those written by human experts, which lets it complete tasks that are hard for humans to specify rewards for, including spinning a pen, opening a drawer, and rotating walnuts in the palm, as well as more complex tasks such as throwing and catching a ball or operating scissors.
Although these tasks are currently performed only in a simulation environment, the results are already impressive.
The project has been open-sourced; the project and paper links are provided at the end of this article.
Here is a brief summary of the paper's core points.
The paper explores how to use large language models (LLMs) to design and optimize reward functions in reinforcement learning. This is an important topic because a good reward function can greatly improve an agent's performance, yet designing such a function is very difficult.
The researchers propose a new algorithm called EUREKA, which uses LLMs to generate and improve reward functions. In testing, EUREKA achieved human-level performance across 29 different reinforcement learning environments and surpassed reward functions designed by human experts on 83% of tasks.
EUREKA successfully solved complex manipulation tasks that were previously out of reach for manually designed reward functions, such as making a simulated Shadow Hand rapidly spin a pen.
In addition, EUREKA provides a new way to generate reward functions that are more effective and better aligned with human expectations.
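To make the idea of an LLM-written reward function concrete, here is a minimal, hypothetical sketch of what a shaped reward for pen spinning might look like. The function name, inputs, and coefficients are illustrative assumptions, not the reward Eureka actually produced:

```python
import numpy as np

def pen_spin_reward(pen_angular_velocity, pen_axis, target_axis, fingertip_distances):
    """Hypothetical shaped reward for a pen-spinning task (illustrative only)."""
    # Reward rotation speed about the desired spin axis.
    spin_reward = float(np.dot(pen_angular_velocity, target_axis))

    # Reward keeping the pen aligned with the target axis (dot product in [-1, 1]).
    alignment_reward = float(np.dot(pen_axis, target_axis))

    # Penalize the fingertips drifting away from the pen (i.e., the pen slipping).
    contact_penalty = float(np.mean(fingertip_distances))

    # Shaping terms and weights like these are exactly what Eureka searches over;
    # the particular coefficients here are made-up placeholders.
    return 1.0 * spin_reward + 0.5 * alignment_reward - 2.0 * contact_penalty
```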
EUREKA works in three main steps (a simplified sketch of the loop follows the list):
1. Environment as context: EUREKA uses the source code of the environment as context to generate executable reward functions
2. Evolutionary search: EUREKA continuously proposes and improves reward functions through evolutionary search
3. Reward reflection: EUREKA generates textual summaries of reward quality from policy-training statistics and uses them to automatically and specifically improve the reward functions
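To show how these three steps interlock, the following is a minimal Python sketch of an Eureka-style loop, not the official open-source implementation; the helper functions, the environment string, and all scores are hypothetical stand-ins:

```python
import random

# Hypothetical stand-ins for the real components (the GPT-4 prompt, the RL
# training run, and the reward-reflection summary); they exist only so the
# structure of the loop can be run end to end.
def llm_generate_rewards(env_source_code, feedback, n):
    return [f"candidate_reward_{i}" for i in range(n)]

def train_policy(reward_code):
    return {"task_score": random.random(), "episodes": 500}

def summarize_training(reward_code, stats):
    return f"Top candidate scored {stats['task_score']:.3f}; adjust the shaping weights."

def eureka_loop(env_source_code, iterations=5, samples_per_iter=16):
    best_reward_code, best_score = None, float("-inf")
    feedback = ""  # reward-reflection text carried into the next round

    for _ in range(iterations):
        # Step 1 (environment as context): sample candidate reward functions
        # from the LLM, conditioned on the environment source and prior feedback.
        candidates = llm_generate_rewards(env_source_code, feedback, n=samples_per_iter)

        # Step 2 (evolutionary search): train a policy under each candidate and
        # score it with the task's ground-truth fitness metric.
        scored = [(train_policy(code), code) for code in candidates]
        top_stats, top_code = max(scored, key=lambda s: s[0]["task_score"])

        if top_stats["task_score"] > best_score:
            best_score, best_reward_code = top_stats["task_score"], top_code

        # Step 3 (reward reflection): summarize the training statistics as text
        # so the next round of LLM samples can mutate the best reward.
        feedback = summarize_training(top_code, top_stats)

    return best_reward_code, best_score

print(eureka_loop("class PenSpinEnv: ..."))
```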
This research may have a far-reaching impact on reinforcement learning and reward-function design, because it provides a new and efficient way to automatically generate and improve reward functions, and in many cases this method outperforms human experts.
Project address: https://www.php.cn/link/e6b738eca0e6792ba8a9cbcba6c1881d
Paper link: https://www.php.cn/link/ce128c3e8f0c0ae4b3e843dc7cbab0f7