


ICML 2024 Oral | Is DPO better suited to LLMs than PPO? Latest findings from Wu Yi's team at Tsinghua

Paper title: Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
Paper address: https://arxiv.org/pdf/2404.10719
To get the best performance from PPO: use a large batch size, apply advantage normalization, and update the reference model with an exponential moving average.
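Advantage normalization and the exponential moving average update can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions, not the paper's implementation: the function names, the `eps` constant, the `decay` value, and the use of flat lists of floats in place of real model parameters are all hypothetical.

```python
import math


def normalize_advantages(advantages, eps=1e-8):
    """Rescale a batch of advantages to zero mean and (roughly) unit variance.

    `eps` is an assumed small constant that guards against division by zero.
    """
    n = len(advantages)
    mean = sum(advantages) / n
    variance = sum((a - mean) ** 2 for a in advantages) / n
    std = math.sqrt(variance)
    return [(a - mean) / (std + eps) for a in advantages]


def ema_update(ref_params, policy_params, decay=0.99):
    """Move each reference parameter a small step toward the policy parameter.

    `decay` is an assumed value; parameters are plain floats for illustration.
    """
    return [
        decay * ref + (1.0 - decay) * pol
        for ref, pol in zip(ref_params, policy_params)
    ]


# Example: normalize one batch, then nudge the reference model toward the policy.
batch_advantages = [1.0, 2.0, 3.0]
normalized = normalize_advantages(batch_advantages)

reference = [0.0, 0.0]
policy = [1.0, -1.0]
reference = ema_update(reference, policy, decay=0.9)
```

With a larger batch, the mean and standard deviation used for normalization are estimated from more samples, which is one plausible reason the two recommendations work well together.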
NeurIPS 2022, The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games [1]: Proposed and open sourced MAPPO, a parallel reinforcement learning training framework that supports multi-agent training in cooperative scenarios. This work has been used widely in multi-agent research, and the paper has been cited more than 1k times.
ICLR 2024, Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores [2]: Proposed a distributed training framework for reinforcement learning that scales easily to tens of thousands of cores, with a speedup exceeding that of Rapid, OpenAI's large-scale reinforcement learning system.
ReaLHF: Optimized RLHF Training for Large Language Models through Parameter Reallocation [3]: More recently, Wu Yi's team implemented the distributed RLHF training framework ReaLHF, and the team's ICML Oral paper was produced with this system. ReaLHF has been under development for a long time and its details have been polished extensively to achieve optimal performance. Compared with previous open source work, ReaLHF achieves near-linear scalability in RLHF, a scenario that is more complex than pre-training. It also has higher resource utilization and can run RLHF training stably and quickly on 128 A100 GPUs. The work has been open sourced: https://github.com/openpsi-project/ReaLHF