


A 7B open-source math model beats the hundred-billion-parameter GPT-4, built by a Chinese team
A 7B open-source model whose mathematical ability surpasses the hundred-billion-scale GPT-4!
Its performance arguably breaks through the limits of open-source models; even researchers from Alibaba Tongyi wondered aloud whether the scaling law has failed.
Without any external tools, it achieves 51.7% accuracy on the competition-level MATH dataset.
It is the first open-source model to pass 50% accuracy on this dataset, even surpassing the early and API versions of GPT-4.
This result stunned the open-source community, with Stability AI founder Emad Mostaque praising the R&D team as "impressive" and having "underestimated potential".
The model is DeepSeekMath, the latest open-source 7B math model from the DeepSeek team.
7B model beats the crowd
To evaluate DeepSeekMath's mathematical ability, the research team tested it on bilingual benchmarks in Chinese (MGSM-zh, CMATH) and English (GSM8K, MATH).
Using no auxiliary tools and relying only on chain-of-thought (CoT) prompting, DeepSeekMath outperformed the other open-source models, including the 70B math model MetaMath.
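The tool-free evaluation setting above can be sketched as a CoT prompt like the following. The exact prompt wording is an assumption for illustration, not taken from the paper:

```python
# Minimal sketch of a chain-of-thought (CoT) prompt for a math benchmark
# question. The instruction wording is a common CoT convention and an
# assumption here; the paper does not publish this exact template.
question = (
    "Natalia sold clips to 48 of her friends in April, and then she sold "
    "half as many clips in May. How many clips did she sell altogether?"
)

prompt = (
    f"Question: {question}\n"
    "Please reason step by step, and put your final answer within \\boxed{}.\n"
    "Answer:"
)
print(prompt)
```

The model then generates its reasoning steps as free-form text, and the final answer is parsed out of the `\boxed{}` expression.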
Compared with the 67B general-purpose model released by the same company, DeepSeekMath's results are also a marked improvement.
Among closed-source models, DeepSeekMath surpassed Gemini Pro and GPT-3.5 on several datasets, beat GPT-4 on the Chinese CMATH benchmark, and came close to GPT-4 on MATH.
Bear in mind that GPT-4 is, according to leaked specifications, a behemoth with hundreds of billions of parameters, while DeepSeekMath has only 7B.
If a tool (Python) is allowed to assist, DeepSeekMath's performance on the competition-level MATH dataset improves by a further 7 percentage points.
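Tool-assisted (program-of-thought style) reasoning works by having the model write a short Python program whose execution yields the answer, rather than computing in free-form text. A minimal sketch, where the "model completion" is hand-written stand-in code rather than real model output:

```python
# Sketch of tool-integrated reasoning: the model emits a Python program
# for the problem, and the answer is read from the executed program's
# variables instead of parsed from prose. The completion below is a
# hand-written stand-in, not actual model output.
model_completion = """
# Problem: solve 2*x + 6 = 20 for x.
x = (20 - 6) / 2
answer = x
"""

namespace = {}
exec(model_completion, namespace)    # run the generated program
print(namespace["answer"])           # prints 7.0
```

Executing untrusted model-generated code would of course need sandboxing in practice; this sketch omits that for brevity.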
So what techniques lie behind DeepSeekMath's excellent performance?
Built on a code model
To obtain better mathematical capability than a general-purpose model would give, the research team initialized DeepSeekMath from the code model DeepSeek-Coder-v1.5.
The team had found that, in both two-stage and one-stage training settings, code training improves a model's mathematical ability compared with training on general data.
Starting from Coder, the research team continued training on a further 500 billion tokens, with the following data distribution:
For training data, DeepSeekMath uses 120B tokens of high-quality mathematical web data extracted from Common Crawl, forming the DeepSeekMath Corpus; its total volume is 9 times that of the open-source dataset OpenWebMath.
Data collection was iterative: after four rounds, the team had collected more than 35 million mathematical web pages, totaling 120 billion tokens.
To ensure the training data contained no test-set content (questions from GSM8K and MATH appear widely on the web), the research team also applied dedicated filtering.
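One common way to implement such test-set decontamination is to drop any web page that shares a long n-gram with a benchmark question. The paper filters against GSM8K/MATH; the 10-gram threshold and the exact matching scheme below are assumptions for illustration:

```python
# Hedged sketch of test-set decontamination via n-gram overlap: discard
# any page that shares a 10-gram with a benchmark question. The n=10
# threshold is an assumption, not the paper's exact procedure.
def ngrams(text: str, n: int = 10) -> set:
    toks = text.lower().split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def is_contaminated(page: str, benchmark_questions: list, n: int = 10) -> bool:
    page_grams = ngrams(page, n)
    return any(page_grams & ngrams(q, n) for q in benchmark_questions)

bench = [
    "Natalia sold clips to 48 of her friends in April and then "
    "she sold half as many clips in May"
]
clean_page = ("An introduction to modular arithmetic and its applications "
              "in number theory and cryptography")

print(is_contaminated("Homework: " + bench[0], bench))  # True
print(is_contaminated(clean_page, bench))               # False
```

Pages flagged as contaminated are simply excluded from the training corpus.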
To verify the quality of the DeepSeekMath Corpus, the team trained models on 150 billion tokens from several datasets, including MathPile; the Corpus-trained model led by a clear margin on multiple math benchmarks.
In the alignment stage, the team first built a 776K-sample Chinese-and-English mathematical instruction dataset for supervised fine-tuning (SFT), covering three formats: CoT, PoT, and tool-integrated reasoning.
In the reinforcement learning (RL) stage, the team used an efficient algorithm called Group Relative Policy Optimization (GRPO).
GRPO is a variant of proximal policy optimization (PPO) that replaces the traditional value function with a group-based relative reward estimate, reducing the computational and memory requirements of training.
GRPO is also trained iteratively: the reward model is continuously updated with outputs from the policy model, ensuring the policy keeps improving.
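The group-relative estimate at the heart of GRPO can be sketched as follows: sample a group of completions per question, score each with the reward model, and normalize every reward against the group's mean and standard deviation in place of a learned value baseline. The group size and the whitening epsilon below are illustrative assumptions:

```python
# Sketch of GRPO's group-relative advantage: rewards for a group of
# sampled answers to the same question are whitened against the group's
# own mean and standard deviation, replacing PPO's learned value
# function as the baseline. Epsilon and group size are assumptions.
def group_relative_advantages(rewards: list, eps: float = 1e-8) -> list:
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Example: 4 sampled answers to one question, scored 1 (correct) or 0.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)  # correct answers get positive advantage, wrong ones negative
```

Because the baseline comes from the group itself, no separate value network needs to be trained or held in memory, which is where the computational savings over PPO come from.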
Also behind the first Chinese open-source MoE model
The DeepSeek team behind DeepSeekMath is a leading player among Chinese open-source model makers.
The team previously released DeepSeek MoE, the first Chinese open-source MoE model; its 7B version beat the same-scale dense model Llama 2 while using only 40% of the computation.
As a general-purpose model, DeepSeek MoE already performs impressively on coding and math tasks while consuming very few resources.
On the coding side, the team's DeepSeek-Coder surpasses CodeLlama, the open-source benchmark at the same scale.
It has also beaten GPT-3.5-Turbo, making it the open-source code model closest to GPT-4-Turbo.
As mentioned above, the DeepSeekMath launched this time is also built on the basis of Coder.
On X, some people are already looking forward to the MoE versions of Coder and Math.
Paper address: https://arxiv.org/abs/2402.03300

