10 lines of code improve large models' math performance by 20%: Google is also testing this unorthodox research, whose lead author is entirely self-taught

WBOY

Aug 27, 2024 pm 03:31 PM

With fewer than 10 lines of code, a large model's math performance (on GSM8K) can be improved by 20%!

Several independent scholars have proposed improvements to large model sampling, which has attracted the attention of the open source community.

Currently, this method has achieved results on Mistral-7B, and testing on Llama3-70B is also ongoing.

This method is called min-p sampling, which aims to balance the coherence and diversity of the generated text.

Simply put, it lets the model behave differently in different situations: staying stable on factual questions while being creative in scenarios such as writing.

In the paper, the authors note that the method is already widely used in the open source community.

The authors also revealed that closed source model vendors such as Anthropic and Google have tested, or are testing, min-p.

Google has confirmed the news as well: Logan Kilpatrick, the developer community lead who moved from OpenAI to Google, replied "On it".

Abram Jackson, a researcher at Microsoft Copilot, said after reading the paper that this is the first improvement to token sampling during inference that he has seen, and that there is still plenty of room for further improvement.

It is worth mentioning that the lead author of this widely watched study, Minh Nhat Nguyen, never formally studied computer science and is entirely self-taught.

Minh and the rest of the team completed the project with the help of Apart Research, an AI safety research organization.

Dynamic adjustment of the sampling threshold

min-p is a dynamic truncation sampling method, the core of which is to scale the minimum probability threshold according to the maximum probability of the token distribution at each step.

The main goal is to balance the coherence and diversity of the generated text, especially at higher temperatures.

Specifically, min-p introduces a base probability threshold p_base, which sets the minimum probability requirement for entering the sampling pool.

At each generation step, min-p multiplies p_base by the largest token probability p_max in the current distribution to obtain a scaled absolute threshold: p_scaled = p_base × p_max.

Only tokens with probability greater than or equal to p_scaled enter the sampling pool.
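
The thresholding rule described above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the authors' implementation; the function name `min_p_pool` and the toy probabilities are assumptions made for this example.

```python
def min_p_pool(probs, p_base):
    """Return (p_scaled, indices of tokens that enter the sampling pool)."""
    p_max = max(probs)            # largest token probability at this step
    p_scaled = p_base * p_max     # scaled absolute threshold
    pool = [i for i, p in enumerate(probs) if p >= p_scaled]
    return p_scaled, pool


# Toy next-token distribution over four tokens (made-up numbers).
probs = [0.6, 0.3, 0.05, 0.05]
p_scaled, pool = min_p_pool(probs, p_base=0.1)
print(pool)  # tokens 0 and 1 survive; 0.05 is below 0.1 * 0.6
```

Because the threshold is recomputed from p_max at every step, the same p_base yields a different cutoff for every distribution.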

When the model is very confident about a particular token (that is, p_max is large), p_scaled is also high. The sampling pool shrinks sharply: the vast majority of low-probability tokens are filtered out, leaving only a few high-confidence choices, which keeps the output coherent.

When the model's predicted probabilities for the tokens are relatively close (p_max is lower), p_scaled drops accordingly. This relaxes the requirement for entering the sampling pool and admits more medium-probability tokens, giving the model room to generate more diverse content.
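
To make the two cases concrete, the sketch below counts how many tokens survive for a confident distribution and for an uncertain one, using the same p_base. The helper `pool_size` and both distributions are hypothetical, chosen only to illustrate the behavior described above.

```python
def pool_size(probs, p_base=0.1):
    """Count tokens whose probability reaches p_base * max(probs)."""
    threshold = p_base * max(probs)
    return sum(1 for p in probs if p >= threshold)


confident = [0.90, 0.05, 0.03, 0.02]        # one dominant token
uncertain = [0.25, 0.25, 0.20, 0.15, 0.15]  # probabilities close together

print(pool_size(confident))  # threshold is about 0.09, so 1 token is kept
print(pool_size(uncertain))  # threshold is about 0.025, so all 5 are kept
```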

After determining the sampling pool, min-p scales the token probability distribution according to the temperature.

It divides each token's log-probability by a temperature parameter τ and then normalizes, which yields the temperature-scaled probability distribution.

A τ value greater than 1 makes the probability distribution flatter, increasing the chance that low-probability tokens are selected; a τ value less than 1 makes the distribution sharper, strengthening the advantage of high-probability tokens.
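
The effect of τ can be checked numerically with a short sketch. The function name `apply_temperature` and the example distribution are assumptions for illustration; the code assumes every probability is strictly positive so that the logarithm is defined.

```python
import math


def apply_temperature(probs, tau):
    """Divide log-probabilities by tau, then renormalize to sum to 1."""
    scaled = [math.log(p) / tau for p in probs]
    peak = max(scaled)                       # subtract max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


probs = [0.7, 0.2, 0.1]
flatter = apply_temperature(probs, tau=2.0)  # top probability drops below 0.7
sharper = apply_temperature(probs, tau=0.5)  # top probability rises above 0.7
print([round(p, 3) for p in flatter])
print([round(p, 3) for p in sharper])
```

With τ = 1 the function returns the input distribution unchanged, which is a quick sanity check on the scaling.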

Finally, min-p randomly selects the next token from the scaled sampling pool according to the adjusted probability distribution.
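
Putting the steps together, a complete sampling step might look like the sketch below. It follows the order described in this article (truncate first, then apply temperature, then sample). The function name `min_p_sample` and its parameters are assumptions made for this example, and real inference libraries may order or implement these steps differently.

```python
import math
import random


def min_p_sample(probs, p_base=0.1, temperature=1.0, rng=random):
    """Sample one token index: min-p truncation, then temperature scaling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Step 1: keep tokens whose probability reaches p_base * p_max.
    p_scaled = p_base * max(probs)
    pool = [i for i, p in enumerate(probs) if p > 0 and p >= p_scaled]
    # Step 2: divide log-probabilities by the temperature and renormalize
    # over the surviving tokens only.
    scaled = [math.log(probs[i]) / temperature for i in pool]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    pool_probs = [w / total for w in weights]
    # Step 3: draw the next token from the adjusted distribution.
    token = rng.choices(pool, weights=pool_probs, k=1)[0]
    return token, dict(zip(pool, pool_probs))


rng = random.Random(0)  # fixed seed so repeated runs draw the same token
token, dist = min_p_sample([0.6, 0.3, 0.05, 0.05],
                           p_base=0.1, temperature=2.0, rng=rng)
print(token, dist)  # only tokens 0 and 1 can ever be drawn here
```

Passing a seeded `random.Random` instance keeps the example reproducible; in practice the default module-level generator would usually be enough.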

Stability and creativity, "I want it all"

How well does min-p work? The authors used Mistral-7B as the base model for testing. Let's look at the results by scenario.

For the reasoning task, the authors used the GPQA dataset. At a temperature of 1, min-p has a slight advantage over the established top-p method.

As the temperature increases, GPQA scores trend downward overall, but min-p declines significantly more slowly than top-p.

The decline of min-p does not become pronounced until the temperature reaches 3, by which point the top-p score is close to 0.

In other words, compared with top-p, min-p better maintains the stability that reasoning tasks require.

Mathematical tasks also require stable performance. Here the authors tested on the GSM8K dataset.

With min-p, the score falls faster as temperature rises than it did on GPQA, but still more slowly than with top-p.

The third type of task is creative writing. Here stability matters less, and the model needs to be more creative.

This test used the AlpacaEval dataset, with experimental data obtained from an independent evaluator in the open source community.

The results show that at temperature=1.5 and min-p=0.1, min-p performs particularly well and can generate creative writing that is difficult to produce with top-p.

With these parameters, text produced by min-p achieved a human judgment preference rate of 58.12%, much higher than other methods under similar settings.

Paper address:

https://arxiv.org/abs/2407.01082

GitHub:

https://github.com/menhguin/minp_paper/

Reference link:

https://x.com/menhguin/status/1826132708508213629
