
Runs on a consumer-grade GPU! Kai-Fu Lee's 01.AI releases and open-sources the 9-billion-parameter Yi-9B, the strongest code and math model in the Yi series

PHPz | 2024-03-07 17:50

01.AI, the AI company founded by Kai-Fu Lee, has put another large-model player on stage:

the 9-billion-parameter Yi-9B.


Billed as the "top science student" of the Yi series, it shores up code and math ability while keeping its overall capability competitive.

Among open-source models of similar scale (including Mistral-7B, SOLAR-10.7B, Gemma-7B, and DeepSeek-Coder-7B-Base-v1.5), it delivers the best performance.

As usual, the release is open source and especially developer-friendly:

Yi-9B (BF16) and its quantized version Yi-9B (Int8) can both be deployed on consumer-grade graphics cards.

An RTX 4090 or an RTX 3090 is enough.
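
For reference, here is a minimal deployment sketch using Hugging Face transformers (an assumption; the article does not specify a toolchain). The model ID comes from the link at the end of the article; the Int8 path shown in the comments uses bitsandbytes runtime quantization, which is not necessarily how 01.AI produced its Int8 version.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal sketch: load Yi-9B in BF16 on a single 24 GB consumer GPU.
# ~8.8B parameters in BF16 is roughly 18 GB of weights, which fits on
# an RTX 3090 or RTX 4090.
tok = AutoTokenizer.from_pretrained("01-ai/Yi-9B")
model = AutoModelForCausalLM.from_pretrained(
    "01-ai/Yi-9B",
    torch_dtype=torch.bfloat16,  # the released BF16 weights
    device_map="auto",           # place the weights on the available GPU
)

# One possible Int8 route (an assumption: runtime quantization via
# bitsandbytes, not necessarily 01.AI's own quantized checkpoint):
# from transformers import BitsAndBytesConfig
# model = AutoModelForCausalLM.from_pretrained(
#     "01-ai/Yi-9B",
#     quantization_config=BitsAndBytesConfig(load_in_8bit=True),
#     device_map="auto",
# )
```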


Depth upscaling and multi-stage incremental training

01.AI's Yi family previously released the Yi-6B and Yi-34B series.

Both were pre-trained on 3.1T tokens of Chinese and English data; Yi-9B builds on that base with an additional 0.8T tokens of continued training.

The data cutoff is June 2023.

As mentioned at the start, Yi-9B's biggest improvements are in math and coding. So how were these two abilities raised?

According to 01.AI:

Simply increasing the amount of data would not meet expectations.

Instead, the approach relies on first growing the model size from Yi-6B to 9B, then performing multi-stage incremental training on new data.

First, how was the model size increased?

One premise, which the team found through analysis:

Yi-6B had already been fully trained, and its performance might not improve no matter how many more tokens were added. So they considered increasing its size instead.

[Figure: Yi-6B training-saturation analysis; the token-count unit in the chart is B, not TB]

How? The answer is depth upscaling.

As 01.AI explains:

Widening the original model would bring larger performance losses. With depth upscaling, if the layers chosen for duplication have input/output cosine similarity close to 1.0, the upscaled model retains the original model's performance and the loss is slight.
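
As an illustration of that criterion, here is a hedged sketch (not 01.AI's analysis code) that scores each layer of Yi-6B by the cosine similarity between its input and output hidden states on a probe sentence, using transformers' output_hidden_states:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("01-ai/Yi-6B")
model = AutoModelForCausalLM.from_pretrained(
    "01-ai/Yi-6B", torch_dtype=torch.bfloat16)

probe = tok("The quick brown fox jumps over the lazy dog.",
            return_tensors="pt")
with torch.no_grad():
    # hidden_states[i] is the input to decoder layer i,
    # hidden_states[i + 1] is its output.
    hs = model(**probe, output_hidden_states=True).hidden_states

sims = [
    F.cosine_similarity(hs[i].flatten(1).float(),
                        hs[i + 1].flatten(1).float()).mean().item()
    for i in range(len(hs) - 1)
]

# Layers with similarity near 1.0 barely change the hidden state,
# so duplicating them should cost the least performance.
for i, s in enumerate(sims):
    print(f"layer {i:2d}: input/output cosine = {s:.4f}")
```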

Following this idea, 01.AI duplicated 16 of Yi-6B's relatively late layers (layers 12-28) to form the 48-layer Yi-9B.

Experiments showed this performs better than SOLAR-10.7B's approach of duplicating the middle 16 layers (layers 8-24).
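
Here is a minimal sketch of that duplication step, assuming a Llama-style architecture whose decoder stack lives in model.model.layers (true of the released Yi checkpoints); the exact splice position is an assumption, since 01.AI has not published the full procedure:

```python
import copy
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "01-ai/Yi-6B", torch_dtype=torch.bfloat16)

layers = model.model.layers                    # 32 decoder layers
# Duplicate the 16 later layers (indices 12-27, i.e. "layers 12-28").
dup = [copy.deepcopy(layers[i]) for i in range(12, 28)]

# Splice the copies in after layer 27: 28 + 16 + 4 = 48 layers.
model.model.layers = torch.nn.ModuleList(
    list(layers[:28]) + dup + list(layers[28:]))
model.config.num_hidden_layers = len(model.model.layers)  # now 48
```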

Second, what is the multi-stage incremental training?

The answer: first add 0.4T tokens of data containing text and code, keeping the same data ratio as Yi-6B's pre-training.

Then add another 0.4T tokens, also text and code, but with an increased proportion of code and math data.

(Got it: the same step-by-step idea as the "think step by step" trick we use when prompting large models.)
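
To make the two-stage recipe concrete, here is a purely illustrative configuration. The 0.4T token budgets come from the article; the mixture weights are made-up placeholders, since 01.AI has not disclosed its actual ratios:

```python
# Hypothetical two-stage incremental-training recipe. Token budgets are
# from the article; the mixture weights below are placeholders, NOT
# 01.AI's published ratios.
STAGES = [
    {   # Stage 1: same text/code ratio as Yi-6B's pre-training data
        "tokens": 0.4e12,
        "mixture": {"text": 0.8, "code": 0.2},               # placeholder
    },
    {   # Stage 2: raise the share of code and math data
        "tokens": 0.4e12,
        "mixture": {"text": 0.5, "code": 0.3, "math": 0.2},  # placeholder
    },
]
```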

Even after these two steps, the work was not done: the team also drew on the ideas of two papers (An Empirical Model of Large-Batch Training and Don't Decay the Learning Rate, Increase the Batch Size) to optimize how training hyperparameters were adjusted.

That is, starting from a fixed learning rate, the batch size is increased whenever the model's loss stops falling, so the descent continues uninterrupted and the model learns more fully.
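
Here is a minimal sketch of that schedule (an illustration in the spirit of the two cited papers, not 01.AI's training code), assuming batches that a Hugging Face-style model consumes via model(**batch).loss:

```python
import torch

def plateaued(history, window=100, tol=1e-3):
    """Loss counts as stalled if the last `window` steps did not improve
    on the previous `window` steps by more than `tol`."""
    if len(history) < 2 * window:
        return False
    recent = sum(history[-window:]) / window
    previous = sum(history[-2 * window:-window]) / window
    return previous - recent < tol

def train(model, batches, lr=2e-5, accum=1, max_accum=64, steps=10_000):
    opt = torch.optim.AdamW(model.parameters(), lr=lr)  # lr stays fixed
    history = []
    for _ in range(steps):
        opt.zero_grad()
        total = 0.0
        for _ in range(accum):          # gradient accumulation emulates
            batch = next(batches)       # a larger batch on fixed memory
            loss = model(**batch).loss / accum
            loss.backward()
            total += loss.item()
        opt.step()
        history.append(total)           # mean loss of the effective batch
        if plateaued(history):
            accum = min(accum * 2, max_accum)  # grow the batch instead
            history.clear()                    # of decaying the lr
```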

In the end, Yi-9B actually contains 8.8 billion parameters in total and supports a 4K context length.

The strongest coding and math capability in the Yi series

In its tests, 01.AI used greedy decoding (always selecting the token with the highest probability) for generation.
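
In Hugging Face transformers, this corresponds to do_sample=False (a reasonable stand-in; the article does not name the exact evaluation harness):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("01-ai/Yi-9B")
model = AutoModelForCausalLM.from_pretrained(
    "01-ai/Yi-9B", torch_dtype=torch.bfloat16, device_map="auto")

inputs = tok("def fibonacci(n):", return_tensors="pt").to(model.device)
# Greedy decoding: do_sample=False always picks the most probable token.
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```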

The comparison models were DeepSeek-Coder, DeepSeek-Math, Mistral-7B, SOLAR-10.7B, and Gemma-7B:

(1) DeepSeek-Coder and DeepSeek-Math, from the Chinese company DeepSeek. DeepSeek-Coder's 33B instruction-tuned version surpasses GPT-3.5-turbo on HumanEval, and its 7B version matches the performance of CodeLlama-34B.

DeepSeek-Math took on GPT-4 with just 7B parameters, shocking the entire open-source community.

(2) SOLAR-10.7B, from South Korea's Upstage AI, released in December 2023; its performance surpasses Mixtral-8x7B-Instruct.

(3) Mistral-7B, from Mistral AI, the company behind the first open-source MoE large model (Mixtral), which reaches or even surpasses the level of Llama 2 70B and GPT-3.5.

(4) Gemma-7B, from Google. 01.AI pointed out:

Its effective parameter count is actually on the same level as Yi-9B's.

(The two follow different naming conventions: the former counts only non-embedding parameters, while the latter counts all parameters and rounds up.)
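
The difference between the two conventions is easy to check on any loaded Hugging Face model (a small illustrative helper, not from the article):

```python
def param_counts(model):
    """Return (all parameters, non-embedding parameters) in billions."""
    total = sum(p.numel() for p in model.parameters())
    emb = model.get_input_embeddings().weight.numel()
    return total / 1e9, (total - emb) / 1e9

# Example (assuming `model` is a loaded Yi-9B or Gemma-7B):
# all_b, non_emb_b = param_counts(model)
# print(f"all: {all_b:.1f}B, non-embedding: {non_emb_b:.1f}B")
```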


The results are as follows.

First, on coding tasks, Yi-9B's performance is second only to DeepSeek-Coder-7B, knocking out the other four.

[Figure: coding benchmark results]

On math, Yi-9B is second only to DeepSeek-Math-7B, surpassing the other four.

[Figure: math benchmark results]

Overall capability holds up as well.

Its performance is the best among open-source models of similar size, surpassing all five other contenders.

[Figure: overall capability benchmark results]

Finally, common sense and reasoning ability were also tested:

The result: Yi-9B is roughly on par with Mistral-7B, SOLAR-10.7B, and Gemma-7B.

As for language ability, not only is its English good, its Chinese is also widely praised:

[Figure: language benchmark results (English and Chinese)]

Finally, after reading all this, some netizens said they couldn't wait to try it.


Others worried for DeepSeek:

"Better hurry up and strengthen your game; the across-the-board dominance is gone."


The portal is here: https://huggingface.co/01-ai/Yi-9B


Statement: This article is reproduced from 51cto.com. If there is any infringement, please contact admin@php.cn for deletion.