
Zero-threshold free commercial use! The Mencius 3-13B large model is officially open source, trained on trillions of tokens of data

PHPz · 2024-04-01

Lanzhou Technology (Langboat) has officially announced: the Mencius 3-13B large model is now open source!

This lightweight, cost-effective large model is fully open for academic research and is free for commercial use.

Mencius 3-13B performs well on benchmark evaluations such as MMLU, GSM8K, and HumanEval.

Among lightweight large models with fewer than 20B parameters, its Chinese and English language abilities stand out in particular, and its mathematics and coding skills are also near the top of the field.

[Figure: benchmark evaluation results]
△ The results above are based on 5-shot evaluation.

According to reports, the Mencius 3-13B large model is based on the Llama architecture and was trained on a dataset of up to 3T tokens.

The corpus was drawn from web pages, encyclopedias, social media, news media, and high-quality open-source datasets. Continued training on trillions of tokens of multilingual corpus gives the model outstanding Chinese ability while retaining multilingual capability.

Getting started with the open-source Mencius 3-13B large model

You can use the Mencius 3-13B large model in just two steps.

First, set up the environment:

pip install -r requirements.txt

Then run the quick-start code:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model from the Hugging Face hub.
tokenizer = AutoTokenizer.from_pretrained("Langboat/Mengzi3-13B-Base", use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Langboat/Mengzi3-13B-Base", device_map="auto", trust_remote_code=True)

# Chinese prompt: "Instruction: Answer the following question. Input: Introduce Mencius. Output:"
inputs = tokenizer('指令:回答以下问题。输入:介绍一下孟子。输出:', return_tensors='pt')
if torch.cuda.is_available():
    inputs = inputs.to('cuda')

pred = model.generate(**inputs, max_new_tokens=512, repetition_penalty=1.01, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(pred[0], skip_special_tokens=True))
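Note that the prompt follows a fixed Chinese instruction/input/output template ("指令…输入…输出…"). As a small illustration, a helper like the one below makes the template explicit; the function is hypothetical and not part of the official repo:

# Hypothetical helper for Mengzi3's "指令/输入/输出" (instruction/input/output)
# prompt template; not part of the official repo.
def build_prompt(instruction: str, user_input: str) -> str:
    return f"指令:{instruction}输入:{user_input}输出:"

prompt = build_prompt("回答以下问题。", "介绍一下孟子。")  # same prompt as the quick start above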

Additionally, they provide sample code for single-turn interactive inference with the base model.

cd examples
python base_streaming_gen.py --model model_path --tokenizer tokenizer_path
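The repo's base_streaming_gen.py is the authoritative version of that script. As a rough sketch of what token-by-token streaming generation looks like with the transformers library (this is an assumption about the approach, not the repo's actual code), you can pass a TextStreamer to generate():

# Minimal streaming sketch using transformers' TextStreamer;
# see the repo's base_streaming_gen.py for the official implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

tokenizer = AutoTokenizer.from_pretrained("Langboat/Mengzi3-13B-Base",
                                          use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Langboat/Mengzi3-13B-Base",
                                             device_map="auto", trust_remote_code=True)

# TextStreamer prints each token to stdout as soon as it is decoded.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
inputs = tokenizer("指令:回答以下问题。输入:介绍一下孟子。输出:", return_tensors="pt").to(model.device)
model.generate(**inputs, max_new_tokens=512, streamer=streamer)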

If you want to fine-tune the model, the relevant files and code are also provided.
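The repo's own fine-tuning scripts are the reference; purely as an illustration, here is a minimal LoRA fine-tuning sketch using the peft library. The hyperparameters, target modules, and toy dataset below are all assumptions, not Langboat's recipe:

# Hypothetical LoRA fine-tuning sketch; not Langboat's official script.
import torch
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from peft import LoraConfig, get_peft_model

model_name = "Langboat/Mengzi3-13B-Base"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token  # Llama-style tokenizers often lack a pad token
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16,
                                             device_map="auto", trust_remote_code=True)

# Wrap the base model with low-rank adapters; the target module names
# are assumptions for a Llama-style architecture.
lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Toy dataset in the same "指令/输入/输出" prompt format shown above.
examples = ["指令:回答以下问题。输入:介绍一下孟子。输出:孟子是战国时期的儒家思想家。"]
ds = Dataset.from_dict({"text": examples}).map(
    lambda e: tokenizer(e["text"], truncation=True, max_length=512), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mengzi3-lora", per_device_train_batch_size=1,
                           num_train_epochs=1, learning_rate=1e-4, logging_steps=1),
    train_dataset=ds,
    # mlm=False makes the collator copy input_ids into labels for causal LM training.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()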


In fact, many details of the Mencius 3-13B large model had already been revealed on March 18, at Lanzhou's large-model technology and product launch event.

At that time, they stated that the training of the Mencius 3-13B large model had been completed.

As for why the 13B size was chosen, Zhou Ming explained:

First, Lanzhou focuses primarily on serving ToB scenarios, with ToC as a supplement.

In practice, the model sizes most frequently used in ToB scenarios are 7B, 13B, 40B, and 100B parameters, concentrated overall in the 10B-100B range.

Second, from an ROI (return on investment) perspective, models in this range not only meet the needs of the scenario but are also the most cost-effective.

Therefore, Lanzhou's long-standing goal has been to build high-quality industry large models in the 10B-100B parameter range.

As one of the earliest large-model startup teams in China, Lanzhou released Mencius GPT V1 (MChat) in March last year.

In January this year, Mencius GPT V2 (comprising the Standard, Lightweight, Finance, and Coding editions) was opened to the public.

Interested readers can try it via the links below.

GitHub: https://github.com/Langboat/Mengzi3
HuggingFace: https://huggingface.co/Langboat/Mengzi3-13B-Base
ModelScope: https://www.modelscope.cn/models/langboat/Mengzi3-13B-Base
Wisemodel: https://wisemodel.cn/models/Langboat/Mengzi3-13B-Base

