The zero-threshold ChatGPT cloning solution has been upgraded: the open-source model is fully reproduced, with an online demo that requires no registration
AI applications and large models represented by ChatGPT and GPT-4 are popular worldwide and are widely seen as the start of a new technological and industrial revolution and a new starting point for AGI (Artificial General Intelligence). Technology giants are racing to launch new products, and many leading AI figures in academia and industry are joining the related wave of startups. Generative AI is iterating on a daily basis and continues to surge.
However, OpenAI has not open-sourced these models. What are the technical details behind them? How can one quickly follow, catch up with, and participate in this wave of technology? How can the high cost of building and applying large AI models be reduced? How can core data and intellectual property be protected from leaks caused by using third-party large-model APIs?
As the most popular open-source solution for large AI models, Colossal-AI is the first to open-source a complete RLHF pipeline covering supervised dataset collection -> supervised fine-tuning -> reward model training -> reinforcement learning fine-tuning. Built on the LLaMA pre-trained model, it launches ColossalChat, currently the practical open-source project closest to ChatGPT's original technical route.
Open source address: https://github.com/hpcaitech/ColossalAI
It includes the following:
1. Demo: experience the model online directly, with no registration and no waitlist
2. Training code: the complete RLHF training code is open-sourced, covering both the 7B and 13B models
3. Dataset: an open-source bilingual Chinese-English dataset of 104K entries
4. Inference deployment: 4-bit quantized inference of the 7-billion-parameter model requires only about 4 GB of GPU memory
5. Model weights: training can be reproduced quickly with a small amount of compute on a single server
6. Larger models, more datasets, and other optimizations will continue to be added at a rapid pace
ColossalChat needs fewer than 10 billion parameters: by performing RLHF fine-tuning on top of a large language model, it acquires bilingual Chinese-English capability and reaches an effect similar to that of ChatGPT and GPT-3.5.
For example, common-sense Q&A:
Answering in Chinese:
Writing an email:
Writing an algorithm:
Although GPT series models such as ChatGPT and GPT-4 are very powerful, they are unlikely to be fully open source. Fortunately, the open source community continues to work hard.
For example, Meta has open-sourced the LLaMA model, whose parameter count ranges from 7 billion to 65 billion; the 13-billion-parameter version can outperform the 175-billion-parameter GPT-3 on most benchmarks. However, because it was not instruction-tuned (instruct tuning), its actual generation quality is not ideal.
Stanford's Alpaca generates training data in a self-instruct fashion by calling the OpenAI API, so that a lightweight model with only 7 billion parameters can be fine-tuned at very low cost, with dialogue quality comparable to that of ultra-large language models with hundreds of billions of parameters such as GPT-3.5.
However, existing open-source solutions can only be regarded as supervised fine-tuned models that complete the first step of reinforcement learning from human feedback (RLHF); no subsequent alignment or fine-tuning has been performed. At the same time, Alpaca's training dataset is too small and its corpus is English-only, which also limits the model's performance to some extent.
A key to the impressive results of ChatGPT and GPT-4 is the introduction of RLHF into the training process, which makes the generated content more consistent with human values.
The three stages of RLHF
Based on the LLaMA model, Colossal-AI is the first to open-source ColossalChat, a ChatGPT-like reproduction solution that includes the complete RLHF process, and it is currently the practical open-source project closest to ChatGPT's original technical route.
Open-source training dataset
ColossalChat has open-sourced a bilingual Chinese-English dataset containing about 100,000 question-answer pairs. The dataset collects and cleans real questions that people ask on social platforms as a seed dataset, expands it with self-instruct, and cost about $900 to annotate. Compared with datasets produced by other self-instruct methods, its seed data is more realistic and richer, and the generated dataset covers more topics. The data can be used for both fine-tuning and RLHF training. With high-quality data, ColossalChat can carry out better conversational interactions and supports Chinese. A minimal sketch of the self-instruct expansion step is shown below.
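To make the expansion step concrete, here is a minimal illustrative sketch of self-instruct style data growth: sample a few seed questions, format them into a few-shot prompt, and ask an existing LLM to propose new questions. This is not the team's actual pipeline; the prompt template, function names, and the generate() callable are assumptions standing in for whatever LLM API or local model is used.

import random

# Prompt template is illustrative; real self-instruct prompts are more elaborate.
PROMPT_TEMPLATE = (
    "Here are some example user questions:\n{examples}\n"
    "Write 5 new questions on different topics, one per line."
)

def expand_seed_dataset(seed_pairs, generate, rounds=10, shots=3):
    """Grow a seed dataset by repeatedly prompting an LLM for new questions.

    seed_pairs: list of (question, answer) tuples collected from real users.
    generate:   callable that takes a prompt string and returns the LLM's reply.
    """
    new_questions = []
    for _ in range(rounds):
        sampled = random.sample(seed_pairs, k=min(shots, len(seed_pairs)))
        examples = "\n".join(f"- {q}" for q, _ in sampled)
        reply = generate(PROMPT_TEMPLATE.format(examples=examples))
        # Keep non-empty lines; a real pipeline would also deduplicate and filter.
        new_questions += [line.strip("- ").strip() for line in reply.splitlines() if line.strip()]
    return new_questions

if __name__ == "__main__":
    seeds = [("How do I boil an egg?", "..."), ("What is RLHF?", "...")]
    fake_generate = lambda prompt: "How do I learn Python?\nWhat is quantization?"
    print(expand_seed_dataset(seeds, fake_generate, rounds=1))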
ColossalChat dataset collection process
RLHF algorithm reproduction
RLHF-Stage1 is supervised fine-tuning, i.e., fine-tuning the model with the dataset mentioned above. RLHF-Stage2 trains the reward model: different outputs for the same prompt are manually ranked to obtain the corresponding scores, which supervise the training of the reward model (a minimal sketch of the commonly used pairwise ranking loss follows).
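For Stage 2, human rankings are typically turned into a pairwise loss: for two answers to the same prompt, the reward model should score the human-preferred answer higher. The sketch below shows this common formulation; it is illustrative and not necessarily the exact loss code used in the project.

import torch
import torch.nn.functional as F

def pairwise_ranking_loss(chosen_rewards: torch.Tensor,
                          rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Reward-model loss: push the score of the preferred answer above the
    score of the rejected answer for the same prompt.
    Both inputs have shape (batch,)."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Example: reward_model(prompt, answer) -> one scalar score per sample.
chosen = torch.tensor([1.2, 0.3, 0.8])     # scores for preferred answers
rejected = torch.tensor([0.5, 0.7, -0.1])  # scores for rejected answers
print(pairwise_ranking_loss(chosen, rejected).item())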
RLHF-Stage3 uses a reinforcement learning algorithm, which is the most complex part of the training process:
RLHF-Stage3 algorithm flow chart
In the PPO part, ColossalChat proceeds in two stages. First is the Make Experience stage, which uses the SFT, Actor, RM, and Critic models to generate experience and store it in a buffer; then comes the parameter update stage, which uses that experience to compute the policy loss and the value loss. In the PTX part, ColossalChat computes the cross-entropy loss between the Actor's output response and the answer part of the input corpus, which adds a pre-training gradient to the PPO gradient so as to preserve the original language-model capability and prevent forgetting. Finally, the policy loss, value loss, and PTX loss are summed for backpropagation and parameter update, as in the sketch below.
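The combined update described above can be sketched roughly as follows. This is an illustrative sketch, not the exact Coati implementation: the function name, tensor shapes, and the value_coef/ptx_coef coefficients are assumptions.

import torch
import torch.nn.functional as F

def rlhf_stage3_total_loss(policy_loss: torch.Tensor,
                           value_loss: torch.Tensor,
                           actor_logits: torch.Tensor,    # (batch, seq, vocab) on the pretraining corpus
                           answer_token_ids: torch.Tensor,  # (batch, seq) token ids of the answer part
                           value_coef: float = 0.5,
                           ptx_coef: float = 0.9) -> torch.Tensor:
    """Sum the PPO policy loss, the value loss, and the PTX loss.
    value_coef and ptx_coef are illustrative hyperparameters."""
    # PTX: cross-entropy of the Actor's outputs against the original corpus,
    # mixing a pre-training gradient into the PPO update to limit forgetting.
    ptx_loss = F.cross_entropy(
        actor_logits.reshape(-1, actor_logits.size(-1)),
        answer_token_ids.reshape(-1),
        ignore_index=-100,  # mask out prompt / padding positions
    )
    return policy_loss + value_coef * value_loss + ptx_coef * ptx_loss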
Get started quickly
ColossalChat has open-sourced the complete code for reproducing the three training stages of ChatGPT based on the LLaMA model.
The first stage trains the SFT model:
# Training with a 4-GPU server
colossalai run --nproc_per_node=4 train_sft.py \
    --pretrain "/path/to/LLaMa-7B/" \
    --model 'llama' \
    --strategy colossalai_zero2 \
    --log_interval 10 \
    --save_path /path/to/Coati-7B \
    --dataset /path/to/data.json \
    --batch_size 4 \
    --accimulation_steps 8 \
    --lr 2e-5
The second stage trains the reward model:
# Training with a 4-GPU server
colossalai run --nproc_per_node=4 train_reward_model.py \
    --pretrain "/path/to/LLaMa-7B/" \
    --model 'llama' \
    --strategy colossalai_zero2 \
    --dataset /path/to/datasets
The third stage trains the model with the reinforcement learning algorithm:
# Training with an 8-GPU server
colossalai run --nproc_per_node=8 train_prompts.py prompts.csv \
    --strategy colossalai_zero2 \
    --pretrain "/path/to/Coati-7B" \
    --model 'llama' \
    --pretrain_dataset /path/to/dataset
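Once the three stages finish, one low-effort way to sanity-check the resulting model is plain Hugging Face generation. The sketch below is illustrative only: it assumes the checkpoint at /path/to/Coati-7B is in Hugging Face LLaMA format and uses an assumed "Human:/Assistant:" prompt style; the project's own inference and serving scripts (see the quantized-inference command later in this article) may differ.

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

# Illustrative only: the path and the prompt format are assumptions.
model_path = "/path/to/Coati-7B"
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Human: Explain what RLHF is in one paragraph.\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))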
System performance optimization and development acceleration
ColossalChat's ability to quickly reproduce ChatGPT's complete RLHF pipeline relies on the underlying support of the AI large-model infrastructure Colossal-AI and its related optimization techniques. Under the same conditions, training is about three times faster than with FSDP (Fully Sharded Data Parallel), which Alpaca uses.
System infrastructure Colossal-AI
The AI large-model development system Colossal-AI provides the foundation for this solution. Based on PyTorch, it enables efficient and fast deployment of large-model training and inference, thereby lowering the cost of applying large AI models. Colossal-AI is developed under the leadership of James Demmel, Distinguished Professor at UC Berkeley, and Yang You, Presidential Young Professor at the National University of Singapore. Since being open-sourced, Colossal-AI has ranked first on the global GitHub trending list several times, has earned about 20,000 GitHub stars, and has been selected as an official tutorial at top international AI and HPC conferences such as SC, AAAI, PPoPP, CVPR, and ISC.
ZeRO + Gemini to reduce memory redundancy
Colossal-AI supports the Zero Redundancy Optimizer (ZeRO) to improve memory efficiency and fit larger models at low cost, without affecting computation granularity or communication efficiency. An automatic chunk mechanism further improves ZeRO's performance by increasing memory efficiency, reducing the number of communications, and avoiding memory fragmentation. The heterogeneous memory manager Gemini can offload optimizer states from GPU memory to CPU memory or disk, breaking the GPU memory capacity limit, scaling up the trainable model size, and lowering the cost of large-model applications.
Low-cost fine-tuning with LoRA
Colossal-AI supports low-rank adaptation (LoRA) for low-cost fine-tuning of large models. LoRA assumes that large language models are over-parameterized and that the parameter update during fine-tuning is a low-rank matrix, which can therefore be decomposed into the product of two smaller matrices. During fine-tuning, the parameters of the large model are frozen and only the low-rank matrices are adjusted, which significantly reduces the number of trainable parameters and the cost.
Low-cost quantized inference with GPTQ
To reduce the cost of inference deployment, Colossal-AI uses GPTQ 4-bit quantized inference. On GPT/OPT/BLOOM-style models, it achieves better perplexity than the traditional RTN (round-to-nearest) quantization technique. Compared with common FP16 inference, it cuts GPU memory consumption by 75% while losing only a tiny amount of throughput and perplexity. Taking ColossalChat-7B as an example, with 4-bit quantized inference the 7-billion-parameter model needs only about 4 GB of GPU memory for short-sequence inference (generation length 128), which can run on an ordinary consumer GPU such as an RTX 3060 Laptop, with a single command:
python server.py /path/to/pretrained --quant 4bit --gptq_checkpoint /path/to/coati-7b-4bit-128g.pt --gptq_group_size 128
In the serving code, the quantized weights are loaded as follows:
if args.quant == '4bit':
    model = load_quant(args.pretrained, args.gptq_checkpoint, 4, args.gptq_group_size)
If efficient asynchronous offloading is also used, the GPU memory requirement can be reduced further, allowing even larger models to be served on lower-cost hardware.
Differences between ColossalChat and Alpaca
1. ColossalChat open-sources the first complete RLHF pipeline; Stanford's Alpaca does not do RLHF, i.e., it skips Stage 2 and Stage 3.
2. ColossalChat uses more instruction data, of higher quality and broader coverage, and applies reinforcement learning for alignment so that its answers are closer to human preferences.
3. The ColossalChat training process integrates many of Colossal-AI's system optimizations; with the same dataset and model size, training is about three times faster than Alpaca, allowing researchers and small and medium-sized enterprises to independently train and deploy their own conversational systems.
4. The ColossalChat team also collected more data themselves: about 24M English tokens and about 30M Chinese tokens for training, roughly 54M tokens in total, of which 6M English tokens and 18M Chinese tokens were collected by ColossalChat itself.
Below are some dialogue comparisons between ColossalChat and Alpaca (ColossalChat above, Alpaca below).
Write quicksort in Python:
Write an email to a professor requesting a letter of recommendation:
Although RLHF has been introduced, actual performance still has room for improvement in some scenarios due to limited compute and data.
Fortunately, unlike in the past, when large AI models and cutting-edge technology were monopolized by a few technology giants, open-source communities and startups such as PyTorch, Hugging Face, and OpenAI also play a key role in this wave. Drawing on the successful experience of the open-source community, Colossal-AI welcomes all parties to join in building it together and embrace the era of large models! You can get in touch or participate in the following ways:
1. Post an issue or submit a pull request (PR) on GitHub
2. Join the Colossal-AI user WeChat or Slack group to communicate
3. Send a formal cooperation proposal to youy@comp.nus.edu.sg
Open source address: https://github.com/hpcaitech/ColossalAI