No manual annotation required, the self-generated instruction framework breaks the cost bottleneck of LLMs such as ChatGPT
ChatGPT has become the new top player in the AI world at the end of this year. People are amazed by its powerful question-answering ability and programming knowledge. But the more powerful the model, the higher the technical requirements behind it.
ChatGPT is based on the GPT-3.5 series of models and introduces reinforcement learning from human feedback (RLHF) to continuously fine-tune the pre-trained language model, with the goal of teaching large language models (LLMs) to understand human instructions and to give the best possible answer to a given prompt.
This technical approach is the current development trend for language models. While such models have great prospects, the cost of training and fine-tuning them is very high.
According to the information currently disclosed by OpenAI, the training process of ChatGPT is divided into three stages:
The first stage produces a supervised policy model based on GPT-3.5. The base model on its own struggles to understand the intentions behind different kinds of human instructions and to judge the quality of the content it generates. The researchers randomly sampled prompts from a prompt dataset and asked professional annotators to write high-quality answers for each of them. The prompts and their corresponding human-written answers were then used to fine-tune the initial supervised policy model, giving it a basic understanding of prompts and initially improving the quality of its answers.
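As a rough illustration, the sketch below shows what this kind of supervised fine-tuning step could look like in code. It is a minimal example, not OpenAI's actual pipeline: a small open model ("gpt2") stands in for the GPT-3.5 policy model, and the `demonstrations` list is a hypothetical stand-in for the human-written prompt–answer pairs.

```python
# Minimal sketch of stage-1 supervised fine-tuning (SFT).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Hypothetical (prompt, human-written answer) pairs from professional annotators.
demonstrations = [
    ("Explain what RLHF is in one sentence.",
     "RLHF fine-tunes a language model with a reward signal derived from human preferences."),
]

model.train()
for prompt, answer in demonstrations:
    # Concatenate prompt and answer; the model is trained to reproduce the
    # full sequence with standard next-token cross-entropy.
    text = prompt + "\n" + answer + tokenizer.eos_token
    batch = tokenizer(text, return_tensors="pt")
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```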
In the second stage, the research team samples multiple outputs from the model for a given prompt, asks human annotators to rank these outputs, and then uses the ranking data to train a reward model (RM). ChatGPT adopts a pairwise loss to train the RM.
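In the InstructGPT formulation that ChatGPT builds on, this pairwise loss pushes the reward model to score the human-preferred answer higher than the rejected one: for a prompt x with preferred completion y_w and rejected completion y_l, the loss is -log σ(r(x, y_w) - r(x, y_l)). Below is a minimal PyTorch sketch, where `reward_model` is a hypothetical module that maps a tokenized sequence to a scalar score.

```python
import torch.nn.functional as F

def pairwise_rm_loss(reward_model, chosen_ids, rejected_ids):
    # Scalar rewards for the human-preferred and the rejected completions.
    r_chosen = reward_model(chosen_ids)      # shape: (batch,)
    r_rejected = reward_model(rejected_ids)  # shape: (batch,)
    # Encourage the chosen completion to score higher than the rejected one:
    # loss = -log(sigmoid(r_chosen - r_rejected)).
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```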
In the third stage, the research team uses reinforcement learning to strengthen the pre-trained model's capabilities, using the RM learned in the previous stage to update the parameters of the policy model (OpenAI's related InstructGPT work uses the PPO algorithm for this step).
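The sketch below is a heavily simplified illustration of this stage, not the actual PPO procedure: the policy LM samples an answer, the reward model scores it, and the score weights a REINFORCE-style gradient step. Here `policy` is assumed to be a Hugging Face causal LM and `reward_model` a hypothetical module that returns one scalar score per sequence.

```python
def rl_step(policy, reward_model, tokenizer, optimizer, prompt):
    batch = tokenizer(prompt, return_tensors="pt")
    # Sample a completion from the current policy.
    generated = policy.generate(**batch, max_new_tokens=64, do_sample=True)
    # Score the generated sequence with the learned reward model.
    reward = reward_model(generated).detach()
    # Negative log-likelihood of the generated sequence under the policy.
    nll = policy(generated, labels=generated).loss
    # REINFORCE-style objective: make high-reward sequences more likely.
    loss = reward * nll
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```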
Of the three stages of ChatGPT training, only the third does not require manually annotated data; the first and second stages both require large amounts of manual annotation. Therefore, although models such as ChatGPT perform very well, the labor cost of teaching them to follow instructions is very high. As models grow larger and their range of capabilities widens, this problem will become more serious and eventually become a bottleneck hindering model development.
Some studies have tried to address this bottleneck. For example, the University of Washington and other institutions recently jointly published the paper "SELF-INSTRUCT: Aligning Language Model with Self Generated Instructions" (paper address: https://arxiv.org/pdf/2212.10560v1.pdf), which proposes a new framework, SELF-INSTRUCT, that improves the instruction-following capabilities of pre-trained language models by bootstrapping off the model's own generations.
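Before going into the details of the method below, here is a minimal sketch of what such a self-generated instruction bootstrapping loop could look like. Everything in it is a simplified, hypothetical stand-in: `lm_generate` is a placeholder for a call to the pre-trained language model, and the seed tasks, prompt templates, and filtering heuristic only hint at the real pipeline.

```python
import random

# A couple of manually written seed instructions to start the bootstrapping.
seed_instructions = [
    "Write a short poem about winter.",
    "Summarize the following paragraph in one sentence.",
]

def lm_generate(prompt: str) -> str:
    # Placeholder: replace with a real call to the pre-trained language model.
    return "<model output for: " + prompt[:40] + "...>"

task_pool = list(seed_instructions)
generated_data = []
for _ in range(3):  # a few bootstrapping rounds
    # Step 1: prompt the model with existing instructions to propose a new task.
    examples = "\n".join(random.sample(task_pool, k=min(2, len(task_pool))))
    new_instruction = lm_generate(
        "Here are some task instructions:\n" + examples +
        "\nWrite a new, different task instruction:"
    )
    # Step 2: generate an input/output instance for the new instruction.
    instance = lm_generate(
        "Task: " + new_instruction + "\nGive one example input and the correct output."
    )
    # Step 3: filter out low-quality or duplicate instructions (crude heuristic).
    if len(new_instruction.split()) >= 3 and new_instruction not in task_pool:
        task_pool.append(new_instruction)
        generated_data.append((new_instruction, instance))
# `generated_data` would then be used to fine-tune the model on instruction following.
```

The loop mirrors the three phases described in the paper: instruction generation, instance generation, and filtering, repeated iteratively.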
SELF-INSTRUCT is a semi-automated process that uses instruction signals from the model itself to instruction-tune a pre-trained LM. The entire process is an iterative bootstrapping algorithm that starts from a limited seed set of manually written instructions, which guide the overall generation. In the first phase, the model is prompted to generate instructions for new tasks, leveraging the existing instruction set to create broader instructions that define new tasks. SELF-INSTRUCT also creates input and output instances for the newly generated instructions, which can later be used to supervise instruction tuning. Finally, SELF-INSTRUCT prunes low-quality and duplicate instructions. The whole process is repeated iteratively, and the resulting model can generate instructions for a large number of tasks.

To verify the effectiveness of the new method, the study applied the SELF-INSTRUCT framework to GPT-3, ultimately producing approximately 52k instructions and 82k instance inputs and target outputs. GPT-3 tuned this way achieved an absolute improvement of 33.1% over the original model on new tasks from the SUPER-NATURALINSTRUCTIONS dataset, comparable to the performance of InstructGPT_001, which was trained with private user data and human annotation. For further evaluation, the study collected a set of expert-written instructions for new tasks and showed through human evaluation that GPT-3 using SELF-INSTRUCT significantly outperforms existing models trained on public instruction datasets, trailing InstructGPT_001 by only 5%.

SELF-INSTRUCT thus provides a way to align pre-trained language models with instructions while requiring almost no manual annotation. Several other works have explored similar directions and have also achieved good results. Methods of this kind are clearly effective at reducing the high manual labeling cost of large language models, and they will make LLMs such as ChatGPT stronger and take them further.