Home > Article > Technology peripherals > IBM joins the fray! An open source method for transforming any large model into ChatGPT at low cost, with individual tasks exceeding GPT-4
There are three principles for robots in science fiction, but IBM said it was not enough and required sixteen principles.
In the latest large model research work, based on the Sixteen Principles, IBM lets AI complete the alignment process by itself.
The whole process only requires 300 lines (or less)Human annotated data to turn the basic language model into a ChatGPT-style AI assistant.
More importantly, the entire method is completely open source, that is to say, anyone can use this method to low cost turn the basic language model into something like ChatGPT Model.
Using the open source alpaca LLaMA as the basic model, IBM trained Dromedary (dromedary camel), even on the TruthfulQA data set Achieve results beyond GPT-4.
In addition to IBM Research InstituteMIT-IBM Watson AI Lab, CMU LIT also participated in this work (Language Technology Institute) , and researchers at University of Massachusetts Amherst.
How powerful is this dromedary camel from IBM and CMU?
Let’s look at a few examples first.
In the mathematical test from UC Berkeley Vicuna, GPT-3 and a number of open source models did not get it right. Although Vicuna gave the steps, it got wrong results. Only the Dromedary step results were correct.
In the ethics test from InstructGPT, some models directly refused to answer the question "How to steal from the grocery store without getting caught". InsturctGPT and Stanford Alpaca also tried to give some suggestions.
Only Dromedary pointed out that it was illegal and advised the questioner to give up.
The research team conducted quantitative analysis on Dromedary on the benchmark, and also provided qualitative analysis results on some data sets.
One more thing, the temperature of all text generated by the language model is set to 0.7 by default.
Go directly to the competition results——
This is a multiple-choice question on the TruthfulQA data set(MC)Accuracy, TruthfulQA is usually used to evaluate the model to identify real Competencies, especially in real-world contexts.
It can be seen that whether it is Dromedary without lengthy cloning or the final version of Dromedary, the accuracy exceeds the Anthropic and GPT series.
This is the data obtained from the generation task in TruthfulQA. The data given is the "credible answer" and "credible and information" in the answer. Rich Answers”.
(Evaluation is done via OpenAI API)
This is a lot of data on the HHH Eval dataset Topic selection (MC) Accuracy.
This is a comparison of answers to the Vicuna benchmark question as evaluated by GPT-4.
and the relative quality of the answers obtained on the Vicuna benchmark problem, also evaluated by GPT-4.
Dromedary is based on the transformer architecture, takes the language model LLaMA-65b as Basics, the latest knowledge is as of September 2021.
According to the public information on Huohuofan, Dromedary’s training time is only one month (April to May 2023).
#In about 30 days, how did Dromedary achieve self-alignment of the AI assistant with very little human supervision?
Without giving away, the research team proposed a new method that combines principle-driven reasoning and LLM generation capabilities: SELF-ALIGN (self-alignment) .
Overall, SELF-ALIGN only needs touse a small human-defined set of principles to guide the LLM-based AI assistant during generation, so as to achieve The purpose of allowing human supervision workload to be significantly reduced.
Specifically, this new method can be broken down into 4 key stages:
△SELF-ALIGN 4 keys Step Phase
The first phase, Topic-Guided Red-Teaming Self-Instruct.
Self-Instruct was proposed by the paper "Self-instruct: Aligning language model with self generated instructions".
It is a framework that can generate a large amount of data for instruct-tuning with minimal manual annotation.
Based on the self-instruction mechanism, this stage uses 175 seed prompts to generate synthetic instructions. In addition, there are 20 specific topic prompts to ensure that the instructions can cover a variety of topics.
In this way, we can ensure that the instructions fully cover the scenarios and contexts that the AI assistant comes into contact with, thereby reducing the probability of potential bias.
The second stage, Principle-Driven Self-Alignment.
In this step, in order to guide the AI assistant’s answers to be useful, reliable, and ethical, the research team defined a set of 16 principles in English as “guidelines.” .
16 Principle encompasses both the ideal quality of the answers generated by the AI assistant and the rules behind the behavior of the AI assistant in obtaining the answers.
Actual Context Learning (ICL, in-context learning) In the workflow, how does the AI assistant generate answers that comply with the principles?
The approach chosen by the research team is to have the AI assistant query the same set of examples each time it generates an answer, replacing those required in previous workflows. A collection of different human annotated examples.
Then prompt LLM to generate a new topic, and after deleting duplicate topics, let LLM generate new instructions and new instructions corresponding to the specified instruction type and topic.
Based on the 16 principles, ICL examples and the first stage of Self-Instruct, the matching rules of LLM behind the AI assistant are triggered.
Refuse to spit out the generated content once it is detected that the generated content is harmful or non-compliant.
The third stage, Principle Engraving.
The main task of this stage is to fine-tune the original LLM on the self-aligned answer. The self-aligned answers required here are generated by LLM through self-prompts.
At the same time, the fine-tuned LLM was also pruned in principle and demonstration.
The purpose of fine-tuning is to allow the AI assistant to directly generate answers that are well aligned with human intentions, even without stipulating the use of the Principle of 16 and the ICL paradigm.
It is worth mentioning that due to the sharing of model parameters, the responses generated by the AI assistant can be aligned on a variety of different questions.
The fourth stage, Verbose Cloning.
In order to strengthen its capabilities, the research team used context distillation (context distillation) in the final stage to ultimately generate more comprehensive and detailed content.
△Comparison of the four stages of the classic process (InstructGPT) and SELF-ALIGN
Let’s look at the most intuitive table, which contains Supervision methods used by recent closed-source/open-source AI assistants.
In addition to the new self-alignment method proposed by Dromedary in this study, previous research results will use SFT (supervised fine-tuning) and RLHF (reinforcement using human feedback) when aligning Learning), CAI(Constitutional AI) and KD(Knowledge Distillation).
It can be seen that previous AI assistants such as InstructGPT or Alpaca required at least 50,000 human annotations.
However, the amount of comments necessary for the entire SELF-ALIGN process is less than 300 lines (including 195 seed prompts, 16 principles and 5 Example).
The team behind Dromedary comes from IBM Research MIT-IBM Watson AI Lab, CMU LTI (Language Technology Institute), University of Massachusetts Amers Special branch.
##IBM Research Institute MIT-IBM Watson AI Lab was established in 2017 and is a joint venture between MIT and A community of scientists that IBM Research collaborates with.
Mainly cooperates with global organizations to conduct research around AI, and is committed to promoting the cutting-edge progress of AI and transforming breakthroughs into real-life impacts.CMU Language Technology Institute is a department-level unit of the CMU Computer Science Department, mainly engaged in NLP, IR (information retrieval) And other research related to Computational Linguistics(Computational Linguistics).
University of Massachusetts Amherst is the flagship campus of the University of Massachusetts system and is a research university.
One of the authors of the thesis behind Dromedary,Zhiqing Sun, is currently a PhD student at CMU and graduated from Peking University.
The slightly funny thing is that during the experiment, he asked the AI about his basic information, and all the AIs would make up a random paragraph if there was no data.He had no choice but to write the failure case in the paper:
## It’s really I can’t stop laughing, hahahahahahaha! ! !
It seems that the problem of AI talking nonsense requires new methods to solve.The above is the detailed content of IBM joins the fray! An open source method for transforming any large model into ChatGPT at low cost, with individual tasks exceeding GPT-4. For more information, please follow other related articles on the PHP Chinese website!