Chatbot competition among Google, Meta and OpenAI: LeCun's dissatisfaction with ChatGPT takes center stage
A few days ago, comments about ChatGPT by Meta's chief AI scientist Yann LeCun spread quickly through the industry and sparked a wave of discussion.
At a small gathering of media and executives held over Zoom, LeCun made a surprising remark: "As far as the underlying technology is concerned, ChatGPT is not particularly innovative."
"In the public's eyes it is revolutionary, but we know it is a well-assembled product, nothing more."
As the chatbot of the moment over the past few months, ChatGPT has become a worldwide sensation and has even changed some people's careers and the state of school education.
With the whole world marveling at it, LeCun's assessment of ChatGPT was strikingly understated.
But in fact, his remarks are not unreasonable.
Many companies and research labs have data-driven AI systems like ChatGPT, and LeCun said OpenAI is not unique in this field.
"In addition to Google and Meta, there are six startups, basically all with very similar technology." LeCun added.
Then LeCun got a little sour:
"ChatGPT uses a Transformer architecture pre-trained in a self-supervised manner, and self-supervised learning is something I have been advocating for a long time, since before OpenAI even existed."
The Transformer itself was invented at Google; this neural network architecture underlies large language models such as GPT-3.
The first neural network language model was proposed by Yoshua Bengio some 20 years ago. The attention mechanism from Bengio's lab was later adopted by Google in the Transformer and has since become a key element of all language models.
ChatGPT also uses reinforcement learning from human feedback (RLHF), a technique likewise pioneered by Google's DeepMind lab.
In LeCun’s view, ChatGPT is more of a successful engineering case than a scientific breakthrough.
OpenAI's technology "is not innovative in terms of basic science; it's just well designed."
"Of course, I won't criticize them for that."
"I'm not criticizing OpenAI's work or their claims.
I want to correct the perception of the public and the media, who generally see ChatGPT as an innovative and unique technological breakthrough. That is not the case."
In a conversation with New York Times reporter Cade Metz, LeCun anticipated the question on onlookers' minds.
"You might ask why Google and Meta don't have similar systems. My answer is that Google and Meta would have a lot to lose by launching chatbots that make things up," he explained with a smile.
Coincidentally, as soon as news broke that OpenAI had won backing from Microsoft and other investors and that its valuation had soared to US$29 billion, Gary Marcus wrote a mocking post on his blog overnight.
In it, Marcus delivered a memorable line: What can OpenAI do that Google can't, and is it worth a sky-high price of US$29 billion?
Without further ado, let's line up the chatbots of these AI giants and let the data speak for itself.
LeCun said that many companies and laboratories have AI chatbots similar to ChatGPT, which is true.
ChatGPT is not the first AI chatbot based on language models. It has many "predecessors".
Before ChatGPT, Meta, Google, DeepMind and others had all released chatbots of their own, such as Meta's BlenderBot, Google's LaMDA and DeepMind's Sparrow.
Some teams have also announced open-source chatbot projects, such as LAION's Open-Assistant.
In a Hugging Face blog post, several authors surveyed the key papers on RLHF, SFT, IFT and CoT (all keywords associated with ChatGPT), and categorized and summarized them.
They compiled a table comparing AI chatbots such as BlenderBot, LaMDA, Sparrow and InstructGPT on details such as public access, training data, model architecture and evaluation direction.
Note: Because ChatGPT's details are not documented, they use the details of InstructGPT, an instruction-fine-tuned model from OpenAI that can be considered the basis of ChatGPT.
| | LaMDA | BlenderBot 3 | Sparrow | ChatGPT / InstructGPT |
|---|---|---|---|---|
| Organization | Google | Meta | DeepMind | OpenAI |
| Access | Closed | Public | Closed | Limited |
| Parameter scale | 137 billion | 175 billion | 70 billion | 175 billion |
| Pre-trained base model | Unknown | OPT | Chinchilla | GPT-3.5 |
| Pre-training corpus size (tokens) | 2.81 trillion | 100 billion | 1.4 trillion | Unknown |
| Web access | ✔️ | ✔️ | ✔️ | ✖️ |
| Supervised fine-tuning | ✔️ | ✔️ | ✔️ | ✔️ |
| Fine-tuning data size | Quality: 6.4K; Safety: 8K; Groundedness: 4K; IR: 49K | 20 NLP datasets ranging from 18K to 1.2M | Unknown | 12.7K (ChatGPT may use more) |
| RLHF | ✖️ | ✖️ | ✔️ | ✔️ |
| Hand-written safety rules | ✔️ | ✖️ | ✔️ | ✖️ |
It is easy to see that despite many differences in training data, base models and fine-tuning, these chatbots all have one thing in common: they follow instructions.
For example, you can instruct ChatGPT to write a poem about fine-tuning.
ChatGPT is evidently quite savvy: even when writing poetry, it never forgets to flatter LeCun and Hinton.
It then gushes: "Nudge, fine-tune, you are a beautiful dance."
Ordinarily, the language-modeling objective of a base model is not enough for the model to learn to follow user instructions.
So, in addition to classic NLP tasks (such as sentiment analysis, text classification and summarization), researchers use instruction fine-tuning (IFT): fine-tuning the base model on text instructions spanning a very diverse range of tasks.
Each instruction example has three main parts: an instruction, an input and an output.
The input is optional; some tasks need only an instruction, such as the open-ended generation in the ChatGPT poem example above.
An input together with its output forms an instance, and a given instruction can have multiple input-output instances.
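To make the instruction/input/output structure concrete, here is a minimal Python sketch. The class names `Instance` and `IFTTask`, the helper `to_prompt_target_pairs` and the `Instruction:/Input:/Output:` layout are hypothetical illustrations, not the format of any particular dataset.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Instance:
    """One input/output pair; input_text is None for open-ended tasks."""
    output: str
    input_text: Optional[str] = None


@dataclass
class IFTTask:
    """An instruction plus one or more instances that illustrate it."""
    instruction: str
    instances: List[Instance] = field(default_factory=list)


def to_prompt_target_pairs(task: IFTTask) -> List[Tuple[str, str]]:
    """Flatten a task into (prompt, target) strings for fine-tuning.

    The 'Instruction:/Input:/Output:' layout is a hypothetical template.
    """
    pairs = []
    for inst in task.instances:
        prompt = f"Instruction: {task.instruction}\n"
        if inst.input_text is not None:  # the input field is optional
            prompt += f"Input: {inst.input_text}\n"
        prompt += "Output:"
        pairs.append((prompt, " " + inst.output))
    return pairs


# One instruction with two input/output instances.
sentiment = IFTTask(
    instruction="Classify the sentiment of the sentence.",
    instances=[
        Instance(output="positive", input_text="I loved this film."),
        Instance(output="negative", input_text="The plot was a mess."),
    ],
)
# An open-ended task: instruction only, no input.
poem = IFTTask(
    instruction="Write a short poem about fine-tuning.",
    instances=[Instance(output="Nudge, fine-tune, you are a beautiful dance.")],
)

for prompt, target in to_prompt_target_pairs(sentiment) + to_prompt_target_pairs(poem):
    print(repr(prompt), "->", repr(target))
```

Keeping the instruction separate from its instances makes it easy to pair one instruction with many examples, which is the structure described above.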
IFT data is usually a mix of human-written instructions and instruction instances bootstrapped with a language model.
During bootstrapping, the LM is prompted in a few-shot setting with example instructions and asked to generate new instructions, inputs and outputs.
In each round, the prompt includes samples drawn from both the human-written and the model-generated pools.
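A minimal sketch of how one bootstrapping round's prompt might be assembled, in the spirit of Self-Instruct. The function name, prompt wording and demonstration counts are assumptions for illustration, and the call to an actual language model is deliberately left out.

```python
import random
from typing import List


def build_bootstrap_prompt(human_pool: List[str], model_pool: List[str],
                           n_human: int, n_model: int,
                           rng: random.Random) -> str:
    """Build a few-shot prompt that asks an LM to write a new instruction.

    Demonstrations mix the human-written seed pool with instructions the
    model produced in earlier rounds (hypothetical counts and wording).
    """
    demos = rng.sample(human_pool, n_human)
    # Early rounds may have fewer model-generated instructions than requested.
    demos += rng.sample(model_pool, min(n_model, len(model_pool)))
    rng.shuffle(demos)
    lines = ["Come up with a new task instruction."]
    for i, demo in enumerate(demos, start=1):
        lines.append(f"Task {i}: {demo}")
    lines.append(f"Task {len(demos) + 1}:")  # the LM continues from here
    return "\n".join(lines)


human = [
    "Translate the sentence into French.",
    "Summarize the paragraph in one sentence.",
    "List three synonyms for the given word.",
]
generated = ["Write a haiku about autumn."]
prompt = build_bootstrap_prompt(human, generated, n_human=2, n_model=1,
                                rng=random.Random(0))
print(prompt)
```

In a full pipeline, the LM's completion would be parsed, filtered (for example, to drop near-duplicates) and appended to `generated` before the next round.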
The relative contributions of humans and models to a dataset fall along a spectrum.
At one end are purely model-generated IFT datasets such as Unnatural Instructions; at the other are large collections of human-written instructions such as Super-NaturalInstructions.
In between are approaches such as Self-Instruct, which start from a smaller but higher-quality seed dataset and then bootstrap from it.
Another way to assemble an IFT dataset is to take existing high-quality crowdsourced NLP datasets covering a variety of tasks (including prompts) and convert them into instructions using a unified schema or a variety of templates.
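The template-based conversion can be sketched as follows. The templates, label names and `to_instructions` helper are hypothetical; projects such as FLAN and T0 apply the same general idea with many hand-written templates per dataset.

```python
import random
from typing import Dict, List

# Hypothetical templates for a binary sentiment dataset. Varying the phrasing
# helps keep the model from tying the task to a single wording.
SENTIMENT_TEMPLATES = [
    "Is the sentiment of this review positive or negative?\n{text}",
    "{text}\nDid the reviewer like it? Answer positive or negative.",
    "Review: {text}\nSentiment:",
]
LABEL_NAMES = {0: "negative", 1: "positive"}


def to_instructions(rows: List[Dict], templates: List[str],
                    rng: random.Random) -> List[Dict[str, str]]:
    """Convert labeled rows into prompt/target pairs via a randomly chosen template."""
    converted = []
    for row in rows:
        template = rng.choice(templates)
        converted.append({
            "prompt": template.format(text=row["text"]),
            "target": LABEL_NAMES[row["label"]],
        })
    return converted


rows = [
    {"text": "Great phone, battery lasts all day.", "label": 1},
    {"text": "Broke after a week.", "label": 0},
]
for ex in to_instructions(rows, SENTIMENT_TEMPLATES, random.Random(1)):
    print(ex)
```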
Work in this area includes T0, the Natural Instructions dataset, FLAN LM and OPT-IML.
Paper on the Natural Instructions dataset: https://arxiv.org/abs/2104.08773
Meanwhile, OpenAI's InstructGPT, DeepMind's Sparrow and Anthropic's Constitutional AI all use reinforcement learning from human feedback (RLHF), which relies on annotations of human preferences.
In RLHF, a set of model responses is ranked based on human feedback (e.g., choosing which of two text blurbs is preferred).
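A ranking over several responses is typically turned into pairwise comparisons before a preference model is trained; the InstructGPT paper, for example, describes labelers ranking several outputs per prompt and using all resulting pairs. A minimal sketch, with a hypothetical helper name and made-up responses:

```python
from itertools import combinations
from math import comb
from typing import List, Tuple


def ranking_to_pairs(ranked: List[str]) -> List[Tuple[str, str]]:
    """Turn a best-to-worst ranking into (preferred, rejected) pairs.

    combinations() preserves input order, so the first element of each pair
    is always the higher-ranked response.
    """
    return list(combinations(ranked, 2))


ranked = ["clear and correct answer", "correct but rambling answer", "off-topic answer"]
pairs = ranking_to_pairs(ranked)
assert len(pairs) == comb(len(ranked), 2)  # K responses -> K*(K-1)/2 pairs
for better, worse in pairs:
    print(f"{better!r} > {worse!r}")
```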
Next, a preference model is trained on these annotated responses to return a scalar reward for the RL optimizer.
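The preference-model step can be sketched with a toy linear reward model trained on (preferred, rejected) pairs using the pairwise logistic loss -log sigmoid(r_preferred - r_rejected), the formulation used in InstructGPT. The two-number feature vectors and all names here are hypothetical stand-ins; a real preference model is a fine-tuned language model that scores text directly.

```python
import math
from typing import List, Sequence, Tuple

Features = Sequence[float]


def reward(w: List[float], x: Features) -> float:
    """Toy linear preference model: scalar reward r(x) = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def pairwise_loss(r_preferred: float, r_rejected: float) -> float:
    """-log sigmoid(r_preferred - r_rejected), computed as a stable softplus."""
    d = r_preferred - r_rejected
    return max(-d, 0.0) + math.log1p(math.exp(-abs(d)))


def train(pairs: List[Tuple[Features, Features]], dim: int,
          lr: float = 0.1, epochs: int = 200) -> List[float]:
    """Plain stochastic gradient descent on the pairwise loss."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x_pref, x_rej in pairs:
            d = reward(w, x_pref) - reward(w, x_rej)
            g = 1.0 / (1.0 + math.exp(d))  # sigmoid(-d) = -dLoss/dd
            for i in range(dim):
                w[i] += lr * g * (x_pref[i] - x_rej[i])
    return w


# Hypothetical features: [helpfulness, rudeness] of each response.
pairs = [([1.0, 0.0], [0.2, 0.9]), ([0.8, 0.1], [0.9, 1.0])]
w = train(pairs, dim=2)
print("weights:", w)
print("reward of a helpful, polite reply:", reward(w, [1.0, 0.0]))
```

Only reward differences enter the loss, so the model learns a relative scale; that scalar is what the RL optimizer later maximizes.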
Finally, the chatbot is trained with reinforcement learning so that its outputs score highly under this preference model.
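The RL step can be illustrated with a deliberately tiny example: a "policy" that chooses among three fixed candidate replies, scored by hypothetical preference-model rewards. It maximizes expected reward minus a KL penalty that keeps the policy close to a reference model. The KL term is an assumption borrowed from InstructGPT's setup (which optimizes a full language model with PPO), and exact gradients replace sampling so the result is deterministic. For this objective the optimum is known in closed form, p_i proportional to ref_i * exp(r_i / beta), which makes the sketch easy to check.

```python
import math
from typing import List


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def rl_finetune(rewards: List[float], ref_probs: List[float], beta: float,
                lr: float = 1.0, steps: int = 2000) -> List[float]:
    """Gradient ascent on E_p[r] - beta * KL(p || ref) for a softmax policy.

    For logits z, the exact gradient is p_k * (f_k - E_p[f]) with
    f_k = r_k - beta * (log p_k - log ref_k).
    """
    z = [math.log(q) for q in ref_probs]  # start at the reference policy
    for _ in range(steps):
        p = softmax(z)
        f = [r - beta * (math.log(pk) - math.log(qk))
             for r, pk, qk in zip(rewards, p, ref_probs)]
        mean_f = sum(pk * fk for pk, fk in zip(p, f))
        z = [zk + lr * pk * (fk - mean_f) for zk, pk, fk in zip(z, p, f)]
    return softmax(z)


rewards = [1.0, 0.0, -1.0]  # hypothetical preference-model scores
ref = [1 / 3, 1 / 3, 1 / 3]  # reference (e.g. SFT) policy
policy = rl_finetune(rewards, ref, beta=1.0)
closed_form = softmax([r / 1.0 + math.log(q) for r, q in zip(rewards, ref)])
print([round(x, 3) for x in policy], [round(x, 3) for x in closed_form])
```

A smaller `beta` lets the policy drift further toward the top-scoring reply; a larger one keeps it closer to the reference.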
Chain-of-thought (CoT) prompting is a special case of instruction demonstration that induces the chatbot to reason step by step before producing its output.
Models fine-tuned with CoT use instruction datasets that contain human-annotated step-by-step reasoning.
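A CoT fine-tuning example differs from a plain instruction example mainly in its target: the human-written reasoning comes before the final answer. A minimal sketch, where the template wording and the function name are hypothetical rather than the exact format used in "Scaling Instruction-Finetuned Language Models":

```python
from typing import Dict


def format_cot_example(instruction: str, question: str,
                       rationale: str, answer: str) -> Dict[str, str]:
    """Build one CoT fine-tuning pair (hypothetical template).

    The target holds the step-by-step rationale followed by the answer, so the
    model learns to produce its reasoning before committing to a result.
    """
    prompt = f"{instruction}\nQ: {question}\nA:"
    target = f" {rationale} So the answer is {answer}."
    return {"prompt": prompt, "target": target}


example = format_cot_example(
    instruction="Answer the math word problem. Explain your reasoning step by step.",
    question=("The cafeteria had 23 apples. They used 20 to make lunch "
              "and bought 6 more. How many apples do they have?"),
    rationale=("They started with 23 apples and used 20, leaving 23 - 20 = 3. "
               "Then they bought 6 more, so 3 + 6 = 9."),
    answer="9",
)
print(example["prompt"] + example["target"])
```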
This is the origin of the famous prompt "let's think step by step".
The following example is taken from "Scaling Instruction-Finetuned Language Models": orange highlights the instruction, pink shows the input and output, and blue is the CoT reasoning.
The paper points out that models fine-tuned with CoT perform better on tasks involving commonsense, arithmetic and symbolic reasoning.
CoT fine-tuning has also proven very effective for harmlessness on sensitive topics (sometimes better than RLHF), without the model becoming evasive and replying "Sorry, I can't answer."
As noted above, instruction-fine-tuned language models do not always produce helpful and safe responses.
For example, a model may evade the question with an unhelpful answer such as "Sorry, I don't understand", or produce unsafe responses when users raise sensitive topics.
To improve this behavior, researchers fine-tune the base language model on high-quality human-annotated data through supervised fine-tuning (SFT), improving the model's helpfulness and harmlessness.
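One common way to set up SFT data, assumed here rather than taken from the source, is to concatenate prompt and response and compute the loss only on the human-written response tokens. The sketch below uses made-up token ids and log-probabilities; the value -100 follows the default `ignore_index` of PyTorch's `CrossEntropyLoss`, and the one-position label shift that causal language models need is omitted for clarity.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # matches PyTorch CrossEntropyLoss's default ignore_index


def build_sft_example(prompt_ids: List[int],
                      response_ids: List[int]) -> Tuple[List[int], List[int]]:
    """Concatenate prompt and response; mask prompt positions out of the labels."""
    input_ids = prompt_ids + response_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


def masked_nll(token_logprobs: List[float], labels: List[int]) -> float:
    """Mean negative log-likelihood over positions whose label is not ignored.

    token_logprobs[i] is the model's log-probability of the token at position i.
    """
    kept = [-lp for lp, lab in zip(token_logprobs, labels) if lab != IGNORE_INDEX]
    return sum(kept) / len(kept)


# Hypothetical ids: a 3-token prompt followed by a 2-token annotated response.
input_ids, labels = build_sft_example([101, 7592, 102], [2023, 2003])
logprobs = [-9.0, -9.0, -9.0, -1.0, -3.0]  # prompt positions do not count
print(input_ids, labels, masked_nll(logprobs, labels))
```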
SFT and IFT are closely related; IFT can be seen as a subset of SFT. In recent literature, the SFT phase has often been used for safety topics rather than instruction-specific topics, and is carried out after IFT.
In the future, this taxonomy and delineation should mature into clearer use cases and methodology.
Google's LaMDA is likewise fine-tuned on a safety-annotated conversation dataset, whose annotations are based on a set of rules.
These rules are usually predefined by the researchers and cover a wide range of topics, including harm, discrimination and misinformation.
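Rule-based safety annotations might be represented roughly as follows: raters mark which predefined rules a response violates, and a response counts as safe only if it violates none. The rule ids and descriptions, the `AnnotatedTurn` class and the labeling functions are hypothetical; LaMDA's actual safety objectives are more detailed, and this sketch only shows how such labels could feed fine-tuning data, for example for a safety classifier.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical rule set echoing the categories mentioned in the text.
SAFETY_RULES: Dict[str, str] = {
    "harm": "Encourages or instructs the user to cause harm",
    "discrimination": "Demeans people based on a protected attribute",
    "misinformation": "States a verifiably false claim as fact",
}


@dataclass
class AnnotatedTurn:
    context: str
    response: str
    violated: Set[str] = field(default_factory=set)  # rule ids flagged by a rater


def safety_label(turn: AnnotatedTurn) -> int:
    """Return 1 if no rule is violated, else 0; reject unknown rule ids."""
    unknown = turn.violated - set(SAFETY_RULES)
    if unknown:
        raise ValueError(f"unknown rule ids: {sorted(unknown)}")
    return 0 if turn.violated else 1


def to_training_rows(turns: List[AnnotatedTurn]) -> List[Tuple[str, str, int]]:
    """(context, response, safe?) triples, e.g. for fine-tuning a safety classifier."""
    return [(t.context, t.response, safety_label(t)) for t in turns]


turns = [
    AnnotatedTurn("Is the moon made of cheese?", "No, it is made of rock."),
    AnnotatedTurn("Is the moon made of cheese?", "Yes, entirely.", {"misinformation"}),
]
for row in to_training_rows(turns):
    print(row)
```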
Regarding AI chatbots, there are still many open issues to be explored, such as:
1. How important is RL in learning from human feedback? Can we match the performance of RLHF by training on higher-quality data with IFT or SFT?
2. How does the safety of SFT + RLHF in Sparrow compare with SFT alone in LaMDA?
3. Given that we already have IFT, SFT, CoT and RLHF, how much pre-training is still necessary? What are the trade-offs? Which base models are best (both public and private)?
4. Many of these models are now carefully engineered: researchers specifically search for failure modes and use the problems they uncover to shape future training (including prompts and methods). How can we systematically document and reproduce the effects of these methods?
To sum up, the key takeaways are:
1. Compared with the pre-training data, instruction fine-tuning needs only a very small fraction of data (on the order of a few hundred examples).
2. Supervised fine-tuning uses human annotations to make the model's outputs safer and more helpful.
3. CoT fine-tuning improves the model's performance on tasks requiring step-by-step thinking and keeps it from being evasive on sensitive topics.
Reference:
https://huggingface.co/blog/dialog-agents