ChatGPT Special Topic: The Capabilities and Future of Large Language Models
The generative AI track is booming. According to PitchBook statistics, generative AI companies raised a total of approximately US$1.4 billion in financing in 2022, nearly matching the sum of the previous five years. Star companies such as OpenAI and Stability AI, as well as other start-ups such as Jasper, Regie.AI, and Replika, have all won the favor of capital.
Generative AI financing amount over time
In October 2022, Stability AI raised approximately US$100 million and released the open-source model Stable Diffusion, which generates images from text descriptions provided by the user, setting the AI painting field alight. On November 30, 2022, ChatGPT opened its public beta; within five days of launch, its global user count exceeded one million, and in less than 40 days its daily active users exceeded 10 million. In the early morning of March 15, 2023, OpenAI released the most powerful model in the GPT series, GPT-4, a large-scale multimodal model that accepts image and text input and produces text output, with disruptive impact across the industry. On March 17, 2023, Microsoft held its Microsoft 365 Copilot conference, officially integrating OpenAI's GPT-4 model into the Office suite and launching the new AI feature Copilot, which can not only build PPTs and write copy but also perform analysis and generate videos.

Major domestic manufacturers have also announced ChatGPT-like products. On February 8, Alibaba experts revealed that DAMO Academy is developing a ChatGPT-like conversational bot, already open to internal employees for testing, and may deeply combine large-model AI technology with DingTalk productivity tools. On the same day, He Xiaodong, Vice President of JD.com, said frankly that JD.com has rich scenarios and high-quality data relevant to the ChatGPT field. On February 9, Tencent sources said the company has plans for ChatGPT-like and AI-generated-content products, with dedicated research progressing in an orderly manner. NetEase said its education business will integrate AI-generated content, including but not limited to AI speaking teachers and essay scoring and evaluation. On March 16, Baidu officially released its large language model and generative AI product "Wen Xin Yi Yan"; within two days of release, 12 companies had signed the first batch of cooperation contracts, and the number of companies applying to test the Baidu Intelligent Cloud Wen Xin Yi Yan API service reached 90,000.
Today, large models have gradually penetrated our lives, and in the future every industry is likely to undergo earth-shaking changes. Taking ChatGPT as an example, its applications include the following aspects:
Note that although the discussion here focuses on applications of large language models, large models in other modalities (audio, video, images) also have broad application scenarios.
LaMDA, released by Google, is based on the Transformer architecture, has 137 billion parameters, and can model long-distance dependencies in text. The model is trained on conversations and involves two main stages: pre-training and fine-tuning. In the pre-training stage, Google used up to 1.56T words of public dialogue data and web text, with the standard language modeling (LM) objective, i.e., predicting the next token. In the fine-tuning stage, they designed multiple tasks, such as scoring attributes of responses (sensibleness, safety, etc.), to instill human preferences into the language model. The figure below shows one type of fine-tuning task.
LaMDA model pre-training phase
One of the tasks in the LaMDA model fine-tuning phase
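To make the pre-training objective concrete, here is a minimal sketch of next-token prediction in PyTorch. The toy model and random token data are illustrative stand-ins, not LaMDA's actual architecture or corpus:

```python
# Minimal sketch of the language modeling (next-token prediction) objective.
# The "model" here is a toy embedding + linear head, purely for illustration.
import torch
import torch.nn as nn

vocab_size, embed_dim = 1000, 64
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),  # token ids -> vectors
    nn.Linear(embed_dim, vocab_size),     # vectors -> next-token logits
)

tokens = torch.randint(0, vocab_size, (1, 32))   # a fake tokenized sequence
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict token t+1 from tokens up to t

logits = model(inputs)  # (batch, seq_len - 1, vocab_size)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1)
)
loss.backward()  # real pre-training repeats this over ~1.56T words of text
```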
The LaMDA model focuses on dialogue generation tasks but often makes factual errors. This year, Google released Bard, an experimental conversational AI service powered by LaMDA. During Bard's launch demo, however, Bard made a factual error, which sent Google's stock price plummeting that Wednesday: shares fell more than 8% intraday to a low of about $98, wiping roughly $110 billion off the company's market value, a disappointing result.
The InstructGPT model is based on the GPT architecture and is trained mainly through supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). ChatGPT, the conversational product powered by InstructGPT, focuses on generating natural-language text and can also generate code and perform simple mathematical operations. The specific technical details were discussed at length in the previous two issues; readers can refer to those, so they are not repeated here.
InstructGPT model training flow chart
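For intuition about the RLHF step, the sketch below shows the pairwise ranking loss commonly used to train a reward model: the human-preferred reply should score higher than the rejected one. The linear reward head and random embeddings are toy stand-ins, not OpenAI's actual implementation:

```python
# Sketch of the reward model's pairwise ranking loss used in RLHF:
# loss = -log sigmoid(r_chosen - r_rejected).
import torch
import torch.nn.functional as F

head = torch.nn.Linear(128, 1)  # scalar reward head on top of reply embeddings
chosen = torch.randn(4, 128)    # embeddings of human-preferred replies (toy)
rejected = torch.randn(4, 128)  # embeddings of rejected replies (toy)

r_chosen = head(chosen).squeeze(-1)      # reward for each preferred reply
r_rejected = head(rejected).squeeze(-1)  # reward for each rejected reply

loss = -F.logsigmoid(r_chosen - r_rejected).mean()  # encourage r_chosen > r_rejected
loss.backward()
```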
Claude model training flow chart
Claude is a conversational product of Anthropic. Like ChatGPT, Claude is based on the GPT framework and is a unidirectional (autoregressive) language model. Unlike ChatGPT, however, it is trained mainly by supervised fine-tuning combined with reinforcement learning from AI feedback (RLAIF). In the supervised fine-tuning stage, Anthropic first formulates a set of rules (the "constitution"), such as not generating harmful information and not generating racial bias, and then obtains supervised data based on these rules, as sketched below. An AI model is then asked to judge the quality of responses, automatically producing the dataset for the reinforcement learning stage.
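The critique-and-revise loop of this constitutional approach can be sketched as follows; `llm()` is a hypothetical text-generation function, and the prompt wording is our own simplification of Anthropic's method:

```python
# Sketch of Constitutional AI's supervised phase: the model critiques and
# revises its own reply against each rule in the "constitution"; the revised
# replies then serve as supervised fine-tuning data.
CONSTITUTION = [
    "Do not generate harmful information.",
    "Do not generate racially biased content.",
]

def llm(prompt: str) -> str:
    raise NotImplementedError  # hypothetical stand-in for a real LM call

def critique_and_revise(user_prompt: str) -> str:
    reply = llm(user_prompt)
    for rule in CONSTITUTION:
        critique = llm(f"Critique this reply against the rule '{rule}':\n{reply}")
        reply = llm(f"Revise the reply to address the critique.\n"
                    f"Reply: {reply}\nCritique: {critique}")
    return reply  # becomes a training example for supervised fine-tuning
```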
Compared with ChatGPT, Claude declines inappropriate requests more clearly, and the connections between its sentences are more natural; when faced with a problem beyond its capabilities, Claude is also more willing to say so. Claude is currently still in closed beta. However, according to internal tests by members of the Scale Spellbook team, Claude is stronger than ChatGPT in 8 of the 12 tasks tested.
We have compiled statistics on popular large language models in China and abroad, including their capabilities and open-source status.
Domestic popular large language models
Foreign popular large language models
As can be seen, large language models exhibit a variety of capabilities, including but not limited to few-shot learning and zero-shot transfer. A natural question therefore arises: how do these abilities come about? Where does the power of large language models come from? Below, we try to answer these questions.
The figure below shows some mature large language models and their evolution. In summary, most models go through three stages: pre-training, instruction fine-tuning, and alignment. Representative models include DeepMind's Sparrow and OpenAI's ChatGPT.
Evolutionary diagram of popular large language models
So what capabilities does each of these steps give the model? Dr. Fu Yao of the University of Edinburgh has summarized what he believes to be the correspondence between training stages and abilities, which offers some useful insight.
1. Pre-training stage. The goal of this stage is to obtain a powerful base model. Correspondingly, the capabilities the model exhibits at this stage include language generation, in-context learning, world knowledge, reasoning, and so on. Representative models at this stage include GPT-3 and PaLM.
2. Instruction fine-tuning stage. The goal of this stage is to unlock certain emergent abilities, where "emergent ability" specifically refers to an ability that small models lack and only large models possess. A model that has undergone instruction fine-tuning gains capabilities the base model does not have. For example, given newly constructed instructions, the model can solve new tasks; another example is the chain-of-thought ability, i.e., when shown a reasoning process, the model can imitate correct reasoning as well (a prompt sketch follows after this list). Representative models include InstructGPT and Flan.
3. Alignment stage. The goal of this stage is to endow the model with human values, such as generating informative replies and not producing discriminatory remarks. One can think of the alignment stage as giving the model a "personality". The representative model of this type is ChatGPT.
The three stages of large language models. Image from "Fu Yao: On the Source of the Capabilities of Large Language Models"
Generally speaking, the three stages above complement one another and none can be omitted. Only with a sufficiently powerful base model from pre-training can instruction fine-tuning stimulate (or enhance) the language model's other capabilities, and the alignment stage then gives the model a certain "character" so that it better complies with the requirements of human society.
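To make the chain-of-thought ability from the instruction fine-tuning stage concrete, here is a minimal prompting sketch; the arithmetic example follows the classic demonstration popularized in the chain-of-thought prompting literature:

```python
# A few-shot chain-of-thought prompt: the first Q/A pair demonstrates
# step-by-step reasoning, which the model is expected to imitate.
cot_prompt = """Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls.
5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more,
how many apples do they have?
A:"""
# A capable instruction-tuned model should continue with intermediate reasoning
# steps and conclude with "The answer is 9."
```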
While large language model technology brings convenience, it also carries risks and challenges. At the technical level, the authenticity of GPT-generated content cannot be guaranteed, and harmful remarks may be produced. At the usage level, users may abuse AI-generated text in fields such as education and scientific research. Many companies and institutions have already begun to restrict the use of ChatGPT: Microsoft and Amazon have banned employees from sharing sensitive data with ChatGPT for fear of leaking confidential information, and the University of Hong Kong has banned the use of ChatGPT or other AI tools in all of its classes, assignments, and assessments. Below, we mainly introduce related detection work from industry.
GPTZero: GPTZero is one of the earliest tools for identifying AI-generated text. It is an online website (https://gptzero.me/) published by Edward Tian, a CS undergraduate at Princeton. Its principle relies on text perplexity (PPL) as the indicator for judging who wrote a given piece of text. Perplexity is commonly used to evaluate the quality of a language model; in essence, it measures how probable a sentence is under the model (see the code sketch below).
GPTZero website interface
(Here, we used ChatGPT to generate a news report and let GPTZero judge whether it is AI-generated text.)
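To illustrate the perplexity principle, the sketch below scores a text with GPT-2 via the Hugging Face transformers library; the choice of the "gpt2" checkpoint, and any decision threshold applied to the score, are our own assumptions rather than GPTZero's actual configuration:

```python
# Sketch of perplexity-based detection: score a text with a reference LM;
# unusually low perplexity suggests the text may be machine-generated.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # with labels=input_ids, the model returns the mean next-token
        # cross-entropy loss; perplexity is exp(loss)
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return torch.exp(loss).item()

print(perplexity("The quick brown fox jumps over the lazy dog."))
```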
GPT2 Output Detector: This tool was published by OpenAI. It fine-tunes RoBERTa on GPT-2-generated content and Reddit data to learn a detection classifier, that is, "fighting magic with magic" (see the sketch below). The official site also reminds users that the prediction is reliable only when the text exceeds 50 tokens.
GPT2 Output Detector website interface
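A detector of this kind can be queried in a few lines. The sketch below assumes the publicly released "roberta-base-openai-detector" checkpoint on the Hugging Face hub, which mirrors OpenAI's detector; verify the checkpoint name and its labels before relying on it:

```python
# Sketch: classify a text as human- or machine-written with a RoBERTa-based
# detector via the Hugging Face pipeline API.
from transformers import pipeline

detector = pipeline("text-classification", model="roberta-base-openai-detector")
result = detector("This passage may or may not have been written by a model.")
print(result)  # e.g. [{'label': 'Real' or 'Fake', 'score': ...}]
```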
AI Text Classifier: This tool was also published by OpenAI. The principle is to collect human-written and AI-written texts on the same topics, split each text into prompt-reply pairs, fine-tune GPT to answer whether a reply is AI-written (e.g., producing Yes/No), and use the probability of that answer as the classification score. The tool's output is quite fine-grained, with 5 result categories ranging from "very unlikely AI-generated" to "likely AI-generated" (the latter requiring a score above the 0.98 threshold; see the sketch below).
AI Text Classifier website interface
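The thresholding idea behind the classifier can be sketched as follows; apart from the published 0.98 cutoff for "likely AI-generated", the intermediate thresholds shown are illustrative assumptions, and the token ids are hypothetical:

```python
# Sketch: turn the fine-tuned LM's Yes/No answer probability into one of the
# five result categories.
import torch

def p_yes(vocab_logits: torch.Tensor, yes_id: int, no_id: int) -> float:
    # restrict the softmax to the two answer tokens and return P("Yes")
    pair = torch.softmax(vocab_logits[[yes_id, no_id]], dim=0)
    return pair[0].item()

def bucket(score: float) -> str:
    if score > 0.98:
        return "likely AI-generated"        # published cutoff
    if score > 0.90:
        return "possibly AI-generated"      # illustrative threshold
    if score > 0.45:
        return "unclear if AI-generated"    # illustrative threshold
    if score > 0.10:
        return "unlikely AI-generated"      # illustrative threshold
    return "very unlikely AI-generated"

vocab_logits = torch.randn(50257)                      # stand-in for final-token logits
print(bucket(p_yes(vocab_logits, yes_id=5, no_id=6)))  # token ids are hypothetical
```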
Large language models possess emergent capabilities that small models lack, such as strong zero-shot learning, domain transfer, and chain-of-thought abilities. The power of large models comes from pre-training, instruction fine-tuning, and alignment; these three processes are closely related and have made today's extremely powerful large language models possible.
Large language models (the GPT series) currently lack capabilities such as updating their knowledge and beliefs, formal reasoning, and Internet retrieval. Some experts believe that if knowledge could be offloaded outside the model, the number of parameters could be greatly reduced, and large language models could truly take another step forward.
Only under reasonable supervision and governance can artificial intelligence technology better serve people. There is still a long way to go in developing large models in China!
[1] https://stablediffusionweb.com
[2] https://openai.com/product/gpt-4
[3] LaMDA: Language Models for Dialog Applications, arXiv 2022.10
[4] Constitutional AI: Harmlessness from AI Feedback, arXiv 2022.12
[5] https://scale.com/blog/chatgpt-vs-claude#Calculation
[6] Guolian Securities: "ChatGPT Has Arrived, and Commercialization Is Accelerating"
[7] Guotai Junan Securities: "ChatGPT Research Framework 2023"
[8] Fu Yao: Pre-training, Instruction Fine-tuning, Alignment, Specialization: On the Source of Large Language Model Capabilities https://www.bilibili.com/video/BV1Qs4y1h7pn/?spm_id_from=333.880.my_history.page.click&vd_source=da8bf0b993cab65c4de0f26405823475
[9] A 10,000-Word Deep Dive: What You Should Know to Reproduce and Use GPT-3/ChatGPT https://mp.weixin.qq.com/s/ILpbRRNP10Ef1z3lb2CqmA