No manual annotation required, the self-generated instruction framework breaks the cost bottleneck of LLMs such as ChatGPT

ChatGPT is the new top star of the AI world at the end of this year. People are amazed by its powerful question-answering, language, and programming capabilities. But the more powerful the model, the higher the technical demands behind it.


ChatGPT is built on the GPT-3.5 series of models and introduces reinforcement learning from human feedback (RLHF), which uses manually labeled data to continuously fine-tune the pre-trained language model. The aim is to teach large language models (LLMs) to understand human commands and to give the best possible answer to a given prompt.

This technical approach represents the current direction of language model development. Although such models have great prospects, the cost of training and fine-tuning them is very high.

According to the information currently disclosed by OpenAI, the training process of ChatGPT is divided into three stages:


The first stage trains a supervised policy model based on GPT-3.5. This base model struggles to understand the intent behind different types of human instructions and to judge the quality of its generated content. The researchers randomly selected samples from a prompt dataset and asked professional annotators to write high-quality answers for the given prompts. The prompt-answer pairs obtained through this manual process were then used to fine-tune the initial supervised policy model, giving it a basic understanding of prompts and an initial improvement in answer quality.
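The supervised fine-tuning objective in this first stage can be pictured as ordinary next-token cross-entropy on the annotator-written answers. A minimal sketch (function and variable names are illustrative, not OpenAI's implementation):

```python
import numpy as np

def sft_loss(token_logits, target_ids):
    """Supervised fine-tuning objective (sketch): average cross-entropy
    of the model's next-token predictions against the annotator-written
    answer tokens."""
    token_logits = np.asarray(token_logits, dtype=float)
    # log-softmax over the vocabulary axis
    shifted = token_logits - token_logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # negative log-likelihood of each target token
    nll = -log_probs[np.arange(len(target_ids)), target_ids]
    return float(nll.mean())
```

Minimizing this loss pushes the model's distribution toward the high-quality human answers, which is what gives the policy model its initial prompt-following behavior.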

In the second stage, the research team samples multiple outputs generated by the model for a given prompt, asks human annotators to rank these outputs, and then uses the ranked data to train a reward model (RM). ChatGPT trains the RM with a pair-wise loss.
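The pair-wise loss compares two answers to the same prompt: the RM should score the answer the annotator preferred above the rejected one. A minimal sketch of that objective (the function name and NumPy formulation are illustrative):

```python
import numpy as np

def pairwise_rm_loss(r_chosen, r_rejected):
    """Pair-wise ranking loss for a reward model:
    -log(sigmoid(r_chosen - r_rejected)), averaged over pairs.
    log1p(exp(-x)) is a numerically stable form of -log(sigmoid(x))."""
    diff = np.asarray(r_chosen) - np.asarray(r_rejected)
    return float(np.mean(np.log1p(np.exp(-diff))))
```

The loss shrinks as the RM's margin between the preferred and rejected answer grows, so training on ranked pairs teaches the RM to reproduce the annotators' preference ordering without needing absolute quality scores.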

In the third stage, the research team uses reinforcement learning to strengthen the pre-trained model, using the RM learned in the previous stage to update the model's parameters.
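The core idea of this stage is that the RM's score acts as the reward signal for a policy-gradient update. ChatGPT itself uses PPO with additional terms such as a KL penalty; the toy REINFORCE-style update below is only a sketch of the basic mechanism, with illustrative names:

```python
import numpy as np

def rl_step(policy_logits, sampled_action, reward, lr=0.1):
    """One simplified REINFORCE-style update (sketch): raise the
    log-probability of the sampled answer in proportion to the RM's
    reward. Not the actual PPO objective used for ChatGPT."""
    policy_logits = np.asarray(policy_logits, dtype=float)
    probs = np.exp(policy_logits - policy_logits.max())
    probs /= probs.sum()
    grad = -probs
    grad[sampled_action] += 1.0  # gradient of log p(action) w.r.t. logits
    return policy_logits + lr * reward * grad
```

A positively rewarded answer becomes more likely and a negatively rewarded one less likely, which is how the RM's human-derived preferences flow back into the generation policy without any further annotation.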

Among the three stages of ChatGPT training, only the third requires no manually annotated data, while the first and second both require large amounts of manual annotation. So although models such as ChatGPT perform very well, the labor cost of improving their instruction-following ability is very high. As models grow larger and their range of capabilities widens, this problem will become more serious and eventually become a bottleneck hindering further development.

Some studies have tried to remove this bottleneck. For example, the University of Washington and other institutions recently published the paper "SELF-INSTRUCT: Aligning Language Model with Self Generated Instructions", proposing a new framework, SELF-INSTRUCT, that improves the instruction-following ability of pre-trained language models by bootstrapping off the model's own generations.


Paper address: https://arxiv.org/pdf/2212.10560v1.pdf

SELF-INSTRUCT is a semi-automated process that instruction-tunes a pre-trained LM using instruction signals from the model itself. As shown in the figure below, the entire process is an iterative bootstrapping algorithm.

SELF-INSTRUCT starts from a limited seed set of manually written instructions that guide the whole generation process. In the first phase, the model is prompted to generate instructions for new tasks; this step leverages the existing instruction set to create broader instructions defining new tasks. SELF-INSTRUCT also creates input and output instances for the newly generated instructions, which can later be used to supervise instruction tuning. Finally, it prunes low-quality and duplicate instructions. The whole process runs iteratively, and the final model can generate instructions for a large number of tasks.
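One round of this bootstrap can be sketched as: prompt the model with instructions from the current pool, then keep only generated candidates that are not near-duplicates of anything already in the pool. In the sketch below, `generate_fn` stands in for the LLM call, and `difflib`'s similarity ratio is used as a stand-in for the ROUGE-based overlap filter described in the paper; all names are illustrative:

```python
import difflib

def self_instruct_round(task_pool, generate_fn, sim_threshold=0.7):
    """One round of the SELF-INSTRUCT bootstrap (sketch): generate new
    candidate instructions from the existing pool, then prune candidates
    that closely duplicate instructions already kept."""
    candidates = generate_fn(task_pool)
    kept = []
    for cand in candidates:
        is_dup = any(
            difflib.SequenceMatcher(None, cand, seen).ratio() > sim_threshold
            for seen in task_pool + kept
        )
        if not is_dup:
            kept.append(cand)
    return task_pool + kept
```

Running such rounds repeatedly is what lets a small seed set grow into tens of thousands of diverse task instructions without further annotation.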

To verify the effectiveness of the new method, the study applied the SELF-INSTRUCT framework to GPT-3, ultimately producing approximately 52k instructions and 82k instance inputs with target outputs. The researchers observed that GPT-3 achieved a 33.1% absolute improvement over the original model on new tasks in the SUPER-NATURALINSTRUCTIONS dataset, comparable to the performance of InstructGPT_001, which was trained on private user data and human annotations.


For further evaluation, the study collected a set of expert-written instructions for new tasks and showed through human evaluation that GPT-3 with SELF-INSTRUCT significantly outperforms existing models trained on public instruction datasets, trailing InstructGPT_001 by only 5%.


SELF-INSTRUCT thus offers a way to align pre-trained language models with instructions while requiring almost no manual annotation. Several works have explored similar directions and achieved good results, showing that this class of methods is effective at cutting the high annotation cost of large language models. It should help LLMs such as ChatGPT become stronger and go further.


Statement: this article is reproduced from 51CTO.COM.