What is the instruction learning behind ChatGPT? PSU publishes its first comprehensive review of "Instructional Learning"

王林
2023-04-07 19:51:01

Task semantics can be represented by a set of input-output examples or by a textual instruction. Traditional machine learning methods for natural language processing (NLP) mainly rely on the availability of large-scale sets of task-specific examples.

But two problems arise. First, collecting task-specific labeled examples is impractical for tasks that are too complex or expensive to annotate, or in scenarios where the system must handle new tasks immediately. Second, it is not user-friendly, since end users may prefer to provide a description of the task before using the system rather than a set of examples.

As a result, the community has become increasingly interested in a new supervision-seeking paradigm for NLP: learning from task instructions. Despite the impressive progress, the community still faces some common issues.

This article attempts to summarize the current research on instruction learning from the following aspects:

(1) What is a task instruction, and what types of instructions exist?

(2) How to model instructions?

(3) What factors affect and explain the performance of instructions?

(4) What challenges still exist in instruction learning?

To our knowledge, this is the first comprehensive survey of textual instructions.

Paper address: https://arxiv.org/pdf/2303.10475v2.pdf

1 Introduction

One of the goals of artificial intelligence is to build a system that can universally understand and solve new tasks. Labeled examples, the mainstream task representation, may not be widely available, or may not exist at all. So, are there other task representations that can contribute to task understanding? Task instructions provide another dimension of supervision for expressing task semantics, and instructions often contain more abstract and comprehensive knowledge of the target task than a single labeled example.

Instruction learning is inspired by how humans typically learn new tasks. For example, a child can solve a new mathematical task well by learning from instructions and a few examples. This new learning paradigm has recently attracted major attention from the machine learning and NLP communities.

As shown in Figure 1, through the availability of task instructions, systems can be quickly built to handle new tasks, especially when task-specific annotations are scarce.

When it comes to task instructions, most of us first associate the concept with prompts: short templates that reformat a new input into a language modeling problem so that a pre-trained language model (PLM) can be primed to generate a response. Although prompts are ubiquitous in text classification, machine translation, etc., prompts are just a special case of instructions. This article provides a comprehensive and broader view of instruction-driven NLP research. Specifically, we try to answer the following questions:

  • What are task instructions and what types of instructions exist?
  • Given task instructions, how can they be encoded to help complete the target task?
  • What factors (such as model size, number of tasks) affect the performance of instruction-driven systems, and how to design better instructions?
  • What applications can instruction learning bring?
  • What challenges exist in instruction learning, and what are the future directions?

To our knowledge, this is the first paper to survey learning from textual instructions. Compared with existing surveys that focus on specific in-context instructions, such as prompts, input-output demonstrations, or reasoning, we provide a broader perspective that connects different research in this field in an organized way. We hope this article can present a better story of instruction learning and attract more colleagues to study this challenging artificial intelligence problem. We have also published a corresponding reading list for this survey.

2 Basic knowledge

For task instruction learning, the goal is to drive the system to produce the output for a given input by following instructions. Therefore, a dataset consists of three elements:

Input (X): The input of an instance; it can be a single piece of text (as in sentiment classification) or a group of texts (as in textual entailment, question answering, etc.).

Output (Y): The output of an instance; in a classification problem, it can be one or more predefined labels; in a text generation task, it can be any open-form text.

Template (T): A textual template that attempts to express the meaning of a task on its own, or to act as a bridge between X and Y. T alone may not yet constitute a complete instruction.

3 What is a task instruction?

Various types of textual instructions have been used in previous zero-shot and few-shot NLP tasks, such as prompts, Amazon Mechanical Turk instructions, instructions supplemented with demonstrations, and chain-of-thought explanations. Different instructions were originally designed for different goals (e.g., MTurk instructions were originally created for human annotators to understand, while prompts were meant to steer PLMs). In this section, as shown in Figure 2, we first summarize these instructions into three categories that perform different combinations of T, X, and Y: entailment-oriented, PLM-oriented, and human-oriented instructions.

3.1 I = T + Y: Entailment-oriented instructions

A traditional solution for handling classification tasks is to convert the target labels into indices and let the model decide which index an input belongs to. This paradigm focuses on encoding input semantics while losing label semantics. To let the system recognize new labels without relying on a large number of labeled examples, Yin et al. propose building a hypothesis for each label; the truth value of the label is then derived from the truth value of the corresponding hypothesis. As shown in Table 1, this approach builds the instruction I by combining a template T with a label Y to explain each target label Y. Since this paradigm naturally satisfies the format of textual entailment (TE, where the task input and the instruction can be viewed as the premise and the hypothesis, respectively), these instructions are called "entailment-oriented instructions."

The entailment-oriented instruction learning method has the following four advantages:

(1) It preserves label semantics, so that input encoding and output encoding receive equal attention when modeling input-output relationships;

(2) It forms a unified reasoning process, textual entailment, to handle various NLP problems;

(3) It creates the opportunity to leverage indirect supervision from existing TE datasets, so that pre-trained TE models are expected to work on these target tasks without task-specific fine-tuning;

(4) It extends the original closed-set label classification problem to an open-domain, open-form label recognition problem with few or even zero label-specific examples.

Therefore, it is widely used in various few-shot/zero-shot classification tasks, such as classifying topics, sentiments, stances, entity types, and entity relations.
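
The entailment-oriented setup can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the template wording, the function names, and especially `toy_entailment_score` (a word-overlap stand-in for a real pre-trained TE model) are hypothetical and do not come from the survey.

```python
from typing import Callable, Dict, List

# Hypothetical template T: turns a label Y into a natural-language hypothesis.
TEMPLATE = "This text is about {label}."


def build_hypotheses(labels: List[str], template: str = TEMPLATE) -> Dict[str, str]:
    """Instruction I = T + Y: build one hypothesis per candidate label."""
    return {label: template.format(label=label) for label in labels}


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Stand-in for a pre-trained TE model (assumption, not a real model):
    the fraction of hypothesis words that also occur in the premise."""
    premise_words = set(premise.lower().replace(".", "").split())
    hypothesis_words = hypothesis.lower().replace(".", "").split()
    hits = sum(word in premise_words for word in hypothesis_words)
    return hits / len(hypothesis_words)


def classify(
    text: str,
    labels: List[str],
    score_fn: Callable[[str, str], float] = toy_entailment_score,
) -> str:
    """Treat the input as the premise and pick the label whose hypothesis scores highest."""
    hypotheses = build_hypotheses(labels)
    return max(labels, key=lambda label: score_fn(text, hypotheses[label]))
```

Because the labels enter the model as text rather than as indices, adding a new label only requires adding a new hypothesis; in practice `score_fn` would be replaced by an actual entailment model.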

3.2 I = T + X: PLM-oriented instructions (e.g., prompts)

A prompt is a representative PLM-oriented instruction. It is usually a short utterance prepended to the task input (prefix prompt), or a cloze-question template (cloze prompt). It is mainly used to query intermediate responses (which can be further converted into final answers) from pre-trained language models (PLMs).

Since the prompted input matches the pre-training objective of the PLM (for example, cloze-style input matches the masked language modeling objective), it helps remove the dependence on traditional supervised fine-tuning and greatly reduces the cost of manual annotation. As a result, prompt learning has achieved impressive results on a large number of few-shot/zero-shot NLP tasks, such as question answering, machine translation, sentiment analysis, textual entailment, and named entity recognition.
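
To make the two prompt styles concrete, here is a minimal sketch. The template strings, the `[MASK]` placeholder, and the `ANSWER_MAP` table that converts an intermediate response into a final answer are all illustrative assumptions; no actual PLM is called.

```python
# Hypothetical templates T; the wording is an illustrative assumption.
PREFIX_TEMPLATE = "Translate English to French: {x}"
CLOZE_TEMPLATE = "{x} Overall, it was [MASK]."

# Hypothetical mapping from the PLM's intermediate response to a final answer.
ANSWER_MAP = {"great": "positive", "terrible": "negative"}


def prefix_prompt(x: str) -> str:
    """Prefix prompt: a short utterance placed before the task input X."""
    return PREFIX_TEMPLATE.format(x=x)


def cloze_prompt(x: str) -> str:
    """Cloze prompt: the input X embedded in a fill-in-the-blank template."""
    return CLOZE_TEMPLATE.format(x=x)


def to_final_answer(intermediate: str) -> str:
    """Convert the word a PLM would fill in into a task label."""
    return ANSWER_MAP.get(intermediate.strip().lower(), "unknown")
```

A masked language model would be asked to fill the `[MASK]` slot of `cloze_prompt(...)`, and the predicted word would then be passed through `to_final_answer`.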

3.3 Human-oriented instructions

Human-oriented instructions basically refer to the instructions used for crowdsourcing on human annotation platforms (such as Amazon MTurk instructions). Unlike PLM-oriented instructions, human-oriented instructions are usually human-readable, descriptive, paragraph-style task-specific text, consisting of a task title, category, definition, things to avoid, etc. Therefore, human-oriented instructions are more user-friendly and can ideally be applied to almost any complex NLP task.
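
The fields listed above can be pictured as a small record that is rendered into descriptive text. The class name, field names, and output layout below are assumptions made for illustration, not a format defined by the survey or by any annotation platform.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HumanOrientedInstruction:
    """Hypothetical container for the parts of a crowdsourcing-style instruction."""
    title: str
    category: str
    definition: str
    things_to_avoid: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the instruction as human-readable text, one field per line."""
        lines = [
            f"Title: {self.title}",
            f"Category: {self.category}",
            f"Definition: {self.definition}",
        ]
        if self.things_to_avoid:
            lines.append("Things to avoid: " + "; ".join(self.things_to_avoid))
        return "\n".join(lines)
```

The rendered text can be shown to a human annotator or placed in front of a task input for a model, which is what makes this instruction type applicable to many different tasks.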

4 How to model instructions?

In this section, we summarize several of the most popular modeling strategies for instruction learning. Overall, this paper introduces four different modeling schemes: for early machine learning-based systems, (1) semantic parser-based strategies were the common way to encode instructions; with the emergence of neural networks and pre-trained language models, (2) prompt-template-based and (3) prefix-instruction-based instruction learning models became the two favored paradigms; recently, (4) hypernetwork-based methods have also attracted growing interest.

5 Applications

5.1 Human-computer interaction

Textual instructions can naturally be regarded as a method of human-computer interaction. Much previous work has used natural language instructions to "instruct" computers to perform a variety of real-world tasks.

For non-NLP (multimodal) tasks, most work focuses on environment-grounded language learning, that is, driving an agent to associate natural language instructions with the environment and react accordingly, such as selecting mentioned objects from images/videos, following navigation instructions, drawing corresponding traces on a map, playing football/card games based on given rules, generating real-time sports broadcasts, controlling software, and querying external databases. At the same time, instructions are also widely used to help communicate with systems that solve NLP tasks, such as following instructions for manipulating strings, classifying emails based on a given explanation, and text-to-code generation.

In recent years, a growing body of research has designed the human-computer communication process in an iterative and modular manner. For example, Li et al. built a system to help users with daily tasks (e.g., ordering coffee or requesting an Uber). Thanks to a user-friendly graphical interface, the system can iteratively ask questions about the task, and users can continually refine their instructions to avoid unclear descriptions or vague concepts. Similarly, Dwivedi-Yu et al. proposed a benchmark for iteratively guiding a PLM to improve text, where each iteration uses only a short instruction with a precise purpose (e.g., "simplify the text" or "make the text neutral"). In addition, Chakrabarty et al. built a collaborative poetry-writing system in which users can initially provide an ambiguous instruction (e.g., "Write a poem about cakes") and then gradually refine it with more details (e.g., "Contains the word 'chocolate'") by observing the model's intermediate outputs. Meanwhile, Mishra and Nouri proposed a biography generation system that gradually collects the necessary personal information from the user (by asking questions that guide the user in a conversational scenario) and ultimately generates a paragraph-style biography. Because non-expert users have difficulty writing complete instructions in one attempt, adopting an iterative and modular design paradigm in instruction-based AI systems can guide users to enrich task instructions step by step, thereby easing the cognitive burden on users and making the system more user-oriented. Given its practical value, this article highlights the importance of this branch of work.

5.2 Data and feature augmentation

Task instructions are considered an indirect source of supervision, and they sometimes contain superficial and arbitrary rules. These rules are also called labeling functions and can be applied directly for annotation (e.g., the sentence "a very fair price" is sentimentally positive because "the word price is directly preceded by fair"). Therefore, some existing works also use instructions as distant supervision to perform data or feature augmentation. For example, Srivastava et al. use semantic parsers to convert natural language explanations into logical forms and apply them to all instances in the dataset to generate additional binary features. Wang et al. used label explanations to automatically annotate a raw corpus and trained a classifier on the resulting noisy data. Beyond direct augmentation, Su et al. further used task instructions to enrich model representations and achieve strong cross-task generalization. Specifically, they trained an embedding model (a single encoder) on a diverse instruction dataset with contrastive learning and then used the model to generate instruction-based, task-specific representations for unseen downstream tasks.
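
The "fair price" rule quoted above can be written as a labeling function and applied across a dataset to produce binary features. This is a hand-written sketch: the function names and the simple tokenization are assumptions, and the cited works derive such functions from explanations with a semantic parser rather than coding them manually.

```python
from typing import Callable, List


def fair_before_price(sentence: str) -> int:
    """Labeling function for the rule: the word 'price' is directly preceded by 'fair'.
    Returns 1 if the rule fires on the sentence and 0 otherwise."""
    tokens = sentence.lower().replace(".", " ").replace(",", " ").split()
    fires = any(
        prev == "fair" and cur == "price"
        for prev, cur in zip(tokens, tokens[1:])
    )
    return int(fires)


def add_binary_features(
    sentences: List[str],
    labeling_functions: List[Callable[[str], int]],
) -> List[List[int]]:
    """Apply every labeling function to every sentence: one binary feature per function."""
    return [[lf(sentence) for lf in labeling_functions] for sentence in sentences]
```

Each row of the result can be appended to an instance's existing features, or a firing rule can be used as a noisy label for an unannotated sentence.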

5.3 General-purpose language models

According to the definition of Artificial General Intelligence (AGI), a "general model" is usually a system that is capable of performing different tasks and is scalable in changing environments, far beyond its creators' original expectations. In the NLP domain specifically, a general-purpose language model should be an excellent multi-task assistant capable of proficiently handling a variety of real-world NLP tasks and different languages in a completely zero-shot/few-shot manner. Since much existing work demonstrates the surprising cross-task generalization ability that instructions enable, instructions are likely to be a breakthrough toward this ultimate goal.

It is worth noting that two recent notable applications of instructions, namely InstructGPT and ChatGPT, also indicate a big step towards building general-purpose language models. However, unlike other works that mainly adopt instruction learning, ChatGPT also adopts other components such as reinforcement learning from human feedback (RLHF). While the answer to "which component contributes more to ChatGPT's excellent results" remains unclear and requires further investigation, we introduce some recent work to highlight the critical role of instruction learning. For example, Chung et al. conducted extensive experiments to evaluate the human-preference alignment of PaLM. They found that even without any human feedback, instruction fine-tuning significantly reduced the toxicity of PaLM's open-ended generation, such as gender and occupational bias. Additionally, some other work has also used creative instructions alone, rather than human feedback, and achieved notable cross-task results. Although ChatGPT still has many unsatisfactory aspects and is still far from a general-purpose language model, we hope that the goal of AGI can continue to be advanced through the adoption and development of more powerful techniques, including instruction learning.

Statement:
This article is reproduced from 51cto.com. In case of any infringement, please contact admin@php.cn for removal.