# Is "imitation learning" just scratching the surface? Explanation tuning + a 13-billion-parameter Orca: reasoning on par with ChatGPT
Since the ChatGPT API was opened up, a large number of studies have used the output of large foundation models (LFMs) such as ChatGPT and GPT-4 as training data to improve the capabilities of small models through imitation learning.
However, due to shallow imitation signals, insufficient training data, and the lack of rigorous evaluation standards, the actual performance of these small models has been overestimated.
In practice, the small models tend to imitate the output style of LFMs rather than their reasoning process.
## Paper link: https://arxiv.org/pdf/2306.02707.pdf
To address these challenges, Microsoft recently released a 51-page paper proposing Orca, a 13-billion-parameter model that learns to imitate the reasoning process of LFMs.
The researchers designed rich training signals for the model so that Orca can learn explanation traces, step-by-step thought processes, complex instructions, and more from GPT-4, with ChatGPT acting as a teaching assistant to provide guidance; large-scale and diverse imitation data is mined through sampling and selection to further strengthen progressive learning.
In the experimental evaluation, Orca surpasses other SOTA instruction-tuned models, doubling Vicuna-13B's performance on complex zero-shot reasoning benchmarks such as BigBench Hard (BBH) and achieving a 42% improvement on AGIEval.
In addition, Orca reaches parity with ChatGPT on the BBH benchmark and shows only a 4% performance gap on professional and academic exams such as the SAT, LSAT, GRE, and GMAT, all measured in a zero-shot setting without chain-of-thought prompting.
The findings show that letting models learn from step-by-step explanations, whether those explanations are generated by humans or by more advanced AI models, is a promising research direction for improving model capabilities and skills.
## Explanation Tuning

### Dataset construction
Each instance in the training data consists of three parts: a system message, a user query, and an LFM response.
The system message is placed at the beginning of the prompt and provides the LFM with basic context, guidance, and other relevant details.
System messages can be used to change the length of responses, describe the personality of the AI assistant, establish acceptable and unacceptable LFM behavior, and determine the response structure of the AI model.
The researchers hand-crafted 16 system messages to elicit different types of LFM responses, which can generate creative content and handle information-seeking queries; most importantly, they prompt the model to produce explanations and step-by-step reasoning in its answers.
The user query defines the actual task that the LFM should perform.
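As a concrete illustration, a single training instance could be represented as follows. This is a minimal sketch: the system-message and query texts are invented examples of the kind described above, not verbatim entries from the Orca dataset.

```python
# One Explanation Tuning-style training instance: system message, user query, LFM response.
# The wording below is illustrative only, not taken from the actual Orca training data.
instance = {
    "system_message": (
        "You are a helpful assistant. Think step by step and justify "
        "your answer with a detailed explanation."
    ),
    "user_query": "If a train travels 120 km in 1.5 hours, what is its average speed?",
    "lfm_response": (
        "Average speed = distance / time = 120 km / 1.5 h = 80 km/h. "
        "So the train's average speed is 80 km/h."
    ),
}
```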
To obtain a large and diverse set of user queries, the researchers sampled 5 million user queries from the FLAN-v2 collection (FLAN-5M) and collected ChatGPT responses for them; they then sampled a further 1 million instructions from those 5 million (FLAN-1M) and collected GPT-4 responses.
The FLAN-v2 collection consists of five sub-collections, namely CoT, NiV2, T0, Flan 2021, and Dialogue, where each sub-collection contains multiple tasks and each task is a collection of queries.
Each sub-collection is drawn from multiple academic datasets, and each dataset contributes one or more tasks that focus mainly on zero-shot and few-shot queries.
In this work, the researchers sampled only the zero-shot queries for training Orca and did not sample from the Dialogue sub-collection, because those queries often lack the context needed to elicit useful responses from ChatGPT.
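A rough sketch of how such teacher responses could be collected through the OpenAI chat API is shown below. The function name, the `queries` iterable, the model names, and the output path are placeholders; the actual FLAN-5M/FLAN-1M pipeline described in the paper runs against Azure OpenAI endpoints at a much larger scale.

```python
# Hypothetical collection loop for teacher responses over sampled FLAN-v2 queries.
# Assumes `queries` yields (system_message, user_query) pairs and OPENAI_API_KEY is set.
import json

from openai import OpenAI

client = OpenAI()

def collect_responses(queries, model="gpt-3.5-turbo", out_path="imitation_data.jsonl"):
    with open(out_path, "a", encoding="utf-8") as f:
        for system_message, user_query in queries:
            resp = client.chat.completions.create(
                model=model,  # e.g. "gpt-4" for the FLAN-1M subset
                messages=[
                    {"role": "system", "content": system_message},
                    {"role": "user", "content": user_query},
                ],
            )
            record = {
                "system_message": system_message,
                "user_query": user_query,
                "lfm_response": resp.choices[0].message.content,
            }
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```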
### Let ChatGPT act as a teaching assistant
Orca is first trained on the FLAN-5M data (ChatGPT augmented), followed by a second stage of training on FLAN-1M (GPT-4 augmented).
There are two main reasons for using ChatGPT as an intermediate teaching assistant:
1. Capability gap
Although the parameter count of GPT-4 has not been disclosed, Orca's 13 billion parameters are certainly many times smaller; the capability gap between ChatGPT and Orca is narrower, which makes ChatGPT better suited as an intermediate teacher, and this approach has been shown to improve the imitation learning performance of smaller student models in knowledge distillation.
This approach can also be seen as a form of progressive or curriculum learning, in which the student first learns from easier examples and then moves on to harder ones, under the assumption that longer responses are more difficult to imitate than shorter ones, allowing improved reasoning and step-by-step explanation skills to be learned from the larger teacher model.
2. Cost and time
Large-scale data collection through the Azure OpenAI API is subject to several constraints, including a rate limit on requests per minute to prevent excessive traffic, a limit on the number of tokens available per minute due to service latency, and the monetary cost of prompt length and token completion.
In comparison, the ChatGPT API is faster and cheaper than the GPT-4 endpoint, so 5 times more data was collected from ChatGPT than from GPT-4.
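In practice, collection scripts have to work within these request and token limits. One common way to do that, sketched here purely as an illustration (not part of the paper), is to wrap each API call in an exponential-backoff retry loop.

```python
# Illustrative exponential-backoff wrapper for rate-limited API calls.
import random
import time

def call_with_backoff(fn, max_retries=6, base_delay=1.0):
    """Retry fn() with exponential backoff plus jitter when it raises an error."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            # Sleep 1s, 2s, 4s, ... plus random jitter before retrying.
            time.sleep(base_delay * (2 ** attempt) + random.random())
    raise RuntimeError("API call failed after %d retries" % max_retries)
```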
The distribution of response lengths for ChatGPT and GPT-4 under different system messages shows that GPT-4's responses are on average 1.5x longer than ChatGPT's, which lets Orca progressively learn from increasingly complex teacher explanations; the impact of the teacher's assistance is demonstrated through ablation experiments.
### Training
For tokenization, the researchers used LLaMA's byte-pair encoding (BPE) tokenizer to process input samples; multi-digit numbers are split into individual digits, and unknown UTF-8 characters are decomposed by falling back to bytes.
To handle variable-length sequences, a padding token [[PAD]] is added to the LLaMA tokenizer's vocabulary, bringing the final vocabulary to 32,001 tokens.
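Assuming the Hugging Face `transformers` implementation of the LLaMA tokenizer, adding the padding token and checking the resulting vocabulary size might look like the sketch below; the checkpoint path is a placeholder.

```python
# Sketch: extend the LLaMA BPE tokenizer with a [[PAD]] token (checkpoint path is illustrative).
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("path/to/llama-13b")  # placeholder path

# The LLaMA tokenizer splits multi-digit numbers into single digits and falls back
# to bytes for unknown UTF-8 characters.
print(tokenizer.tokenize("3141"))  # e.g. ['▁', '3', '1', '4', '1']

# Add the padding token used for variable-length sequences.
tokenizer.add_special_tokens({"pad_token": "[[PAD]]"})
print(len(tokenizer))  # 32000 base tokens + 1 pad token = 32001
```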
To optimize the training process and make effective use of the available compute, the researchers used a packing technique that concatenates multiple input instances into a single sequence before training.
During packing, the total length of each concatenated sequence does not exceed max_len = 2048 tokens: the input samples are randomly shuffled and partitioned into groups such that the concatenated length of each group is at most max_len.
Given the length distribution of the augmented instructions in the training data, the packing factor is about 2.7 examples per sequence.
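A minimal sketch of this kind of packing, shuffling tokenized examples and then greedily grouping them so each concatenated sequence stays within max_len, is shown below; it illustrates the general technique under simple assumptions rather than the authors' exact implementation.

```python
# Illustrative greedy packing of tokenized examples into sequences of at most max_len tokens.
import random

def pack_examples(tokenized_examples, max_len=2048, seed=0):
    """tokenized_examples: list of token-id lists; returns a list of packed sequences."""
    examples = list(tokenized_examples)
    random.Random(seed).shuffle(examples)

    packed, current = [], []
    for ex in examples:
        ex = ex[:max_len]  # truncate any single example longer than max_len
        if len(current) + len(ex) > max_len:
            packed.append(current)
            current = []
        current.extend(ex)
    if current:
        packed.append(current)
    return packed

# With a packing factor of ~2.7, len(tokenized_examples) / len(packed) is roughly 2.7.
```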
To train Orca, the researchers compute the loss only on the tokens generated by the teacher model, i.e., the model learns to generate responses conditioned on the system message and task instructions; this ensures the model focuses on learning from the most relevant and informative tokens, improving the overall efficiency and effectiveness of training.
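A hedged sketch of that loss masking, in the style commonly used for causal language model fine-tuning (where a label value of -100 is ignored by the cross-entropy loss), is shown below; the tokenization details are simplified.

```python
# Illustrative loss masking: compute the loss only on teacher-generated response tokens,
# not on the system-message and instruction tokens.
def build_labels(prompt_ids, response_ids, ignore_index=-100):
    """prompt_ids: tokenized system message + user query; response_ids: teacher response."""
    input_ids = list(prompt_ids) + list(response_ids)
    # Positions covering the prompt are masked out, so the cross-entropy loss
    # is computed only on the response tokens.
    labels = [ignore_index] * len(prompt_ids) + list(response_ids)
    return input_ids, labels
```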
Finally, Orca was trained on 20 NVIDIA A100 GPUs with 80 GB of memory each. It was first trained on FLAN-5M (ChatGPT augmented) for 4 epochs, which took 160 hours, and then trained on FLAN-1M (GPT-4 augmented) for another 4 epochs.
Due to rate limits, endpoint load, and response-length issues, collecting data from multiple GPT-3.5-turbo (ChatGPT) and GPT-4 endpoints took 2 weeks and 3 weeks, respectively.
## Experiments

The researchers focused their evaluation on Orca's reasoning capabilities.
As can be seen in the AGIEval experiments, Orca performs on par with Text-da-Vinci-003, achieving 88% of ChatGPT's performance, but lags significantly behind GPT-4.
On analytical and reasoning tasks, Vicuna performs markedly worse, retaining only 62% of ChatGPT's quality, which indicates that this open-source language model has weak reasoning ability.
While Orca performs on par with Text-da-Vinci-003, it is still 5 points below ChatGPT, and there is a large gap between Orca and ChatGPT on math-related tasks (in the SAT, GRE, and GMAT).
Compared to Vicuna, Orca shows stronger performance, outperforming Vicuna in every category, with an average relative improvement of 42%.
GPT-4 far outperforms all other models, but there is still significant room for improvement on this benchmark, as all models currently score well below human performance.
Orca's performance varies greatly depending on the type of system message; for the trained model, the empty system message tends to work well.
Orca outperforms ChatGPT on 325 samples across different tasks (the Orca-beats-ChatGPT cases), most of which come from LogiQA (29%), while the other LSAT tasks and the SAT-English task each account for less than 10%.
The reasoning evaluation on the Big-Bench Hard dataset shows that, aggregated across all tasks, Orca performs slightly better than ChatGPT but significantly behind GPT-4, and 113% better than Vicuna.