Teach you how to shear 'alpaca' step by step, Chen Danqi's team proposed the LLM-Shearing large model pruning method

PHPz

Oct 12, 2023 pm 06:29 PM

project, large model pruning, llm-shearing

What happens when you shear the wool off the Llama 2 "alpaca"? Chen Danqi's team at Princeton University has proposed a large model pruning method called LLM-Shearing, which achieves better performance than models of the same size at a small fraction of the compute cost.

Since their emergence, large language models (LLMs) have achieved remarkable results on a wide range of natural language tasks. However, they require massive computing resources to train. As a result, the industry is increasingly interested in building equally capable mid-scale models; LLaMA, MPT, and Falcon are examples that enable efficient inference and fine-tuning.

These LLMs of varying sizes suit different use cases, but training each model from scratch (even a small 1-billion-parameter model) still requires substantial computing resources, which remains a heavy burden for most research institutions.

In this paper, Chen Danqi's team at Princeton University therefore asks the following question: can an existing pre-trained LLM be used to build a smaller, general-purpose, and performance-competitive LLM with far less computation than training from scratch?

The researchers explore structured pruning as a way to achieve this. The difficulty is that a pruned general-purpose LLM suffers performance degradation, especially when no significant compute is invested after pruning. Their efficient pruning method can produce smaller but still performance-competitive LLMs while requiring significantly less computation than training from scratch.


Paper address: https://arxiv.org/abs/2310.06694
Code address: https://github.com/princeton-nlp/LLM-Shearing
Models: Sheared-LLaMA-1.3B, Sheared-LLaMA-2.7B

Before pruning an LLM, the researchers identified two key technical challenges. The first is how to determine a final pruned structure that delivers both strong performance and efficient inference. Existing structured pruning techniques for LLMs do not specify a target structure, so the pruned models fall short in both performance and inference speed. The second is how to continue pre-training the pruned model so that it reaches the expected performance. The researchers observed that, compared with training a model from scratch, continued training on the original pre-training data reduces the loss at different rates in different domains.

In response to these two challenges, the researchers proposed the LLM-Shearing algorithm. Its pruning component, called "targeted structured pruning", prunes the source model to a specified target architecture, which is taken from the configuration of an existing pre-trained model. They show that this method searches for substructures in the source model that maximize performance under the resource constraints. In addition, they designed a dynamic batch loading algorithm that loads training data from each domain in proportion to its rate of loss reduction, making more efficient use of the data and accelerating the overall performance improvement.

Finally, the researchers pruned the LLaMA2-7B model into two smaller LLMs, Sheared-LLaMA-1.3B and Sheared-LLaMA-2.7B, confirming the effectiveness of their method.


They used only 50 billion tokens (i.e. 5% of the OpenLLaMA pre-training budget) for pruning and continued pre-training. Even so, on 11 representative downstream tasks (covering commonsense reasoning, reading comprehension, and world knowledge) and on instruction tuning for open-ended generation, both models outperform other popular LLMs of similar size, including Pythia, INCITE, and OpenLLaMA.


It should be noted, however, that by the time the paper released its 3B-class Sheared-LLaMA model, the record for the strongest open-source 3B model had already been taken by StableLM-3B.


In addition, the downstream task performance trajectories indicate that training the pruned model on more tokens would bring further gains. The researchers experimented only with models of up to 7 billion parameters, but LLM-Shearing is general and could be extended to large language models of any size in future work.

Method introduction

Given an existing large model M_S (the source model), the goal of the paper is to study how to efficiently produce a smaller yet strong model M_T (the target model). The authors argue that this takes two stages:

The first stage prunes M_S to M_T. This reduces the number of parameters but inevitably degrades performance;
The second stage continues pre-training M_T to recover and strengthen its performance.

Structured pruning

Structured pruning can remove a large number of parameters from a model, compressing it and accelerating inference. However, existing structured pruning methods can push models away from conventional architectural configurations. For example, the CoFiPruning method produces models with non-uniform layer configurations, which incur additional inference overhead compared with standard uniform layer configurations.

The paper extends CoFiPruning so that the source model can be pruned to any specified target configuration. For example, it uses the INCITE-Base-3B architecture as the target structure when producing the 2.7B model.

In addition, the paper learns a set of pruning masks over model parameters at different granularities, from whole layers and hidden dimensions down to individual attention heads and intermediate (feed-forward) dimensions.


Each mask variable controls whether the corresponding substructure is pruned or retained. For example, if z^layer = 0 for a layer, that layer is removed. Figure 2 in the paper illustrates how the pruning masks control which structures are pruned.
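To make the role of the masks concrete, here is a minimal Python sketch. It assumes each mask is a plain list of 0/1 values (in the paper the masks are learned during pruning); the function name `apply_masks` and the toy sizes are hypothetical and chosen only for illustration. It shows a zero in z^layer dropping an entire layer, while zeros in a head mask drop individual heads inside a retained layer.

```python
def apply_masks(z_layer, z_head):
    """Return the indices of heads kept in each surviving layer.

    z_layer: list of 0/1 flags, one per layer.
    z_head:  list of lists of 0/1 flags, one list per layer, one flag per head.
    Both are illustrative binary masks, not the learned variables themselves.
    """
    kept = {}
    for layer_idx, layer_flag in enumerate(z_layer):
        if layer_flag == 0:
            # z^layer = 0: the whole layer is removed, whatever its head mask says.
            continue
        kept[layer_idx] = [h for h, flag in enumerate(z_head[layer_idx]) if flag == 1]
    return kept


# Toy source model: 3 layers with 4 heads each (hypothetical sizes).
z_layer = [1, 0, 1]
z_head = [[1, 1, 0, 0], [1, 1, 1, 1], [0, 1, 1, 0]]
pruned = apply_masks(z_layer, z_head)
print(pruned)  # {0: [0, 1], 2: [1, 2]}
```

Layer 1 disappears even though all of its head flags are 1, which mirrors the coarse-to-fine relationship between the layer mask and the finer-grained masks.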


After pruning, the paper finalizes the pruned architecture by retaining, in each substructure, the components whose mask variables score highest, and then continues pre-training the pruned model with the language modeling objective.
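The finalization step can be pictured as a top-k selection. The sketch below is an illustration under assumptions: `keep_top_k` is a hypothetical helper, the scores are made-up values standing in for learned mask variables, and k would come from the target architecture.

```python
def keep_top_k(scores, k):
    """Return the indices of the k highest-scoring components, in original order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical mask scores for 4 attention heads; the target keeps 2 of them.
head_scores = [0.91, 0.12, 0.78, 0.40]
kept_heads = keep_top_k(head_scores, 2)
print(kept_heads)  # [0, 2]
```

Returning the indices in their original order keeps the surviving components aligned with the weight layout of the source model.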

Dynamic batch loading

The authors argue that extensive continued pre-training of the pruned model is necessary to restore its performance.

Inspired by other research, the paper proposes a more efficient algorithm, dynamic batch loading, which adjusts the proportion of data drawn from each domain on the fly based on model performance.
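As a rough illustration of the idea, the following Python sketch re-weights domains according to how far each one's current loss lags behind a reference loss. The function name `update_domain_weights`, the exponentiated update rule, and the toy loss values are assumptions made for this sketch; consult the paper for the exact algorithm.

```python
import math


def update_domain_weights(weights, current_loss, reference_loss):
    """Shift sampling weight toward domains whose loss lags behind its reference.

    weights:        current sampling proportions per domain (sum to 1).
    current_loss:   latest validation loss per domain.
    reference_loss: loss each domain is expected to reach.
    """
    # Gap between current and reference loss per domain, clipped at zero.
    gaps = [max(c - r, 0.0) for c, r in zip(current_loss, reference_loss)]
    # Exponentiated update (an assumed choice): a larger gap means more weight.
    scaled = [w * math.exp(g) for w, g in zip(weights, gaps)]
    total = sum(scaled)
    # Renormalize so the weights remain a probability distribution.
    return [s / total for s in scaled]


# Two hypothetical domains: the first has reached its reference, the second lags by 1.0.
new_weights = update_domain_weights([0.5, 0.5], [2.0, 3.0], [2.0, 2.0])
print([round(w, 4) for w in new_weights])  # [0.2689, 0.7311]
```

In a training loop, a function like this would be called at every evaluation interval, and the next batches would be sampled from the domains according to the updated weights.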


Experiments and results

Model configuration: The paper uses LLaMA2-7B as the source model for its structured pruning experiments. The researchers compressed LLaMA2-7B to two smaller target sizes, 2.7B and 1.3B parameters, and compared the pruned models with models of the same size, including OPT-1.3B, Pythia-1.4B, OPT-2.7B, Pythia-2.8B, INCITE-Base-3B, OpenLLaMA-3B-v1, and OpenLLaMA-3B-v2. Table 8 summarizes the architecture details of all these models.


Data: Since LLaMA2's training data is not publicly accessible, the paper uses the RedPajama dataset. Table 1 lists the pre-training data used by the paper's models and the baseline models.


Training: The researchers used up to 16 Nvidia A100 GPUs (80GB) in all experiments.

Sheared-LLaMA outperforms LMs of equivalent size

The paper shows that Sheared-LLaMA significantly outperforms existing LLMs of similar size while using only a fraction of the compute budget needed to train such models from scratch.

Downstream tasks: Table 2 shows the zero-shot and few-shot performance of Sheared-LLaMA and existing pre-trained models of similar size on downstream tasks.


Instruction tuning: As shown in Figure 3, the instruction-tuned Sheared-LLaMA achieves a higher win rate than all other pre-trained models of the same scale.


Figure 4 shows that the INCITE-Base-3B model starts out with much higher accuracy, but its performance levels off during continued pre-training.


Analysis

Finally, the researchers analyzed the advantages of their method.

Effectiveness of dynamic batch loading

To analyze the effectiveness of dynamic batch loading, the researchers studied its impact on three aspects: (1) the final LM loss across domains, (2) the data usage of each domain throughout training, and (3) downstream task performance. The results are based on the Sheared-LLaMA-1.3B model.

Cross-domain loss difference. The purpose of dynamic batch loading is to balance the rate of loss reduction across domains so that each domain's loss reaches its reference value at roughly the same time. Figure 5 plots the difference between the model loss (under original batch loading and under dynamic batch loading) and the reference loss. Compared with original batch loading, dynamic batch loading reduces the loss evenly, and the loss differences across domains are very similar, which indicates that the data is used more efficiently.


Data usage. Table 3 compares RedPajama's original data proportions with the domain data usage under dynamic batch loading (Figure 7 shows how the domain weights change throughout training). Dynamic batch loading increases the weight of the Book and C4 domains relative to other domains, indicating that these domains are harder for the pruned model to recover.


Downstream performance. As shown in Figure 6, the pruned model trained using dynamic batch loading achieved better downstream performance compared to the model trained on the original RedPajama distribution. This suggests that the more balanced loss reduction brought about by dynamic batch loading can improve downstream performance.


Comparison with other pruning methods

In addition, the researchers compared the LLM-Shearing method with other pruning methods and reported validation perplexity, which is a strong indicator of overall model capability.

Due to computational limitations, the following experiments control the total computational budget of all compared methods rather than running each method to the end.

As shown in Table 4, at the same sparsity, the paper's targeted pruning model has higher inference throughput than the non-uniformly pruned CoFiPruning model, at the cost of slightly higher perplexity.
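For readers unfamiliar with the metric, perplexity is the exponential of the mean negative log-likelihood per token, so lower is better. The short Python sketch below computes it from per-token log-probabilities; the helper name `perplexity` and the toy inputs are illustrative and not taken from the paper's code.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)


# A model that assigns probability 0.25 to every token has perplexity 4:
# it is as uncertain as a uniform choice among 4 options.
uniform = [math.log(0.25)] * 8
print(round(perplexity(uniform), 6))  # 4.0
```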


Other analysis

Table 5 shows that, with the total number of tokens held fixed, spending more of the budget on pruning consistently improves perplexity. However, since pruning is more expensive than continued pre-training, the researchers allocate 0.4B tokens to pruning.


For more research details, please refer to the original paper.


Statement

This article is reproduced from 机器之心. If there is any infringement, please contact admin@php.cn to have it removed.
