
A deep dive into models, data, and frameworks: an exhaustive 54-page review of efficient large language models


Large language models (LLMs) have demonstrated compelling capabilities in many important tasks, including natural language understanding, language generation, and complex reasoning, and have had a profound impact on society. However, these capabilities come at the cost of substantial training resources (shown in the left figure) and long inference latency (shown in the right figure). Researchers therefore need effective techniques to address these efficiency problems.

In addition, as the right side of the figure shows, some efficient LLMs such as Mistral-7B have already been used successfully in the design and deployment of LLMs. They greatly reduce inference memory usage and inference latency while achieving accuracy comparable to LLaMA-1-33B, demonstrating that feasible and efficient methods have already been applied successfully to the design and use of LLMs.


In this review, researchers from The Ohio State University, Imperial College London, Michigan State University, the University of Michigan, Amazon, Google, Boson AI, and Microsoft Research Asia provide a systematic and comprehensive survey of research on efficient LLMs. They divide existing techniques for improving LLM efficiency into three categories, namely model-centric, data-centric, and framework-centric, and summarize and discuss the most cutting-edge work in each.



  • Paper: https://arxiv.org/abs/2312.03863
  • GitHub: https://github.com/AIoT-MLSys-Lab/Efficient-LLMs-Survey

To conveniently organize the papers covered in the review and keep them up to date, the researchers created a GitHub repository and actively maintain it. They hope the repository will help researchers and practitioners systematically understand the research and development of efficient LLMs and inspire them to contribute to this important and exciting field.

The repository is available at https://github.com/AIoT-MLSys-Lab/Efficient-LLMs-Survey. It collects research papers, code, and documentation related to the survey of efficient machine learning systems, helping readers better understand and explore the area. If you are interested in this field, you can find more information by visiting the repository.

Model-centric

Model-centric approaches focus on efficient techniques at both the algorithm level and the system level, where the model itself is the focal point. Since LLMs have billions or even trillions of parameters and exhibit unique characteristics such as emergent abilities that smaller models lack, new techniques are needed to optimize their efficiency. This article discusses five categories of model-centric methods in detail: model compression, efficient pre-training, efficient fine-tuning, efficient inference, and efficient model architecture design.

1. Model compression

In machine learning, model size is often an important consideration. Larger models require more storage space and computing resources, and may hit hard limits when run on mobile or otherwise constrained devices. Model compression is therefore a commonly used technique for reducing the size of a model.

Model compression techniques fall into four main categories: quantization, parameter pruning, low-rank approximation, and knowledge distillation (see the figure below). Quantization compresses the model's weights or activations from high precision to low precision; parameter pruning searches for and removes the more redundant parts of the model weights; low-rank approximation factorizes the weight matrices into products of several small low-rank matrices; and knowledge distillation uses the large model to train a small model directly, so that the small model can stand in for the large one on certain tasks.
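To make the quantization idea concrete, here is a minimal sketch of symmetric per-tensor int8 weight quantization in PyTorch. It is an illustrative toy, not a method from the survey; practical LLM quantizers (e.g., GPTQ or AWQ) additionally use calibration data and finer-grained per-channel or per-group scales.

```python
# Minimal sketch of symmetric int8 weight quantization (illustrative only).
import torch

def quantize_int8(w: torch.Tensor):
    """Map float weights to int8 with a single per-tensor scale."""
    scale = w.abs().max() / 127.0                       # largest value maps to +/-127
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q.float() * scale                            # recover approximate float weights

w = torch.randn(4096, 4096)                             # hypothetical weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("int8 storage is 4x smaller; mean abs error:", (w - w_hat).abs().mean().item())
```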


2. Efficient pre-training

Pre-training LLMs is very expensive. Efficient pre-training aims to improve efficiency and reduce the cost of the LLM pre-training process. It can be divided into mixed-precision acceleration, model scaling, initialization techniques, optimization strategies, and system-level acceleration.

Mixed-precision acceleration improves pre-training efficiency by computing gradients, weights, and activations in low precision, then converting the results back to high precision to update the original weights. Model scaling accelerates pre-training convergence and reduces training cost by using the parameters of a small model to initialize a larger one. Initialization techniques speed up convergence by carefully designing the model's initial values. Optimization strategies focus on designing lightweight optimizers that reduce memory consumption during training. System-level acceleration uses distributed training and other system techniques to speed up pre-training.
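As a rough illustration of mixed-precision acceleration, the sketch below uses PyTorch's automatic mixed precision (AMP). The model, data, and hyperparameters are placeholders assumed for the example; it only shows the pattern of low-precision forward/backward passes with loss scaling and full-precision weight updates.

```python
# Minimal mixed-precision training loop (a sketch, assuming a CUDA device).
import torch
from torch.cuda.amp import autocast, GradScaler

model = torch.nn.Linear(1024, 1024).cuda()      # stand-in for an LLM
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = GradScaler()                           # rescales fp16 gradients to avoid underflow

for step in range(10):
    x = torch.randn(8, 1024, device="cuda")
    optimizer.zero_grad(set_to_none=True)
    with autocast():                            # forward pass runs in low precision
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()               # scaled loss keeps small gradients representable
    scaler.step(optimizer)                      # master weights are still updated in fp32
    scaler.update()
```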


3. Efficient fine-tuning

Efficient fine-tuning aims to improve the efficiency of the LLM fine-tuning process. Common efficient fine-tuning techniques fall into two categories: parameter-efficient fine-tuning and memory-efficient fine-tuning.

The goal of parameter-efficient fine-tuning (PEFT) is to adapt an LLM to downstream tasks by freezing the entire LLM backbone and updating only a small set of additional parameters. The paper further divides PEFT into adapter-based fine-tuning, low-rank adaptation, prefix tuning, and prompt tuning.
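The idea behind low-rank adaptation can be illustrated with a minimal LoRA-style wrapper around a frozen linear layer. The shapes, rank, and scaling below are assumptions for the example, not the implementation of any surveyed method; in practice one would typically use a library such as Hugging Face PEFT.

```python
# Minimal LoRA-style module: frozen pretrained weight + trainable low-rank update.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                   # freeze the pretrained layer
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        # frozen path + low-rank update (B @ A), only A and B are trained
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(4096, 4096))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print("trainable params:", trainable)                 # only the low-rank factors
```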

Memory-efficient fine-tuning focuses on reducing memory consumption throughout the LLM fine-tuning process, for example the memory consumed by optimizer states and activations.


4. Efficient inference

Efficient inference aims to improve the efficiency of the LLM inference process. Researchers divide common efficient inference techniques into two categories: algorithm-level inference acceleration and system-level inference acceleration.

Algorithm-level inference acceleration can be divided into two categories: speculative decoding and KV-cache optimization. Speculative decoding speeds up sampling by using a smaller draft model to propose candidate tokens, which the larger target model then verifies in parallel. KV-cache optimization avoids or reduces the repeated computation of key-value (KV) pairs during LLM inference.
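A minimal single-head example of KV caching during autoregressive decoding is sketched below: keys and values of past tokens are stored so each new decoding step only computes attention for the newest query. The dimensions and weights are made up for illustration and do not come from the survey.

```python
# Minimal KV-cache sketch for autoregressive decoding (single head, toy sizes).
import torch
import torch.nn.functional as F

d = 64
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))
k_cache, v_cache = [], []

def decode_step(x_t):                       # x_t: (1, d) embedding of the newest token
    q = x_t @ Wq
    k_cache.append(x_t @ Wk)                # append instead of recomputing all past K/V
    v_cache.append(x_t @ Wv)
    K = torch.cat(k_cache, dim=0)           # (t, d) keys of all tokens so far
    V = torch.cat(v_cache, dim=0)
    attn = F.softmax(q @ K.T / d**0.5, dim=-1)
    return attn @ V                         # (1, d) context for the new token

for _ in range(5):
    out = decode_step(torch.randn(1, d))
```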

System-level inference acceleration speeds up LLM inference by, for example, optimizing the number of memory accesses on specific hardware and increasing the degree of parallelism in the algorithm.


5. Efficient model architecture design

Efficient architecture design for LLMs refers to strategically optimizing the model structure and computation to improve performance and scalability while minimizing resource consumption. The researchers divide efficient model architecture design into four major categories by model type: efficient attention modules, mixture-of-experts models, long-context LLMs, and architectures that can replace the Transformer.

Efficient attention modules aim to reduce the heavy computation and memory usage of the attention module. Mixture-of-experts (MoE) models replace some modules of an LLM with multiple small expert models and route each input to only a few of them, making the model sparse overall. Long-context LLMs are models specially designed to process very long text efficiently. Transformer-alternative architectures redesign the model architecture to reduce its complexity while achieving reasoning ability comparable to the Transformer.
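The sparsity idea behind MoE can be sketched as a small top-k routed layer: a router scores the experts for each token and only the top-k experts run, so most parameters stay inactive per input. The layer sizes and routing scheme below are illustrative assumptions, not the design of any specific surveyed model.

```python
# Minimal sparse mixture-of-experts layer with top-k routing (toy sizes).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):                         # x: (tokens, d_model)
        scores = self.router(x)                   # (tokens, n_experts)
        topk, idx = scores.topk(self.k, dim=-1)
        gates = F.softmax(topk, dim=-1)           # weights for the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                # each token visits only k experts
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += gates[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

y = TopKMoE()(torch.randn(16, 512))
```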


Data-centric

Data-centric approaches focus on the role of data quality and structure in improving the efficiency of LLMs. The researchers discuss two types of data-centric methods in detail: data selection and prompt engineering.

1. Data selection

Data selection for LLMs aims to clean and select pre-training and fine-tuning data, for example by removing redundant or invalid samples, in order to speed up the training process.
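As a toy illustration of data selection, the sketch below drops exact duplicates and very short samples from a corpus. Real pipelines typically also use near-duplicate detection (e.g., MinHash) and learned quality filters; the threshold and helper function here are assumptions made for the example.

```python
# Minimal data-cleaning sketch: remove exact duplicates and very short samples.
import hashlib

def select(samples, min_chars=200):
    seen, kept = set(), []
    for text in samples:
        if len(text) < min_chars:        # drop low-information fragments
            continue
        key = hashlib.md5(text.strip().lower().encode()).hexdigest()
        if key in seen:                  # drop exact duplicates
            continue
        seen.add(key)
        kept.append(text)
    return kept

corpus = ["short", "A" * 300, "A" * 300, "B" * 500]
print(len(select(corpus)))               # 2 samples survive cleaning
```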


2. Prompt engineering

Prompt engineering guides LLMs to generate the desired output efficiently by designing effective inputs (prompts); well-designed prompts can achieve model performance comparable to that of laborious fine-tuning. Researchers divide common prompt engineering techniques into three major categories: few-shot prompting, prompt compression, and prompt generation.

Few-shot prompting provides the LLM with a limited set of examples to guide its understanding of the task to be performed. Prompt compression accelerates the LLM's processing of inputs by compressing lengthy prompts or by learning and reusing compact prompt representations. Prompt generation aims to automatically create effective prompts that guide the model toward specific, relevant responses, rather than relying on manually annotated data.
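A minimal few-shot prompt builder illustrates the first category: a handful of worked examples is prepended to the query so the model can infer the task without any fine-tuning. The task, template, and examples are invented for illustration and are not from the survey.

```python
# Minimal few-shot prompt construction for a toy sentiment task.
def build_few_shot_prompt(examples, query):
    shots = "\n\n".join(f"Review: {t}\nSentiment: {lbl}" for t, lbl in examples)
    return f"{shots}\n\nReview: {query}\nSentiment:"

examples = [
    ("The plot was gripping from start to finish.", "positive"),
    ("I walked out halfway through.", "negative"),
]
print(build_few_shot_prompt(examples, "Beautiful visuals but a hollow story."))
```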


Framework-centric

The researchers survey recently popular efficient LLM frameworks and list the efficiency tasks each can optimize, covering pre-training, fine-tuning, and inference (as shown in the figure below).


Summary

In this survey, the researchers provide a systematic review of efficient LLMs, an important research area dedicated to making LLMs more democratized. They begin by explaining why efficient LLMs are needed. Then, within an organized framework, the paper surveys efficient techniques for LLMs at the algorithm level and the system level from the model-centric, data-centric, and framework-centric perspectives.

The researchers believe that efficiency will play an increasingly important role in LLMs and LLM-oriented systems. They hope this survey will help researchers and practitioners quickly enter the field and serve as a catalyst for new research on efficient LLMs.
