Use BigDL-LLM to instantly accelerate inference for LLMs with tens of billions of parameters

王林

Sep 05, 2023 pm 01:49 PM

We are entering a new era of AI driven by large language models (LLMs). LLMs are playing an increasingly important role in applications such as customer service, virtual assistants, content creation, and programming assistance.

However, as LLMs continue to grow in size, running them consumes more and more resources and inference becomes slower, which poses considerable challenges for AI application developers.

To this end, Intel recently released an open source library for large models called BigDL-LLM^[1], which helps AI developers and researchers accelerate and optimize large language models on Intel® platforms and improves the experience of using large language models on those platforms.

The following shows the 33-billion-parameter large language model Vicuna-33b-v1.3^[2], accelerated with BigDL-LLM, running in real time on a server equipped with an Intel® Xeon® Platinum 8468 processor.

△ Actual speed of a 33-billion-parameter large language model running on a server equipped with an Intel® Xeon® Platinum 8468 processor (real-time screen recording)

BigDL-LLM: An open source large language model acceleration library for Intel® platforms

BigDL-LLM is an open source library focused on optimizing and accelerating large language models. It is part of BigDL and is released under the Apache 2.0 license.

It provides a range of low-precision optimizations (such as INT4/INT5/INT8) and leverages the hardware acceleration technologies built into Intel® CPUs (AVX, VNNI, AMX, etc.) together with the latest software optimizations, so that large language models run more efficiently and faster on Intel® platforms.
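To make "low-precision optimization" concrete, here is a minimal sketch of symmetric 4-bit quantization in plain Python. It illustrates the general idea only: the function names and the single shared scale are assumptions for illustration, and BigDL-LLM's actual quantization scheme and kernels may differ.

```python
def quantize_int4(values):
    """Symmetric 4-bit quantization of a list of floats.

    Returns (codes, scale), where each code is an integer in [-8, 7].
    Hypothetical helper for illustration; not part of BigDL-LLM.
    """
    max_abs = max(abs(v) for v in values)
    # Map the largest magnitude to 7, the largest positive INT4 value.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int4(codes, scale):
    """Recover approximate floats from INT4 codes."""
    return [c * scale for c in codes]


weights = [0.12, -0.53, 0.98, -1.40, 0.07, 0.66]
codes, scale = quantize_int4(weights)
restored = dequantize_int4(codes, scale)

# The rounding error per weight is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each weight is stored as a 4-bit integer plus a shared scale, which is why the memory footprint shrinks so sharply; the price is a small rounding error per weight.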

An important feature of BigDL-LLM is that for models based on the Hugging Face Transformers API, you only need to change one line of code to accelerate the model. In theory, it can support running any Transformers model, which is very friendly to developers who are familiar with the Transformers API.

In addition to Transformers API, many people also use LangChain to develop large language model applications.

To this end, BigDL-LLM also provides an easy-to-use LangChain integration^[3], so developers can easily use BigDL-LLM to build new applications or migrate existing applications based on the Transformers API or the LangChain API.

In addition, for general PyTorch large language models (models that use neither the Transformers API nor the LangChain API), you can apply the BigDL-LLM optimize_model API for one-step acceleration. For details, see the GitHub README^[4] and the official documentation^[5].

BigDL-LLM also provides a large number of acceleration examples for commonly used open source LLMs (for example, examples using the Transformers API^[6] and examples using the LangChain API^[7]) as well as tutorials (including accompanying Jupyter notebooks)^[8] to help developers get started quickly.

Installation and use: simple installation process and easy-to-use API interface

Installing BigDL-LLM is straightforward; just run the following command:

pip install --pre --upgrade bigdl-llm[all]

Using BigDL-LLM to accelerate large models is also very easy (only the Transformers-style API is shown here as an example).

With the BigDL-LLM Transformers-style API, only the model loading part needs to change; everything afterwards is used exactly as with native Transformers.

Loading a model with the BigDL-LLM API is almost the same as with the Transformers API: the user only needs to change the import and set load_in_4bit=True in the from_pretrained parameters.

BigDL-LLM performs 4-bit low-precision quantization while loading the model and applies various software and hardware acceleration techniques during subsequent inference.

# Load Hugging Face Transformers model with INT4 optimizations
from bigdl.llm.transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_4bit=True)
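A rough back-of-the-envelope calculation shows why 4-bit loading matters for a model of the size discussed in this article. The sketch below assumes a 16-bit baseline and counts only the weights; it ignores quantization scales, activations, and the KV cache, so real usage will be somewhat higher. The helper name is hypothetical.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


params_33b = 33e9  # a 33-billion-parameter model
fp16_gb = weight_memory_gb(params_33b, 16)  # assumed 16-bit baseline
int4_gb = weight_memory_gb(params_33b, 4)   # 4-bit weights

print(f"FP16: {fp16_gb:.1f} GB, INT4: {int4_gb:.1f} GB")
# FP16: 66.0 GB, INT4: 16.5 GB
```

Under these assumptions the weights shrink to a quarter of their 16-bit size, which also reduces the memory bandwidth needed per generated token.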

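Example: quickly building a voice assistant application based on a large language model

The following uses a common LLM application scenario, the "voice assistant", to show how BigDL-LLM can be used to build an LLM application quickly. In general, the workflow of a voice assistant application consists of the following two parts: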

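△ Figure 1. Voice assistant workflow

1. Speech recognition: use a speech recognition model (this example uses the Whisper model^[9]) to convert the user's speech into text;
2. Text generation: use the text output by step 1 as the prompt and have a large language model (this example uses Llama2^[10]) generate a reply.

The following is the process this article uses to build a voice assistant application with BigDL-LLM and LangChain^[11]:

In the speech recognition stage, step one is to load the preprocessor (processor) and the speech recognition model (recog_model). The recognition model used in this example, Whisper, is a Transformers model.

Simply using AutoModelForSpeechSeq2Seq from BigDL-LLM and setting load_in_4bit=True loads and accelerates this model at INT4 precision, which significantly shortens model inference time.

from transformers import WhisperProcessor
from bigdl.llm.transformers import AutoModelForSpeechSeq2Seq
processor = WhisperProcessor.from_pretrained(recog_model_path)
recog_model = AutoModelForSpeechSeq2Seq.from_pretrained(recog_model_path, load_in_4bit=True)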

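Step two is to run speech recognition. First use the processor to extract input features from the input audio, then use the recognition model to predict tokens, and finally use the processor again to decode the tokens into natural-language text.

# Extract input features from the raw audio frames
input_features = processor(frame_data, sampling_rate=audio.sample_rate, return_tensors="pt").input_features
# Predict token ids (forced_decoder_ids is assumed to be defined earlier in the full example)
predicted_ids = recog_model.generate(input_features, forced_decoder_ids=forced_decoder_ids)
# Decode the token ids back into text
text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]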

In the text generation stage, first use BigDL-LLM's TransformersLLM API to create a LangChain language model (TransformersLLM is the LangChain LLM integration defined in BigDL-LLM).

This API can be used to load any Hugging Face Transformers model.

from bigdl.llm.langchain.llms import TransformersLLM
llm = TransformersLLM.from_model_id(
    model_id=llm_model_path,
    model_kwargs={"temperature": 0, "max_length": args.max_length, "trust_remote_code": True},
)

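Then create an ordinary conversation chain, LLMChain, and pass in the llm created above as an input parameter.

from langchain.chains import LLMChain
from langchain.memory import ConversationBufferWindowMemory
# The following code is exactly the same as the standard LangChain use case
voiceassistant_chain = LLMChain(
    llm=llm,
    prompt=prompt,
    verbose=True,
    memory=ConversationBufferWindowMemory(k=2),
)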

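The chain keeps the conversation history (here the most recent two exchanges, since k=2) and formats it appropriately as input to the large language model so that a suitable reply can be generated. Just pass the text produced by the recognition model as "human_input". The code is as follows:

response_text = voiceassistant_chain.predict(human_input=text, stop="\n\n")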
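The windowed-history behavior can be sketched in plain Python. This is only an illustration of the idea behind a window memory with k=2, not LangChain's implementation; the WindowMemory class and the prompt template are hypothetical, since the article does not show the prompt it uses.

```python
from collections import deque


class WindowMemory:
    """Keep only the last k (human, assistant) exchanges."""

    def __init__(self, k):
        # A deque with maxlen silently drops the oldest entry when full.
        self.turns = deque(maxlen=k)

    def save(self, human_input, response):
        self.turns.append((human_input, response))

    def history(self):
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


# Hypothetical template for illustration only.
TEMPLATE = "{history}\nHuman: {human_input}\nAI:"

memory = WindowMemory(k=2)
memory.save("Hi", "Hello!")
memory.save("What is BigDL-LLM?", "A library for accelerating LLMs.")
memory.save("Is it open source?", "Yes, under Apache 2.0.")

# Only the two most recent exchanges end up in the prompt.
prompt_text = TEMPLATE.format(history=memory.history(), human_input="Thanks")
print(prompt_text)
```

Bounding the history this way keeps the prompt short, which matters when the model's max_length is limited.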

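Finally, put the speech recognition and text generation steps in a loop, and you can talk to this "voice assistant" over multiple rounds of conversation. See the link at ^[12] at the bottom for the complete example code and try it on your own computer. Use BigDL-LLM to quickly build your own voice assistant!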
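The overall loop can be sketched as follows. To keep the sketch runnable without any model, the two stages are passed in as plain callables; run_assistant and the stand-in functions are hypothetical and are not taken from the official example, where the stages would wrap the Whisper and LangChain calls shown above.

```python
def run_assistant(audio_chunks, transcribe, generate_reply):
    """Run one recognition + generation round per audio chunk."""
    replies = []
    for chunk in audio_chunks:
        text = transcribe(chunk)  # stage 1: speech -> text
        if not text.strip():
            continue  # skip silence or empty transcripts
        replies.append(generate_reply(text))  # stage 2: text -> reply
    return replies


# Stand-ins so the sketch runs without loading any model.
fake_transcripts = {b"a1": "hello", b"a2": "  ", b"a3": "what time is it"}
replies = run_assistant(
    [b"a1", b"a2", b"a3"],
    transcribe=lambda chunk: fake_transcripts[chunk],
    generate_reply=lambda text: f"You said: {text}",
)
print(replies)
```

Passing the stages in as functions keeps the loop independent of any particular model, so either stage can be swapped without touching the control flow.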

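About the authors

Shengsheng Huang (黄晟盛) is a senior architect at Intel, Kai Huang (黄凯) is an AI framework engineer at Intel, and Jinquan Dai (戴金权) is an Intel Fellow, Global CTO of Big Data Technologies, and founder of the BigDL project. All of them work on big data and AI.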

Statement

This article is reproduced from 51CTO.COM. If there is any infringement, please contact admin@php.cn to have it removed.
