The authors of this paper are all from Huawei Noah's Ark Lab. The first author is Li Wenshuo, and the corresponding authors are Wang Yunhe and Chen Xinghao. In recent years the team has published a number of representative works at top conferences such as ICML, CVPR, NeurIPS, ICCV, and ECCV, has produced rich results in areas such as efficient large language models and vision models, and collaborates extensively with well-known universities and research institutions.

As the undisputed center of attention in today's AI industry and academia, large models have drawn a huge investment of resources into research and training from scholars and companies alike. As their scale grows, systems and engineering issues have become unavoidable problems in large-model training. For example, during the 54-day training run of Llama 3.1, the system crashed 466 times, averaging once every 2.78 hours!
Frequent checkpointing is therefore essential. But saving checkpoints is itself a major undertaking.
Meta has put a great deal of effort into speeding up checkpoint writes so that checkpoints can be saved more frequently to combat these failures. But frequent saving also means heavy storage overhead: its training cluster is equipped with 240 PB of SSDs to meet this challenge, at a cost of 100 million yuan for storage alone! Huawei Noah's ExCP method was created to tackle this huge overhead: an extreme checkpoint compression technique that compresses checkpoints by a factor of about 70 with essentially no loss, drastically reducing the storage overhead during training.
The code is open source and released under the Apache 2.0 license; several users have already reproduced the results, as reported in the repository's issues.
- Paper: https://arxiv.org/abs/2406.11257
- Repository: https://github.com/Gaffey/ExCP
The method itself is also quite innovative. The article introduces two key ideas: first, it exploits the residual information between checkpoints during training, whose sparsity along the time axis enables a much higher pruning ratio; second, it compresses the optimizer momentum jointly with the weights to achieve a high overall compression rate.
1. Checkpoint residuals during training

During training, the current parameters can be viewed as the weights stored at the previous checkpoint plus the sum of the gradient updates over the intervening iterations. This residual is relatively sparse and carries little information, so compressing it yields a better compression ratio. In contrast, the momentum stored in the optimizer is an exponential moving average of the first and second moments of the gradient; for the first moment, the default decay coefficient is 0.9, so after the hundreds to thousands of iterations between checkpoints the momentum retains little correlation with what was stored in the last checkpoint. The optimizer momentum is therefore compressed directly, rather than as a residual. The final checkpoint to be compressed can thus be expressed as

$$\mathcal{C}_t = \{\Delta W_t,\ M_t\}, \qquad \Delta W_t = W_t - W_{t-1},$$

where $W_t$ denotes the current weights and $M_t$ the optimizer momentum.
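As a rough illustration of this step, here is a minimal sketch (not the official ExCP implementation) of assembling the payload to compress. It assumes a hypothetical checkpoint layout: a dict with a 'weights' map (name → tensor) and an Adam-style 'optimizer' map (name → {'exp_avg', 'exp_avg_sq'}):

```python
import torch

def checkpoint_residual(curr_ckpt: dict, prev_ckpt: dict) -> dict:
    """Assemble the payload to compress: weight residuals plus raw momentum.

    Hypothetical checkpoint layout: {'weights': {name: tensor},
    'optimizer': {name: {'exp_avg': ..., 'exp_avg_sq': ...}}}.
    """
    payload = {"weight_residual": {}, "optimizer": curr_ckpt["optimizer"]}
    for name, w_t in curr_ckpt["weights"].items():
        # The residual W_t - W_{t-1} is sparse and low-information,
        # so it compresses far better than the raw weights.
        payload["weight_residual"][name] = w_t - prev_ckpt["weights"][name]
    return payload
```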
2. Weight / optimizer-momentum joint compression

Existing work on model compression generally focuses only on a model's inference performance, or on the size of its final stored checkpoint, and pays no attention to the storage overhead across the entire training process. Such work therefore compresses only the weights, overlooking the fact that common optimizers such as Adam actually store momentum amounting to twice the number of weights. This work compresses the two together, which on the one hand significantly improves the overall compression ratio, and on the other exploits the correlation between weights and optimizer momentum so that each improves the other's compression ratio.

Weight pruning: since what is pruned is the weight residual, and the second moment of the optimizer momentum roughly reflects the magnitude of the weight residual's change over the recent past, the second moment can serve as the indicator that determines the pruning ratio of each layer. The pruning strategy is given by

$$\Delta \tilde{W}_t = \Delta W_t \odot \mathbb{1}\!\left(|\Delta W_t| > \alpha \sqrt{v_t}\right),$$

where $\Delta W_t$ and $v_t$ denote the weight residual and the second moment respectively, and $\alpha$ controls the pruning ratio.
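A sketch of this pruning rule; `alpha` is a hypothetical hyperparameter (not a value from the paper), and the paper's actual per-layer ratio selection may differ:

```python
import torch

def prune_weight_residual(delta_w: torch.Tensor, v: torch.Tensor,
                          alpha: float = 5e-5):
    """Prune the weight residual using the second moment v as the indicator.

    `alpha` is a hypothetical hyperparameter controlling how aggressive
    the pruning is; returns the pruned residual and the boolean keep-mask.
    """
    mask = delta_w.abs() > alpha * v.sqrt()
    return delta_w * mask, mask
```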
Optimizer momentum pruning: for the momentum, the first moment is used as the pruning indicator (the paper gives a brief proof of convergence). At the same time, if the weight at some position has already been pruned, the optimizer momentum at the corresponding position should be pruned as well, so the pruning strategy is

$$\tilde{m}_t = m_t \odot \mathbb{1}\!\left(|m_t| > \beta\right) \odot \mathbb{1}\!\left(\Delta \tilde{W}_t \neq 0\right),$$

where $m_t$ denotes the first moment and $\beta$ is a threshold chosen for the target pruning ratio; the second moment at the pruned positions is dropped in the same way.
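A corresponding sketch of the joint momentum pruning; `beta` is again a hypothetical threshold:

```python
import torch

def prune_momentum(m: torch.Tensor, v: torch.Tensor,
                   weight_mask: torch.Tensor, beta: float = 1e-5):
    """Joint momentum pruning: keep a position only if its first moment is
    significant AND the weight residual there survived pruning.
    `beta` is a hypothetical threshold, not a value from the paper."""
    mask = (m.abs() > beta) & weight_mask
    return m * mask, v * mask
```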
3. Overall compression pipeline

The overall compression pipeline is shown in Algorithm 1: the steps of computing the weight residual, joint pruning, non-uniform quantization, and coding compression are performed in sequence to obtain the final compressed result.
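Putting the pieces together, here is a minimal end-to-end sketch of such a pipeline, reusing the helper functions sketched above. The quantile-based codebook and zlib are stand-ins for the paper's actual non-uniform quantization and coding schemes:

```python
import pickle
import zlib
import torch

def nonuniform_quantize(vals: torch.Tensor, n_bins: int = 16):
    """Toy non-uniform quantizer: a quantile-based codebook over the
    surviving values. Stands in for the paper's actual scheme."""
    codebook = torch.quantile(vals, torch.linspace(0, 1, n_bins))
    idx = torch.bucketize(vals, codebook).clamp(max=n_bins - 1)
    return codebook, idx

def excp_style_compress(curr_ckpt: dict, prev_ckpt: dict, path: str):
    """Pipeline sketch (cf. Algorithm 1): weight residual -> joint pruning
    -> non-uniform quantization -> coding compression (zlib stand-in)."""
    payload = checkpoint_residual(curr_ckpt, prev_ckpt)
    compressed = {}
    for name, delta_w in payload["weight_residual"].items():
        state = payload["optimizer"][name]
        delta_w, w_mask = prune_weight_residual(delta_w, state["exp_avg_sq"])
        m, v = prune_momentum(state["exp_avg"], state["exp_avg_sq"], w_mask)
        # Quantize only the surviving residual entries into a small codebook.
        codebook, idx = nonuniform_quantize(delta_w[w_mask])
        # Momenta are kept as (mostly zero) tensors here for brevity;
        # the real pipeline quantizes them as well.
        compressed[name] = {"codebook": codebook, "indices": idx,
                            "mask": w_mask, "m": m, "v": v}
    with open(path, "wb") as f:
        f.write(zlib.compress(pickle.dumps(compressed), level=9))
```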
Recovering a complete checkpoint file follows Algorithm 2: after decompression, the floating-point values are first recovered from the codebook and indices stored by the non-uniform quantization, and are then added to the base weights (the original weights of the previous checkpoint, or the previously reconstructed weights) to obtain the complete checkpoint file. Recovering the checkpoint files of an entire training run follows Algorithm 3: after training finishes, only the random seed used to initialize the weights and the compressed result stored at each checkpoint are kept; the checkpoints are then restored in sequence to obtain the complete checkpoint sequence, from which one or more checkpoints can be selected to resume training, run evaluation, and so on. The article evaluates not only large language models: the method also achieves good results on vision models such as ViT-L/32.
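A matching recovery sketch, under the same assumptions as the compression sketch above:

```python
import pickle
import zlib
import torch

def excp_style_recover(path: str, base_weights: dict) -> dict:
    """Recovery sketch (cf. Algorithm 2): decompress, rebuild floats from
    the (codebook, indices) pairs, then add the residual onto the base
    weights (the previous checkpoint's or previously reconstructed ones)."""
    with open(path, "rb") as f:
        compressed = pickle.loads(zlib.decompress(f.read()))
    weights, optimizer = {}, {}
    for name, rec in compressed.items():
        delta_w = torch.zeros_like(base_weights[name])
        # Scatter dequantized values back to the surviving positions.
        delta_w[rec["mask"]] = rec["codebook"][rec["indices"]]
        weights[name] = base_weights[name] + delta_w
        optimizer[name] = {"exp_avg": rec["m"], "exp_avg_sq": rec["v"]}
    return {"weights": weights, "optimizer": optimizer}
```

To recover an entire run in the spirit of Algorithm 3, one would re-initialize the weights from the saved random seed and apply this recovery step to each compressed checkpoint in order, feeding each reconstructed checkpoint back in as the next base.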
The ablation study also shows that pruning the residual, rather than the raw weights, greatly reduces the loss caused by pruning.
The article also shows question-answering examples from a large language model before and after compression; as can be seen, the compression does no visible damage to the model's question-answering ability.