
Four times faster, Bytedance’s open source high-performance training inference engine LightSeq technology revealed

By 王林 (forwarded) · 2023-05-02 17:52:07

The Transformer model comes from the paper "Attention Is All You Need," published by a Google team in 2017. The paper was the first to propose replacing the recurrent structure of Seq2Seq models with attention, which had a major impact on the NLP field. As research has advanced in recent years, Transformer-related techniques have gradually spread from natural language processing to other fields. Today, Transformer-family models are mainstream in NLP, CV, ASR, and other areas.

How to train and run inference on Transformer models faster has therefore become an important research direction in industry. Low-precision quantization accelerates computation and communication by reducing the bit width of the data, and is a key means of speeding up model training and inference today. The drawback is that quantization causes a loss of accuracy, which must be mitigated through techniques such as quantization-aware training. To address these pain points, ByteDance has upgraded its LightSeq training and inference acceleration engine to version 3.0, which for the first time achieves both quantized training and quantized inference of Transformer models with nearly no loss of accuracy.
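The accuracy loss mentioned above comes from rounding float values onto an 8-bit grid. A minimal NumPy sketch (not LightSeq's actual code) of symmetric per-tensor int8 quantization shows where the error enters: the round-trip error of any in-range value is bounded by half a quantization step.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats onto [-127, 127]."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map int8 codes back to floats; the residual is the quantization error."""
    return q.astype(np.float32) * scale

x = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize(q, scale)
err = np.abs(x - x_hat).max()  # bounded by scale / 2 for in-range values
```

Quantization-aware training exposes the model to exactly this rounding during training so the weights can adapt to it, which is how the accuracy loss is recovered.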

LightSeq implements a true quantized training process via int8 GEMM, rather than the pseudo-quantization ("fake quantization") approach widely used in the industry, increasing model training speed by more than 4x. Quantization strategies such as PACT minimize the accuracy loss from quantized training. After exporting the quantized model to a format supported by LightSeq, the LightSeq quantized inference engine can be used for fast inference, with speedups of up to 70% on T4 GPUs.
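The distinction between true and pseudo-quantization can be illustrated with a small sketch (assumed for illustration, not LightSeq's implementation): a true int8 GEMM multiplies the int8 codes directly with int32 accumulation, which is what int8 tensor-core hardware accelerates, and only rescales the result afterward. Pseudo-quantization dequantizes first and multiplies in fp32, simulating the rounding error but gaining no speed.

```python
import numpy as np

def int8_gemm(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """True quantized GEMM: quantize both operands to int8, multiply with
    int32 accumulation, then rescale the product back to float once."""
    sa = np.abs(a).max() / 127.0
    sb = np.abs(b).max() / 127.0
    qa = np.clip(np.round(a / sa), -127, 127).astype(np.int8)
    qb = np.clip(np.round(b / sb), -127, 127).astype(np.int8)
    acc = qa.astype(np.int32) @ qb.astype(np.int32)  # the int8 GEMM step
    return acc.astype(np.float32) * (sa * sb)

def pact_clip(x: np.ndarray, alpha: float) -> np.ndarray:
    """PACT-style activation clipping: bound activations to [0, alpha]
    (alpha is a learned parameter in the real method), so the quantization
    scale alpha/127 covers the useful range tightly instead of being
    stretched by outliers."""
    return np.clip(x, 0.0, alpha)

rng = np.random.default_rng(0)
a = rng.standard_normal((8, 8)).astype(np.float32)
b = rng.standard_normal((8, 8)).astype(np.float32)
approx = int8_gemm(a, b)
exact = a @ b  # the fp32 reference the quantized GEMM approximates
```

The approximation error stays small because each operand's rounding error is at most half a quantization step; PACT keeps those steps small by learning how tightly activations can be clipped.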

For the [T·TALK] technology sharing event on July 21, we have invited Xiong Ying, a ByteDance algorithm engineer and core developer of LightSeq, to join the live broadcast and walk the audience through the technical principles and practical details of ByteDance's high-performance training and inference engine LightSeq. Whether you are a practitioner in the algorithms industry or a developer keen on studying AI technology, we believe this session will offer unique technical experience and inspiration.

Everyone is welcome to join the 12th [T·TALK] technology sharing event on July 21 at 20:00.

Scan the QR code at the bottom of the poster to reserve a viewing.




Statement:
This article is reproduced from 51cto.com. If there is any infringement, please contact admin@php.cn for removal.