
Easy and Efficient Transformer (NetEase's ultra-large model online inference engine)

王林 (forwarded) · 2024-01-24 10:45:05


NetEase's open-source inference acceleration framework for Transformer-based models. It supports high-performance single-GPU inference of models with tens of billions of parameters on mid- to low-end Ampere-architecture GPUs.

Project Background

Transformer-based large-scale models have proven effective across a variety of tasks in many fields. Applying them in industrial production, however, requires considerable effort to reduce inference cost. To fill this gap, we propose a scalable inference solution: Easy and Efficient Transformer (EET). EET is a system that bundles a series of Transformer inference optimizations at both the algorithm and implementation levels. By optimizing the Transformer's computation and data flow, EET significantly reduces inference cost and improves model efficiency and throughput. Our experimental results show that EET substantially improves inference speed and resource utilization without any loss of model accuracy, providing a simple and effective solution for deploying large-scale models in industrial production.
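Engines like EET are typically consumed as a drop-in replacement for a stock PyTorch model. The sketch below only illustrates that workflow; the `EETGPT2Model` name, import path, and `from_pretrained` arguments are assumptions for illustration and may not match EET's documented interface (see the GitHub repository below for the real API).

```python
import torch

# Hypothetical drop-in usage; class name and arguments are assumptions,
# not EET's documented API -- consult the EET repository for the real one.
from eet import EETGPT2Model  # assumed import path

model = EETGPT2Model.from_pretrained(
    "gpt2-large",             # weights exported from a standard checkpoint
    max_batch=8,              # such engines pre-allocate for a maximum batch
    data_type=torch.float16,  # half precision for inference
)

input_ids = torch.tensor([[1, 2, 3, 4]], device="cuda")
logits = model(input_ids)  # same forward contract as the original model
```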

First, we design highly optimized CUDA kernels for long inputs and large hidden sizes.
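EET's kernels are hand-written CUDA, but the operation-fusion idea behind them can be sketched at the Python level. Below, TorchScript is asked to fuse a bias add and a tanh-approximated GELU into fewer kernel launches; EET performs this kind of fusion (far more aggressively) directly in CUDA. This is an illustrative analogue, not code from EET, and the tensor shapes are only examples of the long-input, large-hidden-size regime.

```python
import torch

@torch.jit.script
def fused_bias_gelu(x: torch.Tensor, bias: torch.Tensor) -> torch.Tensor:
    # One fused region instead of several separate elementwise kernels;
    # 0.7978845608 approximates sqrt(2/pi) in the tanh GELU formulation.
    y = x + bias
    return 0.5 * y * (1.0 + torch.tanh(0.7978845608 * (y + 0.044715 * y * y * y)))

# Long sequence, large hidden size: the regime the paper's kernels target.
x = torch.randn(4, 2048, 5120, device="cuda", dtype=torch.float16)
b = torch.randn(5120, device="cuda", dtype=torch.float16)
out = fused_bias_gelu(x, b)
```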

In addition, we propose a flexible CUDA memory manager that reduces the memory footprint when deploying large models. Compared with the state-of-the-art Transformer inference library Faster Transformer v4.0, EET achieves an average 1.40x-4.20x speedup on the decoding layers on an A100 GPU.
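The section above does not spell out the manager's internals, but the core idea of such managers is to replace per-request cudaMalloc/cudaFree with one up-front reservation that is sliced and recycled. A minimal sketch of that idea, assuming a fixed upper bound on batch size and sequence length known at deployment time (the class and sizes are illustrative, not EET's actual implementation):

```python
import torch

class ActivationPool:
    """Toy sketch of a pre-allocated GPU buffer pool (illustrative, not
    EET's actual manager): one allocation at model-load time, then slices
    are handed out and recycled, so serving never touches cudaMalloc."""

    ALIGN = 256  # keep every slice aligned for dtype reinterpretation

    def __init__(self, total_bytes: int, device: str = "cuda"):
        self.buffer = torch.empty(total_bytes, dtype=torch.uint8, device=device)
        self.offset = 0

    def take(self, shape, dtype=torch.float16) -> torch.Tensor:
        numel = 1
        for d in shape:
            numel *= d
        nbytes = numel * torch.empty(0, dtype=dtype).element_size()
        padded = (nbytes + self.ALIGN - 1) // self.ALIGN * self.ALIGN
        assert self.offset + padded <= self.buffer.numel(), "pool exhausted"
        # Reinterpret a byte slice as the requested dtype and shape.
        out = self.buffer[self.offset:self.offset + nbytes].view(dtype).view(shape)
        self.offset += padded
        return out

    def reset(self):
        self.offset = 0  # next request reuses the same memory

pool = ActivationPool(1 << 28)       # reserve 256 MiB up front
hidden = pool.take((8, 1024, 4096))  # e.g. one layer's activations
pool.reset()                         # recycle between requests
```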

Paper address

https://arxiv.org/abs/2104.12470

Github address

https://github.com/NetEase-FuXi/EET

