Home  >  Article  >  Technology peripherals  >  One line of code, making elixirs twice as fast! PyTorch 2.0 comes out in surprise, LeCun enthusiastically forwards it

One line of code, making elixirs twice as fast! PyTorch 2.0 comes out in surprise, LeCun enthusiastically forwards it

王林
王林forward
2023-04-13 08:49:021406browse

On December 2, PyTorch 2.0 was officially released!

This update not only pushes the performance of PyTorch to new heights, but also adds support for dynamic shapes and distribution.

In addition, the 2.0 series will also move some PyTorch code from C back to Python.

One line of code, making elixirs twice as fast! PyTorch 2.0 comes out in surprise, LeCun enthusiastically forwards it

Currently, PyTorch 2.0 is still in the testing phase, and the first stable version is expected to be available in early March 2023.

One line of code, making elixirs twice as fast! PyTorch 2.0 comes out in surprise, LeCun enthusiastically forwards it

PyTorch 2.x: Faster, more Pythonic!

Over the past few years, PyTorch has innovated and iterated from 1.0 to the recent 1.13, and moved to the newly formed PyTorch Foundation to become part of the Linux Foundation.

The challenge with the current version of PyTorch is that eager-mode has difficulty keeping up with ever-increasing GPU bandwidth and crazier model architectures.

The birth of PyTorch 2.0 will fundamentally change and improve the way PyTorch runs at the compiler level.

One line of code, making elixirs twice as fast! PyTorch 2.0 comes out in surprise, LeCun enthusiastically forwards it

As we all know, (Py) in PyTorch comes from the open source Python programming language widely used in data science.

However, PyTorch’s code does not entirely use Python, but gives part of it to C.

However, in the future 2.x series, the PyTorch project team plans to move the code related to torch.nn back to Python.

Beyond that, since PyTorch 2.0 is a completely add-on (and optional) feature, 2.0 is 100% backwards compatible.

In other words, the code base is the same, the API is the same, and the way to write models is the same.

More technical support

  • TorchDynamo

Capture safely using Python framework evaluation hooks The PyTorch program is a major innovation developed by the team in graph capture over the past five years.

  • AOTAutograd

Overloads PyTorch's autograd engine as a tracing autodiff for generating lookahead reverse traces.

  • PrimTorch

Reduced ~2000+ PyTorch operators into a closed set of ~250 primitive operators that developers can target operator to build a complete PyTorch backend. The barrier to writing PyTorch functionality or backends is greatly lowered.

  • TorchInductor

A deep learning compiler that can generate fast code for multiple accelerators and backends. For Nvidia's GPUs, it uses OpenAI Triton as a key building block.

It is worth noting that TorchDynamo, AOTAutograd, PrimTorch and TorchInductor are all written in Python and support dynamic shapes.

Faster training speed

By introducing the new compilation mode "torch.compile", PyTorch 2.0 can be accelerated with just one line of code Model training.

No tricks required here, just run torch.compile() and that’s it:

opt_module = torch.compile(module)

In order to verify these technologies, the team carefully created test benchmarks, including image classification, object detection, image generation and other tasks, as well as various NLP tasks such as language modeling, question answering, sequence classification, recommendation systems and Reinforcement learning. Among them, these benchmarks can be divided into three categories:

  • 46 models from HuggingFace Transformers
  • 61 models from TIMM: The most advanced PyTorch collection by Ross Wightman Image model
  • 56 models from TorchBench: a set of popular code bases from github

Test results show that in these 163 models spanning vision, NLP and other fields On the open source model, the training speed has been improved by 38%-76%.

One line of code, making elixirs twice as fast! PyTorch 2.0 comes out in surprise, LeCun enthusiastically forwards it

Comparison on NVIDIA A100 GPU

In addition, the team also Benchmarks were conducted on some popular open source PyTorch models and substantial speedups from 30% to 2x were obtained.

Developer Sylvain Gugger said: "With just one line of code, PyTorch 2.0 can achieve a 1.5x to 2.0x speedup when training Transformers models. This is self-mixing precision training The most exciting thing since its advent!"

Technical Overview

PyTorch's compiler can be broken down into three parts:

  • Acquisition of graphs
  • Reduction of graphs
  • Compilation of graphs

Among them, when building the PyTorch compiler, the acquisition of graphs is more Difficult challenge.

One line of code, making elixirs twice as fast! PyTorch 2.0 comes out in surprise, LeCun enthusiastically forwards it

TorchDynamo

At the beginning of this year, the team started working on TorchDynamo. This method uses A CPython feature introduced in PEP-0523, called the Framework Evaluation API.

To this end, the team took a data-driven approach to verify the effectiveness of TorchDynamo on graph capture - by using more than 7,000 Github projects written in PyTorch as validation set.

The results show that TorchDynamo can perform graph capture correctly and safely 99% of the time, with negligible overhead.

TorchInductor

For PyTorch 2.0’s new compiler backend, the team took inspiration from how users write high-performance custom kernels : Increasing use of the Triton language.

TorchInductor uses Pythonic-defined loop-by-loop level IR to automatically map PyTorch models to generated Triton code on the GPU and C/OpenMP on the CPU.

TorchInductor's core loop-level IR only contains about 50 operators, and it is implemented in Python, making it easy to extend.

AOTAutograd

To speed up training, you need to capture not only user-level code, but also backpropagation.

AOTAutograd can use PyTorch's torch_dispatch extension mechanism to track the Autograd engine, capture backpropagation "in advance", and then use TorchInductor to accelerate the forward and backward channels.

PrimTorch

PyTorch has more than 1200 operators, and if you take into account the various overloads of each operator, there are more than 2000 indivual. Therefore, writing back-end or cross-domain functionality becomes an energy-consuming task.

In the PrimTorch project, the team defined two smaller and more stable operator sets:

  • Prim ops have about ~250 operators, suitable for compilers. Being low-level enough, it's just a matter of fusing them together to get good performance.
  • ATen ops has about ~750 typical operators suitable for output as-is. These are suitable for backends that are already integrated at the ATen level, or that are not compiled, thus restoring the performance of low-level operator sets like Prim ops.

One line of code, making elixirs twice as fast! PyTorch 2.0 comes out in surprise, LeCun enthusiastically forwards it

Dynamic Shapes

A key requirement when researching what is necessary to support the generality of PyTorch code is to support dynamic shapes and allow models to accept tensors of different sizes without causing recompilation every time the shape changes.

When dynamic shapes are not supported, a common workaround is to pad them to the nearest power of 2. However, as we can see from the chart below, it incurs a significant performance overhead and also results in significantly longer compilation times.

Now, with support for dynamic shapes, PyTorch 2.0 has achieved up to 40% higher performance than Eager.

One line of code, making elixirs twice as fast! PyTorch 2.0 comes out in surprise, LeCun enthusiastically forwards it

Finally, in the roadmap for PyTorch 2.x, the team hopes to further push the compilation model forward in terms of performance and scalability.

One line of code, making elixirs twice as fast! PyTorch 2.0 comes out in surprise, LeCun enthusiastically forwards it

The above is the detailed content of One line of code, making elixirs twice as fast! PyTorch 2.0 comes out in surprise, LeCun enthusiastically forwards it. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:51cto.com. If there is any infringement, please contact admin@php.cn delete