
On December 2, PyTorch 2.0 was officially announced!

This update not only pushes the performance of PyTorch to new heights, but also adds support for dynamic shapes and distributed training.

In addition, the 2.0 series will move parts of PyTorch's code from C++ back into Python.


Currently, PyTorch 2.0 is still in the testing phase, and the first stable version is expected to be available in early March 2023.


PyTorch 2.x: Faster, more Pythonic!

Over the past few years, PyTorch has innovated and iterated from 1.0 to the recent 1.13, and moved to the newly formed PyTorch Foundation to become part of the Linux Foundation.

The challenge with the current version of PyTorch is that eager-mode has difficulty keeping up with ever-increasing GPU bandwidth and crazier model architectures.

The birth of PyTorch 2.0 will fundamentally change and improve the way PyTorch runs at the compiler level.


As we all know, the "Py" in PyTorch comes from Python, the open-source programming language widely used in data science.

However, PyTorch's code is not written entirely in Python; a substantial part of it is implemented in C++.

In the upcoming 2.x series, the PyTorch team plans to move the code related to torch.nn back into Python.

Beyond that, since everything new in PyTorch 2.0 is purely additive (and optional), 2.0 is 100% backwards compatible.

In other words, the code base is the same, the API is the same, and the way to write models is the same.

New supporting technologies

  • TorchDynamo

Safely captures PyTorch programs using Python frame evaluation hooks; this is the result of the team's five years of R&D on safe graph capture.

  • AOTAutograd

Overloads PyTorch's autograd engine as a tracing autodiff for generating ahead-of-time backward traces.

  • PrimTorch

Reduces the 2,000+ PyTorch operators down to a closed set of roughly 250 primitive operators that developers can target when building a complete PyTorch backend. This greatly lowers the barrier to writing PyTorch features or backends.

  • TorchInductor

A deep learning compiler that can generate fast code for multiple accelerators and backends. For Nvidia's GPUs, it uses OpenAI Triton as a key building block.

It is worth noting that TorchDynamo, AOTAutograd, PrimTorch and TorchInductor are all written in Python and support dynamic shapes.

Faster training speed

By introducing the new compilation mode torch.compile, PyTorch 2.0 can accelerate model training with just one line of code.

No tricks required here, just run torch.compile() and that’s it:

opt_module = torch.compile(module)
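For context, here is a slightly fuller sketch of how that one line slots into an otherwise unchanged training loop. The toy model, optimizer and data below are hypothetical and purely illustrative:

import torch
import torch.nn as nn

# Hypothetical toy model, used only for illustration.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))

# The one new line: wrap the model with the new compilation mode.
compiled_model = torch.compile(model)

# Training proceeds exactly as before -- same API, same code.
optimizer = torch.optim.SGD(compiled_model.parameters(), lr=1e-3)
x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(compiled_model(x), y)
loss.backward()
optimizer.step()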

To verify these technologies, the team carefully built a set of benchmarks covering tasks such as image classification, object detection and image generation, various NLP tasks such as language modeling, question answering and sequence classification, as well as recommendation systems and reinforcement learning. These benchmarks can be divided into three categories:

  • 46 models from HuggingFace Transformers
  • 61 models from TIMM: Ross Wightman's collection of state-of-the-art PyTorch image models
  • 56 models from TorchBench: a curated set of popular codebases from GitHub

Test results show that across these 163 open-source models spanning vision, NLP and other domains, training speed improved by 38%-76%.


Comparison on NVIDIA A100 GPU

In addition, the team also ran benchmarks on some popular open-source PyTorch models and obtained substantial speedups ranging from 30% to 2x.

Developer Sylvain Gugger said: "With just one line of code, PyTorch 2.0 delivers a 1.5x to 2x speedup when training Transformer models. This is the most exciting thing since the advent of mixed-precision training!"

Technical Overview

PyTorch's compiler can be broken down into three parts:

  • Graph acquisition
  • Graph lowering
  • Graph compilation

Among these, graph acquisition was the harder challenge when building the PyTorch compiler.


TorchDynamo

At the beginning of this year, the team started working on TorchDynamo, an approach that uses a CPython feature introduced in PEP 523 called the Frame Evaluation API.

To this end, the team took a data-driven approach to validate TorchDynamo's effectiveness at graph capture, using more than 7,000 GitHub projects written in PyTorch as a validation set.

The results show that TorchDynamo can perform graph capture correctly and safely 99% of the time, with negligible overhead.
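One illustrative way to see what graph capture produces is to hand torch.compile a custom backend callable that simply prints the FX graph TorchDynamo captured and returns the original forward unchanged. This is a minimal sketch, not the team's benchmark harness:

import torch
import torch.fx

def inspect_backend(gm: torch.fx.GraphModule, example_inputs):
    # Show the operators TorchDynamo captured into the FX graph.
    print(gm.graph)
    # Return a callable; this toy backend does no real compilation.
    return gm.forward

def fn(x):
    return torch.relu(x) + 1.0

compiled_fn = torch.compile(fn, backend=inspect_backend)
compiled_fn(torch.randn(8))  # the first call triggers graph capture and prints the graph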

TorchInductor

For PyTorch 2.0's new compiler backend, the team took inspiration from how users write high-performance custom kernels: increasingly, by using the Triton language.

TorchInductor uses a Pythonic define-by-run loop-level IR to automatically map PyTorch models to generated Triton code on GPUs and C++/OpenMP on CPUs.

TorchInductor's core loop-level IR only contains about 50 operators, and it is implemented in Python, making it easy to extend.
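As a rough sketch (assuming the torch.compile API of the 2.0 preview), TorchInductor is the default backend, but it can also be requested by name, and the other registered backends can be listed:

import torch
import torch._dynamo

# TorchInductor is the default; 'inductor' should appear among the registered backends.
print(torch._dynamo.list_backends())

model = torch.nn.Linear(16, 16)
compiled = torch.compile(model, backend="inductor")
compiled(torch.randn(4, 16))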

AOTAutograd

To speed up training, you need to capture not only user-level code, but also backpropagation.

AOTAutograd uses PyTorch's torch_dispatch extension mechanism to trace through the autograd engine, capturing backpropagation "ahead of time", and then uses TorchInductor to accelerate both the forward and backward passes.
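The user-visible effect is that both directions of a compiled function go through the compiler without any extra code for the backward pass. A minimal sketch (toy function, illustrative only):

import torch

@torch.compile
def loss_fn(x, w):
    return (torch.sin(x @ w) ** 2).sum()

x = torch.randn(32, 16)
w = torch.randn(16, 8, requires_grad=True)

loss = loss_fn(x, w)   # forward pass: compiled
loss.backward()        # backward pass: captured ahead of time by AOTAutograd
print(w.grad.shape)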

PrimTorch

PyTorch has more than 1,200 operators, and more than 2,000 once you account for the various overloads of each one. Writing a backend or any cross-cutting functionality therefore becomes an exhausting task.

In the PrimTorch project, the team defined two smaller and more stable operator sets:

  • Prim ops: roughly 250 operators. They are low-level enough that compilers only need to fuse them together to get good performance.
  • ATen ops: roughly 750 canonical operators, suitable for exporting as-is. These fit backends that are already integrated at the ATen level, or backends that will not use compilation to recover performance from a lower-level operator set like Prim ops.


Dynamic Shapes

A key requirement for supporting the generality of PyTorch code is supporting dynamic shapes: allowing models to accept tensors of different sizes without triggering recompilation every time a shape changes.

When dynamic shapes are not supported, a common workaround is to pad inputs up to the nearest power of two. However, this incurs a significant performance overhead and also leads to significantly longer compilation times.

Now, with support for dynamic shapes, PyTorch 2.0 has achieved up to 40% higher performance than Eager.
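In the torch.compile API this is exposed through the dynamic flag; a minimal sketch (toy model, illustrative only) looks like this:

import torch

model = torch.nn.Linear(128, 64)
# dynamic=True requests shape-polymorphic compilation, so varying batch
# sizes reuse the same compiled artifact instead of triggering recompiles.
compiled = torch.compile(model, dynamic=True)

for batch_size in (8, 17, 33):
    out = compiled(torch.randn(batch_size, 128))
    print(batch_size, out.shape)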


Finally, in the roadmap for PyTorch 2.x, the team hopes to further push the compilation model forward in terms of performance and scalability.
