# PyTorch 2.0 official version released! One line of code speeds up 2 times, 100% backwards compatible

The official version of PyTorch 2.0 is finally here!


Last December, the PyTorch Foundation released the first preview version of PyTorch 2.0 at the PyTorch Conference 2022.

Compared with the previous 1.x series, 2.0 brings sweeping changes. The biggest improvement in PyTorch 2.0 is torch.compile.

The new compiler generates code on the fly that runs much faster than the default "eager mode" of PyTorch 1.x, further improving PyTorch's performance.


Alongside 2.0, a series of beta updates to the PyTorch domain libraries has been released, covering the in-tree libraries as well as standalone libraries including TorchAudio, TorchVision and TorchText. An update to TorchX was also released at the same time, moving it to community-supported mode.


## Highlights

- torch.compile is the main API of PyTorch 2.0; it wraps a model and returns a compiled model. Because torch.compile is a completely additive (and optional) feature, version 2.0 is 100% backwards compatible.

- TorchInductor, the underlying technology of torch.compile, relies on the OpenAI Triton deep learning compiler for NVIDIA and AMD GPUs to generate high-performance code and hide low-level hardware details. Kernel implementations generated by OpenAI Triton achieve performance comparable to handwritten kernels and specialized CUDA libraries such as cuBLAS.

- Accelerated Transformers introduce high-performance support for training and inference, using a custom kernel architecture to implement scaled dot product attention (SDPA). The API is integrated with torch.compile(), and model developers can also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operator.

- The Metal Performance Shaders (MPS) backend provides GPU-accelerated PyTorch training on the Mac platform, adding support for the 60 most commonly used operations and covering more than 300 operators.

- Amazon AWS optimized PyTorch CPU inference on C7g instances based on AWS Graviton3. PyTorch 2.0 improves inference performance on Graviton compared to previous versions, including improvements for ResNet-50 and BERT.

- New prototype features and technologies span TensorParallel, DTensor, 2D parallelism, TorchDynamo, AOTAutograd, PrimTorch and TorchInductor.


## Compile, still compile!

The latest compiler technologies in PyTorch 2.0 include TorchDynamo, AOTAutograd, PrimTorch and TorchInductor. All of these are developed in Python rather than C++ (with which Python remains compatible).

They also support dynamic shapes, meaning tensors of different sizes can be passed in without triggering recompilation, making them flexible and easy to learn.
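A minimal sketch of the dynamic-shapes behavior (the element-wise function here is hypothetical, and the debugging "eager" backend is used so the sketch runs without a C++ toolchain; real use would keep the default TorchInductor backend):

```python
import torch

def pointwise(x):
    # hypothetical function standing in for a real model's forward pass
    return torch.sin(x) + x

# dynamic=True asks the compiler for a shape-polymorphic graph, so inputs
# of different sizes reuse the compiled artifact instead of recompiling
compiled = torch.compile(pointwise, dynamic=True, backend="eager")

print(compiled(torch.randn(4)).shape)   # torch.Size([4])
print(compiled(torch.randn(16)).shape)  # torch.Size([16])
```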

- TorchDynamo: safely captures PyTorch programs with the help of Python Frame Evaluation Hooks. This major innovation summarizes PyTorch's research and development results in safe graph capture.

- AOTAutograd: overloads PyTorch's autograd engine as a tracing autodiff for generating ahead-of-time backward traces.

- PrimTorch: canonicalizes the roughly 2,000 PyTorch operators down to a closed set of about 250 primitive operators, against which developers can build a complete PyTorch backend. PrimTorch greatly simplifies the process of writing PyTorch features or backends.

- TorchInductor: a deep learning compiler that generates fast code for multiple accelerators and backends. For NVIDIA GPUs, it uses OpenAI Triton as a key building block.

The PyTorch Foundation said that the launch of 2.0 will promote "the return from C++ to Python", adding that this is a substantial new direction for PyTorch.

"We knew the performance limits of eager execution from day one. In July 2017, we started our first research project: developing a compiler for PyTorch. The compiler needed to make PyTorch programs run fast, but not at the expense of the PyTorch experience, retaining the flexibility and ease of use that let researchers work with dynamic models and programs at different stages of exploration."

Of course, the non-compiled "eager mode", which uses a dynamic just-in-time code generator, is still available in 2.0. Developers can upgrade to compiled mode with the torch.compile command, adding just one line of code.
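That one-line upgrade can be sketched as follows (the toy model is hypothetical; the "eager" backend is chosen here only so the sketch runs without a compiler toolchain, whereas the default backend is TorchInductor):

```python
import torch
import torch.nn as nn

# hypothetical model standing in for your own network
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

# the single added line: wrap the model to switch to compiled mode
model = torch.compile(model, backend="eager")

out = model(torch.randn(2, 16))
print(out.shape)  # torch.Size([2, 4])
```

Everything else, including the training loop, stays unchanged, which is why 2.0 remains 100% backwards compatible.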

In these tests, users can see that 2.0 trains models about 43% faster on average than 1.0.

This data comes from the PyTorch Foundation's benchmarks of 163 open-source models run with PyTorch 2.0 on NVIDIA A100 GPUs, covering image classification, object detection, image generation and other tasks, as well as various NLP tasks.

These benchmarks fall into three categories: HuggingFace Transformers, TIMM and TorchBench.


Speedup of torch.compile over eager mode on an NVIDIA A100 GPU for different models

According to the PyTorch Foundation, the new compiler runs 21% faster when using Float32 precision mode and 51% faster when using automatic mixed precision (AMP) mode.

Among these 163 models, torch.compile can run normally on 93% of the models.

"In the PyTorch 2.x roadmap, we hope to push the compiled mode further and further in terms of performance and scalability. Some of this work has not yet started, and some could not be completed due to insufficient bandwidth."


## Training LLMs up to 2x faster

Performance is another major focus of PyTorch 2.0, and one the developers have promoted heavily.

In fact, one of the highlights among the new features is Accelerated Transformers, previously known as Better Transformer.

In addition, the official release of PyTorch 2.0 includes a new high-performance implementation of the PyTorch Transformer API.

One of the goals of the PyTorch project is to make training and deployment of state-of-the-art transformer models easier and faster.

Transformers are the basic technology that helps realize the modern era of generative artificial intelligence, including OpenAI models such as GPT-3 and GPT-4.


Accelerated Transformers in PyTorch 2.0 use a custom kernel architecture approach (known as scaled dot product attention, SDPA) to provide high-performance support for training and inference.

Since many types of hardware can support Transformers, PyTorch 2.0 supports multiple SDPA custom kernels. Going a step further, PyTorch integrates custom kernel-selection logic that picks the highest-performing kernel for a given model and hardware type.

The impact of the acceleration is significant: it helps developers train models faster than with previous iterations of PyTorch.

The new version enables high-performance support for training and inference, using a custom kernel architecture for scaled dot product attention (SDPA) and extending the inference fastpath architecture.

Similar to the fastpath architecture, the custom kernels are fully integrated into the PyTorch Transformer API; therefore, using the native Transformer and MultiheadAttention APIs will enable users to:

- See significant speed improvements;

- Support many more use cases, including models using cross-attention, Transformer decoders, and training models;

- Continue to use fastpath inference for fixed- and variable-sequence-length Transformer encoder and self-attention use cases.
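A sketch of the native API in action (layer sizes are illustrative); in eval mode under no_grad, an encoder layer like this is eligible for the fastpath without any change to user code:

```python
import torch
import torch.nn as nn

# nn.TransformerEncoderLayer / nn.MultiheadAttention route through the
# accelerated attention kernels automatically
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
layer.eval()

with torch.no_grad():
    out = layer(torch.randn(2, 10, 64))  # (batch, seq_len, d_model)
print(out.shape)  # torch.Size([2, 10, 64])
```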

To take full advantage of different hardware models and Transformer use cases, multiple SDPA custom kernels are supported, and the kernel-selection logic picks the highest-performance kernel for a given model and hardware type.

In addition to the existing Transformer API, developers can also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operator to speed up PyTorch. Accelerated PyTorch 2 Transformers are integrated with torch.compile().
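A minimal sketch of calling the operator directly (tensor shapes here are illustrative: batch, heads, sequence length, head dimension):

```python
import torch
import torch.nn.functional as F

# query, key and value tensors: (batch, num_heads, seq_len, head_dim)
q = torch.randn(2, 8, 128, 64)
k = torch.randn(2, 8, 128, 64)
v = torch.randn(2, 8, 128, 64)

# dispatches to the fastest kernel available (e.g. Flash, memory-efficient,
# or the math fallback) for the current hardware and dtypes
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 8, 128, 64])
```

Passing is_causal=True to the same call applies a causal mask, the common configuration for decoder-side attention.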

To get the additional speedup of PT2 compilation (for inference or training) while using a model, preprocess it with model = torch.compile(model).

Currently, the combination of custom kernels and torch.compile() has been used to achieve substantial speedups when training Transformer models, especially large language models, with Accelerated PyTorch 2 Transformers.


Using custom kernels and torch.compile provides significant speedups for large language model training

Sylvain Gugger, the main maintainer of HuggingFace Transformers, wrote in a statement released by the PyTorch project: "With just one line of code, PyTorch 2.0 provides a 1.5x to 2.0x speedup when training Transformer models. This is the most exciting thing since the introduction of mixed precision training!"

PyTorch and Google's TensorFlow are the two most popular deep learning frameworks. Thousands of institutions around the world use PyTorch to develop deep learning applications, and its usage is growing.

Luca Antiga, Chief Technology Officer of Lightning AI and one of the main maintainers of PyTorch Lightning, said the launch of PyTorch 2.0 will help accelerate the development of deep learning and artificial intelligence applications:

"PyTorch 2.0 embodies the future of deep learning frameworks. The ability to capture PyTorch programs with no user intervention, out of the box, together with program generation and huge device acceleration, opens up a whole new dimension for AI developers."




Statement: This article is reproduced from 51CTO.COM. If there is any infringement, please contact admin@php.cn to have it deleted.