Google is ecstatic: JAX performance surpasses PyTorch and TensorFlow! It may become the fastest choice for GPU training and inference

JAX, promoted by Google, has surpassed PyTorch and TensorFlow in recent benchmark tests, ranking first on 7 indicators.


And the tests were not even run on TPUs, where JAX performs best.


Today, PyTorch is still more popular among developers than TensorFlow.


But in the future, perhaps more large models will be trained and run on the JAX platform.


Models

Recently, the Keras team benchmarked Keras 3 with its three backends (TensorFlow, JAX, PyTorch) against native PyTorch implementations and against Keras 2 with TensorFlow.

First, they selected a set of mainstream computer vision and natural language processing models for generative and non-generative AI tasks:

[Table: the selected models and tasks]

The Keras versions of the models were built from the existing implementations in KerasCV and KerasNLP. For the native PyTorch versions, they chose the most popular options on the internet:

- BERT, Gemma, Mistral from HuggingFace Transformers

- StableDiffusion from HuggingFace Diffusers

- SegmentAnything from Meta

They call this set of models "Native PyTorch" to distinguish them from the Keras 3 versions that use the PyTorch backend.

They used synthetic data for all benchmarks, bfloat16 precision for all LLM training and inference, and LoRA fine-tuning for all LLM training.
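As a rough illustration of that setup, here is a minimal sketch of enabling bfloat16 and LoRA fine-tuning with KerasNLP. The preset name, LoRA rank, and hyperparameters are illustrative assumptions, not the benchmark's exact configuration:

```python
import os
os.environ["KERAS_BACKEND"] = "jax"  # or "tensorflow" / "torch"

import keras
import keras_nlp

# bfloat16 precision, as used in the LLM benchmarks.
keras.config.set_floatx("bfloat16")

# Preset name and LoRA rank are illustrative assumptions.
gemma = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")
gemma.backbone.enable_lora(rank=4)  # only low-rank adapters stay trainable

gemma.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(5e-5),
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
)
gemma.fit(["An example training sentence."], batch_size=1, epochs=1)
```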

Following the PyTorch team's recommendation, they used torch.compile(model, mode="reduce-overhead") in the native PyTorch implementations (except for Gemma and Mistral training, due to incompatibility).

To measure out-of-the-box performance, they used high-level APIs (such as HuggingFace's Trainer(), standard PyTorch training loops, and Keras model.fit()) with as little configuration as possible.
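On the native PyTorch side, that setup might look roughly like the following sketch; the model and data here are stand-in placeholders (a tiny linear layer on synthetic tensors), not the benchmark code:

```python
import torch

# Placeholder model and synthetic data standing in for, e.g., a
# HuggingFace BERT; assumes a CUDA GPU, as in the benchmark setup.
model = torch.nn.Linear(512, 512).to("cuda")
optimizer = torch.optim.AdamW(model.parameters())
loss_fn = torch.nn.MSELoss()

# The compile mode the PyTorch team recommended for these benchmarks.
model = torch.compile(model, mode="reduce-overhead")

x = torch.randn(32, 512, device="cuda")
y = torch.randn(32, 512, device="cuda")

# A standard PyTorch training loop, one batch per step.
for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```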

Hardware configuration

All benchmarks were run on Google Cloud Compute Engine with an NVIDIA A100 GPU (40GB of VRAM), 12 virtual CPUs, and 85GB of host memory.

Benchmark Results

Table 2 shows the benchmark results in milliseconds per step. Each step involves training or prediction on a single batch of data.

Each result is the average of 100 steps, excluding the first step, since it includes model creation and compilation and therefore takes extra time.

To ensure a fair comparison, the same batch size is used for the same model and task (whether training or inference).

However, because different models and tasks vary in scale and architecture, the batch size was adjusted as needed: large enough to avoid underutilizing the GPU, but not so large as to overflow memory.

A batch size that is too small can also make PyTorch appear slower because it increases Python overhead.

For the large language models (Gemma and Mistral), the same batch size was used because they are the same type of model with a similar number of parameters (7B).

Considering users’ needs for single-batch text generation, a benchmark test was also conducted on text generation with a batch size of 1.
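A minimal timing harness consistent with that methodology might look like this sketch; step_fn is a hypothetical callable wrapping one training or inference step:

```python
import time

def mean_step_time_ms(step_fn, num_steps=100):
    """Average per-step time, discarding the first (compilation) step.

    step_fn should block until the device finishes, e.g. by returning
    host values, or via torch.cuda.synchronize() / JAX block_until_ready().
    """
    step_fn()  # first step: model build + compilation, excluded
    start = time.perf_counter()
    for _ in range(num_steps):
        step_fn()
    return (time.perf_counter() - start) * 1000 / num_steps

# Hypothetical usage:
# avg_ms = mean_step_time_ms(lambda: model.train_on_batch(x, y))
```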

[Table 2: benchmark results for all models and tasks]

Key findings

Finding 1

There is no single "best" backend.

Keras's three backends each have their own strengths; importantly, no single backend always wins on performance.

Which backend is fastest often depends on the model's architecture.

This highlights the value of being able to choose among frameworks in pursuit of optimal performance. Keras 3 makes it easy to switch backends to find the best fit for your model.
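Concretely, Keras 3 selects its backend from the KERAS_BACKEND environment variable (or the ~/.keras/keras.json config file), read once at import time. A minimal sketch:

```python
import os

# Must be set before keras is imported; valid values are
# "jax", "tensorflow", and "torch".
os.environ["KERAS_BACKEND"] = "jax"

import keras

# The same model definition now runs on the selected backend unchanged.
model = keras.Sequential([
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(10),
])
```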

Finding 2

Keras 3 generally outperforms the standard PyTorch implementations.

Compared to native PyTorch, Keras 3 shows a significant improvement in throughput.

In particular, in 5 of the 10 test tasks, the speed increase exceeded 50%. Among them, the highest reached 290%.

[Chart: Keras 3 throughput gain over native PyTorch, per task]

A value of 100% means Keras 3 is twice as fast as PyTorch; 0% means the two perform equivalently.

Finding 3

Keras 3 delivers best-in-class performance “out of the box”.

In other words, none of the Keras models in the test were optimized in any way. In contrast, users of native PyTorch implementations usually need to do more performance tuning on their own.
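For example, generating images with KerasCV's StableDiffusion takes only a couple of lines with no manual tuning. A sketch (the prompt and image size are arbitrary choices):

```python
import keras_cv

# No manual optimization by the user: speedups such as XLA compilation
# are handled inside the library itself.
model = keras_cv.models.StableDiffusion(img_width=512, img_height=512)
images = model.text_to_image(
    "a photograph of an astronaut riding a horse", batch_size=1
)
```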

In addition to the data shared above, it was also noticed during testing that upgrading HuggingFace Diffusers' StableDiffusion inference from version 0.25.0 to 0.26.0 improved performance by more than 100%.

Similarly, upgrading HuggingFace Transformers from version 4.38.1 to 4.38.2 also significantly improved Gemma's performance.

These performance improvements highlight HuggingFace’s focus and efforts in performance optimization.

For models with less manual optimization, such as SegmentAnything, the implementation provided by the model's authors was used. In these cases, the performance gap relative to Keras is wider than for most other models.

This shows that Keras provides excellent out-of-the-box performance: users get fast model execution without having to master every optimization technique.

Finding 4

Keras 3 consistently outperforms Keras 2.

For example, SegmentAnything’s inference speed has increased by an astonishing 380%, StableDiffusion’s training processing speed has increased by more than 150%, and BERT’s training processing speed has also increased by more than 100%.

This is mainly because Keras 2 in some cases directly uses more TensorFlow fused operations, which may not be the optimal choice for XLA compilation.

It’s worth noting that even just upgrading to Keras 3 and continuing to use the TensorFlow backend can result in significant performance improvements.
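A minimal sketch of that upgrade path: keep the TensorFlow backend under Keras 3, where compile() exposes a jit_compile flag for XLA (it defaults to "auto"). The tiny model and random data are placeholders:

```python
import os
os.environ["KERAS_BACKEND"] = "tensorflow"  # Keras 3, same backend as before

import keras
import numpy as np

model = keras.Sequential([keras.layers.Dense(10)])

# jit_compile defaults to "auto" in Keras 3; True requests XLA explicitly.
model.compile(optimizer="adam", loss="mse", jit_compile=True)

x = np.random.rand(256, 32).astype("float32")
y = np.random.rand(256, 10).astype("float32")
model.fit(x, y, epochs=1, batch_size=32)
```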

[Chart: Keras 3 speedup over Keras 2, per task]

Conclusion

The performance of the framework depends largely on the specific model used.

Keras 3 helps you choose the fastest framework for your task, and that choice will almost always outperform both Keras 2 and native PyTorch implementations.

More importantly, Keras 3 models provide excellent out-of-the-box performance without complex underlying optimizations.
