


Google is ecstatic: JAX performance surpasses PyTorch and TensorFlow! It may become the fastest choice for GPU training and inference
JAX, the framework promoted by Google, has surpassed PyTorch and TensorFlow in recent benchmark tests, ranking first on 7 of the metrics.
And the tests were not even run on TPUs, where JAX performs best.
For now, PyTorch remains more popular with developers than TensorFlow.
But in the future, more large models may be trained and run on the JAX platform.
Models
Recently, the Keras team benchmarked Keras 3 with its three backends (TensorFlow, JAX, PyTorch) against native PyTorch implementations and against Keras 2 with TensorFlow.
First, they selected a set of mainstream computer vision and natural language processing models covering both generative and non-generative AI tasks.
The Keras versions of the models were built from the existing implementations in KerasCV and KerasNLP. For the native PyTorch versions, they chose the most popular options available online:
- BERT, Gemma, Mistral from HuggingFace Transformers
- StableDiffusion from HuggingFace Diffusers
- SegmentAnything from Meta
They call this set of models "Native PyTorch" to distinguish it from the Keras 3 version that uses the PyTorch backend.
They used synthetic data for all benchmarks, bfloat16 precision for all LLM training and inference, and LoRA fine-tuning for all LLM training.
Following the PyTorch team's recommendation, they used torch.compile(model, mode="reduce-overhead") in the native PyTorch implementations (except for Gemma and Mistral training, due to incompatibility).
To measure out-of-the-box performance, they used high-level APIs (such as HuggingFace's Trainer(), standard PyTorch training loops, and Keras model.fit()) with minimal configuration.
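For illustration, here is a minimal sketch of the two setups described above: the native PyTorch path compiled with torch.compile(model, mode="reduce-overhead"), and the out-of-the-box Keras model.fit() path. The models, optimizer, and data below are hypothetical placeholders, not the actual benchmark models.

```python
import torch
import keras
import numpy as np

# --- Native PyTorch path (per the PyTorch team's recommendation) ---
# `pt_model` is a toy stand-in for one of the benchmarked HuggingFace models.
pt_model = torch.nn.Linear(128, 10)
compiled_model = torch.compile(pt_model, mode="reduce-overhead")
optimizer = torch.optim.AdamW(compiled_model.parameters())
loss_fn = torch.nn.CrossEntropyLoss()

def train_step(x, y):
    # One training step on a single batch.
    optimizer.zero_grad()
    loss = loss_fn(compiled_model(x), y)
    loss.backward()
    optimizer.step()
    return loss

# --- Keras path: high-level API with minimal configuration ---
# A toy stand-in for the KerasCV / KerasNLP models used in the benchmark.
keras_model = keras.Sequential([keras.layers.Dense(10, activation="softmax")])
keras_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

x_train = np.random.rand(256, 128).astype("float32")   # synthetic data
y_train = np.random.randint(0, 10, size=(256,))
keras_model.fit(x_train, y_train, batch_size=32, epochs=1)
```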
Hardware configuration
All benchmarks were run on Google Cloud Compute Engine with an NVIDIA A100 GPU (40GB of VRAM), 12 virtual CPUs, and 85GB of host memory.
Benchmark Results
Table 2 shows the benchmark results in steps/ms. Each step involves training or prediction on a single batch of data.
Each result is the average over 100 steps, excluding the first step, which includes model creation and compilation and therefore takes extra time.
To ensure a fair comparison, the same batch size is used for a given model and task (whether training or inference).
However, batch sizes are adjusted across models and tasks to suit their different scales and architectures, avoiding both out-of-memory errors from batches that are too large and GPU underutilization from batches that are too small.
An overly small batch size can also make PyTorch appear slower, because it increases Python overhead.
For the large language models (Gemma and Mistral), the same batch size was used for both, since they are the same type of model with a similar number of parameters (7B).
Given user interest in single-batch text generation, text generation was also benchmarked with a batch size of 1.
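The benchmark harness itself is not shown in the article, but the measurement scheme described above can be sketched roughly as follows. Here step_fn is a hypothetical callable that runs one training or prediction step on a single batch; a real GPU benchmark would also synchronize the device before reading the clock.

```python
import time

def average_step_time_ms(step_fn, num_steps=100):
    """Average time per step in milliseconds over `num_steps` measured steps,
    discarding the first step, which includes model creation and compilation."""
    step_times_ms = []
    for i in range(num_steps + 1):
        start = time.perf_counter()
        step_fn()  # one training or prediction step on a single batch
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if i > 0:  # skip the warm-up/compilation step
            step_times_ms.append(elapsed_ms)
    return sum(step_times_ms) / len(step_times_ms)
```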
Key findings
Finding 1
There is no single "best" backend.
Keras's three backends each have their own strengths; crucially, no single backend always wins on performance.
Which backend is fastest often depends on the model's architecture.
This highlights the value of being able to choose between frameworks in pursuit of optimal performance; Keras 3 makes it easy to switch backends to find the best fit for your model (see the sketch below).
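Switching backends in Keras 3 requires no changes to model code; the documented mechanism is setting the KERAS_BACKEND environment variable (or editing ~/.keras/keras.json) before Keras is imported. A minimal sketch:

```python
import os

# Choose the backend before importing keras; the three options benchmarked
# here are "jax", "tensorflow", and "torch".
os.environ["KERAS_BACKEND"] = "jax"

import keras

print(keras.config.backend())  # -> "jax"
```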
Finding 2
Keras 3 generally outperforms the standard PyTorch implementations.
Compared with native PyTorch, Keras 3 shows a significant improvement in throughput (steps/ms).
In particular, the speedup exceeded 50% in 5 of the 10 test tasks, reaching as high as 290%.
A speedup of 100% means Keras 3 is twice as fast as PyTorch; 0% means the two perform equivalently.
Finding 3
Keras 3 delivers best-in-class performance “out of the box”.
In other words, none of the Keras models in the test received any custom optimization, whereas users of native PyTorch implementations usually need to do more performance tuning themselves.
Beyond the numbers shared above, the testers also observed that upgrading the StableDiffusion inference pipeline in HuggingFace Diffusers from version 0.25.0 to 0.3.0 improved its performance by more than 100%.
Similarly, in HuggingFace Transformers, upgrading Gemma from version 4.38.1 to 4.38.2 also significantly improved performance.
These performance gains highlight HuggingFace's focus and effort on performance optimization.
For models that have received less hand-optimization, such as SegmentAnything, the implementation provided by the research authors was used; here, the performance gap relative to Keras is larger than for most other models.
This shows that Keras delivers excellent out-of-the-box performance: users get fast model execution without having to master every optimization technique themselves.
Finding 4
Keras 3 consistently outperforms Keras 2.
For example, SegmentAnything's inference speed increased by an astonishing 380%, StableDiffusion's training throughput increased by more than 150%, and BERT's training throughput increased by more than 100%.
This is mainly because Keras 2 directly relies on more TensorFlow fused ops in some cases, which may not be the best choice for XLA compilation.
It’s worth noting that even just upgrading to Keras 3 and continuing to use the TensorFlow backend can result in significant performance improvements.
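For reference, Keras exposes XLA compilation through the jit_compile argument of compile(), which Keras 3 sets to "auto" by default; it can also be forced on explicitly. The model below is a toy placeholder, not one of the benchmarked models.

```python
import keras

# Toy placeholder; the benchmark used KerasCV / KerasNLP model implementations.
model = keras.Sequential([keras.layers.Dense(10, activation="softmax")])

# jit_compile controls XLA compilation. Keras 3 defaults to "auto";
# passing True requests XLA explicitly.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    jit_compile=True,
)
```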
Conclusion
The performance of the framework depends largely on the specific model used.
Keras 3 helps you choose the fastest backend for the task at hand, and that choice will almost always outperform both Keras 2 and native PyTorch implementations.
More importantly, Keras 3 models provide excellent out-of-the-box performance without complex underlying optimizations.