
Google's super AI supercomputer crushes NVIDIA A100! TPU v4 performance increased by 10 times, details disclosed for the first time


Although Google deployed TPU v4, the most powerful AI chip of its day, in its own data centers as early as 2020, it was not until April 4 this year that it disclosed the technical details of this AI supercomputer for the first time.


Paper address: https://arxiv.org/abs/2304.01433

Compared with TPU v3, TPU v4 delivers 2.1 times the performance per chip, and a supercomputer built from 4,096 of them is 10 times faster.


In addition, Google claims that its chip is faster and more energy-efficient than Nvidia's A100.


Competing with the A100: 1.7 times faster

In the paper, Google states that for systems of comparable size, TPU v4 delivers 1.7 times the performance of the NVIDIA A100 while also improving energy efficiency by 1.9 times.

In addition, Google's supercomputer is roughly 4.3 to 4.5 times faster than one built from Graphcore's IPU Bow.

Google also showed the TPU v4 package, as well as four packages mounted on a circuit board.

Like TPU v3, each TPU v4 chip contains two TensorCores (TCs). Each TC holds four 128×128 matrix multiply units (MXUs), a vector processing unit (VPU) with 128 lanes (16 ALUs per lane), and 16 MiB of vector memory (VMEM).

The two TCs share 128 MiB of common memory (CMEM).
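As a sanity check, these block counts multiply out to the chip's peak throughput. A minimal sketch in Python, assuming a clock of roughly 1050 MHz (the clock figure is an assumption for illustration; the MXU counts and dimensions come from the description above):

```python
# Back-of-the-envelope peak throughput for one TPU v4 chip.
tensorcores = 2              # TCs per chip (from the text above)
mxus_per_tc = 4              # 128x128 matrix multiply units per TC
macs_per_mxu = 128 * 128     # multiply-accumulators in one systolic array
flops_per_mac = 2            # a multiply-accumulate counts as 2 FLOPs
clock_hz = 1.05e9            # assumed ~1050 MHz clock

peak = tensorcores * mxus_per_tc * macs_per_mxu * flops_per_mac * clock_hz
print(f"~{peak / 1e12:.0f} TFLOP/s per chip")  # ~275 TFLOP/s
```

The result lines up with the roughly 275 TFLOP/s (bf16) peak commonly cited for TPU v4, which suggests the assumed clock is in the right range.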


It is worth noting that the A100 and Google's fourth-generation TPU came to market around the same time, so how do they actually compare?

Google reported each DSA's fastest result on five MLPerf benchmarks: BERT, ResNet, DLRM, RetinaNet, and Mask R-CNN.

Of these, Graphcore's IPU submitted results only for BERT and ResNet.


The results on ResNet and BERT are shown below; the dotted lines between the points are interpolations based on chip count.

The MLPerf results for TPU v4 and A100 scale to far larger systems than the IPU's (4,096 chips vs. 256 chips).

For similarly sized systems, TPU v4 is 1.15 times faster than the A100 on BERT and about 4.3 times faster than the IPU; on ResNet the figures are 1.67 and about 4.5 times, respectively.

[Figure: MLPerf training performance of TPU v4, A100, and IPU on BERT and ResNet, scaled by chip count]

On the MLPerf benchmarks, the A100 drew 1.3 to 1.9 times more power on average.


Does peak FLOP/s predict actual performance? Many in the machine learning field treat peak FLOP/s as a good proxy for performance, but in fact it is not.

For example, TPU v4 is 4.3 to 4.5 times faster than the IPU Bow on two MLPerf benchmarks at the same system size, despite holding only a 1.10x advantage in peak FLOP/s.

Likewise, the A100's peak FLOP/s is 1.13 times that of TPU v4, yet for the same number of chips, TPU v4 is 1.15 to 1.67 times faster.
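The mismatch is easy to put into numbers. A minimal sketch using only the figures quoted above:

```python
# Peak-FLOP/s ratios vs. measured speedups (figures from the text above).
a100_peak_advantage = 1.13                    # A100 peak FLOP/s over TPU v4
tpu_speedup_over_a100 = {"BERT": 1.15, "ResNet": 1.67}

for bench, speedup in tpu_speedup_over_a100.items():
    print(f"{bench}: A100 has {a100_peak_advantage}x the peak FLOP/s, "
          f"yet TPU v4 finishes {speedup}x faster")
```

In other words, whichever chip wins on paper can still lose on real workloads, because memory bandwidth, interconnect, and the software stack all matter.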

The figure below uses the roofline model to show the relationship between peak FLOP/s and memory bandwidth.

[Figure: roofline model relating peak FLOP/s to memory bandwidth]
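For readers unfamiliar with it, the roofline model says attainable throughput is capped either by peak compute or by memory bandwidth multiplied by operational intensity (FLOPs performed per byte moved). A minimal sketch; the peak is TPU v4's commonly cited bf16 figure, and the bandwidth is an assumed, illustrative value:

```python
# Roofline model: attainable FLOP/s = min(peak compute, bandwidth * intensity).
def roofline(peak_flops, mem_bw_bytes_per_s, flops_per_byte):
    return min(peak_flops, mem_bw_bytes_per_s * flops_per_byte)

peak = 275e12   # ~275 TFLOP/s (bf16) per TPU v4 chip
bw = 1.2e12     # assumed ~1.2 TB/s of HBM bandwidth (illustrative)

for oi in (16, 64, 256, 1024):   # operational intensity, FLOPs per byte
    print(f"{oi:>4} FLOP/B -> ~{roofline(peak, bw, oi) / 1e12:.0f} TFLOP/s")
```

Below the ridge point (peak divided by bandwidth, here around 230 FLOP/B) a workload is memory-bound and never reaches the chip's peak, which is exactly why peak FLOP/s alone is a poor predictor.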

So why didn't Google compare against Nvidia's latest H100?

Google said it did not compare its fourth-generation chip with Nvidia's current flagship H100 because the H100 came to market after Google's chip and was built with newer technology.

However, Google hinted that it is developing a new TPU to compete with the H100, without providing details; Google researcher Jouppi told Reuters that Google has "a production line for future chips."

TPU vs. GPU

While ChatGPT and Bard "fight it out" in public, two behemoths are working hard behind the scenes to keep them running: NVIDIA's CUDA-powered GPUs (graphics processing units) and Google's custom TPUs (tensor processing units).

In other words, this is no longer about ChatGPT vs. Bard, but TPU vs. GPU, and how efficiently they can do matrix multiplication.


Thanks to the strengths of their hardware architecture, NVIDIA's GPUs are ideally suited to matrix multiplication, efficiently spreading the work across many CUDA cores in parallel.
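The appeal is easy to see in practice: a large matrix multiply exposes enormous, regular parallelism, and how much of the hardware's peak a system actually achieves is measurable. A minimal, hardware-agnostic sketch of that measurement (NumPy on the CPU here, purely to illustrate the methodology):

```python
import time
import numpy as np

n = 4096
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)

a @ b  # warm-up run so one-time setup costs are excluded

start = time.perf_counter()
a @ b
elapsed = time.perf_counter() - start

flops = 2 * n**3  # n^2 outputs, each needing n multiplies and n adds
print(f"~{flops / elapsed / 1e9:.1f} GFLOP/s achieved")
```

The same measurement, run on a GPU or TPU and divided by the device's peak, is essentially what hardware utilization comparisons boil down to.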

That is why, since 2012, training models on GPUs has been the consensus in deep learning, and it remains so to this day.

With the launch of DGX, NVIDIA can provide a one-stop hardware-and-software solution for almost any AI task, something competitors cannot match for lack of the necessary intellectual property.

In contrast, Google launched its first-generation tensor processing unit (TPU) in 2016: a custom ASIC (application-specific integrated circuit) optimized for its own TensorFlow framework. This gives the TPU an advantage in AI workloads beyond matrix multiplication, and it can even accelerate fine-tuning and inference.

In addition, researchers at Google DeepMind have found a way to discover better matrix multiplication algorithms: AlphaTensor.

However, even though Google has achieved good results through self-developed technology and new approaches to optimizing AI compute, Microsoft and Nvidia's long-standing, deep cooperation, built on each company's accumulated industry expertise, has expanded the competitive advantage of both.

Fourth-generation TPU

Back at the 2021 Google I/O conference, Pichai announced Google's latest-generation AI chip, TPU v4, for the first time.

"This is the fastest system we have deployed on Google and is a historic milestone for us."


This improvement has become a key point of competition among the companies building AI supercomputers, because large language models like Google's Bard and OpenAI's ChatGPT have grown explosively in parameter count.

That means they are far too large for any single chip to store, and their appetite for compute is a huge "black hole".

These models therefore have to be split across thousands of chips, which then work together for weeks or even longer to train them.

PaLM, Google's largest publicly disclosed language model to date, has 540 billion parameters and was trained over 50 days by splitting it across two of these 4,000-chip supercomputers.
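A quick back-of-the-envelope calculation shows why a single chip has no chance of holding such a model. Assuming bf16 weights (2 bytes per parameter) and roughly 32 GiB of HBM per TPU v4 chip (both are illustrative assumptions, not figures from this article):

```python
params = 540e9             # PaLM's parameter count
bytes_per_param = 2        # assumed bf16 weights
hbm_per_chip = 32 * 2**30  # assumed ~32 GiB of HBM per chip

weight_bytes = params * bytes_per_param
print(f"weights alone: {weight_bytes / 2**40:.2f} TiB")
print(f"chips needed just to hold them: {weight_bytes / hbm_per_chip:.0f}")
# Optimizer state and activations multiply the real footprint several times over.
```

Even before optimizer state and activations, the weights alone need dozens of chips, so sharding training across thousands of chips is unavoidable.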

Google said its supercomputer can easily reconfigure the connections between chips on the fly, both to route around failures and to tune performance.

Google researcher Norm Jouppi and Google distinguished engineer David Patterson wrote in a blog post about the system:

"Circuit switching makes it easy to route around failed components. This flexibility even allows us to change the topology of the supercomputer interconnect to accelerate the performance of a machine learning model."

Although Google is only now releasing the details, the supercomputer has been online since 2020 in a data center in Mayes County, Oklahoma.

Google said that Midjourney has used the system to train its model, whose latest version, V5, has shown everyone its stunning image generation.


Recently, Pichai said in an interview with the New York Times that Bard would move from LaMDA to PaLM.

Now, backed by the TPU v4 supercomputer, Bard will only grow stronger.

