
Tesla Dojo supercomputing architecture details disclosed for the first time! Pushed to the extreme for autonomous driving

PHPz · 2023-04-11 21:46:25 · 1309 views

To meet the growing demand for artificial intelligence and machine learning models, Tesla created its own artificial intelligence technology to teach Tesla cars to drive themselves.

Recently, Tesla disclosed a large number of details about the Dojo supercomputing architecture at the Hot Chips 34 conference.

Essentially, Dojo is a giant composable supercomputer built from a completely custom architecture, covering everything from compute, networking, and input/output (I/O) chips to the instruction set architecture (ISA), power delivery, packaging, and cooling. All of this is done to run custom, specific machine learning training algorithms at scale.

Ganesh Venkataramanan is Tesla's senior director of autonomous driving hardware and leads the Dojo project; he previously led a CPU design team at AMD. At the Hot Chips 34 conference, he and a group of chip, system and software engineers unveiled many of the machine's architectural features for the first time.

Data Center "Sandwich"

" Generally speaking, our process of manufacturing chips is to put them on the package and put the package on the printed circuit board , and then it goes into the system. The system goes into the rack," Venkataramanan said.

But there’s a problem with this process: every time data moves from the chip to the package and off the package, there’s latency and bandwidth loss.

To get around these limitations, Venkataramanan and his team decided to start from scratch.

Thus, Dojo’s training tiles were born.

This is a self-contained computing cluster that takes up half a cubic foot and is capable of 556TFLOPS of FP32 performance in a 15kW liquid-cooled package.

Each tile is equipped with 11GB of SRAM and is connected via a 9TB/s fabric using a custom transport protocol throughout the stack.

Venkataramanan said: "This training tile represents an unmatched level of integration, from compute to memory, to power delivery, to communication, without the need for any additional switches."

The core of the training tile is Tesla’s D1, a 50 billion transistor chip based on TSMC’s 7nm process. Tesla says each D1 is capable of achieving 22TFLOPS of FP32 performance at a TDP of 400W.
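The per-tile figure follows directly from the per-die figure; a quick sanity check using only the rounded numbers quoted above:

```python
# Back-of-the-envelope check of the quoted training-tile performance.
D1_FP32_TFLOPS = 22   # per-D1 figure quoted above (rounded)
DIES_PER_TILE = 25    # 5 x 5 grid of known-good D1 dies per tile

tile_fp32_tflops = D1_FP32_TFLOPS * DIES_PER_TILE
print(tile_fp32_tflops)  # 550, in line with the ~556 TFLOPS quoted for a tile
```

The small gap to 556 TFLOPS comes from the per-die number being rounded down from Tesla's exact figure.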

Tesla then took 25 D1s, binned them as known good dies, and packaged them together using TSMC's system-on-wafer technology (InFO_SoW) to enable massive compute integration with extremely low latency and extremely high bandwidth.

However, the system design and vertical stacking architecture on the chip bring challenges to power delivery.

According to Venkataramanan, most current accelerators place the power supply directly next to the silicon wafer. He explained that this approach, while effective, meant that a large portion of the accelerator had to be dedicated to these components, which was impractical for Dojo. Therefore, Tesla chose to provide power directly through the bottom of the chip.

In addition, Tesla has also developed the Dojo Interface Processor (DIP), which is the bridge between the host CPU and the training processor.

Each DIP has 32GB of HBM, and up to five of these cards can connect to a training tile at 900GB/s each, for an aggregate of 4.5TB/s of bandwidth and a total of 160GB of HBM per tile.

Tesla’s V1 configuration pairs six of these tiles (150 D1 dies in total) in an array supporting four host CPUs, each equipped with five DIP cards, to achieve a claimed exaflop of BF16 or CFP8 performance.
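The bandwidth, memory, and die-count totals above are simple products of the per-card and per-tile figures; a quick sketch using only numbers from the article:

```python
# Aggregate DIP bandwidth and memory per tile, and the V1 tile count.
DIP_HBM_GB = 32       # HBM per Dojo Interface Processor card
DIP_LINK_GB_S = 900   # per-card link bandwidth to the training tile
DIPS_PER_TILE = 5

tile_hbm_gb = DIP_HBM_GB * DIPS_PER_TILE                  # 160 GB of HBM per tile
tile_dip_bw_tb_s = DIP_LINK_GB_S * DIPS_PER_TILE / 1000   # 4.5 TB/s aggregate

DIES_PER_TILE = 25
V1_DIES = 150
v1_tiles = V1_DIES // DIES_PER_TILE                       # 6 tiles in the V1 array
print(tile_hbm_gb, tile_dip_bw_tb_s, v1_tiles)
```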

Software

Such a specialized computing architecture requires a specialized software stack. However, Venkataramanan and his team recognized that programmability would determine Dojo's success or failure.

"When we design these systems, ease of programmability by software peers is paramount. Researchers don't wait for your software folks to write a handwritten kernel to accommodate the new algorithms we want to run. "

To make this possible, Tesla gave up on hand-written kernels and designed Dojo's architecture around the compiler.

"What we do is we use PiTorch. We create a middle layer that helps us parallelize to scale the hardware underneath it. Underneath everything is compiled code. "In order to create a software stack that can adapt to any future workload, this is the only way.

Despite emphasizing the software's flexibility, Venkataramanan noted that the platform, which runs in Tesla's labs, is for now limited to Tesla's own use.

Dojo Architecture Overview

After reading the above, let us take a deeper look at the Dojo architecture.

Tesla has an exascale artificial intelligence system for machine learning. Tesla has enough capital to hire engineers and build chips and systems tailored specifically to its applications, just as it does for its in-car systems.

Tesla is not only building its own AI chip, but also a supercomputer.

Distributed system analysis

Each Dojo node has its own CPU, memory, and communication interface.

Dojo node

This is the processing pipeline of the Dojo processor.

Processing Pipeline

Each node has 1.25MB of SRAM. In AI training and inference chips, a common technique is to co-locate memory with computation to minimize data transfers, which are very expensive from a power and performance perspective.

Node memory

Each node is then connected into a 2D mesh.
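On a 2D mesh like this, a packet typically travels along one dimension and then the other. A minimal dimension-order ("XY") routing sketch, as an illustration of mesh traversal rather than Dojo's actual routing logic:

```python
def xy_route(src, dst):
    """Dimension-order routing on a 2D mesh: move along X first, then Y."""
    (sx, sy), (dx, dy) = src, dst
    path, (x, y) = [(sx, sy)], (sx, sy)
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

print(xy_route((0, 0), (2, 3)))  # 5 hops: x reaches 2 first, then y reaches 3
```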

Network Interface

This is an overview of the data path.

Data Path

Here is an example of the chip's list parsing capability.

List parsing

The instruction set is a Tesla original, rather than a typical Intel, Arm, NVIDIA or AMD CPU/GPU instruction set.

Instruction set

In artificial intelligence, arithmetic formats are very important, especially which formats a chip supports. With Dojo, Tesla can work with common industry formats such as FP32, FP16, and BF16.

Arithmetic Format

Tesla is also working on configurable FP8, or CFP8. It comes in 4/3 and 5/2 exponent/mantissa options, similar to the NVIDIA H100 Hopper FP8 configurations. Untether.AI's Boqueria, a 1458-core RISC-V AI accelerator, likewise focuses on multiple FP8 types.
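The difference between the 4/3 and 5/2 variants is simply how the 8 bits are split between exponent and mantissa. A small decoder makes the trade-off concrete; note this ignores NaN/Inf encodings and uses the standard bias, so it is a simplified illustration rather than Tesla's CFP8, which also has a configurable bias:

```python
def fp8_decode(byte, exp_bits, man_bits):
    """Decode an 8-bit float with the given field widths (no NaN/Inf handling)."""
    assert 1 + exp_bits + man_bits == 8
    bias = (1 << (exp_bits - 1)) - 1
    sign = -1.0 if byte >> 7 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    if exp == 0:  # subnormal: no implicit leading 1
        return sign * (man / (1 << man_bits)) * 2.0 ** (1 - bias)
    return sign * (1.0 + man / (1 << man_bits)) * 2.0 ** (exp - bias)

# The same bit budget buys precision (4/3) or dynamic range (5/2):
print(fp8_decode(0b01000000, 4, 3))  # 2.0
print(fp8_decode(0b01111011, 5, 2))  # 57344.0: far larger representable range
```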

Arithmetic Format 2

Dojo also has a distinct CFP16 format for higher precision; in total, the chip supports FP32, BF16, CFP8, and CFP16.

Arithmetic Format 3

These cores are then integrated into the fabricated die. Tesla's D1 chip is manufactured by TSMC on a 7nm process. Each chip has 354 Dojo processing nodes and 440MB of SRAM.
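The die-level SRAM figure is consistent with the per-node figure given earlier:

```python
# Cross-check the quoted die SRAM against the per-node SRAM figure.
NODES_PER_DIE = 354
NODE_SRAM_MB = 1.25   # per-node SRAM quoted earlier in the article

die_sram_mb = NODES_PER_DIE * NODE_SRAM_MB
print(die_sram_mb)  # 442.5, matching the quoted ~440MB after rounding
```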

First integration level: the D1 die

These D1 chips are packaged onto a Dojo training tile. The D1 chips are tested and then assembled into a 5×5 tile. These tiles have 4.5TB/s of bandwidth per edge, and a power delivery envelope of 15kW per module, or roughly 600W per D1 chip after subtracting the power used by the 40 I/O dies. The comparison shows why something like Lightmatter Passage would be more attractive to a company that did not want to design such a thing itself.

Second integration level: the Dojo training tile

The Dojo Interface Processors sit at the edge of the 2D mesh. Each training tile has 11GB of SRAM and 160GB of shared DRAM.

Dojo system topology

The following is bandwidth data for the 2D mesh connecting the processing nodes.

Dojo system communication: logical 2D mesh

Each DIP provides a 32GB/s PCIe link to the host system.

Dojo system communication: PCIe link between DIP and host

Tesla also has Z-plane links for longer routes. In the rest of the talk, Tesla covered its system-level innovations.

Communication mechanism

This is the latency boundary across dies and tiles, which is why the two are handled differently in Dojo. Z-plane links are needed because long paths are expensive.

Dojo system communication mechanism

Any processing node can access data across the system. Each node can push or pull data to SRAM or DRAM.

Dojo system batch communication

Dojo uses a flat addressing scheme for communication.
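A flat addressing scheme means every node's memory appears in one global address space, so a sender only needs a single address to target any node. A toy illustration of such a mapping; the window size and divmod layout are assumptions for illustration, not Dojo's actual memory map:

```python
NODE_SRAM_BYTES = 1_310_720  # 1.25 MB per node, per the figures above

def flat_to_node(addr):
    """Map a flat global address to (node_id, offset_within_node)."""
    return divmod(addr, NODE_SRAM_BYTES)

print(flat_to_node(NODE_SRAM_BYTES + 16))  # (1, 16): node 1, byte offset 16
```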

System Network 1

Faulty processing nodes on these chips can be bypassed in software.

System Network 2

This means that the software must understand the system topology.
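Routing around failed nodes is one concrete reason the software needs the topology: a route that ignores dead nodes would deliver packets into a hole. A breadth-first search sketch over a small mesh, illustrative rather than Tesla's actual routing algorithm:

```python
from collections import deque

def route_avoiding(src, dst, width, height, failed):
    """BFS over a width x height 2D mesh, skipping failed nodes; returns a path or None."""
    q, prev = deque([src]), {src: None}
    while q:
        x, y = q.popleft()
        if (x, y) == dst:  # reconstruct the path by walking predecessors back to src
            path, cur = [], dst
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if 0 <= nx < width and 0 <= ny < height and nxt not in failed and nxt not in prev:
                prev[nxt] = (x, y)
                q.append(nxt)
    return None  # destination unreachable with these failures

# Two dead nodes force the route around the left and bottom of the mesh.
path = route_avoiding((0, 0), (2, 0), 3, 3, failed={(1, 0), (1, 1)})
print(path)
```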

System Network 3

Dojo does not guarantee end-to-end traffic ordering, so packets need to be counted at the destination.

System Network 4

Here's how counting packets at the destination forms part of the system's synchronization.
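Counting at the destination can be sketched as a small counter: the receiver knows how many packets to expect and signals completion once they have all arrived, regardless of arrival order. This is a conceptual illustration, not Tesla's hardware mechanism:

```python
class PacketCounter:
    """Signal completion once all expected packets have arrived, in any order."""
    def __init__(self, expected):
        self.expected = expected
        self.seen = 0

    def on_packet(self):
        self.seen += 1
        return self.seen == self.expected  # True only when the last packet lands

c = PacketCounter(expected=3)
print([c.on_packet() for _ in range(3)])  # [False, False, True]
```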

System synchronization

The compiler needs to define a tree for synchronization.

System synchronization 2
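A synchronization tree aggregates completion signals level by level instead of having every node report to a single point. A sketch of how a compiler might group nodes into such a tree; the fan-out of 4 is an arbitrary choice for illustration:

```python
def build_sync_levels(node_ids, fanout=4):
    """Group nodes into levels: each group's first member reports one level up."""
    levels = [list(node_ids)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        groups = [cur[i:i + fanout] for i in range(0, len(cur), fanout)]
        levels.append([g[0] for g in groups])  # one representative per group
    return levels

levels = build_sync_levels(range(16))
print([len(l) for l in levels])  # [16, 4, 1]: 16 nodes, 4 aggregators, 1 root
```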

Tesla said that one exa-pod contains more than 1 million compute nodes. These are very large systems.

Summary

Tesla built Dojo specifically to work at scale. Typically, AI chip startups aim to build systems with one or a few chips; Tesla is clearly focused on much greater scale.

In many ways, it makes sense for Tesla to have a huge AI training capability. What's even more exciting is that instead of only buying commercially available systems, it is building its own chips and systems. Parts of the scalar ISA are borrowed from RISC-V, but Tesla custom-designed the vector side and much of the architecture, which required a great deal of work.

