
NVIDIA releases ChatGPT dedicated GPU, increasing inference speed by 10 times

WBOY
2023-05-13 23:04

Years ago, artificial intelligence hit a decades-long bottleneck for lack of computing power, and it was the GPU that ignited deep learning. In the ChatGPT era, AI once again faces a computing shortage, this time because of large models. Does NVIDIA have a solution this time?

On March 22, the GTC conference officially opened. In the keynote, NVIDIA CEO Jen-Hsun Huang brought out the chips the company has prepared for ChatGPT.

"Accelerating computing is not easy. In 2012, the computer vision model AlexNet used GeForce GTX 580 and could process 262 PetaFLOPS per second. This model triggered an explosion in AI technology," Huang Renxun said. "Ten years later, Transformer appeared. GPT-3 used 323 ZettaFLOPS of computing power, 1 million times that of AlexNet, to create ChatGPT, an AI that shocked the world. A new computing platform appeared, and the iPhone era of AI has arrived. ."

The AI boom has pushed Nvidia's stock price up 77% this year. Its market value now stands at US$640 billion, nearly five times that of Intel. And today's announcements show that Nvidia is not stopping there.

Designing dedicated computing power for AIGC

The rise of generative AI (AIGC) is changing what technology companies need from computing. At the conference, Nvidia demonstrated four inference platforms tailored to specific AI tasks, all built on a unified architecture.

Among them, NVIDIA L4 offers "120 times higher AI-powered video performance than CPUs, with 99% better energy efficiency," and targets video streaming, encoding and decoding, and AI video generation; the more powerful NVIDIA L40 is dedicated to 2D/3D image generation.

For ChatGPT, which demands enormous computing power, NVIDIA released the NVIDIA H100 NVL, a dedicated large language model (LLM) solution: PCIe H100 GPUs in a dual-GPU NVLink configuration, with 94GB of memory and an accelerated Transformer Engine.

"Currently the only GPU that can actually handle ChatGPT is the NVIDIA HGX A100. Compared with the former, one is now equipped with four pairs of H100 and dual NVLINK The standard server speed can be 10 times faster, which can reduce the processing cost of large language models by an order of magnitude," Huang said.

Finally, there is NVIDIA Grace Hopper for recommendation models, which, beyond being optimized for recommendation tasks, can also power graph neural networks and vector databases.

Pushing chips past their physical limits

Semiconductor manufacturing is approaching the limits of what physics allows. Beyond the 2nm process, where is the breakthrough? NVIDIA decided to start at the very first stage of chip manufacturing: photolithography.

Fundamentally, this is an imaging problem at the limits of physics. At advanced process nodes, many features on a chip are smaller than the wavelength of the light used to print them, so the mask design must be repeatedly modified in a step called optical proximity correction. Computational lithography simulates how light behaves as it passes through the mask and interacts with the photoresist. That behavior is described by Maxwell's equations, and simulating it is the most computationally demanding task in chip design and manufacturing.
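As a toy illustration of why this is so expensive (a textbook-style approximation, not NVIDIA's actual CuLitho algorithm, with every value below invented for the example): the image printed on the wafer can be roughly modeled as the mask pattern convolved with the optics' point-spread function and then thresholded by the photoresist. Production tools go much further and solve Maxwell's equations over billions of mask pixels, which is exactly the kind of dense numerical work GPUs accelerate well.

```python
import numpy as np

# Toy aerial-image model: mask -> blur by the optics' point-spread
# function (PSF) -> resist threshold. All values are illustrative.
n = 512
mask = np.zeros((n, n))
mask[200:312, 240:272] = 1.0          # one rectangular mask feature

# Gaussian PSF standing in for the projection optics
yy, xx = np.mgrid[-n // 2:n // 2, -n // 2:n // 2]
psf = np.exp(-(xx**2 + yy**2) / (2 * 4.0**2))
psf /= psf.sum()

# Convolution via FFT, the numerically heavy core that maps well to GPUs
aerial = np.real(np.fft.ifft2(np.fft.fft2(mask) * np.fft.fft2(np.fft.ifftshift(psf))))
printed = aerial > 0.3                # arbitrary resist threshold

print(f"mask feature area: {mask.sum():.0f}, printed area: {printed.sum():.0f}")
```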

At GTC, Huang announced CuLitho, a new technology to accelerate the design and manufacturing of semiconductors. The software uses Nvidia GPUs to speed up the step between software-based chip design and the physical fabrication of the photolithography masks used to print that design onto the chip.

CuLitho runs on GPUs and delivers a 40x performance improvement over current computational lithography, accelerating workloads that today consume tens of billions of CPU hours every year. "Building the H100 requires 89 masks. Running on CPUs, a single mask takes two weeks; running CuLitho on H100s, it takes only 8 hours," Huang said.
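Taking the quoted per-mask times at face value (and assuming, purely for illustration, that masks are processed one at a time), the arithmetic behind the claim looks like this; the resulting ~42x per-mask figure is consistent with the headline 40x number.

```python
masks = 89              # masks needed to build an H100, per the keynote
cpu_hours = 14 * 24     # "two weeks" per mask on CPUs
gpu_hours = 8           # "8 hours" per mask with CuLitho on H100

print(f"per-mask speedup: {cpu_hours / gpu_hours:.0f}x")   # 42x
print(f"all masks on CPUs: {masks * cpu_hours:,} hours")   # 29,904 hours
print(f"all masks on GPUs: {masks * gpu_hours:,} hours")   # 712 hours
```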

This means that 500 NVIDIA DGX H100 systems could replace 40,000 CPU systems and run every part of the computational lithography process, reducing power requirements and potential environmental impact.

This advance will allow transistors and circuits smaller than today's, while shortening chips' time to market and improving the energy efficiency of the massive data centers that run around the clock to drive the manufacturing process.

Nvidia said it is working with ASML, Synopsys and TSMC to bring the technology to market. According to reports, TSMC will begin preparing for trial production of this technology in June.

"The chip industry is the foundation for almost every other industry in the world," Huang said. "With lithography technology at the limits of physics, through CuLitho and working with our partners TSMC, ASML and Synopsys, fabs can increase production, reduce their carbon footprint, and lay the foundation for 2nm and beyond."

The first GPU-accelerated quantum computing system

At today's event, NVIDIA also announced a new system built with Quantum Machines, delivering to researchers a revolutionary new architecture for high-performance, low-latency quantum-classical computing.

As the world's first GPU-accelerated quantum computing system, NVIDIA DGX Quantum combines the world's most powerful accelerated computing platform (enabled by the NVIDIA Grace Hopper superchip and the CUDA Quantum open-source programming model) with the world's most advanced quantum control platform, OPX (from Quantum Machines). The combination lets researchers build unprecedentedly powerful applications that couple quantum computing with state-of-the-art classical computing for calibration, control, quantum error correction and hybrid algorithms.

At the heart of DGX Quantum is an NVIDIA Grace Hopper system connected by PCIe to Quantum Machines OPX, enabling sub-microsecond latency between the GPU and the Quantum Processing Unit (QPU).

Tim Costa, director of HPC and quantum at NVIDIA, said: "Quantum-accelerated supercomputing has the potential to reshape science and industry, and NVIDIA DGX Quantum will let researchers push the boundaries of quantum-classical computing."

To power it, NVIDIA integrated the high-performance Hopper architecture GPU and the company's new Grace CPU into "Grace Hopper," a superchip built to drive giant AI and HPC applications. It delivers up to 10x the performance for applications processing terabytes of data, giving quantum-classical researchers more power to solve the world's most complex problems.

DGX Quantum also equips developers with NVIDIA CUDA Quantum, a powerful unified software stack that is now open source. CUDA Quantum is a hybrid quantum-classical computing platform that can integrate and program QPUs, GPUs and CPUs in one system.
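For a flavor of what that hybrid programming model looks like, here is a minimal Bell-state sketch in CUDA Quantum's Python API (syntax from current cudaq releases, which may differ from the version shown at GTC; by default it runs on a simulated backend rather than a physical QPU):

```python
import cudaq

@cudaq.kernel
def bell():
    qubits = cudaq.qvector(2)       # allocate two qubits
    h(qubits[0])                    # put qubit 0 in superposition
    x.ctrl(qubits[0], qubits[1])    # entangle with a controlled-X
    mz(qubits)                      # measure both qubits

# Quantum sampling plus classical post-processing in a single program,
# the hybrid pattern DGX Quantum is built to accelerate.
counts = cudaq.sample(bell, shots_count=1000)
print(counts)                       # roughly 50/50 between '00' and '11'
```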

$37,000 per month: train your own ChatGPT from a web browser

Microsoft spent hundreds of millions of dollars buying tens of thousands of A100s to build a GPT-dedicated supercomputer. Now you can rent the same GPUs OpenAI and Microsoft used to train ChatGPT and Bing search, and use them to train your own large models.

NVIDIA's answer is DGX Cloud: dedicated NVIDIA DGX AI supercomputing clusters paired with NVIDIA AI software. The service lets every enterprise access AI supercomputing through a simple web browser, eliminating the complexity of acquiring, deploying and managing on-premises infrastructure.

According to reports, each DGX Cloud instance has eight H100 or A100 80GB Tensor Core GPUs, with a total of 640GB GPU memory per node. A high-performance, low-latency fabric built with NVIDIA Networking ensures workloads can scale across clusters of interconnected systems, allowing multiple instances to act as one giant GPU to meet the performance requirements of advanced AI training.

Now, enterprises can rent a DGX Cloud cluster on a monthly basis to quickly and easily scale development of large multi-node training workloads without waiting for accelerated computing resources that are often in high demand.

The monthly rental price, according to Huang, starts at $36,999 per instance per month.
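For a rough sense of scale (assuming a 730-hour month and full utilization; NVIDIA's actual billing terms may differ), that works out to roughly $6 per GPU-hour:

```python
monthly_price = 36_999    # USD per DGX Cloud instance per month, as quoted
gpus = 8                  # GPUs per instance
hours = 730               # average hours in a month (assumption)

print(f"~${monthly_price / (gpus * hours):.2f} per GPU-hour")  # ~$6.34
```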

“We are in the iPhone moment of artificial intelligence,” Huang said. “Startups are racing to create disruptive products and business models, and incumbents are looking to respond. DGX Cloud gives customers instant access to NVIDIA AI supercomputing in the cloud at global scale."

To help enterprises embrace the wave of generative AI, NVIDIA also announced a series of cloud services that let enterprises build and refine customized large language models and generative AI models.

People can now use the NVIDIA NeMo language service and the NVIDIA Picasso image, video and 3D service to build proprietary, domain-specific generative AI applications for intelligent conversation and customer support, professional content creation, digital simulation and more. Separately, NVIDIA announced new models for the NVIDIA BioNeMo biology cloud service.

"Generative AI is a new type of computer that can be programmed in human natural language. This ability has far-reaching implications - everyone can command the computer to solve problems, which was not the case before Soon, this was just for programmers," Huang said.

Judging from today's announcements, Nvidia is not only continuously refining its hardware for technology companies' AI workloads, but is also proposing new business models. In the eyes of some, NVIDIA wants to be "the TSMC of AI": providing advanced productivity as a foundry service, the way a wafer fab does, on top of which other companies train AI algorithms for their own scenarios.

Training directly on NVIDIA's supercomputers, cutting out the middlemen and their markup: will this be the direction AI development takes?
