Lao Huang brings out a "nuclear bomb" built for ChatGPT: NVIDIA's new H100 NVL is 10 times faster!
NVIDIA wins!
At the just-concluded GTC conference, with "generative AI" splashed across the screen and an H100 NVLink chip in hand that speeds up ChatGPT by 10 times, Lao Huang all but had the words "I am the winner" written on his face.
ChatGPT, Microsoft 365, Azure, Stable Diffusion, DALL-E, Midjourney... NVIDIA gets a cut of every one of today's hottest AI products.
ChatGPT's global popularity at the beginning of this year sent NVIDIA's stock price soaring, adding more than 70 billion US dollars to its market value; NVIDIA is currently worth about 640 billion US dollars.
Now the iPhone moment of AI has arrived, the fourth technological revolution is about to begin, and NVIDIA, which holds the A100 and H100, may become the biggest winner.
At the GTC conference, Huang announced NVIDIA's remarkable progress in GPUs, acceleration libraries, computational lithography, and cloud platforms, and even made a bold claim: NVIDIA is going to be the TSMC of the AI industry!
Some people have already speculated that today’s speech was all generated using the AIGC model on the H100.
A dedicated GPU for ChatGPT has arrived
The biggest announcement at this conference is the NVIDIA H100 NVLink built for ChatGPT.
Due to the huge demand for computing power, NVIDIA has launched a new Hopper GPU for inference on LLMs like ChatGPT: a PCIe H100 with dual-GPU NVLink and 94GB of memory.
In fact, the history of deep learning has been closely related to NVIDIA since 2012.
Lao Huang recalled that in 2012, deep learning pioneer Hinton and his students Alex Krizhevsky and Ilya Sutskever used the GeForce GTX 580 to train AlexNet.
AlexNet then won the ImageNet image classification competition in one fell swoop, becoming the spark that set off the deep learning explosion.
Ten years later, Ilya Sutskever at OpenAI used NVIDIA's DGX to train GPT-3 and GPT-3.5, the models behind ChatGPT.
Lao Huang proudly said that the only GPU on the cloud that can actually handle ChatGPT is the HGX A100.
But compared with the A100, a server equipped with four pairs of H100s and dual-GPU NVLink is 10 times faster, because the H100 can reduce the processing cost of LLMs by an order of magnitude.
AI is at an inflection point as generative AI creates a wave of opportunity, causing inference workloads to grow in a step-function fashion.
In the past, designing a cloud data center to handle generative AI was a huge challenge.
On the one hand, a single accelerator would ideally keep the data center elastic; on the other hand, no single accelerator can optimally handle the diversity of algorithms, models, data types, and sizes. NVIDIA's One Architecture platform offers both acceleration and flexibility.
Today, NVIDIA announced the launch of a new inference platform. Each configuration is optimized for a certain type of workload.
For example, for AI video workloads, NVIDIA has launched L4, which has optimized video decoding and transcoding, video content review, and video calling functions.
And an 8-GPU L4 server will replace more than a hundred dual-socket CPU servers used to process AI videos.
At the same time, NVIDIA also launched L40 for generative AI such as Omniverse, graphics rendering, and text-to-image/video conversion. Its performance is 10 times that of Nvidia’s most popular cloud inference GPU T4.
Currently, the powerful capabilities of the Gen-1 and Gen-2 generative AI models launched by Runway rely on NVIDIA GPUs.
In addition, NVIDIA also launched the new Grace Hopper superchip, which is suited to recommendation systems and vector databases.
In the chip field, NVIDIA teamed up with TSMC, ASML and Synopsys and, after four years of work, finally achieved a major breakthrough in computational lithography: the NVIDIA cuLitho computational lithography library.
As chipmaking approaches the physical limits of the 2nm process, lithography is the breakthrough point.
Computational lithography simulates the behavior of light as it passes through the optics and interacts with the photoresist. By applying inverse physics algorithms, it predicts the pattern needed on the mask so that the desired final pattern is produced on the wafer.
Computational lithography is the largest computing workload in chip design and manufacturing, consuming tens of billions of CPU hours every year. In contrast, the new algorithm created by NVIDIA allows increasingly complex computational lithography workflows to be executed in parallel on GPUs.
In summary, cuLitho can not only increase computing speed by 40 times, but also reduce power consumption by as much as 9 times.
For example, Nvidia’s H100 requires 89 masks.
Processed on CPUs, each mask takes two weeks; with cuLitho running on GPUs, a mask takes only 8 hours.
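As a rough back-of-the-envelope check of those figures (treating the masks as if they were processed one after another, which a real fab would not do), the per-mask numbers line up with the claimed 40x speedup:

```python
masks = 89                       # mask layers for the H100, per the keynote
cpu_hours_per_mask = 14 * 24     # "two weeks" per mask on CPU farms
gpu_hours_per_mask = 8           # 8 hours per mask with cuLitho on GPUs

cpu_total = masks * cpu_hours_per_mask   # ~29,900 hours if run serially
gpu_total = masks * gpu_hours_per_mask   # ~712 hours if run serially
print(f"speedup: {cpu_total / gpu_total:.0f}x")  # prints "speedup: 42x", in line with the ~40x claim
```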
TSMC can also use 4,000 Hopper GPUs in 500 DGX H100 systems to complete work that previously required as many as 40,000 CPU-based servers, with power consumption dropping from 35MW to 5MW.
It is worth noting that the cuLitho acceleration library is also compatible with Ampere and Volta architecture GPUs, but Hopper is the fastest solution.
Lao Huang said that with lithography technology already at the limits of physics, fabs can now increase production and prepare for 2nm and beyond.
AI’s iPhone Moment
In the past few months, ChatGPT has been on the verge of setting off the fourth technological revolution. The saying “We are in the iPhone moment of AI” has also been widely circulated.
At the GTC conference, Lao Huang excitedly repeated this sentence three times.
The iPhone moment is coming, startups such as OpenAI are competing to build disruptive products and business models, while established companies like Google and Microsoft are looking for ways to deal with it.
All of these moves stem from the sense of urgency that generative AI has triggered around the world to formulate an AI strategy.
NVIDIA accelerated computing started with the DGX AI supercomputer, which is also the engine behind the current breakthroughs in large-scale language models.
At GTC, Lao Huang proudly recalled that he personally delivered the world's first DGX to OpenAI.
Since then, half of the Fortune 100 companies have installed DGX AI supercomputers.
DGX is equipped with 8 H100 GPU modules, and H100 is equipped with a Transformer engine, which can handle amazing models like ChatGPT.
Eight H100 modules are connected to each other through NVLINK Switch, achieving comprehensive non-blocking communication. Eight H100s work together like a giant GPU.
What makes Lao Huang even more excited is that Microsoft announced that Azure will open a private preview version of its H100 AI supercomputer.
He also said, "The DGX supercomputer is a modern AI factory. We are in the iPhone moment of AI."
Over the past decade, the combination of acceleration and vertical scaling has enabled applications to achieve million-fold performance improvements.
The most impressive example is the introduction of the AlexNet deep learning model in 2012.
At that time, Alex Krizhevsky, Ilya Sutskever, and Hinton trained it on 14 million images using GeForce GTX 580s, a job that consumed 262 petaFLOPs of floating point operations.
Ten years later, Transformer was released.
Ilya Sutskever trained GPT-3 to predict the next word, which required a million times more floating point operations than training AlexNet.
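Taking those two figures at face value, a quick multiplication shows the scale of that jump; the ~2.6×10^23 FLOPs it implies is in the same ballpark as published estimates of roughly 3×10^23 FLOPs for training GPT-3 175B.

```python
alexnet_flops = 262e15            # ~262 petaFLOPs of total training compute, per the keynote
gpt3_flops = alexnet_flops * 1e6  # "a million times more" floating point operations
print(f"Implied GPT-3 training compute: {gpt3_flops:.2e} FLOPs")  # ~2.62e+23 FLOPs
```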
Thus, an AI that shocked the world was created - ChatGPT.
To summarize in Lao Huang’s words:
This means that a new computing platform has been born, and the "iPhone moment" of AI has arrived. Accelerated computing and AI technology have entered the real world.
The acceleration library is the core of accelerated computing. These acceleration libraries connect various applications to various industries, forming a network of networks.
After 30 years of development, thousands of applications have been accelerated by NVIDIA's libraries, covering almost every field of science and industry.
Currently, all NVIDIA GPUs are compatible with CUDA.
The existing 300 acceleration libraries and 400 AI models cover a wide range of fields such as quantum computing, data processing, and machine learning.
At this GTC conference, Nvidia announced that it has updated 100 of them.
The NVIDIA Quantum platform consists of libraries and systems that allow researchers to advance quantum programming models, system architectures and algorithms.
cuQuantum is an acceleration library for quantum circuit simulation. Companies such as IBM and Baidu have integrated this acceleration library into their simulation frameworks.
CUDA Quantum, now open-sourced, is NVIDIA's hybrid GPU-quantum programming model.
Nvidia also announced the launch of a quantum control link, developed in partnership with Quantum Machines. It can connect Nvidia GPUs to quantum computers to perform error correction at extremely fast speeds.
There is also a new RAFT library launched to speed up indexing, data loading and nearest neighbor search.
In addition, NVIDIA also announced DGX Quantum, built with DGX and leveraging the latest open-source CUDA Quantum. This new platform gives researchers working on quantum computing a revolutionary high-performance, low-latency architecture.
NVIDIA also launched NVIDIA Triton Management Service software to automatically scale and orchestrate Triton inference instances throughout the data center. Suitable for multi-GPU and multi-node inference of large language models like GPT-3.
CV-CUDA for computer vision and VPF for video processing are Nvidia’s new cloud-scale acceleration libraries.
Lao Huang announced that CV-CUDA Beta optimizes pre-processing and post-processing, achieving higher cloud throughput and reducing costs and energy consumption by a quarter.
Currently, Microsoft uses CV-CUDA for visual search, and Runway uses the CV-CUDA and VPF libraries for its generative AI video processing.
In addition, NVIDIA accelerated computing has helped genomics reach a milestone: with NVIDIA-powered instruments, the cost of whole-genome sequencing has been brought down to $100.
NVIDIA Parabricks acceleration libraries can be used for end-to-end genomic analysis in the cloud or inside the instrument, and work with various public cloud and genomics platforms.
Now, ChatGPT, Stable Diffusion, DALL-E and Midjourney have awakened the world's awareness of generative AI.
The popular ChatGPT has exceeded 100 million monthly users just 2 months after its launch, and has become the application with the fastest user growth in history.
You could say it is a new kind of computer: it can generate text, write poetry, rewrite research papers, solve math problems, and even program.
Many breakthrough results have created today’s generative AI.
Transformer is able to learn context and meaning from the relationships and dependencies of data in a massively parallel way. This enables LLMs to learn from massive amounts of data and perform downstream tasks without explicit training.
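As an illustration of that parallelism, here is a minimal single-head self-attention sketch in plain NumPy (a toy example with made-up dimensions, not any particular production model): every token's relationship to every other token is computed at once as one matrix product.

```python
import numpy as np

def self_attention(Q, K, V):
    """Minimal scaled dot-product attention over one head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # pairwise token-to-token relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over each row
    return weights @ V                                # each output mixes information from all tokens

# Toy example: a "sentence" of 4 tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = self_attention(x, x, x)   # self-attention: queries, keys and values all come from x
print(out.shape)                # (4, 8)
```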
In addition, a diffusion model inspired by physics can generate images through unsupervised learning.
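The "inspired by physics" part refers to a diffusion process: noise is gradually added to data until only noise remains, and a network learns to run the process in reverse. Below is a minimal sketch of the forward (noising) step of a DDPM-style model, with an illustrative noise schedule rather than the settings of any specific product.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)     # noise schedule (illustrative values)
alpha_bar = np.cumprod(1.0 - betas)    # cumulative fraction of the original signal kept

rng = np.random.default_rng(0)
x0 = rng.normal(size=(8, 8))           # stand-in for a clean image
t = 500
noise = rng.normal(size=x0.shape)
xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * noise   # noisy image at step t
# Training teaches a network to predict `noise` from `xt`; generation then starts from
# pure noise and removes the predicted noise step by step until an image emerges.
```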
Lao Huang concluded that in just over a decade, we have gone from recognizing cats to generating images of cats in spacesuits walking on the moon.
It can now be said that generative AI is a new kind of computer, a computer that can be programmed in human language.
Previously, ordering the computer to solve problems was the exclusive privilege of programmers, but now, everyone can be a programmer.
Like Bill Gates, Huang also gave a similar definition: Generative AI is a new computing platform, similar to PCs, the Internet, mobile devices and the cloud.
Through Debuild, we can directly design and deploy web applications as long as we clearly state what we want.
It is clear that generative AI will reshape nearly every industry.
In this context, professional companies need to use their own proprietary data to build customized models.
Then, Huang proudly announced that the industry needs a foundry similar to TSMC to build customized large language models, and NVIDIA is this "TSMC"!
At the conference, NVIDIA announced the launch of NVIDIA AI Foundations cloud service, allowing customers to customize LLM and generative AI.
This cloud service includes language, vision and biological model production services.
Among them, NeMo is used to build customized text-to-text generative language models.
Picasso is a visual-model-building service that can be used to train custom models for image, video and 3D applications.
Just send an API call with a text prompt and metadata to Picasso; it runs the model on DGX Cloud and sends the generated material back to the application.
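In rough terms, such a call would look something like the sketch below. The endpoint URL and field names are hypothetical placeholders for illustration; the article does not give the actual Picasso API, so only the general shape (text prompt plus metadata in, generated asset back) is taken from it.

```python
import requests

# Hypothetical endpoint and field names, for illustration only.
PICASSO_URL = "https://api.example.com/picasso/generate"

payload = {
    "prompt": "a photorealistic warehouse interior at dusk",    # text prompt
    "metadata": {"format": "png", "resolution": "1024x1024"},   # request metadata
}
response = requests.post(PICASSO_URL, json=payload, timeout=120)
response.raise_for_status()
asset_url = response.json().get("asset_url")   # generated material returned to the application
print(asset_url)
```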
What’s even more amazing is that by importing these materials into NVIDIA Omniverse, you can build realistic Metaverse applications and digital twin simulations.
In addition, NVIDIA is also working with Shutterstock to develop Edify-3D generative models.
At the same time, the cooperation between NVIDIA and Adobe continues to expand, integrating generative AI into the daily workflow of marketers and creative people, and paying special attention to the protection of artists' copyrights.
The third field is biology.
Today, the drug research and development industry is worth nearly 2 trillion US dollars, with R&D investment of up to 250 billion US dollars.
NVIDIA Clara is a medical and health application framework for imaging, instrumentation, genomic analysis and drug development.
Recently, a popular direction in the life sciences has been to use generative AI to discover disease targets and design new molecules or protein drugs.
Correspondingly, BioNeMo lets users use proprietary data to create, fine-tune, and serve customized models, including protein prediction models such as AlphaFold, ESMFold, and OpenFold.
Finally, Huang concluded that NVIDIA AI Foundations is a cloud service and foundry for building custom language models and generative AI.
Lao Huang's cloud service: US$36,999 a month to rent
Nvidia also launched a cloud service this time.
It keenly saw the need for customers to access NVIDIA AI more easily and quickly, so it launched NVIDIA DGX Cloud.
DGX Cloud partners with Microsoft Azure, Google GCP and Oracle OCI. With just a browser, every company can instantly access an NVIDIA DGX AI supercomputer!
On this cloud, you can run the NVIDIA AI Enterprise acceleration library suite to directly solve the end-to-end development and deployment of AI.
Moreover, this cloud service is offered not only by NVIDIA itself but also through the world's major cloud service providers.
The first NVIDIA DGX Cloud is hosted on Oracle Cloud Infrastructure (OCI).
In OCI, NVIDIA's ConnectX-7 and BlueField-3 join forces to form a powerful supercomputer.
According to reports, enterprises can now rent DGX Cloud, starting at $36,999 per month.
Finally, of course, there is the fixture of every GTC conference: Omniverse. Lao Huang announced updates to the Omniverse metaverse platform.
Now, Microsoft and NVIDIA are preparing to bring Omniverse to hundreds of millions of Microsoft 365 and Azure users.
In addition, there are reports that, to allow exports to China in compliance with regulations, Lao Huang followed the precedent of the A800 and tuned an "H800", cutting the chip-to-chip data transfer rate to about 50% of the H100's.
In summary, Lao Huang made it quite clear at this conference that NVIDIA wants to be the TSMC of the AI field, offering foundry services the way a fab does, on top of which other companies in the industry can train their own models.
Can this business model be successful?