Galaxy AI Network: The Answer to Transport Capacity in the Era of Large Models

As the value of large pre-trained AI models continues to emerge, the models themselves keep growing. Industry and academia have reached a consensus: in the AI era, computing power is productivity.

This understanding is correct, but it is incomplete. Digital systems rest on three pillars: storage, computing, and networking, and the same is true of AI technology. If computing power is considered without storage and networking, large models are left standing on a single pillar. Network infrastructure adapted to large models, in particular, has not yet received the attention it needs.

Large AI models routinely involve "tens of thousands of cards for training", "tens of thousands of miles of deployment", and "trillions of parameters". Against that background, network transport capacity is a link in the intelligent system that cannot be ignored. The challenges it faces are prominent, and they are waiting for an answer that breaks the deadlock.


Wang Lei, President of Huawei Data Communication Product Line

On September 20, a data communications summit themed "Galaxy AI Network, Accelerating Industry Intelligence" was held during Huawei Connect 2023. Representatives from many industries discussed the transformation and development trends of AI network technology. At the summit, Wang Lei, President of Huawei's Data Communication Product Line, officially launched the Galaxy AI Network solution. He said that large models make AI smarter, but training a large model is very expensive, and the cost of AI talent must also be considered. Therefore, as industries become intelligent, AI can reach thousands of industries only if large computing clusters are built centrally and offered to society as intelligent computing cloud services. Aimed at the intelligent era, Huawei's new-generation Galaxy AI Network solution builds a network infrastructure with ultra-high throughput, long-term stability and reliability, and elastic high concurrency, helping AI benefit everyone and accelerating industry intelligence.

This is a good opportunity to look at the network challenges that the rise of large models brings to intelligent computing data centers, and at why Huawei presents Galaxy AI Network as the optimal solution to them.

In the AI era, a single model, a single piece of data, or a single computing unit can each be regarded as a point of starlight. Only by connecting them efficiently and stably can a brilliant intelligent world take shape.

The explosion of large models triggered a hidden network torrent

The life cycle of an AI model is divided into two stages: training and inference deployment. With the rise of large pre-trained models, both stages now face major network challenges.

The first is the training stage. As model scale and parameter counts keep growing, training a large model now requires computing clusters of thousands or even tens of thousands of accelerator cards. This also means that large-model training has to take place in data centers equipped with AI computing power.

At present, intelligent computing data centers are very expensive. According to industry data, building a cluster with 100P of computing power costs about 400 million yuan. One well-known international large model reportedly spent 700,000 US dollars per day on computing power during training.

If the data center network does not connect the cluster smoothly, and a large share of computing resources is lost waiting on network transmission, the losses to the data center and to the AI models it trains are substantial. Conversely, if cluster training becomes more efficient at the same computing power scale, the data center gains a significant business opportunity. Network factors such as link load rate directly determine the training efficiency of an AI model. In addition, as AI computing clusters grow, their complexity grows with them, and so does the probability of failure. Building a cluster network that stays stable and reliable over the long term is therefore an important lever for improving a data center's input-output ratio.
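To make the cost argument concrete, here is a minimal back-of-the-envelope sketch. It assumes a simple linear model in which any time the accelerators spend waiting on the network is spend that buys no training progress. The `useful_fraction` parameter and the sample values are hypothetical; only the 700,000 US dollar daily figure comes from the article.

```python
# Back-of-the-envelope sketch: how much of a daily compute budget is spent
# waiting on the network. The linear model is an assumption for illustration.

DAILY_COMPUTE_COST_USD = 700_000  # daily training spend cited in the article


def wasted_spend_per_day(daily_cost_usd: float, useful_fraction: float) -> float:
    """Return the daily spend that buys no training progress.

    useful_fraction is the hypothetical share of wall-clock time in which the
    accelerators do useful work rather than waiting on communication.
    """
    if not 0.0 <= useful_fraction <= 1.0:
        raise ValueError("useful_fraction must be between 0 and 1")
    return daily_cost_usd * (1.0 - useful_fraction)


if __name__ == "__main__":
    # Hypothetical useful-work fractions, not measurements.
    for fraction in (0.5, 0.8, 0.95):
        wasted = wasted_spend_per_day(DAILY_COMPUTE_COST_USD, fraction)
        print(f"useful fraction {fraction:.0%}: about ${wasted:,.0f} per day idle")
```

Real clusters overlap computation and communication, so the true relationship is not linear; the sketch only shows why small efficiency gains matter at this scale of spending.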


Outside the data center, the value of the AI network also shows in the inference deployment of AI models. Inference deployment of large models relies mainly on cloud services, and cloud service providers must serve as many customers as possible with limited computing resources in order to maximize the commercial value of large models. The more users there are, the more complex the cloud network becomes. Providing stable network services over the long term has become a new challenge for cloud service providers.

In addition, in the last mile of AI inference deployment, government and enterprise users need better network quality. In real scenarios, a 1% link packet loss rate can cut TCP performance by a factor of 50, which means that a 100 Mbps broadband link actually delivers less than 2 Mbps. Only by improving the network capabilities of the application scenario itself can AI computing power flow smoothly and AI become truly inclusive.
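The article does not say how the 50x figure was measured. As a rough cross-check, the following sketch uses the well-known Mathis approximation for loss-limited TCP throughput, roughly (MSS / RTT) x (C / sqrt(p)) with C = sqrt(3/2). The segment size and round-trip times below are assumptions chosen for illustration, not values from the article.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_seconds: float, loss_rate: float) -> float:
    """Approximate loss-limited TCP throughput (Mathis model), in bits per second."""
    if rtt_seconds <= 0 or not 0 < loss_rate <= 1:
        raise ValueError("rtt_seconds must be > 0 and loss_rate in (0, 1]")
    constant = math.sqrt(1.5)  # C = sqrt(3/2) in the Mathis approximation
    return (mss_bytes * 8 / rtt_seconds) * (constant / math.sqrt(loss_rate))


def achievable_bps(link_bps: float, mss_bytes: int, rtt_seconds: float, loss_rate: float) -> float:
    """A single flow can never exceed the link rate, so cap the estimate."""
    return min(link_bps, mathis_throughput_bps(mss_bytes, rtt_seconds, loss_rate))


if __name__ == "__main__":
    link_bps = 100e6   # the 100 Mbps link from the article
    mss = 1460         # assumed Ethernet-sized TCP segment
    for rtt_ms in (20, 50, 100):  # assumed round-trip times
        rate = achievable_bps(link_bps, mss, rtt_ms / 1000, loss_rate=0.01)
        print(f"RTT {rtt_ms} ms, 1% loss: about {rate / 1e6:.1f} Mbps")
```

Under this model throughput falls with the square root of the loss rate and linearly with round-trip time, so the article's "less than 2 Mbps" is plausible for wide-area paths with tens of milliseconds of latency.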

As this shows, every stage in the creation, transmission, and application of large AI models brings network challenges and a need for network upgrades. The transport capacity problem of the large-model era urgently needs solving.

A network breakthrough for the intelligent era: from starlight to galaxy

The rise of large models has created a network problem that spans multiple links and the full process. It therefore has to be addressed systematically.

Huawei has proposed a new network infrastructure for intelligent computing cloud services. It needs to support three capabilities: "high-efficiency training", "non-stop computing power", and "inclusive AI services". Together these cover the full range of large-model scenarios, from training to inference deployment. Rather than meeting a single need or upgrading a single technology, Huawei is pushing the iteration of the AI network as a whole, which gives the industry a distinctive way forward.

Specifically, the network infrastructure of the AI era needs the following capabilities:

First, in training scenarios the network needs to maximize the value of the AI computing cluster. A network with ultra-large-scale connection capability makes it possible to train large AI models efficiently.

Second, to keep AI tasks stable and sustainable, the network must be reliable over the long term so that month-long training runs are not interrupted. It must also support fault demarcation, localization, and recovery within seconds, so that any interruption to training is kept to a minimum. This is the "non-stop computing power" capability.

Third, during AI inference deployment the network must be elastic and support high concurrency, so that it can intelligently orchestrate massive user flows and deliver the best possible experience of AI applications. It must also resist the effects of network degradation and keep AI computing power flowing smoothly between regions. This is the "inclusive AI services" capability.
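The "non-stop computing power" requirement can be motivated with a small probability sketch. It assumes, purely for illustration, that components fail independently with the same probability over a training run; the probabilities, failure counts, and recovery times below are hypothetical, not figures from the article.

```python
def prob_any_failure(per_component_failure_prob: float, component_count: int) -> float:
    """Probability that at least one of n independent components fails."""
    if not 0.0 <= per_component_failure_prob <= 1.0 or component_count < 0:
        raise ValueError("invalid probability or component count")
    return 1.0 - (1.0 - per_component_failure_prob) ** component_count


def lost_hours(failure_count: int, recovery_seconds: float) -> float:
    """Total training time lost if every failure costs one recovery period."""
    return failure_count * recovery_seconds / 3600.0


if __name__ == "__main__":
    # Hypothetical per-link failure probability over one training run.
    p = 0.0001
    for n in (1_000, 10_000, 100_000):
        print(f"{n:>7} links: P(at least one failure) = {prob_any_failure(p, n):.3f}")

    # Hypothetical: 20 failures in a month, hour-level versus second-level recovery.
    print(f"1 h recovery : {lost_hours(20, 3600):.1f} h lost")
    print(f"10 s recovery: {lost_hours(20, 10):.2f} h lost")
```

The first loop shows why failures become near-certain as the component count grows, and the second shows why recovery time, rather than failure avoidance alone, dominates how much training time is lost.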

Following this line of thinking, Huawei launched the Galaxy AI Network solution. The solution brings dispersed AI technologies together and, through powerful computing capabilities, joins them into a galaxy-like network.

Galaxy AI Network: a transport capacity answer for the large-model era

During Huawei Connect 2023, Huawei shared its vision of accelerating the creation of large AI models through large computing power, large storage capacity, and large transport capacity. The new-generation Galaxy AI Network solution is Huawei's answer to the need for large transport capacity in the intelligent era.

For intelligent data centers, Huawei Galaxy AI Network is the optimal solution based on network power.


Its ultra-high-throughput network helps AI clusters in intelligent computing centers raise the network load rate and improve training efficiency. Specifically, Galaxy AI Network intelligent computing switches offer what Huawei describes as the industry's highest-density 400GE and 800GE ports. A two-layer switching network is enough to build a non-converged (non-oversubscribed) cluster network of 18,000 cards, which supports training of large models with more than one trillion parameters. Fewer network layers mean the data center saves substantially on optical modules, while network risks become more predictable and large-model training becomes more stable.
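The link between port density and a two-layer network can be sketched with standard leaf-spine arithmetic. In a non-oversubscribed design, each leaf switch uses half its ports for accelerator cards and half for uplinks, and each spine port connects to one leaf. The port counts below are illustrative assumptions picked only because the result lands near the article's 18,000-card figure; they are not Huawei specifications.

```python
def max_endpoints_two_tier(leaf_ports: int, spine_ports: int) -> int:
    """Endpoints supported by a non-oversubscribed two-tier leaf-spine fabric.

    Each leaf splits its ports evenly between downlinks (to cards) and uplinks
    (one to each spine). Each spine needs one port per leaf, so the number of
    leaves is bounded by the spine port count.
    """
    if leaf_ports < 2 or spine_ports < 1:
        raise ValueError("need at least 2 leaf ports and 1 spine port")
    downlinks_per_leaf = leaf_ports // 2
    max_leaves = spine_ports
    return max_leaves * downlinks_per_leaf


if __name__ == "__main__":
    # Hypothetical port counts, not taken from the article.
    for leaf_ports, spine_ports in ((32, 64), (64, 128), (64, 576)):
        total = max_endpoints_two_tier(leaf_ports, spine_ports)
        print(f"leaf {leaf_ports} ports, spine {spine_ports} ports: {total} endpoints")
```

With lower-density switches the same endpoint count would need a third switching layer, which is where the extra optical modules and additional failure points come from.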

Galaxy AI Network supports network-level load balancing (NSLB), which raises the load rate from 50% to 98%. This is equivalent to "overclocking" the AI cluster: training efficiency improves by 20%, meeting the goal of efficient training.
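The article does not describe how NSLB works internally. The following toy model only illustrates the general reason that globally coordinated placement can beat per-flow hashing: when several equal-size flows hash onto the same uplink, the slowest flow gates the whole collective step. The random placement is a stand-in for hash-based ECMP, and all names and numbers are hypothetical.

```python
import random
from collections import Counter


def hashed_assignment(flow_count: int, link_count: int, seed: int) -> list[int]:
    """Stand-in for per-flow hashing: each flow lands on a pseudo-random link."""
    rng = random.Random(seed)
    return [rng.randrange(link_count) for _ in range(flow_count)]


def balanced_assignment(flow_count: int, link_count: int) -> list[int]:
    """Globally coordinated placement: spread flows round-robin over links."""
    return [flow % link_count for flow in range(flow_count)]


def effective_utilization(assignment: list[int], link_count: int) -> float:
    """Fabric utilization when every flow is held to the pace of the slowest one.

    Assumes equal-size flows that could each fill a link. A link carrying k
    flows gives each 1/k of its rate, and the collective step finishes only
    when the most crowded link drains.
    """
    max_flows_on_a_link = max(Counter(assignment).values())
    return len(assignment) / (link_count * max_flows_on_a_link)


if __name__ == "__main__":
    links = 8
    hashed = hashed_assignment(flow_count=8, link_count=links, seed=1)
    balanced = balanced_assignment(flow_count=8, link_count=links)
    print(f"hashed  : {effective_utilization(hashed, links):.0%}")
    print(f"balanced: {effective_utilization(balanced, links):.0%}")
```

In this model a single two-flow collision already halves effective utilization, which gives some intuition for why load rates under hashing can sit far below what the links could carry.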


For cloud service providers, Galaxy AI Network provides a stable and reliable guarantee of computing power.

In DCI (data center interconnect) scenarios, it provides functions such as multi-path intelligent scheduling, automatically identifying and proactively adapting to peaks in service traffic. It can distinguish large and small flows among millions of data flows and distribute them sensibly across 100,000 paths, achieving zero network congestion and providing an elastic guarantee for high-concurrency intelligent computing cloud services.
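The scheduling algorithm itself is not described in the article. As a minimal sketch of the general idea, the code below classifies flows as large or small by a size threshold, then places large flows first and small flows afterwards, always onto the currently least-loaded path. The threshold, flow sizes, and function name are hypothetical.

```python
import heapq


def assign_flows(flow_sizes: list[float], path_count: int, large_threshold: float) -> list[float]:
    """Greedy placement: large flows first, each onto the least-loaded path.

    Returns the resulting load on every path.
    """
    if path_count < 1:
        raise ValueError("path_count must be at least 1")
    large = sorted((s for s in flow_sizes if s >= large_threshold), reverse=True)
    small = [s for s in flow_sizes if s < large_threshold]

    loads = [0.0] * path_count
    heap = [(0.0, path) for path in range(path_count)]  # (current load, path id)
    heapq.heapify(heap)

    for size in large + small:
        load, path = heapq.heappop(heap)   # least-loaded path so far
        loads[path] = load + size
        heapq.heappush(heap, (loads[path], path))
    return loads


if __name__ == "__main__":
    # Hypothetical mix: a few large flows and many small ones, in arbitrary units.
    flows = [900, 850, 400] + [5] * 40
    loads = assign_flows(flows, path_count=4, large_threshold=100)
    print("per-path load:", loads)
    print("max/min ratio:", max(loads) / min(loads))
```

Placing the large flows first keeps them from colliding on one path, and the small flows then fill in the remaining imbalance; a production scheduler would also have to react to flows arriving and finishing over time.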

For government and enterprise users, Galaxy AI Network copes with network degradation and helps make AI computing power inclusive.

It supports elastic anti-degradation capabilities in DCA computing scenarios. It uses FillP technology to optimize TCP, raising the bandwidth load rate from 10% to 60% at a 1% packet loss rate. This keeps computing power flowing smoothly from metropolitan areas to remote areas and accelerates the inclusive application of AI services.

In this way, the network requirements of every stage of a large model, from training to deployment, are addressed. From intelligent computing centers to thousands of industries, all gain a development fulcrum in network-based computing.

The intelligent era, a new technological age opened by large models, has only just begun. Galaxy AI Network offers an answer to the question of transport capacity in that era.


Statement
This article is reproduced from 搜狐 (Sohu). If there is any infringement, please contact admin@php.cn to have it removed.