Huawei launches new architecture Ascend AI computing cluster to support large model training with over one trillion parameters

WBOY
2023-09-22 21:49:01

IT Home reported on September 20 that at Huawei Connect 2023, held that day, Wang Tao, Huawei's Executive Director, Chairman of the ICT Infrastructure Managing Board, and President of the Enterprise BG, officially released a new-architecture Ascend AI computing cluster, the Atlas 900 SuperCluster, which can support training of large models with more than one trillion parameters.

The new cluster uses Huawei's Galaxy AI intelligent computing switch, the CloudEngine XH16800. With its high-density 800GE ports, a two-tier switching network can form an ultra-large-scale cluster of 2,250 nodes (equivalent to 18,000 cards) with no oversubscription, that is, a non-blocking network.
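The node and card figures above imply a fixed number of cards per node, and the two-tier claim can be sanity-checked with the standard capacity formula for a leaf-spine fabric with no oversubscription. The following is a minimal sketch of that arithmetic only. The function names and the example switch radix of 64 are hypothetical and are not taken from Huawei's specifications; the article does not state the port count of the CloudEngine XH16800.

```python
def cards_per_node(total_cards: int, nodes: int) -> int:
    """Cards per node, assuming cards are spread evenly across nodes."""
    if nodes <= 0 or total_cards % nodes != 0:
        raise ValueError("cards must divide evenly across a positive node count")
    return total_cards // nodes


def max_endpoints_two_tier(radix: int) -> int:
    """Endpoint ports in a two-tier leaf-spine fabric with no oversubscription.

    Assumes identical switches with `radix` ports each (a hypothetical value).
    Every leaf uses half its ports for endpoints and half for uplinks, one
    uplink per spine, so there are radix/2 spines and at most `radix` leaves.
    """
    if radix <= 0 or radix % 2 != 0:
        raise ValueError("radix must be a positive even number")
    return radix * (radix // 2)


# Figures quoted in the article: 18,000 cards across 2,250 nodes.
print(cards_per_node(18_000, 2_250))  # 8

# Hypothetical radix, for illustration only.
print(max_endpoints_two_tier(64))  # 2048
```

The second function shows why port density matters: in this model the endpoint count grows with the square of the switch radix, so denser switches let a two-tier network reach a size that would otherwise need a third tier.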

The new cluster also adopts an innovative super-node architecture, which greatly improves large-model training capability. In addition, Huawei draws on its combined strengths in computing, networking, storage, and energy to improve system reliability at the device, node, cluster, and service levels, extending the period over which large-model training runs stably from days to months.

Huawei also released CANN 7.0, a more open and easier-to-use heterogeneous computing architecture. It is compatible with the industry's AI frameworks, acceleration libraries, and mainstream large models, and it exposes more of its underlying capabilities so that AI frameworks and acceleration libraries can call and manage computing resources more directly. This lets developers write custom high-performance operators and build large models that are differentiated and competitive.

Huawei has also upgraded the Ascend C programming language, simplifying operator implementation logic through a more efficient programming model. This greatly shortens the development cycle of fused operators and supports the rapid development of AI models and applications.

On the same day, the Huawei Cloud official website launched the Ascend AI cloud service "Hundred Models, Thousand Forms" (百模千态) zone for enterprises and developers worldwide. The zone hosts the industry's mainstream open-source large models, all adapted and optimized for the Ascend AI cloud service. It also provides a toolchain for application development. All development tools are hosted in the cloud, which removes the cumbersome configuration process and makes them available with one click, ready to use.

▲ Ascend AI cloud service zone

According to IT Home, as of July this year, Ascend AI clusters had supported the construction of artificial intelligence computing centers in 25 cities across China. Of these, the public computing power platforms of 7 cities were selected in the first batch of national "new generation artificial intelligence public computing power open innovation platforms."

At the same time, Ascend AI has built an ecosystem of more than 30 hardware partners and more than 1,200 ISVs, and has jointly launched more than 2,500 industry AI solutions, serving telecom carriers, internet companies, finance, and other industries at scale.

Statement:
This article is reproduced from sohu.com. If there is any infringement, please contact admin@php.cn to have it removed.