On the main AI battlefield, 10,000-card clusters are the standard configuration: China's domestic 10,000-card, 10,000-PFLOPS GPU cluster is here!

王林 · Original · 2024-07-10 15:07:26 · 751 views

Scaling laws continue to hold, and available computing power can hardly keep pace with the growth of large models. "Larger scale, more computing power, better results" has become an industry rule of thumb. It took only one year for mainstream large models to jump from tens of billions to 1.8 trillion parameters. Giants such as Meta, Google, and Microsoft have been building ultra-large clusters of more than 15,000 cards since 2022. The 10,000-card cluster has become the standard configuration on the main AI battlefield.

In China, however, only a handful of clusters built on domestically produced GPUs reach the 10,000-card scale. A very large 10,000-card cluster that is also highly general-purpose remains a gap in the industry.

So when a domestic 10,000-card, 10,000-PFLOPS GPU cluster made its debut, it naturally attracted widespread attention in the industry.

On July 3, Moore Threads announced in Shanghai a major upgrade to its flagship AI product, the KUAE intelligent computing cluster solution, expanding it from the current thousand-card level to the 10,000-card scale. The Moore Threads KUAE 10,000-card intelligent computing cluster is built on full-featured GPUs and aims to be a leading domestic general-purpose accelerated computing platform that can host 10,000 cards and deliver 10,000 PFLOPS of floating-point computing power. It is designed for training complex large models with trillions of parameters. This milestone sets a new benchmark for domestic GPU technology, helps domestic intelligent computing clusters make a new leap in computing capability, and will provide solid and reliable critical infrastructure for technological and application innovation, scientific research, and industrial upgrading in China's artificial intelligence field.

In addition, Moore Threads signed strategic agreements for three 10,000-card cluster projects with China Mobile Communications Group Qinghai Co., Ltd., China Unicom Qinghai Company, Beijing Dedao Xinke Group, China Energy Construction Co., Ltd. General Contracting Company, and Guilin Huajue Big Data Technology Co., Ltd. (listed in no particular order). The parties will work together to build domestic GPU clusters that are genuinely usable.

Zhang Jianzhong, founder and CEO of Moore Threads, said: "We are in the golden age of generative artificial intelligence. The convergence of technologies is driving the emergence of intelligence, and the GPU has become the innovation engine accelerating the arrival of a new wave of technology. Moore Threads is devoted to this historic process of creation, and is committed to providing accelerated computing infrastructure and one-stop solutions to the world, building an advanced accelerated computing platform for a digital world that integrates artificial intelligence and digital twins. As an important part of Moore Threads' full-stack AI strategy, this intelligent computing cluster can provide abundant computing power for the digital and intelligent transformation of every industry. It demonstrates Moore Threads' strength in technological innovation and engineering practice, and it will also become a new starting point for the development of the AI industry."

On the main AI battlefield, general-purpose 10,000-card computing power is standard

The future direction of large models still has to be tested by time, but several trends are already worth watching, and they make the core demand for computing power increasingly clear.

First, scaling laws will continue to hold.

Since scaling laws were proposed in 2020, they have revealed the "brute-force aesthetics" behind large-model development: combining computing power, algorithms, and data with accumulated experience produces leaps in model performance. This has become a recognized industry rule and continues to shape the trajectory of future large models. As scaling laws continue to hold, the industry needs general-purpose computing power that is large enough at a single site to keep up with rapid technological evolution.

Second, the Transformer architecture will not unify everything; it will continue to evolve and coexist with other architectures, forming a diversified technology ecosystem.

The evolution of generative AI does not rely on scale alone; innovation in technical architecture is equally crucial. Although the Transformer is currently the mainstream architecture, emerging architectures such as Mamba, RWKV, and RetNet keep raising computing efficiency and speeding up innovation. As technology iterates, the Transformer will not become the single unifying architecture. From dense to sparse models and on to fused multimodal models, each advance reflects a demand for higher-performance computing resources.

At the same time, the integration of AI, 3D, and HPC across technologies and domains continues to accelerate, pushing out the boundaries of fields such as spatial intelligence, physical AI, AI for Science (AI4Science), and world models. This makes the training and application environment of large models more complex and diverse, and the market increasingly needs a general-purpose accelerated computing platform that can support combined workloads such as AI+3D, AI+physical simulation, and AI+scientific computing.

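Across these trends, the 10,000-card cluster has become the standard on the main battlefield of AI model training. As compute requirements keep growing, large-model training urgently needs a "super factory": an accelerated computing platform that is both large and general-purpose, able to shorten training time and support fast iteration on model capabilities. International technology giants are actively deploying computing clusters of thousands or even more than 10,000 cards to keep their large-model products competitive. As model parameter counts grow from hundreds of billions to trillions, model capabilities become more general and the demand for underlying computing power grows further. Clusters of 10,000 cards or more have become the entry ticket to this round of large-model competition.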

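Building a 10,000-card cluster, however, is not simply a matter of stacking 10,000 GPU cards. It is a highly complex systems engineering project that involves many technical challenges, including ultra-large-scale network interconnect, efficient cluster computing, long-term stability, and high availability. It is difficult, but it is the right thing to do. Moore Threads hopes to build an accelerated computing platform with more than 10,000 cards that serves general-purpose scenarios, and to tackle the problem of large-model training first.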

KUAE: a domestic 10,000-card, 10,000-PFLOPS training platform for trillion-parameter models

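KUAE is Moore Threads' full-stack solution for intelligent computing centers. Built on full-featured GPUs, it is an integrated hardware-and-software, system-level computing solution that comprises infrastructure centered on the KUAE computing cluster, the KUAE cluster management platform (KUAE Platform), and the KUAE large-model service platform (KUAE ModelStudio). Delivered as one integrated package, it addresses the construction and operations-management problems of large-scale GPU computing power.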

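Drawing on its insight into AI computing requirements and forward-looking planning, Moore Threads can scale the KUAE intelligent computing cluster seamlessly from a thousand cards to 10,000 cards. The cluster is built around the core requirements for computing power in the large-model era: large enough in scale, general-purpose in computation, and compatible with the existing ecosystem. By integrating an ultra-large 10,000-card GPU cluster, extreme compute-efficiency optimization, and a highly stable operating environment, this new system project aims to set a new standard for domestic cluster computing capability.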

The KUAE 10,000-card intelligent computing solution has several core features.

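Ultra-large computing power, 10,000 cards and 10,000 PFLOPS: In terms of cluster computing performance, the new-generation KUAE intelligent computing cluster exceeds 10,000 cards in a single cluster and reaches 10 ExaFLOPS of floating-point computing power. This greatly raises the computing performance of a single cluster and provides a solid compute foundation for training large models with trillions of parameters. In terms of GPU memory and transfer bandwidth, the KUAE 10,000-card cluster reaches PB-level total GPU memory capacity, PB/s-level total inter-card interconnect bandwidth, and PB/s-level total inter-node interconnect bandwidth. Systematic co-optimization of compute, GPU memory, and bandwidth improves cluster computing performance across the board.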
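To make the headline figures concrete, the short Python sketch below converts the stated 10 ExaFLOPS into PFLOPS and derives the average per card that this would imply. The card count of exactly 10,000 is an assumption (the article only says "more than 10,000"), the function names are illustrative, and the article does not state the numeric precision behind the figure, so the per-card number is a rough average rather than a product specification.

```python
# Unit arithmetic for the cluster figures quoted in the article.
# Assumption: exactly 10,000 identical cards; the article says "more than 10,000".

PFLOPS_PER_EFLOPS = 1_000  # 1 ExaFLOPS = 1,000 PetaFLOPS


def cluster_pflops(exaflops: float) -> float:
    """Convert a cluster's ExaFLOPS rating to PetaFLOPS."""
    return exaflops * PFLOPS_PER_EFLOPS


def average_pflops_per_card(exaflops: float, cards: int) -> float:
    """Average PFLOPS per card implied by a cluster total."""
    if cards <= 0:
        raise ValueError("cards must be positive")
    return cluster_pflops(exaflops) / cards


if __name__ == "__main__":
    total = cluster_pflops(10.0)
    per_card = average_pflops_per_card(10.0, 10_000)
    print(f"cluster: {total:,.0f} PFLOPS, average per card: {per_card:.2f} PFLOPS")
```

Under these assumptions, "10,000 P" and "10 ExaFLOPS" are the same quantity, and the implied average is about 1 PFLOPS per card.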

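Month-long stable training: Stability is key to judging the performance of a cluster of more than 10,000 cards. Moore Threads states that the KUAE 10,000-card cluster has an average trouble-free running time of more than 15 days, can sustain stable large-model training for more than 30 days, and targets a weekly average effective training rate above 99%, well above the industry average. The company attributes this to a set of independently developed, predictable and diagnosable multi-level reliability mechanisms. These include automatic location, diagnosis, and prediction of software and hardware faults, which pinpoints faults within minutes; a multi-level checkpoint storage mechanism, which saves to memory within seconds and recovers training jobs within minutes; and a highly fault-tolerant, high-performance 10,000-card cluster management platform, which allocates resources and schedules jobs within seconds.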
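As a rough illustration of how figures like these relate to one another, the sketch below uses Young's classic first-order approximation for the checkpoint interval, sqrt(2 x checkpoint cost x mean time between failures), and estimates the resulting fraction of time spent on useful training. This is a textbook model, not Moore Threads' method. The 15-day failure interval comes from the article; the 5-second checkpoint cost and the 5-minute recovery time are assumed values chosen only to match the article's "second-level" and "minute-level" descriptions.

```python
import math


def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order estimate of the checkpoint interval, in seconds."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def effective_training_ratio(checkpoint_cost_s: float, mtbf_s: float,
                             recovery_s: float) -> float:
    """Approximate fraction of wall-clock time spent on useful training.

    First-order model: time is lost to writing checkpoints, and each failure
    costs the recovery time plus, on average, half an interval of redone work.
    """
    interval = young_interval(checkpoint_cost_s, mtbf_s)
    checkpoint_overhead = checkpoint_cost_s / interval
    failure_overhead = (interval / 2.0 + recovery_s) / mtbf_s
    return 1.0 - checkpoint_overhead - failure_overhead


if __name__ == "__main__":
    mtbf = 15 * 86_400      # 15 days between failures (figure from the article)
    checkpoint_cost = 5.0   # assumed: a "second-level" checkpoint save
    recovery = 300.0        # assumed: a "minute-level" job recovery
    interval = young_interval(checkpoint_cost, mtbf)
    ratio = effective_training_ratio(checkpoint_cost, mtbf, recovery)
    print(f"checkpoint every {interval / 60:.0f} min, effective ratio {ratio:.4f}")
```

With these assumed inputs the model suggests checkpointing about once an hour and an effective ratio above 99%, which is at least consistent with the stated target. Real clusters have correlated failures and other overheads that this model ignores.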

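Extreme optimization, ultra-high MFU: MFU (Model FLOPs Utilization) is a common metric for evaluating the training efficiency of large models and directly reflects end-to-end cluster training efficiency. The KUAE 10,000-card cluster has been optimized at the system software, framework, and algorithm levels, with a target effective compute efficiency (MFU) of up to 60%, which is on par with international levels. At the system software level, techniques such as aggressive optimization of compute and communication efficiency significantly improve the cluster's execution efficiency and performance. At the framework and algorithm level, the cluster supports a range of adaptive hybrid parallel strategies and efficient memory optimizations, and it can select and automatically configure the best parallel strategy for the application workload, which greatly improves training efficiency and memory utilization. For large models with very long sequences, the cluster uses optimizations such as CP (context parallelism) and RingAttention to reduce compute time and memory usage and improve cluster training efficiency.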
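The following Python sketch shows one common way MFU is computed, together with a back-of-the-envelope training-time estimate. It assumes the widely used approximation of about 6 FLOPs per parameter per training token for a dense Transformer, ignoring attention FLOPs. The token throughput and the 10-trillion-token dataset are hypothetical inputs chosen for illustration; only the 10 ExaFLOPS peak, the trillion-parameter scale, and the 60% target come from the article.

```python
SECONDS_PER_DAY = 86_400


def training_flops_per_token(n_params: float) -> float:
    """Approximate training FLOPs per token for a dense model (forward + backward)."""
    return 6.0 * n_params


def mfu(n_params: float, tokens_per_second: float, peak_flops: float) -> float:
    """Model FLOPs Utilization: useful model FLOPs/s divided by peak hardware FLOPs/s."""
    return training_flops_per_token(n_params) * tokens_per_second / peak_flops


def training_days(n_params: float, n_tokens: float, peak_flops: float,
                  mfu_value: float) -> float:
    """Estimated wall-clock days to train on n_tokens at a given MFU."""
    total_flops = training_flops_per_token(n_params) * n_tokens
    return total_flops / (peak_flops * mfu_value) / SECONDS_PER_DAY


if __name__ == "__main__":
    peak = 10e18     # 10 ExaFLOPS peak (figure from the article)
    params = 1e12    # a one-trillion-parameter dense model
    # Hypothetical measured throughput of one million tokens per second.
    print(f"MFU: {mfu(params, 1e6, peak):.0%}")
    # Hypothetical 10-trillion-token dataset at the article's 60% MFU target.
    print(f"days: {training_days(params, 1e13, peak, 0.60):.1f}")
```

Under these assumptions, a throughput of one million tokens per second corresponds to 60% MFU, and the hypothetical training run would take roughly 116 days. Mixture-of-experts models would need the active rather than total parameter count in place of `n_params`.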

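Versatile, general-purpose, and ecosystem-friendly: The KUAE 10,000-card cluster is a general-purpose accelerated computing platform whose compute capabilities are designed for general scenarios. It can accelerate large models of different architectures and modalities, such as LLM, MoE, multimodal, and Mamba models. Building on the efficient and easy-to-use MUSA programming language, full CUDA compatibility, and the automated migration tool Musify, it speeds up "Day0" migration of new models and delivers "Instant On" ecosystem adaptation, helping customers go live quickly.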

Working together to build a large-model application ecosystem

Building 10,000-card clusters takes a joint effort across the industry, so that innovative large-model applications can be deployed quickly and domestic computing power is "built to be used". At the launch event, Moore Threads signed strategic agreements with China Mobile Communications Group Qinghai Co., Ltd., China Unicom Qinghai Company, Beijing Dedao Xinke Group, China Energy Construction Co., Ltd. General Contracting Company, and Guilin Huajue Big Data Technology Co., Ltd. (in no particular order) for the Qinghai Zero Carbon Industrial Park 10,000-card cluster project, the Qinghai Plateau KUAE 10,000-card cluster project, and the Guangxi ASEAN 10,000-card cluster project, respectively.

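With Moore Threads' advanced KUAE full-stack intelligent computing solution, the parties will work together to build strong domestic intelligent computing platforms and accelerate the digital transformation and high-quality development of industry. The KUAE 10,000-card intelligent computing cluster projects mark another major step forward for domestic AI computing infrastructure and will bring new momentum to the digital economy in each region.

Moore Threads signs a strategic agreement with China Mobile Communications Group Qinghai Co., Ltd.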

Moore Threads signs a strategic agreement with China Unicom Qinghai Company and Beijing Dedao Xinke Group.

Moore Threads signs a strategic agreement with China Energy Construction Co., Ltd. General Contracting Company and Guilin Huajue Big Data Technology Co., Ltd.

After the signing ceremony, representatives of five partners, Wuwen Xinqiong, Qingcheng Jizhi, 360, Jingdong Yun (JD Cloud), and Zhipingfang, took the stage in turn to share how the Moore Threads KUAE intelligent computing cluster helps them innovate across different scenarios and fields, including large-model training, large-model inference, and embodied intelligence. Their accounts illustrated the role the KUAE intelligent computing cluster plays in practice.

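Moore Threads says it is willing to work with many industry partners, drawing on full-stack AI and using the 10,000-card intelligent computing clusters built with partners as a solid foundation, to accelerate the growth of the domestic intelligent computing ecosystem, broadly empower the digital economy across fields, and jointly open a new era of large models and generative AI. During WAIC, Moore Threads is exhibiting at the Shanghai World Expo Exhibition and Convention Center (Hall H2, D616) under the theme "Full-Stack AI Accelerating a Better World". All-in-one machines and AIGC applications are on display, and many industry partners are jointly demonstrating industry models and application solutions built on the KUAE intelligent computing cluster.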
