IBM develops cloud-native AI supercomputer Vela to flexibly deploy and train models with tens of billions of parameters

ChatGPT has become hugely popular online, and the training of the AI models behind it has drawn widespread attention. IBM Research recently announced that Vela, a cloud-native supercomputer it developed, can be deployed quickly and used to train foundation models. Since May 2022, dozens of IBM researchers have been using this supercomputer to train AI models with tens of billions of parameters.

Foundation models are AI models trained on large amounts of unlabeled data. Their versatility means they can be adapted to a range of different tasks with only fine-tuning. They are enormous, and training them requires massive, costly computing power and a lot of time. As a result, experts say computing power will become the biggest bottleneck in developing the next generation of large-scale foundation models.

Training models with tens or hundreds of billions of parameters requires high-performance computing hardware, including high-speed networks, parallel file systems, and bare-metal nodes. This hardware is difficult to deploy and expensive to run. In May 2020, Microsoft announced an AI supercomputer built for OpenAI and hosted on its Azure cloud platform. IBM argues, however, that such systems are hardware-driven, which increases cost and limits flexibility.
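
To give a rough sense of why models of this size need so much hardware, the sketch below estimates the memory needed just to hold a model's training state. It assumes mixed-precision training with the Adam optimizer and uses the commonly cited rule of thumb of about 16 bytes per parameter (fp16 weights and gradients, plus fp32 master weights and two Adam moments). Activations, buffers, and framework overhead are ignored, so real requirements are higher. The 80 GB per-GPU figure matches the A100 GPUs IBM uses in Vela; the function names are hypothetical.

```python
import math

# Rule-of-thumb assumption for mixed-precision Adam, per parameter:
# 2 bytes (fp16 weights) + 2 bytes (fp16 gradients)
# + 4 bytes (fp32 master weights) + 4 + 4 bytes (Adam first and second moments).
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4  # = 16

GPU_MEMORY_GB = 80  # one 80 GB A100


def training_state_gb(num_params: float) -> float:
    """Approximate training-state memory in GB (1 GB = 1e9 bytes), excluding activations."""
    return num_params * BYTES_PER_PARAM / 1e9


def min_gpus_for_state(num_params: float) -> int:
    """Lower bound on GPUs needed just to hold the training state.

    Assumes the state is fully sharded across GPUs (ZeRO-3/FSDP style);
    plain data parallelism would replicate it on every GPU instead.
    """
    return math.ceil(training_state_gb(num_params) / GPU_MEMORY_GB)


for params in (10e9, 100e9):
    print(f"{params / 1e9:.0f}B params: ~{training_state_gb(params):,.0f} GB, "
          f">= {min_gpus_for_state(params)} x 80 GB GPUs")
```

Under these assumptions, even a 10-billion-parameter model's training state (about 160 GB) exceeds a single 80 GB GPU, and a 100-billion-parameter model needs at least 20 such GPUs before activations are counted, which is why training is spread across many GPUs and nodes.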

Cloud AI Supercomputer

So IBM created a system called Vela that is “specifically focused on large-scale AI.”

Vela can be deployed into any of IBM's cloud data centers as needed, and it is itself a "virtual cloud." Although this approach gives up some computing performance compared with building a physical supercomputer, it produces a more flexible solution. A cloud-based system gives engineers access to resources through APIs, easier access to the broader IBM Cloud ecosystem for deeper integration, and the ability to scale performance as needed.

IBM engineers explained that Vela can access datasets stored in IBM Cloud Object Storage instead of requiring a custom storage back end. Previously, this kind of infrastructure had to be built into supercomputers separately.

The key components of any AI supercomputer are a large number of GPUs and the nodes that connect them. Vela configures each node as a virtual machine rather than as bare metal, even though bare metal is the more common approach and is widely considered the ideal one for AI training.

How is Vela built?

One drawback of cloud virtual machines is that their performance is not guaranteed. To avoid performance degradation and deliver bare-metal performance inside virtual machines, IBM engineers found a way to unlock the full performance of each node (including GPUs, CPUs, network, and storage) and keep the virtualization overhead below 5%.

This involved configuring the bare-metal host for virtualization, with support for virtual machine extensions (VMX), huge pages, and single-root I/O virtualization (SR-IOV), and faithfully representing all devices and connections inside the virtual machine, including how network cards are matched to CPUs and GPUs and how they are bridged to one another. After completing this work, they found that the performance of the virtual machine nodes was "close to bare metal."
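
The "less than 5%" figure is best read as a relative overhead: the share of bare-metal throughput lost when the same workload runs inside a virtual machine. A minimal sketch of that calculation follows; the function name and the throughput numbers are hypothetical placeholders, not IBM measurements.

```python
def virtualization_overhead(bare_metal_throughput: float, vm_throughput: float) -> float:
    """Fraction of bare-metal throughput lost inside the VM (0.05 means 5%)."""
    if bare_metal_throughput <= 0:
        raise ValueError("bare-metal throughput must be positive")
    return (bare_metal_throughput - vm_throughput) / bare_metal_throughput


# Hypothetical example: training samples per second for the same job on each setup.
bare_metal = 1000.0
vm = 962.0
overhead = virtualization_overhead(bare_metal, vm)
print(f"overhead: {overhead:.1%}, under 5%: {overhead < 0.05}")
```

Measuring throughput on the same job, rather than comparing peak hardware specifications, is what makes such a comparison meaningful.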

In addition, they designed the AI nodes with large GPU memory and plenty of local storage for caching AI training data, models, and other artifacts. In tests using PyTorch, they found that by optimizing workload communication patterns, they could also work around the bottleneck of Ethernet, which is relatively slow compared with the faster networks, such as InfiniBand, used in supercomputing.
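
The article does not say which communication optimizations IBM applied. As a hedged illustration of why communication patterns matter on a slower network, the sketch below uses the standard bandwidth-only ring all-reduce cost model, in which each of N workers sends about 2(N-1)/N times the gradient size per synchronization. It compares synchronizing gradients every step with synchronizing once every k steps through gradient accumulation, one common technique assumed here for illustration. The 100 Gb/s link speed, the 16-node count, and the 20 GB gradient size (10 billion fp16 parameters) are hypothetical.

```python
def ring_allreduce_bytes_per_worker(message_bytes: float, num_workers: int) -> float:
    """Bytes each worker sends in one ring all-reduce (reduce-scatter + all-gather)."""
    return 2 * (num_workers - 1) / num_workers * message_bytes


def comm_seconds_per_step(message_bytes: float, num_workers: int,
                          link_gbps: float, sync_every: int = 1) -> float:
    """Average bandwidth-bound all-reduce time per training step.

    Treats each node as one worker with a single link and ignores latency
    and overlap with computation. sync_every > 1 models gradient
    accumulation, which all-reduces once every `sync_every` steps.
    """
    link_bytes_per_s = link_gbps * 1e9 / 8
    per_sync = ring_allreduce_bytes_per_worker(message_bytes, num_workers) / link_bytes_per_s
    return per_sync / sync_every


grad_bytes = 10e9 * 2  # hypothetical: 10B parameters, fp16 gradients
for k in (1, 4):
    t = comm_seconds_per_step(grad_bytes, num_workers=16, link_gbps=100, sync_every=k)
    print(f"sync every {k} step(s): ~{t:.2f} s of communication per step")
```

With these made-up numbers, synchronizing once every four steps cuts average communication time per step to a quarter, at the cost of a larger effective batch size. In PyTorch, the `no_sync()` context manager on `DistributedDataParallel` is one way to skip gradient synchronization on accumulation steps; whether IBM used it, or other techniques such as overlapping communication with computation, is not stated in the source.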

In terms of configuration, each Vela node has eight 80GB NVIDIA A100 GPUs, two second-generation Intel Xeon Scalable processors, 1.5TB of memory, and four 3.2TB NVMe drives. Vela can be deployed at any scale to any IBM Cloud data center around the world.
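
The per-node figures above add up as follows. This is simple arithmetic on the numbers reported in the article; the `VelaNode` class is a hypothetical container for illustration, not an IBM API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VelaNode:
    """Hypothetical container for the per-node hardware figures reported in the article."""
    gpus: int = 8
    gpu_memory_gb: int = 80      # NVIDIA A100, 80 GB each
    cpu_sockets: int = 2         # second-generation Intel Xeon Scalable
    dram_tb: float = 1.5
    nvme_drives: int = 4
    nvme_drive_tb: float = 3.2

    @property
    def total_gpu_memory_gb(self) -> int:
        return self.gpus * self.gpu_memory_gb

    @property
    def total_nvme_tb(self) -> float:
        return self.nvme_drives * self.nvme_drive_tb


node = VelaNode()
print(node.total_gpu_memory_gb, "GB of GPU memory per node")       # prints: 640 GB of GPU memory per node
print(round(node.total_nvme_tb, 1), "TB of local NVMe per node")   # prints: 12.8 TB of local NVMe per node
```

So each node offers 640 GB of GPU memory and 12.8 TB of local NVMe storage, the latter serving as the cache for training data, models, and other artifacts described above.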

IBM engineers said: "Having the right tools and infrastructure is a key factor in improving R&D efficiency. Many teams choose to follow the tried-and-true path of building traditional supercomputers for AI... We have been working on a better solution that provides the dual benefits of high-performance computing and high end-user productivity."

This article is reproduced from 51CTO.COM.