
Sangfor and Trend Technology join forces to use high-performance storage to support the development of large AI models

WBOY
2023-08-28 18:53:09

Recently, Beijing Trend Technology Co., Ltd. (hereinafter "Trend Technology") and Sangfor officially launched a joint solution. The solution combines Sangfor's EDS high-performance storage with Trend Technology's OrionX AI computing resource pooling software and Gemini AI training platform, integrating storage and computing resources to help users build efficient artificial intelligence platforms and manage and utilize AI resources effectively.


Specifically, the joint solution brings the following improvements to the construction of users' AI infrastructure:

1. A high-performance joint solution for a more efficient training platform

As the construction of large-scale AI models accelerates, users demand ever higher training efficiency. However, insufficient GPU computing resources and poor small-file read/write performance in the underlying storage force large numbers of training tasks to queue, and the shortage of both compute and storage slows down the entire AI training platform.

To solve this problem, the joint solution is optimized end to end. For the efficiency of the upper-layer training platform, Trend Technology's OrionX AI computing resource pooling software builds a pooled computing resource layer that lets users allocate GPU resources flexibly according to each task, providing capabilities such as GPU slicing, aggregation, remote invocation, oversubscription, task queuing, dynamic attach and release, and heterogeneous pooling of domestic chips. This fully meets the computing power requirements of various training tasks and accelerates their progress. At the same time, the scheduling capability of the Gemini AI training platform optimizes the platform's management mechanism, and under unified scheduling AI model training becomes more efficient.
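To make the pooling idea concrete, the sketch below shows what a training job might look like when it requests only a slice of a pooled GPU from Kubernetes. The resource keys (`example.com/vgpu-percent`, `example.com/vgpu-memory-gib`) and all values are hypothetical placeholders for illustration only, not the actual OrionX resource names.

```python
# Illustrative sketch only: a pod spec that asks a GPU pool for a fraction
# of one GPU. The resource keys below are hypothetical placeholders, not the
# real OrionX API; the pooling layer would decide which physical GPU
# (local or remote) actually serves the request.
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "train-job-1"},
    "spec": {
        "containers": [
            {
                "name": "trainer",
                "image": "pytorch/pytorch:latest",
                "command": ["python", "train.py"],
                "resources": {
                    "limits": {
                        # 50% of one GPU's compute and 8 GiB of its memory
                        "example.com/vgpu-percent": "50",
                        "example.com/vgpu-memory-gib": "8",
                    }
                },
            }
        ]
    },
}
```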

For the underlying storage, EDS's self-developed heuristic read-ahead mechanism and active-active metadata service solve the performance bottleneck: even with datasets at the tens-of-billions scale, it still delivers high-speed reads and writes. This reduces GPU wait time and improves throughput and training efficiency across repeated training iterations.
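As a rough illustration of the read-ahead idea (not the EDS implementation), the sketch below prefetches the next few small files whenever it detects a sequential access pattern, so a data loader spends less time blocked on storage. The class name and parameters are invented for this example.

```python
import concurrent.futures

class ReadAheadCache:
    """Toy heuristic read-ahead: when files are read in order, prefetch the
    next few into memory so the consumer does not wait on storage."""

    def __init__(self, file_list, window=8, workers=4):
        self.files = file_list      # ordered list of dataset file paths
        self.window = window        # how many files to prefetch ahead
        self.cache = {}             # path -> bytes
        self.pool = concurrent.futures.ThreadPoolExecutor(max_workers=workers)
        self.last_index = -1

    def read(self, index):
        path = self.files[index]
        data = self.cache.pop(path, None)
        if data is None:
            with open(path, "rb") as f:
                data = f.read()
        # Heuristic: a sequential access triggers prefetch of the next window.
        if index == self.last_index + 1:
            for nxt in range(index + 1, min(index + 1 + self.window, len(self.files))):
                nxt_path = self.files[nxt]
                if nxt_path not in self.cache:
                    self.pool.submit(self._prefetch, nxt_path)
        self.last_index = index
        return data

    def _prefetch(self, path):
        with open(path, "rb") as f:
            self.cache[path] = f.read()
```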


2. Capacity and performance can be expanded simultaneously to create a cost-effective storage solution

In day-to-day AI training, making a model more accurate usually requires massive amounts of images, text, and other data. This rapid data growth puts tremendous pressure on the capacity and performance of the underlying storage, and the costly, inefficient expansion model of traditional storage increasingly fails to keep up with performance and capacity needs.

Relying on fully self-developed technologies such as its matrix storage algorithm, EDS mitigates the space waste caused by write amplification when storing small files such as images, text, and video, maximizing the use of storage space; a cluster of just three nodes can meet the storage needs of a medium-sized AI training team. For performance scaling, the architectural advantages of software-defined storage allow EDS to expand capacity and performance simultaneously, flexibly keeping pace with the rapidly growing performance demands of AI workloads.
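The sketch below illustrates the general small-file aggregation idea in a few lines of Python: many small objects are appended into a single large segment file and located through an offset index, so each tiny file no longer wastes a full allocation unit. It is a conceptual example under simplified assumptions, not Sangfor's matrix storage algorithm.

```python
import json

class SmallFilePacker:
    """Toy small-file aggregation: pack many small objects into one large
    segment file and find them again via an in-memory offset index."""

    def __init__(self, segment_path):
        self.segment_path = segment_path
        self.index = {}          # name -> (offset, length)

    def put(self, name, data: bytes):
        with open(self.segment_path, "ab") as seg:
            offset = seg.tell()
            seg.write(data)
        self.index[name] = (offset, len(data))

    def get(self, name) -> bytes:
        offset, length = self.index[name]
        with open(self.segment_path, "rb") as seg:
            seg.seek(offset)
            return seg.read(length)

    def save_index(self, index_path):
        # Persist the index so the segment can be reopened later.
        with open(index_path, "w") as f:
            json.dump(self.index, f)
```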

3. Unified management and deep mining of the value of data

Before the solution was released, EDS had already achieved seamless integration with Trend Technology's Gemini AI training platform through interfaces such as NFS, CSI, and S3. Thanks to this deep adaptation, the Kubernetes container orchestration platform can allocate storage resources dynamically and quickly, so users can skip solution validation during deployment and launch AI training tasks right away. EDS also supports data interoperability across protocols: multiple types of clients can share one storage system, and the result data of each stage flows efficiently without being copied between storage systems, allowing users to make effective use of their results and mine the value of their data more conveniently.
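For readers unfamiliar with dynamic provisioning, the sketch below uses the official Kubernetes Python client to request a shared volume from a CSI-backed storage class. The storage class name `eds-nfs`, the claim name, and the size are hypothetical placeholders; the real class name depends on how the storage CSI driver is deployed.

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig
# (config.load_incluster_config() would be used inside the cluster).
config.load_kube_config()

# Request a shared, dynamically provisioned volume for training data.
# "eds-nfs" is a hypothetical StorageClass name used purely for illustration.
pvc_body = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "train-data"},
    "spec": {
        "accessModes": ["ReadWriteMany"],          # shared by many training pods
        "storageClassName": "eds-nfs",
        "resources": {"requests": {"storage": "500Gi"}},
    },
}

client.CoreV1Api().create_namespaced_persistent_volume_claim(
    namespace="default", body=pvc_body
)
```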

Going forward, the two parties will deepen their technical cooperation and, through joint solutions with even higher storage performance, help users accelerate the upgrade and construction of AI training platforms, so that more users can move faster and more steadily on the road of AI training.

About Trend Technology: Trend Technology is committed to providing users with the world's leading data center-grade AI computing power virtualization and resource pooling solutions. Many leading enterprises and users in industries such as artificial intelligence, the Internet, telecom operators, finance, automotive and autonomous driving, and education already use the OrionX AI computing resource pooling solution.

