


'Tiangong MoE', a 200-billion-parameter sparse large model inferable on a single 4090 server, is open source
In the wave of large models, training and deploying state-of-the-art dense LLMs poses huge challenges in computational requirements and associated costs, especially at the scale of tens or hundreds of billions of parameters. To address these challenges, sparse models such as Mixture-of-Experts (MoE) models have become increasingly important. They offer an economically viable alternative by distributing computation across specialized sub-models, or "experts," with the potential to match or even exceed the performance of dense models at far lower resource requirements.
On June 3, important news came from the field of open-source large models: Kunlun Wanwei announced the open-sourcing of the 200-billion-parameter sparse large model Skywork-MoE, which greatly reduces inference costs while maintaining strong performance.
Skywork-MoE is extended from an intermediate checkpoint of Kunlun Wanwei's previously open-sourced Skywork-13B model. It is the first open-source hundred-billion-parameter MoE model to fully apply and implement MoE Upcycling, and also the first open-source hundred-billion-parameter MoE model that supports inference on a single server of 4090 GPUs.
What draws even more attention from the large model community is that Skywork-MoE's model weights and technical report are completely open source and free for commercial use, with no application required.
Model weight download address:
○ https://huggingface.co/Skywork/Skywork-MoE-base
○ https://huggingface.co/Skywork/Skywork-MoE-Base-FP8
Model open source warehouse: https://github.com/SkyworkAI/Skywork-MoE
Model technical report: https://github.com/SkyworkAI/Skywork-MoE/blob/main/skywork-moe-tech-report.pdf
Model inference code (supports 8-bit quantized inference on an 8x4090 server): https://github.com/SkyworkAI/vllm
Skywork-MoE is currently the largest open-source MoE model that can run inference on an 8x4090 server. An 8x4090 server has a total of 192 GB of GPU memory. Under FP8 quantization (weights occupy 146 GB), using the non-uniform Tensor Parallel inference method pioneered by the Kunlun Wanwei team, Skywork-MoE can reach a throughput of 2200 tokens/s at a suitable batch size.
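The memory figures above can be checked with a quick back-of-the-envelope sketch (a rough estimate only; a real deployment also needs memory for the KV cache and activations):

```python
# Rough memory check for FP8 inference of Skywork-MoE on an 8x4090 server.
# FP8 stores one byte per parameter, so 146B parameters ~ 146 GB of weights.
total_params_b = 146                 # total parameters, in billions
bytes_per_param_fp8 = 1              # FP8 = 1 byte per weight
weights_gb = total_params_b * bytes_per_param_fp8   # 146 GB

gpus = 8
gb_per_gpu = 24                      # an RTX 4090 has 24 GB
total_gpu_memory_gb = gpus * gb_per_gpu             # 192 GB

headroom_gb = total_gpu_memory_gb - weights_gb      # left for KV cache etc.
print(weights_gb, total_gpu_memory_gb, headroom_gb)  # 146 192 46
```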
For the complete related inference framework code and installation environment, please see: https://github.com/SkyworkAI/Skywork-MoE
Skywork-MoE Introduction
This open-source Skywork-MoE model belongs to the R&D model series of Tiangong 3.0 and is the mid-range model (Skywork-MoE-Medium). The model has 146B total parameters and 22B activated parameters, with 16 experts in total, each expert 13B in size, and 2 experts activated per token.
It is understood that Tiangong 3.0 also trained two other MoE models, 75B (Skywork-MoE-Small) and 400B (Skywork-MoE-Large), which are not part of this release.
Kunlun Wanwei evaluated Skywork-MoE on the current major mainstream model benchmarks. At the same activated-parameter count of about 20B (i.e., the same inference compute), Skywork-MoE's capabilities are at the forefront of the industry, approaching a 70B dense model. This reduces the model's inference cost by nearly 3 times.
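The "nearly 3 times" figure follows directly from the activated-parameter ratio (a simplification: inference FLOPs scale roughly with activated parameters, and real cost also depends on memory bandwidth and batch size):

```python
# Comparing a 70B dense model against Skywork-MoE's ~22B activated parameters.
dense_params_b = 70          # dense model: all parameters active per token
activated_params_b = 22      # Skywork-MoE: only 2 of 16 experts active

cost_ratio = dense_params_b / activated_params_b
print(round(cost_ratio, 2))  # ~3.18, i.e. "nearly 3 times" cheaper
```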
It is worth noting that Skywork-MoE's total parameter size is 1/3 smaller than that of DeepSeek-V2, achieving similar capabilities with fewer parameters.
Technical Innovation
To address the difficulty of training MoE models and their tendency toward poor generalization, Skywork-MoE designed two training optimization algorithms:
Gating Logits Normalization operation
Skywork-MoE adds a normalization operation to the token-routing logits of the Gating Layer, making the Gating Layer's parameter learning lean more toward the selected top-2 experts and increasing the MoE model's confidence in its top-2 choices.
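The exact formulation is given in the technical report; as a minimal sketch of the idea, one can standardize the gating logits (zero mean, unit variance) and scale them before the softmax, which sharpens the gate distribution and concentrates probability on the top-2 experts. The scale factor `lam` below is a hypothetical hyperparameter for illustration:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def normalized_gate(logits, lam=2.0):
    """Standardize the gating logits, then scale by lam before the softmax.

    A larger lam sharpens the distribution, pushing probability mass onto
    the top-2 experts. (Illustrative sketch, not the paper's exact form.)
    """
    n = len(logits)
    mean = sum(logits) / n
    var = sum((x - mean) ** 2 for x in logits) / n
    std = math.sqrt(var) + 1e-6
    return softmax([lam * (x - mean) / std for x in logits])

logits = [1.2, 0.9, 0.3, 0.1]            # gating logits for 4 experts
plain = softmax(logits)
sharp = normalized_gate(logits, lam=2.0)
# Probability mass on the top-2 experts grows after normalization:
print(sum(sorted(plain)[-2:]), sum(sorted(sharp)[-2:]))
```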
Adaptive Aux Loss
Unlike the traditional aux loss with a fixed coefficient (a fixed hyperparameter), Skywork-MoE lets the model adaptively choose an appropriate aux-loss coefficient at different stages of MoE training. This keeps the Drop Token Rate within a suitable range, achieving balance in the expert distribution while still allowing the experts to differentiate, thereby improving the model's overall performance and generalization. Early in MoE training, when parameters are insufficiently learned, the Drop Token Rate is too high (token distribution is too uneven), so a larger aux loss is needed to help with token load balancing. Later in training, the Skywork-MoE team wants a certain degree of differentiation to remain between experts, so that the gating does not tend toward randomly distributing tokens; this calls for a lower aux loss to reduce the correction.
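The schedule described above can be sketched as a simple controller that raises the aux-loss coefficient when the observed drop-token rate is too high and lowers it once routing is balanced. The thresholds and step size below are hypothetical, chosen only for illustration:

```python
def adapt_aux_coeff(coeff, drop_token_rate,
                    target_low=0.01, target_high=0.05, step=1.5):
    """Adapt the aux-loss coefficient from the observed drop-token rate.

    Early in training the drop-token rate is high, so the coefficient is
    increased to push the router toward load balance; later, a lower
    coefficient preserves differentiation between experts.
    (Illustrative sketch; thresholds and step are hypothetical.)
    """
    if drop_token_rate > target_high:   # routing too unbalanced: push harder
        return coeff * step
    if drop_token_rate < target_low:    # balanced enough: relax the pressure
        return coeff / step
    return coeff                        # within the target band: keep as-is

coeff = 0.01
print(adapt_aux_coeff(coeff, 0.20))   # early training: coefficient raised
print(adapt_aux_coeff(coeff, 0.001))  # late training: coefficient lowered
print(adapt_aux_coeff(coeff, 0.03))   # in range: unchanged
```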
Training Infra
Efficient large-scale distributed training of MoE models is a hard challenge. Skywork-MoE proposes two important parallel optimization designs, achieving a training-throughput MFU of 38% on a thousand-GPU cluster, where MFU is computed against the theoretical compute of the 22B activated parameters.
Expert Data Parallel
Unlike the existing EP (Expert Parallel) and ETP (Expert Tensor Parallel) designs in the Megatron-LM community, the Skywork-MoE team proposed a parallel design called Expert Data Parallel (EDP). This scheme can still shard the model efficiently when the number of experts is small, and the all2all communication introduced by the experts can be optimized and overlapped to the greatest extent. Compared with EP's constraint on the number of GPUs and ETP's inefficiency on thousand-GPU clusters, EDP better addresses the parallelism pain points of large-scale distributed MoE training. At the same time, EDP's design is simple, robust, easy to scale, and can be implemented and verified quickly.
This is the simplest EDP example: with two GPUs, TP = 2 and EP = 2, the attention part uses Tensor Parallel while the expert part uses Expert Parallel.
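The two-GPU example above can be sketched as a parameter-layout function: attention weights are split into TP shards, while the experts are partitioned into EP groups. This layout is hypothetical and only illustrates the idea; the actual sharding logic is in Skywork's repository:

```python
def edp_layout(num_gpus, num_experts, tp=2, ep=2):
    """Sketch of which parameters each GPU rank holds under EDP.

    Attention weights: Tensor Parallel (one of `tp` shards per rank).
    Expert weights: Expert Parallel (a contiguous slice of experts per
    EP group). Hypothetical layout, for illustration only.
    """
    per_group = num_experts // ep
    layout = {}
    for rank in range(num_gpus):
        ep_group = rank % ep
        layout[rank] = {
            "attention_shard": rank % tp,
            "experts": list(range(ep_group * per_group,
                                  (ep_group + 1) * per_group)),
        }
    return layout

# Two-GPU example from the figure: TP = 2, EP = 2, 16 experts in total.
layout = edp_layout(num_gpus=2, num_experts=16)
print(layout[0]["experts"])  # experts 0..7 on rank 0
print(layout[1]["experts"])  # experts 8..15 on rank 1
```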
Non-uniform split pipeline parallel
Because of the Embedding computation in the first stage, the Loss computation in the last stage, and the Pipeline Buffer, evenly dividing the layers under pipeline parallelism produces a clear imbalance in compute and memory load across stages. The Skywork-MoE team proposed a non-uniform pipeline-parallel partitioning and recomputation-layer allocation method, making the overall compute/memory load more balanced and improving end-to-end training throughput by about 10%.
Comparing the pipeline bubbles under uniform and non-uniform partitioning: for a 24-layer LLM, (a) uniform partitioning into 4 stages gives [6, 6, 6, 6] layers per stage; (b) the optimized non-uniform partitioning into 5 stages gives [5, 5, 5, 5, 4]. When the pipeline is full, the non-uniform partitioning produces fewer bubbles.
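The imbalance can be made concrete with a small load model: the first stage pays extra for the Embedding and the last for the Loss, and the slowest (bottleneck) stage sets the pipeline's pace. The extra costs below are hypothetical units chosen only to illustrate why giving fewer layers to the edge stages helps:

```python
def stage_loads(split, extra_first=2.0, extra_last=2.0):
    """Per-stage load for a pipeline split (layers per stage).

    The first stage carries extra work for the Embedding and the last
    for the Loss computation; the extra costs are hypothetical units
    used only to illustrate the imbalance.
    """
    loads = [float(n) for n in split]
    loads[0] += extra_first
    loads[-1] += extra_last
    return loads

uniform = stage_loads([6, 6, 6, 6])        # [8.0, 6.0, 6.0, 8.0]
nonuniform = stage_loads([5, 5, 5, 5, 4])  # [7.0, 5.0, 5.0, 5.0, 6.0]
# The bottleneck stage (max load) gates the pipeline; non-uniform is lower:
print(max(uniform), max(nonuniform))  # 8.0 7.0
```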
In addition, Skywork-MoE ran a series of Scaling-Law-based experiments to explore which constraints affect the performance of Upcycling versus training MoE models from scratch.