Wenxin 4.0 performed well in the SuperBench evaluation, leading in many indicators

WBOY
2024-04-23 13:37:06

In March 2024, the Foundation Model Research Center of Tsinghua University released the "SuperBench Large Model Comprehensive Capability Evaluation Report", which comprehensively evaluated 14 representative models from China and abroad.

In this report, the strong performance of Wenxin 4.0 attracted widespread attention. Its overall performance is close to that of the top international models, and it is gradually narrowing the gap with them, which makes it the leading model among those developed in China.

In the evaluation of human alignment ability, Wenxin 4.0 ranked first among Chinese models. It also took first place in the evaluations of Chinese reasoning and Chinese language ability, with a clear lead over the other models. In the evaluation of Chinese understanding in particular, Wenxin 4.0 scored 0.41 points higher than the second-placed GLM-4.

In the mathematical ability evaluation, which is part of the semantic understanding category, Wenxin 4.0 and Claude-3 tied for first place worldwide, while the GPT-4 series models ranked fourth and fifth. The scores of the other models are mostly concentrated around 55 points, a significant gap from the leading group.

In the evaluation of reading comprehension ability, Wenxin 4.0 also stood out. It surpassed GPT-4 Turbo, Claude-3, and GLM-4 to achieve the highest score.

In the security evaluation, the area enterprises are most concerned about, Wenxin 4.0 also performed very well. It scored 89.1 points and ranked first, ahead of the GPT-4 series models and Claude-3. Claude-3 ranked only fourth in this evaluation.

The report also mentioned that since Wenxin Yiyan made its public debut on March 16 last year, its user base has grown rapidly and now exceeds 200 million users. The number of daily API calls also exceeds 200 million.

Statement:
This article is reproduced from itbear.com. If there is any infringement, please contact admin@php.cn to have it removed.