Alibaba Cloud releases Tongyi Qianwen 2.0, which surpasses GPT-3.5 in performance and accelerates its pursuit of GPT-4
On October 31, Alibaba Cloud officially released Tongyi Qianwen 2.0, a large model with hundreds of billions of parameters. Across 10 authoritative evaluations, the overall performance of Tongyi Qianwen 2.0 exceeded that of GPT-3.5, and the model is accelerating to catch up with GPT-4. On the same day, the Tongyi Qianwen app launched in major mobile app stores, so anyone can try the latest model capabilities directly in the app.
Over the past six months, Tongyi Qianwen has made a huge leap in performance. Compared with version 1.0, released in April, Tongyi Qianwen 2.0 has improved significantly in understanding complex instructions, literary creation, general mathematics, knowledge memory, and resistance to hallucination. Its overall performance now exceeds GPT-3.5 and is closing in on GPT-4.
Picture: Tongyi Qianwen 2.0's overall performance has exceeded GPT-3.5 and is accelerating to catch up with GPT-4
On 10 mainstream benchmark evaluation sets, including MMLU, C-Eval, GSM8K, HumanEval, and MATH, Tongyi Qianwen 2.0's overall score surpassed Meta's Llama-2-70B. Against OpenAI's GPT-3.5 it recorded nine wins and one loss, and against GPT-4 it recorded four wins and six losses, further narrowing the gap with GPT-4.
The ability to understand Chinese and English is a basic skill for a large language model.
On English tasks, Tongyi Qianwen 2.0 scored 82.5 on the MMLU benchmark, second only to GPT-4. With a significantly larger parameter count, Tongyi Qianwen 2.0 can better understand and process complex language structures and concepts. On Chinese tasks, Tongyi Qianwen 2.0 achieved the highest score on the C-Eval benchmark by a clear margin. This is because the model learned from more Chinese corpus data during training, which further strengthened its Chinese understanding and expression.
Tongyi Qianwen 2.0 has also made significant progress in areas such as mathematical reasoning and code understanding. On the GSM8K reasoning benchmark, Tongyi Qianwen ranked second, demonstrating strong calculation and logical reasoning capabilities. On the HumanEval test, Tongyi Qianwen's score followed closely behind GPT-4 and GPT-3.5. This test mainly measures a large model's ability to understand and execute code snippets, which is the basis for using large models in scenarios such as programming assistance and automatic code repair.
Picture: Tongyi Qianwen 2.0 release
Tongyi Qianwen is now more mature and easier to use. Tongyi Qianwen 2.0 has been technically optimized for instruction following, tool use, and fine-grained content creation, so it can be better integrated into downstream application scenarios. The Tongyi large model official website has launched multimodal and plug-in functions, supporting specialized tasks such as image input and document parsing.
At the same time, eight industry models trained on the Tongyi large model were launched. They include Tongyi Lingma (intelligent coding assistant), Tongyi Zhiwen (AI reading assistant), Tongyi Listening (AI assistant for work and study), Tongyi Xiaomi (intelligent customer service), Tongyi Renxin (personal health assistant), and Tongyi Farui (AI legal advisor). The eight industry models target the most popular vertical scenarios and are specially trained on domain data. Users can try the models directly on the official website, and developers can integrate the model capabilities into their own large model applications and services through web page embedding, API/SDK calls, and other methods.
Picture: The Tongyi large model family has been fully upgraded, and eight industry models are online
As of October, Alibaba Cloud has cooperated in depth with more than 60 industry leaders to promote the adoption of Tongyi Qianwen in office work, cultural tourism, electric power, government affairs, medical insurance, transportation, manufacturing, finance, software development, and other fields.
Zhou Jingren revealed that Alibaba Cloud plans to open source the 72B version of Tongyi Qianwen in the near future. Alibaba Cloud has previously open sourced the 7B and 14B versions of the model, and the cumulative number of [...]. Alibaba Cloud will continue to support developers across industries in building models and applications on top of the Tongyi Qianwen open source models.
Picture: Tongyi Qianwen 72B will be open sourced soon