Tongyi Qianwen open-sources Qwen2-Math, becoming the most advanced mathematics-specific model
According to news on August 9, Alibaba's Tongyi team has open-sourced Qwen2-Math, a new generation of mathematical models that includes base models and instruction-tuned models at three parameter sizes: 1.5B, 7B, and 72B. Qwen2-Math is built on Qwen2, the Tongyi Qianwen open-source large language model. On the authoritative MATH evaluation set, the flagship model Qwen2-Math-72B-Instruct scores higher than GPT-4o, Claude-3.5-Sonnet, Gemini-1.5-Pro, Llama-3.1-405B, and others. It handles a variety of mathematical problems such as algebra, geometry, counting and probability, and number theory with an accuracy of 84%, making it the most advanced mathematics-specific model.
Note: In the MATH benchmark evaluation, Qwen2-Math-72B-Instruct, the flagship Tongyi Qianwen mathematical model, achieved an accuracy of 84%, surpassing open- and closed-source models such as GPT-4o, Claude-3.5, Gemini-1.5-Pro, and Llama-3.1.
The Qwen2-Math base models are initialized from the Qwen2 large language model and pre-trained on a carefully designed mathematics-specific corpus. The training data includes large-scale, high-quality mathematical web text, books, code, and exam questions, as well as mathematical pre-training data synthesized by Qwen2. All pre-training and fine-tuning datasets were decontaminated.
The R&D team then trained the instruction-tuned versions of the model in three steps. First, a mathematics-specific reward model was trained based on Qwen2-Math-72B. Next, the dense reward signal from this model was combined with a binary signal indicating whether the model answered the question correctly, and the combined signal was used as the learning label to construct supervised fine-tuning (SFT) data through rejection sampling. Finally, starting from the SFT model, the model was further optimized with the GRPO method.
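The pipeline above can be illustrated with a small Python sketch. It is not the Tongyi team's implementation: the `Candidate` type, the weighted-sum mixing rule, the `weight` and `threshold` values, and the exact-match correctness check are all assumptions made for illustration, since the article does not say how the two signals are combined. The last function shows the group-relative normalization that gives GRPO (Group Relative Policy Optimization) its name, again in simplified form.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Candidate:
    answer: str      # final answer extracted from one sampled solution
    rm_score: float  # dense score from a hypothetical reward model, in [0, 1]


def combined_reward(c: Candidate, reference: str, weight: float = 0.5) -> float:
    """Mix the dense reward with a binary correctness signal.

    Assumption: a simple weighted sum. The article does not specify the rule.
    """
    correct = 1.0 if c.answer.strip() == reference.strip() else 0.0
    return weight * c.rm_score + (1.0 - weight) * correct


def rejection_sample(cands, reference, keep=1, threshold=0.5):
    """Keep the best-scoring candidates whose combined reward exceeds the threshold."""
    scored = [(combined_reward(c, reference), c) for c in cands]
    kept = [pair for pair in scored if pair[0] > threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in kept[:keep]]


def group_relative_advantages(rewards):
    """GRPO-style advantages: each reward minus the group mean, divided by the group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All samples scored the same, so none is preferred over another.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]
```

In this sketch, a wrong answer is rejected even when the reward model rates it highly, because with the assumed weight of 0.5 its combined reward cannot exceed the threshold. The surviving solutions would become SFT examples, and the same combined reward could then feed `group_relative_advantages` during the GRPO stage.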
It is reported that the Qwen2-Math series models currently mainly support English. The Tongyi team will soon launch a Chinese and English bilingual version, and multi-language versions are also under development.
The Tongyi team evaluated the instruction-tuned models on multiple Chinese and English mathematics benchmarks. In addition to common benchmarks such as GSM8K and MATH, the evaluation introduced more challenging exam and competition tests: the Olympiad-level benchmark OlympiadBench, the college mathematics benchmark CollegeMath, the college entrance examination set (GaoKao), the American Invitational Mathematics Examination (AIME) 2024 questions, and the American Mathematics Competitions (AMC) 2023 questions. The Chinese evaluations include the CMATH set and the 2024 China college entrance examination and high school entrance examination mathematics questions. Qwen2-Math-72B-Instruct performed extremely well, achieving results far exceeding those of other open-source mathematical models across the ten evaluations.
Note: The R&D team evaluated the model under greedy and RM@8 conditions. For each Qwen2-Math-72B-Instruct entry, the table lists three scores: the score of the first answer (no subscript number), the score of the answer that appears most often among 8 answers (majority voting), and the score of the answer selected by the reward model among 8 answers.
"Can large models do math problems?" is not only a hot topic on social platforms but also a research question of great concern to the industry. Handling advanced mathematical problems requires models with complex multi-step logical reasoning capabilities. The Tongyi team stated in a technical blog post that it hopes to "contribute to the scientific community in solving advanced mathematical problems" through open source, and that it will continue to enhance the model's mathematical capabilities.
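The three scores described in the evaluation note (first answer, most frequent answer among 8, and reward-model pick among 8) differ only in how one final answer is chosen from the model's outputs. The Python sketch below illustrates the three selection rules. The function names and the idea of passing reward-model scores as a plain list are assumptions for illustration, not the team's evaluation code.

```python
from collections import Counter


def greedy_answer(answers):
    """First answer in the list, standing in for the single greedy decode."""
    return answers[0]


def majority_answer(answers):
    """Majority voting: the final answer that occurs most often among the samples."""
    return Counter(answers).most_common(1)[0][0]


def rm_answer(answers, rm_scores):
    """RM@N: the answer whose sample received the highest reward-model score."""
    best = max(range(len(answers)), key=lambda i: rm_scores[i])
    return answers[best]
```

Each rule returns a single answer per question, and the benchmark score is then the fraction of questions for which that answer matches the reference. Majority voting and reward-model selection can disagree: a reward model may prefer a minority answer whose solution it rates most highly.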
Attachment: Qwen2-Math problem solving example