Princeton open-sources a 34B math model: half the parameters, performance comparable to Google Minerva, trained on 55 billion tokens of specialized data


WBOY

Nov 18, 2023 am 10:13 AM


Mathematics, as the cornerstone of science, has always been a key area of research and innovation.

Recently, seven institutions including Princeton University jointly released LLEMMA, a large language model built specifically for mathematics whose performance is comparable to Google's Minerva 62B. They also made the model, dataset, and code public, opening up new opportunities and resources for mathematical research.


Paper address: https://arxiv.org/abs/2310.10631

Dataset address: https://huggingface.co/datasets/EleutherAI/proof-pile-2

Project address: https://github.com/EleutherAI/math-lm

LLEMMA is initialized from Code Llama and further pre-trained on Proof-Pile-2.

Proof-Pile-2 is a large mixed dataset of 55 billion tokens, drawn from scientific papers, web data rich in mathematical content, and mathematical code.

One component of this dataset, the AlgebraicStack, contains 11B tokens of source code in 17 languages, covering numerical, symbolic, and formal mathematics.
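Pre-training on a mixed dataset means each training example is drawn from one of the component sources according to some mixture weights. The sketch below illustrates that idea with Python's `random.choices`. The source labels ("papers", "web", "code") and the 2:4:1 weights are hypothetical placeholders for illustration; the article does not state the ratios LLEMMA actually used.

```python
import random
from collections import Counter

# Hypothetical source labels and mixture weights (2:4:1), for illustration only.
MIXTURE = {"papers": 2.0, "web": 4.0, "code": 1.0}


def sample_sources(weights: dict[str, float], n: int, seed: int = 0) -> list[str]:
    """Draw n source labels, each picked with probability proportional to its weight."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    labels = list(weights)
    return rng.choices(labels, weights=[weights[label] for label in labels], k=n)


if __name__ == "__main__":
    draws = sample_sources(MIXTURE, 7000)
    counts = Counter(draws)
    for label in MIXTURE:
        # Expected share is weight / total weight, e.g. about 4/7 for "web".
        print(f"{label}: {counts[label] / len(draws):.3f}")
```

Because each draw is proportional to its weight, a source can be seen more or less often than its raw size alone would imply.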


With 7 billion and 34 billion parameters, Llemma performs very well on the MATH benchmark, surpassing all known open base models.


Compared with Minerva 62B, a closed model developed by Google Research specifically for mathematics, Llemma 34B achieves nearly the same performance with roughly half as many parameters.

On an equal-parameter basis, Llemma outperforms Minerva. It can also use computational tools and carry out formal theorem proving, which opens further avenues for solving mathematical problems.


It can readily use a Python interpreter and a formal prover, further demonstrating its ability to solve mathematical problems.


Thanks to the AlgebraicStack's special emphasis on formal proof data, Llemma is the first open base model to demonstrate few-shot theorem proving.


The researchers also openly shared all of LLEMMA's training data and code. Unlike previous closed mathematical models, LLEMMA is fully open, which opens the door to the entire research community.

The researchers also tried to quantify memorization. Surprisingly, they found that Llemma was not more accurate on problems that appeared in its training set. Because the code and data are publicly available, the researchers encourage others to replicate and extend this analysis.
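The article does not say how memorization was measured. One common approach, sketched here as an assumption rather than as the authors' method, is to flag a benchmark problem as "seen" if it shares a long word n-gram with any training document, and then compare accuracy on flagged versus unflagged problems. The function names and the default `n = 30` are hypothetical choices.

```python
def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lower-cased, whitespace-tokenized n-grams in text."""
    tokens = text.lower().split()
    return {tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)}


def overlaps(test_item: str, training_docs: list[str], n: int = 30) -> bool:
    """True if any n-gram of test_item also appears in some training document."""
    target = ngrams(test_item, n)
    if not target:  # item shorter than n tokens: nothing to match
        return False
    return any(target & ngrams(doc, n) for doc in training_docs)
```

Splitting a benchmark with `overlaps` and scoring each half separately is what lets one ask whether "seen" problems are solved more often.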


Training data and experimental configuration

LLEMMA is a large language model dedicated to mathematics, obtained by continuing to pre-train Code Llama on Proof-Pile-2. Proof-Pile-2 is a mixed dataset of scientific papers, web data with mathematical content, and mathematical code, totaling 55 billion tokens.

The code portion of Proof-Pile-2, the AlgebraicStack, contains 11B tokens of source code in 17 languages, covering numerical, symbolic, and formal mathematics, and has been publicly released.

Every LLEMMA model is initialized from Code Llama. Code Llama is a decoder-only language model that is itself initialized from Llama 2.

The authors further trained the Code Llama models on Proof-Pile-2 using the standard autoregressive language modeling objective. The 7B model was trained on 200B tokens and the 34B model on 50B tokens.
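The standard autoregressive objective scores the model on predicting each next token from the tokens before it. Below is a minimal pure-Python sketch of that loss (mean next-token cross-entropy) on toy logits. The function names and the list-of-lists representation are illustrative assumptions, not code from the LLEMMA repository, where training runs on real tensors at scale.

```python
import math


def log_softmax(logits: list[float]) -> list[float]:
    """Numerically stable log-softmax over one vector of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def autoregressive_loss(logits: list[list[float]], tokens: list[int]) -> float:
    """Mean next-token cross-entropy.

    logits[t] is the model's score vector after reading tokens[0..t];
    it is scored against the *next* token, tokens[t + 1].
    """
    assert len(logits) == len(tokens) - 1, "one prediction per next token"
    total = 0.0
    for t, next_token in enumerate(tokens[1:]):
        total -= log_softmax(logits[t])[next_token]
    return total / (len(tokens) - 1)
```

With uniform logits over a 4-token vocabulary the loss equals ln 4, the cost of random guessing; training lowers it by assigning higher probability to the tokens that actually follow.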

Evaluation methods and experimental results

The authors used Proof-Pile-2 to continue pre-training Code Llama and then ran a few-shot evaluation of LLEMMA on several mathematical problem-solving tasks, including MATH and GSM8k.
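In a few-shot evaluation the model is not fine-tuned; instead, a handful of solved examples are placed in the prompt before the test problem, and the model continues the text. A minimal prompt builder might look like this. The `Problem:`/`Solution:` template and the function name are illustrative assumptions, not the exact prompts used for MATH or GSM8k.

```python
def build_few_shot_prompt(exemplars: list[tuple[str, str]], question: str) -> str:
    """Concatenate worked examples, then the new question with an open solution slot.

    The "Problem:"/"Solution:" labels are a hypothetical format, not the
    exact template used in the LLEMMA evaluation.
    """
    parts = [f"Problem:\n{q}\nSolution:\n{a}" for q, a in exemplars]
    parts.append(f"Problem:\n{question}\nSolution:\n")  # model completes from here
    return "\n\n".join(parts)
```

The prompt ends with an empty `Solution:` slot, so whatever the model generates next is read as its answer to the new problem.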

The researchers found that LLEMMA improved substantially on these tasks and could handle a range of problem types and difficulty levels.

On extremely difficult mathematical problems, LLEMMA 34B shows stronger mathematical ability than other open base models.


On math benchmarks, continued pre-training on Proof-Pile-2 improves LLEMMA's few-shot performance on five benchmarks.

LLEMMA 34B improves on Code Llama by 20 percentage points on GSM8k and by 13 percentage points on MATH. LLEMMA 7B also outperforms the proprietary Minerva model of similar size, indicating that pre-training on Proof-Pile-2 can effectively improve the mathematical problem-solving ability of large models.


When solving mathematical problems with computational tools such as Python, LLEMMA outperforms Code Llama on both the MATH+Python and GSM8k+Python tasks.

On the MATH and GSM8k datasets, LLEMMA performs better with the tool than without it.
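In the Python-assisted setting, the model writes a short program and the interpreter's output is taken as the answer. The sketch below shows only the execution step under that assumption; the function name and the convention that the answer is whatever the program prints are hypothetical, and the harness actually used to evaluate LLEMMA may differ.

```python
import contextlib
import io


def run_generated_program(program: str) -> str:
    """Run model-written Python and return its printed output, stripped.

    WARNING: exec() on untrusted model output is unsafe. A real evaluation
    harness should isolate it in a sandbox with a time limit.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(program, {})  # fresh globals so the program cannot touch ours
    return buffer.getvalue().strip()


# Hypothetical program a model might write for "What is 1 + 2 + ... + 100?"
GENERATED = """
total = sum(range(1, 101))
print(total)
"""

if __name__ == "__main__":
    print(run_generated_program(GENERATED))  # prints 5050
```

Offloading the arithmetic to the interpreter means the model only has to get the program right, not every intermediate calculation.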


LLEMMA also performs well on mathematical proof tasks.

The goal of the informal-to-formal proof task is to generate a formal proof given a formal statement, an informal LaTeX statement, and an informal LaTeX proof; the generated proof is then checked by a proof assistant.

The formal-to-formal task is to prove a formal statement by generating a sequence of proof steps (tactics). The results show that continued pre-training on Proof-Pile-2 improves LLEMMA's few-shot performance on both of these formal theorem proving tasks.
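To make the two tasks concrete, here is a tiny Lean 4 example. The informal statement (a comment) and the formal `theorem` line are the inputs; the tactic steps after `by` are what the model has to generate, and the proof assistant then checks them. The theorem is a toy chosen for illustration, not a benchmark problem, and the article does not name the proof assistants used.

```lean
-- Informal statement (hypothetical example): "If a = b, then a + 0 = b."
-- Formal statement, followed by the tactic proof a model would need to generate.
theorem add_zero_of_eq (a b : Nat) (h : a = b) : a + 0 = b := by
  rw [Nat.add_zero]  -- rewrite `a + 0` to `a`; the goal becomes `a = b`
  exact h            -- close the goal with the hypothesis `h`
```

If any step is wrong, Lean rejects the proof, which is what makes formal proving a strict, automatically checkable test.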


Beyond its strong performance, LLEMMA comes with openly released datasets and demonstrates impressive problem-solving capabilities.

This spirit of open sharing marks a new era for mathematics, one from which mathematics enthusiasts, researchers, and educators alike stand to benefit.

The emergence of LLEMMA provides us with unprecedented tools to make solving mathematical problems more efficient and innovative.

In addition, the concept of open sharing will also promote deeper cooperation among the global scientific research community and jointly promote scientific progress.


Statement

This article is reproduced from 51CTO.COM. In case of infringement, please contact admin@php.cn to have it removed.
