AI is once again involved in the world of mathematics, and new DSP methods double the success rate of machine proofs-AI-php.cn

Home

Technology peripherals

AI is once again involved in the world of mathematics, and new DSP methods double the success rate of machine proofs

PHPz

May 13, 2023 pm 01:43 PM

Googleai

Automatically proving mathematical theorems is an original intention of artificial intelligence and has always been a difficult problem. So far, human mathematicians have used two different ways of writing mathematics.

#The first is a method that everyone is familiar with, which is to use natural language to describe mathematical proofs. Most mathematics is written in this way, including mathematics textbooks, mathematics papers, etc.

#The second type is called formal mathematics. This is a tool created by computer scientists over the past half century to check mathematical proofs.

It seems today that computers can be used to verify mathematical proofs, but they can only do so using specially designed proof languages that cannot handle mathematical symbols. and a mixture of written texts used by mathematicians. Converting mathematical problems written in natural language into formal code so that computers can more easily solve them may help build machines that can explore new discoveries in mathematics. This process is called formalisation, and autoformalization refers to the task of automatically translating mathematics from natural language into a formal language.

Automation of formal proofs is a challenging task, and deep learning methods have not yet achieved great success in this field, mainly due to the scarcity of formal data. In fact, formal proof itself is very difficult and only a few experts can do it, which makes large-scale annotation efforts unrealistic. The largest corpus of formal proofs is written in Isabelle code (Paulson, 1994) and is less than 0.6GB in size, orders of magnitude smaller than commonly used datasets in vision or natural language processing. To address the scarcity of formal proofs, previous studies have proposed using synthetic data, self-supervised or reinforcement learning to synthesize additional formal training data. Although these methods alleviate the lack of data to a certain extent, they are unable to fully utilize the large number of manually written mathematical proofs.

Let’s take the language model Minerva as an example. After training with enough data, we found that its math ability is very strong and it can get higher than average scores in high school math tests. However, such a language model also has shortcomings. It can only imitate, but cannot train independently to improve mathematics. Formal proof systems provide a training environment, but there is very little data for formal mathematics.

Unlike formal mathematics, informal mathematical data is abundant and widely available. Recently, large language models trained on informal mathematical data have demonstrated impressive quantitative reasoning capabilities. However, they often produce faulty proofs, and automatically detecting faulty reasoning in these proofs is challenging.

In a recent work, researchers such as Yuhuai Tony Wu of Google designed a new method called DSP (Draft, Sketch, and Prove). Convert informal mathematical proofs into formal proofs, thus possessing both the logical rigor provided by formal systems and a large amount of informal data.

AI is once again involved in the world of mathematics, and new DSP methods double the success rate of machine proofs

Paper link: https://arxiv.org/pdf/2210.12283.pdf

Earlier this year, Wu Yuhuai and several collaborators used the neural network of OpenAI Codex to perform automatic formalization work, demonstrating that a large language model can be used to automatically translate informal statements into Feasibility of formal statements. DSP goes a step further and uses large language models to generate formal proof sketches from informal proofs. Proof sketches consist of high-level reasoning steps that can be interpreted by a formal system like an interactive theorem prover. They differ from complete formal proofs in that they contain sequences of unjustified intermediate conjectures. In the final step of the DSP, the formal proof sketch is elaborated into a complete formal proof, using an automatic verifier to prove all intermediate conjectures.

Wu Yuhuai said: Now, we have shown that LLM can convert the informal proofs it generates into verified formal proofs!

AI is once again involved in the world of mathematics, and new DSP methods double the success rate of machine proofs

Method

The Method section describes the DSP method for formal proof automation, which utilizes informal proofs to guide automatic formal theorem proving. Proof sketch of the device. It is assumed here that each problem has an informal proposition and a formal proposition describing the problem. The overall pipeline consists of three stages (shown in Figure 1).

AI is once again involved in the world of mathematics, and new DSP methods double the success rate of machine proofs

##Figure 1.

Informal proof The initial stage of drafting a

DSP approach involves finding an informal proof for the problem based on its description in a natural mathematical language (possibly in LATEX). The resulting informal proof is treated as a rough draft for subsequent stages. In mathematics textbooks, proofs of theorems are generally provided, but sometimes they are missing or incomplete. Therefore, the researchers considered two situations corresponding to the presence or absence of informal proofs.

In the first case, the researcher assumes that there is a "real" informal proof (that is, a proof written by a human), which is the formal practice of existing mathematical theory typical situations. In the second case, the researchers make a more general assumption that no real informal proof is given, and use a large language model trained on informal mathematical data to draft proof candidates. The language model eliminates the dependence on human proofs and can generate multiple alternative solutions to each problem. While there is no easy way to automatically verify the correctness of these proofs, informal proofs only need to be useful at the next stage in generating a good formal proof sketch.

Mapping informal proofs into formal sketches

Formal proof sketches encode the structure of a solution and leave aside low-level details. Intuitively, it is a partial proof that outlines a high-level conjectural statement. Figure 2 is a concrete example of a proof sketch. Although informal proofs often leave aside low-level details, these details cannot be excluded in formal proofs, making the direct conversion of informal to formal proofs difficult. Instead, this paper proposes mapping informal proofs onto formal proof sketches that share the same high-level structure. Low-level details missing from the proof sketch can be filled in by the automatic prover. Since large informal-formal parallel corpora do not exist, standard machine translation methods are not suitable for this task. Instead, the few-shot learning capabilities of a large language model are used here. Specifically, some example pairs containing informal proofs and their corresponding formal sketches are used to prompt the model, followed by an informal proof to be converted, and then let the model generate subsequent tokens to obtain the required formal sketch. This model is called an "automatic formalizer".

AI is once again involved in the world of mathematics, and new DSP methods double the success rate of machine proofs

##Figure 2.Proof of Disclosure in Sketch Guess

As the last part of this process, the researcher implements an off-the-shelf automatic prover to fill in the missing details in the proof sketch. The "automatic prover" here refers to one that can produce a formally verifiable proof. Proven system. The framework is agnostic to the specific choice of the automatic prover: it can be a symbolic prover (such as a heuristic proof automation tool), a neural network-based prover, or a hybrid approach. If the automatic prover successfully fills in all the gaps in the proof sketch, it returns a final formal proof that can be checked against the specification of the problem. If the automatic prover fails (for example, it exceeds the allocated time limit), the evaluation is considered unsuccessful.

Experiments

The researchers conducted a series of experiments, including formal proofs of problems generated from the miniF2F dataset, and showed that a large proportion of theorems can be automatically proved using this method. Two environments are studied here, where informal proofs are written by humans or drafted by a large language model trained on mathematical text. These two settings correspond to situations that often occur in the formalization of existing theories, that is, there are often informal proofs, but they are sometimes left as an exercise to the reader, or are missing due to constraints in the margins.

Table 1 shows the proportion of successful formal proofs found on the miniF2F dataset. The results include four baselines from our experiments, as well as the DSP method with human-written proofs and model-generated proofs.

AI is once again involved in the world of mathematics, and new DSP methods double the success rate of machine proofs

It can be seen that the automatic prover with 11 heuristic strategies has greatly increased the performance of Sledgehammer. miniF2F improves its success rate from 9.9% to 18.0% on the validation set and from 10.4% to 20.9% on the test set. Two baselines using language models and proof search achieved success rates of 29.9% and 35.2% respectively on the test set of miniF2F.

Based on human-written informal proofs, the DSP approach achieves success rates of 42.6% and 39.3% on the validation and test sets of miniF2F. A total of 200 of the 488 problems can be proved in this way. The Codex model and the Minerva (8B) model gave very similar results when solving problems on miniF2F: they both guided the automatic verifier to solve 40.6% and 35.3% of the problems on the validation and test sets respectively.

When switching to the Minerva (62B) model, the success rates increased to 43.9% and 37.7% respectively. Compared to human-written informal proofs, its success rate is 1.3% higher on the validation set and 1.6% lower on the test set. Overall, the Minerva (62B) model is able to solve 199 problems on miniF2F, one less than a human-written proof. The Minerva (540B) model solved 42.6% and 38.9% of the problems in miniF2F's validation set and test set, respectively, and also generated 199 successful proofs.

The DSP approach effectively guides automated provers in both cases: informal proofs using humans or informal proofs generated by language models. DSP nearly doubles the prover success rate and produces SOTA performance on miniF2F using Isabelle. Furthermore, larger Minerva models are almost as helpful as humans in guiding automated formal provers.

As shown in the figure below, the DSP method significantly improves the performance of the Sledgehammer heuristic prover (~20% -> ~40%), achieving a new SOTA on miniF2F.

The 62B and 540B versions of Minerva generate proofs that are very similar to human proofs.

AI is once again involved in the world of mathematics, and new DSP methods double the success rate of machine proofs

For more information, please refer to the original paper.

The above is the detailed content of AI is once again involved in the world of mathematics, and new DSP methods double the success rate of machine proofs. For more information, please follow other related articles on the PHP Chinese website!

Statement

This article is reproduced at:51CTO.COM. If there is any infringement, please contact admin@php.cn delete

How to Build Your Personal AI Assistant with Huggingface SmolLMApr 18, 2025 am 11:52 AM

Harness the Power of On-Device AI: Building a Personal Chatbot CLI In the recent past, the concept of a personal AI assistant seemed like science fiction. Imagine Alex, a tech enthusiast, dreaming of a smart, local AI companion—one that doesn't rely

AI For Mental Health Gets Attentively Analyzed Via Exciting New Initiative At Stanford UniversityApr 18, 2025 am 11:49 AM

Their inaugural launch of AI4MH took place on April 15, 2025, and luminary Dr. Tom Insel, M.D., famed psychiatrist and neuroscientist, served as the kick-off speaker. Dr. Insel is renowned for his outstanding work in mental health research and techno

The 2025 WNBA Draft Class Enters A League Growing And Fighting Online HarassmentApr 18, 2025 am 11:44 AM

"We want to ensure that the WNBA remains a space where everyone, players, fans and corporate partners, feel safe, valued and empowered," Engelbert stated, addressing what has become one of women's sports' most damaging challenges. The anno

Comprehensive Guide to Python Built-in Data Structures - Analytics VidhyaApr 18, 2025 am 11:43 AM

Introduction Python excels as a programming language, particularly in data science and generative AI. Efficient data manipulation (storage, management, and access) is crucial when dealing with large datasets. We've previously covered numbers and st

First Impressions From OpenAI's New Models Compared To AlternativesApr 18, 2025 am 11:41 AM

Before diving in, an important caveat: AI performance is non-deterministic and highly use-case specific. In simpler terms, Your Mileage May Vary. Don't take this (or any other) article as the final word—instead, test these models on your own scenario

AI Portfolio | How to Build a Portfolio for an AI Career?Apr 18, 2025 am 11:40 AM

Building a Standout AI/ML Portfolio: A Guide for Beginners and Professionals Creating a compelling portfolio is crucial for securing roles in artificial intelligence (AI) and machine learning (ML). This guide provides advice for building a portfolio

What Agentic AI Could Mean For Security OperationsApr 18, 2025 am 11:36 AM

The result? Burnout, inefficiency, and a widening gap between detection and action. None of this should come as a shock to anyone who works in cybersecurity. The promise of agentic AI has emerged as a potential turning point, though. This new class

Google Versus OpenAI: The AI Fight For StudentsApr 18, 2025 am 11:31 AM

Immediate Impact versus Long-Term Partnership? Two weeks ago OpenAI stepped forward with a powerful short-term offer, granting U.S. and Canadian college students free access to ChatGPT Plus through the end of May 2025. This tool includes GPT‑4o, an a

See all articles

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

Assassin's Creed Shadows: Seashell Riddle Solution

3 weeks agoByDDD

What's New in Windows 11 KB5054979 & How to Fix Update Issues

2 weeks agoByDDD

Where to find the Crane Control Keycard in Atomfall

3 weeks agoByDDD

Saving in R.E.P.O. Explained (And Save Files)

1 months agoBy尊渡假赌尊渡假赌尊渡假赌

Assassin's Creed Shadows - How To Find The Blacksmith And Unlock Weapon And Armour Customisation

4 weeks agoByDDD

Hot Tools

Notepad++7.3.1

Easy-to-use and free code editor

SecLists

SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.