Home >Technology peripherals >AI >Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automated

Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automated

王林
王林forward
2024-04-23 15:01:29630browse

Lean Copilot, this formal mathematics tool that has been praised by many mathematicians such as Terence Tao, has evolved again?

Just now, Caltech professor Anima Anandkumar announced that the team released an expanded version of Lean Copilot's paper and updated the code base.

Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automatedPicture

Paper address: https://arxiv.org/pdf/2404.12534.pdf

The latest experiments show that this Copilot tool can automate more than 80% of the mathematical proof steps! This record is 2.3 times better than the previous baseline aesop.

And, as before, it is open source under the MIT license.

Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automated##Picture

He is Song Peiyang, a Chinese guy. He is an honorary CS undergraduate student at UCSB and a computing student at Caltech. SURF Fellow in the Department of Mathematical Sciences (CMS).

Netizens exclaimed: So, Terence Tao’s current mathematical research can be accelerated 5 times in place?

Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automatedPicture

LLM proposes a proof strategy and seamless human intervention

The team released this Lean Copilot The tool hopes to activate the collaboration between humans and LLM to write 100% accurate formal mathematical proofs.

It solves a core technical challenge: running LLM inference in Lean.

Through this tool, we can let LLM propose proof strategies in Lean, allowing humans to intervene and modify in a seamless way.

Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automatedPicture

This project was developed because automated theorem proving remains a difficult challenge today.

We all know that LLM often makes mistakes and hallucinations when doing mathematics and reasoning tasks, and is very unreliable.

Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automatedPicture

Therefore, so far, mathematical proofs have mostly been derived manually and require careful verification.

Theorem proving tools like Lean can formalize every step of the proof process, but it is really laborious for humans to write Lean.

In this case, the birth of Lean Copilot is of great significance.

The artifact that shocked Terence Tao many times: Mathematicians are finished before they can use it

LLM can be used as a tool to assist humans in proving theorems , this argument has been confirmed many times by Terence Tao.

He just predicted in his blog that in 2026 AI will be combined with search and symbolic mathematics tools to become a trustworthy co-author in mathematical research.

Immediately afterwards, studies supporting his views sprung up like mushrooms after a rain.

In June last year, scholars from California Institute of Technology, NVIDIA, MIT and other institutions built LeanDojo, a theorem prover based on open source LLM. Picture

Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automatedSeptember,

Microsoft Research Asia, Peking University, Researchers from Beihang University and other institutions successfully made GPT-4 reach the conclusion of "P≠NP" through 97 rounds of "Socratic" rigorous reasoning, solving this millennium problem .

Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automatedPicture

In the 97th round of dialogue, GPT-4 concluded that the example cannot be solved without exhaustive method, proving The conclusion is that P≠NP Hidden bugs everywhere.

In the process of using Lean4 to formalize the argument on page 6, he discovered that the expression

Picture

When n=3,k=2, it actually diverges.

Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automatedThis bug that is not easy to see can be caught in time, thanks to Lean4. The reason is that Lean asked him to construct 02. Therefore, Lean cannot get disproof based on negative 0

Picture

This discovery directly shocked Tao Zhexuan's pupils.

Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automated

Picture

And at the end of last year,

Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automatedTao Zhexuan directly and successfully used AI tools, Completed the work of formalizing the proof process of the polynomial Freiman-Ruzsa conjecture

. Picture

Finally, the dependency graph has been completely covered with green, and the Lean compiler also reported that this conjecture is completely Follow standard axioms.

Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automated

Picture

In this process, all the frontline mathematics researchers felt for the first time the impact of AI on A direct onslaught of disruptive forces in mathematical research.

Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automatedLean Coilot makes Lean more usable

Today, this research by Lean Copilot has made Lean directly more powerful.

In this paper, the team built some tools based on Lean Copilot for suggesting proof steps (strategy suggestion), completing intermediate proof goals (proof search) and using LLM to select relevant Premise (premise selection).

The experimental results also fully demonstrate that compared with the existing rule-based proof automation in Lean, Lean Copilot is effective in assisting humans in automated theorem proof.

Lean Copilot provides a general framework that can run LLM inference locally through CTranslate 2, or on the server.

Through this framework, users can create various automated certification tools.

Picture

Lean is a popular proof assistant among mathematicians. As shown in the figure below, a proof in Lean consists of a series of proof steps called tactics.

Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automated

Picture

Starting from the entire theorem as the initial goal, the strategy iteratively transforms the current goal into simpler sub-goals , until all goals are resolved.

Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automated

Users write strategies interactively in the IDE driven by VSCode, and the goals are displayed in the infoview panel on the right.

Generate strategy suggestions

Using Lean Copilot, the team built suggest_tropics, a tool for generating strategy suggestions using LLM.

And it itself is also a strategy.

When applied, it inputs the current target into LLM and obtains the generated strategy candidate list from LLM.

It looks at each option to see if they 1) result in an error; 2) result in nothing wrong but fail to complete the proof; 3) successfully complete the proof.

If it is 1), this policy will be deleted.

Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automatedPicture

Only error-free strategies will be displayed in the view panel on the right.

Among them, the strategy that successfully completed the proof is marked with green (category 3); the strategy that changed the proof goal without errors but did not complete the proof is marked with blue (category 2).

Notice! When all listed strategies fall into Category 2, this information can be extremely valuable to the user.

In this case, the remaining target information can directly help the user choose a strategy as the next intermediate proof step.

After seeing the suggestions, users can choose whether to accept them or use them as a source of inspiration to develop new strategies.

For example, we define a theorem add_abc in the Lean code, and its initial goal is shown on the right side of Figure 3.

Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automatedPicture

When we enter suggest_tropics, we will see strategy suggestions on the right.

The first strategy is shown in green, indicating that the proof was completed successfully.

The next three suggestions are all blue, which indicates that the proof cannot be completed directly, but will not lead to errors.

Thus, they are likely to be valid intermediate proof steps!

At the same time, the remaining sub-goals are also displayed.

The Tactic state field displays No goal because at least one strategy proposal can be proven.

Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automatedPicture

Search for complete proof

#Furthermore, because both humans and machines The correct strategy cannot be consistently produced, so one must backtrack and explore different alternatives in a process known as proof search.

When it comes to Suggest_tropics mentioned above, it can only generate the strategy of the current step and does not have the ability to search for multi-strategy proofs.

To this end, the team combined it with the rule-based proof search tool aesop to build an LLM-based proof search tool.

Aesop will implement best-first search as a Lean strategy and allow users to configure how the search tree is expanded.

Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automatedPicture

The search tree is composed of targets that are nodes.

Initially, it only has the original target as the root node. At each step, aesop selects the most promising unexpanded node, expands it by applying a policy, and adds the resulting node as a child node.

Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automatedpicture

And when aesop finds a path from the root cause to a target that can be easily solved, it proves that the search is successful!

Therefore, the performance of aesop depends critically on whether the user configures an effective rule set.

This shows that aesop lacks flexibility. Therefore, Search_proof enhances aesop's rule set by making it more flexible with target-related policies generated by suggest_tropics at each step.

For the original goal in Figure 3, the user only needs to enter search_prrof to find a complete proof that can solve the goal, which is displayed in the information view (right in Figure 5).

It can be seen that since evidence of success is found, the remaining Tactic state is No goals.

Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automatedPicture

Choose the annotated premise

In addition, in the theorem proof Another challenging and important task is to find relevant premises that reduce or complete the proof.

In addition to a large number of prerequisites in the source code library and standard library, Lean also has a large mathematics library (Mathlib).

However, searching for candidate premises from all libraries is extremely difficult and time-consuming.

So many people are trying to get assistance from Lean or other proof assistants, or to automatically complete this process.

Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automatedPicture

In Lean, the most advanced premise selection method is based on random forest (implemented directly in Lean) random forest) framework.

However, the premise selection task is well suited for retrieval-enhanced LLM, where the retrieval matrix (premise embedding) is trained during large model training to estimate the correlation between the proof target and the candidate premises .

Given a proof goal at inference time, the goal is first encoded into a vector, and then a matrix-vector multiplication is performed between the premise embedding and the goal vector.

Then, in order to select the top k premises (where k can be a hyperparameter that determines how many premises the user wants to return), then just return the k premises with the highest scores .

To perform reasoning tasks in Lean, in addition to the fast reasoning provided by Lean Copilot, you also need an efficient matrix multiplication library and a C numpy matrix reader.

The researchers used the matrix multiplication function from CTranslate2, and the C fast numpy file reader from Libnpy.

They link these numbers to Lean again through the FFI mechanism.

Thus, the premise selection strategy can run very efficiently, since the premise embeddings can be pre-computed and all subsequent operations can be done quickly in C using the libraries introduced above.

After obtaining the returned premise, the researcher further annotated it with useful information.

All premises are divided into two categories: premises that can be used directly in the current environment (in-scope premises) and premises that cannot be used directly in the current environment (out-of-scope premises) ).

This depends on whether the required packages are imported.

You can easily use the prerequisite if you have already imported the packages required by the prerequisite. Figure 6 below shows an annotated scope premise.

Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automated

# Figure 7 shows an annotated out-of-scope premise.

Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automated

The following is an example of using "premise selection". For the theorem add_abc in Figure 3, you can directly enter select_premises in the proof (Figure 8 Left).

Then, a list of related premises will appear in the information view (right in Figure 8).

For this simple theorem, it can be clearly seen that the chosen premises are indeed relevant, because they are all related to natural numbers and the addition rule.

In this case, the 4 selected prerequisites are all in the current scope, which means their modules have already been imported.

Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automated

The above are three practical proof automation tools built by researchers through Lean Copilot for strategy suggestions, search proofs and premise selection. .

81.2% of the proof steps are all automated

Through the Lean Copilot framework, researchers put forward hypotheses based on experience - in Lean Interactive Theorem Proving (ITP) Human-machine collaboration is beneficial.

Due to the theorem proving process in Lean, it mainly focuses on strategy proof.

Therefore, in the specific experiment, the author mainly evaluated the proof automation tools for "strategy suggestion" and "proof search".

In summary, aesop is currently the most advanced rule-based proof automation tool for proof search.

The researchers verified the effectiveness of LLM-based search proof compared with aesop in two cases:

(1) Prove theorems autonomously (LLM completes independently)

(2) Assist humans in proving theorems (collaboration between humans and AI)

In addition, researchers Search proofs are also compared with strategy recommendations to demonstrate the advantages of search proofs over and above single strategy recommendations.

Study how Lean Copilot can effectively help humans perform the process of ITP, which is similar to the paradigm of humans using Copilot in software programming.

In other words, when we face a goal, we will first call Copilot to see if it can directly solve the problem.

If not, we simplify the goal further and try Copilot again. Then, the above process is repeated until Copilot successfully solves the remaining targets.

The researchers used this iterative collaboration example to see how much human labor can be automated by each proven automation tool.

The specific results are shown in Table 1 below.

Proof search (search_proof) can automatically prove 64% of theorems (32 out of 50), significantly higher than aesop and strategy suggestions (suggest_tropics).

When used to assist humans, proof search only requires an average of 1.02 manually entered strategies, which is also better than aesop (3.62) and strategy suggestions (2.72).

Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automatedPicture

Finally, for each theorem tested, the author computes a proof that can be automated by each of the three tools Percentage of steps.

The results found that proof search can automatically complete about 81.2% of the proof steps in the theorem, which is significantly higher than strategy suggestions (48.6%) and aesop (35.2%).

In summary, the performance of proven search is 1.67 times higher than policy recommendations and 2.31 times higher than the rule-based baseline aesop.

Native LLM inference in Lean via Copilot

Lean Tactical suggestion, proof search and premise selection in Copilot, these three tasks may look like in essence Different, but the requirements for user experience are similar.

They all need to generate responses quickly enough, have moderate computational requirements, while running in Lean.

The reason why users have these requirements is that Lean itself can provide environment feedback (such as remaining targets, error messages, type information, etc.) very quickly in most cases.

This speed is consistent with the essence of proving theorem - it requires coherent reasoning.

If Lean Copilot requires users to wait for a long time, it will be difficult for collaboration between humans and AI to work.

Similarly, we also very much hope to meet the needs of low computing. Because theorem proving in Lean itself does not require a GPU and can be run on the user's local laptop.

Therefore, it is very important for Lean users to be able to run efficiently on most hardware (including laptops without GPU).

Because users may not have access to a CUDA-capable GPU when writing proofs.

Because fast inference and low computational requirements need to be met, and all popular and efficient deep learning frameworks are in Python, a natural solution that the team thought of was to host it in Python model (local or remote) and then make a request to the model from Lean.

However, this approach suffers from the overhead of inter-process communication, and it requires users to perform additional setup steps and is not suitable for Lean’s traditional workflow.

To overcome these problems, Lean Copilot runs LLM natively in Lean through the Foreign Function Interface (FFI).

FFI is a mechanism that allows a program written in one language to call a subroutine in another language.

The Lean part is implemented in c and can efficiently interoperate with c.

Programmers can declare a function in Lean but implement the function body in C. The implementation is compiled into a shared library and dynamically linked to Lean.

By default, we use the LeanDojo pre-trained repver model. It is based on an encoder-decoder converter, BVT5, which maps input strings to output strings.

Lean Copilot makes it runnable in Lean by wrapping the model into a c function that operates on strings, which can be called in Lean through FFI.

Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automatedPicture

Chinese authors have made great contributions

The three-person team in the latest paper also has 23 years of experience in 6 The author of the monthly open source platform LeanDojo.

Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automatedPicture

Paper address: https://arxiv.org/pdf/2306.15626.pdf

Peiyang Song(松 Peiyang)


Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automated##Picture

Song Peiyang is Honors undergraduate student in Computer Science at UC Santa Barbara's College of Creative Studies (CCS), mentored by Richert Wang and Phill Conrad.

At the same time, he is also a SURF researcher in the Department of Computational and Mathematical Sciences (CMS) at Caltech, co-supervised by Professor Anima Anandkumar and Dr. Kaiyu Yang.

Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automatedPicture

In addition, he is a researcher at the UC Berkeley Architecture Laboratory, working with Tim Sherwood and Dr. Jeremy Lau (Google )work together.

His research interests are machine learning (ML), involving application areas such as natural language processing (NLP) and computer vision (CV), as well as fundamentals such as systems and programming languages ​​(PL) theory.

Song Peiyang’s recent research mainly has two directions.

The first is neurosymbolic reasoning and artificial intelligence mathematics (AI4Math), which combines large models with interactive theorem provers (ITPs).

The other is energy-efficient machine learning based on sequential logic.

#Kaiyu Yang(杨凯宇)

##PictureCaltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automated

Yang Kaiyu is from the California Institute of Technology Postdoctoral researcher in the Department of Computational Mathematical Sciences (CMS), mentored by Anima Anandkumar.

He received his PhD from Princeton University, where his supervisor was Jia Deng, and he also worked with Olga Russakovsky and Chen Danqi.

His research focuses on neurosymbolic artificial intelligence, which aims to enable machine learning to perform symbolic reasoning, hoping to achieve this in two directions:

(1) Apply machine learning to symbolic reasoning tasks, such as mathematical reasoning and theorem proving in formal logic or natural language;

(2) Introduce symbolic components into machine learning models, Making it more explainable, verifiable and data efficient.

Currently, he is working on artificial intelligence that can understand and reason about mathematics. Mathematical reasoning is an important milestone in human intelligence and has the potential to transform many important problems in science and engineering, such as solving partial differential equations and formula verification.

Anima Anandkumar

Anima Anandkumar is now a professor of computational and mathematical sciences at Caltech.

Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automatedPicture

Her research interests mainly focus on the fields of large-scale machine learning, non-convex optimization and high-dimensional statistics.

In particular, she has been spearheading the development and analysis of tensor algorithms for machine learning.

The tensor decomposition method has extremely high parallelism and scalability and can be applied to massive data. It can guarantee convergence to the optimal solution and output consistent estimation results for many probability models (such as Markov models).

More broadly, Professor Anandkumar has been researching efficient techniques for accelerating non-convex optimization.

Reference:

##https://www.php.cn/link/1dd5a4016c624ef51f0542d4ae60e281

https://www.php.cn/link/ed798eec75807df6e79b0be391f720e4

##https://www.php.cn/link /a652e914c736dfaf8a6667ae6936f0d6

The above is the detailed content of Caltech Chinese use AI to subvert mathematical proofs! Speed ​​up 5 times shocked Tao Zhexuan, 80% of mathematical steps are fully automated. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:51cto.com. If there is any infringement, please contact admin@php.cn delete