The number of papers has increased sharply in the past ten years. How does deep learning slowly open the door to mathematical reasoning?

Mathematical reasoning is a key manifestation of human intelligence, allowing us to understand and make decisions based on numerical data and language. Mathematical reasoning applies to a variety of fields, including science, engineering, finance and everyday life, and encompasses a range of abilities from basic skills such as pattern recognition and number crunching to advanced skills such as problem solving, logical reasoning and abstract thinking.

For a long time, developing AI systems that can solve mathematical problems and prove mathematical theorems has been a research focus in machine learning and natural language processing, a line of work that dates back to the 1960s.

Over the past decade, with the rise of deep learning, interest in this field has grown significantly:


Figure 1: Estimated number of deep learning papers published each year on mathematical reasoning. Since 2018, this area has experienced rapid growth.

Deep learning has shown great success in various natural language processing tasks, such as question answering and machine translation. Likewise, researchers have developed various neural network methods for mathematical reasoning, which have proven effective on complex tasks such as word problems, theorem proving, and geometry problem solving. For example, deep learning-based word problem solvers adopt a sequence-to-sequence framework with an attention mechanism to generate mathematical expressions as an intermediate step. Furthermore, with large-scale corpora and Transformer models, pre-trained language models have achieved promising results on various mathematical tasks. Recently, large language models such as GPT-3 have further advanced mathematical reasoning by demonstrating impressive capabilities in complex reasoning and in-context learning.

In a recently released paper, researchers from UCLA and other institutions systematically review the progress of deep learning in mathematical reasoning.


Paper link: https://arxiv.org/pdf/2212.10535.pdf

Project address: https://github.com/lupantech/dl4math

Specifically, the paper discusses various tasks and datasets (Section 2) and examines advances in neural networks (Section 3) and pre-trained language models (Section 4) for mathematics. It also explores the rapid development of in-context learning with large language models for mathematical reasoning (Section 5). The paper further analyzes existing benchmarks and finds that multimodal and low-resource settings receive less attention (Section 6.1). Evidence-based studies suggest that current numeracy representations are insufficient and that deep learning methods are inconsistent in mathematical reasoning (Section 6.2). Finally, the authors suggest directions for improving current work in terms of generalization and robustness, trustworthy reasoning, learning from feedback, and multimodal mathematical reasoning (Section 7).

Tasks and Datasets

This section reviews the tasks and datasets currently available for studying mathematical reasoning with deep learning methods, as summarized in Table 2.


Math Word Problem

A math word problem consists of a short narrative involving people, entities, and quantities, whose mathematical relationships can be modeled by a set of equations; solving the equations reveals the final answer. Table 1 shows a typical example. Problems involve the four basic arithmetic operations (addition, subtraction, multiplication, and division) and may require one or several steps. Word problems are challenging for NLP systems because they demand language understanding, semantic parsing, and a range of mathematical reasoning skills.
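To make this structure concrete, here is a minimal Python sketch. The problem text and its equation are hypothetical (not taken from Table 1 or any dataset), and `evaluate` is an illustrative helper name. It safely evaluates an annotated equation restricted to the four basic operations, so the value of the equation is the final answer.

```python
import ast
import operator

# Hypothetical word problem and its annotated equation (illustrative only).
PROBLEM = ("Tom has 5 apples. He buys 3 bags with 4 apples in each bag. "
           "How many apples does he have now?")
EQUATION = "5 + 3 * 4"

# Only the four basic arithmetic operations are allowed.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expr: str) -> float:
    """Safely evaluate an arithmetic expression built from numbers and + - * /."""

    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return _eval(ast.parse(expr, mode="eval").body)


print(evaluate(EQUATION))  # 17
```

Parsing with `ast` instead of calling `eval` keeps arbitrary code out of the evaluator; anything other than numbers and the four operators raises `ValueError`.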


Existing word problem datasets cover elementary school-level problems that are scraped from online learning websites, collected from textbooks, or manually annotated by humans. Early datasets were relatively small or limited to problems with few steps. Some recent datasets aim to increase the diversity and difficulty of the problems. For example, Ape210K, the largest public word problem dataset at the time, consists of 210k elementary school word problems, while problems in GSM8K can require solutions of up to 8 steps. SVAMP is a benchmark that tests the robustness of deep learning models to word problems with simple variations. Some recently created datasets also involve modalities other than text. For example, IconQA provides an abstract diagram as visual context, while TabMWP provides a tabular context for each question.

Most word problem datasets provide annotated equations as the rationale for a solution (see Table 1). To improve the performance and interpretability of learned solvers, MathQA is annotated with precise operation programs, and MathQA-Python provides concrete Python programs. Other datasets annotate questions with multi-step natural language solutions, which are considered more suitable for human reading. Lila annotates many of the previously mentioned word problem datasets with Python program rationales.
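A program rationale can be executed directly to obtain the answer. The sketch below assumes a simplified syntax loosely modeled on MathQA-style operation programs (`n0`, `n1`, ... for numbers in the problem, `#k` for the result of step k, `const_100` for a named constant). The operator set, constant names, and the `run_program` function are assumptions for illustration, not the dataset's full specification.

```python
# Simplified operator set loosely modeled on MathQA-style operation programs.
OPERATORS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}
CONSTANTS = {"const_100": 100.0}


def run_program(program: str, numbers: list[float]) -> float:
    """Execute a '|'-separated program such as 'multiply(n0,n1)|divide(#0,const_100)'.

    n<i> is the i-th number in the problem text, #<k> is the result of step k,
    and const_* names a constant.
    """
    results: list[float] = []
    for step in filter(None, program.split("|")):  # skip empty trailing segments
        name, arg_text = step.rstrip(")").split("(", 1)
        values = []
        for arg in arg_text.split(","):
            arg = arg.strip()
            if arg.startswith("#"):
                values.append(results[int(arg[1:])])
            elif arg in CONSTANTS:
                values.append(CONSTANTS[arg])
            else:  # assume n<i>
                values.append(numbers[int(arg[1:])])
        results.append(OPERATORS[name.strip()](*values))
    return results[-1]


# "What is 20% of 150?"  numbers: n0 = 20, n1 = 150
print(run_program("multiply(n0,n1)|divide(#0,const_100)", [20, 150]))  # 30.0
```

Because each step can refer back to earlier results, the program records the intermediate computation explicitly, which is what makes such rationales more interpretable than a bare final answer.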

Theorem Proving

Automated theorem proving is a long-standing challenge in AI. The problem usually involves demonstrating the truth of a mathematical theorem through a sequence of logical arguments. Theorem proving draws on a variety of skills, such as choosing effective multi-step strategies, using background knowledge, and performing symbolic manipulations such as arithmetic or derivations.

Recently, there has been growing interest in using language models for theorem proving in formal interactive theorem provers (ITPs). A theorem is stated in the ITP's programming language and then simplified by generating "proof steps" until it is reduced to known facts. The result is a sequence of steps that constitutes a verified proof.
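For illustration, here is a toy theorem in Lean 4, one widely used ITP. The theorem is hypothetical and far simpler than benchmark problems, and the lemma `Nat.add_zero` is assumed to be available from Lean 4's core library. Each tactic line is one proof step that transforms the goal until it matches a known fact.

```lean
-- Toy theorem: each tactic line below is one "proof step".
theorem toy (a b : Nat) (h : a = b) : a + 0 = b := by
  rw [Nat.add_zero]  -- rewrite `a + 0` to `a`; the goal becomes `a = b`
  exact h            -- the goal now matches the hypothesis `h`, so it is closed
```

A language-model prover would generate lines like these one at a time, with the ITP checking each step and reporting the remaining goal.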

Informal theorem proving offers an alternative medium: statements and proofs are written in a mixture of natural language and standard mathematical notation (such as LaTeX), and are checked for correctness by humans.

An emerging line of research aims to combine elements of informal and formal theorem proving. For example, Wu et al. (2022b) explore translating informal statements into formal ones, while Jiang et al. (2022b) release miniF2F+informal, a new version of the miniF2F benchmark augmented with informal statements and proofs. Jiang et al. (2022b) also explore translating provided (or generated) informal proofs into formal proofs.

Geometric Problems

Automated geometry problem solving (GPS) is another long-standing AI task in mathematical reasoning research and has attracted wide attention in recent years. Unlike word problems, geometry problems consist of a textual description in natural language and a geometric diagram. As shown in Figure 2, the multimodal input describes the entities, attributes, and relationships of geometric elements, and the goal is to find the numerical solution to an unknown variable. GPS is challenging for deep learning methods because of the complex skills it requires: parsing multimodal information, performing symbolic abstraction, applying theorem knowledge, and carrying out quantitative reasoning.

Early datasets promoted research in this field, but they were relatively small or not publicly available, which limited the development of deep learning methods. To address this limitation, Lu et al. created the Geometry3K dataset, which consists of 3,002 multiple-choice geometry problems whose multimodal inputs are annotated in a unified logical form. More recently, larger datasets such as GeoQA, GeoQA+, and UniGeo have been introduced, annotated with programs that neural solvers can learn to generate and execute to obtain final answers.

Math Question Answering

Recent studies show that state-of-the-art (SOTA) mathematical reasoning systems may be "brittle": models rely on spurious signals in specific datasets and on plug-and-chug calculations to achieve "satisfactory" performance. To address this, new benchmarks have been proposed from various angles. The Mathematics dataset (Saxton et al., 2020) includes many different types of math problems spanning arithmetic, algebra, probability, and calculus, and can be used to measure a model's algebraic generalization. Similarly, MATH (Hendrycks et al., 2021) consists of challenging competition mathematics problems that measure a model's problem-solving ability in complex settings.

Some work adds tabular context to the question input. For example, FinQA, TAT-QA, and MultiHiertt collect questions that require table understanding and numerical reasoning to answer. Other studies propose unified benchmarks for large-scale numerical reasoning. NumGLUE (Mishra et al., 2022b) is a multi-task benchmark that evaluates model performance on eight different tasks. Mishra et al. (2022a) push this direction further with Lila, which consists of 23 numerical reasoning tasks spanning a wide range of mathematical topics, language complexity, question formats, and background knowledge requirements.

Other types of quantitative problems have also been studied. Diagrams, figures, and charts, for example, are essential media for conveying large amounts of information concisely. FigureQA, DVQA, MNS, PGDP5K, and GeoRE were all introduced to study models' ability to reason about quantitative relationships between entities in diagrams. NumerSense investigates whether, and to what extent, existing pre-trained language models capture numerical commonsense knowledge. EQUATE formalizes various aspects of quantitative reasoning within a natural language inference framework. Quantitative reasoning also appears frequently in specialized domains such as finance, science, and programming. For example, ConvFinQA performs numerical reasoning over financial reports in a conversational question-answering format; ScienceQA involves numerical reasoning in the scientific domain; and P3 studies the functional reasoning ability of deep learning models, which must find a valid input that makes a given program return True.
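The P3 task format can be illustrated with a toy puzzle: a function that checks a candidate input, and a solver that must find an input for which it returns True. The puzzle below and the naive `brute_force_solve` baseline are hypothetical examples written for illustration, not items from the benchmark; a learned model would be expected to propose the input directly rather than enumerate.

```python
def sat(x: int) -> bool:
    """Puzzle: x * x must lie in [1000, 10000] and end with the digits 89."""
    return 1000 <= x * x <= 10000 and str(x * x).endswith("89")


def brute_force_solve(limit: int = 1000) -> int:
    """Naive solver: try non-negative integers until the puzzle returns True."""
    for x in range(limit):
        if sat(x):
            return x
    raise ValueError(f"no solution below {limit}")


answer = brute_force_solve()
print(answer, sat(answer))  # 33 True
```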

Neural Networks for Mathematical Reasoning

The authors also summarize several common neural network architectures used for mathematical reasoning.

Seq2Seq Network

Seq2Seq neural networks have been successfully applied to mathematical reasoning tasks such as word problems, theorem proving, geometry problems, and math question answering. Seq2Seq models use an encoder-decoder architecture and typically formalize mathematical reasoning as a sequence generation task. The basic idea is to map an input sequence (such as a math problem) to an output sequence (such as an equation, a program, or a proof). Common encoders and decoders include long short-term memory networks (LSTMs), gated recurrent units (GRUs), and their bidirectional variants BiLSTM and BiGRU. Extensive work has shown that Seq2Seq models outperform earlier statistical learning methods. DNS is the first work to use a Seq2Seq model to translate the sentences of word problems into mathematical equations.
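Seq2Seq word problem solvers commonly replace the numbers in a problem with placeholder tokens, so the decoder emits an equation over symbols rather than raw values. The sketch below shows one simple way to build such a source/target pair. The regex tokenizer, the `n0, n1, ...` naming, and the `map_numbers` function are assumptions for illustration, not the exact preprocessing of DNS or any other system.

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")
TOKEN = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Naive word/punctuation tokenizer (a real system would use a proper one)."""
    return TOKEN.findall(text)


def map_numbers(problem: str, equation: str) -> tuple[list[str], list[str], list[str]]:
    """Replace numbers with placeholders n0, n1, ... in both problem and equation.

    Returns (source tokens, target tokens, original numbers in order of appearance).
    If a number occurs more than once, the equation uses the slot of its last occurrence.
    """
    numbers: list[str] = []

    def to_slot(match: re.Match) -> str:
        numbers.append(match.group())
        return f"n{len(numbers) - 1}"

    source = NUMBER.sub(to_slot, problem)
    slot_of = {num: f"n{i}" for i, num in enumerate(numbers)}
    # Numbers that do not appear in the problem (e.g. constants) are left as-is.
    target = NUMBER.sub(lambda m: slot_of.get(m.group(), m.group()), equation)
    return tokenize(source), tokenize(target), numbers


src, tgt, nums = map_numbers(
    "Tom has 5 apples and buys 3 bags of 4 apples each. How many apples does he have?",
    "x = 5 + 3 * 4",
)
print(tgt)   # ['x', '=', 'n0', '+', 'n1', '*', 'n2']
print(nums)  # ['5', '3', '4']
```

The encoder reads `src`, the decoder is trained to produce `tgt`, and `nums` is used to substitute the real values back before the equation is evaluated.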

Graph-based networks

The Seq2Seq approach has the advantage of generating mathematical expressions without relying on hand-crafted features. Mathematical expressions can be converted into tree-based structures, such as abstract syntax trees (ASTs), or into graph-based structures, both of which describe the structural information in an expression. However, Seq2Seq methods do not model this important information explicitly. To address this, researchers have developed graph-based neural networks that explicitly model the structure of expressions.

Sequence-to-tree (Seq2Tree) models explicitly model the tree structure when generating the output sequence. For example, Liu et al. designed a Seq2Tree model that better exploits the AST information of equations. In contrast, Seq2DAG applies a sequence-to-graph (Seq2Graph) framework when generating equations, because a graph decoder can capture complex relationships among multiple variables. Graph-based information can also be embedded when encoding the input mathematical sequence. For example, ASTactic applies a TreeLSTM over ASTs to represent the input goals and premises for theorem proving.
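Tree-structured decoders typically generate an equation's expression tree node by node, for example in pre-order (prefix) order. The sketch below derives that target sequence from an infix equation, using Python's built-in `ast` parser as a stand-in for an equation parser. `to_prefix` is an illustrative name; the sketch only builds the tree-shaped target and is not a Seq2Tree model.

```python
import ast

_SYMBOL = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def to_prefix(expr: str) -> list[str]:
    """Parse an infix expression and return the pre-order traversal of its tree."""

    def visit(node: ast.AST) -> list[str]:
        if isinstance(node, ast.BinOp) and type(node.op) in _SYMBOL:
            # Operator first, then the left subtree, then the right subtree.
            return [_SYMBOL[type(node.op)]] + visit(node.left) + visit(node.right)
        if isinstance(node, ast.Name):
            return [node.id]  # number placeholder such as n0
        if isinstance(node, ast.Constant):
            return [str(node.value)]
        raise ValueError(f"unsupported node: {ast.dump(node)}")

    return visit(ast.parse(expr, mode="eval").body)


print(to_prefix("(n0 + n1) * n2"))  # ['*', '+', 'n0', 'n1', 'n2']
print(to_prefix("n0 + n1 * n2"))    # ['+', 'n0', '*', 'n1', 'n2']
```

Parentheses disappear in the prefix form because the tree shape already encodes precedence, which is one reason tree decoders can avoid producing syntactically invalid equations.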

Attention-based network

The attention mechanism has been successfully applied to natural language processing and computer vision problems, taking the input hidden vectors into account during decoding. Researchers have explored its role in mathematical reasoning tasks because it can identify the most important relationships between mathematical concepts. For example, MATH-EN is a word problem solver that benefits from long-range dependency information learned through self-attention. Attention-based methods have also been applied to other mathematical reasoning tasks, such as geometry problem solving and theorem proving. To extract better representations, various attention mechanisms have been studied, such as Group-ATT, which uses different multi-head attention to extract various types of math word problem (MWP) features, and graph attention, which is applied to extract knowledge-aware information.
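As a reference point, here is a minimal pure-Python sketch of scaled dot-product attention, the standard form used in Transformer self-attention. It is shown for a single query vector with toy numbers. The models named above may use different variants (multi-head, graph attention), so treat this as an illustration of the basic mechanism only.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query: list[float], keys: list[list[float]],
              values: list[list[float]]) -> tuple[list[float], list[float]]:
    """Scaled dot-product attention for one query vector.

    Returns the attention weights over the inputs and the weighted sum of values.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    output = [sum(w * v[j] for w, v in zip(weights, values))
              for j in range(len(values[0]))]
    return weights, output


# Three input tokens; the query is most similar to the first and third keys.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, output = attention([1.0, 0.0], keys, values=keys)
print([round(w, 3) for w in weights])
```

The weights sum to 1, and inputs whose keys align with the query receive more weight; this is how attention lets a solver focus on the quantities and words most relevant to the current decoding step.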

Other Neural Networks

Deep learning methods for mathematical reasoning tasks can also use other neural networks, such as convolutional neural networks and multimodal networks. Some works use convolutional architectures to encode the input text, enabling the model to capture long-range relationships between symbols in the input. For example, Irving et al. proposed the first application of deep neural networks to theorem proving, relying on convolutional networks for premise selection in large theories.

Multimodal mathematical reasoning tasks, such as geometry problem solving and diagram-based mathematical reasoning, are formalized as visual question answering (VQA) problems. In this setting, visual inputs are encoded with ResNet or Faster R-CNN, while textual representations are obtained with a GRU or LSTM. Joint representations are then learned with multimodal fusion models such as BAN, FiLM, and DAFA.
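To show what a fusion step can look like, here is a minimal pure-Python sketch of FiLM-style (feature-wise linear modulation) fusion: the text representation predicts a per-dimension scale `gamma` and shift `beta` that are applied to every visual feature vector. The tiny dimensions, weights, and function names are hypothetical; a real model would learn these projections and operate on convolutional feature maps.

```python
Matrix = list[list[float]]


def linear(weights: Matrix, bias: list[float], x: list[float]) -> list[float]:
    """Dense layer y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def film(visual: Matrix, text: list[float],
         w_gamma: Matrix, b_gamma: list[float],
         w_beta: Matrix, b_beta: list[float]) -> Matrix:
    """Feature-wise linear modulation: scale and shift every visual feature vector
    with per-dimension gamma and beta predicted from the text representation."""
    gamma = linear(w_gamma, b_gamma, text)
    beta = linear(w_beta, b_beta, text)
    return [[g * f + b for g, f, b in zip(gamma, feat, beta)] for feat in visual]


# Toy example: 2 visual feature vectors (e.g. image regions) of size 2, text vector of size 2.
fused = film(
    visual=[[1.0, 1.0], [3.0, -1.0]],
    text=[1.0, 0.0],
    w_gamma=[[2.0, 0.0], [0.0, 0.0]], b_gamma=[0.0, 1.0],
    w_beta=[[0.0, 0.0], [1.0, 0.0]], b_beta=[0.0, 0.0],
)
print(fused)  # [[2.0, 2.0], [6.0, 0.0]]
```

The design choice in FiLM is that language conditions vision by modulating features rather than by concatenating them, so the same visual encoder can be steered toward different quantities depending on the question.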

Other deep neural network architectures have also been used for mathematical reasoning. Zhang et al. leveraged the success of graph neural networks (GNNs) in spatial reasoning and applied them to geometry problems. WaveNet has been applied to theorem proving because of its ability to model longitudinal time-series data. Furthermore, DDT found that Transformers outperform GRUs at generating mathematical equations. Finally, MathDQN is the first work to explore reinforcement learning for solving math word problems, primarily leveraging its strong search capabilities.

Pre-trained Language Models for Mathematical Reasoning

Pre-trained language models have brought significant performance improvements on a wide range of NLP tasks and have also been applied to math-related problems: previous work has shown that they perform well at solving word problems, assisting with theorem proving, and other mathematical tasks. However, using them for mathematical reasoning presents several challenges.

First, pre-trained language models are not specifically trained on mathematical data, which may make them less proficient at math-related tasks than at natural language tasks. There is also less mathematical or scientific data available for large-scale pre-training than text data.

Second, the size of pre-trained models continues to grow, making it expensive to train an entire model from scratch for a specific downstream task.

Third, downstream tasks may involve different input formats or modalities, such as structured tables or diagrams. To address these challenges, researchers fine-tune pre-trained models or adapt neural architectures for downstream tasks.

Finally, although pre-trained language models can encode a large amount of linguistic information, the language modeling objective alone may make it difficult for them to learn numerical representations or high-level reasoning skills. With this in mind, recent research has investigated infusing mathematics-related skills through a curriculum that starts from basic skills.

Self-supervised learning of mathematics

Table 4 lists language models pre-trained with self-supervised tasks for mathematical reasoning.


Task-specific mathematical fine-tuning

Task-specific fine-tuning is also a common practice when there is not enough data to train a large model from scratch. As shown in Table 5, existing work fine-tunes pre-trained language models on a variety of downstream tasks.


In addition to fine-tuning model parameters, many works use pre-trained language models as encoders and combine them with other modules to complete downstream tasks. For example, IconQA uses ResNet for diagram recognition and BERT for text understanding.

In-Context Learning for Mathematical Reasoning

An in-context example usually consists of an input-output pair together with some instruction words, for example: Please select the largest number from the list.

Input: [2, 4, 1, 5, 8]

Output: 8.

In few-shot learning, several such examples are given, and the model then predicts the output for the final input. However, this standard few-shot prompting, which places in-context input-output examples before the test-time example, has not yet proven sufficient for large language models to achieve good performance on challenging tasks such as mathematical reasoning.
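The largest-number example can be turned into a few-shot prompt by concatenating the instruction, the demonstrations, and the test input. This is a minimal sketch: the template layout, the second demonstration, and the `build_few_shot_prompt` name are assumptions, and the resulting string would be sent to whatever language model is being used.

```python
def build_few_shot_prompt(instruction: str, examples: list[tuple[str, str]],
                          test_input: str) -> str:
    """Instruction, then k input-output demonstrations, then the test input.

    The language model is expected to continue the text after the final "Output:".
    """
    blocks = [instruction]
    for inp, out in examples:
        blocks.append(f"Input: {inp}\nOutput: {out}")
    blocks.append(f"Input: {test_input}\nOutput:")
    return "\n\n".join(blocks)


prompt = build_few_shot_prompt(
    "Please select the largest number from the list.",
    [("[2, 4, 1, 5, 8]", "8"), ("[7, 3, 9]", "9")],  # second demonstration is hypothetical
    "[6, 11, 4]",
)
print(prompt)
```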

Chain-of-thought (CoT) prompting uses intermediate natural language rationales as prompts, enabling a large language model to first generate a reasoning chain and then predict the answer to an input question. For example, simply adding the prompt "Let's think step by step" can make large language models decent zero-shot reasoners. Beyond this, most recent work has focused on improving chain-of-thought reasoning, and falls into two lines: (i) selecting better in-context examples and (ii) creating better reasoning chains.
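A zero-shot CoT pipeline needs two pieces: a prompt that elicits a reasoning chain, and a way to read the final answer out of the generated text. The sketch below uses a simple last-number heuristic for the second piece; that heuristic, the `Q:/A:` layout, and the model continuation shown are assumptions for illustration (other answer-extraction strategies exist, and no real model output is reproduced here).

```python
import re
from typing import Optional


def zero_shot_cot_prompt(question: str) -> str:
    """Append the trigger phrase so the model writes its reasoning before the answer."""
    return f"Q: {question}\nA: Let's think step by step."


def extract_last_number(generation: str) -> Optional[str]:
    """Heuristic answer extraction: return the last number in the generated text."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", generation.replace(",", ""))
    return numbers[-1] if numbers else None


question = "Tom has 5 apples and buys 3 bags of 4 apples each. How many apples does he have?"
print(zero_shot_cot_prompt(question))

# A hypothetical model continuation (not real model output).
generation = "3 bags of 4 apples is 3 * 4 = 12 apples. 5 + 12 = 17. The answer is 17."
print(extract_last_number(generation))  # 17
```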


In-Context Example Selection

Early chain-of-thought work selected in-context examples randomly or heuristically. However, recent research has shown that this type of few-shot learning can be highly unstable across different selections of in-context examples, so which examples make the most effective prompts remains an open question. To address this limitation, several recent works investigate ways to optimize the selection process. For example, Rubin et al. (2022) retrieve semantically similar examples; however, this approach does not work well for mathematical reasoning problems, and similarity is hard to measure when structured information (such as tables) is involved. Fu et al. (2022) propose complexity-based prompting, which selects examples with complex reasoning chains (i.e., chains with more reasoning steps) as prompts. Lu et al. (2022b) select in-context examples with reinforcement learning: an agent learns to find the best examples from a candidate pool, with the goal of maximizing the prediction reward for a given training example when interacting with the GPT-3 environment. Furthermore, Zhang et al. (2022b) find that diversifying the example problems can also improve model performance. They propose a two-step approach to constructing in-context examples: first, partition the problems of a given dataset into several clusters; second, select a representative problem from each cluster and generate its reasoning chain with zero-shot chain-of-thought prompting using simple heuristics.
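The core of complexity-based selection can be sketched in a few lines: score each candidate by the length of its reasoning chain and keep the top k. Counting steps as non-empty lines, the candidate pool, and the function names are assumptions for illustration; the original method may measure complexity differently.

```python
def count_steps(rationale: str) -> int:
    """Approximate the number of reasoning steps as the number of non-empty lines."""
    return sum(1 for line in rationale.splitlines() if line.strip())


def select_complex_examples(pool: list[dict], k: int) -> list[dict]:
    """Return the k candidates with the longest reasoning chains (ties keep pool order)."""
    return sorted(pool, key=lambda ex: count_steps(ex["rationale"]), reverse=True)[:k]


# Hypothetical candidate pool.
pool = [
    {"question": "2 + 3?", "rationale": "2 + 3 = 5.\nThe answer is 5."},
    {"question": "Tom's apples?", "rationale": "3 * 4 = 12.\n5 + 12 = 17.\nThe answer is 17."},
    {"question": "10 - 4?", "rationale": "The answer is 6."},
]
print([ex["question"] for ex in select_complex_examples(pool, k=2)])
# ["Tom's apples?", '2 + 3?']
```

Swapping the scoring key is enough to try other heuristics from the paragraph above, for example a similarity score to the test question for retrieval-based selection.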

High-Quality Reasoning Chains

Early chain-of-thought work mainly relied on a single human-annotated reasoning chain as the prompt. However, manually creating reasoning chains has two drawbacks. First, as tasks become more complex, current models may not be able to learn all the necessary reasoning steps and cannot easily generalize to different tasks. Second, a single decoding pass is vulnerable to faulty reasoning steps, which can lead to an incorrect final answer. To address these limitations, recent research has focused on two directions: (i) hand-crafting more complex demonstrations, referred to as process-based approaches, and (ii) using ensemble-like methods, referred to as outcome-based approaches. After evaluating existing benchmarks and methods, the authors also discuss future research directions in this area. For more details, please refer to the original paper.
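One well-known example of an outcome-based, ensemble-like method is self-consistency: sample several reasoning chains for the same question and keep the most common final answer. The sketch below shows only that voting step, under the assumption that answers have already been extracted from each chain (with `None` where extraction failed); the sampled answers are hypothetical, and `majority_vote` is an illustrative name.

```python
from collections import Counter
from typing import Optional


def majority_vote(answers: list[Optional[str]]) -> Optional[str]:
    """Return the most frequent final answer among sampled reasoning chains.

    Chains without an extractable answer (None) are ignored; ties go to the answer
    seen first, because Counter.most_common keeps first-encountered order for equal counts.
    """
    counts = Counter(a for a in answers if a is not None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Final answers extracted from five hypothetical sampled chains.
print(majority_vote(["17", "12", "17", None, "17"]))  # 17
```

Voting over outcomes makes the prediction less sensitive to any single faulty chain, which directly targets the second drawback described above.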


This article is reproduced from 51CTO.COM.