
Flow engineering doubles code generation accuracy: from 19% to 44%

WBOY
2024-02-05

The authors of a new paper propose a way to "enhance" code generation.


Code generation is an increasingly important capability in artificial intelligence: machine learning models automatically produce computer code from natural language descriptions. The technology has broad application prospects. It can turn software specifications into working code, automate back-end development, and help human programmers work more efficiently.

However, compared with language tasks such as translation or summarization, generating high-quality code remains challenging for AI systems. The code must exactly conform to the syntax of the target programming language, handle edge cases and unexpected inputs gracefully, and address the many small details of the problem description. Even small errors that would be harmless in other domains can completely break a program, causing it to fail to compile or run.

Recently, researchers at CodiumAI proposed AlphaCodium, a new method that can significantly improve the code generation capabilities of large language models such as GPT-4. Their point is that merely fine-tuning the wording of prompts has inherent limitations in solving complex coding problems. Instead, they designed a multi-stage process focused on iteratively generating, running, and debugging code against test cases, allowing the model to learn from practice.

Limitations of Prompt Engineering

In natural language tasks, prompt engineering refers to carefully adjusting the wording and structure of prompts to guide the model toward the desired output. For example, adding the phrase "Write a concise summary:" before the input text can lead the model to generate a more concise, accurate summary.

Prompt engineering has proven very effective at guiding the behavior of large language models in text generation. For coding problems, however, the researchers found that even extensive prompt tuning yields only small gains. This finding is thought-provoking: generating high-quality code demands more, because the code must:

  • Exactly match the syntax of the target programming language
  • Handle corner cases and unexpected inputs gracefully
  • Address all the small details and requirements in the problem statement
  • Compile and run correctly for all valid inputs

These structural requirements are beyond the scope of ordinary text generation and cannot be captured by prompt wording alone. Prompts by themselves lack the concrete execution feedback the model needs to learn these skills.

AlphaCodium Iterative Process

To address these challenges, researchers developed an iterative process specifically tailored to the structure of the code generation problem. The key innovation is to use the execution results of the generated code as learning signals to provide direct feedback.

AlphaCodium’s process has two main stages:

Preprocessing

  • The model restates the problem description as bullet points, extracting the key details.
  • It explains the intended logic behind each example input/output pair.
  • It proposes two or three natural-language solution ideas.
  • It generates additional, diverse test cases to improve coverage.
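A minimal sketch of how such a preprocessing stage might assemble its structured prompt. The section names and YAML-style answer format follow the paper's description of structured output, but the exact wording and the `build_reflection_prompt` helper are illustrative assumptions, not the authors' actual prompts:

```python
# Hypothetical sketch of an AlphaCodium-style preprocessing prompt.
# The structured (YAML-like) answer format mirrors the paper's idea of
# asking the model for bullet-point self-reflection, example
# explanations, candidate solutions, and extra tests in one pass.

def build_reflection_prompt(problem: str) -> str:
    """Build a prompt asking the model to reflect on the problem
    and answer in a fixed, machine-parsable structure."""
    return (
        "You are given a competitive programming problem.\n\n"
        f"Problem:\n{problem}\n\n"
        "Answer in the following structure:\n"
        "self_reflection:\n"
        "  - <key detail of the problem, as a short bullet>\n"
        "example_explanations:\n"
        "  - input: <example input>\n"
        "    reasoning: <why this input yields the given output>\n"
        "possible_solutions:\n"
        "  - name: <short name>\n"
        "    idea: <2-3 sentence natural-language solution>\n"
        "additional_tests:\n"
        "  - input: <new test input>\n"
        "    expected_output: <expected output>\n"
    )

prompt = build_reflection_prompt("Given n integers, print their sum.")
```

A structured answer format like this is what lets the downstream stages parse the reflection and tests mechanically instead of scraping free-form prose.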

Code Iteration

  • The model generates an initial code solution.
  • It repeatedly runs this code against the public test cases, fixing any errors that occur.
  • It does the same against the test cases it generated itself.
  • Passing test cases are added to a growing suite of "test anchors" to prevent regressions.
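The run-fix loop with accumulating test anchors could be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: candidate fixes arrive as code strings (in practice produced by the LLM), and tests are (args, expected) pairs for a single function:

```python
# Toy illustration of iterating candidate code against test anchors.
# In AlphaCodium the candidates come from an LLM repair loop; here
# they are supplied as plain strings.

def run_candidate(code, func_name, tests):
    """Execute candidate code and return the list of failing tests."""
    namespace = {}
    try:
        exec(code, namespace)
    except Exception:
        return list(tests)  # code does not even load: all tests fail
    fn = namespace.get(func_name)
    failures = []
    for args, expected in tests:
        try:
            if fn(*args) != expected:
                failures.append((args, expected))
        except Exception:
            failures.append((args, expected))
    return failures

def iterate_with_anchors(candidates, func_name, public_tests, generated_tests):
    """Accept a candidate only if it passes every anchor so far;
    promote newly passing generated tests to anchors to block regressions."""
    anchors = list(public_tests)
    best = None
    for code in candidates:
        if run_candidate(code, func_name, anchors):
            continue  # fails an anchor: reject this candidate
        best = code
        for t in generated_tests:
            if t not in anchors and not run_candidate(code, func_name, [t]):
                anchors.append(t)
    return best, anchors
```

For example, given a buggy `add` that subtracts and a corrected one that adds, the loop rejects the first, accepts the second, and grows the anchor suite with the generated tests it passes.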

By incrementally reasoning about the problem, developing solution hypotheses, extending test coverage, and iteratively generating and debugging code, the model learns through experience the skills required to produce high-quality code.


Figure 1. Prompt example with structured output (generate possible solutions stage)

The researchers found that designing the process as modules with clear interfaces and goals leads to better results than an end-to-end model. Each stage first focuses on a simpler subtask, building knowledge and uncovering insights that inform downstream stages. Upstream stages like test generation do not require a complete solution, only basic reasoning.
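The staged flow described above can be pictured as a pipeline in which each stage reads and extends a shared context. The stage names echo the paper's flow; the plumbing itself is a hypothetical toy, not the actual AlphaCodium code:

```python
# Toy illustration of a modular flow: each stage has a narrow goal and
# writes its output into a shared context consumed by later stages.

def reflect(ctx):
    # Upstream stage: only basic reasoning about the problem, no code.
    ctx["reflection"] = f"key points of: {ctx['problem']}"
    return ctx

def propose_solutions(ctx):
    # Builds on the reflection, still natural language only.
    ctx["solutions"] = [f"idea based on ({ctx['reflection']})"]
    return ctx

def generate_tests(ctx):
    # Extends coverage before any code iteration begins.
    ctx["tests"] = ["public tests", "AI-generated tests"]
    return ctx

PIPELINE = [reflect, propose_solutions, generate_tests]

def run_flow(problem):
    """Run every stage in order over a shared context dict."""
    ctx = {"problem": problem}
    for stage in PIPELINE:
        ctx = stage(ctx)
    return ctx
```

The design point is that each stage's interface is just the context dict, so stages can be developed, swapped, or debugged independently.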

Experimental Results

The researchers evaluated AlphaCodium on the CodeContests benchmark, which contains hundreds of coding problems from competitive programming contests.


Figure 2. Problem description and reflection: a typical CodeContests problem, with the AI's self-reflection on it. While the initial description is long and complex, proper self-reflection makes the problem clearer and more coherent, leading to better code solutions.

Compared with a single well-designed prompt, AlphaCodium improved code generation accuracy on the validation set from 19% to 44%. The benefit holds across different model sizes and test sets, and is significantly larger than what prompt engineering alone achieves.

AlphaCodium also performs significantly better than previously published methods, such as AlphaCode and CodeChain, while using fewer computing resources. For example, by avoiding unnecessary brute force generation, its accuracy is comparable to AlphaCode while requiring 10,000 times fewer model queries.

These results demonstrate the value of designing an AI system holistically around a task structure, rather than treating it as a general-purpose text generator. By incorporating iterative code running and debugging, AlphaCodium better aligns the training process with the ultimate goal of producing robust, practical code.

Broader Impact

While demonstrated on competitive programming problems, the concepts used in AlphaCodium offer broader lessons for advancing AI code generation:

  • Prompt engineering alone has limits for complex coding tasks; concrete problem-solving experience is critical.
  • Test-driven development specifications can anchor model training: tests provide an explicit fitness function.
  • Iterative code debugging focuses model improvement on the errors that actually occur.
  • Expanding test coverage highlights generalization gaps that are not visible from prompts alone.
  • Soft decision-making with double validation reduces brittleness and bias.
AlphaCodium offers a promising new paradigm for code generation grounded in software engineering best practices. Open research questions remain around generalizability and computational overhead, but the principles presented here (learning from experience, test-driven development, modular reasoning, and iterative debugging) seem to provide a solid foundation for improving AI coding capabilities.

Paper link: https://arxiv.org/pdf/2401.08500.pdf.

Code base: https://github.com/Codium-ai/AlphaCodium.

Original title: "Flow engineering" doubles code generation accuracy (19% vs 44%), by Mike Young

Link: https://notes.aimodels.fyi/flow-engineering-intensifies-for-code-generation/

