Topping the list of open source AI software engineers, UIUC's agent-less solution easily solves SWE-bench real programming problems

WBOY

Jul 17, 2024 pm 10:02 PM

The authors of this paper are all from Professor Lingming Zhang's group at the University of Illinois at Urbana-Champaign (UIUC): Steven Xia, a fourth-year doctoral student whose research focuses on automated code repair based on large AI models; Yinlin Deng, a fourth-year doctoral student whose research focuses on code generation based on large AI models; and Soren Dunn, a research intern who is currently a third-year student at UIUC. Lingming Zhang is an associate professor in the Department of Computer Science at UIUC, working mainly on software engineering, machine learning, and large language models for code.

For more details, see Professor Zhang's personal homepage: https://lingming.cs.illinois.edu/

Since Devin (billed as the first fully autonomous AI software engineer) was introduced, the design of agents for AI software engineering has become a research focus. A growing number of agent-based AI software engineers have been proposed; they perform well on the SWE-bench data set and have automatically fixed many real GitHub issues.

However, a complex agent system brings additional overhead and uncertainty. Do we really need such complex agents to solve GitHub issues? Can an agent-free solution come close to their performance?

Starting from these two questions, Professor Lingming Zhang's team at the University of Illinois at Urbana-Champaign (UIUC) proposed OpenAutoCoder-Agentless, a simple, efficient, and fully open source agentless solution that can solve a real GitHub issue for only $0.34. Agentless attracted more than 300 stars on GitHub within a few days and made it into the top three of DAIR.AI's weekly list of the hottest ML papers.

Paper: AGENTLESS: Demystifying LLM-based Software Engineering Agents
Paper address: https://huggingface.co/papers/2407.01489
Open source code: https://github.com/OpenAutoCoder/Agentless

AWS Research Scientist Leo Boytsov said: "The Agentless framework outperformed all open source agent solutions and almost reached the top level of SWE-bench Lite (27%). Moreover, it beat all open source solutions at a significantly lower cost. The framework uses a hierarchical query approach (asking the LLM to find files, classes, functions, and so on) to determine patch locations, but does not allow the LLM to make planning decisions."

Agentless is an automated approach to software development problems that uses a simple two-phase process to locate and fix bugs in a code base. In the localization phase, Agentless works hierarchically, narrowing down step by step to suspicious files, then classes and functions, and finally specific edit locations. In the repair phase, it uses a simple diff format (borrowed from the open source tool Aider) to generate multiple candidate patches, which are then filtered and ranked.
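The repair phase described above can be sketched roughly as follows. This is a minimal illustration, not the Agentless implementation: the function names are hypothetical, the edit format is reduced to a single search/replace pair, and the filtering rule (drop edits that do not apply or that break Python syntax) and ranking rule (most frequent resulting file first) are assumptions chosen to make "filtered and ranked" concrete.

```python
import ast
from collections import Counter
from typing import List, Optional, Tuple


def apply_search_replace(source: str, search: str, replace: str) -> Optional[str]:
    """Apply one search/replace edit; return None if the search text is absent."""
    if search not in source:
        return None
    return source.replace(search, replace, 1)


def rank_candidates(source: str, edits: List[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Filter and rank candidate edits (hypothetical rules, see lead-in).

    Edits that do not apply or that yield invalid Python are dropped.
    The surviving patched files are ranked by how many candidates produced them.
    """
    votes = Counter()
    for search, replace in edits:
        patched = apply_search_replace(source, search, replace)
        if patched is None:
            continue  # the edit does not match the file
        try:
            ast.parse(patched)
        except SyntaxError:
            continue  # the edit breaks the file
        votes[patched] += 1
    return votes.most_common()


buggy = "def add(a, b):\n    return a - b\n"
candidates = [
    ("return a - b", "return a + b"),
    ("return a - b", "return a + b"),
    ("return a - b", "return b +"),    # syntax error, filtered out
    ("return a * b", "return a + b"),  # search text not found, filtered out
    ("return a - b", "return b + a"),
]
ranked = rank_candidates(buggy, candidates)
best_patch, best_votes = ranked[0]
```

Here two of the five candidates are discarded, and the patch produced twice is ranked ahead of the one produced once.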

The researchers compared Agentless with existing AI software agents, including state-of-the-art open source and commercial (closed source) projects. Surprisingly, Agentless outperforms all existing open source software agents while costing less. It solved 27.33% of the problems, the highest rate among open source solutions, at an average cost of $0.29 per problem, or about $0.34 when averaged across all problems (both solved and unsolved).

Agentless also has room to improve. When all generated patches are considered, rather than only the selected one, Agentless can solve 41% of the problems. This upper bound indicates significant headroom in the patch ranking and selection stages. Furthermore, Agentless solves some problems that even the best commercial tool (Alibaba Lingma Agent) cannot, suggesting that it can complement existing tools.
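The gap between the selected-patch rate and the all-patches upper bound can be illustrated with a short sketch. The per-problem data below is a made-up toy example and the function name is hypothetical; only the 27.33%, 41%, and 300-problem figures come from the article.

```python
from typing import List, Tuple


def resolve_rates(results: List[List[bool]]) -> Tuple[float, float]:
    """Return (selected rate, upper bound) for ranked candidate outcomes.

    results holds one list per problem, ordered by rank; True means that
    candidate patch passes the tests. The selected rate counts only the
    top-ranked candidate, the upper bound counts any passing candidate.
    """
    total = len(results)
    selected = sum(1 for cands in results if cands and cands[0]) / total
    upper = sum(1 for cands in results if any(cands)) / total
    return selected, upper


# Toy data: in the second problem a correct patch exists but is ranked second.
toy = [[True, False], [False, True], [False, False], [True, True]]
selected_rate, upper_bound = resolve_rates(toy)

# The article's percentages as problem counts on the 300-problem SWE-bench Lite.
solved = round(0.2733 * 300)
reachable = round(0.41 * 300)
headroom = reachable - solved
```

On the reported numbers this works out to 82 solved problems against 123 reachable ones, so better ranking alone could in principle recover up to 41 more problems.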

Analysis of SWE-bench Lite data set

The researchers also conducted manual inspection and detailed analysis of the SWE-bench Lite data set.

The study found that 4.3% of the problems in the SWE-bench Lite data set give the complete answer, that is, the correct fix patch, directly in the problem description. Another 10% describe the exact steps of the correct solution. This suggests that some problems in SWE-bench Lite may be easier to solve than others.

In addition, the research team observed that 4.3% of the problems include user-proposed solutions or steps in the issue description that are inconsistent with the developers' actual patches. This reveals a further potential problem with the benchmark: such misleading solutions could lead an AI tool to generate an incorrect fix simply by following the problem description.

In terms of problem description quality, the researchers observed that although most tasks in SWE-bench Lite contain sufficient information, and many also provide failing examples that reproduce the error, 9.3% of the problems still do not include enough information. For example, a task may require implementing a new function or adding an error message without giving the specific function name or error message string in the problem description. As a result, even if the underlying functionality is implemented correctly, the tests fail when the function name or error message string does not match exactly.
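A small, entirely hypothetical example shows how this can happen. The functions and messages below are invented for illustration and do not come from SWE-bench: both implementations reject invalid input, but the test compares the error message string exactly.

```python
def validate_age_reference(age: int) -> None:
    """Developer's fix: the test below was written against this message."""
    if age < 0:
        raise ValueError("age must be non-negative")


def validate_age_candidate(age: int) -> None:
    """Tool-generated fix: same behavior, different wording."""
    if age < 0:
        raise ValueError("negative age is not allowed")


def hidden_test(validate) -> bool:
    """Pass only if a ValueError with the exact expected message is raised."""
    try:
        validate(-1)
    except ValueError as exc:
        return str(exc) == "age must be non-negative"
    return False


reference_passes = hidden_test(validate_age_reference)
candidate_passes = hidden_test(validate_age_candidate)
```

The candidate fails the test even though it rejects the same input, because nothing in the (hypothetical) issue text told the tool which message to use.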

Ofir Press, a researcher at Princeton University and one of the authors of SWE-bench, confirmed these findings: "Agentless did a good manual analysis of SWE-bench Lite. They believe the theoretical maximum score on Lite is probably 90.7%. I think the actual upper limit may be lower (around 80%). Some problems have insufficient information, and others have tests that are too strict."

SWE-bench Lite-S: a filtered, stricter problem subset

To address these issues, the researchers proposed a stricter subset, SWE-bench Lite-S, containing 252 problems. Specifically, problems whose descriptions contain the exact patch, contain a misleading solution, or do not provide sufficient information were removed from SWE-bench Lite (which contains 300 problems). This removes unreasonable problems and makes the difficulty of the benchmark more uniform. Compared with the original SWE-bench Lite, the filtered benchmark more accurately reflects the true capabilities of automated software development tools.
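The filtering rule can be expressed as a simple predicate over manually annotated problems. The sketch below is illustrative only: the `Issue` record, its field names, and the sample entries are assumptions, not the authors' annotation format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Issue:
    """Hypothetical annotation record for one benchmark problem."""
    issue_id: str
    has_exact_patch: bool = False
    has_misleading_solution: bool = False
    lacks_information: bool = False


def keep_for_lite_s(issue: Issue) -> bool:
    """Keep a problem only if none of the three exclusion criteria apply."""
    return not (
        issue.has_exact_patch
        or issue.has_misleading_solution
        or issue.lacks_information
    )


def filter_lite_s(issues: List[Issue]) -> List[Issue]:
    return [issue for issue in issues if keep_for_lite_s(issue)]


# Made-up sample: "c" triggers two criteria but is still removed only once.
sample = [
    Issue("a"),
    Issue("b", has_exact_patch=True),
    Issue("c", has_misleading_solution=True, lacks_information=True),
    Issue("d"),
]
kept_ids = [issue.issue_id for issue in filter_lite_s(sample)]

# Size of the cut reported in the article: 300 problems reduced to 252.
removed = 300 - 252
```

Because a problem can meet more than one criterion, the number removed (48 of 300) does not have to equal the sum of the per-category counts.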

Conclusion

Although agent-based software development is very promising, the authors believe it is time for the technical and research communities to pause and think about its key design and evaluation methods, rather than rushing to release more agents. The researchers hope that Agentless can help reset the baseline and direction for future software engineering agents.
