



AIxiv is the column in which this site publishes academic and technical content. Over the past few years, the column has received and reported on more than 2,000 pieces of work from top laboratories at major universities and companies around the world, effectively promoting academic exchange and dissemination. If you have excellent work to share, feel free to submit it or contact us for coverage. Submission email: liyazhou@jiqizhixin.com; zhaoyunfeng@jiqizhixin.com
The authors of this paper are all from Professor Zhang Lingming's team at the University of Illinois at Urbana-Champaign (UIUC): Steven Xia, a fourth-year doctoral student whose research focuses on automatic code repair based on large AI models; Deng Yinlin, a fourth-year doctoral student whose research focuses on code generation based on large AI models; and Soren Dunn, a research intern who is currently a third-year student at UIUC. Professor Zhang Lingming is an associate professor in the Department of Computer Science at UIUC, working mainly on software engineering, machine learning, and large code models.
For more information, see Professor Zhang's personal homepage: https://lingming.cs.illinois.edu/
Since Devin (the first fully automatic AI software engineer) was introduced, the design of agents for software engineering has become a focus of AI research. More and more agent-based automatic AI software engineers have been proposed; they perform well on the SWE-bench data set and have automatically repaired many real GitHub issues.
However, a complex agent system brings additional overhead and uncertainty. Do we really need such a complex agent to solve GitHub issues? Can an agent-free solution come close to their performance?
Starting from these two questions, Professor Zhang Lingming's team at the University of Illinois at Urbana-Champaign (UIUC) proposed OpenAutoCoder-Agentless, a simple, efficient, and completely open source agentless solution that can solve a real GitHub issue for only $0.34. Agentless attracted more than 300 GitHub stars in just a few days and made it into the top three of DAIR.AI's weekly list of the hottest ML papers.
Paper: AGENTLESS: Demystifying LLM-based Software Engineering Agents
Paper address: https://huggingface.co/papers/2407.01489
Open source code: https://github.com/OpenAutoCoder/Agentless
AWS Research Scientist Leo Boytsov said: "The Agentless framework outperformed all open source Agent solutions and almost reached the top level of SWE Bench Lite (27%). Moreover, it beat all open source solutions at a significantly lower cost. The framework uses a hierarchical query approach (asking the LLM to find files, classes, functions, etc.) to determine patch locations, but does not allow the LLM to make planning decisions."
Agentless is an automated approach to software development problems that uses a simple two-phase process to locate and fix bugs in a code base. In the localization phase, Agentless works hierarchically, gradually narrowing down to suspicious files, then classes and functions, and finally specific edit locations. In the repair phase, it uses a simple diff format (borrowed from the open source tool Aider) to generate multiple candidate patches, then filters and ranks them.
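
The hierarchical narrowing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Agentless implementation: the `Repo` layout, the `localize` function, and the `keyword_chooser` stand-in for an LLM query are all hypothetical names invented for this example. The point is the shape of the process: the same "pick the suspicious ones" question is asked at three levels, and no planning decisions are delegated to the model.

```python
from typing import Callable, Dict, List

# Hypothetical repository model: file path -> {class/function name -> source text}.
Repo = Dict[str, Dict[str, str]]
# Hypothetical stand-in for an LLM query: (issue text, candidates) -> chosen candidates.
Chooser = Callable[[str, List[str]], List[str]]


def localize(issue: str, repo: Repo, choose: Chooser) -> List[dict]:
    """Narrow from files, to classes/functions, to concrete edit locations."""
    locations = []
    # Level 1: pick suspicious files, looking only at the file paths.
    for path in [f for f in choose(issue, sorted(repo)) if f in repo]:
        # Level 2: within a chosen file, pick suspicious classes/functions.
        for symbol in [s for s in choose(issue, sorted(repo[path])) if s in repo[path]]:
            # Level 3: within a chosen symbol, pick the lines to edit.
            lines = repo[path][symbol].splitlines()
            picked = choose(issue, lines)
            for number, text in enumerate(lines, start=1):
                if text in picked:
                    locations.append({"file": path, "symbol": symbol, "line": number})
    return locations


def keyword_chooser(issue: str, candidates: List[str]) -> List[str]:
    """Toy chooser: keep candidates containing a word (4+ letters) from the issue."""
    words = {w.strip(".,:()").lower() for w in issue.split()}
    words = {w for w in words if len(w) >= 4}
    return [c for c in candidates if any(w in c.lower() for w in words)]


toy_repo = {
    "app/parser.py": {
        "parse_date": "def parse_date(s):\n    parts = s.split('/')\n    return parts",
        "parse_int": "def parse_int(s):\n    return int(s)",
    },
    "app/utils.py": {"slugify": "def slugify(s):\n    return s.lower()"},
}
toy_issue = "Date parsing fails: split uses the wrong separator in parser"
locations = localize(toy_issue, toy_repo, keyword_chooser)
```

In a real system the chooser would be a prompt to an LLM at each level; the keyword matcher here only keeps the example deterministic and runnable offline.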
The researchers compared Agentless with existing AI software agents, including state-of-the-art open source and commercial/closed source projects. Surprisingly, Agentless outperforms all existing open source software agents at a lower cost. It solved 27.33% of the problems, the highest rate among open source solutions, at an average cost of $0.29 for each problem it solved and about $0.34 averaged over all problems (solved and unsolved).
Agentless also has room to improve. When all generated patches are considered, rather than only the one finally selected, Agentless can solve 41% of the problems. This upper bound indicates significant room for improvement in the patch ranking and selection stages. Furthermore, Agentless solves some problems that even the best commercial tool (Alibaba Lingma Agent) cannot, suggesting that it can complement existing tools.
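
The gap between the rate achieved by the selected patch and the upper bound over all candidates can be made concrete with a small sketch. Everything here is assumed for illustration: the selection rule in `pick_patch` (discard empty patches, then take the most frequent one) is one plausible way to filter and rank, not necessarily what Agentless does, and the issue IDs and patch strings are toy data rather than benchmark results.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple


def pick_patch(candidates: List[str]) -> Optional[str]:
    """Hypothetical selection rule: drop empty patches, keep the most frequent one."""
    usable = [c.strip() for c in candidates if c.strip()]
    if not usable:
        return None
    return Counter(usable).most_common(1)[0][0]


def resolve_rates(
    candidates_by_issue: Dict[str, List[str]],
    correct: Dict[str, Set[str]],
) -> Tuple[float, float]:
    """Return (rate using the picked patch, upper bound if any candidate counts)."""
    picked_ok = 0
    any_ok = 0
    for issue_id, candidates in candidates_by_issue.items():
        good = correct.get(issue_id, set())
        if pick_patch(candidates) in good:
            picked_ok += 1
        if any(c.strip() in good for c in candidates):
            any_ok += 1
    total = len(candidates_by_issue)
    return picked_ok / total, any_ok / total


# Toy data: four issues, three candidate patches each.
toy_candidates = {
    "issue-1": ["fix A", "fix A", "fix B"],  # majority patch is correct
    "issue-2": ["fix C", "fix C", "fix D"],  # correct patch exists but loses the vote
    "issue-3": ["fix E", "", "fix E"],       # no candidate is correct
    "issue-4": ["fix F", "fix G", "fix G"],  # majority patch is correct
}
toy_correct = {"issue-1": {"fix A"}, "issue-2": {"fix D"}, "issue-4": {"fix G"}}
selected_rate, upper_bound = resolve_rates(toy_candidates, toy_correct)
```

On this toy data the selected patch resolves half of the issues while three quarters have at least one correct candidate; the difference is exactly the kind of headroom a better ranking step could recover.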
Analysis of the SWE-bench Lite data set
The researchers also conducted manual inspection and detailed analysis of the SWE-bench Lite data set.
The study found that 4.3% of the problems in the SWE-bench Lite data set give the complete answer, that is, the correct fix patch, directly in the problem description, while another 10% describe the exact steps of the correct solution. This suggests that some problems in SWE-bench Lite may be easier to solve than others.
In addition, the research team observed that 4.3% of the issues include user-proposed solutions or steps in the issue description that are inconsistent with the developers' actual patches. This reveals a further potential problem with the benchmark: an AI tool that simply follows such a misleading description may generate an incorrect fix.
In terms of problem description quality, the researchers observed that most tasks in SWE-bench Lite contain sufficient information, and many also provide failing examples that reproduce the error, but 9.3% of the problems still do not include enough information. For example, a task may require implementing a new function or adding an error message without giving the specific function name or error message string in the problem description. In such cases, even if the underlying functionality is implemented correctly, the test fails when the function name or error message string does not match exactly.
Ofir Press, a researcher at Princeton University and one of the authors of SWE-bench, confirmed their findings: "Agentless performed a good manual analysis of SWE-bench Lite. They believe that the theoretical highest score on Lite is probably 90.7%. I think the actual upper limit may be lower (around 80%). Some questions have insufficient information and others are too rigorously tested."
SWE-bench Lite-S: a filtered, strict problem subset
To address these problems, the researchers proposed a strict problem subset, SWE-bench Lite-S, containing 252 issues. Specifically, they excluded from SWE-bench Lite (300 issues) every issue whose description contains the exact patch, contains a misleading solution, or does not provide sufficient information. This removes unreasonable issues and standardizes the difficulty level of the benchmark. Compared with the original SWE-bench Lite, the filtered benchmark more accurately reflects the true capabilities of automated software development tools.
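
The filtering rule can be written down as a short sketch. The `Issue` record and its three flag names are hypothetical labels for the three exclusion criteria above, and the sample issues are toy data; the actual annotations come from the researchers' manual inspection. Going from 300 to 252 issues means 48 issues, or 16% of SWE-bench Lite, were removed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Issue:
    """Hypothetical annotation record for one benchmark issue."""
    issue_id: str
    has_exact_patch: bool = False          # description already contains the correct fix
    has_misleading_solution: bool = False  # proposed fix differs from the real patch
    lacks_information: bool = False        # e.g. required name or error string is missing


def strict_subset(issues: List[Issue]) -> List[Issue]:
    """Keep only issues that trigger none of the three exclusion criteria."""
    return [
        issue
        for issue in issues
        if not (
            issue.has_exact_patch
            or issue.has_misleading_solution
            or issue.lacks_information
        )
    ]


# Toy data: an issue may carry more than one flag and is still removed only once.
toy_issues = [
    Issue("a"),
    Issue("b", has_exact_patch=True),
    Issue("c", has_misleading_solution=True),
    Issue("d", lacks_information=True),
    Issue("e"),
    Issue("f", has_exact_patch=True, lacks_information=True),
]
kept = strict_subset(toy_issues)
removed_share = (300 - 252) / 300  # share of SWE-bench Lite removed to form Lite-S
```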
Conclusion
Although agent-based software development is very promising, the authors believe it is time for the technology and research community to pause and think about its key design and evaluation methods rather than rushing to release more agents. The researchers hope that Agentless can help reset the baseline and direction for future software engineering agents.