Topping the list of open source AI software engineers, UIUC's agent-less solution easily solves SWE-bench real programming problems
The AIxiv column is a column where this site publishes academic and technical content. In the past few years, the AIxiv column of this site has received more than 2,000 reports, covering top laboratories from major universities and companies around the world, effectively promoting academic exchanges and dissemination. If you have excellent work that you want to share, please feel free to contribute or contact us for reporting. Submission email: liyazhou@jiqizhixin.com; zhaoyunfeng@jiqizhixin.com

The authors of this paper are all from Professor Zhang Lingming's team at the University of Illinois at Urbana-Champaign (UIUC): Steven Xia, a fourth-year doctoral student whose research focuses on automatic code repair based on large AI models; Deng Yinlin, a fourth-year doctoral student whose research focuses on code generation based on large AI models; and Soren Dunn, a research intern who is currently a third-year student at UIUC. Professor Zhang is an associate professor in the Department of Computer Science at UIUC, working mainly on software engineering, machine learning, and large code models.

For more detailed information, please see Professor Zhang's personal homepage: https://lingming.cs.illinois.edu/

Since Devin (the first fully automatic AI software engineer) was introduced, the design of Agents for AI-driven software engineering has become a research focus. More and more Agent-based automatic AI software engineers have been proposed; they have achieved good performance on the SWE-bench data set and automatically repaired many real GitHub issues.

However, a complex Agent system brings additional overhead and uncertainty. Do we really need such a complex Agent to solve GitHub issues? Can an Agent-less solution come close to their performance?

Starting from these two questions, Professor Zhang Lingming's team at the University of Illinois at Urbana-Champaign (UIUC) proposed OpenAutoCoder-Agentless, a simple, efficient and completely open source Agent-less solution that can solve a real GitHub issue for only $0.34. Agentless attracted more than 300 stars on GitHub within a few days and made it into the top three of DAIR.AI's weekly list of hottest ML papers.

  • Paper: AGENTLESS: Demystifying LLM-based Software Engineering Agents

  • Paper address: https://huggingface.co/papers/2407.01489

  • Open source code: https://github.com/OpenAutoCoder/Agentless

AWS Research Scientist Leo Boytsov said: "The Agentless framework outperformed all open source Agent solutions and almost reached the top level of SWE-bench Lite (27%). Moreover, it beat all open source solutions at a significantly lower cost. The framework uses a hierarchical query approach (asking the LLM to find files, classes, functions, etc.) to determine patch locations, but does not allow the LLM to make planning decisions."

Agentless is an automated approach to software development problems that uses a simple two-phase process to locate and fix bugs in a code base. In the localization phase, Agentless uses a hierarchical approach to gradually narrow down to suspicious files, then classes/functions, and finally specific edit locations. In the repair phase, it uses a simple diff format (borrowed from the open source tool Aider) to generate multiple candidate patches, which it then filters and ranks.
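The two phases can be illustrated with a minimal Python sketch. This is not the Agentless implementation: the `Repo` model, the `choose` callable that stands in for an LLM query, and the rule of ranking patches by how often they were sampled are all assumptions made for illustration, and the sketch stops at function level rather than going down to exact edit locations.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical repository model: file path -> {function name -> source text}.
Repo = Dict[str, Dict[str, str]]
# Hypothetical stand-in for an LLM query: (issue text, candidate names) -> chosen names.
Chooser = Callable[[str, List[str]], List[str]]


def localize(repo: Repo, issue: str, choose: Chooser) -> List[Tuple[str, str]]:
    """Narrow the search hierarchically: first files, then functions inside them."""
    locations = []
    for path in choose(issue, sorted(repo)):
        for func in choose(issue, sorted(repo[path])):
            locations.append((path, func))
    return locations


def rank_patches(candidates: List[str], is_valid: Callable[[str], bool]) -> List[str]:
    """Drop invalid patches, then order the rest by how often they were sampled."""
    # Normalize trailing whitespace so trivially different samples count as one patch.
    normalized = [
        "\n".join(line.rstrip() for line in patch.strip().splitlines())
        for patch in candidates
    ]
    counts = Counter(patch for patch in normalized if is_valid(patch))
    return [patch for patch, _ in counts.most_common()]


def keyword_chooser(issue: str, names: List[str]) -> List[str]:
    """Toy chooser used only for the demo: keep names that contain an issue word."""
    words = issue.lower().split()
    return [name for name in names if any(word in name.lower() for word in words)]
```

With a toy repository, `localize(repo, "date parser crash", keyword_chooser)` keeps only files and functions whose names match the issue text. The point of the design is that control flow is fixed in ordinary code, and the model is only asked narrow selection questions at each level.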

The researchers compared Agentless with existing AI software agents, including state-of-the-art open source and commercial/closed source projects. Surprisingly, Agentless outperforms all existing open source software agents at a lower cost. Agentless solved 27.33% of the problems, the highest rate among open source solutions, at an average cost of $0.29 per solved problem and about $0.34 averaged across all problems (solved and unsolved).
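As a back-of-the-envelope reading of these figures, the short Python snippet below converts the percentage into a problem count and estimates the cost of one full benchmark run. It assumes the benchmark size of 300 issues that the article gives for SWE-bench Lite; the total is derived here and is not a figure reported by the authors.

```python
# Figures quoted in the article; LITE_SIZE is the benchmark size it states.
LITE_SIZE = 300
RESOLVE_RATE = 0.2733   # fraction of problems solved
AVG_COST_ALL = 0.34     # USD per problem, averaged over solved and unsolved

solved = round(RESOLVE_RATE * LITE_SIZE)
total_cost = round(AVG_COST_ALL * LITE_SIZE, 2)

print(f"solved problems: {solved}")                          # solved problems: 82
print(f"approximate cost of one full run: ${total_cost:.2f}")  # $102.00
```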

Agentless also has room to improve. When all generated patches are considered, rather than only the one finally selected, Agentless can solve 41% of the problems. This upper bound indicates significant room for improvement in the patch ranking and selection stages. Furthermore, Agentless solves some unique problems that even the best commercial tool (Alibaba Lingma Agent) cannot solve, suggesting that it can complement existing tools.
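The difference between the reported solve rate and this upper bound can be made concrete with a small Python sketch. The data layout (a list of pass/fail results per issue, with the selected patch first) and the function name are assumptions for illustration and do not come from the Agentless code base.

```python
from typing import Dict, List, Tuple


def resolve_rates(outcomes: Dict[str, List[bool]]) -> Tuple[float, float]:
    """Return (rate using the selected patch, rate if any candidate counts).

    `outcomes` maps an issue id to the pass/fail result of each candidate
    patch, with the patch chosen by the ranking step listed first.
    """
    total = len(outcomes)
    selected = sum(1 for results in outcomes.values() if results and results[0])
    any_candidate = sum(1 for results in outcomes.values() if any(results))
    return selected / total, any_candidate / total


# Toy data: issue-2 has a correct patch that the ranking step did not pick.
toy = {
    "issue-1": [True, False],
    "issue-2": [False, True],
    "issue-3": [False, False],
    "issue-4": [True, True],
}
print(resolve_rates(toy))  # (0.5, 0.75)
```

The second number can never be lower than the first, and the gap between them is exactly the set of issues where a correct patch was generated but not selected.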

Analysis of SWE-bench Lite data set

The researchers also conducted manual inspection and detailed analysis of the SWE-bench Lite data set.

The study found that 4.3% of the problems in the SWE-bench Lite data set give the complete answer, that is, the correct fix patch, directly in the problem description, while another 10% describe the exact steps to the correct solution. This suggests that some problems in SWE-bench Lite may be easier to solve than others.

In addition, the research team observed that 4.3% of the issues include user-proposed solutions or steps in the issue description that are not consistent with the developers' actual patches. This reveals a further potential problem with the benchmark: such misleading solutions could cause an AI tool to generate an incorrect fix simply by following the problem description.

In terms of problem description quality, the researchers observed that although most tasks in SWE-bench Lite contain sufficient information, and many also provide failure examples for reproducing the error, 9.3% of the problems still do not include enough information. For example, a task may require implementing a new function or adding an error message without giving the specific function name or error message string in the problem description. This means that even if the underlying functionality is implemented correctly, the test will fail if the function name or error message string does not match exactly.
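A small Python example shows how this kind of mismatch plays out. Everything in it is hypothetical: the `set_timeout` function, both message strings, and the `hidden_test_passes` helper are invented to illustrate the failure mode and do not come from SWE-bench.

```python
def set_timeout(seconds: float) -> float:
    """Hypothetical fix: reject negative timeouts, as an issue might request."""
    if seconds < 0:
        # The issue asked for "an error message" but never gave its wording,
        # so the author of the fix had to pick one.
        raise ValueError("timeout must be non-negative")
    return seconds


def hidden_test_passes(func) -> bool:
    """Stand-in for a benchmark test that checks the exact message string."""
    try:
        func(-1)
    except ValueError as exc:
        return str(exc) == "Timeout cannot be negative."
    return False


# The behavior is right (a ValueError is raised), yet the check still fails.
print(hidden_test_passes(set_timeout))  # False
```

The fix rejects bad input exactly as requested, but it is counted as a failure because the wording it had to guess differs from the wording the test expects.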

Ofir Press, a researcher at Princeton University and one of the authors of SWE-bench, confirmed their findings: "Agentless performed a good manual analysis of SWE-bench Lite. They believe that the theoretical highest score on Lite is probably 90.7%. I think the actual upper limit may be lower (around 80%). Some questions have insufficient information and others are too rigorously tested."

SWE-bench Lite-S: a filtered, stricter subset of problems

To address these problems, the researchers proposed a stricter subset, SWE-bench Lite-S (containing 252 problems). Specifically, issues whose descriptions contain exact patches or misleading solutions, or do not provide sufficient information, were excluded from SWE-bench Lite (containing 300 issues). This removes unreasonable problems and standardizes the difficulty level of the benchmark. Compared to the original SWE-bench Lite, the filtered benchmark more accurately reflects the true capabilities of automated software development tools.
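The filtering rule amounts to dropping every issue that carries at least one of the three flags, which removes 300 - 252 = 48 issues. A minimal Python sketch follows; the `Issue` record and its field names are hypothetical and are not taken from the released data set.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Issue:
    """Hypothetical record holding the manual labels for one benchmark issue."""
    issue_id: str
    has_exact_patch: bool = False
    has_misleading_solution: bool = False
    lacks_information: bool = False


def strict_subset(issues: Iterable[Issue]) -> List[Issue]:
    """Keep only issues that carry none of the three exclusion flags."""
    return [
        issue
        for issue in issues
        if not (
            issue.has_exact_patch
            or issue.has_misleading_solution
            or issue.lacks_information
        )
    ]


sample = [
    Issue("issue-1"),
    Issue("issue-2", has_exact_patch=True),
    Issue("issue-3", has_misleading_solution=True),
    Issue("issue-4", lacks_information=True),
    Issue("issue-5"),
]
print([issue.issue_id for issue in strict_subset(sample)])  # ['issue-1', 'issue-5']
```

An issue that carries more than one flag is still removed only once, so the excluded count does not have to equal the sum of the three categories.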

Conclusion

Although Agent-based software development is very promising, the authors believe it is time for the technology and research community to stop and think about its key design and evaluation methods, rather than rushing to release more Agents. The researchers hope that Agentless can help reset the baseline and direction for Agents in future software engineering.
