
The open-source AI programmer is here: powered by GPT-4, on par with Devin, 1.4k stars in a day

WBOY (reposted)
2024-04-03 15:01:20

To learn more about AIGC, please visit:

51CTO AI.x Community

https://www.51cto.com/aigc/

Recently, many people are worried about AI replacing their jobs.

Devin, the "first AI programmer" that became popular in AI circles last month, uses large-model capabilities to handle full-stack work: given only natural-language instructions from a human, it can automate complex coding tasks.

The capabilities Devin demonstrated are impressive, but the startup behind it has taken a closed-source route, and currently only a small number of people with closed-beta access can use it.

On Tuesday, researchers from the Princeton University NLP Group released SWE-agent, an open-source AI programmer, which received more than a thousand GitHub stars (about 1.4k) in less than a day. SWE-agent is built on deep learning and is designed to write working code automatically. Its release attracted widespread attention, and many developers spoke highly of its technology and performance. The results also reflect the progress of AI research in NLP.


SWE-agent is a new system for autonomously resolving issues in GitHub repositories. It achieved accuracy similar to Devin's on SWE-bench, taking an average of 93 seconds.


  • Project website: https://swe-agent.com/
  • GitHub: https://github.com/princeton-nlp/SWE-agent

John Yang, an author of the project, said that a preprint of the accompanying paper will be uploaded on April 10.

In essence, SWE-agent turns large models (such as GPT-4) into software engineering agents that can fix bugs and issues in real GitHub repositories.

On the complete SWE-bench test set, SWE-agent solved 12.29% of the problems and achieved SOTA performance.


To automate development work, SWE-agent interacts with a dedicated terminal, through which it can open files, search their contents, run automatic syntax checks, edit specific lines, and write and execute tests.

The project's developers designed this interface carefully and describe it on GitHub.

Agent-Computer Interface (ACI)

The research team designed a simple, language model (LM)-centric command and feedback format that makes it easier for large models to browse repositories and to view, edit, and execute code files. They call this the Agent-Computer Interface (ACI). The team also built the SWE-agent repository so that ACI designs for repository-level coding agents can be iterated on easily.

Just as language models need good prompt engineering, agents need good ACI design to achieve better results. A baseline agent without a well-tuned ACI performs much worse than SWE-agent.

SWE-agent contains features that the research team found to be very useful during the design of the agent-computer interface, including:

1. Add a linter that runs whenever an edit command is issued and rejects the edit if the resulting code is not syntactically valid.

2. Provide the agent with a purpose-built file viewer. The research team found that the viewer works best when it displays only 100 lines per turn. The accompanying file editor has commands for scrolling up and down and for searching within the file.

3. Provide the agent with a purpose-built, directory-wide string search command. The research team found it important that the tool lists matches succinctly: it simply lists every file that has at least one match. Showing the model more context about each match turned out to be too confusing for it.

4. When a command produces no output, return the message: "Your command ran successfully, but did not produce any output."
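The four features above can be sketched in a few lines of Python. This is a minimal illustration, not SWE-agent's actual implementation: the function names (view_window, guarded_edit, search_dir, format_output) are hypothetical, and Python's built-in ast.parse stands in for the linter, which the article does not name, so the syntax check here only works for Python source.

```python
import ast
import os

WINDOW = 100  # lines shown per turn, the value reported in the article
EMPTY_OUTPUT_MESSAGE = "Your command ran successfully, but did not produce any output."


def view_window(lines, start, window=WINDOW):
    """Return at most `window` lines starting at 0-based index `start`."""
    # Clamp the start index so scrolling past either end stays in range.
    start = max(0, min(start, max(len(lines) - 1, 0)))
    return lines[start:start + window]


def guarded_edit(lines, first, last, replacement):
    """Replace lines[first:last] with `replacement` only if the result parses.

    Returns (new_lines, error). On a syntax error the original lines are
    returned unchanged together with a short error message.
    """
    candidate = lines[:first] + replacement + lines[last:]
    try:
        # Assumption: ast.parse plays the role of the linter.
        ast.parse("\n".join(candidate))
    except SyntaxError as exc:
        return lines, f"Edit rejected: {exc.msg} (line {exc.lineno})"
    return candidate, None


def search_dir(root, needle):
    """List every file under `root` with at least one match, without context."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    if needle in handle.read():
                        hits.append(os.path.relpath(path, root))
            except OSError:
                continue  # unreadable files are skipped
    return sorted(hits)


def format_output(raw):
    """Substitute an explicit message when a command prints nothing."""
    return raw if raw.strip() else EMPTY_OUTPUT_MESSAGE
```

For example, calling guarded_edit(["def add(a, b):", "    return a + b"], 1, 2, ["    return (a + b"]) leaves the original two lines untouched and returns an error string, because the unbalanced parenthesis fails to parse.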

The forthcoming paper will provide more details.

Installation and use

To use SWE-agent, first complete the following setup steps:

1. Install Docker and start it locally;

2. Install Miniconda, and use conda env create -f environment.yml to create the swe-agent environment;

3. Use conda activate swe-agent to activate the environment;

4. Run ./setup.sh to create the swe-agent Docker image;

5. Create a keys.cfg file in the root directory of this repository and fill in the following content:

OPENAI_API_KEY: 'OpenAI API Key Here if using OpenAI Model (optional)'
ANTHROPIC_API_KEY: 'Anthropic API Key Here if using Anthropic Model (optional)'
GITHUB_TOKEN: 'GitHub Token Here (required)'
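Before starting a run, it can help to confirm that the required token is actually present in keys.cfg. The following is a minimal sketch, assuming each entry sits on its own line in the form NAME: 'value'. The helper names parse_keys_cfg and missing_required are hypothetical, the treatment of lines starting with # as comments is an assumption, and this is not how SWE-agent itself reads the file.

```python
REQUIRED_KEYS = {"GITHUB_TOKEN"}  # the only key the article marks "(required)"


def parse_keys_cfg(text):
    """Parse lines of the form NAME: 'value' into a dict."""
    keys = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Assumption: blank lines and lines starting with '#' are ignored.
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"Expected NAME: 'value', got: {raw_line!r}")
        # Drop surrounding whitespace and one layer of quotes.
        keys[name.strip()] = value.strip().strip("'\"")
    return keys


def missing_required(keys):
    """Return the required key names that are absent or empty."""
    return sorted(name for name in REQUIRED_KEYS if not keys.get(name))
```

A caller would read the file's text, pass it to parse_keys_cfg, and stop with a clear message if missing_required returns a non-empty list.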

The SWE-agent pipeline consists of two steps:

  • Step 1: SWE-agent receives the input GitHub issue and returns a pull request to try to fix it;
  • Step 2: Evaluate the pull request to verify that it actually solves the issue (currently only available for issues in the SWE-bench benchmark).

To run SWE-agent on a single GitHub issue, pass the issue URL as --data_path:

python run.py --model_name gpt4 \
  --data_path https://github.com/pvlib/pvlib-python/issues/1603 \
  --config_file config/default_from_url.yaml

To run on SWE-bench with a per-instance cost limit:

python run.py --model_name gpt4 \
  --per_instance_cost_limit 2.00 \
  --config_file ./config/default.yaml

If you want to run and evaluate on the entire SWE-bench, the easiest way is to use an x86 machine.

If you want to run a single issue from SWE-bench, you can use --instance_filter:

python run.py --model_name gpt4 \
  --instance_filter marshmallow-code__marshmallow-1359
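When several runs need to be scripted, the run.py invocations can be assembled programmatically. The sketch below is only an illustration: build_run_command is a hypothetical helper, and it passes through only the flags that appear in this article (--model_name, --data_path, --config_file, --per_instance_cost_limit, --instance_filter) without checking them against run.py.

```python
import shlex


def build_run_command(model_name="gpt4", **options):
    """Build an argv list for run.py from keyword options.

    Each keyword becomes a --flag followed by its value, in the order given.
    """
    argv = ["python", "run.py", "--model_name", model_name]
    for flag, value in options.items():
        argv += [f"--{flag}", str(value)]
    return argv


# Rebuild the single-instance command shown above.
single_instance = build_run_command(
    instance_filter="marshmallow-code__marshmallow-1359"
)
print(shlex.join(single_instance))
```

The resulting list can be passed to subprocess.run(argv, check=True) from the repository root once the swe-agent environment is active.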



Statement:
This article is reproduced from 51cto.com. If there is any infringement, please contact admin@php.cn to have it removed.