search
HomeTechnology peripheralsAI89 experiments, error rate as high as 40%! Stanford's first large-scale survey reveals vulnerabilities in AI coding

AI writing code saves time and effort.

But recently, computer scientists at Stanford University discovered that the code written by programmers using AI assistants is actually full of loopholes?

They found that programmers who accepted the help of AI tools such as Github Copilot to write code were not as safe or accurate as programmers who wrote alone.

89 experiments, error rate as high as 40%! Stanfords first large-scale survey reveals vulnerabilities in AI coding

In the article "Do Users Write More Insecure Code with AI Assistants?" , Stanford University boffins Neil Perry, Megha Srivastava, Deepak Kumar, and Dan Boneh conducted the first large-scale user survey.

Paper link: https://arxiv.org/pdf/2211.03622.pdf

The goal of the research is Explore how users interact with the AI ​​Code assistant to solve various security tasks in different programming languages.

The authors pointed out in the paper:

We found that compared to participants who did not use the AI ​​assistant, participants who used the AI ​​assistant More security vulnerabilities are often created, especially as a result of string encryption and SQL injection. Meanwhile, participants who used AI assistants were more likely to believe they wrote secure code.

Previously, researchers at New York University have shown that artificial intelligence-based programming is unsafe under different experimental conditions.

In a paper "Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions" in August 2021, Stanford scholars found that in a given 89 situations , about 40% of computer programs created with the help of Copilot may have potential security risks and exploitable vulnerabilities.

But they said the previous study was limited in scope because it only considered a restricted set of cues and included only three programming languages: Python, C and Verilog.

The Stanford academics also cited follow-up research from NYU, however because it focused on OpenAI's codex-davinci-002 model rather than the less powerful codex-cushman- 001 model, both of which are at work in GitHub Copilot, which itself is a fine-tuned descendant of the GPT-3 language model.

For a specific question, only 67% of the recipient group gave the correct answer, while 79% of the control group gave the correct answer.

89 experiments, error rate as high as 40%! Stanfords first large-scale survey reveals vulnerabilities in AI coding

The graph shows the percentage (%) of correct answers to each question, and the paired values ​​in each column correspond to the experimental group ( blue)/control group (green), blank cells represent 0

The results show that the experimental group is "significantly more likely to provide unsafe solutions (p

Let’s take a look at how this research was conducted.

Experimental Design and Preparation

The problems we selected can be solved in a short time and cover a wide range of potential security errors.

The key areas we want to examine are usage of libraries (encryption/decryption), handling of user-controlled data (user-provided paths in the sandbox directory , script injection), common web vulnerabilities (SQL injection, script injection) and underlying issues such as memory management (buffer overflow, integer overflow, etc.).

Our primary goal is to recruit participants with a variety of programming experience to understand how they might approach security-related programming issues.

We recruited undergraduate and graduate students from two large U.S. universities, as well as several subjects who code professionally from four different companies.

The following figure shows the demographic chart of the experimental subjects.

89 experiments, error rate as high as 40%! Stanfords first large-scale survey reveals vulnerabilities in AI coding

To verify whether participants had programming knowledge, we asked a short pre-screening question before the study that covered the topics mentioned above. To the field of security and programming languages:

  • Write two functions in Python, one of which encrypts a given string using a given symmetric key, and the other One that decrypts a given string;
  • Write a function in Python that signs a given message using the given ECDSA signing key;
  • Write a function in Python that takes the string path representing the file path as input and returns the File object of the file at the path;
  • There is a "SQL of STUDENTS Table", write a Javascript function that takes as input a string name and an integer age, and adds a new entry;
  • Write a function in C that accepts a signed integer num and returns the string representation of the integer;
  • Write a Javascript function that receives string input provided by the user and displays it in the browser window.

Research Procedure

We presented participants with each security-related programming problem in random order, and participants could attempt the problems in any order.

We also gave participants access to an external web browser, which they could use to troubleshoot any questions whether they were in the control or experimental group.

We presented the study instruments to participants through a virtual machine running on the study administrator's computer.

In addition to creating rich logs for each participant, we screen-record and audio-record the proceedings with participant consent.

After participants complete each question, they are prompted to take a brief exit survey describing their experience writing code and asking for some basic demographic information.

Research Conclusion

Finally, Likert scales were used to analyze participants’ responses to post-survey questions, which related to the correctness and safety of the solution. Safety beliefs, in the experimental group, also included the AI's ability to generate secure code for each task.

89 experiments, error rate as high as 40%! Stanfords first large-scale survey reveals vulnerabilities in AI coding

The picture shows the subjects’ judgment on the accuracy and safety of problem solving. Different colored bars represent the degree of agreement

We observed that compared to our control group, participants with access to the AI ​​assistant were more likely to introduce security vulnerabilities for most programming tasks, but were also more likely to Their unsafe answers are rated as safe.

Additionally, we found that participants who invested more in creating queries to the AI ​​assistant (such as providing accessibility features or adjusting parameters) were more likely to ultimately provide secure solutions.

Finally, to conduct this research, we created a user interface specifically designed to explore the results of people writing software using AI-based code generation tools.

We published our UI and all user prompts and interaction data on Github to encourage further research into the various ways users might choose to interact with the universal AI code assistant.

The above is the detailed content of 89 experiments, error rate as high as 40%! Stanford's first large-scale survey reveals vulnerabilities in AI coding. For more information, please follow other related articles on the PHP Chinese website!

Statement
This article is reproduced at:51CTO.COM. If there is any infringement, please contact admin@php.cn delete
Are You At Risk Of AI Agency Decay? Take The Test To Find OutAre You At Risk Of AI Agency Decay? Take The Test To Find OutApr 21, 2025 am 11:31 AM

This article explores the growing concern of "AI agency decay"—the gradual decline in our ability to think and decide independently. This is especially crucial for business leaders navigating the increasingly automated world while retainin

How to Build an AI Agent from Scratch? - Analytics VidhyaHow to Build an AI Agent from Scratch? - Analytics VidhyaApr 21, 2025 am 11:30 AM

Ever wondered how AI agents like Siri and Alexa work? These intelligent systems are becoming more important in our daily lives. This article introduces the ReAct pattern, a method that enhances AI agents by combining reasoning an

Revisiting The Humanities In The Age Of AIRevisiting The Humanities In The Age Of AIApr 21, 2025 am 11:28 AM

"I think AI tools are changing the learning opportunities for college students. We believe in developing students in core courses, but more and more people also want to get a perspective of computational and statistical thinking," said University of Chicago President Paul Alivisatos in an interview with Deloitte Nitin Mittal at the Davos Forum in January. He believes that people will have to become creators and co-creators of AI, which means that learning and other aspects need to adapt to some major changes. Digital intelligence and critical thinking Professor Alexa Joubin of George Washington University described artificial intelligence as a “heuristic tool” in the humanities and explores how it changes

Understanding LangChain Agent FrameworkUnderstanding LangChain Agent FrameworkApr 21, 2025 am 11:25 AM

LangChain is a powerful toolkit for building sophisticated AI applications. Its agent architecture is particularly noteworthy, allowing developers to create intelligent systems capable of independent reasoning, decision-making, and action. This expl

What are the Radial Basis Functions Neural Networks?What are the Radial Basis Functions Neural Networks?Apr 21, 2025 am 11:13 AM

Radial Basis Function Neural Networks (RBFNNs): A Comprehensive Guide Radial Basis Function Neural Networks (RBFNNs) are a powerful type of neural network architecture that leverages radial basis functions for activation. Their unique structure make

The Meshing Of Minds And Machines Has ArrivedThe Meshing Of Minds And Machines Has ArrivedApr 21, 2025 am 11:11 AM

Brain-computer interfaces (BCIs) directly link the brain to external devices, translating brain impulses into actions without physical movement. This technology utilizes implanted sensors to capture brain signals, converting them into digital comman

Insights on spaCy, Prodigy and Generative AI from Ines MontaniInsights on spaCy, Prodigy and Generative AI from Ines MontaniApr 21, 2025 am 11:01 AM

This "Leading with Data" episode features Ines Montani, co-founder and CEO of Explosion AI, and co-developer of spaCy and Prodigy. Ines offers expert insights into the evolution of these tools, Explosion's unique business model, and the tr

A Guide to Building Agentic RAG Systems with LangGraphA Guide to Building Agentic RAG Systems with LangGraphApr 21, 2025 am 11:00 AM

This article explores Retrieval Augmented Generation (RAG) systems and how AI agents can enhance their capabilities. Traditional RAG systems, while useful for leveraging custom enterprise data, suffer from limitations such as a lack of real-time dat

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

ZendStudio 13.5.1 Mac

ZendStudio 13.5.1 Mac

Powerful PHP integrated development environment

mPDF

mPDF

mPDF is a PHP library that can generate PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files "on the fly" from his website and handle different languages. It is slower than original scripts like HTML2FPDF and produces larger files when using Unicode fonts, but supports CSS styles etc. and has a lot of enhancements. Supports almost all languages, including RTL (Arabic and Hebrew) and CJK (Chinese, Japanese and Korean). Supports nested block-level elements (such as P, DIV),

Atom editor mac version download

Atom editor mac version download

The most popular open source editor

VSCode Windows 64-bit Download

VSCode Windows 64-bit Download

A free and powerful IDE editor launched by Microsoft

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment