Before the release of GPT-4, OpenAI hired experts from various industries to conduct 'adversarial testing” to avoid issues such as discrimination.-AI-php.cn

Before the release of GPT-4, OpenAI hired experts from various industries to conduct 'adversarial testing” to avoid issues such as discrimination.

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Apr 30, 2023 pm 05:28 PM

openaigpt-4

Before the release of GPT-4, OpenAI hired experts from various industries to conduct adversarial testing” to avoid issues such as discrimination.

It was reported on April 17 that before the release of the large-scale language model GPT-4, the artificial intelligence start-up OpenAI hired experts from all walks of life to form a "blue army" team to evaluate the model. What issues might arise for "adversarial testing". Experts ask various exploratory or dangerous questions to test how the AI responds; OpenAI will use these findings to retrain GPT-4 and solve the problems.

After Andrew White gained access to GPT-4, the new model behind the artificial intelligence chatbot, he used it to propose a brand new nerve agent.

As a professor of chemical engineering at the University of Rochester, White was one of 50 scholars and experts hired by OpenAI last year to form OpenAI’s “Blue Army” team. Over the course of six months, members of the "Blue Army" will conduct "qualitative detection and adversarial testing" of the new model to see if it can break GPT-4.

White said he used GPT-4 to propose a compound that could be used as a chemical poison, and also introduced various "plug-ins" that can provide information sources for the new language model, such as scientific papers and chemical manufacturer names. ". Turns out the AI chatbot even found a place to make the chemical poison.

"I think artificial intelligence will give everyone the tools to do chemistry experiments faster and more accurately," White said. "But there is also a risk that people will use artificial intelligence to do dangerous chemical experiments... Now this This situation does exist."

The introduction of "Blue Army Testing" allows OpenAI to ensure that this consequence will not occur when GPT-4 is released.

The purpose of the "Blue Force Test" is to dispel concerns that there are dangers in deploying powerful artificial intelligence systems in society. The job of the "blue team" team is to ask various probing or dangerous questions and test how the artificial intelligence responds.

OpenAI wants to know how the new model will react to bad problems. So the Blues team tested lies, language manipulation and dangerous scientific common sense. They also examined the potential of the new model to aid and abet illegal activities such as plagiarism, financial crime and cyberattacks.

The GPT-4 “Blue Army” team comes from all walks of life and includes academics, teachers, lawyers, risk analysts and security researchers. The main working locations are in the United States and Europe.

They fed back their findings to OpenAI, which used team members’ findings to retrain GPT-4 and solve problems before publicly releasing GPT-4. Over the course of several months, members spend 10 to 40 hours each testing new models. Many interviewees stated that their hourly wages were approximately US$100.

Many "Blue Army" team members are worried about the rapid development of large language models, and even more worried about the risks of connecting to external knowledge sources through various plug-ins.

"Now the system is frozen, which means that it no longer learns and no longer has memory," said José E, a member of the GPT-4 "Blue Team" and a professor at the Valencia Institute of Artificial Intelligence. José Hernández-Orallo said. "But what if we use it to go online? This could be a very powerful system connected to the whole world."

OpenAI said that the company attaches great importance to security and will test various plug-ins before release. And as more and more people use GPT-4, OpenAI will regularly update the model.

Technology and human rights researcher Roya Pakzad used questions in English and Farsi to test whether the GPT-4 model was biased in terms of gender, race, and religion.

Pakzad found that even after updates, GPT-4 had clear stereotypes about marginalized communities, even in later versions.

She also found that when testing the model with Farsi questions, the chatbot's "illusion" of making up information to answer questions was more severe. The robot made up more names, numbers and events in Farsi than in English.

Pakzadeh said: "I am worried that linguistic diversity and the culture behind the language may attenuate."

Boru Gollo, a lawyer based in Nairobi, is the only A tester from Africa also noticed that the new model had a discriminatory tone. "When I was testing the model, it was like a white man talking to me," Golo said. "If you ask a specific group, it will give you a biased view or a very biased answer." OpenAI also admitted that GPT-4 still has biases.

Members of the "Blue Army" who evaluate the model from a security perspective have different views on the security of the new model. Lauren Kahn, a researcher from the Council on Foreign Relations, said that when she began researching whether this technique could potentially be used in cyberattacks, she "didn't expect it to be so detailed that it could be fine-tuned." implementation". Yet Kahn and other testers found that the new model's responses became considerably safer over time. OpenAI said that before the release of GPT-4, the company trained it on rejecting malicious network security requests.

Many members of the “Blue Army” stated that OpenAI had conducted a rigorous security assessment before release. Maarten Sap, an expert on language model toxicity at Carnegie Mellon University, said: "They have done a pretty good job of eliminating obvious toxicity in the system."

Since the launch of ChatGPT, OpenAI has also been criticized by many parties. , a technology ethics organization complained to the U.S. Federal Trade Commission (FTC) that GPT-4 is "biased, deceptive, and poses a threat to privacy and public safety."

Recently, OpenAI also launched a feature called the ChatGPT plug-in, through which partner applications such as Expedia, OpenTable and Instacart can give ChatGPT access to their services, allowing them to order goods on behalf of human users.

Dan Hendrycks, an artificial intelligence security expert on the "Blue Army" team, said that such plug-ins may make humans themselves "outsiders."

“What would you think if a chatbot could post your private information online, access your bank account, or send someone to your home?” Hendricks said. “Overall, we need stronger security assessments before we let AI take over cyber power.”

Members of the “Blue Army” also warned that OpenAI cannot stop just because the software responds in real time. Safety test. Heather Frase, who works at Georgetown University's Center for Security and Emerging Technologies, also tested whether GPT-4 could assist criminal behavior. She said the risks will continue to increase as more people use the technology.

The reason you do real-run tests is because they behave differently once used in a real environment, she said. She believes that public systems should be developed to report the types of events caused by large language models , similar to cybersecurity or consumer fraud reporting systems.

Labor economist and researcher Sara Kingsley suggests that the best solution is something like "Nutrition Labels" on food packaging "That way, speak directly to the hazards and risks.

The key is to have a framework and know what the common problems are so you can have a safety valve," she said. “That’s why I say the work is never done. ”

The above is the detailed content of Before the release of GPT-4, OpenAI hired experts from various industries to conduct 'adversarial testing” to avoid issues such as discrimination.. For more information, please follow other related articles on the PHP Chinese website!

Statement

This article is reproduced at:51CTO.COM. If there is any infringement, please contact admin@php.cn delete

Excel TRANSPOSE FunctionApr 22, 2025 am 09:52 AM

Powerful tools in Excel data analysis and processing: Detailed explanation of TRANSPOSE function Excel remains a powerful tool in the field of data analysis and processing. Among its many features, the TRANSPOSE function stands out for its ability to reorganize data quickly and efficiently. This feature is especially useful for data scientists and AI professionals who often need to reconstruct data to suit specific analytics needs. In this article, we will explore the TRANSPOSE function of Excel in depth, exploring its uses, usage and its practical application in data science and artificial intelligence. Learn more: Microsoft Excel Data Analytics Table of contents In Excel

How to Install Power BI DesktopApr 22, 2025 am 09:49 AM

Get Started with Microsoft Power BI Desktop: A Comprehensive Guide Microsoft Power BI is a powerful, free business analytics tool enabling data visualization and seamless insight sharing. Whether you're a data scientist, analyst, or business user, P

Graph RAG: Enhancing RAG with Graph Structures - Analytics VidhyaApr 22, 2025 am 09:48 AM

Introduction Ever wondered how some AI systems seem to effortlessly access and integrate relevant information into their responses, mimicking a conversation with an expert? This is the power of Retrieval-Augmented Generation (RAG). RAG significantly

SQL GRANT CommandApr 22, 2025 am 09:45 AM

Introduction Database security hinges on managing user permissions. SQL's GRANT command is crucial for this, enabling administrators to assign specific access rights to different users or roles. This article explains the GRANT command, its syntax, c

What is Python IDLE?Apr 22, 2025 am 09:43 AM

Introduction Python IDLE is a powerful tool that can easily develop, debug and run Python code. Its interactive shell, syntax highlighting, autocomplete and integrated debugger make it ideal for programmers of all levels of experience. This article will outline its functions, settings, and practical applications. Overview Learn about Python IDLE and its development benefits. Browse and use the main components of the IDLE interface. Write, save, and run Python scripts in IDLE. Use syntax highlighting, autocomplete and intelligent indentation. Use the IDLE integrated debugger to effectively debug Python code. Table of contents

Python & # 039: S maximum Integer ValueApr 22, 2025 am 09:40 AM

Python: Mastering Large Integers – A Comprehensive Guide Python's exceptional capabilities extend to handling integers of any size. While this offers significant advantages, it's crucial to understand potential limitations. This guide provides a deta

9 Free Stanford AI CoursesApr 22, 2025 am 09:35 AM

Introduction Artificial intelligence (AI) is revolutionizing industries and unlocking unprecedented possibilities across diverse fields. Stanford University, a leading institution in AI research, provides a wealth of free online courses to help you

What is Meta's Segment Anything Model(SAM)?Apr 22, 2025 am 09:25 AM

Meta's Segment Anything Model (SAM): A Revolutionary Leap in Image Segmentation Meta AI has unveiled SAM (Segment Anything Model), a groundbreaking AI model poised to revolutionize computer vision and image segmentation. This article delves into SAM

See all articles

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Assassin's Creed Shadows: Seashell Riddle Solution

3 weeks agoByDDD

What's New in Windows 11 KB5054979 & How to Fix Update Issues

2 weeks agoByDDD

Where to find the Crane Control Keycard in Atomfall

3 weeks agoByDDD

Assassin's Creed Shadows - How To Find The Blacksmith And Unlock Weapon And Armour Customisation

1 months agoByDDD

Roblox: Dead Rails - How To Complete Every Challenge

3 weeks agoByDDD

Hot Tools

DVWA

Damn Vulnerable Web App (DVWA) is a PHP/MySQL web application that is very vulnerable. Its main goals are to be an aid for security professionals to test their skills and tools in a legal environment, to help web developers better understand the process of securing web applications, and to help teachers/students teach/learn in a classroom environment Web application security. The goal of DVWA is to practice some of the most common web vulnerabilities through a simple and straightforward interface, with varying degrees of difficulty. Please note that this software