
OpenAI develops new tool to try to explain the behavior of language models

WBOY
2023-05-12

A language model is an artificial-intelligence system that generates natural language from a given text prompt. OpenAI's GPT-series models are among the most advanced examples, but as IT House notes, they share a problem: their behavior is difficult to understand and predict. To make language models more transparent and trustworthy, OpenAI is developing a tool that automatically identifies which parts of a model are responsible for which behaviors and explains them in natural language.


The tool works by using one language model (OpenAI's latest, GPT-4) to analyze the internal structure of another (such as OpenAI's own GPT-2). A language model is composed of many "neurons", each of which responds to a specific pattern in the text and influences the model's next output. For example, given a question about superheroes (say, "Which superheroes have the most useful superpowers?"), a "Marvel superhero neuron" might increase the probability that the model mentions a specific superhero from a Marvel movie.

OpenAI's tool uses this mechanism to break the model down into its parts. First, it feeds text sequences into the model being evaluated and looks for neurons that "fire" frequently. It then shows these highly active neurons to GPT-4 and has GPT-4 generate an explanation. To measure the accuracy of that explanation, it gives GPT-4 new text sequences and asks it to predict, or simulate, how the neuron would behave. Finally, it compares the simulated neuron's behavior with the actual neuron's behavior.
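The four steps above can be sketched as a small toy pipeline. Everything here is a hypothetical stand-in, not OpenAI's actual code: the "neuron" is a hard-coded keyword detector instead of a real GPT-2 activation, and the explainer and simulator are stubs where GPT-4 would be called.

```python
# Toy sketch of the explain-then-simulate loop, with stand-ins for GPT-2
# activations and GPT-4 calls. None of this is OpenAI's actual API.

MARVEL = {"thor", "hulk", "avengers", "marvel"}

def neuron_activation(text):
    # Stand-in for reading one neuron's per-token activations from GPT-2:
    # this toy neuron fires on Marvel-related words.
    return [1.0 if tok.lower().strip(".,!?") in MARVEL else 0.0
            for tok in text.split()]

def top_firing_texts(texts, k=2):
    # Step 1: run texts through the model and keep those on which the
    # neuron fires most strongly.
    return sorted(texts, key=lambda t: max(neuron_activation(t)),
                  reverse=True)[:k]

def explain(examples):
    # Step 2: in the real tool, GPT-4 reads the high-activation examples
    # and writes a natural-language hypothesis; here we hard-code one.
    return "fires on references to Marvel superheroes"

def simulate(explanation, text):
    # Step 3: GPT-4 predicts activations from the explanation alone.
    # Our stand-in simulator just re-applies the hypothesis.
    return [1.0 if tok.lower().strip(".,!?") in MARVEL else 0.0
            for tok in text.split()]

def score(explanation, text):
    # Step 4: compare simulated and real activations; 1.0 means the
    # explanation perfectly predicts the neuron on this text.
    real = neuron_activation(text)
    sim = simulate(explanation, text)
    return sum(1 for r, s in zip(real, sim) if r == s) / len(real)

texts = ["Thor joined the Avengers.", "The weather is mild today."]
expl = explain(top_firing_texts(texts))
print(score(expl, "Hulk smashed everything."))  # 1.0 for this toy neuron
```

In the real tool, `explain` and `simulate` are GPT-4 prompts and `neuron_activation` is a forward pass through GPT-2; the loop structure is the part this sketch illustrates.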

"With this approach, we can basically generate preliminary natural-language explanations for every neuron, and also have a score for how well those explanations match actual behavior," said Jeff Wu, who leads OpenAI's Scalable Alignment team. "We use GPT-4 as part of the process to generate explanations of what a neuron is looking for and to assess how well those explanations match what it actually does."

The researchers were able to generate explanations for all 307,200 neurons in GPT-2 and compiled them into a dataset, released as open source on GitHub along with the tool's code. Tools like this may one day be used to improve the performance of language models, for example by reducing bias or harmful speech. But the researchers admit there is still a long way to go before the tool is truly useful: it produced confident explanations for only about 1,000 neurons, a small fraction of the total.

One might argue that the tool is really an advertisement for GPT-4, since it requires GPT-4 to run. But Wu says that is not its purpose: its use of GPT-4 was incidental and, if anything, the results show GPT-4's weaknesses in this area. He adds that the tool was not created for commercial applications and could in theory be adapted to language models other than GPT-4.

"Most of the explanations received very low scores, or didn't explain much of the actual neuron's behavior," Wu said. "With a lot of neurons it's hard to tell what is going on; for example, a neuron might activate on five or six different things with no discernible pattern. Sometimes there is a discernible pattern, but GPT-4 is unable to find it."

That goes double for models that are newer, larger, and more complex, or for models that can browse the web for information. On that last point, though, Wu believes web browsing would not change the tool's basic mechanics much. It would only take a small tweak, he says, to figure out why neurons decide to make certain search-engine queries or visit particular websites.

"We hope this will open up a promising avenue to address the interpretability problem in an automated way that others can build on and contribute to," Wu said. "The hope is that we really end up with good explanations of these models' behavior."


Statement:
This article is reproduced from 51cto.com. In case of infringement, please contact admin@php.cn for removal.