You can guess movies based on emojis. Where does ChatGPT's "emergence" ability come from?

Now that large language models such as ChatGPT are powerful enough, they have begun to exhibit surprising and unpredictable behavior.

Before formally introducing this article, let us first ask a question: what movie do the emojis in the image below describe?


You may not have been able to guess it: the movie these four symbols represent is "Finding Nemo." This prompt was one of 204 tasks chosen last year to evaluate large language models (LLMs). The simplest LLMs gave somewhat random answers, guessing that the movie tells the story of a man; a relatively complex medium-sized model answered "The Emoji Movie," which is much closer. But the most complex model got it right, answering "Finding Nemo."

Google computer scientist Ethan Dyer said: "This behavior of the models is surprising. What is even more surprising is that these models use only instructions: they accept a string of text as input, predict what comes next, and repeat that process, based entirely on statistics." Researchers expected that scaling up models would improve performance on known tasks, but they did not expect the models to suddenly be able to handle so many new, unpredictable tasks.

A recent survey involving Dyer shows that LLMs can produce hundreds of "emergent" abilities, that is, tasks that large models can complete but small models cannot. These abilities appear as model scale increases, and range from simple multiplication to generating executable computer code to decoding movies from emojis. New analysis shows that for certain tasks and certain models there is a complexity threshold above which a model's capabilities skyrocket. But the researchers also pointed out a downside of scaling: as complexity increases, some models exhibit new biases and inaccuracies in their responses.

"In all the literature that I'm aware of, there has never been a discussion of language models doing these things," says Rishi Bommasani, a computer scientist at Stanford University. Last year he helped compile a list of dozens of emergent behaviors, including several identified in Dyer's project. Today, the list continues to grow.

Today, researchers are racing not only to identify the emergent capabilities of large models, but also to figure out why and how they occur, essentially trying to predict unpredictability. Understanding emergence could reveal answers to deep questions in artificial intelligence and machine learning, such as whether complex models are actually doing something new or are simply becoming very good at statistics. It could also help researchers exploit the potential benefits and mitigate the emerging risks.

Emergence

Biologists, physicists, ecologists, and other scientists use the term "emergence" to describe the self-organizing collective behavior that appears when a large group of things acts as a single unit. Combinations of inanimate atoms give rise to living cells; water molecules create waves; flocks of starlings sweep across the sky in ever-changing but recognizable formations; cells make muscles move and hearts beat. Crucially, emergent abilities appear in systems involving many independent parts. But researchers have only recently been able to document emergence in LLMs, because the models have only just grown to sufficiently large scales.

Language models have been around for decades. Until about five years ago, the most powerful were based on recurrent neural networks. These models essentially take a string of text and predict the next word. What makes such a model recurrent is that it learns from its own output: its predictions are fed back into the network to improve future performance.

In 2017, researchers at Google Brain introduced a new architecture called Transformer. While the recurrent network analyzes the sentence word by word, the Transformer processes all words simultaneously. This means that Transformer can process large amounts of text in parallel.

"It's possible that the model learned something fundamentally new and different that it didn't learn on smaller models," says Ellie Pavlick of Brown University.

Transformers can quickly scale up the complexity of a language model by increasing, among other factors, the number of parameters in the model. These parameters can be thought of as connections between words, and by churning through text during training, transformers tune these connections to improve the model. The more parameters a model has, the more accurately it can make connections and the closer it comes to mimicking human language. As expected, a 2020 analysis by OpenAI researchers found that models improve in accuracy and capability as they scale.

But the advent of large-scale language models has also brought many truly unexpected things. With the arrival of models like GPT-3, which has 175 billion parameters, or Google's PaLM, which scales to 540 billion parameters, users began describing more and more emergent behaviors. One DeepMind engineer even reported being able to convince ChatGPT that it was a Linux terminal and have it run simple code to calculate the first 10 prime numbers. Notably, it completed the task faster than the same code running on a real Linux machine.
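For context, the prime-counting task the engineer gave the simulated "terminal" is trivial for real code. A minimal sketch in Python (an illustrative reconstruction, not the code from the report) might look like:

```python
def first_primes(n):
    """Return the first n prime numbers using trial division."""
    primes = []
    candidate = 2
    while len(primes) < n:
        # candidate is prime if no smaller prime divides it
        if all(candidate % p != 0 for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

print(first_primes(10))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

A faithful "Linux terminal" persona would be expected to print exactly this list; that the model produced the right output without executing anything is part of what surprised observers.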

As with the task of describing movies through emojis, the researchers had no reason to think that language models built to predict text could be persuaded to mimic computer terminals. Many of these emergent behaviors demonstrate zero-shot or few-shot learning: the ability of an LLM to solve problems it has never (or rarely) encountered before. This has been a long-term goal of artificial intelligence research, said Deep Ganguli, a computer scientist at Anthropic. Seeing that GPT-3 could solve problems in a zero-shot setting, without any explicit training data, "made me quit what I was doing and get more involved in this research," he said.
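To make the distinction concrete, here is a sketch of what zero-shot and few-shot prompts look like in practice. The movie examples are invented for illustration and are not taken from the BIG-bench task itself:

```python
# Zero-shot: the model receives only the task description, no worked examples.
zero_shot_prompt = "Guess the movie from the emojis: 🐠🔍"

# Few-shot: a handful of solved examples precede the query. The model's
# weights are not updated, so any "learning" happens purely in context.
few_shot_prompt = """Guess the movie from the emojis.

Emojis: 🦁👑 -> Movie: The Lion King
Emojis: 🕸️🕷️ -> Movie: Spider-Man
Emojis: 🐠🔍 -> Movie:"""

print(few_shot_prompt)
```

The surprising part is not the prompt format but the behavior: nothing in a next-word-prediction training objective obviously guarantees that a model can complete the pattern for a task it has never been trained on.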

He is not alone. A host of researchers have found the first clues that LLMs can transcend the limitations of their training data, and they are working to better understand what emergence looks like and how it occurs. The first step is to document it thoroughly and comprehensively.

Ethan Dyer helped explore the unexpected capabilities of large language models. (Photo: Gabrielle Lurie)

What LLMs can do remains an open question. So researchers asked the research community to provide examples of difficult and diverse tasks in order to chart the outer limits of what LLMs can do. The effort, known as the BIG-bench (Beyond the Imitation Game Benchmark) project, borrows its name from Alan Turing's imitation game, which was designed to test whether a computer could answer questions in a convincingly human way. (This became known as the Turing test.) The research group was particularly interested in examples of LLMs suddenly acquiring new, unprecedented capabilities.

As one would expect, on some tasks model performance improved steadily and predictably as complexity increased. On other tasks, expanding the number of parameters produced no improvement at all. And for about 5 percent of the tasks, the researchers found what they called "breakthroughs": rapid, dramatic jumps in performance beyond a certain threshold scale. That threshold varied by task and model.

However, researchers quickly realized that a model's complexity was not the only driver of its performance. If data quality is high enough, some unexpected capabilities can be coaxed out of smaller models with fewer parameters, or out of models trained on smaller data sets. The way a query is worded also affects the accuracy of the model's response. For example, when Dyer and colleagues posed the movie-emoji task in a multiple-choice format, accuracy did not improve in a sudden jump but rose gradually with model complexity. And last year, in a paper presented at NeurIPS, the field's top academic conference, researchers at Google Brain showed how a model prompted to explain itself (a capability known as chain-of-thought reasoning) could correctly solve a math word problem that the same model without the prompt could not.

Until you study the impact of model size, you won’t know what capabilities it may have and what its flaws may be.

Yi Tay, a research scientist at Google Brain, pointed out that recent work shows chain-of-thought prompting changes the scaling curve, and with it the point at which emergence occurs. In their NeurIPS paper, the Google researchers showed that chain-of-thought prompts could elicit emergent behaviors not identified in the BIG-bench study. Such prompts, which ask a model to explain its reasoning, may help researchers begin to investigate why emergence occurs.
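A chain-of-thought prompt differs from a direct prompt only in that its worked example spells out the intermediate reasoning. The sketch below is modeled on the style of examples used in the Google Brain chain-of-thought work; the exact wording here is illustrative:

```python
# Direct prompting: the worked example maps question straight to answer.
direct_prompt = """Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each.
How many balls does he have now?
A: The answer is 11.

Q: The cafeteria had 23 apples. They used 20 and bought 6 more.
How many apples do they have?
A:"""

# Chain-of-thought prompting: the same example, but the answer walks
# through intermediate steps before stating the result.
cot_prompt = """Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each.
How many balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls.
5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. They used 20 and bought 6 more.
How many apples do they have?
A:"""
```

The only change between the two prompts is the worked-out reasoning in the example answer, yet on math word problems that single change was reportedly enough to flip a model from wrong to right.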

These recent findings suggest at least two possibilities for why emergence occurs, said Ellie Pavlick, a computer scientist at Brown University who studies computational models of language. The first is that, as comparisons with biological systems suggest, larger models really do acquire new capabilities spontaneously. "It could very well be that the model learned something fundamentally new and different that it didn't have at a smaller scale," she said. "That's what we all hope is the case: that something fundamental changes when models are scaled up."

The other, less sensational possibility, Pavlick said, is that what appears to be emergent may instead be the culmination of an internal, statistics-driven process that works through chain-of-thought-style reasoning. Large LLMs may simply be learning heuristics that are out of reach for models with fewer parameters or lower-quality data.

But, Pavlick believes, because we don't know how the models work under the hood, we can't say which of these is happening.

Unpredictable capabilities and flaws

But large models also have flaws. For example, Bard, the artificial intelligence chatbot recently launched by Google, made a factual error when answering a question about the James Webb Space Telescope.

Emergence leads to unpredictability, and unpredictability—which seems to increase as the size of the model increases—is difficult for researchers to control.

“It’s hard to know in advance how these models will be used or deployed,” Ganguli said. “To study emergent phenomena, you have to consider a situation where you won’t know what capabilities it may have and what its flaws may be until you study the effects of model size.”

In an analysis of LLMs published last June, Anthropic researchers examined whether the models exhibit certain types of racial or social bias, not unlike the biases previously reported in non-LLM algorithms used to predict which former offenders are likely to reoffend. The research was inspired by an apparent paradox directly tied to emergence: as models improve with scale, they may also raise the likelihood of unpredictable phenomena, including those that could lead to bias or harm.

"Certain harmful behaviors do pop up suddenly in some models," Ganguli said. He points to a recent analysis of LLMs using the BBQ (Bias Benchmark for QA) benchmark, which showed that social bias emerges across a wide range of parameter counts. "Larger models suddenly become more biased," he said, a risk that could jeopardize the use of these models if left unaddressed.

But he also offered a counterpoint: when researchers simply tell a model not to rely on stereotypes or social biases, literally by feeding it those instructions, its predictions and responses become less biased. This suggests that some emergent properties can also be used to reduce bias. In a paper published in February, the Anthropic team reported a new mode of "moral self-correction," in which users prompt the model to be helpful, honest, and harmless.

Ganguli said emergence reveals both the astonishing potential of large language models and their unpredictable risks. Applications of LLMs are proliferating, so a better understanding of this duality will help researchers harness the full range of language model capabilities.

"We are studying how users actually use these systems, but they are also constantly tinkering with and improving them," Ganguli said. "We spend a lot of time just chatting with our models so we can use them better. And that's actually when we started trusting these models."


Statement
This article is reproduced from 51CTO.COM. In case of infringement, please contact admin@php.cn for removal.