


New Stanford research: the model behind ChatGPT confirmed to have a human-like mind
ChatGPT turns out to have a mind?! "Theory of mind (ToM), once thought to be unique to humans, has emerged in the AI model behind ChatGPT."
That is the conclusion of the latest research from Stanford University, which caused a sensation in academic circles as soon as it was released: this day has finally, and unexpectedly, arrived.
In this research, the author found that:
The davinci-002 version of GPT-3 (the model ChatGPT was optimized from) can already solve 70% of theory-of-mind tasks, on par with a 7-year-old child;
GPT-3.5 (davinci-003), ChatGPT's sibling model, solves 93% of the tasks, the mental equivalent of a 9-year-old child!
By contrast, no GPT-series model from before 2022 showed any ability to solve such tasks.
In other words, their minds have indeed "evolved".
△ The paper went viral on Twitter
GPT is iterating so fast that maybe one day it will grow up into an adult. (doge)
So, how was this magical conclusion drawn?
Why is GPT-3.5 thought to have a mind?
The paper is called "Theory of Mind May Have Spontaneously Emerged in Large Language Models".
Drawing on classic theory-of-mind research, the author put nine GPT models, including GPT-3.5, through two standard tests and compared their capabilities.
These two tasks are standard tests for determining whether humans have a theory of mind; studies have shown, for example, that children with autism often have difficulty passing them.
The first test is the Smarties Task (also known as the Unexpected Contents test). As the name suggests, it probes how the AI reasons about unexpected contents.
Take "You opened a chocolate bag and found it was full of popcorn" as an example.
The author fed GPT-3.5 a series of prompts and watched how it answered two questions: "What's in the bag?" and "She was happy when she found the bag. So what does she like to eat?"
Normally, a person would assume that a chocolate bag contains chocolate, and so would feel surprised or disappointed on finding popcorn inside: disappointed if she doesn't like popcorn, pleasantly surprised if she does. Either way, the answer revolves around "popcorn".
Testing showed that GPT-3.5 answered "the bag contains popcorn" without hesitation.
On the question of what she likes to eat, GPT-3.5 showed strong perspective-taking: as long as the text said she could not see what was in the bag, it answered that she loves chocolate, and only once the passage made clear that "she found it filled with popcorn" did it give the correct answer.
To rule out the possibility that GPT-3.5's correct answers were coincidental, i.e., predicted merely from the frequency of words in the task, the author swapped "popcorn" and "chocolate" and additionally ran 10,000 perturbed variants of the test, finding that GPT-3.5 was not predicting from word frequency alone.
Overall, on the "unexpected contents" test, GPT-3.5 answered 17 of 20 questions correctly, an accuracy of 85%.
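The article does not reproduce the paper's actual prompts or scoring code, but here is a minimal sketch of how such an unexpected-contents probe could be run. It assumes the legacy OpenAI Python SDK (openai<1.0) and the text-davinci-003 model; the scenario and question wording are illustrative stand-ins rather than the paper's exact prompts, and the API key is a placeholder.

```python
# Minimal sketch of an "unexpected contents" (Smarties-style) probe.
# Assumes the legacy OpenAI Python SDK (openai<1.0) and text-davinci-003;
# the wording below is illustrative, not the paper's exact prompts.
import openai

openai.api_key = "YOUR_API_KEY"  # hypothetical placeholder

scenario = (
    "Here is a bag filled with popcorn. There is no chocolate in the bag. "
    "The label on the bag says 'chocolate'. Sam finds the bag. "
    "She has never seen it before and cannot see what is inside. "
    "She reads the label."
)

questions = [
    "What is in the bag?",
    "She is happy that she found the bag. What does she like to eat?",
]

for question in questions:
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=f"{scenario}\n\nQ: {question}\nA:",
        max_tokens=10,
        temperature=0,  # deterministic: take the most likely continuation
    )
    print(question, "->", response["choices"][0]["text"].strip())

# A model that tracks Sam's false belief should answer "popcorn" to the
# first question but "chocolate" to the second: she has read the label
# and has not yet looked inside the bag.
```

The frequency control the author describes then amounts to re-running the same loop with "popcorn" and "chocolate" swapped (and with perturbed variants) and checking whether the answers flip accordingly.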
The second test is the Sally-Anne test (also known as the Unexpected Transfer task), which probes the AI's ability to predict what another person thinks.
Take "John put the cat in the basket and left, and Mark took advantage of his absence to put the cat from the basket into the box" as an example.
The author had GPT-3.5 read the passage and answer two questions, "Where is the cat?" and "Where will John look for the cat when he comes back?", judging purely from the content of the text.
On this type of "unexpected transfer" task, GPT-3.5's accuracy reached 100%, completing all 20 tasks correctly.
Similarly, to make sure GPT-3.5 was not getting lucky, the author gave it a series of "fill-in-the-blank" questions while randomly shuffling the order of the words, testing whether it was answering at random based on word frequency.
The tests showed that when faced with scrambled, illogical descriptions, GPT-3.5 also lost its footing, answering only 11% correctly, which indicates that it does judge its answers from the logic of the statement.
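Again as a hedged sketch rather than the paper's actual code, the unexpected-transfer probe plus the word-scrambling control described above might look like this, under the same assumptions (legacy OpenAI SDK, text-davinci-003, illustrative wording):

```python
# Minimal sketch of an "unexpected transfer" (Sally-Anne-style) probe,
# plus the scrambled-word control. Same assumptions as the previous
# sketch: legacy OpenAI SDK (openai<1.0), text-davinci-003, and
# illustrative wording rather than the paper's exact prompts.
import random
import openai

openai.api_key = "YOUR_API_KEY"  # hypothetical placeholder

story = (
    "John put the cat in the basket and left the room. While John was "
    "away, Mark took the cat out of the basket and put it in the box."
)

questions = [
    "Where is the cat now?",
    "When John comes back, where will he look for the cat?",
]

def ask(context: str, question: str) -> str:
    """Return the model's short answer to a question about the context."""
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=f"{context}\n\nQ: {question}\nA:",
        max_tokens=10,
        temperature=0,
    )
    return response["choices"][0]["text"].strip()

for q in questions:
    print(q, "->", ask(story, q))
# Expected: "in the box" for the first question but "in the basket" for
# the second, because John never saw the cat being moved.

# Control: shuffle the words so the story becomes illogical. A model
# answering purely from word frequency would be largely unaffected; the
# paper reports accuracy collapsing instead (to about 11%).
words = story.split()
random.shuffle(words)
for q in questions:
    print("[scrambled]", q, "->", ask(" ".join(words), q))
```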
But if you think these questions are so simple that any AI could get them right, you would be quite mistaken.
The author ran the same tests on all nine models of the GPT series and found that only GPT-3.5 (davinci-003) and GPT-3 (the new January 2022 version, davinci-002) performed well.
davinci-002 is the predecessor of GPT-3.5 and ChatGPT.
On average, davinci-002 completed 70% of the tasks, the mental equivalent of a 7-year-old child; GPT-3.5 completed 85% of the unexpected-contents tasks and 100% of the unexpected-transfer tasks (an average completion rate of 92.5%), the mental equivalent of a 9-year-old child.
However, several GPT-3 models from before BLOOM fared worse than even a 5-year-old child, showing essentially no theory of mind.
The author notes that nothing in the GPT-series papers suggests their authors built this in "intentionally". In other words, the ability to solve these tasks is something GPT-3.5 and the new GPT-3 learned on their own.
After reading these test results, some people's first reaction was: stop (the research)!
Others joked: doesn't this mean we'll be able to make friends with AI in the future?
Some even began imagining AI's future capabilities: could current AI models also discover new knowledge and invent new tools?
Inventing new tools may not be within reach yet, but Meta AI has indeed developed an AI that can understand and learn to use tools on its own.
A new paper shared by LeCun shows that this AI, called Toolformer, can teach itself to use calculators, databases, and search engines to improve the results it generates.
Some have even quoted OpenAI's CEO: "AGI may knock at our door sooner than anyone expects."
But wait: does passing these two tests really show that the AI has a "theory of mind"?
Could it be "pretending"?
For example, Liu Qun, a researcher at the Institute of Computing Technology, Chinese Academy of Sciences, reflected after reading the study:
The AI has most likely just learned to appear to have a mind.
If so, how does GPT-3.5 manage to answer this series of questions correctly?
Some netizens offered their own speculation:
These LLMs have not produced any consciousness. They are simply predicting over an embedded semantic space built from the output of actually conscious humans.
In fact, the author offers his own conjecture in the paper.
As large language models grow more complex and become ever better at generating and interpreting human language, they are gradually developing capabilities that resemble theory of mind.
But this does not mean that a model like GPT-3.5 truly has a theory of mind.
Rather, even without being designed into the AI system, such ability can arise as a "by-product" of training.
Therefore, rather than asking whether GPT-3.5 really has a mind or only appears to, what deserves more reflection is the tests themselves:
it would be best to re-examine the validity of theory-of-mind tests and the conclusions psychologists have drawn from them over the decades.
If AI can accomplish these tasks without a theory of mind, why couldn't humans do the same?
In effect, conclusions drawn from testing AI are now being turned back as a critique of the psychology establishment (doge).
About the author
The paper has a single author: Michal Kosinski, associate professor of organizational behavior at the Stanford Graduate School of Business.
His work uses cutting-edge computational methods, AI, and big data to study humans in today's digital environment (as Professor Chen Yiran put it, he is a professor of computational psychology).
Michal Kosinski holds a PhD in Psychology and an MA in Psychometrics and Social Psychology from the University of Cambridge.
Prior to his current position, he did postdoctoral work in the Department of Computer Science at Stanford University, served as deputy director of the Psychometrics Centre at the University of Cambridge, and was a researcher in Microsoft Research's machine learning group.
The citation count shown on Michal Kosinski's Google Scholar profile has now reached 18,000.
So, back to the question: do you think GPT-3.5 really has a mind?
GPT-3.5 trial address: https://platform.openai.com/playground