


Large model ceiling GPT-4, has it...become stupid?
First a few users raised questions, and then a large number of netizens said they had noticed it and posted a lot of evidence.
Some people reported that they used up the 3 hours and 25 dialogue quotas of GPT-4 in one go, and still did not solve their own code problems.
I had no choice but to switch to GPT-3.5, but it solved the problem.
To summarize everyone’s feedback, the most important manifestations are:
- Before GPT-4 could write The correct code is now full of bugs
- The depth and analysis of answering questions have become less
- The response speed is faster than before
This has caused a lot of people I wonder if OpenAI is cutting corners to save costs?
Two months ago GPT-4 was the world's greatest writing assistant, and a few weeks ago it started to fall into mediocrity. I suspect they cut back on the computing power or made it less intelligent.
This inevitably reminds people of Microsoft's new Bing, which "reached its peak when it debuted", but later suffered "frontal lobotomy surgery" to change its ability. Bad things...
After netizens shared their experiences with each other, it became everyone's consensus that "it started to get worse a few weeks ago."
A storm of public opinion also formed in technical communities such as Hacker News, Reddit and Twitter.
Now the officials can’t sit still.
OpenAI Developer Promotion Ambassador Logan Kilpatrick responded to a netizen’s question:
The API will not change without us notifying you. The model there is at rest.
Worried netizens continued to ask for confirmation, "That means GPT-4 has been static since it was released on March 14, right?" ?", also received a positive answer from Logan.
"I noticed inconsistent performance for some prompt words, is it just due to the instability of the large model itself?", also got " Yes" reply.
But so far, the two questions about whether the web version of GPT-4 has been downgraded have not been answered, and Logan has not received any answers during this period. There is other content posted.
So what exactly is going on? Why not try it yourself.
As netizens generally mentioned that GPT-4’s coding skills have become worse, we conducted a simple experiment.
Has the measured GPT-4 “alchemy” ability declined?
At the end of March, we experimented with letting GPT-4 "make elixirs" and write a multi-layer perceptron in Python to implement an XOR gate.
△ShareGPT screenshot, the interface is slightly different
After changing GPT-4 to use numpy without a framework, the first time The result is wrong.
After modifying the code twice, the correct result was obtained. The first time is to modify the number of hidden neurons, and the second time is to change the activation function from sigmoid to tanh.
On June 2, we tried again to let GPT-4 complete this task, but changed to Chinese prompt words.
This time GPT-4 did not use the framework for the first time, but the code given was still wrong.
After only one modification, the correct result was obtained, and the idea was changed to the idea of directly increasing the number of training epochs and learning rate.
We searched in the OpenAI official Discord channel and found that starting from late April, sporadic users reported that GPT-4 had become worse.
The person who raised this issue earlier on Twitter was Matt Shumer, CEO of HyperWrite (a writing tool developed based on GPT API).
Putting aside the differences in people’s psychological feelings, some people also suspect that the API version and the web version are not necessarily consistent, but there is no solid evidence.
Another guess is that when the plug-in is enabled, the extra prompt words of the plug-in may be considered a kind of pollution to the problem to be solved.
△Additional prompt words in the WebPilot plug-in
This netizen said that in his opinion, the performance of GPT has deteriorated. It started after the plug-in function started public testing.
Some people also asked OpenAI employees whether the model itself has not changed, but whether the inference parameters have changed?
Qubits also accidentally "tortured" that the system prompt words of ChatGPT on iOS were not consistent with the web version.
- If you start a conversation on your mobile phone, it will know that it is interacting with you through your mobile phone.
- Will keep the answer to one or two sentences, unless a long reasoning is required.
- will not use emoticons unless you explicitly ask him to use them.
△It may not be successful, and there is a high probability of refusing to answer
Then if you continue in the web version, open it in the iOS version dialogue without realizing it, you may observe that GPT-4 answers become simpler.
In short, it is still an unsolved mystery whether GPT-4 has become dumber since its release.
But one thing is certain:
The GPT-4 that everyone started playing on March 14th was not as good as the one in the paper from the beginning.
Aligning with humans reduces AI capabilities
The more than 150-page paper published by Microsoft Research "The Spark of AGI: Early Experiments with GPT-4" clearly states:
They obtained testing qualifications before the development of GPT-4 was completed and conducted long-term testing.
Later, for many amazing examples in the paper, netizens were unable to successfully reproduce them using the public version of GPT-4.
There is currently a view in the academic community that although the subsequent RLHF training made GPT-4 more aligned with humans - that is, more obedient to human instructions and consistent with human values - it also allowed it to use its own reasoning, etc. Ability becomes worse.
One of the authors of the paper, Microsoft scientist Zhang Yi, also mentioned in the S7E11 issue of the Chinese podcast program "What's Next|Technology Knows Early":
That version of the model is better than the current one. GPT-4, which is available to everyone, is even stronger, much stronger.
For example, the Microsoft team mentioned in the paper that they let GPT-4 use TikZ in LaTeX to draw a unicorn at regular intervals to track changes in GPT-4 capabilities. .
The last result shown in the paper is quite complete.
But the first author of the paper, Sebastien Bubeck, later revealed more information when he gave a speech at MIT.
Later, when OpenAI began to focus on security issues, subsequent versions became increasingly worse at this task.
Training methods that are aligned with humans but do not reduce the upper limit of AI's own capabilities have become the research direction of many teams now, but Still in its infancy.
In addition to professional research teams, netizens who care about AI are also using their own methods to track changes in AI capabilities.
Someone asked GPT-4 to draw a unicorn once a day and record it publicly on the website.
Since April 12, I still haven’t seen the general shape of a unicorn.
Of course, the website author said that he let GPT-4 use SVG format to draw pictures, which is different from the TikZ format in the paper and has an impact.
And what I drew in April seems to be just as bad as what I draw now, and there is no obvious regression.
Finally, let me ask you, are you a GPT-4 user? Have you felt that GPT-4's capabilities have declined in recent weeks? Welcome to chat in the comment area.
Bubeck’s speech: https://www.php.cn/link/a8a5d22acb383aae55937a6936e120b0
Zhang Yi’s interview: https://www.php.cn/link/ 764f9642ebf04622c53ebc366a68c0a7
One GPT-4 unicorn every dayhttps://www.php.cn/link/7610db9e380ba9775b3c215346184a87
Reference link:
[1]https://www.php.cn/link/cd3e48b4bce1f295bd8ed1eb90eb0d85
[2]https://www.php.cn/link/fc2dc7d20994a777cfd5e6de734fe254
[3]https://www.php.cn/link/4dcfbc057e2ae8589f9bbd98b591c50a
[4]https://www.php.cn/link/0007cda84fafdcf42f96c4f4adb7f8ce
[5]https://www.php.cn/link/cd163419a5f4df0ba7e252841f95fcc1
[6]https://www.php.cn/link/afb0b97df87090596ae7c503f60bb23f
[7]https://www.php.cn/link/ef8f94395be9fd78b7d0aecf7864a03
[8]https://www.php.cn/link/30082754836bf11b2c31a0fd3cb4b091
[9]https://www.php.cn/link/14553eed6ae802daf3f8e8c10b1961f0
##
The above is the detailed content of GPT-4 becomes stupid and triggers public opinion! The quality of text code has declined, and OpenAI has just responded to questions about cost reduction and material reduction.. For more information, please follow other related articles on the PHP Chinese website!

Since 2008, I've championed the shared-ride van—initially dubbed the "robotjitney," later the "vansit"—as the future of urban transportation. I foresee these vehicles as the 21st century's next-generation transit solution, surpas

Revolutionizing the Checkout Experience Sam's Club's innovative "Just Go" system builds on its existing AI-powered "Scan & Go" technology, allowing members to scan purchases via the Sam's Club app during their shopping trip.

Nvidia's Enhanced Predictability and New Product Lineup at GTC 2025 Nvidia, a key player in AI infrastructure, is focusing on increased predictability for its clients. This involves consistent product delivery, meeting performance expectations, and

Google's Gemma 2: A Powerful, Efficient Language Model Google's Gemma family of language models, celebrated for efficiency and performance, has expanded with the arrival of Gemma 2. This latest release comprises two models: a 27-billion parameter ver

This Leading with Data episode features Dr. Kirk Borne, a leading data scientist, astrophysicist, and TEDx speaker. A renowned expert in big data, AI, and machine learning, Dr. Borne offers invaluable insights into the current state and future traje

There were some very insightful perspectives in this speech—background information about engineering that showed us why artificial intelligence is so good at supporting people’s physical exercise. I will outline a core idea from each contributor’s perspective to demonstrate three design aspects that are an important part of our exploration of the application of artificial intelligence in sports. Edge devices and raw personal data This idea about artificial intelligence actually contains two components—one related to where we place large language models and the other is related to the differences between our human language and the language that our vital signs “express” when measured in real time. Alexander Amini knows a lot about running and tennis, but he still

Caterpillar's Chief Information Officer and Senior Vice President of IT, Jamie Engstrom, leads a global team of over 2,200 IT professionals across 28 countries. With 26 years at Caterpillar, including four and a half years in her current role, Engst

Google Photos' New Ultra HDR Tool: A Quick Guide Enhance your photos with Google Photos' new Ultra HDR tool, transforming standard images into vibrant, high-dynamic-range masterpieces. Ideal for social media, this tool boosts the impact of any photo,


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

ZendStudio 13.5.1 Mac
Powerful PHP integrated development environment

mPDF
mPDF is a PHP library that can generate PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files "on the fly" from his website and handle different languages. It is slower than original scripts like HTML2FPDF and produces larger files when using Unicode fonts, but supports CSS styles etc. and has a lot of enhancements. Supports almost all languages, including RTL (Arabic and Hebrew) and CJK (Chinese, Japanese and Korean). Supports nested block-level elements (such as P, DIV),

SublimeText3 Mac version
God-level code editing software (SublimeText3)

PhpStorm Mac version
The latest (2018.2.1) professional PHP integrated development tool

Dreamweaver CS6
Visual web development tools