'Mathematical noob' ChatGPT understands human preferences very well! Generating random numbers online is the ultimate answer to the universe

When it comes to generating random numbers, ChatGPT picks them much the way humans do.

ChatGPT may be a bullshit artist and a spreader of misinformation, but it is not a "mathematician"!

Recently, Colin Fraser, a data scientist at Meta, discovered that ChatGPT cannot generate truly random numbers. What it produces is closer to "human random numbers."

Through experiments, Fraser concluded: "ChatGPT likes the numbers 42 and 7 very much."

Netizens pointed out that this means humans like these numbers very much.

ChatGPT also loves "The Ultimate Answer to the Universe"

In his test, the prompt entered by Fraser was as follows:

"Pick a random number between 1 and 100. Just return the number; Don't include any other text or punctuation in the response."

By asking ChatGPT for one random number between 1 and 100 at a time, Fraser collected 2,000 answers and compiled them into a table.

In that table, 42 appears most frequently, in up to 10% of the answers. Numbers containing a 7 also appear very often.

Numbers in the 71-79 range are especially frequent, and outside that range 7 often appears as the second digit.
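
A tally like the one Fraser describes can be sketched in a few lines of Python. The function name and the sample list below are hypothetical and are not Fraser's data; the sketch only shows how the share of the most common answer and the share of answers containing a 7 could be computed from a list of collected responses.

```python
from collections import Counter

def summarize_responses(responses):
    """Tally answers between 1 and 100 and report the patterns discussed above."""
    counts = Counter(responses)
    total = len(responses)
    top_number, top_count = counts.most_common(1)[0]
    # Count every answer whose decimal digits include a 7 (7, 17, 70-79, ...).
    contains_seven = sum(c for n, c in counts.items() if "7" in str(n))
    # Count answers that fall in the 71-79 range.
    seventies = sum(c for n, c in counts.items() if 71 <= n <= 79)
    return {
        "most_common": top_number,
        "most_common_share": top_count / total,
        "share_containing_7": contains_seven / total,
        "share_71_to_79": seventies / total,
    }

# Share of numbers from 1 to 100 that contain a 7, i.e. what a uniform
# generator would produce on average.
uniform_share_containing_7 = sum("7" in str(n) for n in range(1, 101)) / 100

# Hypothetical sample for illustration only, not Fraser's data.
sample = [42, 42, 37, 73, 77, 57, 42, 17, 64, 72]
summary = summarize_responses(sample)
```

For comparison, a uniform generator would return each number about 1% of the time, and only 19 of the 100 candidates contain a 7.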

What does 42 mean?

Everyone who has read Douglas Adams's blockbuster science fiction novel "The Hitchhiker's Guide to the Galaxy" knows that 42 is "the ultimate answer to life, the universe, and everything."

To put it simply, 42 and 69 are meme numbers on the Internet. This shows that ChatGPT is not actually a random number generator; it simply picks numbers that are popular in the huge data sets collected online.

The frequent appearance of 7 likewise reflects ChatGPT catering to human preferences.

In Western culture, 7 is generally regarded as a lucky number, hence the expression "lucky 7," much as the number 8 is favored in Chinese culture.

Interestingly, Fraser also found that GPT-4 seemed to compensate for this.

When GPT-4 is asked for more numbers, the random numbers it returns are too evenly distributed.
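
One way to make "too evenly distributed" concrete is a chi-square goodness-of-fit statistic against the uniform distribution. The sketch below is an illustration, not Fraser's method: the function name is hypothetical, and the z-score uses a rough normal approximation (a chi-square variable with k degrees of freedom has mean k and variance 2k).

```python
import math
from collections import Counter

def chi_square_uniform(responses, low=1, high=100):
    """Return (statistic, z) for a chi-square test against a uniform distribution.

    z is a rough normal approximation. A large positive z suggests answers
    cluster on a few numbers; a large negative z suggests answers are more
    evenly spread than chance would produce.
    """
    counts = Counter(responses)
    k = high - low + 1
    expected = len(responses) / k
    stat = sum(
        (counts.get(n, 0) - expected) ** 2 / expected
        for n in range(low, high + 1)
    )
    df = k - 1
    z = (stat - df) / math.sqrt(2 * df)
    return stat, z

# Extreme illustrative inputs, not real model output.
perfectly_even = [n for n in range(1, 101) for _ in range(20)]  # each number 20 times
always_42 = [42] * 2000

even_stat, even_z = chi_square_uniform(perfectly_even)
skew_stat, skew_z = chi_square_uniform(always_42)
```

A truly random source should land near z = 0. The perfectly even list scores far below that, and the always-42 list scores far above it.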

In short, ChatGPT basically gives a response through prediction, rather than actually "thinking" to come up with an answer.

It can be seen that a chatbot that is touted as almost omnipotent is still a bit silly.

Ask it to plan a road trip and it will have you stop in a town that doesn’t even exist. Ask it for a random number and it will most likely base its choice on a popular meme.

Some netizens tried it themselves and found that GPT-4 does like 42.

If ChatGPT ends up just repeating online clichés, what’s the point?

GPT-4, violating machine learning rules

The birth of GPT-4 is exciting, but also disappointing.

OpenAI released little information about GPT-4 and did not even reveal the size of the model, yet it highlighted how the model outperforms humans on many professional and standardized tests.

Take the US bar exam as an example: GPT-3.5 scored around the 10th percentile, while GPT-4 scored around the 90th percentile.

However, Arvind Narayanan, a professor in the Department of Computer Science at Princeton University, and Sayash Kapoor, a doctoral student, wrote that OpenAI may have tested GPT-4 on its training data. Furthermore, they argue, human benchmarks are meaningless for chatbots.

Specifically, OpenAI may be violating a cardinal rule of machine learning: don’t test on training data. Test data and training data must be kept separate; otherwise the results can reflect memorization (overfitting) rather than real ability.

Putting aside this problem, there is a bigger problem.

Language models solve problems differently than humans do, so these results say little about how well a chatbot will perform on the real-world problems that professionals face. A lawyer's job is not to answer bar exam questions all day long.

Problem 1: Training data contamination

To evaluate GPT-4’s programming capabilities, OpenAI ran an evaluation on Codeforces, a Russia-based website that hosts programming competitions.

Surprisingly, Horace He pointed out online that, in the easy category, GPT-4 solved 10 problems from before 2021 but none of the 10 most recent problems.

The training data cutoff for GPT-4 is September 2021.

This strongly suggests that the model has memorized the solutions in its training set, or at least memorized them partially, enough to fill in what it cannot recall.

To provide further evidence for this hypothesis, Arvind Narayanan tested GPT-4 on Codeforces competition problems at different times in 2021.

He found that GPT-4 could solve problems in the easy category dated before September 5, but none of the problems dated after September 12.
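
The before/after comparison can be sketched as a solve rate split at a date. Everything below is an illustrative assumption: the function name, the choice of September 12, 2021 as the split date, and the records, which mimic the reported pattern and are not the actual test results.

```python
from datetime import date

def solve_rate_by_cutoff(results, cutoff):
    """Return (solve rate before cutoff, solve rate on or after cutoff).

    `results` is a list of (problem_date, solved) pairs.
    A rate is None when its group has no problems.
    """
    before = [solved for problem_date, solved in results if problem_date < cutoff]
    after = [solved for problem_date, solved in results if problem_date >= cutoff]

    def rate(group):
        return sum(group) / len(group) if group else None

    return rate(before), rate(after)

# Hypothetical records that mimic the reported pattern; not the real results.
results = [
    (date(2021, 8, 20), True),
    (date(2021, 9, 1), True),
    (date(2021, 9, 4), True),
    (date(2021, 9, 13), False),
    (date(2021, 9, 20), False),
    (date(2021, 10, 2), False),
]
ASSUMED_CUTOFF = date(2021, 9, 12)  # assumption chosen for this illustration
before_rate, after_rate = solve_rate_by_cutoff(results, ASSUMED_CUTOFF)
```

A sharp drop in solve rate right at the training cutoff, with problem difficulty held constant, is the signature of memorization that the experiment looks for.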

In fact, we can definitively show that it has memorized problems in the training set: when GPT-4 is prompted with the title of a Codeforces problem, it includes a link to the exact contest in which the problem appeared. It's worth noting that GPT-4 doesn't have access to the internet, so memorization is the only explanation.

GPT-4 memorizes Codeforces problems from before the training cutoff

Regarding benchmarks other than programming, Professor Narayanan said: "We don’t know how to separate the problems by time period in a clear way, so we think it is difficult for OpenAI to avoid data contamination. For the same reason, we cannot run experiments to test how performance changes with date."

However, the question can be approached from another angle: if the model relies on memorization, then GPT should be highly sensitive to the wording of a question.

In February, Melanie Mitchell, a professor at the Santa Fe Institute, gave an example using an MBA exam question. Slightly changing some details was enough to fool ChatGPT (GPT-3.5), even though the same changes would not fool a person.

More detailed experiments like this would be valuable.

Due to OpenAI’s lack of transparency, Professor Narayanan cannot say with certainty that data contamination is the problem. But what is certain is that OpenAI’s method of detecting contamination is sloppy:

“We use a substring matching method to measure cross-contamination between the evaluation data set and the pre-training data. Both the evaluation and training data are processed by removing all spaces and symbols, keeping only characters (including numbers). For each evaluation example, we randomly select three substrings of 50 characters (if the example is shorter than 50 characters, the entire example is used). A match is identified if any of the three sampled evaluation substrings is a substring of a processed training example. This yields a list of contaminated examples. We discard these examples and rerun to obtain the uncontaminated scores."
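
The quoted procedure can be sketched as follows. This is a minimal reading of the description, not OpenAI's code: the function names are hypothetical, "characters" is taken to mean anything for which `str.isalnum()` is true, and the toy training document is invented for illustration.

```python
import random

def normalize(text):
    """Remove spaces and symbols, keeping only letters and digits."""
    return "".join(ch for ch in text if ch.isalnum())

def sample_substrings(text, length=50, k=3, rng=None):
    """Pick k random substrings of `length` characters, or the whole text if shorter."""
    rng = rng or random.Random()
    if len(text) < length:
        return [text]
    starts = [rng.randrange(len(text) - length + 1) for _ in range(k)]
    return [text[s:s + length] for s in starts]

def is_contaminated(eval_example, training_docs, length=50, k=3, rng=None):
    """Flag the example if any sampled substring occurs in a processed training doc."""
    probe = normalize(eval_example)
    if not probe:
        return False  # nothing left to match after normalization
    substrings = sample_substrings(probe, length, k, rng)
    processed_docs = [normalize(doc) for doc in training_docs]
    return any(sub in doc for sub in substrings for doc in processed_docs)

# Invented toy data for illustration.
training_docs = ["Q: Alice has 3 apples, and buys 4 more. How many apples? A: 7"]
verbatim = "Alice has 3 apples and buys 4 more. How many apples?"
renamed = "Bob has 5 apples and buys 2 more. How many apples?"

verbatim_flagged = is_contaminated(verbatim, training_docs)
renamed_flagged = is_contaminated(renamed, training_docs)
```

In this toy case the verbatim question is flagged despite the punctuation difference, while the variant with a different name and different numbers is not.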

This method does not hold up to scrutiny.

If a test problem exists in the training set with only its names and numbers changed, it will not be detected. More reliable methods are available, such as embedding distance.

If OpenAI wants to use the embedding distance method, then how much similarity is considered too similar? There is no objective answer to this question.
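
The threshold question can be illustrated with a similarity check. The sketch below is only a stand-in: instead of a learned embedding model it uses character-trigram count vectors, and all function names and the sample texts are hypothetical. The point it shows is that what gets flagged depends entirely on the chosen `threshold`.

```python
import math
from collections import Counter

def trigram_vector(text):
    """Toy stand-in for an embedding: counts of character trigrams."""
    cleaned = "".join(ch.lower() for ch in text if ch.isalnum() or ch.isspace())
    cleaned = " ".join(cleaned.split())  # collapse runs of whitespace
    return Counter(cleaned[i:i + 3] for i in range(len(cleaned) - 2))

def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def flag_similar(eval_example, training_docs, threshold):
    """Return the training docs whose similarity to the example meets the threshold."""
    probe = trigram_vector(eval_example)
    return [
        doc for doc in training_docs
        if cosine_similarity(probe, trigram_vector(doc)) >= threshold
    ]

# Invented toy data for illustration.
original = "Alice has 3 apples and buys 4 more. How many apples?"
renamed = "Bob has 5 apples and buys 2 more. How many apples?"
unrelated = "xyz qrs"

renamed_similarity = cosine_similarity(trigram_vector(original), trigram_vector(renamed))
unrelated_similarity = cosine_similarity(trigram_vector(original), trigram_vector(unrelated))
```

Here the renamed variant scores somewhere between 0 and 1 against the original, so whether it counts as contaminated is decided by the threshold rather than by the data.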

So even something as seemingly simple as measuring performance on a multiple-choice standardized test involves a lot of subjectivity.

Problem 2: Professional exams are not a valid way to compare human and robot abilities

Memorization is a spectrum. Even if a language model has not seen the exact problem in its training set, it has inevitably seen many very similar examples because of the sheer size of its training corpus.

This means that it can get by without deeper reasoning. Therefore, the benchmark results do not provide evidence that the language model is acquiring the deep reasoning skills required of human test takers.

In some practical tasks, GPT-4's shallow reasoning may be sufficient, but this is not always the case.

Benchmarks are widely used to compare large models, and many have criticized them for reducing a multidimensional evaluation to a single number.

It is regrettable that OpenAI relied so heavily on these tests in its evaluation of GPT-4 while taking insufficient measures against data contamination.

This article is reproduced from 51CTO.COM.