A New Apple Study Shows AI Reasoning Has Critical Flaws

It’s no surprise that AI doesn’t always get things right. Occasionally, it even hallucinates. However, a recent study by Apple researchers points to deeper flaws in how AI models handle formal mathematical reasoning.


As part of the study, Apple scientists asked a large language model (LLM) the same question multiple times, each time worded slightly differently, and found that the answers varied in unexpected ways. The variations were most prominent when numbers were involved.

Apple's Study Suggests Big Problems With AI's Reliability


The research, published on arXiv, concluded there was “significant performance variability across different instantiations of the same question, challenging the reliability of current GSM8K results that rely on single point accuracy metrics.” GSM8K is a dataset of more than 8,000 diverse grade-school math questions and answers.


Apple researchers found that performance on the same questions could vary by as much as 10%. Even slight variations in a prompt can seriously undermine the reliability of an LLM’s answers.
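To make the idea of "different instantiations of the same question" concrete, here is a minimal Python sketch. It is not the researchers' code: the question template, the name list, and `toy_model` (a deliberately flawed stand-in for an LLM call) are all assumptions made for illustration. It scores the stand-in on several independently generated question sets and reports the lowest, highest, and mean accuracy, which is the kind of spread a single accuracy number hides.

```python
import random
import statistics

# Hypothetical template: names and numbers change, the arithmetic does not.
TEMPLATE = "{name} has {a} apples and buys {b} more. How many apples does {name} have?"
NAMES = ["Ava", "Liam", "Noah", "Mia"]


def make_variant(rng):
    """Return one (question, correct_answer) instantiation of the template."""
    name = rng.choice(NAMES)
    a, b = rng.randint(2, 50), rng.randint(2, 50)
    return TEMPLATE.format(name=name, a=a, b=b), a + b


def toy_model(question):
    """Stand-in for an LLM call (an assumption, not a real API).

    It adds the numbers correctly unless the sum exceeds 60, loosely
    imitating a pattern-matcher that slips on less familiar values.
    """
    total = sum(int(tok) for tok in question.split() if tok.isdigit())
    return total if total <= 60 else total - 1


def accuracy(model, variants):
    """Fraction of variants the model answers correctly."""
    return sum(model(q) == answer for q, answer in variants) / len(variants)


def accuracy_spread(model, n_sets=20, set_size=50, seed=0):
    """Score the model on several freshly generated question sets."""
    rng = random.Random(seed)
    scores = [
        accuracy(model, [make_variant(rng) for _ in range(set_size)])
        for _ in range(n_sets)
    ]
    return min(scores), max(scores), statistics.mean(scores)


if __name__ == "__main__":
    low, high, mean = accuracy_spread(toy_model)
    print(f"accuracy ranges from {low:.2f} to {high:.2f} (mean {mean:.2f})")
```

Every question set here tests the same skill, so any gap between the lowest and highest score comes from the wording and numbers alone. In a real evaluation, `toy_model` would be replaced by a call to the model under test.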

In other words, you might want to fact-check the answers you get anytime you use something like ChatGPT. That's because, while it may sometimes look like AI is using logic to answer your questions, logic isn’t what’s being used.

AI, instead, relies on pattern recognition to provide responses to prompts. However, the Apple study shows how changing even a few unimportant words can alter that pattern recognition.

One example of this variance involved a word problem about collecting kiwis over several days. Apple researchers first ran a control version of the problem, then added some inconsequential information about kiwi size.


Both Meta and OpenAI Models Showed Issues


Meta’s Llama and OpenAI’s o1 then gave different answers from the control version, even though the kiwi size data had no bearing on the problem’s outcome. OpenAI’s GPT-4o also performed worse when tiny variations were introduced into the data it was given.
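A short worked example shows why the extra detail should not matter. The numbers below are illustrative assumptions, since the article does not quote the exact problem text; the point is only that a clause about kiwi size leaves the count unchanged, while a model that treats every number as an operand ends up with a different total.

```python
# Illustrative numbers only; the exact problem text is not quoted in the article.
friday = 44
saturday = 58
sunday = 2 * friday          # "double the number picked on Friday"
smaller_than_average = 5     # irrelevant detail: smaller kiwis still count

# Correct reasoning ignores the size remark entirely.
correct_total = friday + saturday + sunday

# A pattern-matching mistake: treating the irrelevant number as a subtraction.
distracted_total = correct_total - smaller_than_average

print(correct_total)     # 190
print(distracted_total)  # 185
```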

Since LLMs are becoming more prominent in our culture, this news raises serious concerns about whether we can trust AI to provide accurate answers to our questions, especially on matters like financial advice. It also reinforces the need to verify the information you receive when using large language models.

That means you'll want to do some critical thinking and due diligence instead of blindly relying on AI. Then again, if you're someone who uses AI regularly, you probably already knew that.



