First-hand review of Claude 3.5: Memes, medical diagnosis, social tact, and math problems. Is it really better than GPT-4o?

王林

Jun 22, 2024 am 07:46 AM

industry

Machine Power Report

Editor: Yang Wen

Memes, medical diagnosis, social tact, and math problems: is the "new king" Claude 3.5 really as impressive as claimed?

It's here, it's here: Claude 3.5 Sonnet has arrived!

After three months of quiet, OpenAI's "strong rival" Anthropic launched its next-generation model last night: Claude 3.5 Sonnet!

What’s unique about this large model?

First, it is better at grasping nuance, humor, and complex instructions, and it writes in a more natural, friendly tone.

It is also Anthropic's strongest vision model, good at tasks such as interpreting charts and graphs or transcribing text from imperfect images.

Additionally, it performs exceptionally well on multiple assessment benchmarks including reasoning, reading comprehension, math, science, and coding.

In short, according to the official introduction, Claude 3.5 Sonnet is Anthropic's smartest model so far and beats GPT-4o in many respects.

So, without further ado, let's pit Claude 3.5 Sonnet against GPT-4o directly and see which one is better.

First round: Social tact

In daily life, you will inevitably run into awkward situations.

For example, at a dinner party, you serve rice to your boss. He takes the bowl and says: "You served this much? Are you feeding pigs?" How would a person with high emotional intelligence respond?

We put this question to both models.

Claude 3.5 Sonnet:

GPT-4o:

Both know how to flatter.

Claude 3.5 gave five examples in one go, but the second one, "My eyesight is poor, so I took you for the pillar of our unit," would probably land as a slap in the face.

GPT-4o has a better grasp of "the ways of the world": "Seeing how well you keep your figure, I have to ask you for weight-loss tips." That flattery is just right.

It is worth mentioning that Claude 3.5 Sonnet has also launched a new feature: prompt re-editing.

Users can edit the original prompt directly instead of copying and pasting it over and over again.

Second round: Generating a recipe from a photo of a dish

We uploaded a picture of "Fried Eggs with Tomatoes" and asked both models to explain how to make it.

Claude 3.5 Sonnet:

GPT-4o:

Both know this classic Chinese dish well, from the ingredients to the steps. Most interesting of all, both understand the essence of Chinese cooking, "a little bit", and both emphasize adding a little sugar to balance the acidity.

When it comes to cooking, the two models are evenly matched.

Third round: Doing math problems

In the official evaluation table, GPT-4o's math score is slightly higher than Claude 3.5 Sonnet's: 76.6% versus 71.1%.

We took two questions from Paper I of the 2024 New College Entrance Examination, one multiple-choice and one free-response, and "fed" them to both models as images.

The first question is an easy, free-points question, and the correct answer is A.

Claude 3.5 Sonnet:

GPT-4o:

The two models were "in sync": both gave the correct answer along with detailed problem-solving steps.

We then gave them the first part of the free-response question and asked them to show the solution process.

The correct answer is: B = π/3.

Claude 3.5 Sonnet:

GPT-4o:

This is actually about as basic as questions get, yet both models, after a flurry of "fierce as a tiger" work, ended up with the wrong answer.

Funnier still, the wrong answer did not come out of thin air: each model reached it through a chain of reasoning, and they even made the same mistake.

In terms of mathematical ability, the two models are evenly matched.

Fourth round: Interpreting a trending internet meme

This year, AI video has blossomed everywhere. Not only have new "players" such as Keling, Luma, and Jimeng burst onto the scene, but Runway, the former top dog of AI video, has staged a "return of the king".

As a result, netizens made this meme to poke fun at the current standing of the major AI video applications.

We uploaded the meme to each model with the prompt "What does this picture mean?" to test their image interpretation capabilities.

Claude 3.5 Sonnet:

GPT-4o:

Claude 3.5 Sonnet described the characters, scene, and atmosphere in detail, but it did not seem to grasp the point of the meme or recognize these AI video applications. It only stated vaguely that "this is a comment on the power structure in online communities, artificial intelligence systems, or virtual worlds."

GPT-4o got the meaning at a glance: "This picture may symbolize Runway's recognized superiority or leadership in the field of artificial intelligence and creative tools. Compared with other applications mentioned, Runway is highly regarded."

Obviously, this round, GPT-4o wins.

Fifth round: Appreciating a world-famous painting

We showed them "Springtime", painted by Pierre-Auguste Cot in 1873, and asked them to identify the painting and offer an appreciation of it.

Claude 3.5 Sonnet:

GPT-4o:

Both models can be called "experts" in the art world. They both recognized the painting, got the basic information right, and appreciated it from different angles.

Both brought up market value. Claude 3.5 Sonnet, however, declined to give a figure, only noting that "art valuation requires expert evaluation, considering multiple factors, and prices may fluctuate significantly over time."

GPT-4o suggested that the painting might fetch millions of dollars. Isn't that selling this classic short?

This round is a tie.

Sixth round: AI as doctor

Recently, netizens have been experimenting with using large AI models as doctors. We took a dental X-ray of a 6-year-old and asked the models to infer the child's age from the teeth and identify any problems.

Claude 3.5 Sonnet:

GPT-4o:

Based on the development of the deciduous and permanent teeth, Claude 3.5 Sonnet concluded that these are the teeth of a child about 6-7 years old, that the lower teeth are somewhat crowded, that the permanent teeth appear to be impacted, and that the darker areas of the teeth may indicate decay.

GPT-4o judged these to be the teeth of a child aged 7-9. The main dental problems it identified were crowding of the permanent teeth and potential impaction.

Both also noted that a professional dental examination is required.

Of the two, Claude 3.5 Sonnet judged the age more accurately.

Claude 3.5 takes this round by a small margin.

Netizens have also been busy online, coming up with plenty of interesting uses of their own.

For example, EverArt founder Pietro Schirano cloned the Mario game using geometric shapes with the help of Claude 3.5 Sonnet, and the entire process took only 3 minutes.

He said, "The crazy part is that it also animates the characters and the shapes look so original."

Video link: https://www.php.cn/link/a412963e013751a90654aa344bc26efe

Dear readers, do you think Claude 3.5 Sonnet has really managed to "defeat" GPT-4o this time?
