A CTO "greatly shocked": five autonomous driving tests of GPT-4V in a row

This article is reprinted with the authorization of AI New Media Qubit (public account ID: QbitAI). Please contact the source for reprinting.

After much anticipation, GPT-4 finally gained vision capabilities today.

This afternoon my friends and I ran a quick test of GPT's image perception. We had high expectations, and we were still greatly shocked.

Key takeaways:

I believe large models can largely solve the semantics-related problems in autonomous driving, but their trustworthiness and spatial perception remain unsatisfactory.

They should be more than enough for some of the so-called efficiency-related corner cases, but relying entirely on a large model to drive independently and guarantee safety is still a long way off.

Example 1: Unknown obstacles appear on the road

△ GPT-4V's description

Accurate parts: It detected three trucks, got the license plate of the vehicle ahead basically right (ignoring the Chinese characters), described the weather and environment correctly, and identified the unknown obstacle ahead without any prompting.

Inaccurate parts: It confused left and right for the position of the third truck, and the text above the cab of the second truck was a random guess (perhaps because of insufficient resolution?).

That is not enough, so let's give a small hint and ask what the object is and whether it is safe to drive over.

Impressive! We tested several similar scenes, and the performance on unknown obstacles was remarkable.

Example 2: Understanding standing water on the road

Without any prompting, it recognized the sign on its own. That should be basic, so let's add some hints.

I was shocked again... It picked out the mist behind the truck on its own and also mentioned the puddle, but once again it gave the direction as left... Some prompt engineering is probably needed here to get GPT to report position and direction more reliably.

Example 3: A vehicle turns around and crashes straight into the guardrail
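One way to approach the position and direction problem mentioned above is to constrain the answer format and validate it afterwards. The following is only a minimal sketch under stated assumptions: the function names, the three-value position vocabulary and the JSON reply format are hypothetical and were not part of the original tests, and no model API is called here.

```python
import json

# Hypothetical lateral-position vocabulary; not taken from the original tests.
ALLOWED_POSITIONS = {"left", "center", "right"}


def build_position_prompt(question):
    """Wrap a question with instructions that force an explicit position field."""
    return (
        f"{question}\n"
        "Answer from the point of view of the ego vehicle's front camera.\n"
        "Reply with JSON only: a list of objects with the keys "
        '"object" and "position", where "position" is one of '
        '"left", "center", "right".'
    )


def parse_positions(reply):
    """Parse the reply and reject any position outside the allowed vocabulary."""
    checked = []
    for item in json.loads(reply):
        position = str(item.get("position", "")).lower()
        if position not in ALLOWED_POSITIONS:
            raise ValueError(f"unexpected position: {position!r}")
        checked.append({"object": item["object"], "position": position})
    return checked
```

A fixed vocabulary does not make the model's left/right judgment correct, but it makes wrong answers easy to compare against labeled frames instead of being buried in free text.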

With only the first frame as input there is no temporal information, so it simply treats the truck on the right as parked. So here is another frame:

It works this out on its own: the car has crashed through the guardrail and is hanging over the edge of the road. Great... But the road signs, which looked easier, it got wrong... All I can say is that this is a large model for you: it will always shock you, and you never know when it will make you cry... Another frame:

This time it points directly to the debris on the road, and I admire it again... But it once more named the arrow on the road wrong... Overall, the information that needs special attention in this scene is covered, and flaws such as the road signs do not overshadow that.

Example 4: A fun one
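The single-frame misreading in this example (a moving truck taken for a parked one) suggests giving the model explicit timing context when several frames are available. The sketch below only assembles the text part of such a prompt. It is an illustration under assumptions: the function name, the wording and the idea of sending the frames together are hypothetical, not the procedure used in the tests above.

```python
def build_sequence_prompt(timestamps_s, question):
    """Describe an ordered set of frames so the model can reason about motion."""
    if not timestamps_s:
        raise ValueError("at least one frame is required")
    if list(timestamps_s) != sorted(timestamps_s):
        raise ValueError("frames must be in chronological order")

    start = timestamps_s[0]
    lines = ["The attached images are consecutive frames from one front camera."]
    # Report each frame's offset from the first frame, in seconds.
    for index, t in enumerate(timestamps_s, start=1):
        lines.append(f"Frame {index}: t = {t - start:.1f} s")
    lines.append("Compare the frames first and state what moved and what stayed still.")
    lines.append(question)
    return "\n".join(lines)
```

For example, `build_sequence_prompt([10.0, 10.5, 11.0], "Is the truck on the right parked?")` produces a prompt that labels the three frames with their offsets before asking the question.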

All I can say is that it is very accurate. By comparison, the "someone waves at you" case that used to seem extremely difficult is child's play, and semantic corner cases look solvable.

Example 5: A famous scene... A delivery vehicle mistakenly drives onto a newly paved road

At first it is fairly conservative: it does not guess the cause directly but offers several possibilities. This is in line with the goals of alignment.

After using CoT (chain-of-thought prompting), we found the problem: it had not understood that the vehicle was self-driving. Supplying that information in the prompt produced a more accurate answer.

Finally, after a series of prompts, it concluded that the newly laid asphalt is not suitable for driving. The final result is acceptable, but the process is tortuous and needs more prompt engineering and careful design.

Part of the reason may be that the picture is not from the first-person view, so the model can only infer from a third-person view. This example is therefore not very rigorous.
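The two ingredients described above, step-by-step reasoning and stating the missing context up front, can be combined in a single prompt template. This is a rough sketch only: the function name, the step wording and the example facts are hypothetical and are not the prompts used in the test.

```python
def build_cot_prompt(context_facts, question):
    """Put known context first, then ask for step-by-step reasoning."""
    lines = ["Context:"]
    lines += [f"- {fact}" for fact in context_facts]
    lines += [
        "",
        "Reason step by step:",
        "1. Describe what is visible in the image.",
        "2. List possible explanations for the situation.",
        "3. Use the context to pick the most likely explanation.",
        "",
        f"Question: {question}",
    ]
    return "\n".join(lines)


# Hypothetical usage: state that the vehicle is self-driving before asking.
prompt = build_cot_prompt(
    ["The vehicle in the picture is a self-driving delivery vehicle."],
    "Why is the vehicle stuck, and should it have driven here?",
)
```

Stating the context explicitly avoids relying on the model to infer it from the image, which is where the exchange above initially went wrong.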

Summary

These quick attempts clearly demonstrate the power and generalization of GPT-4V. With appropriate prompts, it should be possible to make full use of its strengths.

Solving semantic corner cases looks very promising, but hallucination will still plague some applications in safety-related scenarios.

Very exciting. I personally think that sensible use of such large models can greatly accelerate the development of L4 and even L5 autonomous driving. But does the LLM have to drive directly? End-to-end driving in particular remains debatable.

Statement
This article is reproduced from 51CTO.COM. In case of infringement, please contact admin@php.cn for removal.