search
HomeTechnology peripheralsAIOpenAI President: GPT-4 is not perfect but it is definitely different

OpenAI President: GPT-4 is not perfect but it is definitely different

# News on March 16th, artificial intelligence research company OpenAI released the highly anticipated text generation AI model GPT-4 yesterday. Greg Brockman, co-founder and president of OpenAI, said in an interview that GPT-4 is not perfect, but it is definitely different.

GPT-4 improves on its predecessor GPT-3 in a number of key ways, such as providing more truthful representations and allowing developers to more easily control its style and behavior. GPT-4 is also multimodal in the sense that it can understand images, add annotations to photos, and even describe in detail what is in the photo.

But GPT-4 also has serious flaws. Just like GPT-3, the model suffers from "illusions" (i.e., the text aggregated by the model is irrelevant or inaccurate enough to the source text) and makes basic inference errors. OpenAI gave an example on its blog. GPT-4 described "Elvis Presley" as "the son of an actor", but in fact neither of his parents were actors.

When asked to compare GPT-4 with GPT-3, Brockman only gave a four-word answer: different. He explained: "GPT-4 is definitely different, even though it still has a lot of problems and bugs. But you can see a jump in skills in subjects like calculus or law. It has performed very poorly in some areas , but now it has reached a level beyond ordinary people."

The test results support Brockman's view. In the college entrance calculus test, GPT-4 is scored 4 points (out of 5 points), GPT-3 is scored 1 point, and GPT-3.5, which is between GPT-3 and GPT-4, is also scored 4 points. In the mock bar exam, GPT-4 scores entered the top 10%, while GPT-3.5 scores hovered around the bottom 10%.

At the same time, GPT-4 is more concerned about the multi-mode mentioned above. Unlike GPT-3 and GPT-3.5, which can only accept text prompts, such as asking to "write an article about giraffes," GPT-4 can accept both image and text prompts to perform certain operations, such as identifying people in An image of a giraffe captured in the Serengeti, with a basic description of the content.

This is because GPT-4 is trained on image and text data, while its predecessor was trained only on text. OpenAI said the training data comes from "a variety of legally authorized, publicly available data sources, which may include publicly available personal information," but when asked to provide details, Brockman declined. Training data has landed OpenAI in legal trouble before.

The image understanding capabilities of GPT-4 left a deep impression on people. For example, typing the prompt "What's so funny about this image?" GPT-4 will break down the entire image and correctly explain the punch line of the joke.

Currently, only one partner can use GPT - 4, an assistive app for the visually impaired called Be My Eyes. Brockman said that a wider rollout is in the works as OpenAI evaluates the risks and pros and cons of any time. It will be "slowly and deliberately".

He also said: "There are policy issues that also need to be addressed, such as facial recognition and how to process images of people. We need to find out where the danger zones are, where the red lines are, and then find solutions over time. ”

OpenAI faced a similar ethical dilemma with its text-to-image conversion system Dall-E 2. After initially disabling the feature, OpenAI allowed customers to upload faces to be used with the AI-powered image generation system. It edits. At the time, OpenAI claimed that upgrades to its security system made the face-editing feature possible because it minimized the potential harm of deepfakes and attempts to create pornographic, political and violent content.

Another The long-term issue is preventing GPT-4 from being used inadvertently in ways that could cause harm. Hours after the model was released, Israeli cybersecurity startup Adversa AI published a blog post demonstrating bypassing OpenAI's content filters And let GPT-4 generate phishing emails, offensive descriptions of gays, and other objectionable text.

This is not a new problem in the world of language models. BlenderBot, a chatbot from Facebook parent company Meta and OpenAI’s ChatGPT have also been tempted to output inappropriate content and even reveal sensitive details of their inner workings. But many, including journalists, had hoped that GPT-4 might bring significant improvements in this regard.

When asked about the robustness of GPT-4, Brockman emphasized that the model has undergone six months of security training. In internal testing, it did not respond to requests for content not allowed by the OpenAI usage policy. "We spent a lot of time trying to understand GPT," Brockman said. -4 ability. We're continually updating it to include a range of improvements so that the model is more scalable to suit the personality or mode people want it to have. ”

Frankly, the early real-world test results are not that satisfactory. In addition to the Adversa AI test, Microsoft's chatbot Bing Chat also proved to be very easy to jailbreak. Using carefully crafted inputs, users can tell the chatbot to express affection, threaten harm, justify mass murder, and invent conspiracy theories.

Brockman did not deny that GPT-4 fell short in this area, but he highlighted the model's new limiting tools, including an API-level feature called "system" messages. System messages are essentially instructions that set the tone and establish boundaries for interactions with GPT-4. For example, a system message might read: "You are a tutor who always answers questions in a Socratic style. You never give your students answers, but always try to ask the right questions to help them learn Think for yourself."

The idea is that system messages act as guardrails to prevent GPT-4 from going off track. "Really figuring out the tone, style and substance of GPT-4 has been a big focus of ours," Brockman said. "I think we're starting to understand more about how to do engineering, how to have a repeatable process that allows you to Get predictable results that are actually useful to people."

Brockman also mentioned Evals, OpenAI's latest open source software framework for evaluating the performance of its AI models, which OpenAI is committed to" Enhance” the hallmark of its model. Evals allows users to develop and run benchmarks that evaluate models (such as GPT-4) while checking their performance, a crowdsourced approach to model testing.

Brockman said: "With Evals, we can better see the use cases that users care about and can test them. Part of the reason why we open sourced this framework is that we no longer Release a new model every three months to keep improving. You wouldn't make something you can't measure, right? But as we roll out new versions of the model, we can at least know what changes have occurred."

布Rockman was also asked whether OpenAI would compensate people for testing its models with Evals. He was reluctant to commit to this, but he did note that for a limited time, OpenAI is allowing early access to the GPT-4 API to Eevals users who request it.

Brockman also talked about GPT-4’s context window, which refers to the text that the model can consider before generating additional text. OpenAI is testing a version of GPT-4 that can "remember" about 50 pages of content, five times the "memory" of regular GPT-4 and eight times the "memory" of GPT-3.

Brockman believes that the expanded contextual window will lead to new, previously unexplored use cases, especially in the enterprise. He envisioned an AI chatbot built for companies that could use background and knowledge from different sources, including employees across departments, to answer questions in a very knowledgeable but conversational way.

This is not a new concept. But Brockman believes GPT-4’s answers will be far more useful than those currently provided by other chatbots and search engines. "Before, the model had no idea who you were, what you were interested in, etc. And having a larger context window definitely makes it stronger, greatly enhancing the support it can provide people," he said. Xiaoxiao)

The above is the detailed content of OpenAI President: GPT-4 is not perfect but it is definitely different. For more information, please follow other related articles on the PHP Chinese website!

Statement
This article is reproduced at:51CTO.COM. If there is any infringement, please contact admin@php.cn delete
The AI Skills Gap Is Slowing Down Supply ChainsThe AI Skills Gap Is Slowing Down Supply ChainsApr 26, 2025 am 11:13 AM

The term "AI-ready workforce" is frequently used, but what does it truly mean in the supply chain industry? According to Abe Eshkenazi, CEO of the Association for Supply Chain Management (ASCM), it signifies professionals capable of critic

How One Company Is Quietly Working To Transform AI ForeverHow One Company Is Quietly Working To Transform AI ForeverApr 26, 2025 am 11:12 AM

The decentralized AI revolution is quietly gaining momentum. This Friday in Austin, Texas, the Bittensor Endgame Summit marks a pivotal moment, transitioning decentralized AI (DeAI) from theory to practical application. Unlike the glitzy commercial

Nvidia Releases NeMo Microservices To Streamline AI Agent DevelopmentNvidia Releases NeMo Microservices To Streamline AI Agent DevelopmentApr 26, 2025 am 11:11 AM

Enterprise AI faces data integration challenges The application of enterprise AI faces a major challenge: building systems that can maintain accuracy and practicality by continuously learning business data. NeMo microservices solve this problem by creating what Nvidia describes as "data flywheel", allowing AI systems to remain relevant through continuous exposure to enterprise information and user interaction. This newly launched toolkit contains five key microservices: NeMo Customizer handles fine-tuning of large language models with higher training throughput. NeMo Evaluator provides simplified evaluation of AI models for custom benchmarks. NeMo Guardrails implements security controls to maintain compliance and appropriateness

AI Paints A New Picture For The Future Of Art And DesignAI Paints A New Picture For The Future Of Art And DesignApr 26, 2025 am 11:10 AM

AI: The Future of Art and Design Artificial intelligence (AI) is changing the field of art and design in unprecedented ways, and its impact is no longer limited to amateurs, but more profoundly affecting professionals. Artwork and design schemes generated by AI are rapidly replacing traditional material images and designers in many transactional design activities such as advertising, social media image generation and web design. However, professional artists and designers also find the practical value of AI. They use AI as an auxiliary tool to explore new aesthetic possibilities, blend different styles, and create novel visual effects. AI helps artists and designers automate repetitive tasks, propose different design elements and provide creative input. AI supports style transfer, which is to apply a style of image

How Zoom Is Revolutionizing Work With Agentic AI: From Meetings To MilestonesHow Zoom Is Revolutionizing Work With Agentic AI: From Meetings To MilestonesApr 26, 2025 am 11:09 AM

Zoom, initially known for its video conferencing platform, is leading a workplace revolution with its innovative use of agentic AI. A recent conversation with Zoom's CTO, XD Huang, revealed the company's ambitious vision. Defining Agentic AI Huang d

The Existential Threat To UniversitiesThe Existential Threat To UniversitiesApr 26, 2025 am 11:08 AM

Will AI revolutionize education? This question is prompting serious reflection among educators and stakeholders. The integration of AI into education presents both opportunities and challenges. As Matthew Lynch of The Tech Edvocate notes, universit

The Prototype: American Scientists Are Looking For Jobs AbroadThe Prototype: American Scientists Are Looking For Jobs AbroadApr 26, 2025 am 11:07 AM

The development of scientific research and technology in the United States may face challenges, perhaps due to budget cuts. According to Nature, the number of American scientists applying for overseas jobs increased by 32% from January to March 2025 compared with the same period in 2024. A previous poll showed that 75% of the researchers surveyed were considering searching for jobs in Europe and Canada. Hundreds of NIH and NSF grants have been terminated in the past few months, with NIH’s new grants down by about $2.3 billion this year, a drop of nearly one-third. The leaked budget proposal shows that the Trump administration is considering sharply cutting budgets for scientific institutions, with a possible reduction of up to 50%. The turmoil in the field of basic research has also affected one of the major advantages of the United States: attracting overseas talents. 35

All About Open AI's Latest GPT 4.1 Family - Analytics VidhyaAll About Open AI's Latest GPT 4.1 Family - Analytics VidhyaApr 26, 2025 am 10:19 AM

OpenAI unveils the powerful GPT-4.1 series: a family of three advanced language models designed for real-world applications. This significant leap forward offers faster response times, enhanced comprehension, and drastically reduced costs compared t

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

PhpStorm Mac version

PhpStorm Mac version

The latest (2018.2.1) professional PHP integrated development tool

mPDF

mPDF

mPDF is a PHP library that can generate PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files "on the fly" from his website and handle different languages. It is slower than original scripts like HTML2FPDF and produces larger files when using Unicode fonts, but supports CSS styles etc. and has a lot of enhancements. Supports almost all languages, including RTL (Arabic and Hebrew) and CJK (Chinese, Japanese and Korean). Supports nested block-level elements (such as P, DIV),

MinGW - Minimalist GNU for Windows

MinGW - Minimalist GNU for Windows

This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.

MantisBT

MantisBT

Mantis is an easy-to-deploy web-based defect tracking tool designed to aid in product defect tracking. It requires PHP, MySQL and a web server. Check out our demo and hosting services.

EditPlus Chinese cracked version

EditPlus Chinese cracked version

Small size, syntax highlighting, does not support code prompt function