Claude 3 is released: will it completely surpass GPT-4?

Just now, Anthropic announced the launch of the Claude 3 model series, which sets a new industry benchmark across a wide range of cognitive tasks. The range includes three state-of-the-art models, arranged in increasing order of capability: Claude 3 Haiku, Claude 3 Sonnet and Claude 3 Opus. Each subsequent model offers increasingly powerful performance, allowing users to choose the best balance of intelligence, speed and cost for their specific applications.

Opus and Sonnet are now available in claude.ai and through the Claude API, which is now generally available in 159 countries. Haiku will be available soon.

Claude 3 Model Series


The new standard of intelligence

Opus, Anthropic's most intelligent model, performs excellently on most of the common evaluation benchmarks for AI systems, including undergraduate-level expert knowledge (MMLU), graduate-level expert reasoning (GPQA), and basic mathematics (GSM8K). It demonstrates near-human levels of comprehension and fluency on complex tasks, leading the frontier of general intelligence.

Claude 3 models demonstrate strong capabilities in analysis and forecasting, nuanced content creation, code generation, and conversation in non-English languages such as Spanish, Japanese, and French.

Here is how the Claude 3 models compare with peer models on multiple capability benchmarks [1]:

[Figure: benchmark comparison of Claude 3 models and peer models]

Near-instant results

The Claude 3 models can power live customer chat, autocomplete, and data extraction tasks where responses must be immediate and in real time.

For its intelligence category, Haiku is the fastest and most cost-effective model on the market. It can read an information-dense arXiv research paper (~10,000 tokens) containing charts and graphs in less than three seconds. Anthropic expects to improve its performance further after launch.

For the vast majority of workloads, Sonnet is more than 2x faster than Claude 2 and Claude 2.1, and has a higher level of intelligence. It excels at tasks that require fast responses, such as knowledge retrieval or sales automation. Opus is similar in speed to Claude 2 and 2.1, but with a much higher level of intelligence.

Powerful Visual Capabilities

Claude 3 models have sophisticated visual capabilities on par with other leading models. They can handle a variety of visual formats, including photos, charts, graphs, and technical diagrams. Anthropic is particularly excited to offer this new modality to enterprise customers, some of whom have as much as 50% of their knowledge bases encoded in various formats such as PDFs, flowcharts, or presentation slides.


Fewer refusals

Previous Claude models often made unnecessary refusals, which suggested a lack of contextual understanding. Anthropic has made substantial progress in this area: Opus, Sonnet, and Haiku are significantly less likely than previous models to refuse prompts that border on the system's guardrails. As shown in the figure below, the Claude 3 models have a more nuanced understanding of requests, recognize real harm, and refuse to answer harmless prompts much less often.

[Figure: refusals of harmless prompts, Claude 3 models versus previous Claude models]

Improved Accuracy

Businesses of all sizes rely on Anthropic's models to serve their customers, so it is crucial that model outputs maintain high accuracy at scale. To assess this, Anthropic used a large set of complex, factual questions that target known weaknesses in current models. Anthropic classifies responses as correct answers, incorrect answers (or hallucinations), and admissions of uncertainty, where the model says it does not know the answer rather than providing false information. Compared to Claude 2.1, Opus achieved a twofold improvement in accuracy (correct answers) on these challenging open-ended questions while also reducing the rate of incorrect answers.
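The three-way classification described above can be tallied with a few lines of Python. This is only a minimal sketch: the label names, the `summarize` helper, and the sample grades are all hypothetical and made up for illustration. They are not Anthropic's evaluation code or results.

```python
from collections import Counter

# Hypothetical label names for the three response categories described above.
LABELS = ("correct", "incorrect", "unsure")


def summarize(graded):
    """Return the share of each label in a list of graded responses."""
    unknown = set(graded) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    counts = Counter(graded)
    total = len(graded)
    return {label: (counts[label] / total if total else 0.0) for label in LABELS}


# Made-up grades for illustration only, not real evaluation data.
baseline = ["correct", "incorrect", "unsure", "incorrect"]
candidate = ["correct", "correct", "unsure", "incorrect"]

base_rates = summarize(baseline)
cand_rates = summarize(candidate)
print(base_rates)
print(cand_rates)
```

Reporting "unsure" separately matters because a model can lower its incorrect rate either by answering correctly more often or by declining to guess, and the two cases look different in this breakdown.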

In addition to producing more trustworthy responses, Anthropic will soon enable citations in its Claude 3 models so that they can point to precise sentences in reference material to verify their answers.


Long context and nearly perfect recall

The Claude 3 series models will offer a 200,000-token context window at launch. However, all three models are capable of accepting inputs of over 1 million tokens, which Anthropic may offer to specific customers who require increased processing power.

To handle long-context prompts effectively, a model needs strong recall capabilities. The "Needle In A Haystack" (NIAH) evaluation measures a model's ability to accurately recall information from a large corpus of data. Anthropic enhanced the robustness of this benchmark by using one of 30 random needle/question pairs for each prompt and testing on a diverse crowdsourced corpus of documents.

Claude 3 Opus not only achieves near-perfect recall, exceeding 99% accuracy, but in some cases it even identified the limitations of the evaluation itself by recognizing that the "needle" sentence appeared to have been artificially inserted into the original text.
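To make the NIAH setup more concrete, here is a rough sketch of how a single test prompt might be assembled and scored. Everything in it is an assumption for illustration: the `build_niah_prompt` and `recalled` helpers, the filler documents, and the two sample needle/question pairs are hypothetical, and the substring check is a deliberately crude stand-in for whatever grading the real benchmark uses.

```python
import random


def build_niah_prompt(documents, pairs, rng):
    """Insert one randomly chosen needle into a haystack of documents."""
    needle, question, expected = rng.choice(pairs)
    haystack = list(documents)
    position = rng.randint(0, len(haystack))  # inclusive on both ends
    haystack.insert(position, needle)
    prompt = "\n\n".join(haystack) + "\n\nQuestion: " + question
    return prompt, expected


def recalled(answer, expected):
    """Crude scoring: does the answer contain the expected string?"""
    return expected.lower() in answer.lower()


# Placeholder corpus and needle/question pairs (the article mentions 30 pairs).
documents = [f"Filler document number {i}." for i in range(5)]
pairs = [
    ("The secret code is 4172.", "What is the secret code?", "4172"),
    ("The hidden fruit is a mango.", "What is the hidden fruit?", "mango"),
]

rng = random.Random(0)  # seeded so the example is repeatable
prompt, expected = build_niah_prompt(documents, pairs, rng)
print(prompt)
```

In a real run, the prompt would be sent to the model and its reply passed to `recalled`; repeating this over many needle positions and corpus sizes gives a recall rate.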


Responsible Design

Anthropic developed the Claude 3 series of models to be as trustworthy as they are capable. Anthropic has several dedicated teams tracking and mitigating a variety of risks, from misinformation and CSAM to biological misuse, election interference, and autonomous replication skills. Anthropic continues to develop methods, such as Constitutional AI, that improve the safety and transparency of its models, and has tuned its models to mitigate privacy concerns that may arise from new modalities.

Addressing bias in increasingly complex models is an ongoing effort, and Anthropic is making progress with this new release. As shown in the model card, Claude 3 shows less bias than Anthropic's previous models according to the Bias Benchmark for Question Answering (BBQ). Anthropic remains committed to advancing techniques that reduce bias and promote greater neutrality in its models, so that they are not skewed toward any particular partisan position.

While the Claude 3 model series offers improvements in biological knowledge, cyber-related knowledge, and autonomy compared to previous models, it remains at AI Safety Level 2 (ASL-2) under Anthropic's Responsible Scaling Policy. Anthropic's red team assessment (conducted in line with Anthropic's White House commitments and the 2023 U.S. Executive Order) concluded that current models have negligible potential for catastrophic risk. Anthropic will continue to closely monitor future models to assess how close they are to the ASL-3 threshold. Additional safety details are provided in the Claude 3 model card.

Easier to use

The Claude 3 models are better at following complex, multi-step instructions. They are particularly good at adhering to brand voice and response guidelines and at developing customer-facing experiences that users can trust. Additionally, the Claude 3 models are better at producing popular structured output in formats such as JSON, which makes it easier to instruct Claude for use cases such as natural language classification and sentiment analysis.
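When a model is asked to answer in JSON for a task like sentiment analysis, the calling code still has to parse and validate the reply. The sketch below shows one way that could look. The `parse_sentiment` helper, the `sentiment` and `reason` field names, and the allowed label set are all hypothetical choices for illustration, and the reply is hard-coded rather than coming from a real model call.

```python
import json

ALLOWED = {"positive", "negative", "neutral"}  # hypothetical label set


def parse_sentiment(reply):
    """Parse a JSON reply and validate it; raise ValueError on bad data."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    label = data.get("sentiment")
    if not isinstance(label, str) or label not in ALLOWED:
        raise ValueError(f"unexpected sentiment: {label!r}")
    return label


# Hard-coded example reply; a real one would come from the model.
reply = '{"sentiment": "positive", "reason": "The customer praises fast delivery."}'
print(parse_sentiment(reply))
```

Validating against a fixed label set keeps downstream code simple: any reply that is not well-formed JSON or uses an unknown label fails loudly instead of silently flowing into the classifier's results.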

Model Details

Claude 3 Opus is Anthropic's most intelligent model, with the best performance on the market on highly complex tasks. It navigates open-ended prompts and unseen scenarios with remarkable fluency and human-like understanding. Opus shows the outer limits of what is possible with generative AI.


Claude 3 Sonnet strikes the ideal balance between intelligence and speed, especially for enterprise workloads. It delivers strong performance at a lower cost than its peers and is engineered for high endurance in large-scale AI deployments.


Claude 3 Haiku is Anthropic’s fastest and most compact model, allowing for near-instant response. It answers simple queries and requests with unparalleled speed. Users will be able to build seamless AI experiences that simulate human interactions.


Model Availability

Opus and Sonnet are available today in Anthropic's API, which is now generally available, so developers can sign up and start using these models immediately. Haiku will be available soon. Sonnet powers the free experience on claude.ai, while Opus is available to Claude Pro subscribers.

Sonnet is also available through Amazon Bedrock and Google Cloud's Vertex AI Model Garden, with Opus and Haiku coming soon.

Smarter, Faster, Safer

Anthropic believes model intelligence is far from reaching its limits and plans to release frequent updates to the Claude 3 model series over the next few months. Anthropic also plans to release a series of features that enhance its models' capabilities, especially for enterprise use cases and large-scale deployments. These new features will include tool use (also known as function calling), interactive coding (also known as REPL), and more advanced agentic capabilities.


Statement
This article is reproduced from ChainCatcher. If there is any infringement, please contact admin@php.cn to have it deleted.