Home  >  Article  >  Claude3 is released, will it completely surpass GPT-4?

Claude3 is released, will it completely surpass GPT-4?

WBOY
WBOYforward
2024-03-05 23:01:15406browse

Just now, Anthropic announced the launch of the Claude 3 model series, which sets a new industry benchmark across a wide range of cognitive tasks. The range includes three state-of-the-art models, arranged in increasing order of capability: Claude 3 Haiku, Claude 3 Sonnet and Claude 3 Opus. Each subsequent model offers increasingly powerful performance, allowing users to choose the best balance of intelligence, speed and cost for their specific applications.

Opus and Sonnet are now available in claude.ai and the Claude API, with the latter now fully available in 159 countries. Haiku will be available soon.

Claude 3 Model Series

Claude3 发布,或将全面超越 GPT-4?

#The new standard of intelligence

Opus, Anthropic’s most intelligent model in most common AI systems Excellent performance on assessment benchmarks, including undergraduate level expert knowledge (MMLU), postgraduate level expert reasoning (GPQA), basic mathematics (GSM8K), etc. It demonstrates near-human-level understanding and fluency on complex tasks, leading the frontier of general intelligence.

Claude 3 models demonstrate strong capabilities in analysis and prediction, detail in content creation, code generation, and conversational delivery in non-English languages ​​such as Spanish, Japanese, and French.

Here's how the Claude 3 model compares to its Anthropic counterparts on multiple capability benchmarks[1]:

Claude3 发布,或将全面超越 GPT-4?

Near-instant results

## The #Claude 3 model can support live customer chat, autocomplete, and data extraction tasks where responses must be immediate and real-time.

In the field of intelligence, Haiku is an extremely cost-effective model with the fastest speed on the market. It was able to decipher an information-dense arXiv research paper (~10,000 tokens) containing charts and graphs in less than three seconds. Anthropic will further optimize its performance in the near future, and Haiku's performance will also be improved.

For the vast majority of workloads, Sonnet is more than 2x faster than Claude 2 and Claude 2.1, and has a higher level of intelligence. It excels at tasks that require fast responses, such as knowledge retrieval or sales automation. The Opus is similar in speed to the Claude 2 and 2.1, but with a higher level of intelligence.

Powerful Visual Capabilities

Claude 3 models have sophisticated visual capabilities on par with other leading models. They can handle a variety of visual formats, including photos, charts, graphs, and technical diagrams. Anthropic is particularly excited to offer this new modality to enterprise customers, some of whom have as much as 50% of their knowledge bases encoded in various formats such as PDFs, flowcharts, or presentation slides.

Claude3 发布,或将全面超越 GPT-4?

Rejection reduction

The previous Claude model often made unnecessary rejections, indicating a lack of contextual understanding. Anthropic has made substantial progress in this regard: Opus, Sonnet and Haiku are significantly less likely to refuse to answer prompts that approach the system's alert line, much less so than previous models. As shown in the figure below, the Claude 3 model has a more nuanced understanding of requests, identifies real harm, and refuses to answer harmless prompts significantly less often.

Claude3 发布,或将全面超越 GPT-4?

Improved Accuracy

Businesses of all sizes rely on Anthropic’s models to serve their customers, which makes Anthropic’s model output at scale Maintaining high accuracy is crucial. To assess this, Anthropic used a large set of complex, factual questions that target known weaknesses in current models. Anthropic classifies responses as correct answers, incorrect answers (or hallucinations), and admissions of uncertainty, where the model expresses not knowing the answer rather than providing false information. Compared to Claude 2.1, Opus achieved a twofold improvement in accuracy (or correct answers) on these challenging open-ended questions while also reducing the level of incorrect answers.

In addition to producing more trustworthy responses, Anthropic will soon enable citations in Anthropic's Claude 3 models so that they can point to precise sentences in references to verify their answers.

Claude3 发布,或将全面超越 GPT-4?

Long context and nearly perfect recall

Claude 3 Series models will offer a 200,000-mark context window at launch. However, all three models are capable of accepting inputs of over 1 million tokens, which Anthropic may offer to specific customers who require increased processing power.

In order to effectively handle long contextual cues, the model needs strong recall capabilities. "Needle In A Haystack" (NIAH) evaluates the ability of a measurement model to accurately recall information from a large data corpus. Anthropic enhances the robustness of this benchmark by using one of 30 random pin/question pairs for each prompt and testing on a diverse crowdsourced corpus of documents.

Claude 3 Opus not only achieves near-perfect recall, exceeding 99% accuracy, but in some cases it even identifies the evaluations themselves by identifying "needle" sentences that appear to have been artificially inserted into the original text limitations.

Claude3 发布,或将全面超越 GPT-4?

Responsible Design

Anthropic developed the Claude 3 series of models to deliver dependability alongside capability. Anthropic has several dedicated teams tracking and mitigating a variety of risks, from misinformation and CSAM to bioabuse, election interference, and autonomous replication skills. Anthropic continues to develop methods, such as Constitutional AI, to improve the security and transparency of Anthropic's models, and to adjust Anthropic's models to mitigate privacy concerns that may arise from new modalities.

Addressing bias in increasingly complex models is an ongoing effort, and Anthropic is making progress with this new release. As shown in the model card, Claude 3 shows less bias than Anthropic's previous model according to the Bias Question Answering Benchmark (BBQ). Anthropic remains committed to advancing technology that reduces bias and promotes greater neutrality in models, ensuring they are not biased toward any particular partisan position.

While the Claude 3 model series offers improvements in biological knowledge, network-related knowledge, and autonomy compared to previous models, it remains at AI Safety Level 2 (according to Anthropic’s Responsible Scaling Policy) ASL-2). Anthropic’s red team assessment (conducted in line with Anthropic’s White House commitments and the 2023 U.S. Executive Order) concluded that current models have negligible potential for catastrophic risk. Anthropic will continue to closely monitor future models to assess how close they are to the ASL-3 threshold. Additional security details are provided on the Claude 3 model card.

Easier to use

Claude 3 model performs better at following complex multi-step instructions. They are particularly good at following brand voice and response guidelines and developing customer-facing experiences that users can trust. Additionally, the Claude 3 model performs better at generating popular structured outputs, such as JSON formats—making it easier to coach Claude for use cases such as natural language classification and sentiment analysis.

Model Details

Claude 3 Opus is Anthropic’s smartest model, showing the best performance on the market on highly complex tasks. It flows brilliantly in open-ended prompts and unseen situations, with human-like understanding. Opus shows Anthropic the limits of what is possible with generative AI.

Claude3 发布,或将全面超越 GPT-4?

Claude 3 Sonnet strikes the ideal balance between intelligence and speed—especially for enterprise workloads. It delivers powerful performance at a lower cost than its peers and is designed for high durability for large-scale AI deployments.

Claude3 发布,或将全面超越 GPT-4?

Claude 3 Haiku is Anthropic’s fastest and most compact model, allowing for near-instant response. It answers simple queries and requests with unparalleled speed. Users will be able to build seamless AI experiences that simulate human interactions.

Claude3 发布,或将全面超越 GPT-4?

Model Availability

Opus and Sonnet are available today in Anthropic’s API, which is now generally available and developers can sign up and get started today Use these models. Haiku will be available soon. Sonnet is powering the free experience on claude.ai, while Opus is available for Claude Pro subscribers.

Sonnet is also available through Amazon’s Bedrock and Google Cloud’s Vertex AI Model Garden, with Opus and Haiku coming soon.

Smarter, Faster, Safer

Anthropic believes model intelligence is far from reaching its limits and plans to frequently update the Claude 3 model series over the next few months. Anthropic is also pleased to release a series of features to enhance the capabilities of Anthropic models, especially for enterprise use cases and large-scale deployments. These new features will include tool usage (also known as function calls), interactive coding (also known as REPL), and more advanced agent capabilities.

The above is the detailed content of Claude3 is released, will it completely surpass GPT-4?. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:chaincatcher.com. If there is any infringement, please contact admin@php.cn delete