Beat LLaMA? The ranking of the most powerful 'Falcon' in history is in doubt, Fu Yao personally tested 7 lines of code, and LeCun forwarded it to like

王林 (forwarded) · 2023-06-10 19:46

Some time ago, the fledgling Falcon crushed LLaMA in the LLM rankings, causing waves in the entire community.

But, is Falcon really better than LLaMA?

Short answer: Probably not.

Fu Yao's team conducted a more in-depth evaluation of the model:

"We reproduced the evaluation of LLaMA 65B on MMLU and obtained a score of 61.4, close to the official score (63.4), much higher than its score on the Open LLM Leaderboard (48.8), and significantly higher than Falcon's (52.7)."

No fancy prompt engineering, no fancy decoding; everything uses the default settings.

The code and test methodology have been made public on GitHub.
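Conceptually, the evaluation loop is simple: for each MMLU question, score each answer option with the model and take the one with the highest score. Below is a minimal sketch of that loop; `score_fn` is a hypothetical stand-in for the model's log-likelihood call, not the actual code from the repository.

```python
# Minimal sketch of MMLU-style multiple-choice scoring.
# score_fn(prompt, option) stands in for the model's log-likelihood
# of an answer option given the question prompt.

def evaluate_mmlu(questions, score_fn):
    """Return accuracy: the fraction of questions where the
    highest-scoring option matches the gold answer."""
    correct = 0
    for q in questions:
        # Score every option (e.g. "A", "B", "C", "D") and take the argmax.
        scores = {opt: score_fn(q["prompt"], opt) for opt in q["options"]}
        prediction = max(scores, key=scores.get)
        correct += prediction == q["answer"]
    return correct / len(questions)

# Toy example with a fake scorer that always prefers option "B".
questions = [
    {"prompt": "q1", "options": ["A", "B", "C", "D"], "answer": "B"},
    {"prompt": "q2", "options": ["A", "B", "C", "D"], "answer": "C"},
]
fake_score = lambda prompt, opt: 1.0 if opt == "B" else 0.0
print(evaluate_mmlu(questions, fake_score))  # 0.5
```

With default settings, the only moving part is how `score_fn` is computed; differences there are exactly where leaderboard discrepancies tend to creep in.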

With doubts raised about Falcon surpassing LLaMA, LeCun weighed in, pointing to problems with the test script...

LLaMA's true strength

Currently, Falcon ranks first on the Open LLM Leaderboard, surpassing LLaMA, and has been highly recommended by researchers including Thomas Wolf.

However, some people have their doubts.

First, a netizen questioned where these LLaMA numbers came from. They seemed inconsistent with the numbers in the paper...

Subsequently, OpenAI scientist Andrej Karpathy also expressed concern about why LLaMA 65B's score on the Open LLM Leaderboard was significantly lower than the official one (48.8 vs. 63.4).

He posted that, for this very reason, he had so far avoided tweeting about Falcon, as he was not sure what to make of the numbers.

In order to clarify this problem, Fu Yao and team members decided to conduct a public test on LLaMA 65B, and the result was 61.4 points.

In the test, the researchers did not use any special mechanism, and LLaMA 65B was able to achieve this score.

This result suggests that if you want a model approaching GPT-3.5's level, the best route is to apply RLHF on top of LLaMA 65B.

This is based on the findings of the Chain-of-Thought Hub paper recently published by Fu Yao's team.

Of course, Fu Yao said their evaluation was not intended to stir up a dispute between LLaMA and Falcon; after all, both are great open-source models that have made significant contributions to the field.

In addition, Falcon has a more commercially friendly license, which also gives it great development potential.

For this latest review, netizen BlancheMinerva pointed out that a fair comparison should be to run Falcon on MMLU under default settings.

Fu Yao agreed, saying that this work was already underway and results were expected within a day.

Whatever the final result, the mountain the open-source community really wants to climb is GPT-4.

Problems with the Open LLM Leaderboard

A researcher from Meta praised Fu Yao for faithfully reproducing the LLaMA results and pointed out problems with the Open LLM Leaderboard.

At the same time, he shared several specific questions about the leaderboard.

First, the MMLU results: the LLaMA 65B MMLU score on the leaderboard is 15 points lower than the official figure, putting it roughly on par with the 7B model. There is also only a small performance gap between the 13B and 30B models.

OpenLLM really needs to look into this before announcing which model is the best.

Second, the benchmarks: how were these chosen?

ARC 25-shot and HellaSwag 10-shot don't seem particularly relevant to LLMs. It would be better to include some generative benchmarks; despite their limitations, they can still be useful.

Third, the single average score: it is always tempting to reduce results to a single number, and an average is the easiest way to do it.

But in this case, is the average of 4 benchmarks really useful? Is getting 1 point on MMLU the same as getting 1 point on HellaSwag?

In the world of rapid iteration of LLM, there is definitely some value in developing such a ranking list.
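The worry about a single average can be made concrete: with a plain arithmetic mean over four benchmarks, a 1-point gain moves the leaderboard average by the same 0.25 points no matter which benchmark it comes from. A toy illustration (the scores below are made up, not real leaderboard numbers):

```python
# Toy illustration: a plain mean treats every benchmark point equally.
def leaderboard_average(scores):
    """Arithmetic mean over benchmark scores, as a simple leaderboard would use."""
    return sum(scores.values()) / len(scores)

model_a = {"ARC": 60.0, "HellaSwag": 80.0, "MMLU": 48.8, "TruthfulQA": 50.0}
# One model gains a point on MMLU, another gains a point on HellaSwag;
# the average moves identically either way.
model_b = {**model_a, "MMLU": 49.8}
model_c = {**model_a, "HellaSwag": 81.0}

print(leaderboard_average(model_b) == leaderboard_average(model_c))  # True
```

Whether a point of MMLU "should" count the same as a point of HellaSwag is exactly the question the averaging scheme silently answers.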

Lucas Beyer, a researcher at Google, also weighed in:

"It's crazy that NLP researchers have different understandings of the same benchmark, leading to completely different results. Meanwhile, every time a colleague of mine implements a metric, I immediately ask whether they checked for a perfect reproduction of the official code, and if not, I discard their results."

He added that, as far as he knows, no model actually reproduces the original benchmark results exactly.

Netizens echoed that this is the reality of LLM benchmarks...

Falcon: open source, commercially usable, strong performance

Speaking of Falcon, it is worth a closer look.

According to LeCun, in the era of large models, open source is the most important.

After Meta's LLaMA weights were leaked, developers everywhere were eager to try it.

Falcon is a surprise weapon developed by the Technology Innovation Institute (TII) in Abu Dhabi, United Arab Emirates.

In terms of benchmark performance at its initial release, Falcon beat LLaMA.

Currently, Falcon comes in three versions: 1B, 7B, and 40B.

TII stated that Falcon is the most powerful open-source language model to date. Its largest version, Falcon 40B, has 40 billion parameters, somewhat smaller in scale than LLaMA's 65 billion.

However, TII has previously stated that despite its smaller scale, Falcon delivers great performance.

Faisal Al Bannai, Secretary General of the Advanced Technology Research Council (ATRC), believes that the release of Falcon will break down barriers to obtaining LLMs and allow researchers and entrepreneurs to propose the most innovative use cases.

Two versions of FalconLM, Falcon 40B Instruct and Falcon 40B, rank in the top two on the Hugging Face Open LLM Leaderboard, while Meta's LLaMA sits in third place.

This is exactly the leaderboard problem discussed above.

Although the Falcon paper has not yet been publicly released, Falcon 40B was extensively trained on a carefully screened web dataset of 1 trillion tokens.

Researchers have revealed that Falcon's training placed great emphasis on achieving high performance at large data scale.

As is well known, LLMs are very sensitive to the quality of their training data, which is why the researchers put great effort into building a data pipeline that can run efficiently on tens of thousands of CPU cores.

Its purpose is to extract high-quality content from the Internet through filtering and deduplication.
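The broad shape of such a pipeline, quality filtering followed by deduplication, can be sketched as below. This is only a toy illustration: the word-count filter and exact hash-based deduplication are placeholder heuristics, and the real pipeline applies far more elaborate filters and fuzzy deduplication at massively parallel scale.

```python
import hashlib

def filter_and_dedup(documents, min_words=5):
    """Toy pipeline: drop short/low-quality documents, then remove
    exact duplicates by content hash. min_words is a placeholder
    quality heuristic, not the real filter."""
    seen = set()
    kept = []
    for doc in documents:
        text = doc.strip()
        if len(text.split()) < min_words:  # quality filter (placeholder)
            continue
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:                 # exact dedup by content hash
            continue
        seen.add(digest)
        kept.append(text)
    return kept

docs = [
    "a clean paragraph of web text worth keeping",
    "too short",
    "a clean paragraph of web text worth keeping",  # exact duplicate
]
print(filter_and_dedup(docs))  # keeps one copy of the long paragraph
```

Hashing each document makes the dedup step embarrassingly parallel, which is one reason content-hash approaches scale to web-crawl corpora.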

TII has now released this refined web dataset, a carefully filtered and deduplicated corpus, and practice has shown it to be very effective.

Models trained on this dataset alone can match or even surpass other LLMs in performance, demonstrating its excellent quality and Falcon's influence.

In addition, the Falcon model also has multi-language capabilities.

It understands English, German, Spanish, and French, and also has some grasp of smaller European languages such as Dutch, Italian, Romanian, Portuguese, Czech, Polish, and Swedish.

Falcon 40B is the second truly open source model after the release of the H2O.ai model.

In addition, there is another very important point - Falcon is currently the only open source model that can be used commercially for free.

Early on, TII required that commercial use of Falcon generating more than $1 million in attributable revenue would incur a 10% "usage fee".

But it didn’t take long for the wealthy Middle Eastern tycoons to lift this restriction.

At least so far, all commercial use and fine-tuning of Falcon will be free of charge.

The tycoons said they do not need to make money from this model for the time being.

Moreover, TII is also soliciting commercialization plans from around the world.

For promising research and commercialization proposals, they will also provide additional "training compute support" or further commercialization opportunities.

This amounts to saying: as long as the project is good, the model is free and the compute is plentiful, and if you're short on money, they may even raise it for you!

For start-ups, this is simply a "one-stop solution for AI large model entrepreneurship" from the Middle East tycoon.

According to the development team, an important aspect of FalconLM’s competitive advantage is the selection of training data.

The research team developed a process to extract high-quality data from public crawled datasets and remove duplicate data.

After thorough cleaning of redundant and duplicate content, 5 trillion tokens were retained—enough to train powerful language models.

The 40B Falcon LM uses 1 trillion tokens for training, and the 7B version of the model uses 1.5 trillion tokens for training.

(The research team's goal with the RefinedWeb dataset is to filter only the highest-quality raw data out of Common Crawl.)

In addition, Falcon’s training costs are relatively more controllable.

TII stated that compared with GPT-3, Falcon achieved significant performance improvements while using only 75% of the training computing budget.

It also requires only 20% of the compute at inference time, making efficient use of computing resources.

Statement: This article is reproduced from 51cto.com.