The large-scale inference cost rankings led by Jia Yangqing's high efficiency are released

"Is the API of large models a loss-making business?"

As large language models move into practical use, many technology companies have launched large-model APIs for developers. However, we can't help but wonder whether a business built on large models can be sustained, especially considering that OpenAI is burning through $700,000 a day.

This Thursday, AI startup Martian ran the numbers for us.

Leaderboard link: https://leaderboard.withmartian.com/

The LLM Inference Provider Leaderboard is an open-source ranking of API inference products for large models. It benchmarks the cost, rate limits, throughput, and P50 and P90 TTFT for the Mixtral-8x7B and Llama-2-70B-Chat public endpoints of each vendor.
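As a rough illustration of what the P50 and P90 TTFT (time to first token) figures mean, the sketch below computes nearest-rank percentiles over a list of latency samples. The sample values and the `percentile` helper are assumptions made for this example; they are not leaderboard data, and Martian's own methodology may differ.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so that p=0 returns the minimum.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical TTFT measurements in seconds (illustrative only).
ttft_seconds = [0.21, 0.25, 0.19, 0.30, 0.22, 0.95, 0.24, 0.27, 0.23, 0.26]

p50 = percentile(ttft_seconds, 50)
p90 = percentile(ttft_seconds, 90)
print(f"P50 TTFT: {p50:.2f}s, P90 TTFT: {p90:.2f}s")
```

P50 describes the typical request, while P90 shows how slow the slower tail gets, which is why a single outlier such as the 0.95 s sample barely moves either figure here.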

Although the vendors compete with one another, Martian found significant differences in the cost, throughput, and rate limits of their large-model services: more than 5x in cost, 6x in throughput, and even larger gaps in rate limits. Choosing the right API is critical to getting the best performance, even though it is just one part of doing business.

According to the current ranking, Anyscale delivers the best throughput on Llama-2-70B under medium service load. Under high service load, Together AI performs best on P50 and P90 throughput for both Llama-2-70B and Mixtral-8x7B.

Additionally, Jia Yangqing's Lepton AI showed the best throughput on small workloads with short inputs and long outputs. Its P50 throughput of 130 tokens/s is the fastest among the models currently offered by any vendor on the market.
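To make a figure such as "P50 throughput in tokens per second" concrete, here is a minimal sketch. It assumes throughput is measured per request as output tokens divided by the time spent generating after the first token arrives; the leaderboard's exact definition may differ, and the request data is hypothetical.

```python
from statistics import median


def tokens_per_second(output_tokens, total_seconds, ttft_seconds):
    """Decode throughput: tokens generated per second after the first token arrives."""
    decode_time = total_seconds - ttft_seconds
    if decode_time <= 0:
        raise ValueError("total_seconds must exceed ttft_seconds")
    return output_tokens / decode_time


# Hypothetical requests: (output tokens, total latency in s, TTFT in s).
samples = [(520, 4.25, 0.25), (300, 3.3, 0.3), (450, 3.2, 0.2)]

rates = [tokens_per_second(*s) for s in samples]
p50_rate = median(rates)  # P50 is the median of the per-request rates
print(f"P50 throughput: {p50_rate:.1f} tokens/s")
```

Separating TTFT from decode time matters for the short-input, long-output workloads mentioned above, where nearly all of the latency is spent generating tokens.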

Well-known AI scholar and Lepton AI founder Jia Yangqing commented immediately after the rankings were released. Let’s see what he said.

Jia Yangqing first described the current state of the AI industry, then affirmed the value of benchmarking, and finally noted that Lepton AI will help users find the best basic AI strategy.

1. Large-model APIs are "burning money"

If a model leads the high-workload benchmarks, then congratulations: it is "burning money."

Planning the capacity of a public LLM inference API is like running a restaurant: you have chefs, and you need to estimate customer traffic. Hiring chefs costs money. Latency and throughput can be understood as "how fast you can cook for customers." For a reasonable business, you need a "reasonable" number of chefs. In other words, you want capacity that can handle normal traffic, not sudden bursts that arrive within a matter of seconds. A surge in traffic means waiting; if you staffed for the surge instead, the chefs would have nothing to do the rest of the time.

In the world of artificial intelligence, the GPU plays the role of the "chef". Benchmark loads are bursty. Under low workloads, the benchmark load blends into normal traffic, and the measurements give an accurate picture of how the service performs under its current workload.

The high service load scenario is interesting because it is disruptive. The benchmark only runs a few times per day or week, so it is not the regular traffic a service should expect. Imagine having 100 people flock to your local restaurant to check out how quickly the chef is cooking. The results would be great. To borrow the terminology of quantum physics, this is called the "observer effect." The stronger the interference (i.e. the larger the burst load), the lower the accuracy. In other words: if you put a sudden high load on a service and see that the service responds very quickly, you know that the service has quite a bit of idle capacity. As an investor, when you see this situation, you should ask: is this way of burning money responsible?
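The restaurant argument can be sketched with a toy queueing model. The function below is a simplification written for this example: it assumes each worker (a "chef", or a GPU slot) serves one request at a time with a fixed service time, and that the whole burst arrives at once. Real LLM serving batches requests, so the numbers are illustrative only.

```python
import heapq


def burst_latencies(idle_workers, burst_size, service_time):
    """Latency of each request when burst_size requests all arrive at t=0.

    Requests are served first-come, first-served by idle_workers workers,
    each taking service_time seconds per request.
    """
    if idle_workers < 1:
        raise ValueError("need at least one idle worker")
    free_at = [0.0] * idle_workers  # time at which each worker becomes free
    heapq.heapify(free_at)
    latencies = []
    for _ in range(burst_size):
        start = heapq.heappop(free_at)  # earliest available worker
        finish = start + service_time
        heapq.heappush(free_at, finish)
        latencies.append(finish)  # arrival time is 0, so latency equals finish time
    return latencies


# Hypothetical numbers: 100 simultaneous benchmark requests, 2 s per request.
for idle in (10, 100):
    lat = burst_latencies(idle_workers=idle, burst_size=100, service_time=2.0)
    print(f"{idle:>3} idle workers: worst-case latency {max(lat):.0f}s")
```

Under these assumptions, the service that answers the whole burst quickly is the one that kept far more capacity idle before the benchmark arrived, which is the point about "burning money" made above.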

2. Models will eventually achieve similar performance

The field of artificial intelligence is very fond of competition, which is indeed interesting. Everyone quickly converges on the same solution, and Nvidia always wins in the end because of the GPU. This is thanks to great open source projects, of which vLLM is a great example. It means that, as a provider, if your model performs much worse than others, you can easily catch up by looking at open source solutions and applying good engineering.

3. "As a customer, I don't care about the provider's cost"

For developers building AI applications, we are lucky: there are always API providers willing to "burn money". The AI industry is burning money to gain traffic, and worrying about profits comes next.

Benchmarking is a tedious and error-prone task. For better or worse, it usually happens that winners praise you and losers blame you. Such was the case with the last round of convolutional neural network benchmarks. It’s not an easy task, but benchmarking will help us achieve the next 10x in AI infrastructure.

Drawing on AI frameworks and cloud infrastructure, Lepton AI will help users find the best basic AI strategy.

Statement
This article is reproduced from 51CTO.COM. If there is any infringement, please contact admin@php.cn to have it removed.