Three secrets for deploying large models in the cloud

Compiled by Xingxuan | Produced by 51CTO Technology Stack (WeChat ID: blog51cto)

Over the past two years, I have been more involved in generative AI projects built on large language models (LLMs) than in traditional systems, and I am starting to miss serverless cloud computing. These models' applications range from enhancing conversational AI to providing complex analytics solutions for various industries, among many other capabilities. Many enterprises deploy them on cloud platforms because public cloud providers offer a ready-made ecosystem and it is the path of least resistance. However, it doesn't come cheap.

The cloud also provides other benefits, such as scalability, efficiency, and advanced computing capabilities (GPUs available on demand). But deploying LLMs on a public cloud platform involves some little-known details that can make the difference between success and failure. Perhaps because there are not many AI experts working with LLMs, and because the field is still young, there are many gaps in our collective knowledge.

Let's explore three little-known "tricks" for deploying LLMs on the cloud that perhaps even your AI engineers don't know. Considering those engineers often make over $300,000 a year, it may be time to scrutinize the details of what they do. I see everyone rushing into AI as if their hair were on fire, and making more mistakes than ever before.

1. Managing cost-effectiveness and scalability

One of the main attractions of deploying LLMs on cloud platforms is the ability to scale resources on demand. We don't need to be good capacity planners, because cloud platforms offer resources we can allocate with a mouse click or automatically.

But wait: we are about to repeat the mistakes we made when we first adopted cloud computing. Managing costs while scaling is a skill that many people need help to navigate effectively. Note that cloud services typically charge based on the computing resources consumed; they operate like utilities: the more you process, the more you pay. Given that GPUs cost more (and consume more power), this is a core concern when using LLMs on public clouds.
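To see why this matters, here is a back-of-the-envelope estimate in Python. Every price and throughput figure below is a hypothetical placeholder, not a quote from any provider; substitute your own rates and volumes.

```python
# Back-of-the-envelope GPU cost estimate. All numbers are hypothetical
# placeholders; replace them with your provider's rates and your real volume.

GPU_HOURLY_RATE = 4.00          # USD per GPU-hour (assumed on-demand price)
TOKENS_PER_SECOND = 1_500       # assumed inference throughput per GPU
monthly_tokens = 5_000_000_000  # assumed monthly processing volume

gpu_seconds = monthly_tokens / TOKENS_PER_SECOND
gpu_hours = gpu_seconds / 3600
monthly_cost = gpu_hours * GPU_HOURLY_RATE

print(f"GPU-hours needed: {gpu_hours:,.0f}")
print(f"Estimated monthly cost: ${monthly_cost:,.2f}")
```

Even with these modest assumptions the bill lands in the thousands of dollars per month, and it scales linearly with volume, which is exactly why usage-based GPU billing deserves constant attention.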

Make sure you use cost management tools, both those provided by the cloud platforms and those from reliable third-party FinOps (cost governance and monitoring) providers. For example, implement autoscaling and scheduling, choose the right instance types, or use preemptible (spot) instances to optimize costs. Also, remember to continuously monitor your deployment and adjust resources based on actual usage rather than just predicted load. In other words, avoid overprovisioning at all costs (pun intended).
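As a minimal sketch of what "adjust resources based on usage" can look like, here is illustrative demand-based scaling logic. The target utilization, worker bounds, and function names are assumptions for illustration, not a real cloud SDK; in practice the result would be fed to your provider's autoscaling API.

```python
# A minimal sketch of demand-based scaling for GPU inference workers.
# Thresholds and bounds are illustrative; wire the result to your
# provider's real autoscaling calls.

import math

TARGET_UTILIZATION = 0.6      # aim for 60% average GPU utilization
MIN_WORKERS, MAX_WORKERS = 1, 16

def desired_workers(current_workers: int, avg_utilization: float) -> int:
    """Scale proportionally to observed load, clamped to configured bounds."""
    if current_workers == 0:
        return MIN_WORKERS
    # Workers needed so the same load lands at the target utilization.
    needed = math.ceil(current_workers * avg_utilization / TARGET_UTILIZATION)
    return max(MIN_WORKERS, min(MAX_WORKERS, needed))

# Example: 4 workers running hot at 90% utilization -> scale out to 6.
print(desired_workers(4, 0.90))  # 6
# Example: 8 workers idling at 15% utilization -> scale in to 2.
print(desired_workers(8, 0.15))  # 2
```

The point of the sketch is the direction of the logic: capacity follows measured utilization in both directions, rather than sitting at a provisioned ceiling chosen from a forecast.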

2. Data privacy in multi-tenant environments

Deploying LLMs often involves processing large amounts of data and trained knowledge models, which may contain sensitive or proprietary information. The risk of using a public cloud is that your "neighbors" take the form of processing instances running on the same physical hardware. Public cloud storage therefore carries the risk that, during storage and processing, your data could be accessed by other virtual machines running on the same physical hardware in the data center. To address this, many public cloud providers offer enterprise security options that isolate your data and protect it from access by other tenants on the same hardware. Another concern is data in motion: data may travel over public cloud networks, where it could be intercepted or eavesdropped on. To address this, public clouds typically provide encryption and secure transmission protocols to protect data in transit. Overall, deploying LLMs in a multi-tenant environment carries data privacy risks that must be actively managed.

If you ask a public cloud provider about this, they'll rush out their latest PowerPoint presentation showing why it's impossible. While that is mostly true, it's not entirely accurate. This risk exists in all multi-tenant systems, and you need to mitigate it. I've found that the smaller the cloud provider, for instance one operating in only a single country, the greater the likelihood of this problem. This applies to data stores and LLMs alike.

The secret is to choose a cloud provider that meets, and can prove compliance with, strict security standards: data encryption at rest and in transit, identity and access management (IAM), and isolation policies. Better still, implement your own security policy and security technology stack on top, so that using multi-tenant LLMs in the cloud carries less risk.
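One piece of such a stack is client-side encryption, so that data at rest in shared storage stays opaque to neighbors even if tenant isolation fails. Below is a minimal sketch using the widely used `cryptography` package; keeping the key in process memory is for illustration only, as a real deployment would fetch keys from a KMS or HSM.

```python
# A minimal sketch of client-side encryption before data leaves your control,
# so data at rest in a multi-tenant store is opaque to other tenants.
# Requires the third-party `cryptography` package (pip install cryptography).

from cryptography.fernet import Fernet

# Illustrative only: in production, fetch this key from a KMS or HSM.
key = Fernet.generate_key()
cipher = Fernet(key)

training_record = b"proprietary prompt/completion pair"
ciphertext = cipher.encrypt(training_record)  # safe to upload to shared storage

# Only holders of the key can recover the plaintext.
assert cipher.decrypt(ciphertext) == training_record
print("round-trip ok, ciphertext length:", len(ciphertext))
```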

3. Handling stateful model deployment

LLM deployments are mostly stateful, meaning they retain information from one interaction to the next. This old approach offers a new benefit: greater efficiency in continuous-learning scenarios. However, managing that statefulness in cloud environments is challenging, because cloud instances may be ephemeral or stateless by design.

Orchestration tools that support stateful deployments, such as Kubernetes, help here. They can provide persistent storage options for large language models and be configured to maintain and manage state across sessions, which you need in order to preserve continuity and performance.
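A minimal sketch of the idea: persist session state to a directory backed by durable storage (in Kubernetes, a PersistentVolumeClaim mounted into the pod), so conversations survive instance restarts. The directory path and record schema here are assumptions for illustration.

```python
# A minimal sketch of keeping conversation state on persistent storage so it
# survives instance restarts. Path and schema are illustrative; in a
# Kubernetes StatefulSet this directory would be a PersistentVolume mount.

import json
from pathlib import Path

# Locally a plain directory; in Kubernetes, point this at the PVC mount
# (e.g. /mnt/llm-state).
STATE_DIR = Path("./llm-state")

def save_session(session_id: str, messages: list[dict]) -> None:
    """Persist the running conversation for one session."""
    STATE_DIR.mkdir(parents=True, exist_ok=True)
    (STATE_DIR / f"{session_id}.json").write_text(json.dumps(messages))

def load_session(session_id: str) -> list[dict]:
    """Restore conversation state after a restart; empty if the session is new."""
    path = STATE_DIR / f"{session_id}.json"
    return json.loads(path.read_text()) if path.exists() else []

# Example: state written by one replica is visible after a restart or failover.
save_session("user-42", [{"role": "user", "content": "Hello"}])
print(load_session("user-42"))
```

Because state lives outside the serving process, any replacement instance that mounts the same volume can pick up the conversation where the failed one left off.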

With the explosive growth of generative AI, deploying large language models on cloud platforms is a foregone conclusion. For most businesses, not using the cloud is simply too inconvenient. My worry amid the ensuing craze is that we will miss the easy-to-solve problems and make huge, expensive mistakes that were mostly avoidable in the end.

To learn more about AIGC, please visit:

51CTO AI.x Community

https://www.51cto.com/aigc/
