Three secrets for deploying large models in the cloud

WBOY (forwarded)
2024-04-24 15:00:02, 771 views

Compiled by Xingxuan | Produced by 51CTO Technology Stack (WeChat ID: blog51cto)

Over the past two years, I have been more involved in generative AI projects using large language models (LLMs) than in traditional systems, and I'm starting to miss serverless cloud computing. LLM applications range from enhancing conversational AI to providing complex analytics solutions for various industries, along with many other capabilities. Many enterprises deploy these models on cloud platforms because public cloud providers already offer a ready-made ecosystem and it is the path of least resistance. However, it doesn't come cheap.

The cloud also provides other benefits such as scalability, efficiency, and advanced computing capabilities (GPUs available on demand). Deploying LLMs on a public cloud platform involves some little-known secrets that can have a significant impact on success or failure. Perhaps because there are not many AI experts dealing with LLMs, and because we don’t have much experience in this area yet, there are many gaps in our knowledge.

Let’s explore three little-known “tricks” for deploying LLMs on the cloud that perhaps even your AI engineers don’t know. Considering that these engineers often make over $300,000 a year, maybe it's time to look closely at the details of what they do. I see everyone rushing toward AI like their hair is on fire, but making more mistakes than ever before.

1. Managing cost-effectiveness and scalability

One of the main attractions of deploying LLMs on cloud platforms is the ability to scale resources on demand. We don’t need to be good capacity planners because cloud platforms have resources that we can allocate with a mouse click or have allocated automatically.

But wait, we are about to make the same mistakes we made when we first adopted cloud computing. Managing costs while scaling is a skill that many people need help with. Note that cloud services typically charge based on the computing resources consumed; they operate like utilities. The more you process, the more you pay. Given that GPUs cost more (and consume more power), this is a core concern when using LLMs hosted on public cloud providers.

Make sure you use cost management tools, both those provided by the cloud platforms and those offered by reliable third-party cost governance and monitoring (FinOps) providers. For example, implement automatic scaling and scheduling, choose the right instance types, or use preemptible instances to optimize costs. Also, remember to continuously monitor your deployment and adjust resources based on actual usage rather than just predicted load. This means avoiding overprovisioning at all costs (get my pun here?).
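To make the utility-style billing concrete, here is a minimal Python sketch that compares a fleet sized for predicted peak load against one sized from measured usage, scheduled for part of the day, and run on preemptible instances. The instance names, hourly prices, request rates, and the 25% headroom are all hypothetical values chosen for illustration, not quotes from any provider.

```python
import math
from dataclasses import dataclass


@dataclass
class GpuInstance:
    name: str
    hourly_usd: float   # hypothetical price, not a real provider quote
    preemptible: bool


def monthly_cost(instance, replicas, hours_per_day, days=30):
    """Utility-style billing: pay for every instance-hour consumed."""
    return instance.hourly_usd * replicas * hours_per_day * days


def replicas_for_load(observed_rps, rps_per_replica, headroom=0.25):
    """Size the fleet from traffic plus headroom, never below one replica."""
    needed = observed_rps * (1 + headroom) / rps_per_replica
    return max(1, math.ceil(needed))


# Hypothetical instance types and prices.
on_demand = GpuInstance("gpu-on-demand", hourly_usd=4.00, preemptible=False)
spot = GpuInstance("gpu-preemptible", hourly_usd=1.25, preemptible=True)

# Fleet sized for the forecast peak, running around the clock.
predicted = replicas_for_load(observed_rps=100, rps_per_replica=10)
# Fleet sized from monitoring data, scheduled for 12 busy hours a day.
measured = replicas_for_load(observed_rps=35, rps_per_replica=10)

print("overprovisioned, on demand:", monthly_cost(on_demand, predicted, 24))
print("right-sized, on demand:    ", monthly_cost(on_demand, measured, 12))
print("right-sized, preemptible:  ", monthly_cost(spot, measured, 12))
```

Preemptible capacity can be reclaimed by the provider at short notice, so a sketch like this only applies to workloads that tolerate interruption.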

2. Data Privacy in Multi-Tenant Environments

Deploying LLMs often involves processing large amounts of data and trained knowledge models, which may contain sensitive or proprietary data. The risk with using a public cloud is that you have "neighbors" in the form of processing instances running on the same physical hardware. Public clouds therefore carry the risk that, while data is stored and processed, it is accessed by another virtual machine running on the same physical hardware in the public cloud data center. To address this, many public cloud providers offer security options for enterprises that isolate your data and protect it from access by other virtual machines on the same hardware.

Another security issue is data in transit. Data may travel over public cloud networks, where it could be intercepted or eavesdropped on. To address this, public clouds usually provide encryption and secure transmission protocols that protect data during transmission.

If you ask a public cloud provider about the risk of another tenant accessing your data, they'll rush out with their latest PowerPoint presentation showing how it's impossible. While this is mostly true, it's not entirely accurate. This risk exists with all multi-tenant systems; you need to mitigate it. I've found that the smaller the cloud provider, such as those that operate in only a single country, the greater the likelihood of this problem occurring. This applies to both data stores and LLMs.

The secret is to choose a cloud provider that meets strict security standards and can prove it: data encryption at rest and in transit, identity and access management (IAM), and isolation policies. Better still, implement your own security policy and security technology stack to keep the risk of using multi-tenant LLMs on the cloud low.
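One way to enforce your own policy is to check every deployment description against the controls listed above before it ships. The following Python sketch does that with a plain dictionary; the control names and the shape of the deployment record are hypothetical and would need to be mapped onto whatever your provider and tooling actually expose.

```python
# Controls named in the text; the keys are hypothetical labels for this sketch.
REQUIRED_CONTROLS = (
    "encryption_at_rest",
    "encryption_in_transit",
    "iam_least_privilege",
    "tenant_isolation",
)


def missing_controls(deployment):
    """Return the required controls that are absent or not explicitly enabled."""
    return [c for c in REQUIRED_CONTROLS if deployment.get(c) is not True]


def assert_deployable(deployment):
    """Fail closed: refuse to deploy when any control is missing."""
    gaps = missing_controls(deployment)
    if gaps:
        raise ValueError("blocked, missing controls: " + ", ".join(gaps))


# Hypothetical deployment record with one control forgotten.
llm_service = {
    "name": "llm-inference",
    "encryption_at_rest": True,
    "encryption_in_transit": True,
    "iam_least_privilege": True,
}

print(missing_controls(llm_service))
```

The check treats anything other than an explicit `True` as missing, so a forgotten or misspelled setting blocks the deployment rather than passing silently.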

3. Handling stateful model deployment

Large language models (LLMs) are mostly stateful, meaning they retain information from one interaction to the next. This old trick offers a new benefit: greater efficiency in continuous learning scenarios. However, managing the statefulness of these models in cloud environments is challenging because cloud instances may be ephemeral or stateless by design.
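As an illustration of the problem, the sketch below keeps per-session history outside the serving process so that a replacement instance can pick up where a terminated one left off. It assumes that "state" means conversation history and that a local directory stands in for persistent storage; the `SessionStore` class and the session ID are hypothetical names for this example.

```python
import json
import tempfile
from pathlib import Path


class SessionStore:
    """Keeps conversation history on storage that outlives any one instance."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, session_id):
        # Assumes session IDs are safe file names (no path separators).
        return self.root / f"{session_id}.json"

    def load(self, session_id):
        path = self._path(session_id)
        if not path.exists():
            return []
        return json.loads(path.read_text(encoding="utf-8"))

    def append(self, session_id, role, content):
        history = self.load(session_id)
        history.append({"role": role, "content": content})
        path = self._path(session_id)
        tmp = path.with_suffix(".tmp")
        tmp.write_text(json.dumps(history), encoding="utf-8")
        tmp.replace(path)  # rename over the old file so readers never see a partial write
        return history


# A temporary directory stands in for a persistent volume here.
state_dir = Path(tempfile.mkdtemp())

first_instance = SessionStore(state_dir)
first_instance.append("session-42", "user", "Summarize our Q1 report.")

# The first instance is gone; a fresh one reads the same storage.
replacement_instance = SessionStore(state_dir)
print(replacement_instance.load("session-42"))
```

This sketch does not handle two instances writing to the same session at once; a real deployment would need locking or a store with transactional updates.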

Orchestration tools that support stateful deployment (such as Kubernetes) are helpful. They can attach persistent storage to large language model workloads and be configured to maintain and manage their state across sessions. You need this to support the continuity and performance of large language models.
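In Kubernetes, the usual building block for this is a StatefulSet with a volume claim template, which gives each replica its own persistent volume that survives pod restarts. The Python sketch below builds such a manifest as a dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The workload name, container image, mount path, and storage size are hypothetical, and the headless Service named in `serviceName` has to be created separately.

```python
import json


def llm_statefulset(name, image, replicas, storage, mount_path="/var/lib/llm-state"):
    """Build a StatefulSet manifest whose pods each get a persistent volume."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,  # headless Service, defined elsewhere
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "server",
                            "image": image,
                            "volumeMounts": [
                                {"name": "state", "mountPath": mount_path}
                            ],
                        }
                    ]
                },
            },
            # One PersistentVolumeClaim is created per replica from this template.
            "volumeClaimTemplates": [
                {
                    "metadata": {"name": "state"},
                    "spec": {
                        "accessModes": ["ReadWriteOnce"],
                        "resources": {"requests": {"storage": storage}},
                    },
                }
            ],
        },
    }


# Hypothetical image name and sizes, for illustration only.
manifest = llm_statefulset(
    name="llm-server",
    image="registry.example.com/llm-server:1.0",
    replicas=2,
    storage="50Gi",
)
print(json.dumps(manifest, indent=2))
```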

With the explosive growth of generative artificial intelligence, deploying large language models on cloud platforms is a foregone conclusion. For most businesses, not using the cloud is simply too inconvenient. My worry about the coming rush is that we will miss some easy-to-solve problems and make huge, expensive mistakes that were mostly avoidable.



Statement:
This article is reproduced from 51cto.com. If there is any infringement, please contact admin@php.cn to have it removed.