Real-Time AI Inference at Scale with WebSockets and Durable Objects

Nov 20, 2024 09:08 AM
authentication, Durable Objects, WebSockets, AI Gateway

In October 2024, we talked about storing billions of logs from your AI application using AI Gateway, and how we used Cloudflare's Developer Platform to do this.

With AI Gateway already processing over 3 billion logs and growing rapidly, the number of connections to the platform continues to increase steadily. To help developers manage this scale more effectively, we wanted to offer an alternative to HTTP/2 keep-alive for maintaining persistent HTTP(S) connections, which avoid the overhead of repeated handshakes and TLS negotiations on every new HTTP connection to AI Gateway. We understand that implementing HTTP/2 can be challenging, since many libraries and tools do not support it by default, whereas most modern programming languages have well-established WebSocket libraries.

With this in mind, we used Cloudflare’s Developer Platform and Durable Objects (yes, again!) to build a WebSockets API that establishes a single, persistent connection, enabling continuous communication.

Through this API, all AI providers supported by AI Gateway can be accessed via WebSocket, allowing you to maintain a single TCP connection between your client or server application and the AI Gateway. The best part? Even if your chosen provider doesn’t support WebSockets, we handle it for you, managing the requests to your preferred AI provider.

When you connect to AI Gateway via WebSocket, we make the requests to the inference service for you using the provider's supported protocols (HTTPS, WebSocket, etc.), and you can keep the connection open to run as many inference requests as you like.

To make your connection to AI Gateway more secure, we are also introducing authentication for AI Gateway. The new WebSockets API will require authentication. All you need to do is create a Cloudflare API token with the permission “AI Gateway: Run” and send that in the cf-aig-authorization header.
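As a minimal client-side sketch, the helper below only assembles the URL and headers needed to open an authenticated connection; it performs no network I/O. `buildGatewayConnection` is a hypothetical name, and both the endpoint shape `wss://gateway.ai.cloudflare.com/v1/<account_id>/<gateway_id>` and the `Bearer ` prefix on the token are assumptions to check against the AI Gateway documentation. Only the header name cf-aig-authorization comes from the text above.

```typescript
// Hypothetical helper: collects what a client needs to open an authenticated
// WebSocket connection to AI Gateway. It performs no network I/O.
// ASSUMPTION: the URL shape and the "Bearer " prefix are illustrative.
interface GatewayConnection {
  url: string;
  headers: Record<string, string>;
}

function buildGatewayConnection(
  accountId: string,
  gatewayId: string,
  apiToken: string, // Cloudflare API token with the "AI Gateway: Run" permission
): GatewayConnection {
  if (!accountId || !gatewayId || !apiToken) {
    throw new Error("accountId, gatewayId and apiToken are all required");
  }
  const base = "wss://gateway.ai.cloudflare.com/v1"; // assumed endpoint
  return {
    url: `${base}/${encodeURIComponent(accountId)}/${encodeURIComponent(gatewayId)}`,
    headers: {
      // Header name from the article; the "Bearer " scheme is an assumption.
      "cf-aig-authorization": `Bearer ${apiToken}`,
    },
  };
}

const conn = buildGatewayConnection("my-account-id", "my-gateway", "my-api-token");
```

With a Node.js client such as the `ws` package, the result could be passed as `new WebSocket(conn.url, { headers: conn.headers })`. Browser clients cannot set custom headers on a WebSocket, so this form applies to server-side clients.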

In the flow diagram above:

1. When Authenticated Gateway is enabled and a valid token is included, requests will pass successfully.

2. If Authenticated Gateway is enabled, but a request does not contain the required cf-aig-authorization header with a valid token, the request will fail. This ensures only verified requests pass through the gateway.

3. When Authenticated Gateway is disabled, the cf-aig-authorization header is bypassed entirely, and any token — whether valid or invalid — is ignored.
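The three cases above reduce to a small decision function. This is an illustrative sketch, not AI Gateway's code: `decideAuth` is a hypothetical name, and the `validateToken` callback is a stand-in for however the token is actually checked against the account and gateway.

```typescript
// Hypothetical sketch of the three Authenticated Gateway cases.
type AuthOutcome = "pass" | "reject";

function decideAuth(
  authenticatedGatewayEnabled: boolean,
  authHeader: string | null, // value of cf-aig-authorization, if present
  validateToken: (token: string) => boolean, // stand-in for the real check
): AuthOutcome {
  // Case 3: authentication disabled, so the header is ignored whatever it holds.
  if (!authenticatedGatewayEnabled) return "pass";
  // Case 2: authentication enabled but no header was sent.
  if (authHeader === null || authHeader.trim() === "") return "reject";
  // Cases 1 and 2: header present; pass only if the token is valid.
  return validateToken(authHeader) ? "pass" : "reject";
}
```

Checking the enabled flag first mirrors case 3: when the feature is off, the token is never inspected, so even an invalid one cannot cause a failure.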

How we built it

We recently used Durable Objects (DOs) to scale our logging solution for AI Gateway, so using WebSockets within the same DOs was a natural fit.

When our Cloudflare Workers receive a new WebSocket connection, we authenticate it in one of two ways to accommodate the differing capabilities of WebSocket clients. The primary method validates a Cloudflare API token sent in the cf-aig-authorization header, ensuring the token is valid for the connecting account and gateway.

However, because of limitations in browser WebSocket implementations, we also support authentication via the “sec-websocket-protocol” header. The standard browser WebSocket API does not allow custom headers, which makes it difficult to attach authentication tokens to requests. While we don’t recommend storing API keys in a browser, we added this method to give all WebSocket clients more flexibility.
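A sketch of reading the token from either location, under one loud assumption: the exact subprotocol format is not given here, so this assumes browser clients send a subprotocol named `cf-aig-authorization.<token>`. `extractGatewayToken` and `HeaderLookup` are hypothetical names; the interface is satisfied by the standard `Headers` class that Workers expose on `request.headers`.

```typescript
// Hypothetical sketch: pull the API token out of an upgrade request, trying the
// cf-aig-authorization header first and Sec-WebSocket-Protocol second.
// ASSUMPTION: browser clients encode the token as a subprotocol named
// "cf-aig-authorization.<token>"; the exact format is not given in the article.
interface HeaderLookup {
  get(name: string): string | null; // satisfied by the standard Headers class
}

const SUBPROTOCOL_PREFIX = "cf-aig-authorization.";

function extractGatewayToken(headers: HeaderLookup): string | null {
  // Primary method: a dedicated header, e.g. "Bearer <token>".
  const direct = headers.get("cf-aig-authorization");
  if (direct !== null) {
    const token = direct.replace(/^Bearer\s+/i, "").trim();
    return token.length > 0 ? token : null;
  }

  // Fallback for browsers: the subprotocol list, e.g. "json, cf-aig-authorization.<token>".
  const protocols = headers.get("sec-websocket-protocol");
  if (protocols === null) return null;
  for (const entry of protocols.split(",")) {
    const value = entry.trim();
    if (value.startsWith(SUBPROTOCOL_PREFIX)) {
      const token = value.slice(SUBPROTOCOL_PREFIX.length);
      return token.length > 0 ? token : null;
    }
  }
  return null;
}
```

Under that assumption, a browser would open the socket with `new WebSocket(url, ["cf-aig-authorization." + token])`. Because the client then sends a subprotocol list, the server's handshake response generally needs to select one of the offered subprotocols; in practice some browsers fail the connection otherwise.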

After this initial verification step, we upgrade the connection to the Durable Object, which then handles all messages for that connection. Before the new connection is fully accepted, we generate a random UUID so that this connection can be identified among all the messages the Durable Object receives. While the connection is open, any AI Gateway settings passed via headers (such as cf-aig-skip-cache, which bypasses caching when set to true) are stored and applied to every request in the session. These headers can still be overridden on a per-request basis, just like with the Universal Endpoint today.
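The per-connection bookkeeping can be sketched as plain functions: assign a UUID when the connection opens, keep the AI Gateway headers sent at connect time as session defaults, and let per-request headers win. `openSession` and `effectiveHeaders` are hypothetical names; keeping only `cf-aig-` headers and dropping cf-aig-authorization from the stored set are this sketch's own choices, not documented behaviour. In a Durable Object this state would sit alongside the accepted WebSocket; here it is a plain object so the logic can run anywhere.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical per-connection state.
interface ConnectionSession {
  connectionId: string;                   // random UUID identifying this connection
  sessionHeaders: Record<string, string>; // cf-aig-* settings sent at connect time
}

function openSession(connectHeaders: Record<string, string>): ConnectionSession {
  const sessionHeaders: Record<string, string> = {};
  for (const [name, value] of Object.entries(connectHeaders)) {
    const key = name.toLowerCase();
    // Sketch's choice: keep AI Gateway settings such as cf-aig-skip-cache,
    // but not the authorization header, which was already checked.
    if (key.startsWith("cf-aig-") && key !== "cf-aig-authorization") {
      sessionHeaders[key] = value;
    }
  }
  return { connectionId: randomUUID(), sessionHeaders };
}

// Per-request headers override the session defaults.
function effectiveHeaders(
  session: ConnectionSession,
  requestHeaders: Record<string, string> = {},
): Record<string, string> {
  const merged: Record<string, string> = { ...session.sessionHeaders };
  for (const [name, value] of Object.entries(requestHeaders)) {
    merged[name.toLowerCase()] = value;
  }
  return merged;
}
```

Lower-casing names on both sides keeps the override working even when a client capitalizes a header differently per request.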

How it works

Once the connection is established, the Durable Object begins listening for incoming messages. From this point on, users can send messages in the AI Gateway universal format via WebSocket, simplifying the transition of your application from an existing HTTP setup to WebSockets-based communication.
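As a sketch of what such a message might look like on the client, the helper below serializes one request into a text frame. The envelope shape (a `type` of `"universal.create"` and a `request` object with `eventId`, `provider`, `endpoint`, `headers` and `query`) is an assumption modeled on the Universal Endpoint's fields and should be checked against the AI Gateway documentation; `buildUniversalMessage` is a hypothetical name, and the provider, model and token values are placeholders.

```typescript
// Hypothetical sketch of a client-side request message.
// ASSUMPTION: the "universal.create" type and field names are illustrative.
interface UniversalRequest {
  eventId: string;                 // caller-chosen ID for this request
  provider: string;                // e.g. "workers-ai"
  endpoint: string;                // provider-specific path or model name
  headers: Record<string, string>; // provider credentials and other headers
  query: unknown;                  // the provider's request body
}

interface UniversalCreateMessage {
  type: "universal.create";
  request: UniversalRequest;
}

function buildUniversalMessage(request: UniversalRequest): string {
  const message: UniversalCreateMessage = { type: "universal.create", request };
  // A WebSocket text frame carries the JSON-serialized message.
  return JSON.stringify(message);
}

const frame = buildUniversalMessage({
  eventId: "my-request",
  provider: "workers-ai",
  endpoint: "@cf/meta/llama-3.1-8b-instruct",
  headers: { Authorization: "Bearer <provider-token>", "Content-Type": "application/json" },
  query: { prompt: "tell me a joke" },
});
// On an open connection: socket.send(frame)
```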

When a new message reaches the Durable Object, it’s processed using the same code that powers the HTTP Universal Endpoint, enabling seamless code reuse across Workers and Durable Objects — one of the key benefits of building on Cloudflare.

For non-streaming requests, the response is wrapped in a JSON envelope, allowing us to include additional information beyond the AI inference itself, such as the AI Gateway log ID for that request.
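A minimal sketch of that wrapping on the server side: the provider's body is passed through untouched under its own key, and gateway metadata such as the log ID travels beside it. `wrapResponse` is a hypothetical name, and the `"universal.created"` type and metadata field names are assumptions; the text only states that the envelope carries extra information such as the AI Gateway log ID.

```typescript
// Hypothetical sketch of the envelope for a non-streaming reply.
// ASSUMPTION: "universal.created" and the metadata field names are illustrative.
interface ResponseMetadata {
  eventId: string; // echoes the eventId of the request being answered
  logId: string;   // AI Gateway log ID for this request
}

interface ResponseEnvelope {
  type: "universal.created";
  metadata: ResponseMetadata;
  response: unknown; // the inference provider's response body, unchanged
}

function wrapResponse(metadata: ResponseMetadata, providerBody: unknown): string {
  const envelope: ResponseEnvelope = { type: "universal.created", metadata, response: providerBody };
  return JSON.stringify(envelope);
}

const reply = wrapResponse(
  { eventId: "my-request", logId: "example-log-id" },
  { result: { response: "Hello!" } },
);
```

Keeping the provider body under its own key means a client can read `metadata.logId` without knowing each provider's response schema.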

Here’s an example response for the request above:

For streaming requests, AI Gateway sends an initial message with request metadata to signal that the stream is starting.

After this initial message, all streaming chunks are relayed to the WebSocket connection in real time as they arrive from the inference provider. Note that only the eventId field is included in the metadata for these streaming chunks (more info on what this new field is below).
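The streaming behaviour can be sketched as a generator: one start message with the full request metadata, then one message per provider chunk carrying only the eventId. `relayStream`, the `kind` discriminant and the field names are this sketch's own, not AI Gateway's wire format. To stay self-contained it reads a synchronous `Iterable<string>`; a real relay would consume an asynchronous provider stream the same way with an async generator and `for await`.

```typescript
// Hypothetical sketch of the streaming relay.
// ASSUMPTION: the "kind" discriminant and field names are this sketch's own.
interface StartMetadata {
  eventId: string;
  logId: string;
}

type StreamMessage =
  | { kind: "start"; metadata: StartMetadata }
  | { kind: "chunk"; metadata: { eventId: string }; data: string };

function* relayStream(start: StartMetadata, chunks: Iterable<string>): Generator<StreamMessage> {
  // Tell the client the stream is starting, with the full metadata.
  yield { kind: "start", metadata: start };
  for (const chunk of chunks) {
    // Forward each chunk as soon as it is read; only eventId accompanies it.
    yield { kind: "chunk", metadata: { eventId: start.eventId }, data: chunk };
  }
}

// In a relay, each yielded message would be serialized and sent on the socket.
const messages = [...relayStream({ eventId: "my-request", logId: "example-log-id" }, ["Hel", "lo"])];
```

Sending the heavier metadata once and tagging every chunk with just the eventId keeps per-chunk overhead small while still letting the client attribute each chunk to its request.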

This approach serves two purposes:
