首页  >  文章  >  web前端  >  Step Functions 中间件:自动存储和加载来自 Amazon S3 的有效负载

Step Functions 中间件:自动存储和加载来自 Amazon S3 的有效负载

王林
王林原创
2024-08-21 06:18:36699浏览

我想介绍一下 middy-store,这是我在过去几个月里建立的一个新库。我思考这个想法已经有一段时间了,回到一年多前我提出的这个功能请求。 middy-store 是 Middy 的中间件,可自动在 Amazon S3 等 Store 或其他潜在服务中存储和加载有效负载。

动机

AWS 服务有一定的限制,必须注意。例如,AWS Lambda 对于同步调用的负载限制为 6MB,对于异步调用的负载限制为 256KB。 AWS Step Functions 允许以 UTF-8 编码字符串形式输入或输出最大 256KB 的数据。如果返回数据时超过此限制,您将遇到臭名昭著的 States.DataLimitExceeded 异常。

Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3

解决此限制的通常方法是检查有效负载的大小并将其临时保存在持久存储(例如 Amazon S3)中。然后,您返回 S3 的对象 URL 或 ARN。下一个 Lambda 检查输入中是否存在 URL 或 ARN,并从 S3 加载有效负载。可以想象,这会产生大量样板代码来存储 Amazon S3 和向 Amazon S3 加载有效负载,而这些代码必须在每个 Lambda 中重复。

Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3

当您只想将部分有效负载保存到 S3 并保留其余部分时,这会变得更加麻烦。例如,在使用 Step Functions 时,有效负载可能包含必须直接访问的 Choice 或 Map 等状态的控制流数据。这意味着第一个 Lambda 将部分有效负载保存到 S3,下一个 Lambda 必须从 S3 加载部分有效负载并将其与其余有效负载合并。这需要确保多个函数之间的类型一致,这当然很容易出错。

它是如何运作的

middy-store是Middy的中间件。它附加到 Lambda 函数,并在 Lambda 调用期间被调用两次:在 Lambda handler() 运行之前运行之后。它在处理程序运行之前接收输入,并在处理程序完成后接收来自处理程序的输出。

Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3

让我们从成功调用后的输出开始,以便于理解:middy-store 从 handler() 函数接收输出(有效负载)并检查大小。为了计算大小,它会将有效负载(如果它是对象)字符串化,并使用 Buffer.byteLength() 计算 UTF-8 编码的字符串大小。如果大小大于某个可配置阈值,则有效负载将存储在 Amazon S3 等存储中。然后,对存储的有效负载(例如 S3 URL 或 ARN)的引用将作为输出而不是原始输出返回。

现在让我们看看下一个 Lambda 函数(例如在状态机中),它将接收此输出作为其输入。这次我们在调用 handler() 之前查看输入 :middy-store 接收处理程序的输入并搜索对存储的有效负载的引用。如果找到,则从 Store 加载有效负载并作为处理程序的输入返回。处理程序使用有效负载,就像直接传递给它一样。

这里有一个例子来说明 middy-store 的工作原理:

/* ./src/functions/handler1.ts */
export const handler1 = middy()
  .use(
    middyStore({
      stores: [new S3Store({ /* S3 options */ })],
    })
  )
  .handler(async (input) => {
    // Return 1MB of random data as a base64 encoded string as output 
    return randomBytes(1024 * 1024).toString('base64');
  });

/* ./src/functions/handler2.ts */
export const handler2 = middy()
  .use(
    middyStore({
      stores: [new S3Store({ /* S3 options */ })],
    })
  )
  .handler(async (input) => {
    // Print the size of the input
    return console.log(`Size: ${Buffer.from(input, "base64").byteLength / 1024 / 1024} MB`);
  });

/* ./src/workflow.ts */
// First Lambda returns a large output
// It automatically uploads the data to S3 
const output1 = await handler1({});

// Output is a reference to the S3 object: { "@middy-store": "s3://bucket/key"}
console.log(output1); 

// Second Lambda receives the output as input
// It automatically downloads the data from S3
const output2 = await handler2(output1);

什么是商店?

一般来说,Store 是允许您存储和加载任意负载的任何服务,例如 Amazon S3 或其他持久存储系统。 DynamoDB 等数据库也可以充当存储。 Store 从 Lambda 处理程序接收有效负载,将其序列化(如果它是对象),并将其存储在持久存储中。当下一个 Lambda 处理程序需要有效负载时,Store 从存储中加载有效负载,反序列化并返回它。

middle-store 通过 StoreInterface 接口与 Store 进行交互,每个 Store 都必须实现该接口。该接口定义了函数 canStore() 和 store() 来存储有效负载,以及 canLoad() 和 load() 来加载有效负载。

interface StoreInterface<TPayload = unknown, TReference = unknown> {
  name: string;
  canLoad: (args: LoadArgs<unknown>) => boolean;
  load: (args: LoadArgs<TReference | unknown>) => Promise<TPayload>;
  canStore: (args: StoreArgs<TPayload>) => boolean;
  store: (args: StoreArgs<TPayload>) => Promise<TReference>;
}
  • canStore() 作为一个守卫来检查 Store 是否可以存储给定的有效负载。它接收有效负载及其字节大小,并检查有效负载是否符合存储的最大大小限制。例如,DynamoDB 支持的存储的最大项目大小为 400KB,而 S3 存储实际上对其可以存储的有效负载大小没有限制。

  • store() receives a payload and stores it in its underlying storage system. It returns a reference to the payload, which is a unique identifier to identify the stored payload within the underlying service. For example, the Amazon S3 Store uses an S3 URI in the format s3:/// as a reference, while other Amazon services might use ARNs.

  • canLoad() acts like a filter to check if the Store can load a certain reference. It receives the reference to a stored payload and checks if it's a valid identifier for the underlying storage system. For example, the Amazon S3 Store checks if the reference is a valid S3 URI, while a DynamoDB Store would check if it's a valid ARN.

  • load() receives the reference to a stored payload and loads the payload from storage. Depending on the Store, the payload will be deserialized into its original type according to the metadata that was stored alongside it. For example, a payload of type application/json will get parsed back into a JSON object, while a plain string of type text/plain will remain unaltered.

Single and Multiple Stores

Most of the time, you will only need one Store, like Amazon S3, which can effectively store any payload. However, middy-store lets you work with multiple Stores at the same time. This can be useful if you want to store different types of payloads in different Stores. For example, you might want to store large payloads in S3 and small payloads in DynamoDB.

Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3

middy-store accepts an Array in the options to provide one or more Stores. When middy-store runs before the handler and finds a reference in the input, it will iterate over the Stores and call canLoad() with the reference for each Store. The first Store that returns true will be used to load the payload with load().

On the other hand, when middy-store runs after the handler and the output is larger than the maximum allowed size, it will iterate over the Stores and call canStore() for each Store. The first Store that returns true will be used to store the payload with store().

Therefore, it is important to note that the order of the Stores in the array is important.

References

When a payload is stored in a Store, middy-store will return a reference to the stored payload. The reference is a unique identifier to find the stored payload in the Store. The value of the identifier depends on the Store and its configuration. For example, the Amazon S3 Store will use an S3 URI by default. However, it can also be configured to return other formats like an ARN arn:aws:s3:::/, an HTTP endpoint https://.s3.us-west-1.amazonaws.com/, or a structured object with the bucket and key.

The output from the handler after middy-store will contain the reference to the stored payload:

/* Output with reference */
{
  "@middy-store": "s3://bucket/key"
}

middy-store embeds the reference from the Store in the output as an object with a key "@middy-store". This allows middy-store to quickly find all references when the next Lambda function is called and load the payloads from the Store before the handler runs. In case you are wondering, middy-store recursively iterates through the input object and searches for the "@middy-store" key. That means the input can contain multiple references, even from different Stores, and middy-store will find and load them.

Selecting a Payload

By default, middy-store will store the entire output of the handler as a payload in the Store. However, you can also select only a part of the output to be stored. This is useful for workflows like AWS Step Functions, where you might need some of the data for control flow, e.g., a Choice state.

middy-store accepts a selector in its storingOptions config. The selector is a string path to the relevant value in the output that should be stored.

Here's an example:

const output = {
  a: {
    b: ['foo', 'bar', 'baz'],
  },
};

export const handler = middy()
  .use(
    middyStore({
      stores: [new S3Store({ /* S3 options */ })],
      storingOptions: {
        selector: '',          /* select the entire output as payload */
        // selector: 'a';      /* selects the payload at the path 'a' */
        // selector: 'a.b';    /* selects the payload at the path 'a.b' */
        // selector: 'a.b[0]'; /* selects the payload at the path 'a.b[0]' */
        // selector: 'a.b[*]'; /* selects the payloads at the paths 'a.b[0]', 'a.b[1]', 'a.b[2]', etc. */
      }
    })
  )
  .handler(async () => output);

await handler({});

The default selector is an empty string (or undefined), which selects the entire output as a payload. In this case, middy-store will return an object with only one property, which is the reference to the stored payload.

/* selector: '' */
{
  "@middy-store": "s3://bucket/key"
}

The selectors a, a.b, or a.b[0] select the value at the path and store only this part in the Store. The reference to the stored payload will be inserted at the path in the output, thereby replacing the original value.

/* selector: 'a' */
{
  a: {
    "@middy-store": "s3://bucket/key"
  }
}
/* selector: 'a.b' */
{
  a: {
    b: {
      "@middy-store": "s3://bucket/key"
    }
  }
}
/* selector: 'a.b[0]' */
{
  a: {
    b: [
      { "@middy-store": "s3://bucket/key" }, 
      'bar', 
      'baz'
    ]
  }
}

A selector ending with [*] like a.b[*] acts like an iterator. It will select the array at a.b and store each element in the array in the Store separately. Each element will be replaced with the reference to the stored payload.

/* selector: 'a.b[*]' */
{
  a: {
    b: [
      { "@middy-store": "s3://bucket/key" }, 
      { "@middy-store": "s3://bucket/key" }, 
      { "@middy-store": "s3://bucket/key" }
    ]
  }
}

Size Limit

middy-store will calculate the size of the entire output returned from the handler. The size is calculated by stringifying the output, if it's not already a string, and calculating the UTF-8 encoded size of the string in bytes. It will then compare this size to the configured minSize in the storingOptions config. If the output size is equal to or greater than the minSize, it will store the output or a part of it in the Store.

export const handler = middy()
  .use(
    middyStore({
      stores: [new S3Store({ /* S3 options */ })],
      storingOptions: {
        minSize: Sizes.STEP_FUNCTIONS,  /* 256KB */
        // minSize: Sizes.LAMBDA_SYNC,  /* 6MB */
        // minSize: Sizes.LAMBDA_ASYNC, /* 256KB */
        // minSize: 1024 * 1024,        /* 1MB */
        // minSize: Sizes.ZERO,         /* 0 */
        // minSize: Sizes.INFINITY,     /* Infinity */
        // minSize: Sizes.kb(512),      /* 512KB */
        // minSize: Sizes.mb(1),        /* 1MB */
      }
    })
  )
  .handler(async () => output);

await handler({});

middy-store provides a Sizes helper with some predefined limits for Lambda and Step Functions. If minSize is not specified, it will use Sizes.STEP_FUNCTIONS with 256KB as the default minimum size. The Sizes.ZERO (equal to the number 0) means that middy-store will always store the payload in a Store, ignoring the actual output size. On the other hand, Sizes.INFINITY (equal to Math.POSITIVE_INFINITY) means that it will never store the payload in a Store.

Stores

Currently, there is only one Store implementation for Amazon S3, but I'm planning to implement a Store backed by DynamoDB and DAX. DynamoDB, with its Time-To-Live (TTL) feature, provides a great option for short-term payloads that only need to exist during the execution of a workflow like Step Functions.

Amazon S3

The middy-store-s3 package provides a store implementation for Amazon S3. It uses the official @aws-sdk/client-s3 package to interact with S3.

import { middyStore } from 'middy-store';
import { S3Store } from 'middy-store-s3';

const handler = middy()
  .use(
    middyStore({
      stores: [
        new S3Store({
          config: { region: "us-east-1" },
          bucket: "bucket",
          key: () => randomUUID(),
          format: "arn",
        }),
      ],
    }),
  )
  .handler(async (input) => {
    return { /* ... */ };
  });

The S3Store only requires a bucket where the payloads are being stored. The key is optional and defaults to randomUUID(). The format configures the style of the reference that is returned after a payload is stored. The supported formats include arn, object, or one of the URL formats from the amazon-s3-url package. It's important to note that S3Store can load any of these formats; the format config only concerns the returned reference. The config is the S3 client configuration and is optional. If not set, the S3 client will resolve the config (credentials, region, etc.) from the environment or file system.

Custom Store

A new Store can be implemented as a class or a plain object, as long as it provides the required functions from the StoreInterface interface.

Here's an example of a Store to store and load payloads as base64 encoded data URLs:

import { StoreInterface, middyStore } from 'middy-store';

const base64Store: StoreInterface<string, string> = {
  name: "base64",
  /* Reference must be a string starting with "data:text/plain;base64," */
  canLoad: ({ reference }) => {
    return (
      typeof reference === "string" &&
      reference.startsWith("data:text/plain;base64,")
    );
  },
  /* Decode base64 string and parse into object */
  load: async ({ reference }) => {
    const base64 = reference.replace("data:text/plain;base64,", "");
    return Buffer.from(base64, "base64").toString();
  },
  /* Payload must be a string or an object */
  canStore: ({ payload }) => {
    return typeof payload === "string" || typeof payload === "object";
  },
  /* Stringify object and encode as base64 string */
  store: async ({ payload }) => {
    const base64 = Buffer.from(JSON.stringify(payload)).toString("base64");
    return `data:text/plain;base64,${base64}`;
  },
};

const handler = middy()
  .use(
    middyStore({
      stores: [base64Store],
      storingOptions: {
        minSize: Sizes.ZERO, /* Always store the data */ 
      }
    }),
  )
  .handler(async (input) => {
    /* Random text with 100 words */ 
    return `Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.`;
  });

const output = await handler(null, context);

/* Prints: { '@middy-store': 'data:text/plain;base64,IkxvcmVtIGlwc3VtIGRvbG9yIHNpdC...' } */
console.log(output);

This example is the perfect way to try middy-store, because it doesn't rely on external resources like an S3 bucket. You will find it in the repository at examples/custom-store and should be able to run it locally.

Contributions and Feedback

I've been tinkering with the API design for a while, and it's definitely not stable yet. I would love to get feedback on the current state as well as suggestions for changes or improvements. If you are eager to contribute to this project, please go ahead and submit feature requests or pull requests.

Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3 zirkelc / middy-store

Middleware for Step Functions: Automatically Store and Load Payloads

Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3 Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3 Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3

Middleware middy-store

middy-store is a middleware for Lambda that automatically stores and loads payloads from and to a Store like Amazon S3 or potentially other services.

Installation

You will need @middy/core >= v5 to use middy-store Please be aware that the API is not stable yet and might change in the future. To avoid accidental breaking changes, please pin the version of middy-store and its sub-packages in your package.json to an exact version.

npm install --save-exact @middy/core middy-store middy-store-s3 
Enter fullscreen mode Exit fullscreen mode

Motivation

AWS services have certain limits that one must be aware of. For example, AWS Lambda has a payload limit of 6MB for synchronous invocations and 256KB for asynchronous invocations. AWS Step Functions allows for a maximum input or output size of 256KB of data as a UTF-8 encoded string. If you exceed this limit when returning data, you will encounter the infamous States.DataLimitExceeded exception.

Middleware for Step Functions: Automatically Store and Load Payloads from Amazon S3

The usual workaround for this…

View on GitHub

以上是Step Functions 中间件:自动存储和加载来自 Amazon S3 的有效负载的详细内容。更多信息请关注PHP中文网其他相关文章!

声明:
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系admin@php.cn