JSON vs FlatBuffers vs Protocol Buffers

WBOY (original), 2024-08-08 01:31:54

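When we think about communication between services/microservices, the first option that comes to mind is good old JSON. And that is not without reason, since the format has advantages such as:

  • it is easy for both computers and humans to read;
  • every modern programming language can read and generate JSON;
  • it is far more concise than the previous alternative, the Jurassic XML.

Using JSON is the recommendation for the vast majority of APIs developed in the day-to-day of companies. But in some cases, where performance is critical, we may need to consider other alternatives. This post aims to show two alternatives to JSON for communication between applications.

But what is the problem with JSON? One of its advantages, being "easy for humans to read", can be a weak point for performance. The fact is that we need to convert the JSON content into some structure known to the programming language we are using. An exception to this rule is JavaScript, where JSON is native. But if you are using another language, Go for example, we need to parse the data, as the following (incomplete) code example shows: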

type event struct {
    ID      uuid.UUID
    Type    string `json:"type"`
    Source  string `json:"source"`
    Subject string `json:"subject"`
    Time    string `json:"time"`
    Data    string `json:"data"`
}

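var e event
// Decode parses the JSON text and fills the struct field by field.
err := json.NewDecoder(data).Decode(&e)
if err != nil {
    http.Error(w, err.Error(), http.StatusBadRequest)
    return // stop here instead of continuing with a half-filled event
}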
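
To make the parsing step concrete, here is a minimal, self-contained sketch that uses only the standard library. It assumes a trimmed copy of the struct above (the ID field is dropped to avoid the uuid dependency), and decodeEvent is a hypothetical helper name, not something from the original project. The payload is the one used later in the benchmark.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// event is a trimmed copy of the struct above (no ID field, stdlib only).
type event struct {
	Type    string `json:"type"`
	Source  string `json:"source"`
	Subject string `json:"subject"`
	Time    string `json:"time"`
	Data    string `json:"data"`
}

// decodeEvent (hypothetical helper) parses a JSON document into an event.
// Every call walks the text and copies each value into the struct.
func decodeEvent(payload string) (event, error) {
	var e event
	err := json.NewDecoder(strings.NewReader(payload)).Decode(&e)
	return e, err
}

func main() {
	payload := `{
        "type": "button.clicked",
        "source": "Login",
        "subject": "user1000",
        "time": "2018-04-05T17:31:00Z",
        "data": "User clicked because X"}`

	e, err := decodeEvent(payload)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(e.Type, e.Subject) // prints "button.clicked user1000"
}
```

The cost discussed here is the work done inside Decode: scanning the text, matching keys to fields, and allocating the strings.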

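To address this parsing cost, we can test two alternatives: Protocol Buffers and FlatBuffers.

Protocol Buffers

Protobuf (Protocol Buffers) was created by Google. According to the official website:

"Protocol Buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data, like XML, but smaller, faster, and simpler. You define how you want your data to be structured once. Then, you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages."

Usually used together with gRPC (but not necessarily), Protobuf is a binary protocol, which significantly improves performance compared to JSON's text format. But it "suffers" from the same problem as JSON: we need to parse it into a data structure of our language. For example, in Go: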

//generated code
type Event struct {
    state         protoimpl.MessageState
    sizeCache     protoimpl.SizeCache
    unknownFields protoimpl.UnknownFields

    Type    string `protobuf:"bytes,1,opt,name=type,proto3" json:"type,omitempty"`
    Subject string `protobuf:"bytes,2,opt,name=subject,proto3" json:"subject,omitempty"`
    Source  string `protobuf:"bytes,3,opt,name=source,proto3" json:"source,omitempty"`
    Time    string `protobuf:"bytes,4,opt,name=time,proto3" json:"time,omitempty"`
    Data    string `protobuf:"bytes,5,opt,name=data,proto3" json:"data,omitempty"`
}

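e := Event{}
// Unmarshal still decodes the whole message into the generated struct.
err := proto.Unmarshal(data, &e)
if err != nil {
    http.Error(w, err.Error(), http.StatusBadRequest)
    return // stop here instead of continuing with an invalid event
}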
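
To give a feel for what "binary protocol" means here, the sketch below hand-encodes two string fields the way the Protocol Buffers encoding documentation describes length-delimited fields: a tag, a length, then the raw bytes. It is only an illustration; appendStringField is a hypothetical helper, and real code should keep using proto.Marshal and the generated types.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendStringField (hypothetical helper) appends one length-delimited
// field: a tag varint (field number << 3 | wire type 2), a length varint,
// then the bytes of the value. No field names are sent.
func appendStringField(buf []byte, fieldNumber int, value string) []byte {
	tag := uint64(fieldNumber)<<3 | 2
	buf = binary.AppendUvarint(buf, tag)
	buf = binary.AppendUvarint(buf, uint64(len(value)))
	return append(buf, value...)
}

func main() {
	var msg []byte
	msg = appendStringField(msg, 1, "button.clicked") // field 1: type
	msg = appendStringField(msg, 2, "user1000")       // field 2: subject

	jsonText := `{"type":"button.clicked","subject":"user1000"}`

	fmt.Printf("% x\n", msg)
	fmt.Println("binary:", len(msg), "bytes") // binary: 26 bytes
	fmt.Println("json:  ", len(jsonText), "bytes")
}
```

Each field costs two bytes of overhead here, because the field name is replaced by the number declared in the schema. That is part of why the payload is smaller, but the receiver still has to walk these bytes and fill a struct.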

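Adopting a binary protocol gives us a performance gain, but we still need to solve the data-parsing problem. Our third competitor focuses on solving exactly that.

FlatBuffers

According to the official website:

"FlatBuffers is an efficient cross platform serialization library for C++, C#, C, Go, Java, Kotlin, JavaScript, Lobster, Lua, TypeScript, PHP, Python, Rust and Swift. It was originally created at Google for game development and other performance-critical applications."

Although initially created for game development, it fits perfectly into the environment we are studying in this post. Its advantage is that, besides being a binary protocol, we do not need to parse the data: the generated accessors read each field directly from the received byte slice. For example, in Go: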

//generated code
e := events.GetRootAsEvent(data, 0)

//we can use the data directly
saveEvent(string(e.Type()), string(e.Source()), string(e.Subject()), string(e.Time()), string(e.Data()))
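
The idea of reading fields without a parsing step can be sketched with a deliberately simplified layout. Note that this is not the real FlatBuffers format (which uses vtables and more bookkeeping); the layout and the build and field helpers are invented for illustration, and bounds checks are omitted for brevity.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// build lays the fields out as a table of uint32 offsets (one per field),
// followed by each value stored as a uint32 length plus its bytes.
// Everything is little-endian. This layout is invented for illustration.
func build(fields ...string) []byte {
	buf := make([]byte, 4*len(fields)) // offset table, filled in below
	for i, f := range fields {
		binary.LittleEndian.PutUint32(buf[4*i:], uint32(len(buf)))
		buf = binary.LittleEndian.AppendUint32(buf, uint32(len(f)))
		buf = append(buf, f...)
	}
	return buf
}

// field returns field i as a sub-slice of buf: two integer reads and a
// slice expression, with no copy and no decoding pass over other fields.
func field(buf []byte, i int) []byte {
	off := binary.LittleEndian.Uint32(buf[4*i:])
	n := binary.LittleEndian.Uint32(buf[off:])
	return buf[off+4 : off+4+n]
}

func main() {
	buf := build("button.clicked", "service-b", "user1000")

	// Fields can be read in any order, straight from the received bytes.
	fmt.Println(string(field(buf, 2))) // user1000
	fmt.Println(string(field(buf, 0))) // button.clicked
}
```

The accessors generated by flatc follow the same spirit: they return byte slices that point into the original buffer, which is why the example above wraps them in string(...) only at the point of use.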

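But how much faster than JSON are these two alternatives? Let's investigate...

The application

The first question that came to my mind was, "How can I apply this in a real scenario?" I imagined the following scenario:

A company with a mobile application, accessed daily by millions of customers, with an internal microservices architecture, that needs to save events generated by users and systems for auditing purposes.

This is a genuine scenario. So real that I live with it every day at the company where I work :)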

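Note: The scenario above is a simplification and does not represent the actual complexity of the team's application. It serves educational purposes.

The first step is to define an event in Protocol Buffers and FlatBuffers. Both have their own language for defining schemas, which we can then use to generate code in the language we will use. I won't go into the details of each schema, as they are easy to find in the documentation.

The file event.proto has the Protocol Buffers definition: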

syntax = "proto3";
package events;

option go_package = "./events_pb";

message Event {
    string type = 1;
    string subject = 2;
    string source = 3;
    string time = 4;
    string data = 5;
}

The file event.fbs has the equivalent in FlatBuffers:

namespace events;

table Event {
    type: string;
    subject:string;
    source:string;
    time:string;
    data:string;
}

root_type Event;

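The next step is to use these definitions to generate the necessary code. The following commands install the dependencies on macOS and run the code generators: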

go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
brew install protobuf
protoc -I=. --go_out=./ event.proto
brew install flatbuffers
flatc --go event.fbs

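The result is the creation of Go packages to manipulate data in each format.

With the requirements fulfilled, the next step is implementing the event API. The main.go looks like this: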

package main

import (
    "fmt"
    "net/http"
    "os"

    "github.com/go-chi/chi/v5"
    "github.com/go-chi/chi/v5/middleware"
    "github.com/google/uuid"
)

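func main() {
    r := handlers()
    // ListenAndServe only returns on failure, so report the error and exit.
    if err := http.ListenAndServe(":3000", r); err != nil {
        fmt.Println(err)
        os.Exit(1)
    }
}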

func handlers() *chi.Mux {
    r := chi.NewRouter()
    if os.Getenv("DEBUG") != "false" {
        r.Use(middleware.Logger)
    }
    r.Post("/json", processJSON())
    r.Post("/fb", processFB())
    r.Post("/pb", processPB())
    return r
}

func saveEvent(evType, source, subject, time, data string) {
    if os.Getenv("DEBUG") != "false" {
        id := uuid.New()
        q := fmt.Sprintf("insert into event values('%s', '%s', '%s', '%s', '%s', '%s')", id, evType, source, subject, time, data)
        fmt.Println(q)
    }
    // save event to database
}

For better organization, I created a separate file for each handler function, as follows:

package main

import (
    "encoding/json"
    "net/http"

    "github.com/google/uuid"
)

type event struct {
    ID      uuid.UUID
    Type    string `json:"type"`
    Source  string `json:"source"`
    Subject string `json:"subject"`
    Time    string `json:"time"`
    Data    string `json:"data"`
}

func processJSON() http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        var e event
        err := json.NewDecoder(r.Body).Decode(&e)
        if err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return // do not save an event that failed to decode
        }
        saveEvent(e.Type, e.Source, e.Subject, e.Time, e.Data)
        w.WriteHeader(http.StatusCreated)
        w.Write([]byte("json received"))
    }
}

package main

import (
    "io"
    "net/http"

    "github.com/eminetto/post-flatbuffers/events_pb"
    "google.golang.org/protobuf/proto"
)

func processPB() http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
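        // Read the whole body; the protobuf decoder works on a byte slice.
        data, err := io.ReadAll(r.Body)
        if err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }

        e := events_pb.Event{}
        if err := proto.Unmarshal(data, &e); err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return // do not save an event that failed to decode
        }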
        saveEvent(e.GetType(), e.GetSource(), e.GetSubject(), e.GetTime(), e.GetData())
        w.WriteHeader(http.StatusCreated)
        w.Write([]byte("protobuf received"))
    }
}

package main

import (
    "io"
    "net/http"

    "github.com/eminetto/post-flatbuffers/events"
)

func processFB() http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
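        // Read the whole body; FlatBuffers reads fields directly from this slice.
        data, err := io.ReadAll(r.Body)
        if err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }
        e := events.GetRootAsEvent(data, 0)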
        saveEvent(string(e.Type()), string(e.Source()), string(e.Subject()), string(e.Time()), string(e.Data()))
        w.WriteHeader(http.StatusCreated)
        w.Write([]byte("flatbuffer received"))
    }
}

In the functions processPB() and processFB(), we can see how the generated packages are used to manipulate the data.
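
A handler of this shape can also be exercised without starting a server, which is the same mechanism the benchmark relies on. The sketch below is a trimmed, stdlib-only copy of the JSON handler: it uses no chi router, the struct keeps a single field, and saveEvent is replaced by a counter. The post helper and the saved counter are hypothetical names introduced for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// event keeps only one field to keep the sketch short.
type event struct {
	Type string `json:"type"`
}

// processJSON is a trimmed copy of the handler above. Instead of calling
// saveEvent, it increments a counter so the effect is easy to observe.
func processJSON(saved *int) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var e event
		if err := json.NewDecoder(r.Body).Decode(&e); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return // a bad payload must not reach the save step
		}
		*saved++
		w.WriteHeader(http.StatusCreated)
	}
}

// post (hypothetical helper) sends body to h in memory and returns the
// status code captured by the recorder.
func post(h http.Handler, body string) int {
	w := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/json", strings.NewReader(body))
	h.ServeHTTP(w, req)
	return w.Code
}

func main() {
	saved := 0
	h := processJSON(&saved)

	fmt.Println(post(h, `{"type":"button.clicked"}`)) // 201
	fmt.Println(post(h, `not json`))                   // 400
	fmt.Println("saved:", saved)                       // saved: 1
}
```

The early return after http.Error is what keeps the invalid request from being counted as saved.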

Benchmark

The last step of our proof of concept is generating the benchmark to compare the formats. I used the Go stdlib benchmark package for this.

The file main_test.go has tests for each format:

package main

import (
    "bytes"
    "fmt"
    "net/http"
    "net/http/httptest"
    "os"
    "strings"
    "testing"

    "github.com/eminetto/post-flatbuffers/events"
    "github.com/eminetto/post-flatbuffers/events_pb"
    flatbuffers "github.com/google/flatbuffers/go"
    "google.golang.org/protobuf/proto"
)

func benchSetup() {
    os.Setenv("DEBUG", "false")
}

func BenchmarkJSON(b *testing.B) {
    benchSetup()
    r := handlers()
    payload := fmt.Sprintf(`{
        "type": "button.clicked",
        "source": "Login",
        "subject": "user1000",
        "time": "2018-04-05T17:31:00Z",
        "data": "User clicked because X"}`)
    for i := 0; i < b.N; i++ {
        w := httptest.NewRecorder()
        req, _ := http.NewRequest("POST", "/json", strings.NewReader(payload))
        r.ServeHTTP(w, req)
        if w.Code != http.StatusCreated {
            b.Errorf("expected status 201, got %d", w.Code)
        }
    }
}

func BenchmarkFlatBuffers(b *testing.B) {
    benchSetup()
    r := handlers()
    builder := flatbuffers.NewBuilder(1024)
    evtType := builder.CreateString("button.clicked")
    evtSource := builder.CreateString("service-b")
    evtSubject := builder.CreateString("user1000")
    evtTime := builder.CreateString("2018-04-05T17:31:00Z")
    evtData := builder.CreateString("User clicked because X")

    events.EventStart(builder)
    events.EventAddType(builder, evtType)
    events.EventAddSource(builder, evtSource)
    events.EventAddSubject(builder, evtSubject)
    events.EventAddTime(builder, evtTime)
    events.EventAddData(builder, evtData)
    evt := events.EventEnd(builder)
    builder.Finish(evt)

    buff := builder.FinishedBytes()
    for i := 0; i < b.N; i++ {
        w := httptest.NewRecorder()
        req, _ := http.NewRequest("POST", "/fb", bytes.NewReader(buff))
        r.ServeHTTP(w, req)
        if w.Code != http.StatusCreated {
            b.Errorf("expected status 201, got %d", w.Code)
        }
    }
}

func BenchmarkProtobuffer(b *testing.B) {
    benchSetup()
    r := handlers()
    evt := events_pb.Event{
        Type:    "button.clicked",
        Subject: "user1000",
        Source:  "service-b",
        Time:    "2018-04-05T17:31:00Z",
        Data:    "User clicked because X",
    }
    payload, err := proto.Marshal(&evt)
    if err != nil {
        panic(err)
    }
    for i := 0; i < b.N; i++ {
        w := httptest.NewRecorder()
        req, _ := http.NewRequest("POST", "/pb", bytes.NewReader(payload))
        r.ServeHTTP(w, req)
        if w.Code != http.StatusCreated {
            b.Errorf("expected status 201, got %d", w.Code)
        }
    }
}

It generates an event in each format and sends it to the API.

When we run the benchmark, we have the following result:

Running tool: /opt/homebrew/bin/go test -benchmem -run=^$ -coverprofile=/var/folders/vn/gff4w90d37xbfc_2tn3616h40000gn/T/vscode-gojAS4GO/go-code-cover -bench . github.com/eminetto/post-flatbuffers/cmd/api -failfast -v

goos: darwin
goarch: arm64
pkg: github.com/eminetto/post-flatbuffers/cmd/api
BenchmarkJSON
BenchmarkJSON-8               658386          1732 ns/op        2288 B/op         26 allocs/op
BenchmarkFlatBuffers
BenchmarkFlatBuffers-8       1749194           640.5 ns/op      1856 B/op         21 allocs/op
BenchmarkProtobuffer
BenchmarkProtobuffer-8       1497356           696.9 ns/op      1952 B/op         21 allocs/op
PASS
coverage: 77.5% of statements
ok      github.com/eminetto/post-flatbuffers/cmd/api    5.042s

If this is the first time you have analyzed the results of a Go benchmark, I recommend reading this post, where the author describes the details of each column and its meaning.
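
As a quick orientation: each result line shows the benchmark name with a suffix (the -8 is the GOMAXPROCS value used), the number of iterations run, the average time per iteration (ns/op), and, because of -benchmem, the bytes and allocations per iteration (B/op and allocs/op). The same values can be read programmatically with testing.Benchmark from the standard library. The sketch below is a small stand-alone example, not part of the original repository; measure is a hypothetical helper and it only times a JSON decode, not the full HTTP handler.

```go
package main

import (
	"encoding/json"
	"fmt"
	"testing"
)

// event keeps two fields to keep the sketch short.
type event struct {
	Type    string `json:"type"`
	Subject string `json:"subject"`
}

// measure (hypothetical helper) runs a small benchmark and returns the
// raw result, from which the usual columns can be derived.
func measure() testing.BenchmarkResult {
	payload := []byte(`{"type":"button.clicked","subject":"user1000"}`)
	return testing.Benchmark(func(b *testing.B) {
		// b.N is chosen by the testing package until timings are stable.
		for i := 0; i < b.N; i++ {
			var e event
			if err := json.Unmarshal(payload, &e); err != nil {
				b.Fatal(err)
			}
		}
	})
}

func main() {
	res := measure()
	fmt.Println("iterations:", res.N)                   // second column
	fmt.Println("ns/op:     ", res.NsPerOp())           // time per iteration
	fmt.Println("B/op:      ", res.AllocedBytesPerOp()) // bytes per iteration
	fmt.Println("allocs/op: ", res.AllocsPerOp())       // allocations per iteration
}
```

The absolute numbers depend on the machine, so only comparisons made on the same hardware and Go version are meaningful.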

To make it easier to visualize, I created graphs for the most critical information generated by the benchmark:

[Chart: Number of iterations (higher is better)]

[Chart: Nanoseconds per operation (lower is better)]

[Chart: Number of bytes allocated per operation (lower is better)]

[Chart: Number of allocations per operation (lower is better)]

Conclusion

The numbers show a clear advantage for the binary protocols over JSON: per operation, Protocol Buffers was about 2.5 times faster and FlatBuffers about 2.7 times faster (696.9 and 640.5 ns/op against 1732 ns/op). FlatBuffers comes out slightly ahead, and its edge is that we do not need to parse the data into structures of the language we are using.

Should you refactor your applications to replace JSON with FlatBuffers? Not necessarily. Performance is just one factor that teams must consider when selecting a communication protocol between their services and applications. But if your application receives billions of requests per day, performance improvements like those presented in this post can make a big difference in terms of costs and user experience.

The code presented here can be found in this repository. I made the examples in Go, but both Protocol Buffers and FlatBuffers support many programming languages, so I would love to see other versions of these comparisons. Additionally, other metrics could be benchmarked, such as network consumption, CPU, etc. (here we only compared time per operation and memory allocations).

I hope this post serves as an introduction to these formats and an incentive for new tests and experiments.

Originally published at https://eltonminetto.dev on August 05, 2024
