首頁  >  文章  >  後端開發  >  JSON、FlatBuffers、Protocol Buffers

JSON、FlatBuffers、Protocol Buffers

WBOY
WBOY原創
2024-08-08 01:31:54242瀏覽

當我們考慮服務/微服務之間的通訊時,首先想到的選項是古老的 JSON。這並不是沒有道理的,因為這種格式有其優點,例如:

  • 電腦和人類都很容易閱讀;
  • 所有現代程式語言都可以讀取並產生 JSON;
  • 它比之前的替代方案 Jurassic XML 簡潔得多。

使用 JSON 是公司日常生活中開發的絕大多數 API 的建議。但在某些情況下,性能至關重要,我們可能需要考慮其他替代方案。這篇文章旨在展示應用程式之間通訊時 JSON 的兩種替代方案。

但是 JSON 有什麼問題呢?它的優點之一是“易於人類閱讀”,但這可能是性能方面的弱點。事實上,我們需要將 JSON 內容轉換為我們所使用的程式語言已知的某種結構。此規則的例外是如果我們使用 JavaScript,因為 JSON 是它的本機。但是,如果您使用另一種語言(例如 Go),我們需要解析數據,如下面的(不完整)程式碼範例所示:

type event struct {
    ID      uuid.UUID
    Type    string `json:"type"`
    Source  string `json:"source"`
    Subject string `json:"subject"`
    Time    string `json:"time"`
    Data    string `json:"data"`
}

var e event
err := json.NewDecoder(data).Decode(&e)
if err != nil {
    http.Error(w, err.Error(), http.StatusBadRequest)
}

為了解決這個問題,我們可以測試兩個替代方案,Protocol Buffers 和 Flatbuffers。

協定緩衝區

Protobuf(協議緩衝區),由 Google 創建,根據官方網站:

協定緩衝區是 Google 的語言中立、平台中立、可擴展的序列化結構化資料機制 - 類似於 XML,但更小、更快、更簡單。您可以一次定義資料的結構方式。然後,您可以使用專門產生的原始程式碼,使用多種語言在各種資料流中快速寫入和讀取結構化資料。

Protobuf 通常與 gRPC 結合使用(但不一定),它是一種二進位協議,與 JSON 文字格式相比,它顯著提高了效能。但它「遭受」與 JSON 相同的問題:我們需要將其解析為我們語言的資料結構。例如,在 Go 中:

//generated code
type Event struct {
    state         protoimpl.MessageState
    sizeCache     protoimpl.SizeCache
    unknownFields protoimpl.UnknownFields

    Type    string `protobuf:"bytes,1,opt,name=type,proto3" json:"type,omitempty"`
    Subject string `protobuf:"bytes,2,opt,name=subject,proto3" json:"subject,omitempty"`
    Source  string `protobuf:"bytes,3,opt,name=source,proto3" json:"source,omitempty"`
    Time    string `protobuf:"bytes,4,opt,name=time,proto3" json:"time,omitempty"`
    Data    string `protobuf:"bytes,5,opt,name=data,proto3" json:"data,omitempty"`
}

e := Event{}
err := proto.Unmarshal(data, &e)
if err != nil {
    http.Error(w, err.Error(), http.StatusBadRequest)
}

採用二進位協定為我們帶來了效能提升,但我們仍然需要解決資料解析的問題。我們的第三個競爭對手致力於解決這個問題。

平面緩衝區

依官網:

FlatBuffers 是一個高效率的跨平台序列化函式庫,適用於 C++、C#、C、Go、Java、Kotlin、JavaScript、Lobster、Lua、TypeScript、PHP、Python、Rust 和 Swift。它最初是在 Google 創建的,用於遊戲開發和其他性能關鍵型應用程式。

雖然最初是為了遊戲開發而創建的,但它非常適合我們在本文中研究的環境。它的優點是,除了是二進位協定之外,我們不需要解析資料。例如,在 Go 中:

//generated code
e := events.GetRootAsEvent(data, 0)

//we can use the data directly
saveEvent(string(e.Type()), string(e.Source()), string(e.Subject()), string(e.Time()), string(e.Data()))

但這兩種替代方案比 JSON 效能提高了多少?讓我們調查一下...

應用

我想到的第一個問題是「我如何將其應用到實際場景中?」。我想像了以下場景:

一家擁有行動應用程式的公司,每天有數百萬客戶訪問,具有內部微服務架構,需要保存用戶和系統生成的事件以用於審計目的。

這是一個真實的場景。如此真實,以至於我在工作的公司每天都生活在其中:)

JSON vs FlatBuffers vs Protocol Buffers

注意:上面的場景是一個簡化的情況,並不代表團隊應用程式的實際複雜度。它具有教育目的。

第一步是在 Protocol Buffers 和 Flatbuffers 中定義一個事件。兩者都有自己的用於定義模式的語言,然後我們可以使用它來產生我們將使用的語言的程式碼。我不會深入研究每個方案的細節,因為這很容易在文件中找到。

檔案 event.proto 具有 Protocol Buffer 定義:

syntax = "proto3";
package events;

option go_package = "./events_pb";

message Event {
    string type = 1;
    string subject = 2;
    string source = 3;
    string time = 4;
    string data = 5;
}

檔案 event.fbs 在 Flatbuffers 中具有等效項:

namespace events;

table Event {
    type: string;
    subject:string;
    source:string;
    time:string;
    data:string;
}

root_type Event;

下一步是使用這些定義來產生必要的程式碼。以下命令在 macOS 上安裝相依性:

go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
brew install protobuf
protoc -I=. --go_out=./ event.proto
brew install flatbuffers
flatc --go event.fbs

結果是創建了 Go 套件來操作每種格式的資料。

滿足要求後,下一步就是實作事件 API。 main.go 看起來像這樣:

package main

import (
    "fmt"
    "net/http"
    "os"

    "github.com/go-chi/chi/v5"
    "github.com/go-chi/chi/v5/middleware"
    "github.com/google/uuid"
)

func main() {
    r := handlers()
    http.ListenAndServe(":3000", r)
}

func handlers() *chi.Mux {
    r := chi.NewRouter()
    if os.Getenv("DEBUG") != "false" {
        r.Use(middleware.Logger)
    }
    r.Post("/json", processJSON())
    r.Post("/fb", processFB())
    r.Post("/pb", processPB())
    return r
}

func saveEvent(evType, source, subject, time, data string) {
    if os.Getenv("DEBUG") != "false" {
        id := uuid.New()
        q := fmt.Sprintf("insert into event values('%s', '%s', '%s', '%s', '%s', '%s')", id, evType, source, subject, time, data)
        fmt.Println(q)
    }
    // save event to database
}

為了更好的組織,我建立了檔案來分隔每個函數​​,如下所示:

package main

import (
    "encoding/json"
    "net/http"

    "github.com/google/uuid"
)

type event struct {
    ID      uuid.UUID
    Type    string `json:"type"`
    Source  string `json:"source"`
    Subject string `json:"subject"`
    Time    string `json:"time"`
    Data    string `json:"data"`
}

func processJSON() http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        var e event
        err := json.NewDecoder(r.Body).Decode(&e)
        if err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
        }
        saveEvent(e.Type, e.Source, e.Subject, e.Time, e.Data)
        w.WriteHeader(http.StatusCreated)
        w.Write([]byte("json received"))
    }
}

package main

import (
    "io"
    "net/http"

    "github.com/eminetto/post-flatbuffers/events_pb"
    "google.golang.org/protobuf/proto"
)

func processPB() http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        body := r.Body
        data, _ := io.ReadAll(body)

        e := events_pb.Event{}
        err := proto.Unmarshal(data, &e)
        if err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
        }
        saveEvent(e.GetType(), e.GetSource(), e.GetSubject(), e.GetTime(), e.GetData())
        w.WriteHeader(http.StatusCreated)
        w.Write([]byte("protobuf received"))
    }
}
package main

import (
    "io"
    "net/http"

    "github.com/eminetto/post-flatbuffers/events"
)

func processFB() http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        body := r.Body
        data, _ := io.ReadAll(body)
        e := events.GetRootAsEvent(data, 0)
        saveEvent(string(e.Type()), string(e.Source()), string(e.Subject()), string(e.Time()), string(e.Data()))
        w.WriteHeader(http.StatusCreated)
        w.Write([]byte("flatbuffer received"))
    }
}

In the functions processPB() and processFB(), we can see how the generated packages are used to manipulate the data.

Benchmark

The last step of our proof of concept is generating the benchmark to compare the formats. I used the Go stdlib benchmark package for this.

The file main_test.go has tests for each format:

package main

import (
    "bytes"
    "fmt"
    "net/http"
    "net/http/httptest"
    "os"
    "strings"
    "testing"

    "github.com/eminetto/post-flatbuffers/events"
    "github.com/eminetto/post-flatbuffers/events_pb"
    flatbuffers "github.com/google/flatbuffers/go"
    "google.golang.org/protobuf/proto"
)

func benchSetup() {
    os.Setenv("DEBUG", "false")
}

func BenchmarkJSON(b *testing.B) {
    benchSetup()
    r := handlers()
    payload := fmt.Sprintf(`{
        "type": "button.clicked",
        "source": "Login",
        "subject": "user1000",
        "time": "2018-04-05T17:31:00Z",
        "data": "User clicked because X"}`)
    for i := 0; i < b.N; i++ {
        w := httptest.NewRecorder()
        req, _ := http.NewRequest("POST", "/json", strings.NewReader(payload))
        r.ServeHTTP(w, req)
        if w.Code != http.StatusCreated {
            b.Errorf("expected status 201, got %d", w.Code)
        }
    }
}

func BenchmarkFlatBuffers(b *testing.B) {
    benchSetup()
    r := handlers()
    builder := flatbuffers.NewBuilder(1024)
    evtType := builder.CreateString("button.clicked")
    evtSource := builder.CreateString("service-b")
    evtSubject := builder.CreateString("user1000")
    evtTime := builder.CreateString("2018-04-05T17:31:00Z")
    evtData := builder.CreateString("User clicked because X")

    events.EventStart(builder)
    events.EventAddType(builder, evtType)
    events.EventAddSource(builder, evtSource)
    events.EventAddSubject(builder, evtSubject)
    events.EventAddTime(builder, evtTime)
    events.EventAddData(builder, evtData)
    evt := events.EventEnd(builder)
    builder.Finish(evt)

    buff := builder.FinishedBytes()
    for i := 0; i < b.N; i++ {
        w := httptest.NewRecorder()
        req, _ := http.NewRequest("POST", "/fb", bytes.NewReader(buff))
        r.ServeHTTP(w, req)
        if w.Code != http.StatusCreated {
            b.Errorf("expected status 201, got %d", w.Code)
        }
    }
}

func BenchmarkProtobuffer(b *testing.B) {
    benchSetup()
    r := handlers()
    evt := events_pb.Event{
        Type:    "button.clicked",
        Subject: "user1000",
        Source:  "service-b",
        Time:    "2018-04-05T17:31:00Z",
        Data:    "User clicked because X",
    }
    payload, err := proto.Marshal(&evt)
    if err != nil {
        panic(err)
    }
    for i := 0; i < b.N; i++ {
        w := httptest.NewRecorder()
        req, _ := http.NewRequest("POST", "/pb", bytes.NewReader(payload))
        r.ServeHTTP(w, req)
        if w.Code != http.StatusCreated {
            b.Errorf("expected status 201, got %d", w.Code)
        }
    }
}

It generates an event in each format and sends it to the API.

When we run the benchmark, we have the following result:

Running tool: /opt/homebrew/bin/go test -benchmem -run=^$ -coverprofile=/var/folders/vn/gff4w90d37xbfc_2tn3616h40000gn/T/vscode-gojAS4GO/go-code-cover -bench . github.com/eminetto/post-flatbuffers/cmd/api -failfast -v

goos: darwin
goarch: arm64
pkg: github.com/eminetto/post-flatbuffers/cmd/api
BenchmarkJSON
BenchmarkJSON-8               658386          1732 ns/op        2288 B/op         26 allocs/op
BenchmarkFlatBuffers
BenchmarkFlatBuffers-8       1749194           640.5 ns/op      1856 B/op         21 allocs/op
BenchmarkProtobuffer
BenchmarkProtobuffer-8       1497356           696.9 ns/op      1952 B/op         21 allocs/op
PASS
coverage: 77.5% of statements
ok      github.com/eminetto/post-flatbuffers/cmd/api    5.042s

If this is the first time you have analyzed the results of a Go benchmark, I recommend reading this post, where the author describes the details of each column and its meaning.

To make it easier to visualize, I created graphs for the most critical information generated by the benchmark:

‌Number of iterations (higher is better)

JSON vs FlatBuffers vs Protocol Buffers

Nanoseconds per operation (lower is better)

JSON vs FlatBuffers vs Protocol Buffers

Number of bytes allocated per operation (lower is better)

JSON vs FlatBuffers vs Protocol Buffers

Number of allocations per operation (lower is better)

JSON vs FlatBuffers vs Protocol Buffers

Conclusion

The numbers show a great advantage of binary protocols over JSON, especially Flatbuffers. This advantage is that we do not need to parse the data into structures of the language we are using.

Should you refactor your applications to replace JSON with Flatbuffers? Not necessarily. Performance is just one factor that teams must consider when selecting a communication protocol between their services and applications. But if your application receives billions of requests per day, performance improvements like those presented in this post can make a big difference in terms of costs and user experience.

The codes presented here can be found in this repository. I made the examples using the Go language, but both Protocol Buffers and Flatbuffers support different programming languages, so I would love to see other versions of these comparisons. Additionally, other benchmarks can be used, such as network consumption, CPU, etc. (since we only compare memory here).

I hope this post serves as an introduction to these formats and an incentive for new tests and experiments.

Originally published at https://eltonminetto.dev on August 05, 2024

以上是JSON、FlatBuffers、Protocol Buffers的詳細內容。更多資訊請關注PHP中文網其他相關文章!

陳述:
本文內容由網友自願投稿,版權歸原作者所有。本站不承擔相應的法律責任。如發現涉嫌抄襲或侵權的內容,請聯絡admin@php.cn