Home  >  Article  >  Backend Development  >  Convert parquet file to Golang structure with nested elements

Convert parquet file to Golang structure with nested elements

王林
王林forward
2024-02-10 19:10:08769browse

将 parquet 文件转换为带有嵌套元素的 Golang 结构

php Editor Shinichi will introduce how to convert parquet files into Golang structures with nested elements. Parquet is an efficient columnar storage format, and Golang is a powerful programming language. Combining them can help us better process and analyze large amounts of data. By using appropriate libraries and techniques, we can easily parse parquet files into Golang structures and can handle nested elements for better organization and manipulation of data. This article will introduce the implementation steps and precautions in detail to help readers get started easily.

Question content

I'm trying to read a parquet file with nested arrays/structures in go using the xitongsys/parquet-go library. The list data is not read and no values ​​are seen. Below is my structure in golang

type Play struct {
    SID            string   `parquet:"name=si, type=BYTE_ARRAY, convertedtype=UTF8, encoding=PLAIN_DICTIONARY, repetitiontype=OPTIONAL" json:"si,omitempty"`
    TimeStamp      int      `parquet:"name=ts, type=INT64, repetitiontype=OPTIONAL" json:"ts,omitempty"`
    SingleID       int      `parquet:"name=sg, type=INT64, repetitiontype=OPTIONAL" json:"sg,omitempty"`
    PID            int      `parquet:"name=playid, type=INT64, repetitiontype=OPTIONAL" json:"playid,omitempty"`
    StartTimeStamp string   `parquet:"name=startts, type=BYTE_ARRAY,repetitiontype=OPTIONAL"`
    Price          []Price1 `parquet:"name=price, type=LIST, repetitiontype=REQUIRED" json:"price,omitempty"`
}

type Price1 struct {
    CurrID int    `parquet:"name=currId, type=INT64, repetitiontype=REQUIRED" json:"currId,omitempty"`
    LPTag  string `parquet:"name=lptag, type=BYTE_ARRAY,convertedtype=UTF8, repetitiontype=REQUIRED" json:"lptag,omitempty"`
    LPrice Money  `parquet:"name=lpmoney, type=STRUCT" json:"lpmoney,omitempty"`
}

type Money struct {
    AdmCurrCode  string `parquet:"name=admCC, type=BYTE_ARRAY, repetitiontype=OPTIONAL" json:"admCC,omitempty"`
    AdmCurrValue string `parquet:"name=admCV, type=BYTE_ARRAY" json:"admCV,omitempty"`
}

currid and lptag are empty even though the parquet file has valid values

Solution

I found thatgithub.com/segmentio/parquet-gothe package can be correct Read the file. Do you need to stick with the github.com/xitongsys/parquet-go package?

package main

import (
    "fmt"

    "github.com/segmentio/parquet-go"
)

type Play struct {
    SID            string  `parquet:"si"`
    TimeStamp      int     `parquet:"ts"`
    SingleID       int     `parquet:"sg"`
    PID            int     `parquet:"playid"`
    StartTimeStamp string  `parquet:"startts"`
    Price          []Price `parquet:"price,list"`
}

type Price struct {
    CurrID int    `parquet:"currId"`
    LPTag  string `parquet:"lptag"`
    LPrice Money  `parquet:"lpmoney"`
}

type Money struct {
    AdmCurrCode  string `parquet:"admCC"`
    AdmCurrValue string `parquet:"admCV"`
}

func main() {
    rows, err := parquet.ReadFile[Play]("s3.parquet")
    if err != nil {
        panic(err)
    }

    for _, c := range rows {
        fmt.Printf("%+v\n", c)
    }
}

The above is the detailed content of Convert parquet file to Golang structure with nested elements. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:stackoverflow.com. If there is any infringement, please contact admin@php.cn delete