Avro File

WBOY | Original | 2024-08-30 16:17:32

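Avro is a row-based data serialization framework developed within the Apache Hadoop project. An Avro file is a data file that stores serialized data in a compact binary format. The schema is written in JSON. When Avro files are used with Apache MapReduce, they carry sync markers so that a huge dataset can be split into subsets for processing. Avro also defines a container file for storing persistent data that is easy to read and write, with no extra configuration needed.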


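Overview of Avro File

Avro is a data serialization system that provides rich data structures and a compact, fast binary data format. It also has a container file for storing persistent data and supports RPC. Because it integrates simply, it can be used with many languages, and no code generation is required to read or write data files. Code generation is an optional optimization that is only worth implementing for statically typed languages.

An Avro file generally has two parts: the first is a header that holds the schema, and the second is the binary data. For example, suppose we open an Avro file in a text editor. We can see the two segments: the first begins with the characters Obj and contains the schema as readable JSON, and the second contains the binary-encoded data. When working in Boomi, we also need to confirm that the file is of a type Boomi can read and write.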
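The Obj marker at the start of the file can be checked programmatically. The following is a minimal sketch, assuming the first bytes of the file have already been read into a byte array; the class name AvroMagicCheck and the method hasAvroMagic are hypothetical names chosen for illustration and are not part of the Avro library.

```java
import java.util.Arrays;

final class AvroMagicCheck {
    // An Avro object container file starts with the ASCII bytes 'O', 'b', 'j'
    // followed by the byte 1.
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    /** Returns true if the given bytes begin with the Avro container-file marker. */
    static boolean hasAvroMagic(byte[] data) {
        if (data == null || data.length < MAGIC.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, MAGIC.length), MAGIC);
    }

    public static void main(String[] args) {
        // Hypothetical first bytes of a file, used only for this demonstration.
        byte[] sample = {'O', 'b', 'j', 1, 4};
        System.out.println(hasAvroMagic(sample)); // prints true
    }
}
```

A check like this only tells us that the file looks like an Avro container file; reading the schema and the records still requires an Avro reader.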

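Avro File Configuration

Let us look at the configuration of Avro files, where different parameters change how Avro data files are handled.

When we use Hadoop and want to ignore files that do not have the .avro extension when reading, we can adjust the parameter avro.mapred.ignore.inputs.without.extension, whose default value is false.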


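To apply the setting above, we go from the Spark session to its Spark context, then to the Hadoop configuration, and set the parameter to true:

spark.sparkContext.hadoopConfiguration.set("avro.mapred.ignore.inputs.without.extension", "true")

When we configure compression, we set the following properties:

Compression codec: spark.sql.avro.compression.codec accepts codecs including snappy and deflate, with snappy as the default.
Compression level: if we set the compression codec to deflate, we can adjust the compression level with spark.sql.avro.deflate.level, whose default is -1.
We can set these on the Spark cluster, for example:

spark.conf.set("spark.sql.avro.compression.codec", "deflate")

spark.conf.set("spark.sql.avro.deflate.level", "4")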


Types of Avro File

Avro has two kinds of types:

1. Primitive types

These include null, boolean, int, long, float, double, bytes, and string.

Schema: {"type": "null"}
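To show why the binary format is compact, here is a minimal sketch of how an int or long value is written in Avro's binary encoding: the value is zig-zag encoded so that small negative numbers stay small, then written as a variable-length sequence of 7-bit groups. The class and method names are hypothetical; in practice the Avro library's encoders do this work.

```java
import java.io.ByteArrayOutputStream;

final class AvroLongCodec {
    /** Encodes a long using zig-zag coding followed by a base-128 variable-length layout. */
    static byte[] encodeLong(long n) {
        long z = (n << 1) ^ (n >> 63); // zig-zag: 0, -1, 1, -2 become 0, 1, 2, 3
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // low 7 bits, continuation bit set
            z >>>= 7;
        }
        out.write((int) z); // last group, continuation bit clear
        return out.toByteArray();
    }

    /** Decodes a value produced by encodeLong. */
    static long decodeLong(byte[] bytes) {
        long z = 0;
        int shift = 0;
        for (byte b : bytes) {
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
            if ((b & 0x80) == 0) {
                break;
            }
        }
        return (z >>> 1) ^ -(z & 1); // undo zig-zag
    }

    public static void main(String[] args) {
        for (long n : new long[] {0, -1, 1, 64}) {
            StringBuilder hex = new StringBuilder();
            for (byte b : encodeLong(n)) {
                hex.append(String.format("%02x ", b));
            }
            System.out.println(n + " -> " + hex.toString().trim());
        }
        // prints:
        // 0 -> 00
        // -1 -> 01
        // 1 -> 02
        // 64 -> 80 01
    }
}
```

Small values take a single byte, which is one reason Avro data files tend to be smaller than the equivalent JSON text.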

2. Complex types

Array:

{
"type": "array",
"items": "long"
}

Map: keys are always strings

{
"type": "map",
"values": "long"
}

Record:

{
"type": "record",
"name": "---",
"doc": "---",
"fields": [
{"name": "--", "type": "int"},
---
]
}

Enum:

{
"type": "enum",
"name": "---",
"doc": "---",
"symbols": ["--", "--"]
}

Fixed: a fixed number of 8-bit unsigned bytes

{
"type": "fixed",
"name": "---",
"size": <number of bytes>
}

Union: the data must match one of the listed schemas

[
"null",
"string",
--
]
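As an illustration of how a union value is stored, the sketch below encodes a value of the union ["null", "string"] by hand. In Avro's binary encoding, a union is written as the zero-based position of the matching schema (encoded as a long), followed by the value itself; a null has no bytes, and a string is its UTF-8 length followed by the bytes. The class NullableStringUnion and its methods are hypothetical names for this sketch only.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class NullableStringUnion {
    // Writes a long as a zig-zag, variable-length value, as Avro does for int and long.
    private static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    /** Encodes a value of the union ["null", "string"]. */
    static byte[] encode(String value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (value == null) {
            writeLong(out, 0); // position 0 is "null"; null itself takes zero bytes
        } else {
            writeLong(out, 1); // position 1 is "string"
            byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
            writeLong(out, utf8.length); // string length in bytes
            out.write(utf8, 0, utf8.length); // string content
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encode(null).length); // prints 1
        System.out.println(encode("a").length);  // prints 3
    }
}
```

The order of the schemas inside the union matters, because the position is what gets written to the file.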

Examples of Avro File

Let us look at examples of reading an Avro file with a schema and without a schema.

Example #1

Avro file with schema:

import java.util.Properties
import java.io.InputStream
import java.io.ByteArrayInputStream
import com.boomi.execution.ExecutionUtil
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;

logger = ExecutionUtil.getBaseLogger();

// Process every document passed into the script.
for (int j = 0; j < dataContext.getDataCount(); j++) {
    InputStream istm = dataContext.getStream(j)
    Properties prop = dataContext.getProperties(j)

    // DataFileStream reads the schema from the header of the Avro file.
    DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>();
    DataFileStream<GenericRecord> dataFileStream = new DataFileStream<GenericRecord>(istm, datumReader);
    Schema sche = dataFileStream.getSchema();
    logger.info("Schema used: " + sche);

    // Store each record as its own JSON document.
    GenericRecord rec = null;
    while (dataFileStream.hasNext()) {
        rec = dataFileStream.next(rec);
        System.out.println(rec);
        InputStream out = new ByteArrayInputStream(rec.toString().getBytes('UTF-8'))
        dataContext.storeStream(out, prop)
    }
    dataFileStream.close()
}

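In the example above, the schema is stored with the Avro file. The script reads the Avro file and produces multiple JSON documents: it imports the relevant packages, obtains the schema from the file, creates the reader objects, and writes each record out as JSON.

Example #2

Avro file without schema: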

import java.util.Properties
import java.io.InputStream
import java.io.ByteArrayInputStream
import com.boomi.execution.ExecutionUtil
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;

logger = ExecutionUtil.getBaseLogger();

// The schema is supplied by the script instead of being taken from the file.
String schemaString = '{"type":"record","name":"college","namespace":"student.avro",' +
    '"fields":[{"name":"title","type":"string","doc":"college title"},{"name":"exam_date","type":"string","doc":"start date"},{"name":"teacher","type":"int","doc":"main character is the teacher in college"}]}'
Schema sche = new Schema.Parser().parse(schemaString)
logger.info("Schema used: " + sche);

for (int k = 0; k < dataContext.getDataCount(); k++) {
    InputStream istm = dataContext.getStream(k)
    Properties prop = dataContext.getProperties(k)

    // Pass the parsed schema to the reader as the expected schema.
    // This assumes the data in the file is compatible with that schema.
    DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>(sche);
    DataFileStream<GenericRecord> dataFileStream = new DataFileStream<GenericRecord>(istm, datumReader);

    // Store each record as its own JSON document.
    GenericRecord rec = null;
    while (dataFileStream.hasNext()) {
        rec = dataFileStream.next(rec);
        System.out.println(rec);
        InputStream out = new ByteArrayInputStream(rec.toString().getBytes('UTF-8'))
        dataContext.storeStream(out, prop)
    }
    dataFileStream.close()
}

The example above reads files without relying on an embedded schema. If the schema is not included with the Avro file, we have to tell the interpreter how to interpret the binary Avro data, so we must supply the schema that was used to write it. In this example the schema is defined as a string in the script; the schema can carry a different name, and it can also be kept at another path.

Conclusion

In this article, we have seen that an Avro file is a data file produced by the Avro data serialization system used with Apache Hadoop. Avro is open source. We have also looked at the configuration of the data files and at examples that help in understanding the concept.
