
Features of Hadoop Record I/O


Hadoop Record I/O provides supporting class files and a record description language translator that simplify the serialization and deserialization of records.

Introduction

Any software system of significant complexity requires a mechanism for exchanging data with the outside world. Such exchanges typically involve marshaling and unmarshaling logical units of data to and from data streams (files, network connections, memory buffers, etc.). Applications usually embed their own code for serializing and deserializing the data types they manipulate. This serialization work has several characteristics that make automatic code generation worthwhile. Given a particular output encoding (such as binary or XML), serializing primitive types and simple compositions of primitives is a mechanical task. Hand-written serialization code is prone to bugs, especially when records have many fields or a record definition changes between software versions. Finally, it is useful for applications written in different programming languages to be able to exchange data. This becomes much easier when the data records an application manipulates are described in a language-independent way, and those descriptions are used to derive implementations in different target languages. This document describes Hadoop Record I/O, a mechanism whose purpose is to:

1) Provide a simple specification of serialized data types

2) Provide code generation in different target languages for marshaling and unmarshaling such types

3) Provide target-language-specific support that enables application programmers to integrate the generated code into their applications.

The goals of Hadoop Record I/O are similar to those of mechanisms such as XDR, ASN.1, PADS and ICE. While all of these systems include a DDL that can describe most record types, they differ widely in other respects. Hadoop Record I/O focuses on data serialization and multi-language support, and takes a translator-based approach: Hadoop users describe their data in a simple data description language, the Hadoop DDL translator rcc generates code, and users read and write their data by calling simple stream abstractions; a minimal sketch of this workflow appears below. The following sections list the goals and non-goals of Hadoop Record I/O.
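
To make the translator-based workflow concrete, here is a minimal sketch in Java. The DDL module, the Employee record, the rcc command line, and the class and method names (example.Employee, BinaryRecordOutput, serialize) are illustrative assumptions modeled on the org.apache.hadoop.record package rather than details taken from this article, and may differ from what a given Hadoop version actually generates.

// Hypothetical DDL (employee.jr) that rcc might translate into a Java class:
//
//   module example {
//       class Employee {
//           ustring name;
//           int     id;
//       };
//   }
//
// Hypothetical generation command (illustrative only):
//   rcc -l java employee.jr

import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.hadoop.record.BinaryRecordOutput; // assumed package layout

import example.Employee; // hypothetical class generated by rcc from the DDL above

public class EmployeeWriter {
    public static void main(String[] args) throws IOException {
        // Populate the generated record through its setters.
        Employee e = new Employee();
        e.setName("Ada");
        e.setId(42);

        // Serialize the record to a packed-binary stream abstraction.
        try (FileOutputStream fout = new FileOutputStream("employee.bin")) {
            BinaryRecordOutput out = new BinaryRecordOutput(fout);
            e.serialize(out, "employee");
        }
    }
}

The point of the sketch is the division of labor: the DDL describes the record once, rcc produces the language-specific class, and application code only deals with setters, getters and a stream abstraction.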

Goals:

1) Support commonly used primitive types. Hadoop naturally has to deal with common built-in types, and these should be supported.

2) Support composite types (including recursive composition). Hadoop should support composites such as structs and vectors.

3) Code generation in multiple target languages. Hadoop should be able to generate serialization code in different target languages and be easy to extend with new ones. The initial targets are C++ and Java.

4) Support in the target languages. Header files, libraries or packages should be provided for each target language so that the generated code integrates well into applications.

5) Support multiple output encodings, such as packed binary, comma-separated text and XML (see the sketch after this list).

6) Support backward or forward compatible record types.
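
As a rough illustration of goal 5, the same hypothetical Employee record from the earlier sketch could be written with different encoders while the application code stays essentially the same. The encoder class names (BinaryRecordOutput, CsvRecordOutput, XmlRecordOutput) are assumptions modeled on the org.apache.hadoop.record package, not details confirmed by this article.

import java.io.IOException;
import java.io.OutputStream;

// Encoder class names are assumptions modeled on the org.apache.hadoop.record package.
import org.apache.hadoop.record.BinaryRecordOutput;
import org.apache.hadoop.record.CsvRecordOutput;
import org.apache.hadoop.record.RecordOutput;
import org.apache.hadoop.record.XmlRecordOutput;

import example.Employee; // hypothetical class generated by rcc (see the earlier sketch)

public class EncodingDemo {
    // Write the same record with whichever encoding the caller selects;
    // the record's serialize() call is identical regardless of the encoder.
    static void write(Employee e, OutputStream sink, String encoding) throws IOException {
        RecordOutput out;
        if ("csv".equals(encoding)) {
            out = new CsvRecordOutput(sink);        // comma-separated text
        } else if ("xml".equals(encoding)) {
            out = new XmlRecordOutput(sink);        // XML
        } else {
            out = new BinaryRecordOutput(sink);     // packed binary (default)
        }
        e.serialize(out, "employee");
    }
}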

Non-goals:

1) Serialize arbitrary C++ classes.

2) Serialize complex data structures such as trees, linked lists, etc.

3) Built-in indexing, compression or checksumming.

4) Dynamically construct objects from XML.

The rest of this document describes the features of Hadoop Record I/O in detail. Part 2 describes the data types supported by the system, Part 3 describes the DDL syntax with some simple record examples, Part 4 describes the process of code generation with rcc, and Part 5 describes the target-language mappings and support for Hadoop types. A fairly complete description of the C++ mapping is already available; Java and other languages will be covered in upcoming documentation updates. The last section describes the supported output encodings.
