Let's talk about how to parse Apache Avro data (explanation with examples)

青灯夜游

Feb 22, 2022 am 10:47 AM

How do you parse Apache Avro data? This article covers three methods: serializing to generate Avro data, deserializing to parse Avro data, and using FlinkSQL to parse Avro data. I hope it will be helpful to you!

With the rapid development of the Internet, technologies such as cloud computing, big data, artificial intelligence (AI), and the Internet of Things have become mainstream. Applications such as e-commerce websites, face recognition, driverless cars, smart homes, and smart cities make everyday life more convenient, and behind the scenes large volumes of data are continuously collected, cleaned, and analyzed by various system platforms. Low latency, high throughput, and data security are therefore particularly important. Apache Avro serializes data against a schema into a binary format for transmission, which enables high-speed transfer and also helps with data security. Avro is used more and more widely across industries, so knowing how to process and parse Avro data matters. This article demonstrates how to generate Avro data through serialization and how to parse it with FlinkSQL.

This article is a demo of Avro parsing. Currently, FlinkSQL is only suitable for parsing simple Avro data; complex nested Avro data is not supported for the time being.

Scenario

This article covers three main topics:

How to serialize and generate Avro data
How to deserialize and parse Avro data
How to use FlinkSQL to parse Avro data

Prerequisites

Know what Avro is; see the quick start guide on the Apache Avro official website
Understand Avro's application scenarios

Steps

1. Create a new Avro Maven project and configure the pom dependencies

The content of the pom file is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.huawei.bigdata</groupId>
    <artifactId>avrodemo</artifactId>
    <version>1.0-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro</artifactId>
            <version>1.8.1</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.avro</groupId>
                <artifactId>avro-maven-plugin</artifactId>
                <version>1.8.1</version>
                <executions>
                    <execution>
                        <phase>generate-sources</phase>
                        <goals>
                            <goal>schema</goal>
                        </goals>
                        <configuration>
                            <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
                            <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.6</source>
                    <target>1.6</target>
                </configuration>
            </plugin>
        </plugins>
    </build>

</project>

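Note: The pom file above configures the paths used for automatic class generation, i.e. ${project.basedir}/src/main/avro/ and ${project.basedir}/src/main/java/. The avro-maven-plugin reads schema files from the first directory (sourceDirectory) and writes the generated Java classes to the second (outputDirectory).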

2. Define schema

Avro schemas are defined in JSON. A schema is built from primitive types (null, boolean, int, long, float, double, bytes, and string) and complex types (record, enum, array, map, union, and fixed). The example below defines a schema for a user; the unions ["int", "null"] and ["string", "null"] make favorite_number and favorite_color nullable. Create an avro directory under src/main, then create a new file user.avsc in that directory:

{"namespace": "lancoo.ecbdc.pre",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number",  "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}
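
To make the role of the schema concrete, here is a minimal sketch in plain Java (no Avro library) of how one User record body would be laid out in Avro's binary encoding. It assumes the field order of user.avsc above: ints and longs are written as zig-zag variable-length integers, a string as its UTF-8 length followed by the bytes, and a union as the zero-based index of the chosen branch followed by that branch's value. The names AvroEncodingSketch and encodeUser are made up for illustration; real code should rely on the Avro library's writers rather than this hand-rolled version.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class AvroEncodingSketch {

    // Avro writes int and long values as zig-zag encoded variable-length integers.
    public static void writeLong(ByteArrayOutputStream out, long value) {
        long zigzag = (value << 1) ^ (value >> 63); // small magnitudes map to small unsigned values
        while ((zigzag & ~0x7FL) != 0) {
            out.write((int) ((zigzag & 0x7F) | 0x80)); // 7 payload bits plus a continuation bit
            zigzag >>>= 7;
        }
        out.write((int) zigzag);
    }

    // A string is its UTF-8 byte length (as a long) followed by the bytes themselves.
    public static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Encodes one record body in schema order: name, favorite_number, favorite_color.
    public static byte[] encodeUser(String name, Integer favoriteNumber, String favoriteColor) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);
        if (favoriteNumber != null) {
            writeLong(out, 0);              // union branch 0 of ["int", "null"]
            writeLong(out, favoriteNumber);
        } else {
            writeLong(out, 1);              // union branch 1: null carries no payload
        }
        if (favoriteColor != null) {
            writeLong(out, 0);              // union branch 0 of ["string", "null"]
            writeString(out, favoriteColor);
        } else {
            writeLong(out, 1);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : encodeUser("Alyssa", 256, null)) {
            hex.append(String.format("%02X ", b));
        }
        System.out.println(hex.toString().trim());
    }
}
```

For ("Alyssa", 256, null) this prints 0C 41 6C 79 73 73 61 00 80 04 02: the length 6 (zig-zag 12), the six name bytes, branch 0 plus the value 256 (zig-zag 512, two bytes), and branch 1 for the null color. No field names appear in the output, which is why a reader needs the schema to interpret the bytes.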

3. Compile schema

Click compile for the project in the Maven Projects panel. The build automatically creates the namespace path and the User class code.

4. Serialization

Create a TestUser class to serialize and generate data

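import java.io.File;
import java.io.IOException;

import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumWriter;

import lancoo.ecbdc.pre.User;

public class TestUser {
    public static void main(String[] args) throws IOException {
        User user1 = new User();
        user1.setName("Alyssa");
        user1.setFavoriteNumber(256);
        // Leave favorite color null

        // Alternate constructor
        User user2 = new User("Ben", 7, "red");

        // Construct via builder
        User user3 = User.newBuilder()
                .setName("Charlie")
                .setFavoriteColor("blue")
                .setFavoriteNumber(null)
                .build();

        // Serialize user1, user2 and user3 to disk
        DatumWriter<User> userDatumWriter = new SpecificDatumWriter<User>(User.class);
        DataFileWriter<User> dataFileWriter = new DataFileWriter<User>(userDatumWriter);
        // The schema is written into the file header, followed by the records
        dataFileWriter.create(user1.getSchema(), new File("user_generic.avro"));
        dataFileWriter.append(user1);
        dataFileWriter.append(user2);
        dataFileWriter.append(user3);
        dataFileWriter.close();
    }
}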

After the serialization program runs, the Avro data file is generated in the project directory.

The content of user_generic.avro is as follows:

Objavro.schema�{"type":"record","name":"User","namespace":"lancoo.ecbdc.pre","fields":[{"name":"name","type":"string"},{"name":"favorite_number","type":["int","null"]},{"name":"favorite_color","type":["string","null"]}]}
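
The leading Obj in the output above is the container file's magic marker: per the Avro specification, an object container file starts with the three ASCII bytes O, b, j followed by the byte 1, and the file metadata (including avro.schema) comes after it. Below is a small sketch of a sanity check on those first four bytes, in plain Java with hypothetical names (AvroMagicCheck, hasAvroMagic). It only inspects the marker and does not validate the rest of the file.

```java
public class AvroMagicCheck {

    // Magic marker of an Avro object container file: 'O', 'b', 'j', then the byte 1.
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    // Returns true when the given bytes start with the Avro container magic.
    public static boolean hasAvroMagic(byte[] data) {
        if (data == null || data.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (data[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // In-memory samples; for a real file, pass its first four bytes instead.
        byte[] avroLike = {'O', 'b', 'j', 1};
        byte[] jsonLike = {'{', '"', 'n', 'a', 'm', 'e', '"'};
        System.out.println(hasAvroMagic(avroLike)); // true
        System.out.println(hasAvroMagic(jsonLike)); // false
    }
}
```

A check like this can be handy before uploading a file to HDFS, to catch cases where a JSON or text file was saved with an .avro extension by mistake.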

At this point the Avro data has been generated.

5. Deserialization

Parse the Avro data with the following deserialization code

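import java.io.File;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.io.DatumReader;
import org.apache.avro.specific.SpecificDatumReader;

// Deserialize Users from disk (place inside a method that declares throws IOException)
DatumReader<User> userDatumReader = new SpecificDatumReader<User>(User.class);
DataFileReader<User> dataFileReader = new DataFileReader<User>(new File("user_generic.avro"), userDatumReader);
User user = null;
while (dataFileReader.hasNext()) {
    // Reuse user object by passing it to next(). This saves us from
    // allocating and garbage collecting many objects for files with
    // many items.
    user = dataFileReader.next(user);
    System.out.println(user);
}
// Release the underlying file handle once all records have been read
dataFileReader.close();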

Run the deserialization code to parse user_generic.avro

The Avro data is parsed successfully.
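
As a rough illustration of what a reader does for each record, the following plain-Java sketch decodes one User record body by hand, assuming the field order name, favorite_number (["int", "null"]), favorite_color (["string", "null"]) from user.avsc. It covers only the record body; a real container file also has a header, block counts, sync markers, and possibly compression, all of which the Avro library's DataFileReader handles. The names AvroDecodingSketch and decodeUser are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class AvroDecodingSketch {

    // Reads one zig-zag encoded variable-length integer.
    public static long readLong(ByteBuffer in) {
        long raw = 0;
        int shift = 0;
        byte b;
        do {
            b = in.get();
            raw |= (long) (b & 0x7F) << shift; // low 7 bits carry the payload
            shift += 7;
        } while ((b & 0x80) != 0);             // high bit set means more bytes follow
        return (raw >>> 1) ^ -(raw & 1);       // undo the zig-zag mapping
    }

    // Reads a length-prefixed UTF-8 string.
    public static String readString(ByteBuffer in) {
        byte[] utf8 = new byte[(int) readLong(in)];
        in.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Decodes one record body; each union starts with the index of the branch that was written.
    public static String decodeUser(ByteBuffer in) {
        String name = readString(in);
        Integer favoriteNumber = readLong(in) == 0 ? Integer.valueOf((int) readLong(in)) : null;
        String favoriteColor = readLong(in) == 0 ? readString(in) : null;
        return "name=" + name
                + ", favorite_number=" + favoriteNumber
                + ", favorite_color=" + favoriteColor;
    }

    public static void main(String[] args) {
        // Hand-built bytes for name "Alyssa", favorite_number 256, favorite_color null
        byte[] record = {0x0C, 'A', 'l', 'y', 's', 's', 'a', 0x00, (byte) 0x80, 0x04, 0x02};
        System.out.println(decodeUser(ByteBuffer.wrap(record)));
    }
}
```

Running main prints name=Alyssa, favorite_number=256, favorite_color=null. Decoding works only because the reader walks the bytes in the same order and with the same types as the schema, which is why the writer's schema has to be available to whoever reads the data.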

6. Upload user_generic.avro to an HDFS path

hdfs dfs -mkdir -p /tmp/lztest/

hdfs dfs -put user_generic.avro /tmp/lztest/

7. Configure FlinkServer

Prepare the Avro jar packages

Place flink-sql-avro-*.jar and flink-sql-avro-confluent-registry-*.jar in the FlinkServer lib directory by running the following commands on every FlinkServer node:

cp /opt/huawei/Bigdata/FusionInsight_Flink_8.1.2/install/FusionInsight-Flink-1.12.2/flink/opt/flink-sql-avro*.jar /opt/huawei/Bigdata/FusionInsight_Flink_8.1.2/install/FusionInsight-Flink-1.12.2/flink/lib

chmod 500 flink-sql-avro*.jar

chown omm:wheel flink-sql-avro*.jar

Then restart the FlinkServer instance. After the restart completes, check whether the Avro jars have been uploaded:

hdfs dfs -ls /FusionInsight_FlinkServer/8.1.2-312005/lib

8. Write the FlinkSQL

CREATE TABLE testHdfs(
  name String,
  favorite_number int,
  favorite_color String
) WITH(
  'connector' = 'filesystem',
  'path' = 'hdfs:///tmp/lztest/user_generic.avro',
  'format' = 'avro'
);

CREATE TABLE KafkaTable (
  name String,
  favorite_number int,
  favorite_color String
) WITH (
  'connector' = 'kafka',
  'topic' = 'testavro',
  'properties.bootstrap.servers' = '96.10.2.1:21005',
  'properties.group.id' = 'testGroup',
  'scan.startup.mode' = 'latest-offset',
  'format' = 'avro'
);

insert into
  KafkaTable
select
  *
from
  testHdfs;

Save and submit the job.

9. Check whether the corresponding topic contains data

FlinkSQL parsed the Avro data successfully.


Statement

This article is reproduced from 掘金社区 (the Juejin community). If there is any infringement, please contact admin@php.cn to have it removed.
