Using Sqoop 1.4.4 to Import Incremental Data from Oracle 10g into Hive 0.13.1 and Update the Main Table in Hive


WBOY

Jun 07, 2016, 04:44 PM


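Requirement

Import the incremental data of a basic business table from Oracle into Hive and merge it with the current full table to produce the latest full table.

Design

Three tables are involved:

Steps:

Use Sqoop to import the Oracle table into Hive, simulating a full table and an incremental table.

Use Hive to merge "full table + incremental table" into an "updated full table" that overwrites the current full table.

Step 1: Use Sqoop to import the Oracle table into Hive, simulating a full table and an incremental table

To simulate the scenario, one full table and one incremental table are needed. Because the data source is limited, both come from the OMP_SERVICE table in Oracle. The full table contains all the data and is named service_all in Hive; the incremental table contains only the data for a certain time range and is named service_tmp in Hive.

(1) Full table import: export all rows, keep only some of the columns, and import them into the specified Hive table

To enable importing into Hive, first configure the environment variable for HCatalog (a Hive submodule) by adding the following line to /etc/profile: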

export HCAT_HOME=/home/fulong/Hive/apache-hive-0.13.1-bin/hcatalog

Run the following command to import the data:

fulong@FBI006:~/Sqoop/sqoop-1.4.4/bin$ ./sqoop import \

> --connect jdbc:oracle:thin:@192.168.0.147:1521:ORCLGBK --username SP --password fulong \

> --table OMP_SERVICE \

> --columns "SERVICE_CODE,SERVICE_NAME,SERVICE_PROCESS,CREATE_TIME,ENABLE_ORG,ENABLE_PLATFORM,IF_DEL" \

> --hive-import --hive-table SERVICE_ALL

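Note: the username must be in uppercase.

(2) Incremental table import: export only the rows within the required time range, keep only some of the columns, and import them into the specified Hive table

Use the following command to import the data: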

fulong@FBI006:~/Sqoop/sqoop-1.4.4/bin$ ./sqoop import \

> --connect jdbc:oracle:thin:@192.168.0.147:1521:ORCLGBK --username SP --password fulong \

> --table OMP_SERVICE \

> --columns "SERVICE_CODE,SERVICE_NAME,SERVICE_PROCESS,CREATE_TIME,ENABLE_ORG,ENABLE_PLATFORM,IF_DEL" \

> --where "CREATE_TIME > to_date('2012/12/4 17:00:00','yyyy-mm-dd hh24:mi:ss') and CREATE_TIME

> --hive-import --hive-overwrite --hive-table SERVICE_TMP

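Notes:

Because the --hive-overwrite option is used, this command can be run repeatedly; each run overwrites service_tmp with the latest incremental data.

Sqoop also supports importing the result of a complex SQL query; for details, see the "7.2.3. Free-form Query Imports" section of the Sqoop User Guide.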
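Since the incremental import is meant to be rerun for each new time window, it may help to generate the command rather than edit it by hand. The following is a minimal Python sketch, not part of the original procedure. It reuses only the Sqoop options shown above. The function name build_incremental_import is hypothetical, and because the upper bound of the --where clause is cut off in the command above, the `end` parameter and the `<` comparison are assumptions.

```python
from datetime import datetime
from typing import List

# Column list taken from the import commands above.
COLUMNS = [
    "SERVICE_CODE", "SERVICE_NAME", "SERVICE_PROCESS", "CREATE_TIME",
    "ENABLE_ORG", "ENABLE_PLATFORM", "IF_DEL",
]

# Oracle format mask passed to to_date(), as in the original command.
ORACLE_FMT = "yyyy-mm-dd hh24:mi:ss"


def build_incremental_import(start: datetime, end: datetime) -> List[str]:
    """Build the argument list for importing rows created between start and end.

    Assumption: the window is open on both sides (CREATE_TIME > start and
    CREATE_TIME < end); the original upper bound is not visible in the source.
    """
    def to_date(ts: datetime) -> str:
        # Render the timestamp as an Oracle to_date(...) expression.
        return f"to_date('{ts:%Y-%m-%d %H:%M:%S}','{ORACLE_FMT}')"

    where = f"CREATE_TIME > {to_date(start)} and CREATE_TIME < {to_date(end)}"
    return [
        "./sqoop", "import",
        "--connect", "jdbc:oracle:thin:@192.168.0.147:1521:ORCLGBK",
        "--username", "SP",
        "--password", "fulong",
        "--table", "OMP_SERVICE",
        "--columns", ",".join(COLUMNS),
        "--where", where,
        "--hive-import", "--hive-overwrite",
        "--hive-table", "SERVICE_TMP",
    ]
```

The returned list can be passed to subprocess.run from the Sqoop bin directory; passing a list instead of a single string avoids shell quoting problems with the quotes inside the --where clause.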

(3) Verify the import: list all tables, count the rows, and inspect the table structure

hive> show tables;

searchlog

searchlog_tmp

service_all

service_tmp

Time taken: 0.04 seconds, Fetched: 4 row(s)

hive> select count(*) from service_all;

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks determined at compile time: 1

In order to change the average load for a reducer (in bytes):

set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

set mapreduce.job.reduces=<number>

Starting Job = job_1407233914535_0013, Tracking URL = :8088/proxy/application_1407233914535_0013/

Kill Command = /home/fulong/Hadoop/hadoop-2.2.0/bin/hadoop job -kill job_1407233914535_0013

Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1

2014-08-21 16:51:47,389 Stage-1 map = 0%, reduce = 0%

2014-08-21 16:51:59,816 Stage-1 map = 33%, reduce = 0%, Cumulative CPU 1.36 sec

2014-08-21 16:52:01,996 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 2.45 sec

2014-08-21 16:52:07,877 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.96 sec

2014-08-21 16:52:17,639 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.29 sec

MapReduce Total cumulative CPU time: 5 seconds 290 msec

Ended Job = job_1407233914535_0013

MapReduce Jobs Launched:

Job 0: Map: 3 Reduce: 1 Cumulative CPU: 5.46 sec HDFS Read: 687141 HDFS Write: 5 SUCCESS

Total MapReduce CPU Time Spent: 5 seconds 460 msec

6803

Time taken: 59.386 seconds, Fetched: 1 row(s)

hive> select count(*) from service_tmp;

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks determined at compile time: 1

In order to change the average load for a reducer (in bytes):

set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

set mapreduce.job.reduces=<number>

Starting Job = job_1407233914535_0014, Tracking URL = :8088/proxy/application_1407233914535_0014/

Kill Command = /home/fulong/Hadoop/hadoop-2.2.0/bin/hadoop job -kill job_1407233914535_0014

Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1

2014-08-21 16:53:03,951 Stage-1 map = 0%, reduce = 0%

2014-08-21 16:53:15,189 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 2.17 sec

2014-08-21 16:53:16,236 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.38 sec

2014-08-21 16:53:57,935 Stage-1 map = 100%, reduce = 22%, Cumulative CPU 3.78 sec

2014-08-21 16:54:01,811 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.34 sec

MapReduce Total cumulative CPU time: 5 seconds 340 msec

Ended Job = job_1407233914535_0014

MapReduce Jobs Launched:

Job 0: Map: 3 Reduce: 1 Cumulative CPU: 5.66 sec HDFS Read: 4720 HDFS Write: 3 SUCCESS

Total MapReduce CPU Time Spent: 5 seconds 660 msec

Time taken: 75.856 seconds, Fetched: 1 row(s)

hive> describe service_all;

service_code string

service_name string

service_process string

create_time string

enable_org string

enable_platform string

if_del string

Time taken: 0.169 seconds, Fetched: 7 row(s)

hive> describe service_tmp;

service_code string

service_name string

service_process string

create_time string

enable_org string

enable_platform string

if_del string

Time taken: 0.117 seconds, Fetched: 7 row(s)
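Both describe outputs list the same seven string columns, which matters because the two tables are about to be combined row by row. As an illustration only, the Python sketch below compares two describe outputs saved as text. The function names parse_describe and same_schema are hypothetical, and the parsing assumes the simple "column type" layout shown above.

```python
from typing import List, Tuple


def parse_describe(output: str) -> List[Tuple[str, str]]:
    """Parse the text of `describe <table>` into (column, type) pairs."""
    columns = []
    for line in output.splitlines():
        parts = line.split()
        # Skip blank lines and the trailing "Time taken: ..." summary line.
        if len(parts) < 2 or line.startswith("Time taken"):
            continue
        columns.append((parts[0], parts[1]))
    return columns


def same_schema(describe_a: str, describe_b: str) -> bool:
    """Return True when both outputs list the same columns in the same order."""
    return parse_describe(describe_a) == parse_describe(describe_b)
```

If same_schema returns False, the column lists of the two Sqoop imports should be checked before merging.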

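The logic for building the merged table is as follows:

All rows of the tmp table go into the final table.

Rows of the all table whose service_code does not appear in the tmp table also go into the new table.

Executing the following SQL statement merges the two tables into the updated full table: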
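The statement itself does not appear in this copy of the article. The sketch below is therefore not the author's original SQL; it is one possible way to express the two merge rules, run against Python's built-in sqlite3 as a stand-in for Hive. The two-column tables and the sample rows are made up for illustration, and the query has not been verified on Hive 0.13.1.

```python
import sqlite3

# Rule 1: every row of service_tmp goes into the result.
# Rule 2: rows of service_all whose service_code is absent from service_tmp
#         are kept as well (left outer join, then keep the unmatched rows).
MERGE_SQL = """
SELECT * FROM service_tmp
UNION ALL
SELECT a.* FROM service_all a
LEFT OUTER JOIN service_tmp t ON a.service_code = t.service_code
WHERE t.service_code IS NULL
"""


def merge_tables(conn: sqlite3.Connection) -> list:
    """Return the merged rows, sorted for stable output."""
    return sorted(conn.execute(MERGE_SQL).fetchall())


def demo() -> list:
    """Build two tiny stand-in tables (hypothetical data) and merge them."""
    conn = sqlite3.connect(":memory:")
    for table in ("service_all", "service_tmp"):
        conn.execute(f"CREATE TABLE {table} (service_code TEXT, service_name TEXT)")
    conn.executemany(
        "INSERT INTO service_all VALUES (?, ?)",
        [("S1", "old name 1"), ("S2", "old name 2"), ("S3", "old name 3")],
    )
    conn.executemany(
        "INSERT INTO service_tmp VALUES (?, ?)",
        [("S2", "new name 2"), ("S4", "new name 4")],
    )
    rows = merge_tables(conn)
    conn.close()
    return rows
```

In this example S2 takes its values from service_tmp, S4 is newly added, and S1 and S3 are carried over unchanged from service_all. In Hive, the merged result would then overwrite the current full table, as described in the design section; that step is not shown here.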
