将Oracle中的业务基础表增量数据导入Hive中,与当前的全量表合并为最新的全量表。通过Sqoop将Oracle中表的导入Hive,模拟全量表和
需求
将Oracle中的业务基础表增量数据导入Hive中,与当前的全量表合并为最新的全量表。
设计
涉及的三张表:
步骤:
步骤1:通过Sqoop将Oracle中表的导入Hive,模拟全量表和增量表
为了模拟场景,需要一张全量表,和一张增量表,由于数据源有限,所以两个表都来自Oracle中的OMP_SERVICE,全量表包含所有数据,,在Hive中名称叫service_all,增量表包含部分时间段数据,在Hive中名称叫service_tmp。
(1)全量表导入:导出所有数据,只要部分字段,导入到Hive指定表里
为实现导入Hive功能,需要先配置HCatalog(HCatalog是Hive子模块)的环境变量,/etc/profile中新增:
export HCAT_HOME=/home/fulong/Hive/apache-hive-0.13.1-bin/hcatalog
执行以下命令导入数据:
fulong@FBI006:~/Sqoop/sqoop-1.4.4/bin$ ./sqoop import \
> --connect jdbc:oracle:thin:@192.168.0.147:1521:ORCLGBK --username SP --password fulong \
> --table OMP_SERVICE \
> --columns "SERVICE_CODE,SERVICE_NAME,SERVICE_PROCESS,CREATE_TIME,ENABLE_ORG,ENABLE_PLATFORM,IF_DEL" \
> --hive-import --hive-table SERVICE_ALL
注意:用户名必须大写
(2)增量表导入:只导出所需时间范围内的数据,只要部分字段,导入到Hive指定表里
使用以下命令导入数据:
fulong@FBI006:~/Sqoop/sqoop-1.4.4/bin$ ./sqoop import \
> --connect jdbc:oracle:thin:@192.168.0.147:1521:ORCLGBK --username SP --password fulong \
> --table OMP_SERVICE \
> --columns "SERVICE_CODE,SERVICE_NAME,SERVICE_PROCESS,CREATE_TIME,ENABLE_ORG,ENABLE_PLATFORM,IF_DEL" \
> --where "CREATE_TIME > to_date('2012/12/4 17:00:00','yyyy-mm-dd hh24:mi:ss') and CREATE_TIME
> --hive-import --hive-overwrite --hive-table SERVICE_TMP
注意:
(3)验证导入结果:列出所有表,统计行数,查看表结构
hive> show tables;
OK
searchlog
searchlog_tmp
service_all
service_tmp
Time taken: 0.04 seconds, Fetched: 4 row(s)
hive> select count(*) from service_all;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapreduce.job.reduces=
Starting Job = job_1407233914535_0013, Tracking URL = :8088/proxy/application_1407233914535_0013/
Kill Command = /home/fulong/Hadoop/hadoop-2.2.0/bin/hadoop job -kill job_1407233914535_0013
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1
2014-08-21 16:51:47,389 Stage-1 map = 0%, reduce = 0%
2014-08-21 16:51:59,816 Stage-1 map = 33%, reduce = 0%, Cumulative CPU 1.36 sec
2014-08-21 16:52:01,996 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 2.45 sec
2014-08-21 16:52:07,877 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.96 sec
2014-08-21 16:52:17,639 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.29 sec
MapReduce Total cumulative CPU time: 5 seconds 290 msec
Ended Job = job_1407233914535_0013
MapReduce Jobs Launched:
Job 0: Map: 3 Reduce: 1 Cumulative CPU: 5.46 sec HDFS Read: 687141 HDFS Write: 5 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 460 msec
OK
6803
Time taken: 59.386 seconds, Fetched: 1 row(s)
hive> select count(*) from service_tmp;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapreduce.job.reduces=
Starting Job = job_1407233914535_0014, Tracking URL = :8088/proxy/application_1407233914535_0014/
Kill Command = /home/fulong/Hadoop/hadoop-2.2.0/bin/hadoop job -kill job_1407233914535_0014
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1
2014-08-21 16:53:03,951 Stage-1 map = 0%, reduce = 0%
2014-08-21 16:53:15,189 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 2.17 sec
2014-08-21 16:53:16,236 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.38 sec
2014-08-21 16:53:57,935 Stage-1 map = 100%, reduce = 22%, Cumulative CPU 3.78 sec
2014-08-21 16:54:01,811 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.34 sec
MapReduce Total cumulative CPU time: 5 seconds 340 msec
Ended Job = job_1407233914535_0014
MapReduce Jobs Launched:
Job 0: Map: 3 Reduce: 1 Cumulative CPU: 5.66 sec HDFS Read: 4720 HDFS Write: 3 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 660 msec
OK
13
Time taken: 75.856 seconds, Fetched: 1 row(s)
hive> describe service_all;
OK
service_code string
service_name string
service_process string
create_time string
enable_org string
enable_platform string
if_del string
Time taken: 0.169 seconds, Fetched: 7 row(s)
hive> describe service_tmp;
OK
service_code string
service_name string
service_process string
create_time string
enable_org string
enable_platform string
if_del string
Time taken: 0.117 seconds, Fetched: 7 row(s)
合并新表的逻辑如下:
执行以下sql语句可以合并得到更新后的全量表:

MySQL functions can be used for data processing and calculation. 1. Basic usage includes string processing, date calculation and mathematical operations. 2. Advanced usage involves combining multiple functions to implement complex operations. 3. Performance optimization requires avoiding the use of functions in the WHERE clause and using GROUPBY and temporary tables.

Efficient methods for batch inserting data in MySQL include: 1. Using INSERTINTO...VALUES syntax, 2. Using LOADDATAINFILE command, 3. Using transaction processing, 4. Adjust batch size, 5. Disable indexing, 6. Using INSERTIGNORE or INSERT...ONDUPLICATEKEYUPDATE, these methods can significantly improve database operation efficiency.

In MySQL, add fields using ALTERTABLEtable_nameADDCOLUMNnew_columnVARCHAR(255)AFTERexisting_column, delete fields using ALTERTABLEtable_nameDROPCOLUMNcolumn_to_drop. When adding fields, you need to specify a location to optimize query performance and data structure; before deleting fields, you need to confirm that the operation is irreversible; modifying table structure using online DDL, backup data, test environment, and low-load time periods is performance optimization and best practice.

Use the EXPLAIN command to analyze the execution plan of MySQL queries. 1. The EXPLAIN command displays the execution plan of the query to help find performance bottlenecks. 2. The execution plan includes fields such as id, select_type, table, type, possible_keys, key, key_len, ref, rows and Extra. 3. According to the execution plan, you can optimize queries by adding indexes, avoiding full table scans, optimizing JOIN operations, and using overlay indexes.

Subqueries can improve the efficiency of MySQL query. 1) Subquery simplifies complex query logic, such as filtering data and calculating aggregated values. 2) MySQL optimizer may convert subqueries to JOIN operations to improve performance. 3) Using EXISTS instead of IN can avoid multiple rows returning errors. 4) Optimization strategies include avoiding related subqueries, using EXISTS, index optimization, and avoiding subquery nesting.

Methods for configuring character sets and collations in MySQL include: 1. Setting the character sets and collations at the server level: SETNAMES'utf8'; SETCHARACTERSETutf8; SETCOLLATION_CONNECTION='utf8_general_ci'; 2. Create a database that uses specific character sets and collations: CREATEDATABASEexample_dbCHARACTERSETutf8COLLATEutf8_general_ci; 3. Specify character sets and collations when creating a table: CREATETABLEexample_table(idINT

To safely and thoroughly uninstall MySQL and clean all residual files, follow the following steps: 1. Stop MySQL service; 2. Uninstall MySQL packages; 3. Clean configuration files and data directories; 4. Verify that the uninstallation is thorough.

Renaming a database in MySQL requires indirect methods. The steps are as follows: 1. Create a new database; 2. Use mysqldump to export the old database; 3. Import the data into the new database; 4. Delete the old database.


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

WebStorm Mac version
Useful JavaScript development tools

mPDF
mPDF is a PHP library that can generate PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files "on the fly" from his website and handle different languages. It is slower than original scripts like HTML2FPDF and produces larger files when using Unicode fonts, but supports CSS styles etc. and has a lot of enhancements. Supports almost all languages, including RTL (Arabic and Hebrew) and CJK (Chinese, Japanese and Korean). Supports nested block-level elements (such as P, DIV),

SublimeText3 English version
Recommended: Win version, supports code prompts!

SecLists
SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.

Notepad++7.3.1
Easy-to-use and free code editor
