Using Sqoop 1.4.4 to import incremental data from Oracle 10g into Hive 0.13.1 and update the master table in Hive

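Requirements

Import the incremental data of a business base table from Oracle into Hive and merge it with the current full table to produce the latest full table.

Design

Three tables are involved: the current full table, the incremental table, and the updated full table.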

 

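Steps:

  • Import the Oracle table into Hive with Sqoop to simulate the full table and the incremental table
  • In Hive, merge "full table + incremental table" into the "updated full table", overwriting the current full table

Step 1: Import the Oracle table into Hive with Sqoop to simulate the full table and the incremental table

    To simulate the scenario we need one full table and one incremental table. Because the data sources are limited, both come from the OMP_SERVICE table in Oracle. The full table contains all rows and is named service_all in Hive; the incremental table contains only the rows from a given time range and is named service_tmp in Hive.

    (1) Full-table import: export all rows, keep only some of the columns, and load them into the specified Hive table

    To enable importing into Hive, first configure the environment variable for HCatalog (a Hive submodule) by adding the following to /etc/profile: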

    export HCAT_HOME=/home/fulong/Hive/apache-hive-0.13.1-bin/hcatalog

     

    Run the following command to import the data:

    fulong@FBI006:~/Sqoop/sqoop-1.4.4/bin$ ./sqoop import \

    > --connect jdbc:oracle:thin:@192.168.0.147:1521:ORCLGBK  --username SP --password fulong \

    > --table OMP_SERVICE \

    > --columns "SERVICE_CODE,SERVICE_NAME,SERVICE_PROCESS,CREATE_TIME,ENABLE_ORG,ENABLE_PLATFORM,IF_DEL" \

    > --hive-import --hive-table SERVICE_ALL

     

    Note: the username must be in uppercase.

     

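    (2) Incremental-table import: export only the rows within the required time range, keep only some of the columns, and load them into the specified Hive table

    Use the following command to import the data: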

    fulong@FBI006:~/Sqoop/sqoop-1.4.4/bin$ ./sqoop import \

    > --connect jdbc:oracle:thin:@192.168.0.147:1521:ORCLGBK  --username SP --password fulong \

    > --table OMP_SERVICE \

    > --columns "SERVICE_CODE,SERVICE_NAME,SERVICE_PROCESS,CREATE_TIME,ENABLE_ORG,ENABLE_PLATFORM,IF_DEL" \

    > --where "CREATE_TIME > to_date('2012/12/4 17:00:00','yyyy-mm-dd hh24:mi:ss') and CREATE_TIME

    > --hive-import --hive-overwrite --hive-table SERVICE_TMP

     

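    Notes:

  • Because the --hive-overwrite option is used, this command can be run repeatedly; each run overwrites service_tmp with the latest incremental data.
  • Sqoop also supports importing the result of a complex SQL query; for details, see the "7.2.3. Free-form Query Imports" section of the Sqoop User Guide.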
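
    Because the incremental import is meant to be rerun for new time ranges, it can help to generate the --where clause instead of editing it by hand. The Python sketch below is only an illustration under stated assumptions: the helper names oracle_to_date and build_incremental_args are hypothetical, the inclusive upper bound (<=) is an assumed choice, and -P (prompt for the password) replaces the plain-text --password. All other options are the ones used in the command above.

```python
from datetime import datetime

# Oracle format mask, copied from the import command above.
ORACLE_FMT = "yyyy-mm-dd hh24:mi:ss"


def oracle_to_date(ts: datetime) -> str:
    """Render a Python datetime as an Oracle to_date(...) expression."""
    return "to_date('{}','{}')".format(ts.strftime("%Y-%m-%d %H:%M:%S"), ORACLE_FMT)


def build_incremental_args(start: datetime, end: datetime) -> list:
    """Build the sqoop argument list for one incremental window.

    Assumption: the window is (start, end], i.e. the lower bound is
    exclusive and the upper bound is inclusive.
    """
    if end <= start:
        raise ValueError("end must be later than start")
    where = "CREATE_TIME > {} and CREATE_TIME <= {}".format(
        oracle_to_date(start), oracle_to_date(end)
    )
    return [
        "sqoop", "import",
        "--connect", "jdbc:oracle:thin:@192.168.0.147:1521:ORCLGBK",
        "--username", "SP",
        "-P",  # assumption: prompt for the password instead of --password
        "--table", "OMP_SERVICE",
        "--columns",
        "SERVICE_CODE,SERVICE_NAME,SERVICE_PROCESS,CREATE_TIME,"
        "ENABLE_ORG,ENABLE_PLATFORM,IF_DEL",
        "--where", where,
        "--hive-import", "--hive-overwrite",
        "--hive-table", "SERVICE_TMP",
    ]
```

    The returned list could be passed to subprocess.run on a machine where sqoop is on the PATH; building it as a list rather than a single string avoids shell quoting problems with the quotes inside the to_date expressions.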
    (3) Verify the import: list all tables, count the rows, and inspect the table schemas

    hive> show tables;

    OK

    searchlog

    searchlog_tmp

    service_all

    service_tmp

    Time taken: 0.04 seconds, Fetched: 4 row(s)

    hive> select count(*) from service_all;

    Total jobs = 1

    Launching Job 1 out of 1

    Number of reduce tasks determined at compile time: 1

    In order to change the average load for a reducer (in bytes):

      set hive.exec.reducers.bytes.per.reducer=<number>

    In order to limit the maximum number of reducers:

      set hive.exec.reducers.max=<number>

    In order to set a constant number of reducers:

      set mapreduce.job.reduces=<number>

    Starting Job = job_1407233914535_0013, Tracking URL = :8088/proxy/application_1407233914535_0013/

    Kill Command = /home/fulong/Hadoop/hadoop-2.2.0/bin/hadoop job  -kill job_1407233914535_0013

    Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1

    2014-08-21 16:51:47,389 Stage-1 map = 0%,  reduce = 0%

    2014-08-21 16:51:59,816 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 1.36 sec

    2014-08-21 16:52:01,996 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 2.45 sec

    2014-08-21 16:52:07,877 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.96 sec

    2014-08-21 16:52:17,639 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.29 sec

    MapReduce Total cumulative CPU time: 5 seconds 290 msec

    Ended Job = job_1407233914535_0013

    MapReduce Jobs Launched:

    Job 0: Map: 3  Reduce: 1   Cumulative CPU: 5.46 sec   HDFS Read: 687141 HDFS Write: 5 SUCCESS

    Total MapReduce CPU Time Spent: 5 seconds 460 msec

    OK

    6803

    Time taken: 59.386 seconds, Fetched: 1 row(s)

    hive> select count(*) from service_tmp;

    Total jobs = 1

    Launching Job 1 out of 1

    Number of reduce tasks determined at compile time: 1

    In order to change the average load for a reducer (in bytes):

      set hive.exec.reducers.bytes.per.reducer=<number>

    In order to limit the maximum number of reducers:

      set hive.exec.reducers.max=<number>

    In order to set a constant number of reducers:

      set mapreduce.job.reduces=<number>

    Starting Job = job_1407233914535_0014, Tracking URL = :8088/proxy/application_1407233914535_0014/

    Kill Command = /home/fulong/Hadoop/hadoop-2.2.0/bin/hadoop job  -kill job_1407233914535_0014

    Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1

    2014-08-21 16:53:03,951 Stage-1 map = 0%,  reduce = 0%

    2014-08-21 16:53:15,189 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 2.17 sec

    2014-08-21 16:53:16,236 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.38 sec

    2014-08-21 16:53:57,935 Stage-1 map = 100%,  reduce = 22%, Cumulative CPU 3.78 sec

    2014-08-21 16:54:01,811 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.34 sec

    MapReduce Total cumulative CPU time: 5 seconds 340 msec

    Ended Job = job_1407233914535_0014

    MapReduce Jobs Launched:

    Job 0: Map: 3  Reduce: 1   Cumulative CPU: 5.66 sec   HDFS Read: 4720 HDFS Write: 3 SUCCESS

    Total MapReduce CPU Time Spent: 5 seconds 660 msec

    OK

    13

    Time taken: 75.856 seconds, Fetched: 1 row(s)

    hive> describe service_all;

    OK

    service_code            string

    service_name            string

    service_process         string

    create_time             string

    enable_org              string

    enable_platform         string

    if_del                  string

    Time taken: 0.169 seconds, Fetched: 7 row(s)

    hive> describe service_tmp;

    OK

    service_code            string

    service_name            string

    service_process         string

    create_time             string

    enable_org              string

    enable_platform         string

    if_del                  string

    Time taken: 0.117 seconds, Fetched: 7 row(s)

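    The logic for building the merged table is as follows:

  • Every row of the tmp table goes into the final table
  • Every row of the all table whose service_code does not appear in the tmp table goes into the new table

    Executing the following SQL statement merges the two tables into the updated full table: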
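
    The two merge rules can be tried on a toy dataset before touching Hive. The sketch below is not the article's Hive statement: it uses Python's built-in sqlite3 module as a stand-in for Hive, keeps only two of the seven columns, and the function names merge_tables and demo are hypothetical. The SELECT follows the rules literally: all rows of service_tmp, plus the rows of service_all that have no matching service_code in service_tmp. On Hive, a SELECT of the same shape would presumably feed an INSERT OVERWRITE TABLE service_all statement, which is an assumption that has not been verified here.

```python
import sqlite3

# Merge rule: every tmp row, plus every "all" row whose service_code
# does not occur in tmp (anti-join via LEFT OUTER JOIN ... IS NULL).
MERGE_SELECT = """
SELECT service_code, service_name FROM service_tmp
UNION ALL
SELECT a.service_code, a.service_name
FROM service_all a
LEFT OUTER JOIN service_tmp t ON a.service_code = t.service_code
WHERE t.service_code IS NULL
"""


def merge_tables(conn):
    """Replace the contents of service_all with the merged rows."""
    merged = conn.execute(MERGE_SELECT).fetchall()
    with conn:  # commit both statements together, roll back on error
        conn.execute("DELETE FROM service_all")
        conn.executemany("INSERT INTO service_all VALUES (?, ?)", merged)
    return merged


def demo():
    """Build toy tables, run the merge, and return service_all sorted."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE service_all (service_code TEXT, service_name TEXT)")
    conn.execute("CREATE TABLE service_tmp (service_code TEXT, service_name TEXT)")
    conn.executemany(
        "INSERT INTO service_all VALUES (?, ?)",
        [("A", "old-a"), ("B", "old-b"), ("C", "old-c")],
    )
    conn.executemany(
        "INSERT INTO service_tmp VALUES (?, ?)",
        [("B", "new-b"), ("D", "new-d")],
    )
    conn.commit()
    merge_tables(conn)
    return sorted(conn.execute("SELECT * FROM service_all").fetchall())


if __name__ == "__main__":
    for row in demo():
        print(row)
```

    In the toy data, row B exists in both tables, so its tmp version replaces the old one; A and C are carried over unchanged and D is added as a new row.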
