搜索
首页数据库mysql教程使用Sqoop在HDFS和RDBMS之间导数据

SQOOP是一款开源的工具,主要用于在HADOOP与传统的数据库间进行数据的传递,下面从SQOOP用户手册上摘录一段描述

SQOOP是一款开源的工具,主要用于在Hadoop与传统的数据库间进行数据的传递,,下面从SQOOP用户手册上摘录一段描述

Sqoopis a tool designed to transfer data between Hadoop and relational databases.You can use Sqoop to import data from a relational database management system(RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System(HDFS),transform the data in Hadoop MapReduce, and then export the data backinto an RDBMS.

SQOOP是Cloudera公司开源的一款在HDFS以及数据库之间传输数据的软件。内部通过JDBC连接HADOOP以及数据库,因此从理论上来讲,只要是支持JDBC的数据库,SQOOP都可以兼容。并且,SQOOP不仅能把数据以文件的形式导入到HDFS上,还可以直接导入数据到HBASE或者HIVE中。

下面是一些性能测试数据,仅供参考:

表名:tb_keywords

行数:11628209

数据文件大小:1.4G

 

HDFS –> DB

DB -> HDFS

SQOOP

428s

166s

HDFSFILEDB

209s

105s

从结果上来看,以FILE作为中转方式性能是要高于SQOOP的。原因如下:

1、 本质上SQOOP使用的是JDBC,效率不会比MYSQL自带的到导入\导出工具效率高

2、 以导入数据到DB为例,SQOOP的设计思想是分阶段提交,也就是说假设一个表有1K行,那么它会先读出100行(默认值),然后插入,提交,再读取100行……如此往复

即便如此,SQOOP也是有优势的,比如说使用的便利性,任务执行的容错性等。在一些测试环境中如果需要的话可以考虑把它拿来作为一个工具使用。

下面是一些操作记录

[wanghai01@tc-crm-rd01.tc.baidu.com bin]$ sh export.sh
Fri Sep 23 20:15:47 CST 2011
11/09/23 20:15:48 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
11/09/23 20:15:48 INFO tool.CodeGenTool: Beginning code generation
11/09/23 20:15:48 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `tb_keyword_data_201104` AS t LIMIT 1
11/09/23 20:15:48 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `tb_keyword_data_201104` AS t LIMIT 1
11/09/23 20:15:48 INFO orm.CompilationManager: HADOOP_HOME is /home/wanghai01/hadoop/hadoop-0.20.2/bin/..
11/09/23 20:15:48 INFO orm.CompilationManager: Found hadoop core jar at: /home/wanghai01/hadoop/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar
11/09/23 20:15:49 ERROR orm.CompilationManager: Could not rename /tmp/sqoop-wanghai01/compile/eb16aae87a119b93acb3bc6ea74b5e97/tb_keyword_data_201104.java to /home/wanghai01/cloudera/sqoop-1.2.0-CDH3B4/bin/./tb_keyword_data_201104.java
11/09/23 20:15:49 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-wanghai01/compile/eb16aae87a119b93acb3bc6ea74b5e97/tb_keyword_data_201104.jar
11/09/23 20:15:49 INFO mapreduce.ExportJobBase: Beginning export of tb_keyword_data_201104
11/09/23 20:15:49 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `tb_keyword_data_201104` AS t LIMIT 1
11/09/23 20:15:49 INFO input.FileInputFormat: Total input paths to process : 1
11/09/23 20:15:49 INFO input.FileInputFormat: Total input paths to process : 1
11/09/23 20:15:49 INFO mapred.JobClient: Running job: job_201109211521_0012
11/09/23 20:15:50 INFO mapred.JobClient:  map 0% reduce 0%
11/09/23 20:16:04 INFO mapred.JobClient:  map 1% reduce 0%
11/09/23 20:16:10 INFO mapred.JobClient:  map 2% reduce 0%
11/09/23 20:16:13 INFO mapred.JobClient:  map 3% reduce 0%
11/09/23 20:16:19 INFO mapred.JobClient:  map 4% reduce 0%
11/09/23 20:16:22 INFO mapred.JobClient:  map 5% reduce 0%
11/09/23 20:16:25 INFO mapred.JobClient:  map 6% reduce 0%
11/09/23 20:16:31 INFO mapred.JobClient:  map 7% reduce 0%
11/09/23 20:16:34 INFO mapred.JobClient:  map 8% reduce 0%
11/09/23 20:16:41 INFO mapred.JobClient:  map 9% reduce 0%
11/09/23 20:16:44 INFO mapred.JobClient:  map 10% reduce 0%
11/09/23 20:16:50 INFO mapred.JobClient:  map 11% reduce 0%
11/09/23 20:16:53 INFO mapred.JobClient:  map 12% reduce 0%
11/09/23 20:16:56 INFO mapred.JobClient:  map 13% reduce 0%
11/09/23 20:17:02 INFO mapred.JobClient:  map 14% reduce 0%
11/09/23 20:17:05 INFO mapred.JobClient:  map 15% reduce 0%
11/09/23 20:17:11 INFO mapred.JobClient:  map 16% reduce 0%
11/09/23 20:17:14 INFO mapred.JobClient:  map 17% reduce 0%
11/09/23 20:17:17 INFO mapred.JobClient:  map 18% reduce 0%
11/09/23 20:17:23 INFO mapred.JobClient:  map 19% reduce 0%
11/09/23 20:17:25 INFO mapred.JobClient:  map 20% reduce 0%
11/09/23 20:17:28 INFO mapred.JobClient:  map 21% reduce 0%
11/09/23 20:17:34 INFO mapred.JobClient:  map 22% reduce 0%
11/09/23 20:17:37 INFO mapred.JobClient:  map 23% reduce 0%
11/09/23 20:17:43 INFO mapred.JobClient:  map 24% reduce 0%
11/09/23 20:17:46 INFO mapred.JobClient:  map 25% reduce 0%
11/09/23 20:17:49 INFO mapred.JobClient:  map 26% reduce 0%
11/09/23 20:17:55 INFO mapred.JobClient:  map 27% reduce 0%
11/09/23 20:17:58 INFO mapred.JobClient:  map 28% reduce 0%
11/09/23 20:18:04 INFO mapred.JobClient:  map 29% reduce 0%
11/09/23 20:18:07 INFO mapred.JobClient:  map 30% reduce 0%
11/09/23 20:18:10 INFO mapred.JobClient:  map 31% reduce 0%
11/09/23 20:18:16 INFO mapred.JobClient:  map 32% reduce 0%
11/09/23 20:18:19 INFO mapred.JobClient:  map 33% reduce 0%
11/09/23 20:18:25 INFO mapred.JobClient:  map 34% reduce 0%
11/09/23 20:18:28 INFO mapred.JobClient:  map 35% reduce 0%
11/09/23 20:18:31 INFO mapred.JobClient:  map 36% reduce 0%
11/09/23 20:18:37 INFO mapred.JobClient:  map 37% reduce 0%
11/09/23 20:18:40 INFO mapred.JobClient:  map 38% reduce 0%
11/09/23 20:18:46 INFO mapred.JobClient:  map 39% reduce 0%
11/09/23 20:18:49 INFO mapred.JobClient:  map 40% reduce 0%
11/09/23 20:18:52 INFO mapred.JobClient:  map 41% reduce 0%
11/09/23 20:18:58 INFO mapred.JobClient:  map 42% reduce 0%
11/09/23 20:19:01 INFO mapred.JobClient:  map 43% reduce 0%
11/09/23 20:19:04 INFO mapred.JobClient:  map 44% reduce 0%
11/09/23 20:19:10 INFO mapred.JobClient:  map 45% reduce 0%
11/09/23 20:19:13 INFO mapred.JobClient:  map 46% reduce 0%
11/09/23 20:19:19 INFO mapred.JobClient:  map 47% reduce 0%
11/09/23 20:19:22 INFO mapred.JobClient:  map 48% reduce 0%
11/09/23 20:19:25 INFO mapred.JobClient:  map 49% reduce 0%
11/09/23 20:19:34 INFO mapred.JobClient:  map 50% reduce 0%
11/09/23 20:19:37 INFO mapred.JobClient:  map 52% reduce 0%
11/09/23 20:19:40 INFO mapred.JobClient:  map 53% reduce 0%
11/09/23 20:19:43 INFO mapred.JobClient:  map 54% reduce 0%
11/09/23 20:19:46 INFO mapred.JobClient:  map 55% reduce 0%
11/09/23 20:19:49 INFO mapred.JobClient:  map 56% reduce 0%
11/09/23 20:19:52 INFO mapred.JobClient:  map 57% reduce 0%
11/09/23 20:19:55 INFO mapred.JobClient:  map 58% reduce 0%
11/09/23 20:19:58 INFO mapred.JobClient:  map 59% reduce 0%
11/09/23 20:20:01 INFO mapred.JobClient:  map 60% reduce 0%
11/09/23 20:20:04 INFO mapred.JobClient:  map 62% reduce 0%
11/09/23 20:20:07 INFO mapred.JobClient:  map 63% reduce 0%
11/09/23 20:20:10 INFO mapred.JobClient:  map 64% reduce 0%
11/09/23 20:20:13 INFO mapred.JobClient:  map 65% reduce 0%
11/09/23 20:20:16 INFO mapred.JobClient:  map 66% reduce 0%
11/09/23 20:20:19 INFO mapred.JobClient:  map 67% reduce 0%
11/09/23 20:20:22 INFO mapred.JobClient:  map 68% reduce 0%
11/09/23 20:20:25 INFO mapred.JobClient:  map 69% reduce 0%
11/09/23 20:20:28 INFO mapred.JobClient:  map 70% reduce 0%
11/09/23 20:20:31 INFO mapred.JobClient:  map 72% reduce 0%
11/09/23 20:20:34 INFO mapred.JobClient:  map 73% reduce 0%
11/09/23 20:20:37 INFO mapred.JobClient:  map 74% reduce 0%
11/09/23 20:20:40 INFO mapred.JobClient:  map 75% reduce 0%
11/09/23 20:20:43 INFO mapred.JobClient:  map 76% reduce 0%
11/09/23 20:20:46 INFO mapred.JobClient:  map 77% reduce 0%
11/09/23 20:20:49 INFO mapred.JobClient:  map 78% reduce 0%
11/09/23 20:20:52 INFO mapred.JobClient:  map 80% reduce 0%
11/09/23 20:20:55 INFO mapred.JobClient:  map 81% reduce 0%
11/09/23 20:20:58 INFO mapred.JobClient:  map 82% reduce 0%
11/09/23 20:21:01 INFO mapred.JobClient:  map 83% reduce 0%
11/09/23 20:21:04 INFO mapred.JobClient:  map 84% reduce 0%
11/09/23 20:21:07 INFO mapred.JobClient:  map 85% reduce 0%
11/09/23 20:21:10 INFO mapred.JobClient:  map 86% reduce 0%
11/09/23 20:21:13 INFO mapred.JobClient:  map 87% reduce 0%
11/09/23 20:21:22 INFO mapred.JobClient:  map 88% reduce 0%
11/09/23 20:21:28 INFO mapred.JobClient:  map 89% reduce 0%
11/09/23 20:21:37 INFO mapred.JobClient:  map 90% reduce 0%
11/09/23 20:21:47 INFO mapred.JobClient:  map 91% reduce 0%
11/09/23 20:21:53 INFO mapred.JobClient:  map 92% reduce 0%
11/09/23 20:22:02 INFO mapred.JobClient:  map 93% reduce 0%
11/09/23 20:22:11 INFO mapred.JobClient:  map 94% reduce 0%
11/09/23 20:22:17 INFO mapred.JobClient:  map 95% reduce 0%
11/09/23 20:22:26 INFO mapred.JobClient:  map 96% reduce 0%
11/09/23 20:22:32 INFO mapred.JobClient:  map 97% reduce 0%
11/09/23 20:22:41 INFO mapred.JobClient:  map 98% reduce 0%
11/09/23 20:22:47 INFO mapred.JobClient:  map 99% reduce 0%
11/09/23 20:22:53 INFO mapred.JobClient:  map 100% reduce 0%
11/09/23 20:22:55 INFO mapred.JobClient: Job complete: job_201109211521_0012
11/09/23 20:22:55 INFO mapred.JobClient: Counters: 6
11/09/23 20:22:55 INFO mapred.JobClient:   Job Counters
11/09/23 20:22:55 INFO mapred.JobClient:     Launched map tasks=4
11/09/23 20:22:55 INFO mapred.JobClient:     Data-local map tasks=4
11/09/23 20:22:55 INFO mapred.JobClient:   FileSystemCounters
11/09/23 20:22:55 INFO mapred.JobClient:     HDFS_BYTES_READ=1392402240
11/09/23 20:22:55 INFO mapred.JobClient:   Map-Reduce Framework
11/09/23 20:22:55 INFO mapred.JobClient:     Map input records=11628209
11/09/23 20:22:55 INFO mapred.JobClient:     Spilled Records=0
11/09/23 20:22:55 INFO mapred.JobClient:     Map output records=11628209
11/09/23 20:22:55 INFO mapreduce.ExportJobBase: Transferred 1.2968 GB in 425.642 seconds (3.1198 MB/sec)
11/09/23 20:22:55 INFO mapreduce.ExportJobBase: Exported 11628209 records.
Fri Sep 23 20:22:55 CST 2011

###############

[wanghai01@tc-crm-rd01.tc.baidu.com bin]$ sh import.sh
Fri Sep 23 20:40:33 CST 2011
11/09/23 20:40:33 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
11/09/23 20:40:33 INFO tool.CodeGenTool: Beginning code generation
11/09/23 20:40:33 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `tb_keyword_data_201104` AS t LIMIT 1
11/09/23 20:40:33 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `tb_keyword_data_201104` AS t LIMIT 1
11/09/23 20:40:33 INFO orm.CompilationManager: HADOOP_HOME is /home/wanghai01/hadoop/hadoop-0.20.2/bin/..
11/09/23 20:40:33 INFO orm.CompilationManager: Found hadoop core jar at: /home/wanghai01/hadoop/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar
11/09/23 20:40:34 ERROR orm.CompilationManager: Could not rename /tmp/sqoop-wanghai01/compile/a913cede5621df95376a26c1af737ee2/tb_keyword_data_201104.java to /home/wanghai01/cloudera/sqoop-1.2.0-CDH3B4/bin/./tb_keyword_data_201104.java
11/09/23 20:40:34 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-wanghai01/compile/a913cede5621df95376a26c1af737ee2/tb_keyword_data_201104.jar
11/09/23 20:40:34 WARN manager.MySQLManager: It looks like you are importing from mysql.
11/09/23 20:40:34 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
11/09/23 20:40:34 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
11/09/23 20:40:34 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
11/09/23 20:40:34 INFO mapreduce.ImportJobBase: Beginning import of tb_keyword_data_201104
11/09/23 20:40:34 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `tb_keyword_data_201104` AS t LIMIT 1
11/09/23 20:40:40 INFO mapred.JobClient: Running job: job_201109211521_0014
11/09/23 20:40:41 INFO mapred.JobClient:  map 0% reduce 0%
11/09/23 20:40:54 INFO mapred.JobClient:  map 25% reduce 0%
11/09/23 20:40:57 INFO mapred.JobClient:  map 50% reduce 0%
11/09/23 20:41:36 INFO mapred.JobClient:  map 75% reduce 0%
11/09/23 20:42:00 INFO mapred.JobClient:  map 100% reduce 0%
11/09/23 20:43:19 INFO mapred.JobClient: Job complete: job_201109211521_0014
11/09/23 20:43:19 INFO mapred.JobClient: Counters: 5
11/09/23 20:43:19 INFO mapred.JobClient:   Job Counters
11/09/23 20:43:19 INFO mapred.JobClient:     Launched map tasks=4
11/09/23 20:43:19 INFO mapred.JobClient:   FileSystemCounters
11/09/23 20:43:19 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1601269219
11/09/23 20:43:19 INFO mapred.JobClient:   Map-Reduce Framework
11/09/23 20:43:19 INFO mapred.JobClient:     Map input records=11628209
11/09/23 20:43:19 INFO mapred.JobClient:     Spilled Records=0
11/09/23 20:43:19 INFO mapred.JobClient:     Map output records=11628209
11/09/23 20:43:19 INFO mapreduce.ImportJobBase: Transferred 1.4913 GB in 165.0126 seconds (9.2544 MB/sec)
11/09/23 20:43:19 INFO mapreduce.ImportJobBase: Retrieved 11628209 records.
Fri Sep 23 20:43:19 CST 2011

import.sh和export.sh中的主要命令如下

/home/wanghai01/cloudera/sqoop-1.2.0-CDH3B4/bin/sqoop import --connect jdbc:mysql://XXXX/crm --username XX --password XX --table tb_keyword_data_201104 --split-by winfo_id --target-dir /user/wanghai01/data/ --fields-terminated-by '\t' --lines-terminated-by '\n' --input-null-string '' --input-null-non-string ''
/home/wanghai01/cloudera/sqoop-1.2.0-CDH3B4/bin/sqoop export --connect jdbc:mysql://XXXX/crm --username XX --password XX --table tb_keyword_data_201104 --export-dir /user/wanghai01/data/ --fields-terminated-by '\t' --lines-terminated-by '\n' --input-null-string '' --input-null-non-string ''

声明
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系admin@php.cn
mysql无法打开共享库怎么解决mysql无法打开共享库怎么解决Mar 04, 2025 pm 04:01 PM

本文介绍了MySQL的“无法打开共享库”错误。 该问题源于MySQL无法找到必要的共享库(.SO/.DLL文件)。解决方案涉及通过系统软件包M验证库安装

减少在Docker中使用MySQL内存的使用减少在Docker中使用MySQL内存的使用Mar 04, 2025 pm 03:52 PM

本文探讨了Docker中的优化MySQL内存使用量。 它讨论了监视技术(Docker统计,性能架构,外部工具)和配置策略。 其中包括Docker内存限制,交换和cgroups

如何使用Alter Table语句在MySQL中更改表?如何使用Alter Table语句在MySQL中更改表?Mar 19, 2025 pm 03:51 PM

本文讨论了使用MySQL的Alter Table语句修改表,包括添加/删除列,重命名表/列以及更改列数据类型。

在 Linux 中运行 MySQl(有/没有带有 phpmyadmin 的 podman 容器)在 Linux 中运行 MySQl(有/没有带有 phpmyadmin 的 podman 容器)Mar 04, 2025 pm 03:54 PM

本文比较使用/不使用PhpMyAdmin的Podman容器直接在Linux上安装MySQL。 它详细介绍了每种方法的安装步骤,强调了Podman在孤立,可移植性和可重复性方面的优势,还

什么是 SQLite?全面概述什么是 SQLite?全面概述Mar 04, 2025 pm 03:55 PM

本文提供了SQLite的全面概述,SQLite是一个独立的,无服务器的关系数据库。 它详细介绍了SQLite的优势(简单,可移植性,易用性)和缺点(并发限制,可伸缩性挑战)。 c

如何为MySQL连接配置SSL/TLS加密?如何为MySQL连接配置SSL/TLS加密?Mar 18, 2025 pm 12:01 PM

文章讨论了为MySQL配置SSL/TLS加密,包括证书生成和验证。主要问题是使用自签名证书的安全含义。[角色计数:159]

在MacOS上运行多个MySQL版本:逐步指南在MacOS上运行多个MySQL版本:逐步指南Mar 04, 2025 pm 03:49 PM

本指南展示了使用自制在MacOS上安装和管理多个MySQL版本。 它强调使用自制装置隔离安装,以防止冲突。 本文详细详细介绍了安装,起始/停止服务和最佳PRA

哪些流行的MySQL GUI工具(例如MySQL Workbench,PhpMyAdmin)是什么?哪些流行的MySQL GUI工具(例如MySQL Workbench,PhpMyAdmin)是什么?Mar 21, 2025 pm 06:28 PM

文章讨论了流行的MySQL GUI工具,例如MySQL Workbench和PhpMyAdmin,比较了它们对初学者和高级用户的功能和适合性。[159个字符]

See all articles

热AI工具

Undresser.AI Undress

Undresser.AI Undress

人工智能驱动的应用程序,用于创建逼真的裸体照片

AI Clothes Remover

AI Clothes Remover

用于从照片中去除衣服的在线人工智能工具。

Undress AI Tool

Undress AI Tool

免费脱衣服图片

Clothoff.io

Clothoff.io

AI脱衣机

AI Hentai Generator

AI Hentai Generator

免费生成ai无尽的。

热门文章

R.E.P.O.能量晶体解释及其做什么(黄色晶体)
2 周前By尊渡假赌尊渡假赌尊渡假赌
仓库:如何复兴队友
1 个月前By尊渡假赌尊渡假赌尊渡假赌
Hello Kitty Island冒险:如何获得巨型种子
4 周前By尊渡假赌尊渡假赌尊渡假赌

热工具

Dreamweaver Mac版

Dreamweaver Mac版

视觉化网页开发工具

SublimeText3 Linux新版

SublimeText3 Linux新版

SublimeText3 Linux最新版

SublimeText3汉化版

SublimeText3汉化版

中文版,非常好用

SublimeText3 英文版

SublimeText3 英文版

推荐:为Win版本,支持代码提示!

ZendStudio 13.5.1 Mac

ZendStudio 13.5.1 Mac

功能强大的PHP集成开发环境