Using Sqoop to Move Data Between HDFS and an RDBMS

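Sqoop is an open-source tool used mainly to transfer data between Hadoop and traditional databases. The following description is excerpted from the Sqoop user guide: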


Sqoop is a tool designed to transfer data between Hadoop and relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS.

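Sqoop is software open-sourced by Cloudera for transferring data between HDFS and databases. Internally it talks to the database over JDBC, so in theory any database with JDBC support should work with Sqoop. Besides writing data to HDFS as files, Sqoop can also import data directly into HBase or Hive.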

Below are some performance test figures, for reference only:

Table name: tb_keywords

Row count: 11628209

Data file size: 1.4G

 

Method                 HDFS -> DB    DB -> HDFS
SQOOP                  428s          166s
HDFS <-> FILE <-> DB   209s          105s
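For a rough sense of scale, the timings above can be turned into rows per second. The snippet below is only a back-of-the-envelope sketch: the `rate` helper and the labels are made up for illustration, and integer division drops the fractional part.

```shell
#!/bin/sh
# Row count reported for the test table.
rows=11628209

# Hypothetical helper: rows per second for a given elapsed time in seconds.
rate() {
    echo $((rows / $1))
}

echo "Sqoop, HDFS -> DB: $(rate 428) rows/s"
echo "Sqoop, DB -> HDFS: $(rate 166) rows/s"
echo "File,  HDFS -> DB: $(rate 209) rows/s"
echo "File,  DB -> HDFS: $(rate 105) rows/s"
```

On these figures the file-based route is roughly twice as fast as Sqoop when loading into the database, and about 1.6 times as fast in the other direction.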

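Judging from the results, using a file as the intermediate step outperforms Sqoop. The reasons are as follows:

1. Sqoop is fundamentally built on JDBC, so it will not be more efficient than MySQL's own import/export tools.

2. Take loading data into the DB (a Sqoop export) as an example: Sqoop is designed to commit in stages. If a table has 1K rows, it first reads 100 rows (the default), inserts them, commits, then reads the next 100 rows, and so on.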
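The staged-commit pattern in point 2 can be sketched as a simple loop. This only illustrates the arithmetic and is not Sqoop's actual code: the variable names are invented here, and the batch size of 100 is the default quoted above.

```shell
#!/bin/sh
# Hypothetical illustration: count the insert-and-commit round trips
# needed to load a table in fixed-size batches.
total_rows=1000    # the 1K-row table from the example
batch_size=100     # the default batch size quoted in the text

commits=0
remaining=$total_rows
while [ "$remaining" -gt 0 ]; do
    if [ "$remaining" -lt "$batch_size" ]; then
        chunk=$remaining
    else
        chunk=$batch_size
    fi
    # A real export would INSERT $chunk rows here and then COMMIT.
    remaining=$((remaining - chunk))
    commits=$((commits + 1))
done

echo "commits=$commits"
```

With these numbers the loop performs 10 commits. Each commit is a separate round trip to the database, which is one reason a single bulk load from a file can finish sooner.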

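Even so, Sqoop has its advantages, such as ease of use and fault tolerance during task execution. In test environments it is worth considering as a tool when the need arises.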

Below are some operation logs:

[wanghai01@tc-crm-rd01.tc.baidu.com bin]$ sh export.sh
Fri Sep 23 20:15:47 CST 2011
11/09/23 20:15:48 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
11/09/23 20:15:48 INFO tool.CodeGenTool: Beginning code generation
11/09/23 20:15:48 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `tb_keyword_data_201104` AS t LIMIT 1
11/09/23 20:15:48 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `tb_keyword_data_201104` AS t LIMIT 1
11/09/23 20:15:48 INFO orm.CompilationManager: HADOOP_HOME is /home/wanghai01/hadoop/hadoop-0.20.2/bin/..
11/09/23 20:15:48 INFO orm.CompilationManager: Found hadoop core jar at: /home/wanghai01/hadoop/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar
11/09/23 20:15:49 ERROR orm.CompilationManager: Could not rename /tmp/sqoop-wanghai01/compile/eb16aae87a119b93acb3bc6ea74b5e97/tb_keyword_data_201104.java to /home/wanghai01/cloudera/sqoop-1.2.0-CDH3B4/bin/./tb_keyword_data_201104.java
11/09/23 20:15:49 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-wanghai01/compile/eb16aae87a119b93acb3bc6ea74b5e97/tb_keyword_data_201104.jar
11/09/23 20:15:49 INFO mapreduce.ExportJobBase: Beginning export of tb_keyword_data_201104
11/09/23 20:15:49 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `tb_keyword_data_201104` AS t LIMIT 1
11/09/23 20:15:49 INFO input.FileInputFormat: Total input paths to process : 1
11/09/23 20:15:49 INFO input.FileInputFormat: Total input paths to process : 1
11/09/23 20:15:49 INFO mapred.JobClient: Running job: job_201109211521_0012
11/09/23 20:15:50 INFO mapred.JobClient:  map 0% reduce 0%
11/09/23 20:16:04 INFO mapred.JobClient:  map 1% reduce 0%
11/09/23 20:16:10 INFO mapred.JobClient:  map 2% reduce 0%
11/09/23 20:16:13 INFO mapred.JobClient:  map 3% reduce 0%
11/09/23 20:16:19 INFO mapred.JobClient:  map 4% reduce 0%
11/09/23 20:16:22 INFO mapred.JobClient:  map 5% reduce 0%
11/09/23 20:16:25 INFO mapred.JobClient:  map 6% reduce 0%
11/09/23 20:16:31 INFO mapred.JobClient:  map 7% reduce 0%
11/09/23 20:16:34 INFO mapred.JobClient:  map 8% reduce 0%
11/09/23 20:16:41 INFO mapred.JobClient:  map 9% reduce 0%
11/09/23 20:16:44 INFO mapred.JobClient:  map 10% reduce 0%
11/09/23 20:16:50 INFO mapred.JobClient:  map 11% reduce 0%
11/09/23 20:16:53 INFO mapred.JobClient:  map 12% reduce 0%
11/09/23 20:16:56 INFO mapred.JobClient:  map 13% reduce 0%
11/09/23 20:17:02 INFO mapred.JobClient:  map 14% reduce 0%
11/09/23 20:17:05 INFO mapred.JobClient:  map 15% reduce 0%
11/09/23 20:17:11 INFO mapred.JobClient:  map 16% reduce 0%
11/09/23 20:17:14 INFO mapred.JobClient:  map 17% reduce 0%
11/09/23 20:17:17 INFO mapred.JobClient:  map 18% reduce 0%
11/09/23 20:17:23 INFO mapred.JobClient:  map 19% reduce 0%
11/09/23 20:17:25 INFO mapred.JobClient:  map 20% reduce 0%
11/09/23 20:17:28 INFO mapred.JobClient:  map 21% reduce 0%
11/09/23 20:17:34 INFO mapred.JobClient:  map 22% reduce 0%
11/09/23 20:17:37 INFO mapred.JobClient:  map 23% reduce 0%
11/09/23 20:17:43 INFO mapred.JobClient:  map 24% reduce 0%
11/09/23 20:17:46 INFO mapred.JobClient:  map 25% reduce 0%
11/09/23 20:17:49 INFO mapred.JobClient:  map 26% reduce 0%
11/09/23 20:17:55 INFO mapred.JobClient:  map 27% reduce 0%
11/09/23 20:17:58 INFO mapred.JobClient:  map 28% reduce 0%
11/09/23 20:18:04 INFO mapred.JobClient:  map 29% reduce 0%
11/09/23 20:18:07 INFO mapred.JobClient:  map 30% reduce 0%
11/09/23 20:18:10 INFO mapred.JobClient:  map 31% reduce 0%
11/09/23 20:18:16 INFO mapred.JobClient:  map 32% reduce 0%
11/09/23 20:18:19 INFO mapred.JobClient:  map 33% reduce 0%
11/09/23 20:18:25 INFO mapred.JobClient:  map 34% reduce 0%
11/09/23 20:18:28 INFO mapred.JobClient:  map 35% reduce 0%
11/09/23 20:18:31 INFO mapred.JobClient:  map 36% reduce 0%
11/09/23 20:18:37 INFO mapred.JobClient:  map 37% reduce 0%
11/09/23 20:18:40 INFO mapred.JobClient:  map 38% reduce 0%
11/09/23 20:18:46 INFO mapred.JobClient:  map 39% reduce 0%
11/09/23 20:18:49 INFO mapred.JobClient:  map 40% reduce 0%
11/09/23 20:18:52 INFO mapred.JobClient:  map 41% reduce 0%
11/09/23 20:18:58 INFO mapred.JobClient:  map 42% reduce 0%
11/09/23 20:19:01 INFO mapred.JobClient:  map 43% reduce 0%
11/09/23 20:19:04 INFO mapred.JobClient:  map 44% reduce 0%
11/09/23 20:19:10 INFO mapred.JobClient:  map 45% reduce 0%
11/09/23 20:19:13 INFO mapred.JobClient:  map 46% reduce 0%
11/09/23 20:19:19 INFO mapred.JobClient:  map 47% reduce 0%
11/09/23 20:19:22 INFO mapred.JobClient:  map 48% reduce 0%
11/09/23 20:19:25 INFO mapred.JobClient:  map 49% reduce 0%
11/09/23 20:19:34 INFO mapred.JobClient:  map 50% reduce 0%
11/09/23 20:19:37 INFO mapred.JobClient:  map 52% reduce 0%
11/09/23 20:19:40 INFO mapred.JobClient:  map 53% reduce 0%
11/09/23 20:19:43 INFO mapred.JobClient:  map 54% reduce 0%
11/09/23 20:19:46 INFO mapred.JobClient:  map 55% reduce 0%
11/09/23 20:19:49 INFO mapred.JobClient:  map 56% reduce 0%
11/09/23 20:19:52 INFO mapred.JobClient:  map 57% reduce 0%
11/09/23 20:19:55 INFO mapred.JobClient:  map 58% reduce 0%
11/09/23 20:19:58 INFO mapred.JobClient:  map 59% reduce 0%
11/09/23 20:20:01 INFO mapred.JobClient:  map 60% reduce 0%
11/09/23 20:20:04 INFO mapred.JobClient:  map 62% reduce 0%
11/09/23 20:20:07 INFO mapred.JobClient:  map 63% reduce 0%
11/09/23 20:20:10 INFO mapred.JobClient:  map 64% reduce 0%
11/09/23 20:20:13 INFO mapred.JobClient:  map 65% reduce 0%
11/09/23 20:20:16 INFO mapred.JobClient:  map 66% reduce 0%
11/09/23 20:20:19 INFO mapred.JobClient:  map 67% reduce 0%
11/09/23 20:20:22 INFO mapred.JobClient:  map 68% reduce 0%
11/09/23 20:20:25 INFO mapred.JobClient:  map 69% reduce 0%
11/09/23 20:20:28 INFO mapred.JobClient:  map 70% reduce 0%
11/09/23 20:20:31 INFO mapred.JobClient:  map 72% reduce 0%
11/09/23 20:20:34 INFO mapred.JobClient:  map 73% reduce 0%
11/09/23 20:20:37 INFO mapred.JobClient:  map 74% reduce 0%
11/09/23 20:20:40 INFO mapred.JobClient:  map 75% reduce 0%
11/09/23 20:20:43 INFO mapred.JobClient:  map 76% reduce 0%
11/09/23 20:20:46 INFO mapred.JobClient:  map 77% reduce 0%
11/09/23 20:20:49 INFO mapred.JobClient:  map 78% reduce 0%
11/09/23 20:20:52 INFO mapred.JobClient:  map 80% reduce 0%
11/09/23 20:20:55 INFO mapred.JobClient:  map 81% reduce 0%
11/09/23 20:20:58 INFO mapred.JobClient:  map 82% reduce 0%
11/09/23 20:21:01 INFO mapred.JobClient:  map 83% reduce 0%
11/09/23 20:21:04 INFO mapred.JobClient:  map 84% reduce 0%
11/09/23 20:21:07 INFO mapred.JobClient:  map 85% reduce 0%
11/09/23 20:21:10 INFO mapred.JobClient:  map 86% reduce 0%
11/09/23 20:21:13 INFO mapred.JobClient:  map 87% reduce 0%
11/09/23 20:21:22 INFO mapred.JobClient:  map 88% reduce 0%
11/09/23 20:21:28 INFO mapred.JobClient:  map 89% reduce 0%
11/09/23 20:21:37 INFO mapred.JobClient:  map 90% reduce 0%
11/09/23 20:21:47 INFO mapred.JobClient:  map 91% reduce 0%
11/09/23 20:21:53 INFO mapred.JobClient:  map 92% reduce 0%
11/09/23 20:22:02 INFO mapred.JobClient:  map 93% reduce 0%
11/09/23 20:22:11 INFO mapred.JobClient:  map 94% reduce 0%
11/09/23 20:22:17 INFO mapred.JobClient:  map 95% reduce 0%
11/09/23 20:22:26 INFO mapred.JobClient:  map 96% reduce 0%
11/09/23 20:22:32 INFO mapred.JobClient:  map 97% reduce 0%
11/09/23 20:22:41 INFO mapred.JobClient:  map 98% reduce 0%
11/09/23 20:22:47 INFO mapred.JobClient:  map 99% reduce 0%
11/09/23 20:22:53 INFO mapred.JobClient:  map 100% reduce 0%
11/09/23 20:22:55 INFO mapred.JobClient: Job complete: job_201109211521_0012
11/09/23 20:22:55 INFO mapred.JobClient: Counters: 6
11/09/23 20:22:55 INFO mapred.JobClient:   Job Counters
11/09/23 20:22:55 INFO mapred.JobClient:     Launched map tasks=4
11/09/23 20:22:55 INFO mapred.JobClient:     Data-local map tasks=4
11/09/23 20:22:55 INFO mapred.JobClient:   FileSystemCounters
11/09/23 20:22:55 INFO mapred.JobClient:     HDFS_BYTES_READ=1392402240
11/09/23 20:22:55 INFO mapred.JobClient:   Map-Reduce Framework
11/09/23 20:22:55 INFO mapred.JobClient:     Map input records=11628209
11/09/23 20:22:55 INFO mapred.JobClient:     Spilled Records=0
11/09/23 20:22:55 INFO mapred.JobClient:     Map output records=11628209
11/09/23 20:22:55 INFO mapreduce.ExportJobBase: Transferred 1.2968 GB in 425.642 seconds (3.1198 MB/sec)
11/09/23 20:22:55 INFO mapreduce.ExportJobBase: Exported 11628209 records.
Fri Sep 23 20:22:55 CST 2011

###############

[wanghai01@tc-crm-rd01.tc.baidu.com bin]$ sh import.sh
Fri Sep 23 20:40:33 CST 2011
11/09/23 20:40:33 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
11/09/23 20:40:33 INFO tool.CodeGenTool: Beginning code generation
11/09/23 20:40:33 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `tb_keyword_data_201104` AS t LIMIT 1
11/09/23 20:40:33 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `tb_keyword_data_201104` AS t LIMIT 1
11/09/23 20:40:33 INFO orm.CompilationManager: HADOOP_HOME is /home/wanghai01/hadoop/hadoop-0.20.2/bin/..
11/09/23 20:40:33 INFO orm.CompilationManager: Found hadoop core jar at: /home/wanghai01/hadoop/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar
11/09/23 20:40:34 ERROR orm.CompilationManager: Could not rename /tmp/sqoop-wanghai01/compile/a913cede5621df95376a26c1af737ee2/tb_keyword_data_201104.java to /home/wanghai01/cloudera/sqoop-1.2.0-CDH3B4/bin/./tb_keyword_data_201104.java
11/09/23 20:40:34 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-wanghai01/compile/a913cede5621df95376a26c1af737ee2/tb_keyword_data_201104.jar
11/09/23 20:40:34 WARN manager.MySQLManager: It looks like you are importing from mysql.
11/09/23 20:40:34 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
11/09/23 20:40:34 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
11/09/23 20:40:34 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
11/09/23 20:40:34 INFO mapreduce.ImportJobBase: Beginning import of tb_keyword_data_201104
11/09/23 20:40:34 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `tb_keyword_data_201104` AS t LIMIT 1
11/09/23 20:40:40 INFO mapred.JobClient: Running job: job_201109211521_0014
11/09/23 20:40:41 INFO mapred.JobClient:  map 0% reduce 0%
11/09/23 20:40:54 INFO mapred.JobClient:  map 25% reduce 0%
11/09/23 20:40:57 INFO mapred.JobClient:  map 50% reduce 0%
11/09/23 20:41:36 INFO mapred.JobClient:  map 75% reduce 0%
11/09/23 20:42:00 INFO mapred.JobClient:  map 100% reduce 0%
11/09/23 20:43:19 INFO mapred.JobClient: Job complete: job_201109211521_0014
11/09/23 20:43:19 INFO mapred.JobClient: Counters: 5
11/09/23 20:43:19 INFO mapred.JobClient:   Job Counters
11/09/23 20:43:19 INFO mapred.JobClient:     Launched map tasks=4
11/09/23 20:43:19 INFO mapred.JobClient:   FileSystemCounters
11/09/23 20:43:19 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1601269219
11/09/23 20:43:19 INFO mapred.JobClient:   Map-Reduce Framework
11/09/23 20:43:19 INFO mapred.JobClient:     Map input records=11628209
11/09/23 20:43:19 INFO mapred.JobClient:     Spilled Records=0
11/09/23 20:43:19 INFO mapred.JobClient:     Map output records=11628209
11/09/23 20:43:19 INFO mapreduce.ImportJobBase: Transferred 1.4913 GB in 165.0126 seconds (9.2544 MB/sec)
11/09/23 20:43:19 INFO mapreduce.ImportJobBase: Retrieved 11628209 records.
Fri Sep 23 20:43:19 CST 2011

The main commands in import.sh and export.sh are as follows:

/home/wanghai01/cloudera/sqoop-1.2.0-CDH3B4/bin/sqoop import --connect jdbc:mysql://XXXX/crm --username XX --password XX --table tb_keyword_data_201104 --split-by winfo_id --target-dir /user/wanghai01/data/ --fields-terminated-by '\t' --lines-terminated-by '\n' --input-null-string '' --input-null-non-string ''
/home/wanghai01/cloudera/sqoop-1.2.0-CDH3B4/bin/sqoop export --connect jdbc:mysql://XXXX/crm --username XX --password XX --table tb_keyword_data_201104 --export-dir /user/wanghai01/data/ --fields-terminated-by '\t' --lines-terminated-by '\n' --input-null-string '' --input-null-non-string ''
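
The logs above print two hints: use `-P` instead of putting the password on the command line, and use `--direct` for a MySQL-specific fast path on import. A variant of the import command that follows both hints might look like the sketch below. It is untested and was not part of the original scripts. The `SQOOP_BIN`, `DB_HOST` and `DB_USER` variables are placeholders introduced here, and the delimiter and null-handling options are left out because whether they combine with `--direct` should be checked against the documentation for the Sqoop version in use.

```shell
#!/bin/sh
# Sketch only: placeholder values, not taken from the original scripts.
SQOOP_BIN="sqoop"    # assumed to be on PATH
DB_HOST="XXXX"       # elided in the original; left as-is
DB_USER="XX"         # elided in the original; left as-is

# -P prompts for the password instead of passing it on the command line.
# --direct requests the MySQL-specific fast path mentioned in the import log.
import_cmd="$SQOOP_BIN import --connect jdbc:mysql://$DB_HOST/crm --username $DB_USER -P --direct --table tb_keyword_data_201104 --split-by winfo_id --target-dir /user/wanghai01/data/"

# Print the command for review rather than running it.
printf '%s\n' "$import_cmd"
```

Because `-P` reads the password interactively, a script like this would need a terminal attached when the command is actually run.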
