search
HomeDatabaseMysql TutorialHive入门3–Hive与HBase的整合

Hive入门3–Hive与HBase的整合

Jun 07, 2016 pm 04:25 PM
hbasehivegetting StartedIntegrate

开场白: Hive与HBase的整合功能的实现是利用两者本身对外的API接口互相进行通信,相互通信主要是依靠hive_hbase-handler.jar工具类 (Hive Storage Handlers), 大致意思如图所示: 口水: 对 hive_hbase-handler.jar 这个东东还有点兴趣,有空来磋磨一下。

开场白:
Hive与HBase的整合功能的实现是利用两者本身对外的API接口互相进行通信,相互通信主要是依靠hive_hbase-handler.jar工具类 (Hive Storage Handlers), 大致意思如图所示:
hive-hbase

口水:
 对 hive_hbase-handler.jar 这个东东还有点兴趣,有空来磋磨一下。

一、2个注意事项:
1、需要的软件有 Hadoop、Hive、Hbase、Zookeeper,Hive与HBase的整合对Hive的版本有要求,所以不要下载.0.6.0以前的老版本,Hive.0.6.0的版本才支持与HBase对接,因此在Hive的lib目录下可以看见多了hive_hbase-handler.jar这个jar包,他是Hive扩展存储的Handler ,HBase 建议使用 0.20.6的版本,这次我没有启动HDFS的集群环境,本次所有测试环境都在一台机器上。
    
2、运行Hive时,也许会出现如下错误,表示你的JVM分配的空间不够,错误信息如下:
Invalid maximum heap size: -Xmx4096m
The specified size exceeds the maximum representable size.
Could not create the Java virtual machine.

解决方法:
/work/hive/bin/ext# vim util/execHiveCmd.sh 文件中第33行
修改,
HADOOP_HEAPSIZE=4096

HADOOP_HEAPSIZE=256

另外,在 /etc/profile/ 加入 export $HIVE_HOME=/work/hive

二、启动运行环境
1启动Hive
hive –auxpath /work/hive/lib/hive_hbase-handler.jar,/work/hive/lib/hbase-0.20.3.jar,/work/hive/lib/zookeeper-3.2.2.jar -hiveconf hbase.master=127.0.0.1:60000
加载 Hive需要的工具类,并且指向HBase的master服务器地址,我的HBase master服务器和Hive运行在同一台机器,所以我指向本地。

2启动HBase
/work/hbase/bin/hbase master start

3启动Zookeeper
/work/zookeeper/bin/zkServer.sh start

三、执行
在Hive中创建一张表,相互关联的表
CREATE TABLE hbase_table_1(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("hbase.table.name" = "xyz");

在运行一个在Hive中建表语句,并且将数据导入
建表
    CREATE TABLE pokes (foo INT, bar STRING);
数据导入
    LOAD DATA LOCAL INPATH '/work/hive/examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;

在Hive与HBase关联的表中 插入一条数据
    INSERT OVERWRITE TABLE hbase_table_1 SELECT * FROM pokes WHERE foo=98;
运行成功后,如图所示:
hive

插入数据时采用了MapReduce的策略算法,并且同时向HBase写入,如图所示:
Map-Reduce Job for INSERT

在HBase shell中运行 scan 'xyz' 和describe "xyz" 命令,查看表结构,运行结果如图所示:
hive

xyz是通过Hive在Hbase中创建的表,刚刚在Hive的建表语句中指定了映射的属性 "hbase.columns.mapping" = ":key,cf1:val"  和 在HBase中建表的名称 "hbase.table.name" = "xyz"

在hbase在运行put命令,插入一条记录
    put 'xyz','10001','cf1:val','www.javabloger.com'

在hive上运行查询语句,看看刚刚在hbase中插入的数据有没有同步过来,
    select * from hbase_table_1 WHERE key=10001;
如图所示:
hive

最终的效果
    以上整合过程和操作步骤已经执行完毕,现在Hive中添加记录HBase中有记录添加,同样你在HBase中添加记录Hive中也会添加, 表示Hive与HBase整合成功,对海量级别的数据我们是不是可以在HBase写入,在Hive中查询 喃?因为HBase 不支持复杂的查询,但是HBase可以作为基于 key 获取一行或多行数据,或者扫描数据区间,以及过滤操作。而复杂的查询可以让Hive来完成,一个作为存储的入口(HBase),一个作为查询的入口(Hive)。如下图示。
    hive mapreduce
    
    呵呵,见笑了,以上只是我面片的观点。

先这样,稍后我将继续更新,感谢你的阅读。

相关文章:
 Apache Hive入门2
 Apache Hive入门1

 HBase入门篇4
 HBase入门篇3
 HBase入门篇2
 HBase入门篇

–end–

Statement
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
Explain the ACID properties (Atomicity, Consistency, Isolation, Durability).Explain the ACID properties (Atomicity, Consistency, Isolation, Durability).Apr 16, 2025 am 12:20 AM

ACID attributes include atomicity, consistency, isolation and durability, and are the cornerstone of database design. 1. Atomicity ensures that the transaction is either completely successful or completely failed. 2. Consistency ensures that the database remains consistent before and after a transaction. 3. Isolation ensures that transactions do not interfere with each other. 4. Persistence ensures that data is permanently saved after transaction submission.

MySQL: Database Management System vs. Programming LanguageMySQL: Database Management System vs. Programming LanguageApr 16, 2025 am 12:19 AM

MySQL is not only a database management system (DBMS) but also closely related to programming languages. 1) As a DBMS, MySQL is used to store, organize and retrieve data, and optimizing indexes can improve query performance. 2) Combining SQL with programming languages, embedded in Python, using ORM tools such as SQLAlchemy can simplify operations. 3) Performance optimization includes indexing, querying, caching, library and table division and transaction management.

MySQL: Managing Data with SQL CommandsMySQL: Managing Data with SQL CommandsApr 16, 2025 am 12:19 AM

MySQL uses SQL commands to manage data. 1. Basic commands include SELECT, INSERT, UPDATE and DELETE. 2. Advanced usage involves JOIN, subquery and aggregate functions. 3. Common errors include syntax, logic and performance issues. 4. Optimization tips include using indexes, avoiding SELECT* and using LIMIT.

MySQL's Purpose: Storing and Managing Data EffectivelyMySQL's Purpose: Storing and Managing Data EffectivelyApr 16, 2025 am 12:16 AM

MySQL is an efficient relational database management system suitable for storing and managing data. Its advantages include high-performance queries, flexible transaction processing and rich data types. In practical applications, MySQL is often used in e-commerce platforms, social networks and content management systems, but attention should be paid to performance optimization, data security and scalability.

SQL and MySQL: Understanding the RelationshipSQL and MySQL: Understanding the RelationshipApr 16, 2025 am 12:14 AM

The relationship between SQL and MySQL is the relationship between standard languages ​​and specific implementations. 1.SQL is a standard language used to manage and operate relational databases, allowing data addition, deletion, modification and query. 2.MySQL is a specific database management system that uses SQL as its operating language and provides efficient data storage and management.

Explain the role of InnoDB redo logs and undo logs.Explain the role of InnoDB redo logs and undo logs.Apr 15, 2025 am 12:16 AM

InnoDB uses redologs and undologs to ensure data consistency and reliability. 1.redologs record data page modification to ensure crash recovery and transaction persistence. 2.undologs records the original data value and supports transaction rollback and MVCC.

What are the key metrics to look for in an EXPLAIN output (type, key, rows, Extra)?What are the key metrics to look for in an EXPLAIN output (type, key, rows, Extra)?Apr 15, 2025 am 12:15 AM

Key metrics for EXPLAIN commands include type, key, rows, and Extra. 1) The type reflects the access type of the query. The higher the value, the higher the efficiency, such as const is better than ALL. 2) The key displays the index used, and NULL indicates no index. 3) rows estimates the number of scanned rows, affecting query performance. 4) Extra provides additional information, such as Usingfilesort prompts that it needs to be optimized.

What is the Using temporary status in EXPLAIN and how to avoid it?What is the Using temporary status in EXPLAIN and how to avoid it?Apr 15, 2025 am 12:14 AM

Usingtemporary indicates that the need to create temporary tables in MySQL queries, which are commonly found in ORDERBY using DISTINCT, GROUPBY, or non-indexed columns. You can avoid the occurrence of indexes and rewrite queries and improve query performance. Specifically, when Usingtemporary appears in EXPLAIN output, it means that MySQL needs to create temporary tables to handle queries. This usually occurs when: 1) deduplication or grouping when using DISTINCT or GROUPBY; 2) sort when ORDERBY contains non-index columns; 3) use complex subquery or join operations. Optimization methods include: 1) ORDERBY and GROUPB

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. Best Graphic Settings
4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. How to Fix Audio if You Can't Hear Anyone
1 months agoBy尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. Chat Commands and How to Use Them
1 months agoBy尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Atom editor mac version download

Atom editor mac version download

The most popular open source editor

MinGW - Minimalist GNU for Windows

MinGW - Minimalist GNU for Windows

This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.

EditPlus Chinese cracked version

EditPlus Chinese cracked version

Small size, syntax highlighting, does not support code prompt function

Dreamweaver Mac version

Dreamweaver Mac version

Visual web development tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor