开场白: Hive与HBase的整合功能的实现是利用两者本身对外的API接口互相进行通信,相互通信主要是依靠hive_hbase-handler.jar工具类 (Hive Storage Handlers), 大致意思如图所示: 口水: 对 hive_hbase-handler.jar 这个东东还有点兴趣,有空来磋磨一下。
开场白:
Hive与HBase的整合功能的实现是利用两者本身对外的API接口互相进行通信,相互通信主要是依靠hive_hbase-handler.jar工具类 (Hive Storage Handlers), 大致意思如图所示:
口水:
对 hive_hbase-handler.jar 这个东东还有点兴趣,有空来磋磨一下。
一、2个注意事项:
1、需要的软件有 Hadoop、Hive、Hbase、Zookeeper,Hive与HBase的整合对Hive的版本有要求,所以不要下载.0.6.0以前的老版本,Hive.0.6.0的版本才支持与HBase对接,因此在Hive的lib目录下可以看见多了hive_hbase-handler.jar这个jar包,他是Hive扩展存储的Handler ,HBase 建议使用 0.20.6的版本,这次我没有启动HDFS的集群环境,本次所有测试环境都在一台机器上。
2、运行Hive时,也许会出现如下错误,表示你的JVM分配的空间不够,错误信息如下:
Invalid maximum heap size: -Xmx4096m
The specified size exceeds the maximum representable size.
Could not create the Java virtual machine.
解决方法:
/work/hive/bin/ext# vim util/execHiveCmd.sh 文件中第33行
修改,
HADOOP_HEAPSIZE=4096
为
HADOOP_HEAPSIZE=256
另外,在 /etc/profile/ 加入 export $HIVE_HOME=/work/hive
二、启动运行环境
1启动Hive
hive –auxpath /work/hive/lib/hive_hbase-handler.jar,/work/hive/lib/hbase-0.20.3.jar,/work/hive/lib/zookeeper-3.2.2.jar -hiveconf hbase.master=127.0.0.1:60000
加载 Hive需要的工具类,并且指向HBase的master服务器地址,我的HBase master服务器和Hive运行在同一台机器,所以我指向本地。
2启动HBase
/work/hbase/bin/hbase master start
3启动Zookeeper
/work/zookeeper/bin/zkServer.sh start
三、执行
在Hive中创建一张表,相互关联的表
CREATE TABLE hbase_table_1(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("hbase.table.name" = "xyz");
在运行一个在Hive中建表语句,并且将数据导入
建表
CREATE TABLE pokes (foo INT, bar STRING);
数据导入
LOAD DATA LOCAL INPATH '/work/hive/examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
在Hive与HBase关联的表中 插入一条数据
INSERT OVERWRITE TABLE hbase_table_1 SELECT * FROM pokes WHERE foo=98;
运行成功后,如图所示:
插入数据时采用了MapReduce的策略算法,并且同时向HBase写入,如图所示:
在HBase shell中运行 scan 'xyz' 和describe "xyz" 命令,查看表结构,运行结果如图所示:
xyz是通过Hive在Hbase中创建的表,刚刚在Hive的建表语句中指定了映射的属性 "hbase.columns.mapping" = ":key,cf1:val" 和 在HBase中建表的名称 "hbase.table.name" = "xyz"
在hbase在运行put命令,插入一条记录
put 'xyz','10001','cf1:val','www.javabloger.com'
在hive上运行查询语句,看看刚刚在hbase中插入的数据有没有同步过来,
select * from hbase_table_1 WHERE key=10001;
如图所示:
最终的效果
以上整合过程和操作步骤已经执行完毕,现在Hive中添加记录HBase中有记录添加,同样你在HBase中添加记录Hive中也会添加, 表示Hive与HBase整合成功,对海量级别的数据我们是不是可以在HBase写入,在Hive中查询 喃?因为HBase 不支持复杂的查询,但是HBase可以作为基于 key 获取一行或多行数据,或者扫描数据区间,以及过滤操作。而复杂的查询可以让Hive来完成,一个作为存储的入口(HBase),一个作为查询的入口(Hive)。如下图示。
呵呵,见笑了,以上只是我面片的观点。
先这样,稍后我将继续更新,感谢你的阅读。
相关文章:
Apache Hive入门2
Apache Hive入门1
HBase入门篇4
HBase入门篇3
HBase入门篇2
HBase入门篇
–end–
原文地址:Hive入门3–Hive与HBase的整合, 感谢原作者分享。

Mastering the method of adding MySQL users is crucial for database administrators and developers because it ensures the security and access control of the database. 1) Create a new user using the CREATEUSER command, 2) Assign permissions through the GRANT command, 3) Use FLUSHPRIVILEGES to ensure permissions take effect, 4) Regularly audit and clean user accounts to maintain performance and security.

ChooseCHARforfixed-lengthdata,VARCHARforvariable-lengthdata,andTEXTforlargetextfields.1)CHARisefficientforconsistent-lengthdatalikecodes.2)VARCHARsuitsvariable-lengthdatalikenames,balancingflexibilityandperformance.3)TEXTisidealforlargetextslikeartic

Best practices for handling string data types and indexes in MySQL include: 1) Selecting the appropriate string type, such as CHAR for fixed length, VARCHAR for variable length, and TEXT for large text; 2) Be cautious in indexing, avoid over-indexing, and create indexes for common queries; 3) Use prefix indexes and full-text indexes to optimize long string searches; 4) Regularly monitor and optimize indexes to keep indexes small and efficient. Through these methods, we can balance read and write performance and improve database efficiency.

ToaddauserremotelytoMySQL,followthesesteps:1)ConnecttoMySQLasroot,2)Createanewuserwithremoteaccess,3)Grantnecessaryprivileges,and4)Flushprivileges.BecautiousofsecurityrisksbylimitingprivilegesandaccesstospecificIPs,ensuringstrongpasswords,andmonitori

TostorestringsefficientlyinMySQL,choosetherightdatatypebasedonyourneeds:1)UseCHARforfixed-lengthstringslikecountrycodes.2)UseVARCHARforvariable-lengthstringslikenames.3)UseTEXTforlong-formtextcontent.4)UseBLOBforbinarydatalikeimages.Considerstorageov

When selecting MySQL's BLOB and TEXT data types, BLOB is suitable for storing binary data, and TEXT is suitable for storing text data. 1) BLOB is suitable for binary data such as pictures and audio, 2) TEXT is suitable for text data such as articles and comments. When choosing, data properties and performance optimization must be considered.

No,youshouldnotusetherootuserinMySQLforyourproduct.Instead,createspecificuserswithlimitedprivilegestoenhancesecurityandperformance:1)Createanewuserwithastrongpassword,2)Grantonlynecessarypermissionstothisuser,3)Regularlyreviewandupdateuserpermissions

MySQLstringdatatypesshouldbechosenbasedondatacharacteristicsandusecases:1)UseCHARforfixed-lengthstringslikecountrycodes.2)UseVARCHARforvariable-lengthstringslikenames.3)UseBINARYorVARBINARYforbinarydatalikecryptographickeys.4)UseBLOBorTEXTforlargeuns


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

SublimeText3 Linux new version
SublimeText3 Linux latest version

WebStorm Mac version
Useful JavaScript development tools
