Hive是基于Hadoop的数据仓库平台。 Hive提供了类SQL查询语言。Hive的数据存储于HDFS中。一般情况下,用户提交的查询将被Hive转换为MapReduce作业并提交给Hadoop运行。 我们从Hive的安装开始,逐步学习Hive的方方面面。 安装Hive 安装前提 l Java 6 l Hadoop
Hive是基于Hadoop的数据仓库平台。
Hive提供了类SQL查询语言。Hive的数据存储于HDFS中。一般情况下,用户提交的查询将被Hive转换为MapReduce作业并提交给Hadoop运行。
我们从Hive的安装开始,逐步学习Hive的方方面面。
安装Hive
安装前提
l Java 6
l Hadoop
选择哪一个版本请参照Hive官方文档。安装Have是不需要特别设置关于Hadoop的信息,只要保证HADOOP_HOME环境变量正确设置就可以了。
安装
我们选择下载0.11.1稳定版本。下载地址:
http://mirrors.hust.edu.cn/apache/hive/stable/
1) 解压安装包到指定的目录:
tar xzf hive-0.11.0.tar.gz
2) 设置环境变量
export HIVE_INSTALL=/opt/Hive-0.11.0
export PATH=$PATH:$HIVE_INSTALL/bin
3)输入以下命令进入Shell
Hive
Hive交互环境( Shell)
Shell是我们和Hive交互的主要工具。
Hive的查询语言我们称为HiveQL。HiveQL的设计受到了MySQL的很多影响,所以如果你熟悉MySQL的话,你会发现使用HiveQL是同样的方便。
进入Shell后,输入以下命令看看Hive是否工作正常:
SHOW TABLES;
输出结果为:
OK
Time taken: 8.207seconds
如果输出结果显示有错误,可能是Hadoop没有运行,或者HADOOP_HOME变量没有真确设置。
和SQL一样,HiveQL一般是大小写无关的(字符串比较除外)。
输入命令是按Tab键,Hive将提示所有可用的输入。(命令自动完成)
第一次使用该命令可能会花上好几秒中甚至更长,因为Hive将创建metastore数据库(存储于metastore_db目录,此目录在你运行hive时所在目录之下,所以第一次运行Hive时,请先进入到合适的目录下)。
我们也可以直接从命令行运行hive脚本,比如:
hive –f /home/user/ hive.q
其中,-f 后面跟上脚本文件名(包括路径)。
无论是在交互模式还是非交互模式下,hive一般都会输出一些辅助信息,比如执行命令的时间等。如果你不需要输出这些消息,可以在进入hive时加上-s选项,比如:
hive –S
注意:S为大写
简单示例
我们以以下数据作为测试数据,结构为(班级号,学号,成绩)。
C01,N0101,82
C01,N0102,59
C01,N0103,65
C02,N0201,81
C02,N0202,82
C02,N0203,79
C03,N0301,56
C03,N0302,92
C03,N0306,72
执行以下命令:
create table student(classNostring, stuNo string, score int) row format delimited fields terminated by ',';
其中,定义表结构和SQL类似.。其它设置表示字段间以逗号分隔,一行为一个记录。
load data local inpath '/home/user/input/student.txt'overwrite into table student;
输出结果如下:
Copying data fromfile:/home/user/input/student.txt
Copying file:file:/home/user/input/student.txt
Loading data to tabledefault.student
rmr: DEPRECATED: Please use 'rm-r' instead.
Deleted/user/hive/warehouse/student
Table default.student stats:[num_partitions: 0, num_files: 1, num_rows: 0, total_size: 117, raw_data_size:0]
这个命令将student.txt文件内容加载到表student中。这个加载操作将直接把student.txt文件复制到hive的warehouse目录中,这个目录由hive.metastore.warehouse.dir配置项设置,默认值为/user/hive/warehouse。Overwrite选项将导致Hive事先删除student目录下所有的文件。
Hive不会对student.txt做任何格式处理,因为Hive本身并不强调数据的存储格式。
此例中,Hive将数据存储于HDFS系统中。当然,Hive也可以将数据存储于本地。
如果不加overwrite选项,且加载的文件在Hive中已经存在,则Hive会为文件重新命名。比如不加overwrite选项将以上命令执行两次,则第二次加载后,hive中新产生的文件名将会是“student_copy_1.txt”。(和Hadoop权威教程中描述的不一致,读者请慎重验证)
接下来,我们执行以下命令:
select * from student;
输出如下:
C01 N0101 82
C01 N0102 59
C01 N0103 65
C02 N0201 81
C02 N0202 82
C02 N0203 79
C03 N0301 56
C03 N0302 92
C03 N0306 72
执行以下命令:
Select classNo,count(score) fromstudent where score>=60 group by classNo;
输出如下:
C01 2
C02 3
C03 2
由此看见,HiveQL的使用和SQL及其类似。我们用到了group和count,其实在后台Hive将这些操作都转换成了MapReduce操作提交给Hadoop执行,并最终输出结果。

How to effectively monitor MySQL performance? Use tools such as mysqladmin, SHOWGLOBALSTATUS, PerconaMonitoring and Management (PMM), and MySQL EnterpriseMonitor. 1. Use mysqladmin to view the number of connections. 2. Use SHOWGLOBALSTATUS to view the query number. 3.PMM provides detailed performance data and graphical interface. 4.MySQLEnterpriseMonitor provides rich monitoring functions and alarm mechanisms.

The difference between MySQL and SQLServer is: 1) MySQL is open source and suitable for web and embedded systems, 2) SQLServer is a commercial product of Microsoft and is suitable for enterprise-level applications. There are significant differences between the two in storage engine, performance optimization and application scenarios. When choosing, you need to consider project size and future scalability.

In enterprise-level application scenarios that require high availability, advanced security and good integration, SQLServer should be chosen instead of MySQL. 1) SQLServer provides enterprise-level features such as high availability and advanced security. 2) It is closely integrated with Microsoft ecosystems such as VisualStudio and PowerBI. 3) SQLServer performs excellent in performance optimization and supports memory-optimized tables and column storage indexes.

MySQLmanagescharactersetsandcollationsbyusingUTF-8asthedefault,allowingconfigurationatdatabase,table,andcolumnlevels,andrequiringcarefulalignmenttoavoidmismatches.1)Setdefaultcharactersetandcollationforadatabase.2)Configurecharactersetandcollationfor

A MySQL trigger is an automatically executed stored procedure associated with a table that is used to perform a series of operations when a specific data operation is performed. 1) Trigger definition and function: used for data verification, logging, etc. 2) Working principle: It is divided into BEFORE and AFTER, and supports row-level triggering. 3) Example of use: Can be used to record salary changes or update inventory. 4) Debugging skills: Use SHOWTRIGGERS and SHOWCREATETRIGGER commands. 5) Performance optimization: Avoid complex operations, use indexes, and manage transactions.

The steps to create and manage user accounts in MySQL are as follows: 1. Create a user: Use CREATEUSER'newuser'@'localhost'IDENTIFIEDBY'password'; 2. Assign permissions: Use GRANTSELECT, INSERT, UPDATEONmydatabase.TO'newuser'@'localhost'; 3. Fix permission error: Use REVOKEALLPRIVILEGESONmydatabase.FROM'newuser'@'localhost'; then reassign permissions; 4. Optimization permissions: Use SHOWGRA

MySQL is suitable for rapid development and small and medium-sized applications, while Oracle is suitable for large enterprises and high availability needs. 1) MySQL is open source and easy to use, suitable for web applications and small and medium-sized enterprises. 2) Oracle is powerful and suitable for large enterprises and government agencies. 3) MySQL supports a variety of storage engines, and Oracle provides rich enterprise-level functions.

The disadvantages of MySQL compared to other relational databases include: 1. Performance issues: You may encounter bottlenecks when processing large-scale data, and PostgreSQL performs better in complex queries and big data processing. 2. Scalability: The horizontal scaling ability is not as good as Google Spanner and Amazon Aurora. 3. Functional limitations: Not as good as PostgreSQL and Oracle in advanced functions, some functions require more custom code and maintenance.


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Atom editor mac version download
The most popular open source editor

Dreamweaver Mac version
Visual web development tools

PhpStorm Mac version
The latest (2018.2.1) professional PHP integrated development tool

mPDF
mPDF is a PHP library that can generate PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files "on the fly" from his website and handle different languages. It is slower than original scripts like HTML2FPDF and produces larger files when using Unicode fonts, but supports CSS styles etc. and has a lot of enhancements. Supports almost all languages, including RTL (Arabic and Hebrew) and CJK (Chinese, Japanese and Korean). Supports nested block-level elements (such as P, DIV),

EditPlus Chinese cracked version
Small size, syntax highlighting, does not support code prompt function