[Pinned] Data Warehouse: Advanced Hive, Part 1


WBOY

Jun 07, 2016, 02:50 PM


Data Warehouse: Advanced Hive, Part 2 (table joins, subqueries, JDBC and Thrift client operations, user-defined functions)

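I. Importing Data

1. Importing data with the LOAD statement

<code>1. Syntax:
</code>

<code>    Where (square brackets mark optional keywords):
        LOCAL: the file path is on the local file system; if omitted, the path is treated as an HDFS path.
        OVERWRITE: replaces the data already in the table.
        PARTITION (): when loading into a partitioned table, names the target partition.
2. Examples:
    (1) Without partitions:
</code>

<code>    Here '/root/data' can be either a directory or a file:
        a directory loads every file under it into the table;
        a file loads only that file into the table.
    (2) With partitions:
</code>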
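The optional keywords combine as `LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol=val, ...)]`. As a minimal sketch in Python, the helper below assembles such a statement as a string. The function name `build_load_statement`, the table names `t1` and `t2`, the file name `data1.txt`, and the partition column `gender` are hypothetical and only show where each keyword goes.

```python
def build_load_statement(path, table, local=False, overwrite=False, partition=None):
    """Assemble a HiveQL LOAD DATA statement as a string (illustrative helper)."""
    parts = ["LOAD DATA"]
    if local:
        parts.append("LOCAL")        # path is on the local file system, not HDFS
    parts.append(f"INPATH '{path}'")
    if overwrite:
        parts.append("OVERWRITE")    # replace the data already in the table
    parts.append(f"INTO TABLE {table}")
    if partition:                    # dict of partition column -> value
        spec = ", ".join(f"{col}='{val}'" for col, val in partition.items())
        parts.append(f"PARTITION ({spec})")
    return " ".join(parts) + ";"


# Without partitions: load every file under a local directory, replacing old data.
print(build_load_statement("/root/data", "t1", local=True, overwrite=True))
# With partitions: load a single local file into one partition.
print(build_load_statement("/root/data/data1.txt", "t2", local=True,
                           partition={"gender": "M"}))
```

The generated strings can be pasted into the Hive command line; the helper itself does not talk to Hive.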

2. Importing data with Sqoop

<code>1. Use Sqoop to import data from a MySQL database into HDFS
</code>

<code>sqoop import --connect jdbc:mysql://localhost:3306/sfd --username root --password 123 --table student --columns 'sid,sname' -m 1 --target-dir '/sqoop/student'</code>

<code>    Where:
        --connect: the JDBC URL of the database
        --username: the database user name
        --password: the database user's password
        --table: the table that holds the source data
        --columns: the columns to import (comma-separated in the example)
        -m 1: run the import with a single map task
        --target-dir: the HDFS directory the source data is imported into

2. Use Sqoop to import data from a MySQL database into Hive:
</code>

<code>sqoop import --hive-import --connect jdbc:mysql://localhost:3306/sfd --username root --password 123 --table student --columns 'sid,sname' -m 1 --hive-table stu --where 'sid=1'</code>

<code>    Where:
        --hive-table stu: import into the Hive table named stu
        --where: the condition that selects which rows are imported

3. Use Sqoop to import data from a MySQL database into Hive with a query:
</code>

<code>sqoop import --hive-import --connect jdbc:mysql://localhost:3306/sfd --username root --password 123 -m 1 --query 'select * from student where sid=1 and $CONDITIONS' --target-dir '/sqoop/student1' --hive-table stu</code>

<code>    Where:
        --query: the query to run; its WHERE clause must contain the $CONDITIONS token (upper case), so a query with its own condition appends "and $CONDITIONS"

4. Use Sqoop to export data from Hive to MySQL:
</code>

<code>sqoop export --connect jdbc:mysql://localhost:3306/sfd --username root --password 123 -m 1 --table student1 --export-dir '/data'</code>

<code>    Where:
        --table: a table that already exists in the MySQL database
        --export-dir: the HDFS directory whose data is exported into the MySQL table student1.
</code>
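When these commands are launched from a script, it can help to build the argument list in one place so the JDBC URL always has the form `jdbc:mysql://host:port/database`. The following is a minimal Python sketch under that assumption; the helper name `sqoop_import_args` is hypothetical, the flags are the ones described above, and the code only prints the command rather than running Sqoop.

```python
import shlex


def sqoop_import_args(host, port, database, user, password,
                      table, columns, target_dir, mappers=1):
    """Return the argv list for a 'sqoop import' into HDFS (illustrative helper)."""
    jdbc_url = f"jdbc:mysql://{host}:{port}/{database}"  # host:port, then /database
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", user,
        "--password", password,
        "--table", table,
        "--columns", ",".join(columns),   # comma-separated column list
        "-m", str(mappers),               # number of map tasks
        "--target-dir", target_dir,       # HDFS output directory
    ]


args = sqoop_import_args("localhost", 3306, "sfd", "root", "123",
                         "student", ["sid", "sname"], "/sqoop/student")
print(shlex.join(args))
```

The list could be handed to `subprocess.run(args)` on a machine where Sqoop is installed.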

II. Querying Data in Hive

1. Query syntax:

<code>    Examples: querying the student table:
        select * from student; (selecting everything does not start a MapReduce job)
        select sid from student; (starts a MapReduce job)
        select sid,sname,math,english,math+english from student; (if either operand of math+english is NULL, the whole expression is NULL; nvl(math,0) substitutes 0 when math is NULL)
</code>
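The NULL behaviour of `math+english` and the effect of `nvl` can be modelled in a few lines of Python, with `None` standing in for NULL. This is only a sketch of the semantics, not Hive code; the sample rows are made up for illustration.

```python
def add(a, b):
    """NULL-propagating addition: the result is None if either operand is None."""
    if a is None or b is None:
        return None
    return a + b


def nvl(value, default):
    """Return default when value is None (NULL), otherwise value."""
    return default if value is None else value


# Hypothetical rows: (sname, math, english); Mary has no math score.
rows = [("Tom", 80, 90), ("Mary", None, 85)]
for sname, math_score, english in rows:
    plain = add(math_score, english)             # like math+english
    guarded = add(nvl(math_score, 0), english)   # like nvl(math,0)+english
    print(sname, plain, guarded)
```

For the second row the plain sum is `None`, while the guarded sum falls back to the English score alone.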

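2. The Fetch Task feature for simple queries

<code>As the examples above show, a simple query that does not select every column still starts a MapReduce job, which hurts efficiency. Hive has supported the Fetch Task feature since version 0.10.0;
Ways to enable Fetch Task:
    a. Option 1: set hive.fetch.task.conversion=more
    b. Option 2: hive --hiveconf hive.fetch.task.conversion=more
    c. Option 3: edit the hive-site.xml file
</code>

<code>    The first two options apply only to the current Hive command line; after Hive is restarted, simple queries invoke MapReduce again. The third option stays in effect permanently.
</code>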
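For the third option, the setting goes into hive-site.xml as a `property` element with a `name` and a `value` child. The Python sketch below renders that element so it can be pasted inside the file's existing `configuration` element; the helper name `fetch_task_property` is hypothetical, and the script does not modify any file.

```python
import xml.etree.ElementTree as ET


def fetch_task_property(mode="more"):
    """Render the hive-site.xml property element for Fetch Task as a string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "hive.fetch.task.conversion"
    ET.SubElement(prop, "value").text = mode
    return ET.tostring(prop, encoding="unicode")


print(fetch_task_property())
```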

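3. Filtering in queries

<code>1. Filtering with a where clause. (String comparisons are case-sensitive.)
</code>

<code>    Where: %\\_% : because _ is a wildcard in pattern matching (it stands for exactly one character), it has to be escaped; the first '\' signals that an escape follows, and '\_' stands for a literal '_'.
</code>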
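To see why the underscore needs escaping, the matching rules can be modelled in Python by translating a pattern into a regular expression: `%` matches any run of characters, `_` matches exactly one, and a backslash makes the next character literal. This is a sketch of the behaviour, not Hive's implementation; the pattern is written as the engine sees it after the string literal's own backslash escaping, that is `%\_%`.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE-style pattern into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # escaped char is literal
            i += 2
            continue
        if ch == "%":
            parts.append(".*")      # any run of characters, including none
        elif ch == "_":
            parts.append(".")       # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """True when the whole value matches the pattern (case-sensitive)."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("TOM_CAT", r"%\_%"))   # contains a literal underscore
print(like("TOMCAT", r"%\_%"))    # no underscore in the value
print(like("TOMCAT", "%_%"))      # unescaped _ matches any single character
print(like("Tom", "tom"))         # comparison is case-sensitive
```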

4. Sorting in queries

Sorting is ascending by default; append desc to sort in descending order.

Note: to sort by column position number, first set this property: set hive.groupby.orderby.position.alias=true;

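III. Hive Built-in Functions

1. Math functions:

<code>round(45.926,2): rounds half up (the second argument is the number of digits kept after the decimal point; a negative value rounds at positions to the left of the decimal point)
</code>

ceil(45.9): rounds up to the nearest integer
floor(45.9): rounds down to the nearest integer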
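The effect of the second argument of `round` is easiest to see with a few values. The Python sketch below models it with `decimal` and explicit half-up rounding, since Python's built-in `round` uses a different tie-breaking rule; the helper name `round_half_up` is hypothetical, and the code illustrates the arithmetic rather than calling Hive.

```python
import math
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value, digits=0):
    """Round a decimal string half up; negative digits round left of the point."""
    quantum = Decimal(1).scaleb(-digits)   # digits=2 -> 0.01, digits=-1 -> 1E+1
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


for digits in (2, 1, 0, -1, -2):
    print(digits, round_half_up("45.926", digits))

print(math.ceil(45.9), math.floor(45.9))
```

With `digits` set to 2 the value keeps two decimals; with -1 it is rounded to the nearest ten, and with -2 to the nearest hundred.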

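2. String functions:

<code>lower: converts a string to lower case
upper: converts a string to upper case
length: returns the length of a string
concat('hello','world'): concatenates strings
substr(a,b): substring of a, starting at position b and running to the end of the string
substr(a,b,c): substring of a, starting at position b and c characters long
trim: removes the spaces at both ends of a string
lpad('abc',10,'*'): pads on the left
rpad: pads on the right
</code>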
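Two details of this list are easy to get wrong: `substr` counts positions from 1, and the padding functions take the target length of the result, not the number of pad characters. A minimal Python model of both follows, assuming a positive start position and a non-empty pad string; it mirrors the descriptions above rather than Hive's source.

```python
def substr(s, start, length=None):
    """1-based substring; only positive start positions are modelled here."""
    begin = start - 1
    end = len(s) if length is None else begin + length
    return s[begin:end]


def lpad(s, length, pad):
    """Pad s on the left with pad until it is length characters long."""
    if len(s) >= length:
        return s[:length]                     # longer input is cut to length
    return (pad * length)[: length - len(s)] + s


def rpad(s, length, pad):
    """Pad s on the right with pad until it is length characters long."""
    if len(s) >= length:
        return s[:length]
    return s + (pad * length)[: length - len(s)]


print(substr("hello world", 7))
print(substr("hello world", 1, 5))
print(lpad("abc", 10, "*"))
print(rpad("abc", 10, "*"))
```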

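3. Collection functions and conversion functions:

<code>1. Collection functions:
    size: returns the number of elements in a map or an array
</code>

<code>2. Conversion functions:
    cast: cast(1 as bigint);
</code>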

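4. Date functions:

<code>to_date: extracts the date part of a string
</code>

<code>year: extracts the year from a date
month: extracts the month from a date
day: extracts the day from a date
</code>

<code>weekofyear: returns which week of the year a date falls in
</code>

<code>datediff: subtracts one date from another and returns the difference in days
</code>

<code>date_add: adds a number of days to a date
date_sub: subtracts a number of days from a date
</code>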
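The date functions map closely onto Python's `datetime` module, which makes it a convenient way to check expected results by hand. The sketch below works on `yyyy-MM-dd` strings and assumes that `weekofyear` follows ISO week numbering and that `datediff` takes the end date first; the sample dates are made up for illustration.

```python
from datetime import date, datetime, timedelta


def to_date(timestamp):
    """Return the date part of a 'yyyy-MM-dd HH:mm:ss' string."""
    return datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").date().isoformat()


def weekofyear(day):
    """ISO week number of a 'yyyy-MM-dd' date (assumed numbering scheme)."""
    return date.fromisoformat(day).isocalendar()[1]


def datediff(end, start):
    """Number of days from start to end."""
    return (date.fromisoformat(end) - date.fromisoformat(start)).days


def date_add(day, days):
    """Add a number of days to a date."""
    return (date.fromisoformat(day) + timedelta(days=days)).isoformat()


def date_sub(day, days):
    """Subtract a number of days from a date."""
    return date_add(day, -days)


print(to_date("2015-04-23 11:23:11"))
print(weekofyear("2015-04-23"))
print(datediff("2015-04-23", "2015-03-23"))
print(date_add("2015-04-23", 2), date_sub("2015-04-23", 2))
```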

5. Conditional functions:

<code>coalesce(a,b,...): returns the first value that is not null, scanning from left to right
</code>

<code>case...when...: conditional expression
    case a when b then c [when d then e]* [else f] end
</code>
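Both conditional forms can be modelled in plain Python, with `None` standing in for null. This is a sketch of the evaluation order only; the job titles and amounts in the usage lines are hypothetical sample data.

```python
def coalesce(*values):
    """Return the first value that is not None, scanning left to right."""
    for value in values:
        if value is not None:
            return value
    return None


def case_when(a, branches, default=None):
    """Model of: case a when b then c [when d then e]* [else f] end."""
    for match, result in branches:
        if a is not None and a == match:   # a null operand never matches
            return result
    return default                          # the else branch, or None without one


print(coalesce(None, None, 3, 4))
raises = [("PRESIDENT", 1000), ("MANAGER", 800)]
print(case_when("MANAGER", raises, 400))
print(case_when("CLERK", raises, 400))
```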

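6. Aggregate functions:

<code>count: number of rows
sum: sum
min: minimum value
max: maximum value
avg: average value
</code>

7. Table-generating functions:

<code>explode: turns each element of a map or an array into a row of its own
</code>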
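What `explode` does to a collection can be pictured with a short Python generator: an array yields one single-column row per element, and a map yields one key/value row per entry. This is a model of the output shape only, with made-up input values.

```python
def explode(collection):
    """Yield one row (a tuple) per element of a list or per entry of a dict."""
    if isinstance(collection, dict):
        yield from collection.items()   # one (key, value) row per map entry
    else:
        for element in collection:
            yield (element,)            # one single-column row per array element


for row in explode([1, 2, 3]):
    print(row)
for row in explode({"math": 80, "english": 90}):
    print(row)
```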

Data Warehouse: Advanced Hive, Part 2
