[Pinned] Data Warehouse: Hive Advanced, Part 1

WBOY

Jun 07, 2016, 02:50 PM


Data Warehouse: Hive Advanced, Part 2 (table joins, subqueries, JDBC and Thrift Client access, user-defined functions)

I. Importing Data

1. Importing data with the LOAD statement

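<code>1. Syntax:
</code>

<code>    Where (square brackets mark optional clauses):
        LOCAL: the given file path is on the local file system; without it, the path is an HDFS path.
        OVERWRITE: overwrite the data already in the table.
        PARTITION (): names the target partition; required when loading into a partitioned table.
2. Examples:
    (1) Without partitions:
</code>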

<code>    '/root/data' in the statement can be a directory or a file:
        a directory loads every file under it into the table;
        a file loads only that file into the table.
    (2) With partitions:
</code>
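The optional clauses described above combine mechanically. As a minimal sketch, the Python helper below assembles a statement of the form `LOAD DATA [LOCAL] INPATH '...' [OVERWRITE] INTO TABLE ... [PARTITION (...)]`. The helper name, the table name `student_part`, the file name `data1.txt` and the partition column `gender` are hypothetical and chosen only for illustration.

```python
def build_load_statement(path, table, local=False, overwrite=False, partition=None):
    """Assemble a Hive LOAD DATA statement from the optional clauses."""
    parts = ["LOAD DATA"]
    if local:
        parts.append("LOCAL")        # path is on the local file system, not HDFS
    parts.append(f"INPATH '{path}'")
    if overwrite:
        parts.append("OVERWRITE")    # replace the data already in the table
    parts.append(f"INTO TABLE {table}")
    if partition:
        # partition is a dict such as {"gender": "M"} (hypothetical column)
        spec = ", ".join(f"{col}='{val}'" for col, val in partition.items())
        parts.append(f"PARTITION ({spec})")
    return " ".join(parts) + ";"


# No partition: load every file under a local directory, replacing old data.
print(build_load_statement("/root/data", "student", local=True, overwrite=True))

# With a partition: load one local file into a single partition.
print(build_load_statement("/root/data/data1.txt", "student_part",
                           local=True, partition={"gender": "M"}))
```

Leaving out `local=True` produces a statement whose path is read as an HDFS path, which matches the LOCAL description above.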

2. Importing data with Sqoop

<code>1. Use Sqoop to import data from a MySQL database into HDFS:
</code>

<code>sqoop import --connect jdbc:mysql://localhost:3306/sfd --username root --password 123 --table student --columns 'sid,sname' -m 1 --target-dir '/sqoop/student'</code>

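<code>    Where:
        --connect: the JDBC URL of the database
        --username: the database user name
        --password: the database user's password
        --table: the source table
        --columns: the columns to import (comma-separated in the example)
        -m 1: run the import with one map task
        --target-dir: the HDFS directory that receives the imported data

2. Use Sqoop to import data from a MySQL database into Hive:
</code>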

<code>sqoop import --hive-import --connect jdbc:mysql://localhost:3306/sfd --username root --password 123 --table student --columns 'sid,sname' -m 1 --hive-table stu --where 'sid=1'</code>

<code>    Where:
        --hive-table stu: import into the Hive table named stu
        --where: the condition that selects which rows are imported

3. Use Sqoop to import data from a MySQL database into Hive with a query:
</code>

<code>sqoop import --hive-import --connect jdbc:mysql://localhost:3306/sfd --username root --password 123 -m 1 --query 'select * from student where sid=1 and $CONDITIONS' --target-dir '/sqoop/student1' --hive-table stu</code>

<code>    Where:
        --query: the query to run. If the query has a WHERE clause, "and $CONDITIONS" (upper case) must be appended to it.

4. Use Sqoop to export data from Hive to MySQL:
</code>

<code>sqoop export --connect jdbc:mysql://localhost:3306/sfd --username root --password 123 -m 1 --table student1 --export-dir '/data'</code>

<code>    Where:
        --table: a table that already exists in the MySQL database
        --export-dir: the directory whose data is exported into the MySQL table student1
</code>
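Two details of the Sqoop commands are easy to get wrong: a MySQL JDBC URL separates host and port with a colon (`jdbc:mysql://host:port/database`), and a `--query` import needs the `$CONDITIONS` token. The Python sketch below builds the argument list for the query import and checks both. The function names are hypothetical, and `-P` (Sqoop's prompt-for-password option) is swapped in here for `--password 123` so that no password appears on the command line; that substitution is a choice of this sketch, not something the original commands do.

```python
import shlex


def jdbc_url(host, port, database):
    """Build a MySQL JDBC URL; host and port are joined with a colon."""
    return f"jdbc:mysql://{host}:{port}/{database}"


def sqoop_query_import(url, username, query, target_dir, hive_table, mappers=1):
    """Return the argument list for a free-form-query import into Hive."""
    if "$CONDITIONS" not in query:
        raise ValueError("a --query import must contain the $CONDITIONS token")
    return [
        "sqoop", "import", "--hive-import",
        "--connect", url,
        "--username", username,
        "-P",                      # prompt for the password (assumption of this sketch)
        "-m", str(mappers),
        "--query", query,
        "--target-dir", target_dir,
        "--hive-table", hive_table,
    ]


args = sqoop_query_import(
    jdbc_url("localhost", 3306, "sfd"),
    "root",
    "select * from student where sid=1 and $CONDITIONS",
    "/sqoop/student1",
    "stu",
)
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(args))
```

Passing the list to `subprocess.run(args)` instead of printing it would avoid shell quoting altogether; that step is left out so the sketch runs without Sqoop installed.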

II. Querying Data in Hive

1. Query syntax:

<code>    Examples: querying the student table:
        select * from student; (selecting every column does not start a MapReduce job)
        select sid from student; (starts a MapReduce job)
        select sid,sname,math,english,math+english from student; (if either operand of math+english is NULL, the whole expression is NULL; nvl(math, 0) substitutes 0 when math is NULL)
</code>
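The NULL behavior of `math+english` can be modeled in a few lines. This is a Python sketch of the semantics only, with `None` standing in for NULL; `sql_add` is a hypothetical name, and the sample scores are made up.

```python
def sql_add(a, b):
    """Addition that yields NULL (None) when either operand is NULL."""
    if a is None or b is None:
        return None
    return a + b


def nvl(value, default):
    """Return default when value is NULL (None), otherwise value."""
    return default if value is None else value


math_score, english_score = None, 80   # a student with no math score

print(sql_add(math_score, english_score))           # None: the NULL propagates
print(sql_add(nvl(math_score, 0), english_score))   # 80: NULL replaced by 0 first
```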

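2. Fetch Task for simple queries

<code>As the examples above show, a simple query that does not select every column starts a MapReduce job, which slows things down. Hive has supported the Fetch Task feature since version 0.10.0.
Ways to enable Fetch Task:
    a. Method 1: set hive.fetch.task.conversion=more
    b. Method 2: hive --hiveconf hive.fetch.task.conversion=more
    c. Method 3: edit the hive-site.xml file
</code>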

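<code>    The first two methods apply only to the current Hive command-line session; after Hive restarts, simple queries invoke MapReduce again. The third method stays in effect permanently.
</code>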
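For the hive-site.xml route, the file uses Hadoop-style `<property>` entries with `<name>` and `<value>` children inside `<configuration>`. The Python sketch below renders such an entry for the Fetch Task setting; `hive_property` is a hypothetical helper, and where the entry goes in your own hive-site.xml depends on your installation.

```python
import xml.etree.ElementTree as ET


def hive_property(name, value):
    """Render one <property> entry in the style used by hive-site.xml."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


entry = hive_property("hive.fetch.task.conversion", "more")
print(entry)
```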

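3. Filtering in queries

<code>1. Filter with a WHERE clause. (String comparison is case-sensitive.)
</code>

<code>    Where: %\\_% : because _ is a wildcard in pattern matching (it stands for exactly one character), it has to be escaped; the first '\' marks what follows as an escape sequence, and '\_' stands for a literal '_'.
</code>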
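The wildcard and escape rules can be checked with a small model. The Python sketch below translates a LIKE-style pattern into a regular expression: `%` matches any run of characters, `_` matches exactly one, and a backslash makes the next character literal. It models the pattern as the matcher sees it, so the single-backslash form `%\_%` is used; the doubled backslash in the article presumably accounts for one extra level of string-literal escaping. The function names and sample values are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE-style pattern into a compiled regular expression."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # escaped character is literal
            i += 2
            continue
        if ch == "%":
            out.append(".*")       # any run of characters, including none
        elif ch == "_":
            out.append(".")        # exactly one character
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out), re.DOTALL)


def like(value, pattern):
    """Case-sensitive match of the whole value against the pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("stu_01", r"%\_%"))   # value contains a literal underscore
print(like("stu01", r"%\_%"))    # no underscore in the value
print(like("stu01", "%_%"))      # unescaped _ matches any single character
```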

4. Sorting in queries

Sorting is ascending by default; append desc for descending order.

Note: to sort by column position, first set this property: set hive.groupby.orderby.position.alias=true;

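III. Hive Built-in Functions

1. Math functions:

<code>round(45.926, 2): rounds half up (the second argument is the number of digits to keep after the decimal point; a negative value rounds to the left of the decimal point)
</code>

ceil(45.9): rounds up to the nearest integer
floor(45.9): rounds down to the nearest integer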
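The effect of the second argument of round, including negative values, can be reproduced with Python's `decimal` module. This is a sketch of the rounding rule described above (half up), not Hive's implementation; `hive_round` is a hypothetical name.

```python
import math
from decimal import Decimal, ROUND_HALF_UP


def hive_round(x, digits=0):
    """Round half up to `digits` decimal places; negative digits round
    to the left of the decimal point."""
    quantum = Decimal(1).scaleb(-digits)   # 10 ** -digits, e.g. 0.01 for digits=2
    return Decimal(str(x)).quantize(quantum, rounding=ROUND_HALF_UP)


print(hive_round(45.926, 2))    # keeps two digits after the decimal point
print(hive_round(45.926, -1))   # rounds to the nearest ten
print(math.ceil(45.9), math.floor(45.9))
```

`Decimal(str(x))` is used instead of `Decimal(x)` so that the value is rounded as written rather than as its binary floating-point approximation.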

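2. String functions:

<code>lower: converts a string to lower case
upper: converts a string to upper case
length: returns the length of a string
concat('hello','world'): concatenates strings
substr(a,b): takes the substring of a from position b through the end
substr(a,b,c): takes c characters of a starting at position b
trim: removes spaces from both ends of a string
lpad('abc',10,'*'): pads on the left
rpad: pads on the right
</code>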
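Positions in substr are 1-based, and lpad/rpad pad up to a total length. The Python sketch below mirrors those three functions for illustration. It assumes a positive start position and that a string longer than the target length is cut down to it; the sample string 'hello world' is made up.

```python
def substr(text, start, length=None):
    """1-based substring: from `start` to the end, or `length` characters."""
    begin = start - 1                      # convert to Python's 0-based index
    if length is None:
        return text[begin:]
    return text[begin:begin + length]


def lpad(text, total, pad):
    """Pad on the left with `pad` until the result is `total` characters."""
    if len(text) >= total:
        return text[:total]                # assumption: longer input is cut
    fill = (pad * total)[: total - len(text)]
    return fill + text


def rpad(text, total, pad):
    """Pad on the right with `pad` until the result is `total` characters."""
    if len(text) >= total:
        return text[:total]
    fill = (pad * total)[: total - len(text)]
    return text + fill


print(substr("hello world", 7))        # from position 7 to the end
print(substr("hello world", 1, 5))     # 5 characters from position 1
print(lpad("abc", 10, "*"))
print(rpad("abc", 10, "*"))
```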

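3. Collection functions and conversion functions:

<code>1. Collection functions:
    size: returns the number of elements in a map or array
</code>

<code>2. Conversion functions:
    cast: cast(1 as bigint);
</code>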

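4. Date functions:

<code>to_date: extracts the date part of a string
</code>

<code>year: extracts the year from a date
month: extracts the month from a date
day: extracts the day from a date
</code>

<code>weekofyear: returns the week of the year that a date falls in
</code>

<code>datediff: subtracts two dates and returns the difference in days
</code>

<code>date_add: adds a number of days to a date
date_sub: subtracts a number of days from a date
</code>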
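The date functions above map closely onto Python's `datetime` module. The sketch below is a model for illustration only: it assumes dates are written as `yyyy-MM-dd` strings, optionally followed by a time, and it assumes ISO-8601 week numbering for weekofyear. The sample dates are made up.

```python
from datetime import datetime, timedelta


def to_date(text):
    """Extract the date part of a 'yyyy-MM-dd[ HH:mm:ss]' string."""
    return datetime.strptime(text[:10], "%Y-%m-%d").date()


def weekofyear(text):
    """Week of the year, assuming ISO-8601 numbering."""
    return to_date(text).isocalendar()[1]


def datediff(end, start):
    """Number of days from start to end."""
    return (to_date(end) - to_date(start)).days


def date_add(start, days):
    """The date `days` days after start, as a 'yyyy-MM-dd' string."""
    return (to_date(start) + timedelta(days=days)).isoformat()


def date_sub(start, days):
    """The date `days` days before start, as a 'yyyy-MM-dd' string."""
    return date_add(start, -days)


stamp = "2016-06-07 14:50:00"
d = to_date(stamp)
print(d, d.year, d.month, d.day)
print(weekofyear(stamp))
print(datediff("2016-06-07", "2016-06-01"))
print(date_add("2016-06-07", 30), date_sub("2016-06-07", 7))
```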

5. Conditional functions:

<code>coalesce(a,b,...): returns the first non-NULL value, scanning from left to right
</code>

<code>case...when...: conditional expression
    case a when b then c [when d then e]* [else f] end
</code>
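Both conditional forms are short enough to model directly. In the Python sketch below, `None` stands in for NULL, `simple_case` is a hypothetical name for the `case a when b then c ... else f end` form, and the job titles and amounts are made-up sample values.

```python
def coalesce(*values):
    """Return the first value that is not NULL (None), scanning left to right."""
    for value in values:
        if value is not None:
            return value
    return None


def simple_case(a, branches, default=None):
    """Model of: case a when b then c [when d then e]* [else f] end.

    `branches` is a list of (when_value, then_value) pairs. A NULL `a`
    matches no branch, so the else value is returned.
    """
    for when_value, then_value in branches:
        if a is not None and a == when_value:
            return then_value
    return default


print(coalesce(None, None, 3, 4))
print(simple_case("MANAGER", [("PRESIDENT", 1000), ("MANAGER", 800)], default=400))
print(simple_case("CLERK", [("PRESIDENT", 1000), ("MANAGER", 800)], default=400))
```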

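6. Aggregate functions:

<code>count: counts values
sum: computes the sum
min: finds the minimum
max: finds the maximum
avg: computes the average
</code>

7. Table-generating functions:

<code>explode: emits each element of a map or array as its own row
</code>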
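The row-per-element behavior of explode can be pictured with a generator. In this Python sketch, a list stands in for an array and a dict for a map: an array yields one single-column row per element, and a map yields one (key, value) row per entry. The sample data is made up.

```python
def explode(collection):
    """Yield one row per element of an array (list) or entry of a map (dict)."""
    if isinstance(collection, dict):
        for key, value in collection.items():
            yield (key, value)     # a map produces two columns: key and value
    else:
        for element in collection:
            yield (element,)       # an array produces a single column


for row in explode(["math", "english", "chinese"]):
    print(row)

for row in explode({"math": 90, "english": 80}):
    print(row)
```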

Data Warehouse: Hive Advanced, Part 2
