집 >데이터 베이스 >MySQL 튜토리얼 >Hive分析窗口函数(一) SUM,AVG,MIN,MAX
Hive中提供了越来越多的分析函数,用于完成负责的统计分析。抽时间将所有的分析窗口函数理一遍,将陆续发布。今天先看几个基础的,SUM、AVG、MIN、MAX。p用于实现分组内所有和连续累积的统计。/p CREATE EXTERNAL TABLE yeshuai_test( cookieid string, crea
Hive中提供了越来越多的分析函数,用于完成负责的统计分析。抽时间将所有的分析窗口函数理一遍,将陆续发布。 今天先看几个基础的,SUM、AVG、MIN、MAX。 <p>用于实现分组内所有和连续累积的统计。</p>
CREATE EXTERNAL TABLE yeshuai_test( cookieid string, createtime string, --day pv INT ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
为了测试方便,不用到hadoop上去上传文件进行测试
直接进入本地的一个文件夹 进行编辑数据导入 ,这里我的文件夹路径为/home/work/yeshuai/data1.txt
用vim 编辑 ,复制进去数据
然后 hive>select * from yeshuai_test 进行测试,有数据的话,测试环境准备ok了
SUM — 注意,结果和ORDER BY相关,默认为升序
SELECT cookieid, createtime, pv, SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行 SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1 SUM(pv) OVER(PARTITION BY cookieid) AS pv3, --分组内所有行 SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, --当前行+往前3行 SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, --当前行+往前3行+往后1行 SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 ---当前行+往后所有行 FROM yeshuai_test; cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6 ----------------------------------------------------------------------------- cookie1 2015-04-10 1 1 1 26 1 6 26 cookie1 2015-04-11 5 6 6 26 6 13 25 cookie1 2015-04-12 7 13 13 26 13 16 20 cookie1 2015-04-13 3 16 16 26 16 18 13 cookie1 2015-04-14 2 18 18 26 17 21 10 cookie1 2015-04-15 4 22 22 26 16 20 8 cookie1 2015-04-16 4 26 26 26 13 13 4
–其他AVG,MIN,MAX,和SUM用法一样。
<div class="blockcode"> <div id="code_cxN"><ol> <li> --AVG </li> <li> SELECT cookieid, </li> <li> createtime, </li> <li> pv, </li> <li> AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行 </li> <li> AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1 </li> <li> AVG(pv) OVER(PARTITION BY cookieid) AS pv3, --分组内所有行 </li> <li> AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, --当前行+往前3行 </li> <li> AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, --当前行+往前3行+往后1行 </li> <li> AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 ---当前行+往后所有行 </li> <li> FROM lxw1234; </li> <li> cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6 </li> <li> ----------------------------------------------------------------------------- </li> <li> cookie1 2015-04-10 1 1.0 1.0 3.7142857142857144 1.0 3.0 3.7142857142857144 </li> <li> cookie1 2015-04-11 5 3.0 3.0 3.7142857142857144 3.0 4.333333333333333 4.166666666666667 </li> <li> cookie1 2015-04-12 7 4.333333333333333 4.333333333333333 3.7142857142857144 4.333333333333333 4.0 4.0 </li> <li> cookie1 2015-04-13 3 4.0 4.0 3.7142857142857144 4.0 3.6 3.25 </li> <li> cookie1 2015-04-14 2 3.6 3.6 3.7142857142857144 4.25 4.2 3.3333333333333335 </li> <li> cookie1 2015-04-15 4 3.6666666666666665 3.6666666666666665 3.7142857142857144 4.0 4.0 4.0 </li> <li> cookie1 2015-04-16 4 3.7142857142857144 3.7142857142857144 3.7142857142857144 3.25 3.25 4.0</li> </ol></div> </div><div class="blockcode"> <div id="code_L6F"><ol> <li> --MIN </li> <li> SELECT cookieid, </li> <li> createtime, </li> <li> pv, </li> <li> MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行 </li> <li> MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1 </li> <li> MIN(pv) OVER(PARTITION BY cookieid) AS pv3, --分组内所有行 </li> <li> MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, --当前行+往前3行 </li> <li> MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, --当前行+往前3行+往后1行 </li> <li> MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 ---当前行+往后所有行 </li> <li> FROM lxw1234; </li> <li> </li> <li> cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6 </li> <li> ----------------------------------------------------------------------------- </li> <li> cookie1 2015-04-10 1 1 1 1 1 1 1 </li> <li> cookie1 2015-04-11 5 1 1 1 1 1 2 </li> <li> cookie1 2015-04-12 7 1 1 1 1 1 2 </li> <li> cookie1 2015-04-13 3 1 1 1 1 1 2 </li> <li> cookie1 2015-04-14 2 1 1 1 2 2 2 </li> <li> cookie1 2015-04-15 4 1 1 1 2 2 4 </li> <li> cookie1 2015-04-16 4 1 1 1 2 2 4</li> </ol></div> </div><div class="blockcode"><div id="code_f2D"><ol> <li> ----MAX </li> <li> SELECT cookieid, </li> <li> createtime, </li> <li> pv, </li> <li> MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行 </li> <li> MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1 </li> <li> MAX(pv) OVER(PARTITION BY cookieid) AS pv3, --分组内所有行 </li> <li> MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, --当前行+往前3行 </li> <li> MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, --当前行+往前3行+往后1行 </li> <li> MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 ---当前行+往后所有行 </li> <li> FROM lxw1234; </li> <li> </li> <li> cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6 </li> <li> ----------------------------------------------------------------------------- </li> <li> cookie1 2015-04-10 1 1 1 7 1 5 7 </li> <li> cookie1 2015-04-11 5 5 5 7 5 7 7 </li> <li> cookie1 2015-04-12 7 7 7 7 7 7 7 </li> <li> cookie1 2015-04-13 3 7 7 7 7 7 4 </li> <li> cookie1 2015-04-14 2 7 7 7 7 7 4 </li> <li> cookie1 2015-04-15 4 7 7 7 7 7 4 </li> <li> cookie1 2015-04-16 4 7 7 7 4 4 4</li> </ol></div></div>