Home  >  Article  >  Database  >  Hive分析窗口函数(一) SUM,AVG,MIN,MAX

Hive分析窗口函数(一) SUM,AVG,MIN,MAX

WBOY
WBOYOriginal
2016-06-07 14:51:051369browse

Hive中提供了越来越多的分析函数,用于完成负责的统计分析。抽时间将所有的分析窗口函数理一遍,将陆续发布。今天先看几个基础的,SUM、AVG、MIN、MAX。p用于实现分组内所有和连续累积的统计。/p CREATE EXTERNAL TABLE yeshuai_test( cookieid string, crea

Hive中提供了越来越多的分析函数,用于完成负责的统计分析。抽时间将所有的分析窗口函数理一遍,将陆续发布。
今天先看几个基础的,SUM、AVG、MIN、MAX。
<p>用于实现分组内所有和连续累积的统计。</p> 
CREATE EXTERNAL TABLE yeshuai_test(
    cookieid string,
    createtime string, --day
    pv INT
    ) ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    
    

为了测试方便,不用到hadoop上去上传文件进行测试

直接进入本地的一个文件夹 进行编辑数据导入 ,这里我的文件夹路径为/home/work/yeshuai/data1.txt

用vim 编辑 ,复制进去数据


  • cookie1 2015-04-10 1
  •     cookie1 2015-04-11 5
  •     cookie1 2015-04-12 7
  •     cookie1 2015-04-13 3
  •     cookie1 2015-04-14 2
  •     cookie1 2015-04-15 4
  •     cookie1 2015-04-16 4


  • 然后 hive>select * from yeshuai_test 进行测试,有数据的话,测试环境准备ok了


    SUM — 注意,结果和ORDER BY相关,默认为升序

      SELECT cookieid,
        createtime,
        pv,
        SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
        SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1
        SUM(pv) OVER(PARTITION BY cookieid) AS pv3,        --分组内所有行
        SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, --当前行+往前3行
        SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, --当前行+往前3行+往后1行
        SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 ---当前行+往后所有行
        FROM yeshuai_test;
         
        cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6
        -----------------------------------------------------------------------------
        cookie1 2015-04-10 1 1 1 26 1 6 26
        cookie1 2015-04-11 5 6 6 26 6 13 25
        cookie1 2015-04-12 7 13 13 26 13 16 20
        cookie1 2015-04-13 3 16 16 26 16 18 13
        cookie1 2015-04-14 2 18 18 26 17 21 10
        cookie1 2015-04-15 4 22 22 26 16 20 8
        cookie1 2015-04-16 4 26 26 26 13 13 4


    pv1: 分组内从起点到当前行的pv累积,如,11号的pv1=10号的pv+11号的pv, 12号=10号+11号+12号
    pv2: 同pv1
    pv3: 分组内(cookie1)所有的pv累加
    pv4: 分组内当前行+往前3行,如,11号=10号+11号, 12号=10号+11号+12号, 13号=10号+11号+12号+13号, 14号=11号+12号+13号+14号
    pv5: 分组内当前行+往前3行+往后1行,如,14号=11号+12号+13号+14号+15号=5+7+3+2+4=21
    pv6: 分组内当前行+往后所有行,如,13号=13号+14号+15号+16号=3+2+4+4=13,14号=14号+15号+16号=2+4+4=10

    如果不指定ROWS BETWEEN,默认为从起点到当前行;
    如果不指定ORDER BY,则将分组内所有值累加;
    关键是理解ROWS BETWEEN含义,也叫做WINDOW子句:
    PRECEDING:往前
    FOLLOWING:往后
    CURRENT ROW:当前行
    UNBOUNDED:起点,UNBOUNDED PRECEDING 表示从前面的起点, UNBOUNDED FOLLOWING:表示到后面的终点

    –其他AVG,MIN,MAX,和SUM用法一样。

    <div class="blockcode">
    <div id="code_cxN"><ol>
    <li>  --AVG
    </li>
    <li>    SELECT cookieid,
    </li>
    <li>    createtime,
    </li>
    <li>    pv,
    </li>
    <li>    AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
    </li>
    <li>    AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1
    </li>
    <li>    AVG(pv) OVER(PARTITION BY cookieid) AS pv3,        --分组内所有行
    </li>
    <li>    AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, --当前行+往前3行
    </li>
    <li>    AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, --当前行+往前3行+往后1行
    </li>
    <li>    AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 ---当前行+往后所有行
    </li>
    <li>    FROM lxw1234;
    </li>
    <li>    cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6
    </li>
    <li>    -----------------------------------------------------------------------------
    </li>
    <li>    cookie1 2015-04-10 1 1.0 1.0 3.7142857142857144 1.0 3.0 3.7142857142857144
    </li>
    <li>    cookie1 2015-04-11 5 3.0 3.0 3.7142857142857144 3.0 4.333333333333333 4.166666666666667
    </li>
    <li>    cookie1 2015-04-12 7 4.333333333333333 4.333333333333333 3.7142857142857144 4.333333333333333 4.0 4.0
    </li>
    <li>    cookie1 2015-04-13 3 4.0 4.0 3.7142857142857144 4.0 3.6 3.25
    </li>
    <li>    cookie1 2015-04-14 2 3.6 3.6 3.7142857142857144 4.25 4.2 3.3333333333333335
    </li>
    <li>    cookie1 2015-04-15 4 3.6666666666666665 3.6666666666666665 3.7142857142857144 4.0 4.0 4.0
    </li>
    <li>    cookie1 2015-04-16 4 3.7142857142857144 3.7142857142857144 3.7142857142857144 3.25 3.25 4.0</li>
    </ol></div>
    </div><div class="blockcode">
    <div id="code_L6F"><ol>
    <li>    --MIN
    </li>
    <li>    SELECT cookieid,
    </li>
    <li>    createtime,
    </li>
    <li>    pv,
    </li>
    <li>    MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
    </li>
    <li>    MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1
    </li>
    <li>    MIN(pv) OVER(PARTITION BY cookieid) AS pv3,        --分组内所有行
    </li>
    <li>    MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, --当前行+往前3行
    </li>
    <li>    MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, --当前行+往前3行+往后1行
    </li>
    <li>    MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 ---当前行+往后所有行
    </li>
    <li>    FROM lxw1234;
    </li>
    <li>     
    </li>
    <li>    cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6
    </li>
    <li>    -----------------------------------------------------------------------------
    </li>
    <li>    cookie1 2015-04-10 1 1 1 1 1 1 1
    </li>
    <li>    cookie1 2015-04-11 5 1 1 1 1 1 2
    </li>
    <li>    cookie1 2015-04-12 7 1 1 1 1 1 2
    </li>
    <li>    cookie1 2015-04-13 3 1 1 1 1 1 2
    </li>
    <li>    cookie1 2015-04-14 2 1 1 1 2 2 2
    </li>
    <li>    cookie1 2015-04-15 4 1 1 1 2 2 4
    </li>
    <li>    cookie1 2015-04-16 4 1 1 1 2 2 4</li>
    </ol></div>
    </div><div class="blockcode"><div id="code_f2D"><ol>
    <li>    ----MAX
    </li>
    <li>    SELECT cookieid,
    </li>
    <li>    createtime,
    </li>
    <li>    pv,
    </li>
    <li>    MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
    </li>
    <li>    MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1
    </li>
    <li>    MAX(pv) OVER(PARTITION BY cookieid) AS pv3,        --分组内所有行
    </li>
    <li>    MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, --当前行+往前3行
    </li>
    <li>    MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, --当前行+往前3行+往后1行
    </li>
    <li>    MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 ---当前行+往后所有行
    </li>
    <li>    FROM lxw1234;
    </li>
    <li>     
    </li>
    <li>    cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6
    </li>
    <li>    -----------------------------------------------------------------------------
    </li>
    <li>    cookie1 2015-04-10 1 1 1 7 1 5 7
    </li>
    <li>    cookie1 2015-04-11 5 5 5 7 5 7 7
    </li>
    <li>    cookie1 2015-04-12 7 7 7 7 7 7 7
    </li>
    <li>    cookie1 2015-04-13 3 7 7 7 7 7 4
    </li>
    <li>    cookie1 2015-04-14 2 7 7 7 7 7 4
    </li>
    <li>    cookie1 2015-04-15 4 7 7 7 7 7 4
    </li>
    <li>    cookie1 2015-04-16 4 7 7 7 4 4 4</li>
    </ol></div></div>


  • Statement:
    The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn