집 >데이터 베이스 >MySQL 튜토리얼 >Hive中使用自定义函数(UDF)实现分析函数row_number的功能
之前部门实现row_number是使用的transform,我觉得用UDF实现后,平时的使用会更方便,免去了transform相对繁琐的语法。
之前部门实现row_number是使用的transform,,我觉得用UDF实现后,平时的使用会更方便,免去了transform相对繁琐的语法。
用到的测试表为:
hive> desc row_number_test;
OK
id1 int
id2 string
age int
score double
name string
hive> select * from row_number_test;
OK
2 t04 25 60.0 youlia
1 t01 20 85.0 liujiannan
1 t02 24 70.0 zengqiu
2 t03 30 88.0 hongqu
2 t03 27 70.0 yongqi
1 t02 19 75.0 wangdong
1 t02 24 70.0 zengqiu
使用时要先在子查询中进行分区与排序,比如Oracle中这样一句SQL:
select row_number() over (partition by id1 order by age desc) from row_number_test;
转换为hive语句应该是:
select row_number(id1) from --partition by的字段传到row_number函数中去
(select * from row_number_test distribute by id1 sort by id1,age desc) a;
如果partition by 两个字段:
select row_number() over (partition by id1,id2 order by score) from row_number_test;
转换为hive语句应该是:
select row_number(id1,id2) --partition by的字段传到row_number函数中去
from (select * from row_number_test distribute by id1,id2 sort by id1,id2,score) a;
展示一下查询结果:
1.
select id1,id2,age,score,name,row_number(id1) rn from (select * from row_number_test distribute by id1 sort by id1,age desc) a;
OK
2 t03 30 88.0 hongqu 1
2 t03 27 70.0 yongqi 2
2 t04 25 60.0 youlia 3
1 t02 24 70.0 zengqiu 1
1 t02 24 70.0 zengqiu 2
1 t01 20 85.0 liujiannan 3
1 t02 19 75.0 wangdong 4
2.
select id1,id2,age,score,name,row_number(id1,id2) rn from (select * from row_number_test distribute by id1,id2 sort by id1,id2,score) a;
OK
2 t04 25 60.0 youlia 1
1 t02 24 70.0 zengqiu 1
2 t03 27 70.0 yongqi 1
1 t02 24 70.0 zengqiu 2
1 t02 19 75.0 wangdong 3
1 t01 20 85.0 liujiannan 1
2 t03 30 88.0 hongqu 2
下面是代码,只实现了接收1个参数和2个参数的evaluator方法,参数再多的照搬代码就可以了,代码仅供参考:
package com.Hadoopbook.hive;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;
@UDFType(deterministic = false)
public class Row_number extends UDF {
private static int MAX_VALUE = 50;
private static String comparedColumn[] = new String[MAX_VALUE];
private static int rowNum = 1;
public int evaluate (Object ...args){
String columnValue[] = new String[args.length];
for(int i=0;i columnValue[i] = args[i].toString(); if (rowNum == 1) { for(int i=0;i comparedColumn[i] = columnValue[i]; } for(int i=0;i { if ( !comparedColumn[i].equals(columnValue[i]) ) { for (int j=0;j { comparedColumn[j] = columnValue[j]; } rowNum = 1; return rowNum++; } } return rowNum++; } public static void main(String args[]) { Row_number t = new Row_number(); System.out.println(t.evaluate(123)); System.out.println(t.evaluate(123)); System.out.println(t.evaluate(123)); System.out.println(t.evaluate(1234)); System.out.println(t.evaluate(1234)); System.out.println(t.evaluate(1234)); System.out.println(t.evaluate(1235)); } } Hive 的详细介绍:请点这里 相关阅读: 基于Hadoop集群的Hive安装 Hive内表和外表的区别 Hadoop + Hive + Map +reduce 集群安装部署 Hive本地独立模式安装 Hive学习之WordCount单词统计
Hive 的下载地址:请点这里