
Creating Tables in Hive



1. When creating a Hive table, the regular expression is written exactly as it would appear in Java code; in RegexSerDe, \\ represents the backslash escape.
create table inst_cn_3 (
  ip string,
  time string,
  mac string,
  lan string,
  ver string,
  lc string,
  pn string,
  reg string,
  vrf string,
  line string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "- *([\\d\\.]+) *\\[([\\d]{2}/[\\w]+/[\\d]{4}:[\\d]{2}:[\\d]{2}:[\\d]{2}\\s+\\+[\\d]+)\\] *GET */mx3/inst/([0-9a-f]{12})/ver=([\\.\\d]+),lan=(0x[\\w]+)(?:,lc=([\\w]+))(?:,pn=([\\w]+))(?:,reg=([0-1]))(?:,vrf=([\\w]+))?.*"
 )
STORED AS TEXTFILE;
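
On older Hive releases the contrib SerDe ships in a separate jar, so it may need to be registered for the session before the table can be queried. A minimal sketch (the jar path is hypothetical; adjust it to the local install):

-- Sketch: make the contrib RegexSerDe available to this session (hypothetical path).
add jar /usr/lib/hive/lib/hive-contrib.jar;

-- The regex capture groups are assigned to the declared columns in order
-- (group 1 -> ip, group 2 -> time, and so on); a line the pattern does not
-- match typically surfaces as a row of NULL columns.
select ip, time, mac, ver from inst_cn_3 limit 10;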

-- Add a catch-all column, then extend the pattern with a trailing |(.*) branch so
-- that lines the main pattern misses are captured whole into the new line column.
alter table inst_cn_3 add columns(line string);
ALTER TABLE inst_cn_3 SET SERDEPROPERTIES (
  "input.regex" = "- ([\\d\\.]+) \\[([\\d]{2}/[\\w]+/[\\d]{4}:[\\d]{2}:[\\d]{2}:[\\d]{2}\\s+\\+[\\d]+)\\] GET /mx3/inst/([0-9a-f]{12})/ver=([\\.\\d]+),lan=(0x[\\w]+)(?:,lc=([\\w]+))(?:,pn=([\\w]+))(?:,reg=([0-1]))(?:,vrf=([\\w]+))?.*|(.*)"
 );
select * from inst_cn_3 limit 100;

-- Inspect the lines that only matched the catch-all branch (mac IS NULL but line
-- is not), skipping the known unknowuser00 noise.
select
  line
from inst_cn_3
where
  1=1
  and mac is null
  and line is not null
  and !(line rlike '.*unknowuser00.*')
  ;
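
To gauge how much of the log the main pattern actually parses, the same NULL test can drive a quick count; a sketch over the table above:

select
  count(case when mac is not null then 1 end) as parsed_rows,                     -- matched the main pattern
  count(case when mac is null and line is not null then 1 end) as unparsed_rows   -- only the |(.*) branch
from inst_cn_3;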

hadoop fs -cp /mnt/nfs/log/statcn/inst/inst_cn_3.*.txt /hive/warehouse/inst_cn_3/
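
Alternatively, files that are already on the warehouse filesystem can be attached with LOAD DATA, which moves (rather than copies) them into the table directory; a sketch with a hypothetical file name:

-- Sketch: LOAD DATA INPATH moves the file into /hive/warehouse/inst_cn_3/.
-- The file name is hypothetical; point it at a real inst_cn_3 log file on HDFS.
load data inpath '/mnt/nfs/log/statcn/inst/inst_cn_3.20110502.txt' into table inst_cn_3;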
 
2011.06.09 Hive time handling (see the note after the two queries below)
1.
select
  from_unixtime(unix_timestamp('02/May/2011:00:00:00 +0800','dd/MMMMM/yyyy:HH:mm:ss Z'),'yyyy-MM-dd HH:mm:ss') ,
  from_unixtime(unix_timestamp('02/May/2011:23:59:59 +0800','dd/MMMMM/yyyy:HH:mm:ss Z'),'yyyy-MM-dd HH:mm:ss')  ,
  from_unixtime(unix_timestamp('03/May/2011:00:00:00 +0800','dd/MMMMM/yyyy:HH:mm:ss Z'),'yyyy-MM-dd HH:mm:ss') ,
  from_unixtime(unix_timestamp('03/May/2011:23:59:59 +0800','dd/MMMMM/yyyy:HH:mm:ss Z'),'yyyy-MM-dd HH:mm:ss')
 from
  stat_c_log
where
  1=1
  and partkey = '20110503'
  and logType = 'inst_cn_3'
  and url rlike '/mx3/inst/.*'
limit
  10
;

2.
select
  from_unixtime(unix_timestamp('02/May/2011:00:00:00 +0800','dd/MMMMM/yyyy:HH:mm:ss Z'),'yyyy-MM-dd HH:mm:ss') ,
  from_unixtime(unix_timestamp('02/May/2011:23:59:59 +0800','dd/MMMMM/yyyy:HH:mm:ss Z'),'yyyy-MM-dd HH:mm:ss')  ,
  from_unixtime(unix_timestamp('03/May/2011:00:00:00 +0800','dd/MMMMM/yyyy:HH:mm:ss Z'),'yyyy-MM-dd HH:mm:ss') ,
  from_unixtime(unix_timestamp('03/May/2011:23:59:59 +0800','dd/MMMMM/yyyy:HH:mm:ss Z'),'yyyy-MM-dd HH:mm:ss')  ,
  round((unix_timestamp('02/May/2011:00:00:00 +0800','dd/MMMMM/yyyy:HH:mm:ss Z')-4*3600)/(3600*24))*(3600*24*1000),
  round((unix_timestamp('02/May/2011:23:59:59 +0800','dd/MMMMM/yyyy:HH:mm:ss Z')-4*3600)/(3600*24))*(3600*24*1000)
 from
  stat_c_log
where
  1=1
  and partkey = '20110503'
  and logType = 'inst_cn_3'
  and url rlike '/mx3/inst/.*'
limit
  10
;
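
The extra expressions in the second query turn the parsed epoch seconds into a millisecond timestamp snapped to a whole-day boundary: shift by four hours, round to whole days, then scale back up to milliseconds. The conversions are easy to test on their own; a sketch, assuming a Hive version that allows SELECT without FROM (older versions need any one-row table), and noting that from_unixtime renders the result in the cluster's local time zone:

-- Sketch: 'dd/MMM/yyyy:HH:mm:ss Z' follows SimpleDateFormat; for parsing,
-- 'MMM' and 'MMMMM' accept the same month text.
select
  unix_timestamp('02/May/2011:00:00:00 +0800', 'dd/MMM/yyyy:HH:mm:ss Z') as epoch_seconds,
  from_unixtime(unix_timestamp('02/May/2011:00:00:00 +0800', 'dd/MMM/yyyy:HH:mm:ss Z'),
                'yyyy-MM-dd HH:mm:ss') as formatted,
  round((unix_timestamp('02/May/2011:00:00:00 +0800', 'dd/MMM/yyyy:HH:mm:ss Z') - 4*3600)
        / (3600*24)) * (3600*24*1000) as day_boundary_ms;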

2012.03.01
  1. Hive's built-in functions are registered in the FunctionRegistry class.
2012.06.14
  1. set hive.cli.print.header=true; makes the Hive CLI print column headers with query output (see the sketch after these notes).
2012.06.26
  1. On Hive (CDH4b2), after archiving a table with ALTER TABLE ... ARCHIVE, a select line query against the archived partition throws a FileNotFoundException.
    https://issues.apache.org/jira/browse/MAPREDUCE-2704 explains why: CombineFileInputFormat constructs new Path objects by converting an existing path to a URI and then pulling out only the "path" part of it. This drops the scheme and host, which makes CombineFileInputFormat fail if the paths are on a filesystem other than the default one (as the har:// paths of an archived partition are).
2012.07.16
  1. EXPLAIN EXTENDED hive_query; shows the detailed execution plan.
2012.07.29
  1. DESCRIBE FORMATTED mock; shows the table's columns, location, storage parameters, and so on.
  2. /src/ql/src/test/queries/clientpositive/ in the Hive source tree holds the client-positive test queries, a handy set of working examples.
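
Taken together, the session setting and inspection commands above look roughly like this in the CLI; a sketch against the inst_cn_3 table from earlier:

-- Sketch: print column headers, inspect table metadata, then show a detailed plan.
set hive.cli.print.header=true;
DESCRIBE FORMATTED inst_cn_3;
EXPLAIN EXTENDED
select count(*) from inst_cn_3;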

