Home >Database >Mysql Tutorial >Hive导入Apache Nginx等日志与分析

Hive导入Apache Nginx等日志与分析

WBOY
WBOYOriginal
2016-06-07 17:04:131035browse

将nginx日志导入到hive中的两种方法 1 在hive中建表 CREATE TABLE apachelog (ipaddress STRING, identd STRING, user STRING

将nginx日志导入到hive中的两种方法 

1 在hive中建表

  • 导入后日志格式为 

    203.208.60.91 -  -  05/May/2011:01:18:47 +0800      GET /robots.txt HTTP/1.1        404     1238 Mozilla/5.0 

     此方法支持hive中函数parse_url(referer,"HOST")

    第二种方法导入

     注意:这个方法在建表后,使用查询语句等前要先执行

    hive> add jar /home/hjl/hive/lib/hive_contrib.jar;

    或者设置hive/conf/hive-default.conf  添加


    hive.aux.jars.path
    file:///usr/local/hadoop/hive/lib/hive-contrib-0.7.0-cdh3u0.jar

    保存配置

  • 203.208.60.91   -       -       [05/May/2011:01:18:47 +0800]    "GET /robots.txt HTTP/1.1"      404     1238 "-"      "Mozilla/5.0 (compatible; Googlebot/2.1; +)" 

    此方法中的字段类型stringfrom deserializer   经测试不支持parse_url(referer,"HOST")获取域名

    可以用select split(referer,"/")[2] from apilog 获取域名

    如果文件数据是纯文本,可以使用 STORED AS TEXTFILE。如果数据需要压缩,,使用 STORED AS SEQUENCE  。

    导入日志命令

    hive>load data local inpath '/home/log/map.gz' overwrite into table log;  

    导入日志支持.gz等格式

     

    导入日志后进行分析 例句

    统计行数
    select count(*) from nginxlog;

    统计IP数
    select count(DISTINCT ip) from nginxlog;

    排行
    select t2.ip,t2.xx from (SELECT ip, COUNT(*) AS xx FROM nginxlog GROUP by ip) t2 sort by t2.xx desc

    hive>SELECT * from apachelog  WHERE ipaddress = '216.211.123.184';

     

    hive> SELECT ipaddress, COUNT(1) AS numrequest FROM apachelog GROUP BY ipaddress SORT BY numrequest DESC LIMIT 1;

    hive> set mapred.reduce.tasks=2;
    hive> SELECT ipaddress, COUNT(1) AS numrequest FROM apachelog GROUP BY ipaddress SORT BY numrequest DESC LIMIT 1;

    hive>CREATE TABLE ipsummary (ipaddress STRING, numrequest INT);
    hive>INSERT OVERWRITE TABLE ipsummary SELECT ipaddress, COUNT(1) FROM apachelog GROUP BY ipaddress;

    hive>SELECT ipsummary.ipaddress, ipsummary.numrequest FROM (SELECT MAX(numrequest) AS themax FROM ipsummary) ipsummarymax JOIN ipsummary ON ipsummarymax.themax = ipsummary.numrequest;

    hive查询结果导出为csv的方法(未测试)

    hive> set hive.io.output.fileformat=CSVTextFile;
    hive> insert overwrite local directory '/tmp/CSVrepos/' select * from S where ... ;

    linux

  • Statement:
    The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn