search
HomeDatabaseMysql TutorialHive导入Apache Nginx等日志与分析

将nginx日志导入到hive中的两种方法 1 在hive中建表 CREATE TABLE apachelog (ipaddress STRING, identd STRING, user STRING

将nginx日志导入到hive中的两种方法 

1 在hive中建表

  • 导入后日志格式为 

    203.208.60.91 -  -  05/May/2011:01:18:47 +0800      GET /robots.txt HTTP/1.1        404     1238 Mozilla/5.0 

     此方法支持hive中函数parse_url(referer,"HOST")

    第二种方法导入

     注意:这个方法在建表后,使用查询语句等前要先执行

    hive> add jar /home/hjl/hive/lib/hive_contrib.jar;

    或者设置hive/conf/hive-default.conf  添加


    hive.aux.jars.path
    file:///usr/local/hadoop/hive/lib/hive-contrib-0.7.0-cdh3u0.jar

    保存配置

  • 203.208.60.91   -       -       [05/May/2011:01:18:47 +0800]    "GET /robots.txt HTTP/1.1"      404     1238 "-"      "Mozilla/5.0 (compatible; Googlebot/2.1; +)" 

    此方法中的字段类型stringfrom deserializer   经测试不支持parse_url(referer,"HOST")获取域名

    可以用select split(referer,"/")[2] from apilog 获取域名

    如果文件数据是纯文本,可以使用 STORED AS TEXTFILE。如果数据需要压缩,,使用 STORED AS SEQUENCE  。

    导入日志命令

    hive>load data local inpath '/home/log/map.gz' overwrite into table log;  

    导入日志支持.gz等格式

     

    导入日志后进行分析 例句

    统计行数
    select count(*) from nginxlog;

    统计IP数
    select count(DISTINCT ip) from nginxlog;

    排行
    select t2.ip,t2.xx from (SELECT ip, COUNT(*) AS xx FROM nginxlog GROUP by ip) t2 sort by t2.xx desc

    hive>SELECT * from apachelog  WHERE ipaddress = '216.211.123.184';

     

    hive> SELECT ipaddress, COUNT(1) AS numrequest FROM apachelog GROUP BY ipaddress SORT BY numrequest DESC LIMIT 1;

    hive> set mapred.reduce.tasks=2;
    hive> SELECT ipaddress, COUNT(1) AS numrequest FROM apachelog GROUP BY ipaddress SORT BY numrequest DESC LIMIT 1;

    hive>CREATE TABLE ipsummary (ipaddress STRING, numrequest INT);
    hive>INSERT OVERWRITE TABLE ipsummary SELECT ipaddress, COUNT(1) FROM apachelog GROUP BY ipaddress;

    hive>SELECT ipsummary.ipaddress, ipsummary.numrequest FROM (SELECT MAX(numrequest) AS themax FROM ipsummary) ipsummarymax JOIN ipsummary ON ipsummarymax.themax = ipsummary.numrequest;

    hive查询结果导出为csv的方法(未测试)

    hive> set hive.io.output.fileformat=CSVTextFile;
    hive> insert overwrite local directory '/tmp/CSVrepos/' select * from S where ... ;

    linux

  • Statement
    The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
    图文详解apache2.4+php8.0的安装配置方法图文详解apache2.4+php8.0的安装配置方法Dec 06, 2022 pm 04:53 PM

    本文给大家介绍如何安装apache2.4,以及如何配置php8.0,文中附有图文详细步骤,下面就带大家一起看看怎么安装配置apache2.4+php8.0吧~

    Linux apache怎么限制并发连接和下载速度Linux apache怎么限制并发连接和下载速度May 12, 2023 am 10:49 AM

    mod_limitipconn,这个是apache的一个非官方模块,根据同一个来源ip进行并发连接控制,bw_mod,它可以根据来源ip进行带宽限制,它们都是apache的第三方模块。1.下载:wgetwget2.安装#tar-zxvfmod_limitipconn-0.22.tar.gz#cdmod_limitipconn-0.22#vimakefile修改:apxs=“/usr/local/apache2/bin/apxs”#这里是自己apache的apxs路径,加载模块或者#/usr/lo

    apache版本怎么查看?apache版本怎么查看?Jun 14, 2019 pm 02:40 PM

    查看​apache版本的步骤:1、进入cmd命令窗口;2、使用cd命令切换到Apache的bin目录下,语法“cd bin目录路径”;3、执行“httpd -v”命令来查询版本信息,在输出结果中即可查看apache版本号。

    超细!Ubuntu20.04安装Apache+PHP8环境超细!Ubuntu20.04安装Apache+PHP8环境Mar 21, 2023 pm 03:26 PM

    本篇文章给大家带来了关于PHP的相关知识,其中主要跟大家分享在Ubuntu20.04 LTS环境下安装Apache的全过程,并且针对其中可能出现的一些坑也会提供解决方案,感兴趣的朋友下面一起来看一下吧,希望对大家有帮助。

    nginx,tomcat,apache的区别是什么nginx,tomcat,apache的区别是什么May 15, 2023 pm 01:40 PM

    1.Nginx和tomcat的区别nginx常用做静态内容服务和代理服务器,直接外来请求转发给后面的应用服务器(tomcat,Django等),tomcat更多用来做一个应用容器,让javawebapp泡在里面的东西。严格意义上来讲,Apache和nginx应该叫做HTTPServer,而tomcat是一个ApplicationServer是一个Servlet/JSO应用的容器。客户端通过HTTPServer访问服务器上存储的资源(HTML文件,图片文件等),HTTPServer是中只是把服务器

    php站用iis乱码而apache没事怎么解决php站用iis乱码而apache没事怎么解决Mar 23, 2023 pm 02:48 PM

    ​在使用 PHP 进行网站开发时,你可能会遇到字符编码问题。特别是在使用不同的 Web 服务器时,会发现 IIS 和 Apache 处理字符编码的方法不同。当你使用 IIS 时,可能会发现在使用 UTF-8 编码时出现了乱码现象;而在使用 Apache 时,一切正常,没有出现任何问题。这种情况应该怎么解决呢?

    如何在 RHEL 9/8 上设置高可用性 Apache(HTTP)集群如何在 RHEL 9/8 上设置高可用性 Apache(HTTP)集群Jun 09, 2023 pm 06:20 PM

    Pacemaker是适用于类Linux操作系统的高可用性集群软件。Pacemaker被称为“集群资源管理器”,它通过在集群节点之间进行资源故障转移来提供集群资源的最大可用性。Pacemaker使用Corosync进行集群组件之间的心跳和内部通信,Corosync还负责集群中的投票选举(Quorum)。先决条件在我们开始之前,请确保你拥有以下内容:两台RHEL9/8服务器RedHat订阅或本地配置的仓库通过SSH访问两台服务器root或sudo权限互联网连接实验室详情:服务器1:node1.exa

    Linux下如何查看nginx、apache、mysql和php的编译参数Linux下如何查看nginx、apache、mysql和php的编译参数May 14, 2023 pm 10:22 PM

    快速查看服务器软件的编译参数:1、nginx编译参数:your_nginx_dir/sbin/nginx-v2、apache编译参数:catyour_apache_dir/build/config.nice3、php编译参数:your_php_dir/bin/php-i|grepconfigure4、mysql编译参数:catyour_mysql_dir/bin/mysqlbug|grepconfigure以下是完整的实操例子:查看获取nginx的编译参数:[root@www~]#/usr/lo

    See all articles

    Hot AI Tools

    Undresser.AI Undress

    Undresser.AI Undress

    AI-powered app for creating realistic nude photos

    AI Clothes Remover

    AI Clothes Remover

    Online AI tool for removing clothes from photos.

    Undress AI Tool

    Undress AI Tool

    Undress images for free

    Clothoff.io

    Clothoff.io

    AI clothes remover

    AI Hentai Generator

    AI Hentai Generator

    Generate AI Hentai for free.

    Hot Article

    R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
    2 weeks agoBy尊渡假赌尊渡假赌尊渡假赌
    Repo: How To Revive Teammates
    4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌
    Hello Kitty Island Adventure: How To Get Giant Seeds
    4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

    Hot Tools

    DVWA

    DVWA

    Damn Vulnerable Web App (DVWA) is a PHP/MySQL web application that is very vulnerable. Its main goals are to be an aid for security professionals to test their skills and tools in a legal environment, to help web developers better understand the process of securing web applications, and to help teachers/students teach/learn in a classroom environment Web application security. The goal of DVWA is to practice some of the most common web vulnerabilities through a simple and straightforward interface, with varying degrees of difficulty. Please note that this software

    SublimeText3 Mac version

    SublimeText3 Mac version

    God-level code editing software (SublimeText3)

    PhpStorm Mac version

    PhpStorm Mac version

    The latest (2018.2.1) professional PHP integrated development tool

    Safe Exam Browser

    Safe Exam Browser

    Safe Exam Browser is a secure browser environment for taking online exams securely. This software turns any computer into a secure workstation. It controls access to any utility and prevents students from using unauthorized resources.

    Zend Studio 13.0.1

    Zend Studio 13.0.1

    Powerful PHP integrated development environment