Use Linux AWK commands to make data processing more efficient!
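In Linux we often need to process and analyze data in many different formats. This is where a simple yet powerful tool comes in handy: AWK. AWK is a text-processing tool that handles text files quickly and is well suited to tasks such as log analysis, data extraction, and statistical reports. This article introduces AWK's basic usage and common use cases so that you can easily master this data-processing workhorse.
0. Basic usage
awk is a powerful text-analysis tool. Put simply, awk reads a file line by line, splits each line into fields using whitespace (spaces and tabs) as the default separator, and then lets you analyze and process those fields.
The awk command has the following format: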
awk [-F field-separator] 'commands' input-file(s)
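[-F field-separator] is optional. Because awk uses spaces and tabs as the default field separator, you do not need this option for text whose fields are separated by spaces or tabs. For a file such as /etc/passwd, whose fields are separated by colons, you must specify the -F option.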
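As a quick illustration of why -F matters here, the sketch below counts fields (NF) on two /etc/passwd-style lines with and without -F. The variable name `sample` and its contents are illustrative assumptions, not taken from any particular system.

```shell
#!/bin/sh
# Two typical /etc/passwd lines (illustrative sample; the real file varies by system).
sample='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin'

# Default separator (whitespace): these lines contain no spaces, so each is one field.
printf '%s\n' "$sample" | awk '{ print NF }'

# With -F ':' every colon-separated column becomes a separate field.
printf '%s\n' "$sample" | awk -F ':' '{ print NF }'
```

The first command prints 1 for each line and the second prints 7, which is why the /etc/passwd examples below always pass -F.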
echo "this is a test" | awk '{ print $0 }' ## output: this is a test
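When the shell reads a command line that contains |, it sets up a pipe: the commands on each side of | are treated as simple commands, and the standard output of the left-hand command is connected to the standard input of the right-hand command, here awk. awk splits each line into fields according to the separator: $0 is the whole line, $1 is the first field, $2 is the second field, and so on.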
To print one field or all fields, use the print command; this is an awk action.
echo "this is a test" | awk '{ print $1 }' ## output: this
echo "this is a test" | awk '{ print $1, $2 }' ## output: this is
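The sketch below extends these examples with two details the article does not cover: $NF refers to the last field, and the comma in print $1, $2 inserts the output field separator OFS (a single space unless you change it), whereas writing $1 $2 concatenates the fields directly. OFS is a standard awk variable; the input line is the article's own example.

```shell
#!/bin/sh
line='this is a test'

# NF is the number of fields; $NF is the last field.
echo "$line" | awk '{ print NF, $NF }'                      # prints: 4 test

# A comma in print inserts OFS (a single space by default)...
echo "$line" | awk '{ print $1, $2 }'                       # prints: this is
# ...while placing fields side by side concatenates them.
echo "$line" | awk '{ print $1 $2 }'                        # prints: thisis

# Changing OFS in BEGIN, e.g. to produce comma-separated output.
echo "$line" | awk 'BEGIN { OFS = "," } { print $1, $2 }'   # prints: this,is
```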
The contents of /etc/passwd look like this:
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
Here are a few simple tasks:
1. Show only the account names in /etc/passwd
awk -F : '{ print $1 }' /etc/passwd
## output:
root
bin
daemon
adm
lp

2. Show columns 1 and 7 of /etc/passwd separated by a comma, print the header start1,start7 before all lines, and print end1,end7 after the last line
awk -F ':' 'BEGIN {print "start1,start7"} {print $1 "," $7} END {print "end1,end7"}' /etc/passwd
## output:
start1,start7
root,/bin/bash
bin,/sbin/nologin
daemon,/sbin/nologin
adm,/sbin/nologin
lp,/sbin/nologin
end1,end7
The BEGIN block runs before any input lines are processed, and the END block runs after all input lines have been processed.
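A minimal sketch of those semantics, assuming any POSIX awk: BEGIN runs even when there is no input at all (here /dev/null), and inside END the built-in NR already holds the total number of records. The two-line `sample` stands in for /etc/passwd so the output is predictable; the record count added to the END line is an illustrative extra.

```shell
#!/bin/sh
# BEGIN runs before any input is read, so it runs even when the input is empty;
# END runs after the last record, when NR equals the total number of records.
awk 'BEGIN { print "start" } END { print "records:", NR }' /dev/null
# prints "start", then "records: 0"

# Illustrative stand-in for /etc/passwd.
sample='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin'

printf '%s\n' "$sample" |
    awk -F ':' 'BEGIN { print "start1,start7" }
                { print $1 "," $7 }
                END { print "end1,end7 (" NR " accounts)" }'
# prints start1,start7 / root,/bin/bash / bin,/sbin/nologin / end1,end7 (2 accounts)
```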
3. For each line of /etc/passwd, print the line number, the number of fields, and the full line
awk -F : '{ print NR " " NF " " $0 }' /etc/passwd
## output:
1 7 root:x:0:0:root:/root:/bin/bash
2 7 bin:x:1:1:bin:/bin:/sbin/nologin
3 7 daemon:x:2:2:daemon:/sbin:/sbin/nologin
4 7 adm:x:3:4:adm:/var/adm:/sbin/nologin
5 7 lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
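Because print NR " " NF " " $0 separates the values with literal spaces, the columns shift once line numbers reach two digits. One possible fix, sketched below, uses awk's printf statement (part of standard awk, though not covered in this article); the widths 3 and 2 are arbitrary choices and `sample` is illustrative data.

```shell
#!/bin/sh
# Two illustrative /etc/passwd-style lines.
sample='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin'

# %3d right-aligns NR in 3 columns, %2d right-aligns NF in 2 columns,
# %s prints the whole line. printf adds no newline, so \n is explicit.
printf '%s\n' "$sample" | awk -F ':' '{ printf "%3d %2d %s\n", NR, NF, $0 }'
```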
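1. Built-in variables
NR and NF in the example above are awk built-in variables. Some of them are listed below:
Variable   Description
FILENAME   Name of the file awk is reading
FS         Input field separator; equivalent to the -F command-line option
NF         Number of fields in the current record
NR         Number of records read so far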
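To see FS and FILENAME in action, the sketch below writes two sample lines to a temporary file (created with mktemp, which it assumes is available) and shows that setting FS in a BEGIN block gives the same result as -F ':'. The sample lines are illustrative.

```shell
#!/bin/sh
# Write two illustrative /etc/passwd-style lines to a temporary file,
# removed automatically when the script exits.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' 'bin:x:1:1:bin:/bin:/sbin/nologin' > "$tmp"

# Setting FS in BEGIN (before the first line is split) is equivalent to -F ':'.
awk 'BEGIN { FS = ":" } { print NR, NF, $1 }' "$tmp"
awk -F ':' '{ print NR, NF, $1 }' "$tmp"
# both print "1 7 root", then "2 7 bin"

# FILENAME holds the name of the current input file.
awk 'NR == 1 { print "reading", FILENAME }' "$tmp"
```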
2. Built-in functions
Print the length of a string:
awk 'BEGIN { print length("this is a text") }'
## output:
14
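Print the user names in /etc/passwd in uppercase:
awk -F ':' '{ print toupper($1) }' /etc/passwd
## output:
ROOT
BIN
DAEMON
ADM
LP
Common functions are listed below:
Function      Purpose
toupper(s)    Returns s converted to uppercase
tolower(s)    Returns s converted to lowercase
length(s)     Returns the length of s
substr(s,p)   Returns the suffix of s starting at position p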
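A few calls that make the substr(s,p) entry concrete: positions start at 1, and standard awk also accepts an optional third argument, substr(s,p,n), that limits the result to n characters (not listed in the table above). The strings are illustrative.

```shell
#!/bin/sh
# substr(s, p): positions start at 1, so this returns the suffix from character 7.
awk 'BEGIN { print substr("/sbin/nologin", 7) }'      # prints: nologin

# substr(s, p, n): at most n characters starting at position p.
awk 'BEGIN { print substr("/sbin/nologin", 2, 4) }'   # prints: sbin

# tolower and length applied to a field.
echo 'ROOT:x:0:0' | awk -F ':' '{ print tolower($1), length($1) }'   # prints: root 4
```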
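3. Conditional operations and regular-expression matching
Show the lines in /etc/passwd that contain daemon:
awk -F ':' '$0 ~ /daemon/' /etc/passwd
## output:
daemon:x:2:2:daemon:/sbin:/sbin/nologin
awk supports the flow-control statements if, while, do/while, for, break, and continue.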
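The pattern $0 ~ /daemon/ tests the whole line. The sketch below, on an illustrative three-line sample, narrows the test to field 7 (the login shell) and uses !~ for the negated match; a bare /regex/ pattern is shorthand for $0 ~ /regex/.

```shell
#!/bin/sh
# Illustrative /etc/passwd excerpt.
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:2:2:daemon:/sbin:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin'

# Match only against field 7 (the login shell) instead of the whole line.
printf '%s\n' "$sample" | awk -F ':' '$7 ~ /nologin$/ { print $1 }'    # prints daemon and lp, one per line
# !~ selects lines whose field 7 does NOT match.
printf '%s\n' "$sample" | awk -F ':' '$7 !~ /nologin$/ { print $1 }'   # prints: root
# A bare regex tests the whole line, like $0 ~ /daemon/.
printf '%s\n' "$sample" | awk '/daemon/'
```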
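For each line, print the first field if it compares greater than the string "d" (awk compares strings character by character), and print - otherwise:
awk -F ':' '{ if ($1 > "d") { print $1 } else { print "-" } }' /etc/passwd
## output:
root
-
daemon
-
lp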
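The if/else example covers only one of the statements listed earlier. A short sketch of for and while (with break) on illustrative input: the for loop numbers every field of a line, and the while loop stops at the first empty field.

```shell
#!/bin/sh
# for: print each field of the line on its own line, prefixed with its number.
echo 'root:x:0:0:root:/root:/bin/bash' |
    awk -F ':' '{ for (i = 1; i <= NF; i++) print i ": " $i }'
# prints "1: root" through "7: /bin/bash"

# while with break: print fields until the first empty one.
echo 'a:b::d' |
    awk -F ':' '{ i = 1; while (i <= NF) { if ($i == "") break; print $i; i++ } }'
# prints a and b, one per line
```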
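You can also put the flow-control statements in a script file and run it with -f. For example, test.sh contains the following (despite the .sh name, the file holds awk code, not shell commands):
{ if ($1 > "d") { print $1 } else { print "-" } }
Run it as follows; the output is the same:
awk -F ':' -f test.sh /etc/passwd
## output:
root
-
daemon
-
lp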
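For longer programs, a file with an .awk extension makes it clear that it holds awk code. The sketch below writes a hypothetical count_shells.awk into a temporary directory, sets FS inside BEGIN instead of passing -F, and uses if/else plus END to count accounts whose shell ends in nologin. The file name and the three sample lines are assumptions for illustration.

```shell
#!/bin/sh
# Temporary directory for the hypothetical script; removed on exit.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# The quoted 'EOF' stops the shell from expanding $7 inside the here-document.
cat > "$dir/count_shells.awk" <<'EOF'
# Count accounts whose login shell (field 7) ends in nologin versus the rest.
BEGIN { FS = ":" }
{
    if ($7 ~ /nologin$/) {
        nologin++
    } else {
        other++
    }
}
# "+ 0" prints 0 instead of an empty string when a counter was never set.
END { print "nologin:", nologin + 0, "other:", other + 0 }
EOF

printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
    'bin:x:1:1:bin:/bin:/sbin/nologin' \
    'daemon:x:2:2:daemon:/sbin:/sbin/nologin' | awk -f "$dir/count_shells.awk"
# prints: nologin: 2 other: 1
```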
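4. Use cases
I rarely use awk for text analysis myself; I mainly use it in scripts.
For example, the start script for an application packaged as weibo-interface-1.0.jar is as follows:
start.sh:
nohup java -jar weibo-interface-1.0.jar >out 2>&1 &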
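The stop script, kill.sh, is as follows:
kill -9 `jps -l | grep 'weibo-interface-1.0.jar' | awk '{print $1}'`
Here grep keeps the line for the jar, awk prints its first field (the PID), and the backquotes pass that PID to kill -9. The output of jps -l looks like this:
70208 com.st.kmp.main.KmpService
31036 com.st.cis.main.BaiduAnalysisService
66813 weibo-interface-1.0.jar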
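Note that grep matches the jar name anywhere in a line and treats each . in the pattern as a regex wildcard. A slightly stricter sketch lets awk do the filtering itself by comparing field 2 exactly. It runs on the sample jps -l output above instead of calling jps, and it assumes the jar was started from its own directory so jps -l reports the bare file name; if it was started with a path, field 2 will contain that path instead.

```shell
#!/bin/sh
# Sample jps -l output copied from the article (real PIDs will differ).
jps_output='70208 com.st.kmp.main.KmpService
31036 com.st.cis.main.BaiduAnalysisService
66813 weibo-interface-1.0.jar'

# Filter and extract in one step: print field 1 (the PID) only when
# field 2 is exactly the jar name, so no separate grep is needed.
pid=$(printf '%s\n' "$jps_output" | awk '$2 == "weibo-interface-1.0.jar" { print $1 }')
echo "$pid"    # prints: 66813

# In a real kill.sh this would become (not executed here):
# kill -9 $(jps -l | awk '$2 == "weibo-interface-1.0.jar" { print $1 }')
```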
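Another case is stopping the DataNode processes on every node of a Hadoop cluster (if you are not familiar with Hadoop, just think of DataNode as a clustered application). Logging in to each machine, running jps, looking up the PID, and killing it one by one is tedious, so I wrote a script that SSHes into each node in turn and runs the following command:
kill `jps | grep 'DataNode' | awk '{print $1}'`
The output of jps is:
508 DataNode
31481 JournalNode
31973 NodeManager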
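When the command is sent over ssh, quoting matters: if $1 sits inside double quotes, the local shell expands it before ssh ever runs. The dry-run sketch below keeps the remote pipeline in single quotes and only prints the ssh commands; it also swaps grep for an exact awk match on field 2, mirroring the jps output above. The host names node1 to node3 are hypothetical, and it assumes jps is on the PATH of the remote login shell.

```shell
#!/bin/sh
# Hypothetical DataNode hosts; replace with your own.
nodes='node1 node2 node3'

# Single quotes keep the local shell from expanding $(...), $2 and $1,
# so the whole pipeline is evaluated on the remote node.
remote_cmd='kill $(jps | awk '\''$2 == "DataNode" { print $1 }'\'')'

# $nodes is intentionally unquoted so it splits into one host per iteration.
for host in $nodes; do
    # Dry run: print the ssh command instead of executing it.
    echo ssh "$host" "$remote_cmd"
done
```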
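Overall, AWK is a very powerful data-processing tool. Its flexible syntax and features let us process data in different formats quickly and produce all kinds of reports and statistics. This article covered AWK's basic concepts, syntax, and common use cases, and used examples to show how to process data with AWK. We hope it helps you understand AWK better and process data more efficiently!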