
Detailed explanation of awk command

高洛峰
2016-12-15 11:06:38

1. Foreword

awk comes in three versions: awk, nawk and gawk. Unless otherwise specified, awk here refers to gawk. The most basic function of the awk language is to split files or strings and extract information from them according to specified rules; it can also output data according to specified rules. A complete awk script is typically used to format the information in text files.

2. Basic syntax

awk [options] 'awk_script' input_file1 [input_file2 ...]

awk's common options are:

① -F fs: Use fs as the field separator for input records. If this option is omitted, awk uses the value of its built-in variable FS, which defaults to a single space (fields are then separated by runs of spaces and tabs)

② -f filename: Read awk_script from the file filename

③ -v var=value: Set the variable for awk_script
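As a minimal sketch of the -F and -v options, the following uses hypothetical colon-separated sample data; the names data, min and passed are illustrative only.

```shell
# Hypothetical sample data: name, department and score separated by colons.
data='alice:dev:82
bob:ops:47'

# -F: sets the input field separator to a colon.
# -v min=60 assigns the awk variable min before the script starts running.
passed=$(printf '%s\n' "$data" | awk -F: -v min=60 '$3 >= min {print $1}')
echo "$passed"   # prints: alice
```

Only the record whose third field is at least min is printed.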

awk has three running modes:

First, put the awk script directly on the command line.

Second, put all of the awk script commands in a script file, then use the -f option to specify the script file to run.

Third, put awk_script in a script file with #!/bin/awk -f as its first line, give the script executable permission, and then call it by typing the script name in the shell.
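The first two running modes can be sketched as follows. The temporary directory and the file names input.txt and second.awk are assumptions made for the illustration; the third mode is not run here because it depends on where awk is installed.

```shell
# Work in a throwaway directory with a small hypothetical input file.
tmpdir=$(mktemp -d)
printf 'one two\nthree four\n' > "$tmpdir/input.txt"

# Mode 1: the script is given directly on the command line.
mode1=$(awk '{print $2}' "$tmpdir/input.txt")

# Mode 2: the same script is stored in a file and loaded with -f.
printf '{print $2}\n' > "$tmpdir/second.awk"
mode2=$(awk -f "$tmpdir/second.awk" "$tmpdir/input.txt")

echo "$mode1"
echo "$mode2"
rm -rf "$tmpdir"
```

Both modes run the same program, so both print the second field of every line.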

3. Awk script

An awk script consists of one or more awk_cmd. When there are several, each awk_cmd should start on a new line.

awk_cmd consists of two parts: awk_pattern { actions }.

In addition, when awk_script is written directly in the awk command, it can span multiple lines, but the entire awk_script must be enclosed in single quotes.

The general form of the awk command:

awk ' BEGIN { actions }

awk_pattern1 { actions }

............

awk_patternN { actions }

END { actions }

' inputfile

where BEGIN { actions } and END { actions } are optional.

You can use AWK's own built-in variables in the awk script, as follows:

ARGC The number of command line arguments

ARGV The command line argument array

FILENAME The current input file name

FNR The record number in the current file

FS Input field delimiter, default is a space

RS Input record delimiter

NF Number of fields in the current record

NR Number of records so far

OFS Output field delimiter

ORS Output record delimiter
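A small sketch of a few of these built-in variables in use, on hypothetical two-line input:

```shell
# NR is the record number, NF the number of fields in the record,
# and OFS is placed between the comma-separated arguments of print.
out=$(printf 'a b c\nd e\n' | awk 'BEGIN {OFS="-"} {print NR, NF, $1}')
echo "$out"
# prints:
# 1-3-a
# 2-2-d
```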

The running process of the awk script:

① If the BEGIN block exists, awk executes the actions specified by it.

② awk reads a line from the input file, which is called an input record. (If the input file is omitted, it will be read from the standard input)

③ awk splits the record it has read into fields, putting the first field into the variable $1, the second into $2, and so on. $0 represents the entire record. The field separator is specified with awk's built-in variable FS or with the -F option.

④ Compare the current input record with the awk_pattern in each awk_cmd to see if it matches. If it matches, execute the corresponding actions. If there is no match, the corresponding actions are skipped until all awk_cmds are compared.

⑤ Once an input record has been compared against every awk_cmd, awk reads the next line of input and repeats steps ③ and ④. This continues until awk reaches the end of the file.

⑥ When awk has read all the input lines, if END exists, the corresponding actions will be executed.

1) input_file can be a file list with more than one file, and awk will process each file in the list in order.

2) The awk_pattern of an awk_cmd can be omitted. When it is omitted, the corresponding actions are executed for every input record, without any matching. The actions of an awk_cmd can also be omitted. When they are omitted, the default action is to print the current input record, that is, {print $0}. The awk_pattern and the actions of one awk_cmd cannot both be omitted.

3) The BEGIN block and the END block are located at the beginning and end of awk_script respectively. An awk_script may contain only a BEGIN block or only an END block. If awk_script contains nothing but BEGIN { actions }, awk does not read input_file.

4) awk reads the data of the input file into the memory, and then operates the copy of the input data in the memory. awk will not modify the content of the input file.

5) Awk always outputs to standard output. If you want awk to output to a file, you can use redirection.
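The running process described above can be sketched with one BEGIN block, one pattern-action pair and one END block. The fruit data and the variable name big are hypothetical.

```shell
out=$(printf 'apple 3\nbanana 12\ncherry 7\n' | awk '
BEGIN   { print "start" }
$2 > 5  { big++; print $1 }
END     { print "lines=" NR, "big=" big }
')
echo "$out"
# prints:
# start
# banana
# cherry
# lines=3 big=2
```

BEGIN runs before any input is read, the middle rule runs once for each record whose second field is greater than 5, and END runs after the last record.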

3.1.awk_pattern

The awk_pattern part determines when the actions part is triggered.

awk_pattern can be of the following types:

1) Regular expression is used as awk_pattern: /regexp/

Note that the regular expression regexp must be enclosed in slashes (/).

Characters often used in regular expression matching in awk:

^ $ . [] | () *: general regexp metacharacters

+: matches the single character before it one or more times. It belongs to extended regular expressions, which awk uses; grep and sed do not treat it as a metacharacter in their default (basic) regexp mode.

?: matches the single character before it 0 or 1 times. It also belongs to extended regular expressions and is not a metacharacter in the default mode of grep or sed.

For more information about regular expressions, please refer to "Regular Expressions"

Example:

awk '/ *\$0\.[0-9][0-9].*/' input_file

For example, a line whose content is $0.99 hello matches the regular expression above.
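A minimal sketch of regular expressions used as patterns, on hypothetical input. The dollar sign and the dot are escaped with a backslash so that they match literally.

```shell
# Lines containing a price of the form $0.NN are printed (default action).
price=$(printf 'price $0.99 hello\nprice $1.50\nfree\n' | awk '/\$0\.[0-9][0-9]/')
echo "$price"   # prints: price $0.99 hello

# + means one or more of the preceding character, ? means zero or one.
plus=$(printf 'ac\nabc\nabbb\n' | awk '/^ab+c?$/')
echo "$plus"
# prints:
# abc
# abbb
```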

2) Boolean expressions are used as awk_pattern. When the expression is true, the execution of corresponding actions is triggered.

① You can use variables (such as field variables $1, $2, etc.) and /regexp/

② Operators in Boolean expressions:

Relational operator: < > <= >= == !=

Matching operator: value ~ /regexp/ returns true if value matches /regexp/

value !~ /regexp/ returns true if value does not match /regexp/

Example: awk '$2 > 10 {print "ok"}' input_file

awk '$3 ~ /^d/ {print "ok"}' input_file

③ && (and) and || (or) can connect two /regexp/ or Boolean expressions to form a compound expression. ! (not) can be used before a Boolean expression or a /regexp/.

Example: awk '($1 < 10 ) && ($2 > 10) {print $0 "ok"}' input_file

awk '/^d/ || /x$/ {print $0 "ok"}' input_file

④ Other expressions, such as assignment expressions, can also be used as awk_pattern.

Example:

awk '(tot+=$6); END{print "total points :" tot }' input_file // The semicolon cannot be omitted

awk 'tot+=$6 {print $0} END{print "total points :" tot }' input_file // Equivalent to the above

When an assignment expression is used as the pattern, the value assigned decides the match: a number matches if it is non-zero, and a string matches if it is not empty; otherwise there is no match.
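The pattern types above can be sketched on a few hypothetical records (a name followed by two numbers):

```shell
data='x 4 20
y 15 30
z 2 5'

# Compound Boolean expression: both comparisons must be true.
both=$(printf '%s\n' "$data" | awk '($2 < 10) && ($3 > 10) {print $1}')
echo "$both"       # prints: x

# Matching operators combined with &&.
by_regex=$(printf '%s\n' "$data" | awk '$1 ~ /^[xy]$/ && $2 !~ /^1/ {print $1}')
echo "$by_regex"   # prints: x

# Assignment expression as the pattern: a line is printed only while the
# running total tot is non-zero, so the two leading zeros are skipped.
nonzero=$(printf '0\n0\n3\n0\n' | awk 'tot += $1')
echo "$nonzero"
# prints:
# 3
# 0
```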


awk built-in string functions:

gsub(r, s) Replace r with s throughout $0

awk 'gsub(/name/,"xingming") {print $0}' temp

gsub(r, s, t) Replace r with s throughout t

index(s,t) Return the first position of the string t in s

awk 'BEGIN {print index("Sunny", "ny") }' temp Returns 4

length(s) Returns the length of s

match(s, r) Tests whether s contains a string matching r; returns the position where the match starts, or 0 if there is none

awk '$1=="J.Lulu" {print match($1, "u")}' temp Return 4

split(s, a, fs) Split s into the array a on the separator fs

awk 'BEGIN {print split("12#345#6789", myarray, "#")}'

returns 3, while myarray[1]="12", myarray[2]="345", myarray[3]="6789"

sprintf(fmt, exp) Returns exp formatted by fmt

sub(r, s) Replace the leftmost, longest substring of $0 that matches r with s (only the first match is replaced)

substr(s, p) Returns the suffix of string s starting at position p

substr(s, p, n) Returns the substring of s that starts at position p and has length n
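The string functions listed above can be tried together in one BEGIN block. This is only a sketch; the string "hello world" and the variable names are illustrative.

```shell
out=$(awk 'BEGIN {
    s = "hello world"
    print length(s)                       # number of characters in s
    print index(s, "world")               # position where "world" starts
    print substr(s, 7)                    # suffix starting at position 7
    print substr(s, 1, 5)                 # 5 characters starting at position 1
    m = match(s, /o+/)                    # also sets RSTART and RLENGTH
    print m, RSTART, RLENGTH
    n = split("12#345#6789", parts, "#")  # n is the number of pieces
    print n, parts[2]
    t = s; sub(/o/, "0", t); print t      # only the first match is replaced
    t = s; c = gsub(/o/, "0", t)          # c is the number of replacements
    print c, t
    print sprintf("%03d", 7)              # formatted string, zero padded
}')
echo "$out"
# prints:
# 11
# 7
# world
# hello
# 5 5 1
# 3 345
# hell0 world
# 2 hell0 w0rld
# 007
```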

awk string concatenation operation

[chengmo@centos5 ~]$ awk 'BEGIN{a= "a";b="b";c=(a""b);print c}'

ab

2.7. Use of printf function:

Character conversion: echo "65" | awk '{printf "%c\n", $0}' Output A

awk 'BEGIN {printf "%f\n", 999}' Output 999.000000

Formatted output: awk '{printf "%-15s %s\n", $1, $3}' temp Left-aligns the first field in a 15-character column
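A short sketch of width, alignment and precision in printf, using hypothetical name and score data:

```shell
# %-8s left-aligns a string in 8 columns, %3d right-aligns an integer
# in 3 columns, and %.2f prints two digits after the decimal point.
table=$(printf 'alice 82\nbob 7\n' | awk '{printf "%-8s|%3d|%.2f\n", $1, $2, $2 / 3}')
echo "$table"
# prints:
# alice   | 82|27.33
# bob     |  7|2.33

# %c with a numeric argument prints the character with that code.
ch=$(awk 'BEGIN {printf "%c\n", 65}')
echo "$ch"   # prints: A
```

Unlike print, printf does not add a newline by itself, so the format string ends with \n.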

2.8. Other awk usage:

Passing values to a one-line awk command:

awk '{if ($5

who | awk '{if ($1==user) print $1 " are in " $2}' user=$LOGNAME Use environment variables
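Three common ways of getting a value into an awk script can be sketched as follows. The data and the variable names age, who and GREETING are hypothetical, and this is not a reconstruction of the examples above.

```shell
# 1. var=value placed after the script is assigned before the input that
#    follows it is read; "-" names standard input explicitly.
young=$(printf 'tom 9\nann 12\n' | awk '{if ($2 < age) print $1}' age=10 -)
echo "$young"    # prints: tom

# 2. -v assigns the variable before the script starts, so BEGIN can see it too.
found=$(printf 'tom 9\nann 12\n' | awk -v who="ann" '$1 == who {print $1 " scored " $2}')
echo "$found"    # prints: ann scored 12

# 3. Environment variables are available through the ENVIRON array.
greet=$(GREETING=hi awk 'BEGIN {print ENVIRON["GREETING"]}')
echo "$greet"    # prints: hi
```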

awk script file: put #!/bin/awk -f on the first line. Without this line, the self-contained script will not be executed. Example:

#!/bin/awk -f

# all comment lines must start with a hash '#'

# name: student_tot.awk

# to call: student_tot.awk grade.txt

# prints total and average of club student points

# print a header first

BEGIN {

print "Student Date Member No. Grade Age Points Max"

print "Name Joined Gained Point Available"

print "========== ==============================================="

}

# let's add the scores of points gained

(tot+=$6);

# finished processing now let's print the total and average point

END {

print "Club student total points :" tot

print "Average Club Student points:" tot/NR

}


2.9. awk array:

awk's basic loop structure for arrays:

for (element in array) print array[element]

awk 'BEGIN {record="123#456#789";split(record, myarray, "#")}

END { for (i in myarray) {print myarray[i]} }' /dev/null
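Arrays in awk are indexed by strings, which makes counting occurrences a common use. The sketch below counts hypothetical department names; because the order in which for (k in array) visits the keys is not specified, the output is piped through sort.

```shell
counts=$(printf 'dev\nops\ndev\nqa\ndev\n' | awk '
{ count[$1]++ }
END { for (k in count) print k, count[k] }
' | sort)
echo "$counts"
# prints:
# dev 3
# ops 1
# qa 1
```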

3.0 Custom statements in awk

1. Conditional judgment statement (if)


if (expression) #if (Variable in Array)

Statement 1

else

Statement 2

In this format, "Statement 1" can be several statements. If there are several, enclose them in {} so that awk treats them as one block and the code is easier to read. awk's branch structure allows nesting; its format is:


if(expression)


{statement 1}


else if(expression)

{statement 2}

else

{statement 3}

[chengmo@localhost nginx]# awk 'BEGIN{

test=100;

if(test>90)

{

print "very good";

}

else if(test>60)

{

print "good";

}

else

{

  print "no pass";

}

}'


very good



Each statement can be ended with ";".



2. Loop statement (while, for, do)


1. while statement


Format:


while (expression)


{statement}

Example:


[chengmo@localhost nginx]# awk 'BEGIN{

test=100;

total=0;

while(i<=test)

{

total+=i;

i++;

}

print total;

}'

5050

2.for loop


for loop has two formats:


Format 1:


for(variable in array)


{statement}


Example:


[chengmo@localhost nginx]# awk 'BEGIN{

for(k in ENVIRON)

{

print k"="ENVIRON[k];

}

}'


AWKPATH=.:/usr/share/awk

OLDPWD=/home/web97

SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass

SELINUX_LEVEL_REQUESTED=

SELINUX_ROLE_REQUESTED=

LANG=zh_CN.GB2312


. . . . . .


Explanation: ENVIRON is a built-in awk variable; it is an associative array holding the environment variables.


Format 2:


for (variable; condition; expression)


{statement}


Example:


[chengmo@localhost nginx]# awk 'BEGIN{

total=0;

for(i=0;i<=100;i++)

{

total+=i;

}

print total;

}'


5050

3.do loop


Format:


do


{statement}while(condition)


Example:


[chengmo@localhost nginx]# awk 'BEGIN{

total=0;

i=0;

do

{

total+=i;

i++;

}while(i<=100)

print total;

}'

5050


The above are awk's flow control statements. As the syntax shows, they are the same as in the C language. With these statements, much of the work of a shell program can be handed over to awk, which also performs very well.


break When the break statement is used in a while or for statement, it causes the program to exit the loop.

continue When the continue statement is used in a while or for statement, it causes the loop to move on to the next iteration.

next causes the next line of input to be read and returns to the top of the script. This avoids performing additional operations on the current input line.

exit causes the main input loop to exit and transfers control to END, if END exists. If no END rule is defined, or the exit statement is used inside END, execution of the script terminates.
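The four statements can be seen together in one small sketch. The input (a comment line, two numbers, the word stop and one more number) is hypothetical.

```shell
out=$(printf '# comment\n1\n2\nstop\n3\n' | awk '
/^#/         { next }          # skip comment lines; no later rule runs for them
$1 == "stop" { exit }          # leave the main input loop; END still runs
{
    for (i = 1; i <= 5; i++) {
        if (i == 2) continue   # skip the rest of this iteration
        if (i == 4) break      # leave the for loop
        sum += i * $1
    }
}
END { print "sum=" sum }
')
echo "$out"   # prints: sum=12
```

For each number only i = 1 and i = 3 contribute, giving 4 for the line 1 and 8 for the line 2. The line 3 is never read because exit ends the input loop first.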


NR and FNR:


A. When awk is given multiple input files, it applies the code to the first file (reading it line by line), then applies the same code to the second file, then to the third, and so on.

B. This execution order raises a line-numbering question. When the first file is finished and the second file is read, how is the first line of the second file numbered? If it counts as 1 again, there would be two lines numbered 1 (because the first file also has a first line). This is the difference between NR and FNR.

NR: Global line number (counted continuously from the first line of the first file to the last line of the last file)

FNR: Line number within the current file (regardless of how many lines the previous input files had)

For example: data1.txt has 40 lines and data2.txt has 50 lines, so for awk '{}' data1.txt data2.txt

the values of NR are: 1, 2, ..., 40, 41, 42, ..., 90

the values of FNR are: 1, 2, ..., 40, 1, 2, ..., 50
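The difference is easy to see on two small files. The sketch below creates hypothetical two-line and three-line files in a temporary directory and prints FILENAME, NR and FNR for every record.

```shell
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/data1.txt"
printf 'c\nd\ne\n' > "$tmpdir/data2.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
out=$(cd "$tmpdir" && awk '{print FILENAME, NR, FNR}' data1.txt data2.txt)
rm -rf "$tmpdir"
echo "$out"
# prints:
# data1.txt 1 1
# data1.txt 2 2
# data2.txt 3 1
# data2.txt 4 2
# data2.txt 5 3
```

A common idiom built on this is the test NR == FNR, which is true only while the first file is being read.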

Getline function description:

awk's getline statement simply reads one record. getline is especially useful when one logical data record is spread over two physical records. It performs the usual field splitting (setting the field variables, $0, FNR, NF and NR). It returns 1 on success, 0 at end of file, and -1 on error.


A. First, how getline behaves as a whole:

When there is no redirection operator (| or <) to its left or right, getline acts on the current file and reads the next line of the current file into the variable var that follows it, or into $0 if no variable is given. Note that because awk has already read a line before it processes getline, the lines getline returns are every other line.

When there is a redirection operator (| or <), getline acts on the redirected input file. Because that file has only just been opened, awk has not read a line from it; only getline reads it, so getline returns the first line of that file, not every other line.

B. The usage of getline can be roughly divided into three major categories (each major category is divided into two sub-categories), that is, there are a total of 6 usages. The code is as follows:


nawk 'BEGIN{"cat data.txt"|getline d; print d}' data2.txt

nawk 'BEGIN{"cat data.txt"|getline; print $0}' data2.txt

nawk 'BEGIN{getline d < "data.txt"; print d}' data2.txt

nawk 'BEGIN{getline < "data.txt"; print $0}' data2.txt

All four lines of code above print only the first line of the data.txt file (to print all lines, use a loop)

e.g. nawk 'BEGIN{FS=":";while((getline<"/etc/passwd")>0){print $1}}' data.txt


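nawk '{getline d; print d"#"$3}' data.txt

awk first reads the first line, then processes getline, which assigns the next line to the variable d; print then outputs d, followed by # and $3. Because getline d sets only d (plus NR and FNR) and leaves $0 unchanged, $3 still comes from the line that awk read first. If the data file has Windows-style line endings, the carriage return at the end of d moves the cursor back to the start of the line, so on a terminal the # and $3 appear to overwrite d.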


nawk '{getline; print $0"#"$3}' data.txt

awk first reads the first line, then processes getline, which assigns the next line to $0. The current $0 is therefore already the content of the next line, and $3 is taken from that new $0. As in the previous example, a carriage return at the end of $0 makes the # and $3 appear to overwrite the content of $0 on a terminal.
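Two of the getline forms can be sketched on a hypothetical four-line file. The names data.txt, first, line and f are illustrative; the getline calls are parenthesized so that the comparison with 0 is unambiguous.

```shell
tmpdir=$(mktemp -d)
printf 'l1\nl2\nl3\nl4\n' > "$tmpdir/data.txt"

# Plain getline replaces $0 with the next line of the current input,
# so each pass through the rule consumes two lines.
pairs=$(awk '{first = $0; if ((getline) > 0) print first "+" $0}' "$tmpdir/data.txt")
echo "$pairs"
# prints:
# l1+l2
# l3+l4

# getline var < file reads from another file, starting at its first line;
# looping until the return value is no longer positive reads every line.
total=$(awk -v f="$tmpdir/data.txt" 'BEGIN {
    while ((getline line < f) > 0) n++
    close(f)
    print n
}')
echo "$total"   # prints: 4
rm -rf "$tmpdir"
```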

In awk, it is sometimes necessary to call system tools for tasks that awk is not good at. awk's system() function can run such commands, but it cannot capture the output of the external tool. Fortunately, getline can meet this need. For example:

test.awk:

{

datecommand="/bin/date -j -f \"%d/%b/%Y:%H:%M:%S\" " $olddatestr " \"+%Y %m%d %H%M%S\"";

datecommand | getline newdatestr

close(datecommand);

}

Running an external command makes awk occupy a file descriptor, and the number of files awk can have open at once has an upper limit that is not large (for example, 16), so it is a good habit to call close when you are done. Storing the command string in a variable also makes it easy to pass the same string to close.
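The same pattern in a form that runs on any system with echo and tr is sketched below. The commands and the variable names cmd, result and up are hypothetical stand-ins for the date command above, and the input is assumed to be trusted because it is pasted into a shell command line.

```shell
# Capture the output of one external command in BEGIN.
got=$(awk 'BEGIN {
    cmd = "echo hello"
    if ((cmd | getline result) > 0)
        print "got: " result
    close(cmd)
}')
echo "$got"     # prints: got: hello

# Run one command per input record; close(cmd) after each use releases
# the file descriptor so that long inputs do not exhaust the limit.
upper=$(printf 'a\nb\n' | awk '{
    cmd = "echo " $1 " | tr a-z A-Z"
    cmd | getline up
    close(cmd)
    print up
}')
echo "$upper"
# prints:
# A
# B
```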



