I'm processing a text file with shell. The contents look like this:
fdf 284
asd 112
adf 146
csb 513
dfg 576
asd 346
adf 263
csb 092
dfg 547
I want to deduplicate by the first column, and for duplicate keys keep the row with the largest value in the second column. The result should be:
fdf 284
asd 346
adf 263
csb 513
dfg 576
I looked at the uniq command, but it doesn't seem to support deduplicating by field. How can I do this?
阿神2017-04-17 11:54:44
Method 1
cat data.txt | sort -rnk2 | awk '{if (!keys[$1]) print $0; keys[$1] = 1;}'
First sort in reverse numeric order on the second column so the values come out from largest to smallest, then use awk to print each first-column key only the first time it appears and discard the rest. This solves the problem, but awk keeps every key in memory, which can cause problems if the file is very large.
Method 2
cat data.txt | sort -k1,1 | awk '{
    if (lastKey == $1) {
        if (lastValue < $2) {
            lastLine = $0;
            lastValue = int($2);
        }
    } else {
        if (lastLine) {
            print lastLine;
        }
        lastKey = $1;
        lastLine = $0;
        lastValue = int($2);
    }
} END {
    if (lastLine) {
        print lastLine;
    }
}'
This solution sorts by the first column and then filters the result with awk; the filtering step is essentially an enhanced version of uniq. It is much better in terms of memory usage, but the code is longer and not as concise.
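Method 1 can be tried end to end as a small, self-contained sketch (the file name data.txt is just a placeholder for the sample data from the question):

```shell
# Write the sample data from the question to a scratch file.
printf '%s\n' 'fdf 284' 'asd 112' 'adf 146' 'csb 513' 'dfg 576' \
              'asd 346' 'adf 263' 'csb 092' 'dfg 547' > data.txt

# Sort numerically on column 2, largest first, then keep only the
# first occurrence of each column-1 key.
sort -rnk2 data.txt | awk '{if (!keys[$1]) print $0; keys[$1] = 1;}'
```

Because the input is sorted by value descending, the first line seen for each key is the one with the maximum value, so the output is the five deduplicated rows in descending value order.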
高洛峰2017-04-17 11:54:44
$ sort -r a.txt | awk '{print $2, $1}' | uniq -f1 | awk '{print $2, $1}'
fdf 284
dfg 576
csb 513
asd 346
adf 263
Sort in reverse order, swap the first and second columns, deduplicate on the second field (the key) with uniq -f1, then swap the columns back.
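The swap works because `uniq -f N` skips the first N fields when comparing adjacent lines; with the value moved into field 1, only the key is compared. A minimal sketch:

```shell
# uniq -f1 skips the first field when comparing adjacent lines,
# so duplicates are detected on the remaining field (the key).
printf '%s\n' '284 fdf' '576 dfg' '547 dfg' | uniq -f1
# '547 dfg' is dropped: ignoring field 1, it duplicates the line above
```

Since the input was already reverse-sorted, the first line kept for each key carries its maximum value.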
高洛峰2017-04-17 11:54:44
awk '{ if ($2 > a[$1]) a[$1] = $2 } END { for (i in a) print i, a[i] }' data.txt
Use the first column as the array key, compare each second-column value against the one already stored for that key, and keep the larger of the two. Note that the output order of awk's for-in loop is unspecified.
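The array approach can be sketched as a self-contained example on a subset of the sample data (awk's `for (k in a)` iterates in unspecified order, so the output is piped through sort here for a stable result):

```shell
# Keep the maximum column-2 value per column-1 key in an awk array.
# Uninitialized a[$1] compares as 0, so the first value always wins.
printf '%s\n' 'fdf 284' 'asd 112' 'asd 346' 'csb 513' 'csb 092' |
  awk '{ if ($2 > a[$1]) a[$1] = $2 } END { for (k in a) print k, a[k] }' |
  sort
```

Unlike the sort-based answers, this makes a single pass over unsorted input, at the cost of holding one entry per distinct key in memory.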
高洛峰2017-04-17 11:54:44
[root@localhost ~]# sort -k2r 1.txt|awk '!a[$1]++'
dfg 576
csb 513
asd 346
fdf 284
adf 263
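The `!a[$1]++` pattern in this one-liner is a common awk idiom: the expression is true only the first time a key is seen (the counter starts at 0, and !0 is true), the post-increment then marks it as seen, and awk's default action prints any line whose pattern is true. A minimal sketch on pre-sorted input:

```shell
# !a[$1]++ is true only on the first occurrence of each $1,
# so awk's default action (print the line) fires once per key.
printf '%s\n' 'dfg 576' 'dfg 547' 'csb 513' 'csb 092' | awk '!a[$1]++'
```

Combined with a sort that puts the largest value first within each key, the surviving line per key is the maximum.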