
linux - Sorting and de-duplicating in shell

I am processing a text file in shell with the following content:

fdf     284 
asd     112
adf     146
csb     513
dfg     576
asd     346
adf     263
csb     092
dfg     547

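De-duplicate on the first column, keeping for each key the row with the largest second-column value. The result should look like this: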

fdf    284
asd    346
adf    263
csb    513
dfg    576

I looked at the uniq command, but it does not seem to support de-duplicating by field. How can I do this?

PHPz  2743 days ago  828

All replies (4)

  • 阿神

    阿神  2017-04-17 11:54:44

    Method 1

    sort -rnk2 data.txt | awk '{ if (!keys[$1]) print $0; keys[$1] = 1 }'

    First sort in reverse numeric order on the second column, so that larger values come first. Then use awk to print a line only the first time its first-column key appears and discard every later line with that key. This solves the problem, but awk has to keep every distinct key in memory, which may be an issue if the file is very large.

    Method 2

    sort -k1,1 data.txt | awk '{
        if (lastKey == $1) {
            # Same key as the previous line: keep the larger value.
            if (lastValue < $2) {
                lastLine = $0;
                lastValue = int($2);
            }
        } else {
            # New key: flush the best line of the previous group.
            if (lastLine) {
                print lastLine;
            }

            lastKey = $1;
            lastLine = $0;
            lastValue = int($2);
        }
    } END {
        # Flush the last group.
        if (lastLine) {
            print lastLine;
        }
    }'

    This approach sorts by the first column and then uses awk to filter the result; the filter works like an enhanced uniq. It uses far less memory, because awk only keeps the current key and its best line, but the code is longer and less concise.
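As a rough sketch of the same low-memory idea in fewer lines, sort can also order the values within each key, so that awk only has to print the first line of every group. This is a variation, not the answer's own code; the temporary file and the names `data`, `prev`, and `result` are assumptions made for illustration.

```shell
# Sample data from the question, written to a temporary file.
data=$(mktemp)
cat > "$data" <<'EOF'
fdf 284
asd 112
adf 146
csb 513
dfg 576
asd 346
adf 263
csb 092
dfg 547
EOF

# Sort by key, then by value in descending numeric order, so the first
# line of each key group is the one to keep. awk remembers only the
# previous key, so memory use does not grow with the number of keys.
result=$(LC_ALL=C sort -k1,1 -k2,2nr "$data" |
    awk '$1 != prev { print; prev = $1 }')

printf '%s\n' "$result"
rm -f "$data"
```

The output is ordered by key (adf, asd, csb, dfg, fdf) rather than by first appearance in the input.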

  • 高洛峰

    高洛峰  2017-04-17 11:54:44

    $ sort -r a.txt | awk '{print $2, $1}' | uniq -f1 | awk '{print $2, $1}'
    fdf 284
    dfg 576
    csb 513
    asd 346
    adf 263
    

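    Sort in reverse order, swap the first and second columns, de-duplicate on what is now the second column (uniq -f1 skips the first field when comparing), then swap the columns back. Note that sort -r compares whole lines as text, so this picks the maximum only because every value in the sample has the same number of digits.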
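To illustrate the caveat about text comparison, here is a small sketch with two made-up rows whose values differ in width. The rows `a 9` and `a 10` and the variable names are hypothetical; the second pipeline swaps in numeric sort keys and is one possible adjustment, not part of the original answer.

```shell
# Two hypothetical rows for the same key with values of different width.
data=$(mktemp)
printf '%s\n' 'a 9' 'a 10' > "$data"

# A plain reverse sort compares text, so "a 9" sorts ahead of "a 10"
# and the smaller value survives the uniq step.
lexical=$(LC_ALL=C sort -r "$data" |
    awk '{print $2, $1}' | uniq -f1 | awk '{print $2, $1}')

# Sorting the value column numerically (descending) puts the real
# maximum first within each key.
numeric=$(LC_ALL=C sort -k1,1r -k2,2nr "$data" |
    awk '{print $2, $1}' | uniq -f1 | awk '{print $2, $1}')

printf 'lexical: %s\nnumeric: %s\n' "$lexical" "$numeric"
rm -f "$data"
```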

  • 高洛峰

    高洛峰  2017-04-17 11:54:44

    awk '{ if (!($1 in a) || $2+0 > a[$1]+0) a[$1] = $2 } END { for (i in a) print i, a[i] }' data.txt
    

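    Use the first column as the array key. For each line, compare the second column with the value already stored for that key and keep the larger one. The order in which for (i in a) visits the keys is unspecified, so the output order may differ from the input.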
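If the output should follow the order in which keys first appear, as in the expected result in the question, the array approach can be extended with a second array that records that order. The following is a minimal sketch under that assumption; the names `max`, `order`, `n`, `data`, and `result` are illustrative and not taken from the answer.

```shell
# Sample data from the question, written to a temporary file.
data=$(mktemp)
cat > "$data" <<'EOF'
fdf 284
asd 112
adf 146
csb 513
dfg 576
asd 346
adf 263
csb 092
dfg 547
EOF

result=$(awk '
    # First time a key is seen: remember its position and its value.
    !($1 in max) { order[++n] = $1; max[$1] = $2; next }
    # Later occurrences: keep the value only if it is numerically larger.
    $2 + 0 > max[$1] + 0 { max[$1] = $2 }
    # Print the keys in first-appearance order.
    END { for (i = 1; i <= n; i++) print order[i], max[order[i]] }
' "$data")

printf '%s\n' "$result"
rm -f "$data"
```

Like Method 1 above, this keeps every distinct key in memory, but it needs no sort and reads the file only once.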

  • 高洛峰

    高洛峰  2017-04-17 11:54:44

    [root@localhost ~]# sort -k2r 1.txt|awk '!a[$1]++'
    dfg     576
    csb     513
    asd     346
    fdf     284 
    adf     263
