
How to remove duplicate statistics in linux

(*-*)浩
2019-05-28 17:00:59

The Linux command line offers powerful text-processing tools, and combining a few commands goes a long way. This article shows how to deduplicate text line by line and sort the result by the number of repetitions, using sort, uniq and cut. sort orders the lines; uniq collapses adjacent duplicate lines, which is why the input is sorted first; and cut extracts selected columns from each line (simply put, it operates on text lines by column).


Remove duplicate text lines and sort them by the number of repetitions

Example:

First, deduplicate the text lines and count how many times each one occurs (the -c option of uniq prefixes each line with its count).

$ sort test.txt | uniq -c
      2 Apple and Nokia.
      4 Hello World.
      1 I wanna buy an Apple device.
      1 My name is Friendfish.
      2 The Iphone of Apple company.
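
Because uniq only compares adjacent lines, the sort in front of it matters. The following is a minimal sketch of that effect. The contents and line order of the sample file are assumptions (the article does not show test.txt itself), chosen so that the counts match the output above, and LC_ALL=C is set only to make the sort order predictable.

```shell
#!/bin/sh
# Hypothetical sample input: the same lines as in the example, in an assumed order.
sample=$(mktemp)
printf '%s\n' \
  'Hello World.' \
  'Apple and Nokia.' \
  'Hello World.' \
  'The Iphone of Apple company.' \
  'Hello World.' \
  'My name is Friendfish.' \
  'Apple and Nokia.' \
  'Hello World.' \
  'I wanna buy an Apple device.' \
  'The Iphone of Apple company.' > "$sample"

# Without sorting, no identical lines are adjacent here, so uniq merges nothing.
unsorted_count=$(uniq "$sample" | wc -l | tr -d ' ')

# After sorting, identical lines sit next to each other and uniq merges them.
sorted_count=$(LC_ALL=C sort "$sample" | uniq | wc -l | tr -d ' ')

echo "distinct lines without sort: $unsorted_count"
echo "distinct lines with sort:    $sorted_count"
rm -f "$sample"
```

With this input, the unsorted pipeline still reports all 10 lines, while the sorted one reports the 5 distinct lines.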

Sort lines of text by the number of repetitions.

sort -n interprets the number at the beginning of each line and sorts the lines by that numeric value. The default order is ascending; to sort in descending order, add the -r option (sort -rn).

$ sort test.txt | uniq -c | sort -rn
      4 Hello World.
      2 The Iphone of Apple company.
      2 Apple and Nokia.
      1 My name is Friendfish.
      1 I wanna buy an Apple device.
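
To compare the two sort directions side by side, here is a small sketch. The sample file is again hypothetical (its line order is an assumption), and the variable names are only for illustration.

```shell
#!/bin/sh
# Hypothetical sample input with the same line counts as the example above.
sample=$(mktemp)
printf '%s\n' \
  'Hello World.' 'Apple and Nokia.' 'Hello World.' \
  'The Iphone of Apple company.' 'Hello World.' \
  'My name is Friendfish.' 'Apple and Nokia.' 'Hello World.' \
  'I wanna buy an Apple device.' 'The Iphone of Apple company.' > "$sample"

# Count each distinct line once.
counts=$(LC_ALL=C sort "$sample" | uniq -c)

# Ascending by count (the default for sort -n).
ascending=$(printf '%s\n' "$counts" | LC_ALL=C sort -n)

# Descending by count (-r reverses the comparison).
descending=$(printf '%s\n' "$counts" | LC_ALL=C sort -rn)

# The most frequent line is the first line of the descending list.
top=$(printf '%s\n' "$descending" | head -n 1)

echo "ascending:";  printf '%s\n' "$ascending"
echo "descending:"; printf '%s\n' "$descending"
echo "most frequent: $top"
rm -f "$sample"
```

Piping the descending list into head -n N is a common way to keep only the N most frequent lines.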

Remove the repetition count in front of each line.

The cut command operates on text lines column by column. uniq -c right-aligns the count and pads it with leading spaces; with GNU uniq, the count plus the space that follows it occupies the first 8 characters of each line. Therefore, cut -c 9- keeps the 9th and subsequent characters of each line, which drops the count.

$ sort test.txt | uniq -c | sort -rn | cut -c 9- 
Hello World. 
The Iphone of Apple company. 
Apple and Nokia. 
My name is Friendfish. 
I wanna buy an Apple device.
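
The fixed offset in cut -c 9- relies on the count column being exactly 8 characters wide, which may not hold for other uniq implementations or for very large counts. As an alternative that is not part of the original recipe, the sketch below strips the count with sed instead. The three-line sample file is hypothetical.

```shell
#!/bin/sh
# Hypothetical three-line sample; "Hello World." appears twice.
sample=$(mktemp)
printf '%s\n' 'Hello World.' 'Apple and Nokia.' 'Hello World.' > "$sample"

# Strip optional leading spaces, the count digits, and the one space after them,
# so the result does not depend on how wide the count column is.
result=$(LC_ALL=C sort "$sample" | uniq -c | LC_ALL=C sort -rn |
  sed 's/^ *[0-9][0-9]* //')

printf '%s\n' "$result"
rm -f "$sample"
```

This prints "Hello World." followed by "Apple and Nokia.", the same lines that cut -c 9- yields for GNU uniq output.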
