search
HomeOperation and MaintenanceLinux Operation and MaintenanceA classic algorithm question caused by a Linux command commonly used in projects

  小时候家里定了《读者》的月刊,里面记录一个故事:说有有个偏僻的乡村一日突然来了一个美女,她携着万贯家财子女在当地安家落户,成了当地的乡绅。她让她的子女世世代代的保守这个秘密,直到这个秘密不会再对家族带来灾难。她就是陈圆圆。当年吴三桂领清兵入关,冲冠一怒为红颜,改写了中国的历史,自己却能全身而退的那个人。

  周五例行公事的查看一下离线数据推送项目的数据和log。将log用awk分段之后,我想知道实时数据前10个被重复发送的数据ID都被重复发送了几次,从而找到进一步优化的入手点,天知道我对这个项目已经进行了多少次优化了。于是linux命令就是

 cat transmission.log |grep 'IncrementAlbumService.java:146'|awk '{print $6}'|awk -F ',' '{print $1}'| sort |uniq -c| sort -nr |head

I feel very guilty for the results I got

(Data security, the ID rule part of our project is not displayed)

Although this is related to their operation, I originally It is time to send the data when it detects the change, but for such a large resend rate. Regardless of the interface of the update service or the offline service, there are still points that can be optimized. Girl, my thinking has always been different from that of those male idols who use hair dryers and artificial watering cans to create a picture when they appear. In addition to this result, I am also thinking about another classic algorithm problem: there is a text file with about 10,000 lines, one word in each line, and it is required to count the top ten most frequently occurring words.

For this algorithm problem, the above Linux command is sort|uniq -c |sort -nr | head. The time complexity is the largest of the following:

 1> First do a sort,

Direct insertion sort: continuously insert elements into the ordered list, the worst time is complex The degree is O(n2)

Shell sorting: insertion sorting with reduced increment, unstable, depends on the selection of the increment factor sequence, the worst time complexity is O(n 2)

Simple selection sorting: Select the smallest or largest number among the numbers to be sorted and exchange it with the first unsorted position. The worst time complexity is O(n2 )

Binary selection sort: Each simple selection sort determines two elements, which can reduce the cycle by half.

Heap sort: tree selection sort, large root heap, small root heap. The worst time complexity is O(N*logN)

Bubble sorting: Each time two adjacent numbers are compared and exchanged, the worst time complexity is O(n2 )

Quick sorting: select the base element and divide the elements to be sorted each time. The worst time complexity is O(n2)

Merge sorting: divide the two elements An ordered list is synthesized into a new ordered list. The worst-case time complexity is O(N*logN)

. Bucket sorting: an algorithm that trades space for time. The complexity is close to O(n)

  Radix sorting: Allocate and collect according to hundreds of thousands of digits, the time complexity is O(dn)

  2>uniq time complexity is O(n)

  3> ;Sort time service degree is the same as 1>

4>The time complexity after sorting is O(1)

The algorithm used is also related to the size of the file. If the file is too large , if there is too much data, the files need to be split, sorted separately and then merged in multiple ways. So the number of words is mentioned here.

Without linux commands, the classic solution is to first use a dictionary tree to count word frequencies, and then use a big root heap. Let’s first introduce the dictionary tree, also called the tire tree. Because search engines often use this to make text word frequency statistics, and word segmentation algorithms also use this as a basic data structure, so I know a little bit about it. Its advantages are: minimizing unnecessary string comparisons, and query efficiency is higher than hash tables. The core idea is to exchange space for time and use public prefixes to reduce query time overhead. So when talking about statistics, the first thing that comes to mind is the dictionary tree. If you maintain an array of the top ten maximum word frequencies when counting word frequencies, the time complexity will be 10 times higher when compared in loop processing. Therefore, it is more appropriate to make statistics first and then select the top 10 in terms of time efficiency.

Actually, I don’t know much about algorithms, I just know how to use them. A former colleague of mine read an article I wrote on WeChat and asked me: "Is feed streaming a very technical job?" His question reminded me of Li Xiaoyao in "Sword of Immortals" who insisted on pretending to be tall, rich and handsome in the restaurant. When he said he wanted to order the most expensive dish: "Fried beef with vegetables", everyone laughed. Ling'er asked Li Xiaoyao: "Brother Xiaoyao, is fried beef with vegetables a very expensive dish?". Although my colleague was sincerely asking my opinion because he was in JD.com and was considering whether to go to Momo, I felt like the Li Xiaoyao who had never seen the world. The business logic of feed flow can be done in any way. Whether it has technical content depends on how it is done. I have written a patent to introduce a method of assembling feed streams. The process has not been completed, so I will not disclose the calculation method until then. But if you think hard, there are still many optimization points. The year before last, when I liked playing Moments, I would often find that the Moments I had deleted reappeared, or all the recent data in my or other people's Moments suddenly disappeared, leaving only very old data, such as two months ago a year ago. Data from a year ago will be automatically restored after one day. It's all a matter of strategy. There are many problems with WeChat Moments. Since one of our products, mm, is a family member of the WeChat architect, I won’t complain too much.

Although today is Sunday, you can use your imagination a little bit, but you also have to have a theme. The previous example has a classic top K problem. Because search engines often need to count the most popular query strings, the top K question is the basis. TopK problems use small root heaps. Maintain a small root heap of K size, traverse the elements to be compared, and compare them with the following elements respectively. If it is smaller than the root element, it means that it will definitely not enter the top K and will be eliminated. If it is greater than the root element, eliminate the root element. Then adjust the tree to the minimum heap and continue the comparison.

The minimum heap is a complete binary tree, and the value of each non-leaf node is not greater than the value of its child node. If this rule is broken, adjustments must be made from the first non-leaf node to the root node in a bottom-up order.

I decided to interview on Hulu next week, but I haven’t done so yet, so I probably won’t. Two years ago, my former colleague recommended Amazon, but I was not asked to go for an interview. To comfort myself, I guess they were not hiring at that time. I have never been to an interview with a foreign company like this before, so I don’t know what the routine is. If we start preparing now, we will probably be able to pass it after the National Day. I think it would be very disadvantageous for me to go to the interview by myself. It’s not going to be bad at all, it’s going to be very unstable. Friends who have read my articles may think that my articles are very messy and complicated. This is indeed the case for me in life. I have a wide range of knowledge, I am very whimsical, and I have no scruples. On the one hand, this lays the foundation for my creativity, but on the other hand, it is not conducive to my ability to express myself on the spot. The brain is like a computer. I have many parallel programs, the memory is not large enough, and there is a lot of data. Memory paging causes constant disk swapping. Time-sensitive actions like interviews can easily lead to timeout returns. I have so many technical invention patents, and now I think, I can’t even remember what I invented. I just took the bus. Because there were very few people, the driver asked me where to get off. He meant that it would stop wherever no one got off. It took me a long time to remember. My brain runs more in asynchronous non-blocking mode. In fact, synchronous blocking would be better for things like interviews. However, there is a solution to everything. If you can't find a solution, you just don't have enough ability. There's nothing wrong with that. However, the interview is to examine comprehensive abilities, such as teamwork, conversational skills, etc. I believe that no one in our department will have any objection to the sentence "Xiaojing is very smart." I also believe that colleagues with whom I collaborate in the department or work will not think that I am a difficult person to communicate with or get along with. But I tend to forget how to speak during interviews. But if I fail an interview because of this problem, I have no complaints. Because the interviewer is your future colleague and leader, if you are not in tune with the interviewer, you may not be able to use your abilities in the future. If you don’t do well in interviews and still feel that your abilities are sufficient, it’s likely that you are not high-level enough and have never seen what truly outstanding people look like. However, I am the kind of person who continues to do things even when I am determined to hit a wall. If I decide to give up on something, it’s because it’s not worth doing.

I like working. My goal is to still have a creative job when I am 60 years old. So I am afraid that domestic Internet companies will let me retire at the age of 40. There is one more important thing: I want to make my own search engine middleware. Domestic Internet companies mainly use it, so I am afraid it will be difficult for me to have the energy to do this. Of course, if you can’t go to Hulu, the search engine still has to do it. It’s just a matter of how to allocate your time.

I actually like to hit the wall, probably because I don’t want to grow up so quickly. If you act mature and elegant every day, you need to hide some things that you are not good at or things that may go wrong. As a result, I will be happy every day, but I may stay like this for the rest of my life. There are many famous figures in history who were originally playboys, but later became great men after their family fell into decline. In the book, there are two types of turning points in life: meeting noble people and encountering setbacks. When you are young and open-minded, you can have an epiphany when you meet a noble person and open your mind. As experience increases, people will be more selective in receiving the information around them. At this time, they may need to encounter great setbacks before they can rethink life. If I can see a better future, I am willing to go it alone and burn the boat. It’s better to have ups and downs than one day at a time. If you want to live, live a wonderful life~~

The above is the detailed content of A classic algorithm question caused by a Linux command commonly used in projects. For more information, please follow other related articles on the PHP Chinese website!

Statement
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
The Building Blocks of Linux: Key Components ExplainedThe Building Blocks of Linux: Key Components ExplainedApr 30, 2025 am 12:26 AM

The core components of the Linux system include the kernel, file system, and user space. 1. The kernel manages hardware resources and provides basic services. 2. The file system is responsible for data storage and organization. 3. Run user programs and services in the user space.

Using Maintenance Mode: Troubleshooting and Repairing LinuxUsing Maintenance Mode: Troubleshooting and Repairing LinuxApr 29, 2025 am 12:28 AM

Maintenance mode is a special operating level entered in Linux systems through single-user mode or rescue mode, and is used for system maintenance and repair. 1. Enter maintenance mode and use the command "sudosystemctlisolaterscue.target". 2. In maintenance mode, you can check and repair the file system and use the command "fsck/dev/sda1". 3. Advanced usage includes resetting the root user password, mounting the file system in read and write mode and editing the password file.

Linux Maintenance Mode: Understanding the PurposeLinux Maintenance Mode: Understanding the PurposeApr 28, 2025 am 12:01 AM

Maintenance mode is used for system maintenance and repair, allowing administrators to work in a simplified environment. 1. System Repair: Repair corrupt file system and boot loader. 2. Password reset: reset the root user password. 3. Package management: Install, update or delete software packages. By modifying the GRUB configuration or entering maintenance mode with specific keys, you can safely exit after performing maintenance tasks.

Linux Operations: Networking and Network ConfigurationLinux Operations: Networking and Network ConfigurationApr 27, 2025 am 12:09 AM

Linux network configuration can be completed through the following steps: 1. Configure the network interface, use the ip command to temporarily set or edit the configuration file persistence settings. 2. Set up a static IP, suitable for devices that require a fixed IP. 3. Manage the firewall and use the iptables or firewalld tools to control network traffic.

Maintenance Mode in Linux: A System Administrator's GuideMaintenance Mode in Linux: A System Administrator's GuideApr 26, 2025 am 12:20 AM

Maintenance mode plays a key role in Linux system management, helping to repair, upgrade and configuration changes. 1. Enter maintenance mode. You can select it through the GRUB menu or use the command "sudosystemctlisolaterscue.target". 2. In maintenance mode, you can perform file system repair and system update operations. 3. Advanced usage includes tasks such as resetting the root password. 4. Common errors such as not being able to enter maintenance mode or mount the file system, can be fixed by checking the GRUB configuration and using the fsck command.

Maintenance Mode in Linux: When and Why to Use ItMaintenance Mode in Linux: When and Why to Use ItApr 25, 2025 am 12:15 AM

The timing and reasons for using Linux maintenance mode: 1) When the system starts up, 2) When performing major system updates or upgrades, 3) When performing file system maintenance. Maintenance mode provides a safe and controlled environment, ensuring operational safety and efficiency, reducing impact on users, and enhancing system security.

Linux: Essential Commands and OperationsLinux: Essential Commands and OperationsApr 24, 2025 am 12:20 AM

Indispensable commands in Linux include: 1.ls: list directory contents; 2.cd: change working directory; 3.mkdir: create a new directory; 4.rm: delete file or directory; 5.cp: copy file or directory; 6.mv: move or rename file or directory. These commands help users manage files and systems efficiently by interacting with the kernel.

Linux Operations: Managing Files, Directories, and PermissionsLinux Operations: Managing Files, Directories, and PermissionsApr 23, 2025 am 12:19 AM

In Linux, file and directory management uses ls, cd, mkdir, rm, cp, mv commands, and permission management uses chmod, chown, and chgrp commands. 1. File and directory management commands such as ls-l list detailed information, mkdir-p recursively create directories. 2. Permission management commands such as chmod755file set file permissions, chownuserfile changes file owner, and chgrpgroupfile changes file group. These commands are based on file system structure and user and group systems, and operate and control through system calls and metadata.

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SAP NetWeaver Server Adapter for Eclipse

SAP NetWeaver Server Adapter for Eclipse

Integrate Eclipse with SAP NetWeaver application server.

PhpStorm Mac version

PhpStorm Mac version

The latest (2018.2.1) professional PHP integrated development tool

Atom editor mac version download

Atom editor mac version download

The most popular open source editor

Safe Exam Browser

Safe Exam Browser

Safe Exam Browser is a secure browser environment for taking online exams securely. This software turns any computer into a secure workstation. It controls access to any utility and prevents students from using unauthorized resources.