From beginner to proficient, learn Linux redirection and pipeline tools to speed up your workflow!

PHPz

Feb 09, 2024, 11:36 PM

Improving efficiency, tuning the operating system, and automating routine work are goals for every IT practitioner. On Linux, fluent use of redirection and pipes on the command line is an essential skill. This article explains how redirection and pipes are used, and how they work, through examples.

I like Linux very much, and some of its designs are especially elegant. For example, a complex problem can be broken into several small ones, each solved with a ready-made tool, and the tools can be combined flexibly through pipes and redirection. Written as a shell script, this is very efficient.

This article shares some pitfalls I have run into when using redirection and pipes in practice. Understanding a few of the underlying principles makes writing scripts much more efficient.

Pitfalls of the > and >> redirection operators

Let's start with the first question: what happens if we run the following command?

$ cat file.txt > file.txt

Reading from and writing to the same file looks like it should change nothing, right?

In fact, running the command above empties file.txt.

PS: Some Linux distributions may report an error directly. You can run cat < file.txt > file.txt to bypass this check.

As explained in the earlier article on Linux processes and file descriptors, a program does not need to care where its standard input and output point. It is the shell that changes where they point, by means of pipes and redirection operators.

So when cat file.txt > file.txt is executed, the shell first opens file.txt. Because the redirection operator is >, the file's contents are cleared at that moment. The shell then sets the standard output of cat to file.txt, and only then does cat start running.

In other words, the process is:

1. The shell opens file.txt and clears its contents.
2. The shell points the standard output of cat at file.txt.
3. The shell runs cat, which reads an empty file.
4. cat writes nothing to its standard output (file.txt).

So, the final result is that file.txt becomes an empty file.
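
The steps above can be checked with a minimal sketch. It assumes a POSIX shell with mktemp available; the scratch file and the variable names tmp and size are illustrative and not part of the original example. Because the shell performs the truncation itself, even a command that writes nothing, such as true, leaves the file empty:

```shell
# Scratch file standing in for file.txt (illustrative name).
tmp=$(mktemp)
printf 'hello world\n' > "$tmp"

# The shell opens and truncates the target before the command runs,
# so the file is emptied even though `true` writes nothing at all.
true > "$tmp"

# $(( ... )) strips any padding that wc may print on some systems.
size=$(( $(wc -c < "$tmp") ))
echo "size after redirection: $size bytes"
rm -f "$tmp"
```

The same thing happens with cat file.txt > file.txt: by the time cat opens its input, there is nothing left to read.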

We know that > clears the target file and >> appends to the end of it. So what happens if we change > to >>?

$ echo hello world > file.txt # the file contains a single line
$ cat file.txt >> file.txt # this command loops forever

First a single line is written to file.txt. After running cat file.txt >> file.txt, we would expect the file to contain two lines.

Unfortunately, that is not what happens. Instead, the command keeps appending hello world to file.txt in an infinite loop. The file quickly grows very large, and the command can only be stopped with Ctrl+C.

This is interesting: why is there an infinite loop? A little analysis reveals the reason.

First, recall how cat behaves. If you run cat with no arguments, it reads keyboard input from the terminal, and every time you press Enter it echoes the line back. In other words, cat reads data piece by piece as it becomes available (line by line from a terminal) and writes each piece out immediately.

The command cat file.txt >> file.txt therefore runs as follows:

1. The shell opens file.txt in append mode, ready to add content to the end of the file.
2. The shell points the standard output of cat at file.txt.
3. cat reads a line from file.txt and writes it to standard output (appending it to file.txt).
4. Because a line has just been written, cat finds that file.txt still has unread content, and repeats step 3.

This is like iterating over a list while appending elements to it: the iteration never finishes, so the command loops forever.
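
The list analogy can be made concrete with a bounded sketch. Instead of cat, it uses a while read loop that appends every line it reads back to the file it is reading. This substitution is an assumption made so that the demonstration can be capped at five iterations; max, count, and lines are illustrative names:

```shell
# Scratch file standing in for file.txt (illustrative name).
tmp=$(mktemp)
echo "hello world" > "$tmp"

max=5      # safety cap; without it the loop would never end
count=0
while IFS= read -r line; do
  # Each appended line becomes new unread input for the next iteration.
  echo "$line" >> "$tmp"
  count=$((count + 1))
  if [ "$count" -ge "$max" ]; then
    break
  fi
done < "$tmp"

lines=$(( $(wc -l < "$tmp") ))
echo "lines in file: $lines"
rm -f "$tmp"
```

The reader never reaches the end of the file on its own, because every read is followed by a write that moves the end further away. With the cap in place the file ends up with six lines: the original one plus five appended copies.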

Using the > redirection operator together with the | pipe

A common requirement is to keep the first N lines of a file and delete the rest.

On Linux, the head command extracts the first few lines of a file:

$ cat file.txt # file.txt contains five lines
1
2
3
4
5
$ head -n 2 file.txt # head reads the first two lines
1
2
$ cat file.txt | head -n 2 # head can also read from standard input
1
2

If we want to keep only the first 2 lines of the file and delete the rest, we might try the following command:

$ head -n 2 file.txt > file.txt

But this repeats the mistake described above: file.txt ends up empty, which is not what we want.

Can we avoid the pitfall by writing the command like this?

$ cat file.txt | head -n 2 > file.txt

The answer is no: the file's contents are still cleared.

What? Did the pipe spring a leak and lose all the data?

In the earlier article on Linux processes and file descriptors, I explained that a pipe essentially connects the standard output of one command to the standard input of the next, so that the output of the first command becomes the input of the second.

However, if you expect this command to produce the desired result, it is probably because you assume that commands connected by a pipe run one after another. This is a common mistake. In fact, commands connected by a pipe run in parallel.

You may imagine that the shell first runs cat file.txt, which reads all of file.txt normally, and then passes that content through the pipe to head -n 2 > file.txt.

By then file.txt has been cleared, but head reads from the pipe rather than from the file, so it should still be able to write two lines to file.txt correctly.

But this understanding is wrong. The shell runs the commands connected by a pipe in parallel. Take the following command, for example:

$ sleep 5 | sleep 5

The shell starts both sleep processes at the same time, so the command sleeps for 5 seconds in total, not 10.
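
This is easy to measure. The sketch below uses 2-second sleeps instead of 5 to keep the run short, and assumes date +%s (seconds since the epoch) is available, as it is on common Linux and macOS systems. The names start, end, and elapsed are illustrative:

```shell
start=$(date +%s)

# Both sleep processes are started together, so the pipeline
# finishes after about 2 seconds rather than 4.
sleep 2 | sleep 2

end=$(date +%s)
elapsed=$((end - start))
echo "elapsed: ${elapsed}s"
```

If the two commands ran one after the other, elapsed would be at least 4.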

This is somewhat counterintuitive. Consider this common command:

$ cat filename | grep 'pattern'

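Intuition suggests that cat runs first and reads all of filename in one go, then hands the content to grep to search.

In reality, cat and grep run at the same time. We get the expected result because grep 'pattern' blocks waiting on its standard input, while cat writes data to grep's standard input through a Linux pipe.

Running the following command makes it easy to see that cat and grep run simultaneously, with grep processing our keyboard input in real time: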

$ cat | grep 'pattern'
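
The same effect can be observed without typing anything. In this sketch, a producer writes one line, waits 2 seconds, and writes another, while the consumer stamps each line with the time it arrived. It assumes date +%s is available; out, t1, and t2 are illustrative names:

```shell
out=$(
  { echo first; sleep 2; echo second; } |
  while IFS= read -r line; do
    # Record when each line reaches the consumer.
    echo "$(date +%s) $line"
  done
)
echo "$out"

# Extract the arrival time of each line.
t1=$(echo "$out" | sed -n '1p' | cut -d' ' -f1)
t2=$(echo "$out" | sed -n '2p' | cut -d' ' -f1)
echo "gap between lines: $((t2 - t1))s"
```

The consumer receives the first line about 2 seconds before the second, which would be impossible if it only started after the producer had finished.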

With all that said, let's return to the original question:

$ cat file.txt | head -n 2 > file.txt

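cat and head run in parallel, and which one goes first is not determined, so the result is not determined either.

If head's redirection takes effect before cat reads the file, file.txt is emptied first and cat has nothing to read. Conversely, if cat reads the file's contents first, we get the expected result.

However, in my experiments (repeating this race 10,000 times), the erroneous outcome where file.txt is emptied occurred far more often than the expected outcome. I am not yet sure why; it is probably related to how the Linux kernel implements processes and pipes.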
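
A rough sketch of such an experiment is shown below. It is not the author's original script: the run count of 200 and the names runs, empty, and kept are assumptions chosen for illustration, and the counts will vary between machines and runs. One plausible explanation for the skew, offered as an assumption rather than a verified fact, is that the shell applies the > truncation while setting up head, before cat has been loaded and has opened the file.

```shell
runs=200   # illustrative; the experiment in the text used 10,000
empty=0
kept=0
tmp=$(mktemp)

i=0
while [ "$i" -lt "$runs" ]; do
  # Restore the five-line file before every attempt.
  printf '1\n2\n3\n4\n5\n' > "$tmp"

  # The race: cat reading the file vs. the redirection truncating it.
  cat "$tmp" | head -n 2 > "$tmp"

  if [ -s "$tmp" ]; then
    kept=$((kept + 1))     # cat got the data before the truncation
  else
    empty=$((empty + 1))   # the file was emptied first
  fi
  i=$((i + 1))
done

rm -f "$tmp"
echo "emptied: $empty, kept: $kept (out of $runs)"
```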

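Solution

After all this discussion of pipes and redirection, how do we avoid emptying the file?

The most reliable approach is to never read and write the same file at the same time, and to go through a temporary file instead.

For example, to keep only the first two lines of file.txt, you can write:

# write the data to a temporary file first, then overwrite the original file
$ cat file.txt | head -n 2 > temp.txt && mv temp.txt file.txt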

This is the simplest, most reliable, foolproof method.
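
In a script, a fixed name such as temp.txt could collide with an existing file. A slightly more defensive sketch of the same idea is shown below. It assumes mktemp is available and accepts a template ending in XXXXXX; the names file, tmp, and result are illustrative:

```shell
# Sample file standing in for file.txt (illustrative).
file=$(mktemp)
printf '1\n2\n3\n4\n5\n' > "$file"

# Create the temporary file next to the target so that mv is a
# rename within one filesystem.
tmp=$(mktemp "$file.XXXXXX")

# Replace the original only if head succeeded; otherwise clean up.
if head -n 2 "$file" > "$tmp"; then
  mv "$tmp" "$file"
else
  rm -f "$tmp"
fi

result=$(cat "$file")
echo "$result"
rm -f "$file"
```

One thing to keep in mind with this approach is that the replaced file takes on the permissions of the temporary file, which may differ from those of the original.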

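If you find this command too long, you can install the moreutils package with a package manager such as apt, brew, or yum. It provides a sponge command, used like this:

# pass the data to sponge first, then let sponge write it to the original file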
$ cat file.txt | head -n 2 | sponge file.txt

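The name sponge is quite apt: it first "soaks up" all of the input and only writes it to file.txt at the end. The core idea is similar to using a temporary file. The sponge acts like a temporary file, which avoids opening the same file for reading and writing at the same time.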
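
To illustrate the idea, here is a minimal sketch of a sponge-like helper written as a shell function. The function name soak is hypothetical, and this is not how moreutils implements sponge; it only shows the "read everything first, write afterwards" principle:

```shell
# Hypothetical helper that mimics the idea behind sponge:
# read all of standard input first, then write it to the file named by $1.
soak() {
  soak_tmp=$(mktemp) || return 1
  # The first cat only returns once the upstream command has closed the
  # pipe, so the target file is not touched until all input has been read.
  cat > "$soak_tmp" && cat "$soak_tmp" > "$1"
  soak_status=$?
  rm -f "$soak_tmp"
  return "$soak_status"
}

# Sample file standing in for file.txt (illustrative).
file=$(mktemp)
printf '1\n2\n3\n4\n5\n' > "$file"

head -n 2 "$file" | soak "$file"

result=$(cat "$file")
echo "$result"
rm -f "$file"
```

Because the target is opened for writing only after the input has been fully consumed, the race between reader and writer disappears.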

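On Linux, redirection and pipes are very useful command-line tools that give us better insight into the state of the system. Mastering them helps with system tuning and automation, and so improves efficiency. I hope this article has given readers a deeper understanding of how redirection and pipes work and how to use them.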

Statement

This article is reproduced from 良许Linux教程网. If there is any infringement, please contact admin@php.cn to have it removed.
