
Learn about zero-copy in Linux and Java

coldplay.xixi
2020-07-01 17:41:12


Linux Traditional IO

Hello everyone, I am a piece of data lying on a Linux disk. To get from the disk to the network card, I need to go through the following steps:

Read operation


The operating system's memory is divided into kernel space and user space. First, an application in user space initiates a read, for example the JVM issuing the read() system call. The operating system then performs a context switch from user space to kernel space.

The kernel then instructs the disk, and I am copied from the disk into the kernel buffer. This copy is performed by hardware called DMA (Direct Memory Access), so it does not require the CPU.

Then the kernel copies me from the kernel buffer to the application buffer, which does require the CPU.

Finally, the operating system performs another context switch, back to user space.

The entire read operation requires two context switches and two copies (one DMA copy and one CPU copy).


Write operation

The write operation is similar to the read operation, but in the opposite direction. It also requires two context switches and two data copies. I may be written to disk, or I may be written to the network card.
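The read-then-write path described above can be sketched in Java as a plain stream copy. This is a minimal illustration, not code from the original article: the class name `TraditionalCopy`, the method `copy`, and the 8192-byte buffer size are all assumptions chosen for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper illustrating traditional IO through a user-space buffer.
class TraditionalCopy {

    // Copies the whole file to the output stream and returns the byte count.
    static long copy(Path source, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192]; // user-space buffer on the Java heap
        long total = 0;
        try (InputStream in = Files.newInputStream(source)) {
            int n;
            // Each read() brings data from the kernel into the user buffer;
            // each write() hands it back to the kernel for the target.
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        }
        return total;
    }
}
```

Every chunk passes through `buffer`, which is the user-space round trip that the zero-copy techniques below try to avoid.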


Memory mapping

As the process above shows, sending me from the disk to the network card takes a total of 4 context switches and 4 copy operations. The operating system copies me back and forth between kernel space and user space, yet nothing about me changes along the way; it is pure copying. This IO model therefore wastes operating system resources, and being copied so many times leaves me physically and mentally exhausted. Operating system resources are precious, after all.

Mainstream operating systems now all use virtual memory. Put simply, virtual addresses are used in place of physical addresses. This allows multiple virtual addresses to point to the same physical address, and the virtual memory space can be much larger than the physical memory space.

If the operating system maps the application buffer in user space and the kernel buffer in kernel space to the same physical address, much of the copying can be eliminated.


Linux Zero Copy

To solve this problem, Linux developers added some new system calls. There are two main approaches:

  • mmap + write
  • sendfile

mmap + write

The mmap() system call first uses a DMA copy to read me from the disk into the kernel buffer. It then uses memory mapping so that the user buffer and the kernel read buffer share the same memory address, which means the CPU no longer needs to copy me from the kernel read buffer to the user buffer.

When the write() system call is made, the CPU copies me directly from the kernel read buffer (which is also the user buffer) to the destination kernel buffer, for example the network send buffer (socket buffer). From there, DMA passes me to the network card (or disk) for sending.

In total, the mmap + write approach takes two system calls, 4 context switches, 2 DMA copies, and 1 CPU copy to read and write the data.

sendfile

sendfile is also a system call. It essentially combines the two system calls above into one. The advantage is that the operating system needs only two context switches, saving the overhead of two.


The Linux 2.4 kernel optimized sendfile by adding support for a gather operation. This removes the last remaining CPU copy. Instead of copying the data, the kernel appends a record of the data's memory address and offset in the source kernel buffer (the read buffer) to the target kernel buffer (the socket buffer). In the final DMA copy stage, that pointer is used to copy the data directly from the read buffer.


Java NIO uses zero copy

Linux’s zero copy can indeed save operating system resources, so Java NIO provides classes that support it:

  • DirectByteBuffer
  • FileChannel

The earlier article "Java NIO - Buffer" briefly introduced DirectByteBuffer. ByteBuffer has two main implementations: DirectByteBuffer and HeapByteBuffer.

DirectByteBuffer allocates its memory outside the Java heap, where native I/O code can access it directly, so performance is relatively high. HeapByteBuffer lives in heap memory, and its data needs to be copied one more time before I/O, so performance is relatively low.
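As a small sketch of how the two kinds of buffer are obtained, the standard `ByteBuffer.allocate` and `ByteBuffer.allocateDirect` factories can be compared with `isDirect()`. The class name `BufferKinds` and its helper methods are hypothetical names used only for this illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class; only the ByteBuffer calls are standard API.
class BufferKinds {

    // Heap buffer: backed by a byte[] inside the Java heap.
    static ByteBuffer heap(int capacity) {
        return ByteBuffer.allocate(capacity);
    }

    // Direct buffer: memory allocated outside the Java heap.
    static ByteBuffer direct(int capacity) {
        return ByteBuffer.allocateDirect(capacity);
    }

    public static void main(String[] args) {
        ByteBuffer h = heap(1024);
        ByteBuffer d = direct(1024);
        System.out.println("heap buffer isDirect:   " + h.isDirect());
        System.out.println("direct buffer isDirect: " + d.isDirect());
    }
}
```

Direct buffers are typically more expensive to allocate than heap buffers, so they tend to pay off mainly when a buffer is long-lived and reused for I/O.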

FileChannel is a class provided by Java NIO for reading, writing, mapping, and transferring files. It can copy file data to disk, to the network, and so on.

The map method uses the operating system's memory mapping to map the kernel buffer and the user buffer to the same address.
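A minimal sketch of the mmap + write pattern in Java, assuming the file fits in a single mapping (`FileChannel.map` rejects sizes above `Integer.MAX_VALUE`). The class name `MappedSend` and the method `send` are hypothetical; the target can be any `WritableByteChannel`, such as a `SocketChannel`.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: maps a file into memory and writes it to a channel.
class MappedSend {

    // Returns the number of bytes written to the target channel.
    static long send(Path source, WritableByteChannel target) throws IOException {
        try (FileChannel file = FileChannel.open(source, StandardOpenOption.READ)) {
            // Map the whole file read-only; no explicit read() into a byte[] is needed.
            MappedByteBuffer mapped =
                    file.map(FileChannel.MapMode.READ_ONLY, 0, file.size());
            long written = 0;
            // write() may consume only part of the buffer, so loop until it is drained.
            while (mapped.hasRemaining()) {
                written += target.write(mapped);
            }
            return written;
        }
    }
}
```

The mapping stays valid after the channel is closed and is only released when the buffer is garbage collected, which is one reason mapped buffers are usually kept for files that are accessed repeatedly.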

The transferTo method transfers the content of the current channel directly to another channel, so the data is never read from the kernel buffer into a user buffer and written back. On Linux, it is backed by the sendfile system call where the channel types allow it. The transferFrom method is analogous, in the opposite direction.

Sample code:

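import java.io.File;
import java.io.RandomAccessFile;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;

File file = new File("test.txt");
RandomAccessFile raf = new RandomAccessFile(file, "rw");
FileChannel fileChannel = raf.getChannel();
SocketChannel socketChannel = SocketChannel.open(new InetSocketAddress("", 8080));
// Use transferTo() directly to transfer data between the two channels
fileChannel.transferTo(0, fileChannel.size(), socketChannel);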
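One detail worth noting is that `transferTo` returns the number of bytes actually transferred, which may be fewer than requested. A hedged sketch of a loop that keeps calling it until the whole file has been sent follows; the class name `TransferAll` and the method `transferAll` are hypothetical, and a blocking target channel is assumed.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

// Hypothetical helper: repeats transferTo() until the file is fully sent.
class TransferAll {

    // Returns the number of bytes transferred from the start of the file.
    static long transferAll(FileChannel source, WritableByteChannel target)
            throws IOException {
        long position = 0;
        long size = source.size();
        while (position < size) {
            // transferTo() may move fewer bytes than requested in one call.
            long sent = source.transferTo(position, size - position, target);
            if (sent <= 0) {
                // No progress (for example, the file shrank); stop and let the
                // caller compare the result with the expected size.
                break;
            }
            position += sent;
        }
        return position;
    }
}
```

With the channels from the sample above, the call would be `TransferAll.transferAll(fileChannel, socketChannel)`.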

Author: Public account _xy’s technical circle

Link: www.imooc.com/article/289550

Source: MOOC.com


Re-understanding of zero copy

  1. Zero copy is defined from the operating system's perspective: no data is duplicated between kernel buffers (the kernel buffer holds the only copy of the data).

  2. Zero copy brings not only less data copying but also other performance advantages, such as fewer context switches, less CPU cache false sharing, and no CPU checksum calculation.

The difference between mmap and sendFile

  1. mmap is suitable for reading and writing small amounts of data; sendFile is suitable for transferring large files.

  2. mmap requires 4 context switches and 3 data copies; sendFile requires 2 context switches and at least 2 data copies.

  3. sendFile can use DMA to avoid CPU copying, but mmap cannot (the data must be copied from the kernel buffer to the socket buffer).
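To make the comparison concrete, the counts stated in the earlier sections (traditional read + write, mmap + write, sendfile, and sendfile after the Linux 2.4 optimization) can be tallied in a few lines of Java. The class `CopyCost` and its nested `IoPath` type are hypothetical names for this illustration; the numbers are taken from the text above, not measured.

```java
import java.util.List;

// Hypothetical tally of the per-path costs described in this article.
class CopyCost {

    static final class IoPath {
        final String name;
        final int contextSwitches;
        final int dmaCopies;
        final int cpuCopies;

        IoPath(String name, int contextSwitches, int dmaCopies, int cpuCopies) {
            this.name = name;
            this.contextSwitches = contextSwitches;
            this.dmaCopies = dmaCopies;
            this.cpuCopies = cpuCopies;
        }

        // Total copies are the DMA copies plus the CPU copies.
        int totalCopies() {
            return dmaCopies + cpuCopies;
        }
    }

    // Counts as stated in the sections above.
    static List<IoPath> paths() {
        return List.of(
                new IoPath("read + write", 4, 2, 2),
                new IoPath("mmap + write", 4, 2, 1),
                new IoPath("sendfile", 2, 2, 1),
                new IoPath("sendfile (Linux 2.4+, no CPU copy)", 2, 2, 0));
    }

    public static void main(String[] args) {
        for (IoPath p : paths()) {
            System.out.printf("%-36s switches=%d copies=%d (DMA %d, CPU %d)%n",
                    p.name, p.contextSwitches, p.totalCopies(), p.dmaCopies, p.cpuCopies);
        }
    }
}
```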


Statement: This article is reproduced from learnku.com.