
This blog post discusses how I/O works at the lowest level. It is written for readers who want to understand how Java I/O operations map to the machine level and what the hardware does while an application runs. It assumes you are already familiar with basic I/O operations, such as reading and writing files through the Java I/O API; that material is beyond the scope of this article.

How does Java I/O work under the hood?

Buffer handling and kernel vs. user space

Buffers, and how buffers are handled, are the basis of all I/O. The very terms "input" and "output" make sense only in relation to moving data into and out of buffers; keep that in mind at all times. Typically, a process performs an operating-system I/O request by asking that a buffer be drained (a write operation) or filled (a read operation). That is the whole concept of I/O. The mechanisms that perform these transfers inside the operating system can be very complex, but they are conceptually simple, and we will discuss them in a small part of this article.
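In the Java API this fill-and-drain model shows up directly in NIO's channels and buffers. Below is a minimal sketch (the class and temp-file names are illustrative, not from the original article) that drains a `ByteBuffer` to a file with a write and then fills a fresh buffer back from it with a read:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class BufferDemo {
    // Drain a buffer to disk (write), then fill a fresh buffer from disk (read).
    static byte[] roundTrip(byte[] data) throws IOException {
        Path tmp = Files.createTempFile("io-demo", ".bin");
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap(data));      // drain: user buffer -> kernel -> disk
            ByteBuffer in = ByteBuffer.allocate(data.length);
            ch.read(in, 0);                       // fill: disk -> kernel -> user buffer
            return in.array();
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(new String(roundTrip("hello".getBytes()))); // prints "hello"
    }
}
```

Each `write()` drains a user-space buffer toward the kernel, and each `read()` fills one from it; the kernel-level buffering described above happens underneath both calls.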

[Figure: block data moving from disk, via DMA into a kernel buffer, then copied into the process's user-space buffer]

The image above is a simplified "logical" diagram of how block data moves from an external source, such as a disk, into a process's memory area (such as RAM). First, the process asks for its buffer to be filled by making the read() system call. The kernel responds by issuing a command to the disk controller hardware to fetch the data from disk. The disk controller writes the data directly into a kernel memory buffer via DMA, without further help from the main CPU. Once the disk controller finishes filling that buffer, the kernel copies the data from the temporary buffer in kernel space into the buffer the process specified when it requested the read() operation.

One thing to note is that the kernel tries to cache and prefetch data, so the data a process requests may already be available in kernel space. If so, it is simply copied into the process's buffer. If not, the process is suspended while the kernel reads the data into memory.

Virtual Memory

You have probably heard of virtual memory many times; here is a brief recap.

All modern operating systems use virtual memory. Virtual memory means that artificial, or virtual, addresses are used in place of physical (hardware RAM) memory addresses. Virtual addresses have two important advantages:

Multiple virtual addresses can be mapped to the same physical address.

A virtual address space can be larger than the actual available hardware memory.

In the description above, copying from kernel space to the final user buffer seems like extra work. Why not tell the disk controller to send the data directly to the buffer in user space? This is where virtual memory comes in, by exploiting advantage 1 above.

By mapping a kernel-space address to the same physical address as a virtual address in user space, the DMA hardware (which can access only physical memory addresses) can fill a buffer that is simultaneously visible to both the kernel and a user-space process.

[Figure: kernel-space and user-space virtual addresses mapped to the same physical memory, shared with the DMA hardware]

This eliminates the copy between kernel and user space, but it requires the kernel and user buffers to share the same page alignment, and the buffer size must be a multiple of the block size of the disk controller (usually 512-byte disk sectors). The operating system divides its memory address space into pages, which are fixed-size groups of bytes. These memory pages are always a multiple of the disk block size and are usually powers of 2 (which simplifies addressing). Typical memory page sizes are 1024, 2048, and 4096 bytes. Virtual and physical memory page sizes are always the same.
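Java exposes this shared-mapping idea through memory-mapped files: `FileChannel.map()` returns a `MappedByteBuffer` backed by pages of the operating system's page cache, so reads avoid the extra kernel-to-user copy. A minimal sketch (class and file names are illustrative):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapDemo {
    // Map a file region directly into the process's address space; accesses
    // go through the shared page-cache pages with no extra user-space copy.
    static byte readFirstByte(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return map.get(0);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mmap-demo", ".bin");
        Files.write(tmp, new byte[] {42, 7});
        System.out.println(readFirstByte(tmp)); // prints 42
        Files.deleteIfExists(tmp);
    }
}
```

Touching a mapped byte may trigger a page fault that the kernel services from disk, which is exactly the paging mechanism described in the next section.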

Memory Paging

In order to support the second advantage of virtual memory (having an addressable space larger than physical memory), virtual memory paging (often called page swapping) is required. This mechanism relies on the fact that pages in the virtual memory space can be persisted in external disk storage, thereby providing space for other virtual pages to be placed in physical memory. Essentially, physical memory acts as a cache for paging areas. The paging area is the space on the disk where the contents of memory pages are saved when they are forcibly swapped out of physical memory.

Sizing memory pages as multiples of the disk block size lets the kernel issue commands directly to the disk controller hardware to write memory pages out to disk or reload them when needed. It turns out that all disk I/O is done at the page level. This is the only way data moves between disk and physical memory on modern paged operating systems.

Modern CPUs contain a subsystem called the Memory Management Unit (MMU), logically located between the CPU and physical memory. It holds the mapping from virtual addresses to physical memory addresses. When the CPU references a memory location, the MMU determines which page the address resides in (usually by shifting or masking some bits of the address) and translates the virtual page number to a physical page number (this is done in hardware and is extremely fast).

 File-oriented, block I/O

File I/O always occurs within the context of a file system. A file system is a very different thing from a disk. A disk stores data in sectors, usually 512 bytes each. It is a hardware device that knows nothing about the semantics of files; it simply provides a number of slots where data can be stored. In this respect, disk sectors resemble memory pages: they are uniform in size and form one large addressable array.

On the other hand, the file system is a higher level abstraction. A file system is a special method of arranging and translating data stored on a disk (or other random-access, block-oriented device). The code you write will almost always interact with the file system, not the disk directly. The file system defines abstractions such as file names, paths, files, and file attributes.

A file system organizes (on a hard disk) a sequence of uniformly sized data blocks. Some blocks store meta-information, such as maps of free blocks, directories, indexes, and so on. Other blocks contain actual file data. The meta-information for an individual file describes which blocks contain its data, where the data ends, when it was last updated, and so on. When a user process requests file data, the file system identifies exactly where on disk that data lives, then takes action to bring those disk sectors into memory.
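From Java you only ever see the file system's abstractions, never the raw sectors. As a small illustration (the class name is hypothetical), the per-file meta-information described above is what `Files.readAttributes` hands back:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

public class AttrDemo {
    // The file system, not the disk, answers questions about file metadata
    // such as size and last-modified time.
    static long sizeOf(Path file) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
        return attrs.size();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("attr-demo", ".txt");
        Files.write(tmp, new byte[123]);
        System.out.println(sizeOf(tmp)); // prints 123
        Files.deleteIfExists(tmp);
    }
}
```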

  File systems also have the concept of pages, whose size may be the same as a basic memory page size or a multiple of it. Typical file system page sizes range from 2048 to 8192 bytes and are always a multiple of the base memory page size.

Performing I/O on a paged file system can be boiled down to the following logical steps:

Determine which file system pages (collections of disk sectors) the request spans. The file content and metadata on disk may be spread across multiple file system pages, and those pages may not be contiguous.

Allocate enough kernel space memory pages to hold the same file system pages.

Establish the mapping of these memory pages to the file system pages on disk.

Generate page faults for those memory pages.

The virtual memory system fields the page faults and schedules pageins to validate those pages by reading their contents from disk.

Once pageins are completed, the file system breaks down the raw data to extract the requested file content or attribute information.

It should be noted that this file system data will be cached like other memory pages. In subsequent I/O requests, some or all of the file data remains in physical memory and can be directly reused without rereading from disk.

 File Locking

File locking is a mechanism by which one process can prevent, or restrict, other processes' access to a file. Despite the name, file locking does not have to mean locking an entire file (although that is often done). Locking is usually available at a finer granularity: down at the byte level, regions of a file may be locked. A lock is associated with a particular file, begins at a specified byte position in that file, and runs for a specified byte range. This is important because it allows many processes to coordinate access to specific regions of a file without impeding other processes working elsewhere in the file.

There are two kinds of file locks: shared and exclusive. Multiple shared locks may be in effect on the same file region at the same time. An exclusive lock, on the other hand, demands that no other locks be in effect for the requested region.
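Java's `FileChannel` exposes exactly this region-based model: `lock(position, size, shared)` takes a starting byte position, a byte range, and a flag choosing a shared or exclusive lock. A minimal sketch (class and file names are illustrative):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class LockDemo {
    // Acquire an exclusive lock on bytes 0..9 of a file; the rest of the
    // file stays available to other cooperating processes.
    static boolean lockFirstTenBytes(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
             FileLock lock = ch.lock(0, 10, false)) { // false = exclusive
            return lock.isValid() && !lock.isShared();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("lock-demo", ".bin");
        System.out.println(lockFirstTenBytes(tmp)); // prints true
        Files.deleteIfExists(tmp);
    }
}
```

Passing `true` as the last argument requests a shared lock instead; `tryLock` is the non-blocking variant that returns `null` rather than waiting.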

Streaming I/O

Not all I/O is block-oriented. There is also stream I/O, modeled after a pipeline, in which the bytes of an I/O data stream must be accessed sequentially. Common examples of data streams are TTY (console) devices, printer ports, and network connections.

Data streams are generally, but not necessarily, slower than block devices and often provide input intermittently. Most operating systems allow streams to be placed in non-blocking mode, which permits a process to check whether input is available on the stream without getting stuck when none is. This allows the process to handle input as it arrives and to perform other functions while the stream is idle.
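In Java, non-blocking mode is available on selectable channels via `configureBlocking(false)`. The sketch below (class name is illustrative) uses a `Pipe` so it needs no network: with no writer connected, a non-blocking read returns immediately with 0 bytes instead of suspending the thread.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

public class NonBlockingDemo {
    // A non-blocking read returns immediately with 0 bytes when no input
    // is available, rather than blocking until data arrives.
    static int pollOnce() throws IOException {
        Pipe pipe = Pipe.open();
        pipe.source().configureBlocking(false);
        ByteBuffer buf = ByteBuffer.allocate(16);
        int n = pipe.source().read(buf);   // no writer yet -> returns 0
        pipe.source().close();
        pipe.sink().close();
        return n;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(pollOnce()); // prints 0
    }
}
```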

A step beyond non-blocking mode is readiness selection. It is similar to non-blocking mode (and is often built on top of it), but it offloads onto the operating system the work of checking whether a stream is ready. The operating system can be told to watch a collection of streams and to tell the process which of those streams are ready. This ability lets a process multiplex many active streams using common code and a single thread, by leveraging the readiness information returned by the operating system. It is widely used by network servers to handle large numbers of network connections, and readiness selection is essential for scaling to high volumes.
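Readiness selection is what `java.nio.channels.Selector` provides: register channels for the events you care about, then ask the OS which of them are ready. A hedged sketch using a `Pipe` in place of real network connections (class name is illustrative):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

public class SelectDemo {
    // Register a channel with a Selector and let the OS report which
    // streams are ready, instead of polling each one ourselves.
    static int readyCount() throws IOException {
        Selector selector = Selector.open();
        Pipe pipe = Pipe.open();
        pipe.source().configureBlocking(false);   // selectable channels must be non-blocking
        pipe.source().register(selector, SelectionKey.OP_READ);
        pipe.sink().write(ByteBuffer.wrap(new byte[] {1})); // make input available
        int ready = selector.select(1000);        // waits until at least one channel is ready
        selector.close();
        pipe.source().close();
        pipe.sink().close();
        return ready;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readyCount()); // prints 1
    }
}
```

A server would register many `SocketChannel`s with one `Selector` in the same way, then iterate `selector.selectedKeys()` to service only the connections the OS reported as ready.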

