Preparatory work for understanding the Linux kernel: know the C language and a little about operating systems
Preface: The operating system (English: Operating System, abbreviated: OS) is the system software that manages a computer's hardware and software resources. It is also the core and cornerstone of the computer system. The operating system handles basic tasks such as managing and configuring memory, deciding the priority of competing requests for system resources, controlling input and output devices, running the network, and managing file systems. The operating system also provides an interface through which users interact with the system.
1. Linux kernel preparation work
Background knowledge that helps most before studying the Linux kernel:
A working knowledge of the C language
Some basic knowledge of operating systems
Familiarity with a few common algorithms
An understanding of computer architecture
Features of the Linux kernel:
Linux kernel tasks:
1. From a technical perspective, the kernel is an intermediate layer between hardware and software. Its job is to pass application requests to the hardware and to act as a low-level driver that addresses the devices and components in the system.
2. From the application's perspective, the application has no contact with the hardware, only with the kernel. The kernel is the lowest level in the hierarchy that the application knows about. In practice, the kernel abstracts the hardware details.
3. The kernel is a resource manager. It is responsible for allocating the available shared resources (CPU time, disk space, network connections, etc.) to the various system processes.
4. The kernel is like a library that provides a set of system-oriented commands. To the application, a system call looks just like a call to an ordinary function.
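Point 4 can be seen from user space. The following is a minimal sketch in Python, assuming a Linux system whose C library can be reached through `ctypes.CDLL(None)`; the helper names `pid_via_libc` and `pid_via_python` are made up for this illustration. Both calls look like ordinary function calls, yet each one ends in the kernel's getpid system call.

```python
import ctypes
import os

# dlopen(NULL): a handle to the C library already loaded into this process.
libc = ctypes.CDLL(None)


def pid_via_libc():
    """Call the C library's getpid() wrapper around the system call."""
    return libc.getpid()


def pid_via_python():
    """Python's os.getpid() reaches the same system call."""
    return os.getpid()


if __name__ == "__main__":
    # Both lines print the same process ID.
    print("via libc wrapper:", pid_via_libc())
    print("via os module:   ", pid_via_python())
```

Nothing in the calling code shows the switch into kernel mode, which is exactly why the kernel can be thought of as a library.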
Kernel implementation strategy:
1. Microkernel. Only the most basic functions are implemented by the central kernel (the microkernel). All other functions are delegated to independent processes that communicate with the central kernel through well-defined communication interfaces.
2. Monolithic kernel. All kernel code, including the subsystems (such as memory management, file management, and device drivers), is packaged into a single file. Every function in the kernel has access to all other parts of the kernel. Modules can now be loaded and unloaded dynamically. The Linux kernel is implemented based on this strategy.
Where is the kernel mechanism used?
1. Communication between processes requires a specific kernel mechanism. (Each process gets its own address space in the CPU's virtual memory, and the address spaces of different processes are completely independent. The number of processes executing at the same instant does not exceed the number of CPUs.)
2. Switching between processes also requires a kernel mechanism, again because the number of processes executing at the same instant does not exceed the number of CPUs.
As with task switching in FreeRTOS, a process switch must save the state of the outgoing process before suspending it and restore the state of the incoming process before resuming it.
3. Process scheduling. The kernel decides which process runs and for how long.
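The independence of address spaces and the need for a kernel mechanism to communicate can be tried out from user space. This is only a sketch, assuming a Unix-like system where `os.fork` and `os.pipe` are available; the function name `fork_and_report` is invented for the example. The child changes its own copy of a variable, and the only way the parent learns the new value is through a pipe, which is one of the kernel's IPC mechanisms.

```python
import os


def fork_and_report():
    """Fork a child, let it change its copy of a variable, read its report via a pipe."""
    value = 1
    read_fd, write_fd = os.pipe()        # kernel-provided IPC channel
    pid = os.fork()                      # child gets its own copy of the address space
    if pid == 0:                         # child process
        os.close(read_fd)
        value += 41                      # changes only the child's copy
        os.write(write_fd, str(value).encode())
        os.close(write_fd)
        os._exit(0)
    os.close(write_fd)                   # parent process
    chunks = []
    while True:
        chunk = os.read(read_fd, 64)
        if not chunk:                    # empty read: the child closed its end
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)                   # reap the child
    # The parent's own copy is unchanged; the child's value arrived via the pipe.
    return value, int(b"".join(chunks))


if __name__ == "__main__":
    parent_value, child_value = fork_and_report()
    print("parent still sees", parent_value, "and the child reported", child_value)
```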
Linux process:
1. Processes form a hierarchy in which each process depends on a parent process. The kernel starts the init program as the first process, and this process is responsible for further system initialization. The init process is the root of the process tree, and all other processes originate directly or indirectly from it.
2. The process tree can be viewed with the pstree command. On many current distributions the first process shown is systemd rather than a program named init, because systemd takes over the role of init (this is a common point of confusion).
3. Each process in the system has a unique identifier (ID), and users (or other processes) can use the ID to access the process.
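The parent links described above can be followed by hand. The sketch below assumes a Linux system with /proc mounted and reads the parent PID from /proc/<pid>/stat; the helper names `parent_pid` and `ancestry` are hypothetical. On a typical system the chain ends at PID 1, while inside a container it may stop earlier because the real parent lies outside the PID namespace.

```python
import os


def parent_pid(pid):
    """Return the parent PID recorded in /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat", errors="replace") as stat_file:
        stat = stat_file.read()
    # The command name is wrapped in parentheses and may contain spaces,
    # so split after the last ')': the next fields are state, ppid, ...
    fields = stat[stat.rindex(")") + 1:].split()
    return int(fields[1])


def ancestry(pid):
    """Follow parent links from pid towards the root of the process tree."""
    chain = [pid]
    while chain[-1] > 1:
        try:
            parent = parent_pid(chain[-1])
        except OSError:                  # /proc missing or entry not readable
            break
        if parent <= 0:                  # parent is outside this PID namespace
            break
        chain.append(parent)
    return chain


if __name__ == "__main__":
    print(" -> ".join(str(p) for p in ancestry(os.getpid())))
```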
2. Directory structure of Linux kernel source code
Linux kernel source code consists of three main parts:
1. Kernel core code, including the subsystems and submodules described in Section 3, as well as other supporting subsystems such as power management and Linux initialization.
2. Other non-core code, such as library files (the Linux kernel is self-contained, that is, it does not depend on any other software and can be compiled on its own), firmware collections, KVM (virtual machine technology), etc.
3. Build scripts, configuration files, help documents, copyright statements, and other auxiliary files.
Use the ls command to view the top-level directory structure of the kernel source code. The directories are described below:
include/----Kernel header files that need to be provided to external modules (such as user space code).
kernel/----The core code of the Linux kernel, including the process scheduling subsystem described in Section 3.2 and modules related to process scheduling.
mm/----Memory management subsystem (section 3.3).
fs/----VFS subsystem (section 3.4).
net/----The network subsystem, not including network device drivers (section 3.5).
ipc/----IPC (inter-process communication) subsystem.
arch//----Architecture-related code, such as arm, x86, etc.
arch//mach-----Code related to a specific machine or board.
arch//include/asm----Architecture-related header files.
arch//boot/dts----Device Tree files.
init/----Code related to Linux system startup and initialization.
block/----Provides the block device layer.
sound/----Audio-related drivers and subsystems; can be regarded as the "audio subsystem".
drivers/----Device drivers (in Linux kernel 3.10, device drivers account for 49.4% of the code).
lib/----Library functions needed inside the kernel, such as CRC, FIFO, list, MD5, etc.
crypto/----Library functions related to encryption and decryption.
security/----Security features (SELinux).
virt/----Support for virtual machine technology (KVM, etc.).
usr/----Code used to generate the initramfs.
firmware/----Firmware used to drive third-party devices.
samples/----Some sample code.
tools/----Some common tools, such as performance analysis, self-testing, etc.
Kconfig, Kbuild, Makefile, scripts/----Configuration files, scripts, etc. used for kernel compilation.
COPYING----Copyright statement.
MAINTAINERS----List of maintainers.
CREDITS----List of major contributors to Linux.
REPORTING-BUGS----Bug reporting manual.
Documentation, README----Help and documentation.
3. Brief analysis of Linux kernel architecture
Figure 1 Linux system hierarchy
At the top is the user (or application) space. This is where user applications execute. Below the user space is the kernel space, where the Linux kernel resides. The GNU C Library (glibc) sits between the applications and the kernel. It provides the system call interface that connects to the kernel, and it provides the mechanism for the transition between user space applications and the kernel. This is important because the kernel and user space applications use different protected address spaces. Each user-space process uses its own virtual address space, while the kernel occupies a single, separate address space.
The Linux kernel can be further divided into three layers. At the top is the system call interface, which implements basic functions such as read and write. Beneath the system call interface is the kernel code, which can be more precisely described as architecture-independent kernel code. This code is common to all processor architectures supported by Linux. Underneath it is the architecture-dependent code, which forms what is generally called the BSP (Board Support Package). This code serves as the processor-specific and platform-specific code for a given architecture.
The Linux kernel implements many important architectural properties. At both higher and lower levels, the kernel is divided into subsystems. Linux can also be viewed as monolithic, since it integrates all the basic services into the kernel. This differs from a microkernel architecture, in which the kernel itself provides only basic services such as communication, I/O, and memory and process management, and more specific services are plugged into the microkernel layer. Each approach has its own advantages, but these are not discussed here.
Over time the Linux kernel has become efficient in its use of both memory and CPU, and it is very stable. The most interesting thing about Linux is that it remains highly portable despite its size and complexity. Linux can be compiled to run on a large number of processors and platforms with different architectural constraints and requirements. One example is that Linux can run on a processor that has a memory management unit (MMU) as well as on processors that do not provide an MMU.
The uClinux port of the Linux kernel provides support for processors without an MMU.
Figure 2 Linux kernel architecture
The main components of the Linux kernel are: the system call interface, process management, memory management, the virtual file system, the network stack, device drivers, and architecture-dependent code.
(1) System call interface (SCI)
The SCI layer provides the means to perform function calls from user space into the kernel. As discussed above, this interface is architecture dependent, even within the same processor family. The SCI is in effect a function call multiplexing and demultiplexing service. You can find the implementation of the SCI in ./linux/kernel and the architecture-dependent parts in ./linux/arch.
(2) Process management
The focus of process management is the execution of processes. In the kernel these are called threads, and each represents an individual virtualization of the processor (thread code, data, stack, and CPU registers). In user space the term process is generally used, but the Linux implementation does not distinguish between the two concepts (process and thread). Through the SCI the kernel provides an application programming interface (API) to create a new process (fork, exec, or Portable Operating System Interface [POSIX] functions), stop a process (kill, exit), and communicate and synchronize between processes (signals or POSIX mechanisms).
Process management also covers sharing the CPU among the active processes. The 2.6-era kernel described here implements a scheduling algorithm that runs in constant time regardless of how many threads are competing for the CPU. It is called the O(1) scheduler, a name meaning that scheduling many threads takes the same time as scheduling one. (Kernels from 2.6.23 onward replaced it with the Completely Fair Scheduler.) The O(1) scheduler also supports multiple processors (symmetric multiprocessing, or SMP). You can find the source code for process management in ./linux/kernel and the architecture-dependent source code in ./linux/arch.
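To make the constant-time claim concrete, here is a toy model in Python of the idea behind the O(1) scheduler: one queue per priority level plus a bitmap that records which queues are non-empty. It is a simplified sketch, not kernel code. The class name `RunQueue` and its methods are invented for the illustration, and the real scheduler's active and expired arrays and time slices are left out. The figure of 140 priority levels is the one used by the 2.6-era scheduler, where a lower number means a higher priority.

```python
from collections import deque

NUM_PRIORITIES = 140  # priority levels in the 2.6-era O(1) scheduler


class RunQueue:
    """Toy run queue: picking the next task does not depend on the task count."""

    def __init__(self):
        self.queues = [deque() for _ in range(NUM_PRIORITIES)]
        self.bitmap = 0  # bit p is set while queues[p] is non-empty

    def enqueue(self, task, priority):
        self.queues[priority].append(task)
        self.bitmap |= 1 << priority

    def pick_next(self):
        if self.bitmap == 0:
            return None
        # Isolate the lowest set bit: the best (numerically smallest) priority.
        priority = (self.bitmap & -self.bitmap).bit_length() - 1
        queue = self.queues[priority]
        task = queue.popleft()
        if not queue:
            self.bitmap &= ~(1 << priority)
        return task


if __name__ == "__main__":
    rq = RunQueue()
    rq.enqueue("editor", 120)
    rq.enqueue("audio", 10)
    rq.enqueue("shell", 120)
    print([rq.pick_next() for _ in range(3)])
```

The work done in pick_next is bounded by the fixed number of priority levels, not by the number of queued tasks, which is the property the name O(1) refers to.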
(3) Memory management
Another important resource managed by the kernel is memory. For efficiency, given the way the hardware manages virtual memory, memory is managed in units called pages (4KB on most architectures). Linux includes the means to manage the available memory as well as the hardware mechanisms for physical and virtual mappings. However, memory management involves much more than managing 4KB buffers. Linux provides abstractions over the 4KB buffers, such as the slab allocator. This memory management scheme uses 4KB buffers as its base, allocates structures from within them, and tracks page usage: which pages are full, which are only partially used, and which are empty. This allows the scheme to grow and shrink dynamically according to the needs of the system. Because memory has many users, the available memory is sometimes exhausted. For this reason, pages can be moved out of memory and onto disk. This process is called swapping because the pages are swapped from memory to the hard disk. The source code for memory management can be found in ./linux/mm.
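The bookkeeping described for the slab allocator can be sketched in a few lines. The following Python model is a deliberately simplified assumption, not the kernel's implementation: the class names `Slab` and `SlabCache` and their methods are invented, each slab is a single 4KB page, and an "object" is just a slot number. It shows pages being carved into equal-sized objects and classified as full, partial, or empty.

```python
PAGE_SIZE = 4096  # bytes; the common page size mentioned above


class Slab:
    """One page carved into equally sized object slots."""

    def __init__(self, object_size):
        self.free_slots = list(range(PAGE_SIZE // object_size))
        self.capacity = len(self.free_slots)

    def is_full(self):
        return not self.free_slots

    def is_empty(self):
        return len(self.free_slots) == self.capacity


class SlabCache:
    """Toy cache of same-sized objects that tracks full, partial and empty pages."""

    def __init__(self, object_size):
        if not 0 < object_size <= PAGE_SIZE:
            raise ValueError("object must fit in one page")
        self.object_size = object_size
        self.slabs = []

    def alloc(self):
        # Prefer a partially used page, then an empty one, then a new page.
        partial = [s for s in self.slabs if not s.is_full() and not s.is_empty()]
        empty = [s for s in self.slabs if s.is_empty()]
        if partial:
            slab = partial[0]
        elif empty:
            slab = empty[0]
        else:
            slab = Slab(self.object_size)
            self.slabs.append(slab)
        return slab, slab.free_slots.pop()

    def free(self, slab, slot):
        slab.free_slots.append(slot)

    def usage(self):
        full = sum(1 for s in self.slabs if s.is_full())
        empty = sum(1 for s in self.slabs if s.is_empty())
        return {"full": full, "partial": len(self.slabs) - full - empty, "empty": empty}


if __name__ == "__main__":
    cache = SlabCache(1024)              # four objects fit in one page
    objects = [cache.alloc() for _ in range(5)]
    print(cache.usage())
```

Empty pages are the natural candidates for being handed back to the system when memory runs short, which is how such a scheme can shrink as well as grow.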
(4) Virtual file system
The Virtual File System (VFS) is a very useful part of the Linux kernel because it provides a common interface abstraction for file systems. The VFS provides a switching layer between the SCI and the file systems supported by the kernel (see Figure 3).
Figure 3 Linux file system hierarchy
At the top of the VFS is a common API abstraction of functions such as open, close, read, and write. Below the VFS are the file system abstractions, which define how the upper-layer functions are implemented. They are plug-ins for a given file system (there are more than 50 of them). The source code for the file systems can be found in ./linux/fs. Below the file system layer is the buffer cache, which provides a common set of functions to the file system layer (independent of any particular file system). This caching layer optimizes access to physical devices by keeping data around for a short time (or reading ahead so that the data is available when needed). Beneath the buffer cache are the device drivers, which implement the interfaces to specific physical devices.
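The switching role of the VFS can be illustrated with a small model. This is a sketch under stated assumptions, not how the kernel is written: the classes `RamFS`, `UpperFS` and `VFS` are invented toy stand-ins, and only read and write are modelled. The caller uses one common API, and the VFS layer picks the file system implementation from the mount point.

```python
class RamFS:
    """Toy file system: keeps file contents in a dictionary."""

    def __init__(self):
        self.files = {}

    def read(self, path):
        return self.files[path]

    def write(self, path, data):
        self.files[path] = data


class UpperFS(RamFS):
    """Second toy file system with its own write implementation."""

    def write(self, path, data):
        super().write(path, data.upper())


class VFS:
    """Common read/write API that dispatches to the mounted file system."""

    def __init__(self):
        self.mounts = {}                 # mount point -> file system object

    def mount(self, mount_point, filesystem):
        self.mounts[mount_point.rstrip("/") or "/"] = filesystem

    def _resolve(self, path):
        best = None
        for mount_point in self.mounts:  # the longest matching mount point wins
            prefix = mount_point if mount_point == "/" else mount_point + "/"
            if path == mount_point or path.startswith(prefix):
                if best is None or len(mount_point) > len(best):
                    best = mount_point
        if best is None:
            raise FileNotFoundError(path)
        return self.mounts[best], "/" + path[len(best):].lstrip("/")

    def read(self, path):
        filesystem, relative = self._resolve(path)
        return filesystem.read(relative)

    def write(self, path, data):
        filesystem, relative = self._resolve(path)
        filesystem.write(relative, data)


if __name__ == "__main__":
    vfs = VFS()
    vfs.mount("/", RamFS())
    vfs.mount("/mnt/upper", UpperFS())
    vfs.write("/notes.txt", b"hello")
    vfs.write("/mnt/upper/notes.txt", b"hello")
    print(vfs.read("/notes.txt"), vfs.read("/mnt/upper/notes.txt"))
```

The application code calling vfs.read and vfs.write never needs to know which file system serves a path, which is the point of the common interface.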
(5) Network stack
The network stack follows the layered architecture of the protocols it models. Recall that the Internet Protocol (IP) is the core network layer protocol that sits below the transport protocol (most commonly the Transmission Control Protocol, or TCP). Above TCP is the socket layer, which is invoked through the SCI. The socket layer is the standard API of the network subsystem and provides a user interface to the various network protocols. From raw frame access to IP protocol data units (PDUs) and up to TCP and the User Datagram Protocol (UDP), the socket layer provides a standardized way to manage connections and move data between endpoints. The network source code in the kernel can be found in ./linux/net.
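The standardized socket API can be exercised from user space. The sketch below uses Python's standard socket module; the helper names `udp_loopback_pair` and `send_and_receive` are invented for the example, and the UDP helper assumes that the loopback interface 127.0.0.1 is available. The same send and receive calls work regardless of which protocol sits underneath.

```python
import socket


def udp_loopback_pair():
    """Create a UDP sender and receiver connected over the loopback interface."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))      # port 0: let the kernel choose a free port
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.connect(receiver.getsockname())
    return sender, receiver


def send_and_receive(sender, receiver, payload):
    """Send one message and read it back through the socket layer."""
    receiver.settimeout(2.0)             # do not block forever if nothing arrives
    sender.send(payload)
    return receiver.recv(4096)
```

Calling send_and_receive(*udp_loopback_pair(), b"ping") pushes the message through the UDP/IP stack. The same helper also works with a Unix-domain pair from socket.socketpair(), because the socket layer hides the protocol behind one interface.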
(6) Device drivers
A large share of the Linux kernel source code consists of device drivers, which make specific hardware devices usable. The Linux source tree provides a drivers subdirectory, which is further divided by the kind of device supported, such as Bluetooth, I2C, serial, etc. The code for the device drivers can be found in ./linux/drivers.
(7) Architecture-dependent code
Although Linux is largely independent of the architecture it is running on, there are some elements that must be considered for the architecture to operate properly and achieve greater efficiency. The ./linux/arch subdirectory defines the architecture-dependent portions of the kernel source code, which contains various architecture-specific subdirectories (together making up the BSP). For a typical desktop system, the x86 directory is used. Each architecture subdirectory contains many other subdirectories, and each subdirectory focuses on a specific aspect of the kernel, such as booting, kernel, memory management, etc. This architecture-dependent code can be found in ./linux/arch.
As if the portability and efficiency of the Linux kernel were not enough, Linux also provides other features that do not fit neatly into the classification above. As a production operating system and open source software, Linux is a good platform for testing new protocols and improvements to them. Linux supports a wide range of network protocols, including classic TCP/IP as well as extensions for high-speed networks (faster than 1 Gigabit Ethernet [GbE], and 10 GbE). Linux also supports protocols such as the Stream Control Transmission Protocol (SCTP), which provides many more advanced features than TCP (as a successor transport layer protocol). Linux is also a dynamic kernel that supports adding and removing software components on the fly. These components are known as dynamically loadable kernel modules, and they can be inserted at boot time when they are needed (for example, when a specific device requires the module) or by the user at any time.
One of the more recent enhancements to Linux is its use as an operating system for other operating systems (called a hypervisor). A modification to the kernel called the Kernel-based Virtual Machine (KVM) made this possible. The change adds a new interface for user space that allows other operating systems to run on top of the KVM-enabled kernel. In addition to other instances of Linux, Microsoft Windows can also be virtualized. The only restriction is that the underlying processor must support the virtualization instructions.
4. The difference between Linux architecture and kernel structure
1. When asked about the Linux architecture (that is, how the Linux system is structured), we can answer as follows: broadly, the Linux architecture is divided into two parts, user space and kernel space.
2. The reason why the Linux architecture is divided into user space and kernel space:
1) Modern CPUs generally implement several operating modes with different privileges.
Take ARM as an example: ARM implements 7 operating modes, and the instructions the CPU can execute and the registers it can access differ from mode to mode.
Take x86 as an example: x86 implements 4 privilege levels, Ring 0 to Ring 3. Ring 0 can execute privileged instructions and access I/O devices, while Ring 3 is subject to many restrictions.
2) Building on these CPU modes, Linux divides the system into two parts in order to protect the kernel.
3. User space and kernel space are two different execution states of a program. The transition from user space to kernel space takes place through a system call or a hardware interrupt.
4. Linux kernel structure (note the distinction between the Linux architecture and the Linux kernel structure)
5. Linux-driven platform mechanism
Compared with the traditional device_driver mechanism, Linux's platform_driver mechanism has one very significant advantage: a platform device registers its resources with the kernel, which manages them centrally. When a driver uses these resources, it requests them through the standard interface provided by platform_device. This makes drivers more independent of resource management and improves portability and security. The following is a schematic diagram of the SPI driver hierarchy. The SPI bus in Linux can be understood as the bus leading out of the SPI controller:
Like traditional drivers, the platform mechanism is also divided into three steps:
1. Bus registration stage:
During kernel startup initialization, the call chain kernel_init()→do_basic_setup()→driver_init()→platform_bus_init()→bus_register(&platform_bus_type) in the main.c file registers a platform bus (a virtual bus, platform_bus).
2. Device registration stage:
When a device is registered, platform_device_register()→platform_device_add()→(pdev→dev.bus=&platform_bus_type)→device_add() hangs the device on the virtual bus.
3. Driver registration stage:
platform_driver_register()→driver_register()→bus_add_driver()→driver_attach()→bus_for_each_dev() runs __driver_attach()→driver_probe_device() for every device hanging on the virtual platform bus and checks whether drv→bus→match() succeeds. Through that pointer it executes platform_match→strncmp(pdev→name,drv→name,BUS_ID_SIZE). If the names match, it calls really_probe() (which actually executes platform_driver→probe(platform_device) for the corresponding device) to start the real probing. If the probe succeeds, the device is bound to the driver.
It can be seen from the above that the platform mechanism ultimately calls three key functions: bus_register(), device_add(), and driver_register().
The platform_device structure describes a platform device. It contains the generic device structure struct device dev, the device resource structure struct resource *resource, and the device name const char *name. (Note that this name must be the same as platform_driver.driver→name; the reason is explained below.)
The most important thing in this structure is the resource structure, which is why the platform mechanism is introduced.
The reason why names should be the same:
As mentioned above, when a driver is registered it calls bus_for_each_dev(), which runs __driver_attach()→driver_probe_device() for every device hanging on the virtual platform bus. This function performs a preliminary match between dev and drv by calling the function that drv->bus->match points to. platform_driver_register() sets drv->driver.bus=&platform_bus_type, so drv->bus->match is platform_bus_type→match, which is the platform_match function.
platform_match compares the names of dev and drv. If they are the same, execution enters really_probe() and from there the probe function written by the driver author, which performs further matching. Therefore dev→name and driver→drv→name must be set to the same value during initialization.
Different bus types have different match functions. The platform bus compares the names of dev and drv, whereas the match in USB class drivers compares the Product ID and Vendor ID.
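The registration and matching flow can be modelled outside the kernel to make the order of events easier to follow. The Python below is only a toy model: the classes `PlatformBus`, `PlatformDevice` and `PlatformDriver` and the device names used in the demo are invented and merely mirror the roles of the kernel structures discussed here. Devices carry a name and resources, a driver is matched by name, and a successful probe binds the two.

```python
class PlatformDevice:
    def __init__(self, name, resources):
        self.name = name
        self.resources = resources       # e.g. register ranges, IRQ numbers
        self.driver = None               # set once a driver has been bound


class PlatformDriver:
    def __init__(self, name, probe):
        self.name = name
        self.probe = probe               # callable(device) -> True on success


class PlatformBus:
    """Toy virtual bus holding the registered devices and drivers."""

    def __init__(self):
        self.devices = []
        self.drivers = []

    @staticmethod
    def match(device, driver):
        # Plays the role of platform_match: compare the two names.
        return device.name == driver.name

    def _try_bind(self, device, driver):
        if device.driver is None and self.match(device, driver):
            if driver.probe(device):     # the driver's own probe decides
                device.driver = driver

    def add_device(self, device):
        self.devices.append(device)
        for driver in self.drivers:      # a matching driver may already be registered
            self._try_bind(device, driver)

    def register_driver(self, driver):
        self.drivers.append(driver)
        for device in self.devices:      # walk every device hanging on the bus
            self._try_bind(device, driver)


if __name__ == "__main__":
    bus = PlatformBus()
    spi = PlatformDevice("demo-spi", {"irq": 5})
    uart = PlatformDevice("demo-uart", {"irq": 6})
    bus.add_device(spi)
    bus.add_device(uart)
    bus.register_driver(PlatformDriver("demo-spi", lambda dev: True))
    print(spi.driver is not None, uart.driver is not None)
```

In the model, as in the text, a device whose name differs from the driver's name is never probed, which is why the two names must be filled in identically.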
Personal summary of the benefits of Platform mechanism:
1. It provides a bus of type platform_bus_type and attaches SoC devices that do not sit on a physical bus to this virtual bus, so that the bus-device-driver model can be applied universally.
2. It provides the platform_device and platform_driver data structures, which embed the traditional device and driver data structures and add resource members, making it easy to integrate with newer bootloaders and kernels, such as Open Firmware, that pass device resources dynamically.
6. Linux kernel architecture
Because the Linux kernel is monolithic, it has a larger footprint and higher complexity than other types of kernels. This design choice caused quite a bit of controversy in the early days of Linux, and the kernel still carries some of the design flaws inherent in monolithic kernels.
To address these flaws, one thing the Linux kernel developers did was make kernel modules loadable and unloadable at runtime, which means you can add or remove kernel features dynamically. Beyond adding hardware support to the kernel, modules can run server processes such as low-level virtualization, and in some cases the entire kernel can even be replaced without restarting the computer.
Imagine if you could upgrade to a Windows service pack without having to reboot...
7. Kernel modules
What if Windows already had every available driver installed and you only needed to turn on the ones you use? This is essentially what kernel modules do for Linux. Kernel modules, also known as loadable kernel modules (LKMs), are what allow the kernel to work with all your hardware without consuming all the available memory.
Modules generally add functionality such as device drivers, file systems, and system calls to the base kernel. LKMs have the file extension .ko and are generally stored in the /lib/modules directory. Because of this modular design, you can easily customize the kernel: set modules to load or not load at startup with the menuconfig command or by editing the /boot/config file, or load and unload modules dynamically with the modprobe command.
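To see which modules are currently loaded, the kernel exposes a list in /proc/modules. The following Python sketch parses that file, assuming the usual whitespace-separated layout of name, size, reference count and dependent modules; the function names `parse_proc_modules` and `loaded_modules` are invented for the example, and on systems without /proc/modules the second function simply returns an empty list.

```python
def parse_proc_modules(text):
    """Parse text in the /proc/modules layout into a list of dictionaries."""
    modules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue                     # skip blank or unexpected lines
        name, size, refcount, deps = fields[:4]
        modules.append({
            "name": name,
            "size": int(size),
            # The count can be "-" on kernels built without module unloading.
            "refcount": int(refcount) if refcount.isdigit() else None,
            # "-" means no other module uses this one; otherwise a comma list.
            "used_by": [] if deps == "-" else [d for d in deps.split(",") if d],
        })
    return modules


def loaded_modules(path="/proc/modules"):
    """Return the currently loaded modules, or [] if the file is unavailable."""
    try:
        with open(path) as modules_file:
            return parse_proc_modules(modules_file.read())
    except OSError:
        return []


if __name__ == "__main__":
    for module in loaded_modules()[:5]:
        print(module["name"], module["size"], module["used_by"])
```

This covers roughly the same information that the lsmod command displays in its table.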
Third-party and closed source modules are available in some distributions, such as Ubuntu, but may not be installed by default because the source code for such modules is not available. The developers of the software (e.g. NVIDIA, ATI) do not provide source code; instead they build their own modules and compile the required .ko files for distribution. Such modules are free as in beer but not free as in speech, so some distributions leave them out because the maintainers feel that shipping non-free software "pollutes" the kernel.
The kernel is not magic, but it is essential for any properly functioning computer. The Linux kernel differs from OS X and Windows in that it includes kernel-level drivers and makes many things work "out of the box". Hopefully you now know a little more about how software and hardware work together and about the files needed to start your computer.
Conclusion: The power of interest is infinite. Interest can bring passion. If work can be combined with interest, there will be enthusiasm for work. In this way, work is not just work, but also a kind of enjoyment.