Which resources docker isolates

青灯夜游 (Original)
2022-01-28 15:35:51

Docker isolates the following resources: 1. the file system; 2. the network; 3. inter-process communication; 4. users and user groups (for permissions); 5. process PIDs, separately from the PIDs on the host; 6. the host name and domain name.

Operating environment for this tutorial: Linux 5.9.8, Docker 1.13.1, Dell G3 computer.

The essence of a Docker container

In essence, a Docker container is a process on the host.

Docker achieves resource isolation through namespaces, resource limitation through cgroups, and efficient file operations through the copy-on-write mechanism.

Linux namespace mechanism

The namespace mechanism provides a resource isolation solution.

System resources such as PID, IPC and Network are no longer global; each belongs to a specific namespace.

The resources in each namespace are transparent (invisible) to the other namespaces.

One of the main purposes of namespaces in the Linux kernel is to support lightweight virtualization (container) services. Processes in the same namespace can perceive each other's changes but know nothing about processes outside it, which achieves independence and isolation.

What namespace can isolate

For a container not to interfere with other containers, the following are needed:

  • The file system needs to be isolated

  • The network also needs to be isolated

  • Inter-process communication must also be isolated

  • For permissions, users and user groups also need to be isolated

  • The PIDs of processes in the container also need to be isolated from the PIDs on the host

  • Containers must also have their own host names

With the above isolation in place, a container can be considered isolated from the host and from other containers.

Linux namespaces happen to provide exactly this.

Namespace | Isolated content                            | System call parameter
UTS       | Host name and domain name                   | CLONE_NEWUTS
IPC       | Semaphore, message queue and shared memory  | CLONE_NEWIPC
Network   | Network devices, network stacks, ports, etc. | CLONE_NEWNET
PID       | Process number                              | CLONE_NEWPID
Mount     | Mount point (file system)                   | CLONE_NEWNS
User      | Users and user groups                       | CLONE_NEWUSER

UTS namespace

The UTS (UNIX Time-sharing System) namespace isolates the host name and domain name, so that each Docker container can have its own host name and domain name and can be viewed on the network as an independent node rather than as a process on the host machine.
In Docker, each image's hostname is basically named after the service it provides, which has no effect on the host.
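
As a small illustration (not something taken from Docker itself), a process can read the host name that its own UTS namespace exposes. The sketch below uses only the Python standard library; the helper name uts_identity is hypothetical.

```python
import os
import socket


def uts_identity():
    """Return the host name visible in the caller's UTS namespace."""
    return {
        "hostname": socket.gethostname(),  # gethostname() view
        "nodename": os.uname().nodename,   # uname() view of the same name
    }


if __name__ == "__main__":
    print(uts_identity())
```

Run on the host and again inside a container, the same script should report different names, because each reads from its own UTS namespace.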

IPC namespace

IPC (Inter-Process Communication) resources include the common semaphores, message queues and shared memory.
When a process requests an IPC resource, it requests a globally unique 32-bit ID.
The IPC namespace therefore contains the system IPC identifiers and the file system that implements POSIX message queues.
Processes in the same IPC namespace are visible to each other, but processes in different IPC namespaces are invisible to each other.

PID namespace

PID namespace isolation is very practical. It renumbers process PIDs, so two processes in different namespaces can have the same PID; each PID namespace has its own counter.
The kernel maintains a tree structure for all PID namespaces. The topmost one is created when the system is initialized and is called the root namespace. A newly created PID namespace is called a child namespace, and the PID namespace it was created from is its parent namespace.
In this way, the PID namespaces form a hierarchy. A parent node can see the processes in its child nodes and can affect them through signals and other means. A child node, however, cannot see anything in the PID namespace of its parent node.
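
The hierarchy described above can be pictured with a toy model. This is only a simulation in plain Python, not the kernel's data structure; the class PidNamespace and its methods are hypothetical names chosen for the example.

```python
class PidNamespace:
    """Toy model of a PID namespace: its own counter and an optional parent."""

    def __init__(self, parent=None):
        self.parent = parent
        self.next_pid = 1
        self.processes = {}  # PID in this namespace -> process name

    def spawn(self, name):
        """Register a process here and in every ancestor namespace.

        Returns the PID as seen from this namespace.
        """
        local_pid = None
        ns = self
        while ns is not None:
            pid = ns.next_pid
            ns.next_pid += 1
            ns.processes[pid] = name
            if local_pid is None:
                local_pid = pid
            ns = ns.parent
        return local_pid


root = PidNamespace()
root.spawn("init")                 # PID 1 in the root namespace
child = PidNamespace(parent=root)
pid_in_child = child.spawn("app")  # PID 1 in the child, PID 2 in the root
```

In this model the root namespace lists both processes, while the child namespace lists only its own, which mirrors the one-way visibility between parent and child.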

mount namespace

The mount namespace isolates file systems by isolating file system mount points.
After isolation, changes to the file structure in one mount namespace do not affect the others.
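
One way to observe this is to list the mount points a process sees. The sketch below parses the text format of /proc/self/mounts; the function name parse_mounts and the SAMPLE text are made up for illustration, and the sample is used only when /proc is not available.

```python
def parse_mounts(text):
    """Parse /proc/[pid]/mounts content into (device, mount_point, fs_type) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[0], fields[1], fields[2]))
    return entries


# Made-up sample in the same format as /proc/self/mounts.
SAMPLE = "overlay / overlay rw,relatime 0 0\nproc /proc proc rw,nosuid 0 0\n"

if __name__ == "__main__":
    try:
        with open("/proc/self/mounts") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # fall back to the sample when /proc is missing
    for device, mount_point, fs_type in parse_mounts(text):
        print(mount_point, fs_type, device)
```

Comparing the output on the host with the output inside a container shows two different mount tables.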

network namespace

The network namespace mainly isolates network resources, including network devices, the IPv4 and IPv6 protocol stacks, IP routing tables, firewalls, the /proc/net directory, the /sys/class/net directory, sockets, etc.
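
A process can list the network devices that its own network namespace contains. The following is a minimal sketch using the standard library call socket.if_nameindex(); the wrapper name visible_interfaces is hypothetical.

```python
import socket


def visible_interfaces():
    """Return the names of the network interfaces visible to this process."""
    return [name for _index, name in socket.if_nameindex()]


if __name__ == "__main__":
    print(visible_interfaces())
```

Inside a container the list usually differs from the host's list, since the container only sees the devices placed in its network namespace.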

user namespace

The user namespace isolates security-related identifiers and attributes, such as user IDs and group IDs.
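
The ID translation of a user namespace is exposed in /proc/[pid]/uid_map and /proc/[pid]/gid_map, where each line holds three numbers: the start of the ID range inside the namespace, the start of the range outside, and the length. The sketch below shows how such a mapping can be read; the function names and the SAMPLE_MAP values are assumptions made for the example.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map content into (inside_start, outside_start, length) tuples."""
    return [
        tuple(int(field) for field in line.split())
        for line in text.splitlines()
        if line.strip()
    ]


def to_outside_id(inside_id, mappings):
    """Translate an ID inside the namespace to the ID outside it, or None if unmapped."""
    for inside_start, outside_start, length in mappings:
        if inside_start <= inside_id < inside_start + length:
            return outside_start + (inside_id - inside_start)
    return None


# Made-up mapping: IDs 0..65535 inside map to 100000..165535 outside.
SAMPLE_MAP = "         0     100000      65536\n"
mappings = parse_id_map(SAMPLE_MAP)
```

With this sample mapping, user 0 (root) inside the namespace corresponds to the unprivileged user 100000 outside it.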

namespace operations

The namespace API includes clone(), setns(), unshare() and some files under /proc.
To specify which namespaces to isolate, pass one or more of the following 6 flags, combined with | (bitwise OR): CLONE_NEWUTS, CLONE_NEWIPC, CLONE_NEWPID, CLONE_NEWNET, CLONE_NEWNS and CLONE_NEWUSER, as listed in the table above.
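
Combining the flags is ordinary bitwise OR. The sketch below spells this out with the numeric values from the Linux header <linux/sched.h>; the dictionary CLONE_FLAGS and the helper build_flags are hypothetical names for the example.

```python
# Flag values as defined in the Linux header <linux/sched.h>.
CLONE_FLAGS = {
    "mount": 0x00020000,  # CLONE_NEWNS
    "uts": 0x04000000,    # CLONE_NEWUTS
    "ipc": 0x08000000,    # CLONE_NEWIPC
    "user": 0x10000000,   # CLONE_NEWUSER
    "pid": 0x20000000,    # CLONE_NEWPID
    "net": 0x40000000,    # CLONE_NEWNET
}


def build_flags(*namespaces):
    """OR together the CLONE_NEW* flags for the requested namespaces."""
    flags = 0
    for name in namespaces:
        flags |= CLONE_FLAGS[name]
    return flags


if __name__ == "__main__":
    print(hex(build_flags("uts", "ipc")))
```

The resulting integer is what would be passed as the flags argument of clone() or unshare().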

clone()

Using clone() to create a process in its own independent namespace is the most common approach and the most basic way for Docker to use namespaces.

int clone(int (*child_func)(void *), void *child_stack, int flags, void *arg);

clone() is a more general form of the Linux fork() system call; the flags argument controls which features are used.
There are more than 20 CLONE_* flags, which control all aspects of the clone process.

  1. child_func is the main function of the program that the child process runs
  2. child_stack is the stack space used by the child process
  3. flags indicates which CLONE_* flag bits are used; the main ones related to namespaces are the 6 mentioned above
  4. arg passes in user parameters

/proc/[pid]/ns

Under /proc/[pid]/ns, users can see files that point to the different namespaces of a process.

ls -l /proc/10/ns

Each file is a symbolic link whose target has the form type:[number]; the number in square brackets is the namespace number.

If the namespace numbers pointed to by two processes are the same, the two processes are in the same namespace.
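
This comparison can be sketched in a few lines. The helper names parse_ns_link and same_namespace are hypothetical, and same_namespace assumes a Linux system with /proc mounted and permission to read the links of both processes.

```python
import os
import re


def parse_ns_link(target):
    """Split a link target such as 'uts:[4026531838]' into (type, number)."""
    match = re.fullmatch(r"(\w+):\[(\d+)\]", target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def same_namespace(pid_a, pid_b, kind):
    """Return True if both processes point to the same namespace of this kind."""
    link_a = os.readlink(f"/proc/{pid_a}/ns/{kind}")
    link_b = os.readlink(f"/proc/{pid_b}/ns/{kind}")
    return parse_ns_link(link_a) == parse_ns_link(link_b)


if __name__ == "__main__":
    # Compare this process with its parent for the UTS namespace.
    print(same_namespace(os.getpid(), os.getppid(), "uts"))
```

The number in the example string is only an illustrative value; actual namespace numbers vary from system to system.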

The purpose of these links is that, once a link file is opened, the namespace continues to exist even after all processes in it have ended, so later processes can still join it.
Bind-mounting a file from the /proc/[pid]/ns directory with mount --bind has the same effect as keeping the link open.

touch ~/uts
mount --bind /proc/10/ns/uts ~/uts

setns()

Docker needs setns() when the docker exec command runs a new command in an already running container.
Through the setns() system call, a process leaves its original namespace and joins an existing one.
Usually, so that the caller is not affected and the newly joined PID namespace takes effect, the process calls clone() after setns() returns to create a child process that continues executing the command, and the original process then exits.

int setns(int fd, int nstype);
// fd: file descriptor of the namespace to join. It refers to a file in the /proc/[pid]/ns directory and is obtained by opening that link.
// nstype: lets the caller check whether the type of the namespace that fd refers to is the expected one; 0 means no check.

To make use of the newly joined namespace, the execve() family of functions is needed to execute user commands. The most common use is to call /bin/bash and pass it parameters.
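
The sequence of opening the link, calling setns(), creating a child and then exec-ing a command can be sketched as follows. This is an illustration under stated assumptions, not Docker's implementation: it relies on os.setns, which exists only in Python 3.12 and later on Linux, it uses fork in place of clone(), it normally needs root privileges, and the names ns_path and run_in_namespace are hypothetical.

```python
import os


def ns_path(pid, kind):
    """Path of the namespace link of the given kind for process pid."""
    return f"/proc/{pid}/ns/{kind}"


def run_in_namespace(pid, kind, argv):
    """Join one namespace of process pid, then fork and exec argv in the child.

    Note: setns changes the namespace of the calling process itself.
    """
    if not hasattr(os, "setns"):
        raise NotImplementedError("os.setns needs Python 3.12+ on Linux")
    fd = os.open(ns_path(pid, kind), os.O_RDONLY)
    try:
        os.setns(fd, 0)  # 0: do not check the namespace type
    finally:
        os.close(fd)
    child = os.fork()
    if child == 0:
        try:
            os.execvp(argv[0], argv)  # replaces the child on success
        finally:
            os._exit(127)  # reached only if exec failed
    _, status = os.waitpid(child, 0)
    return os.waitstatus_to_exitcode(status)
```

A call such as run_in_namespace(10, "uts", ["/bin/bash"]) would start a shell that sees the host name of process 10, provided the caller has sufficient privileges.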

unshare()

unshare() applies namespace isolation to the calling process itself.
unshare() is very similar to clone(), except that it does not need to start a new process and works on the original process.
Docker does not use this system call.
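
To see the effect of unshare() without touching the caller, the sketch below performs it in a forked child and changes the host name only there. It assumes os.unshare and os.CLONE_NEWUTS, which exist only in Python 3.12 and later on Linux, and it usually needs root privileges; the function name try_unshare_uts and the host name "sandbox" are made up for the example.

```python
import os
import socket


def try_unshare_uts(new_name="sandbox"):
    """Unshare the UTS namespace in a child process and rename the host there.

    Returns "isolated", "not permitted" or "unsupported".
    """
    if not hasattr(os, "unshare") or not hasattr(os, "CLONE_NEWUTS"):
        return "unsupported"  # needs Python 3.12+ on Linux
    child = os.fork()
    if child == 0:
        code = 1
        try:
            os.unshare(os.CLONE_NEWUTS)   # child leaves the shared UTS namespace
            socket.sethostname(new_name)  # affects only the new namespace
            code = 0
        except OSError:
            code = 1                      # typically a missing privilege
        finally:
            os._exit(code)
    _, status = os.waitpid(child, 0)
    return "isolated" if os.waitstatus_to_exitcode(status) == 0 else "not permitted"


if __name__ == "__main__":
    before = socket.gethostname()
    print(try_unshare_uts(), before == socket.gethostname())
```

Whatever the outcome, the host name seen by the parent stays the same, because the rename happens only after the child has its own UTS namespace.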

fork() system call

fork() does not belong to the namespace API.

Powerful kernel tool cgroups

cgroups is a mechanism provided by the Linux kernel. According to need, it can integrate (or separate) a series of system tasks and their subtasks into groups that are organized by resource into different levels, thereby providing a unified framework for system resource management.

cgroups is another powerful kernel tool in Linux. With cgroups, you can not only limit the resources isolated by namespaces, but also set weights for resources, account for usage, and control the starting and stopping of tasks (processes or threads). Put plainly, cgroups can limit and record the physical resources (CPU, memory, I/O, etc.) used by task groups, and it is the cornerstone of virtualization management tools such as Docker.

The role of cgroups

cgroups provides a unified interface for resource management at different user levels, from controlling a single resource to operating-system-level virtualization. cgroups provides four major functions.

  • Resource limitation
    • cgroups can limit the total resources used by tasks.
      • For example, if an upper limit is set on the memory an application may use at run time, exceeding that quota triggers an OOM (Out of Memory) condition.
  • Priority allocation
    • Allocating the number of CPU time slices and the disk I/O bandwidth is in effect equivalent to controlling the priority of tasks.
  • Resource statistics
    • cgroups can record system resource usage.
      • Examples are CPU time and memory usage. This function is well suited to billing.
  • Task control
    • cgroups can suspend and resume tasks.
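
Which cgroups a process belongs to can be read from /proc/[pid]/cgroup, where each line has the form hierarchy-ID:controller-list:path. The sketch below parses that format; the function name parse_cgroup_file and the SAMPLE text (including the path /docker/abc) are made up for illustration.

```python
def parse_cgroup_file(text):
    """Parse /proc/[pid]/cgroup content into (hierarchy_id, controllers, path) tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        hierarchy_id, controllers, path = line.split(":", 2)
        names = controllers.split(",") if controllers else []
        entries.append((int(hierarchy_id), names, path))
    return entries


# Made-up sample: two cgroup v1 lines and one cgroup v2 line (empty controller list).
SAMPLE = "5:memory:/docker/abc\n4:cpu,cpuacct:/docker/abc\n0::/system.slice\n"

if __name__ == "__main__":
    try:
        with open("/proc/self/cgroup") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # fall back to the sample when /proc is missing
    for entry in parse_cgroup_file(text):
        print(entry)
```

The path shown for each controller is the group whose limits and statistics apply to the process.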
