This raises a question: which process should be killed? Students who know a little about the Linux kernel will usually answer, "whoever uses the most memory, kill that one." That is indeed an important factor the kernel considers, but it is not the whole story. If we dig into the Linux kernel a bit, we find that which process gets killed is actually determined by /proc/<pid>/oom_score. Every process has this value, and it is calculated by the kernel's badness() function (called oom_badness() in newer kernels). Let's read the badness() function carefully.
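As a quick illustration (assuming a kernel new enough to expose this file), the following small program simply reads and prints /proc/<pid>/oom_score for a given PID:

/* Minimal sketch: print the kernel's current badness score for a PID,
 * assuming the kernel exposes /proc/<pid>/oom_score. */
#include <stdio.h>

int main(int argc, char *argv[])
{
    char path[64];
    char buf[32];
    FILE *fp;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }

    snprintf(path, sizeof(path), "/proc/%s/oom_score", argv[1]);
    fp = fopen(path, "r");
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }
    if (fgets(buf, sizeof(buf), fp) != NULL)
        printf("oom_score of pid %s: %s", argv[1], buf);
    fclose(fp);
    return 0;
}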
The comment block at the top of the badness() function states its design goals:
1) we lose the minimum amount of work done
2) we recover a large amount of memory
3) we don't kill anything innocent of eating tons of memory
4) we want to kill the minimum amount of processes (one)
5) we try to kill the process the user expects us to kill, this algorithm has been meticulously tuned to meet the principle of least surprise ... (be careful when you change it)
In short, we kill the smallest number of processes (ideally one) while recovering the largest amount of memory, which matches our intuition of killing the process that occupies the most memory.
/*
* The memory size of the process is the basis for the badness.
*/
points = p->mm->total_vm;
The starting point of the score is the memory size of the process (total_vm, the total virtual memory mapped by the process). Note that how much of that memory has been pushed out to swap does not change this number: the score depends only on the process's own memory footprint. The more memory a process uses, the higher its score, and the higher the score, the easier it is for the process to be sacrificed.
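For illustration, this total_vm can be seen from user space as the first field of /proc/<pid>/statm (in pages, the same value as VmSize in /proc/<pid>/status); a minimal sketch:

/* Sketch: read a process's total_vm (first field of /proc/<pid>/statm,
 * in pages), which is where its badness score starts. */
#include <stdio.h>

int main(int argc, char *argv[])
{
    char path[64];
    unsigned long total_vm = 0;
    FILE *fp;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }

    snprintf(path, sizeof(path), "/proc/%s/statm", argv[1]);
    fp = fopen(path, "r");
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }
    if (fscanf(fp, "%lu", &total_vm) == 1)
        printf("total_vm of pid %s: %lu pages\n", argv[1], total_vm);
    fclose(fp);
    return 0;
}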
/*
* Processes which fork a lot of child processes are likely
* a good choice. We add the vmsize of the children if they
* have an own mm. This prevents forking servers to flood the
* machine with an endless amount of children
*/
...
if (chld->mm != p->mm && chld->mm)
    points += chld->mm->total_vm;
This means that the memory occupied by child processes (those with their own mm) is added to the parent process's score.
s = int_sqrt(cpu_time);
if (s)
    points /= s;
s = int_sqrt(int_sqrt(run_time));
if (s)
    points /= s;
This shows that the more CPU time a process has consumed, or the longer it has been running, the lower its score and the less likely it is to be killed.
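To get a feel for how much these square roots discount the score, here is a small standalone calculation; the cpu_time and run_time values are made up for illustration (the kernel derives them from the process's CPU usage and uptime with its own scaling):

#include <stdio.h>

/* naive integer square root, standing in for the kernel's int_sqrt() */
static unsigned long int_sqrt(unsigned long x)
{
    unsigned long r = 0;
    while ((r + 1) * (r + 1) <= x)
        r++;
    return r;
}

int main(void)
{
    unsigned long points = 262144;   /* e.g. total_vm of a 1 GB process with 4 KB pages */
    unsigned long cpu_time = 10000;  /* hypothetical scaled CPU time */
    unsigned long run_time = 100000; /* hypothetical scaled run time */
    unsigned long s;

    s = int_sqrt(cpu_time);          /* 100 */
    if (s)
        points /= s;                 /* 262144 -> 2621 */
    s = int_sqrt(int_sqrt(run_time)); /* int_sqrt(316) = 17 */
    if (s)
        points /= s;                 /* 2621 -> 154 */

    printf("points after the CPU-time and run-time discounts: %lu\n", points);
    return 0;
}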
/*
* Niced processes are most likely less important, so double
* their badness points.
*/
if (task_nice(p) > 0)
    points *= 2;
If the process has a lowered priority (a positive nice value means lower priority, a negative value means higher priority), its points are doubled.
/*
* Superuser processes are usually more important, so we make it
* less likely that we kill those.
*/
if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_ADMIN) ||
    p->uid == 0 || p->euid == 0)
    points /= 4;
Processes run by the superuser (or holding CAP_SYS_ADMIN) have their score divided by 4, so they are less likely to be killed.
/*
* We don't want to kill a process with direct hardware access.
* Not only could that mess up the hardware, but usually users
* tend to only have this flag set on applications they think
* of as important.
*/
if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO))
    points /= 4;
Likewise, processes with direct access to raw hardware (CAP_SYS_RAWIO) have their score divided by 4 and are less likely to be killed.
/*
* Adjust the score by oomkilladj.
*/
if (p->oomkilladj) {
    if (p->oomkilladj > 0)
        points <<= p->oomkilladj;
    else
        points >>= -(p->oomkilladj);
}
Each process has an oomkilladj value through which the likelihood of being killed can be tuned. This parameter has a large effect on the points: the maximum oomkilladj is +15 and the minimum is -17, and the larger the value, the easier the process is to kill. Because the value is applied as a bit shift, its impact on the score is substantial.
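To see how dramatic a bit shift is compared with the multiplications and divisions above, here is a small standalone calculation; the starting value of 1000 points is arbitrary:

#include <stdio.h>

int main(void)
{
    unsigned long points = 1000;
    int oomkilladj;

    oomkilladj = 15;                              /* easiest to kill */
    printf("+15: %lu\n", points << oomkilladj);   /* 1000 * 2^15 = 32768000 */

    oomkilladj = -17;                             /* hardest to kill */
    printf("-17: %lu\n", points >> -oomkilladj);  /* 1000 / 2^17 = 0 */

    return 0;
}

With +15 a score of 1000 becomes 32,768,000, and with -17 it drops to 0, so the adjustment easily dominates everything computed before it.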
Let me write a small program to experiment:
#define MEGABYTE 1024*1024*1024   /* note: this is actually 1 GB */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>               /* for sleep() */

int main(int argc, char *argv[])
{
    char *myblock = NULL;

    /* Reserve 1 GB of virtual memory; nothing is touched yet. */
    myblock = (char *) malloc(MEGABYTE);
    printf("Currently allocating 1GB\n");
    sleep(1);

    int count = 0;
    while (count < 10) {
        /* Touch the next 100 MB so it is actually backed by RAM. */
        memset(myblock, 1, 100*1024*1024);
        myblock = myblock + 100*1024*1024;
        count++;
        printf("Currently allocating %d00 MB\n", count);
        sleep(10);
    }
    exit(0);
}
The program above first allocates a 1 GB memory block and then touches it in 100 MB increments. Run three instances of it on a machine with 2 GB of RAM and 400 MB of swap space and watch what happens:
test1, test2, and test3 each allocated 1 GB of virtual address space (VIRT), and every 10 seconds their resident memory (RES) grew by another 100 MB.
When physical memory ran short, the OS started swapping and the available swap space began to shrink.
When there was no memory left to allocate, the test1 process was killed by the operating system. From dmesg we can see that test1 was killed by the OS and that its oom_score was 1000.
The oom_adj of all three processes was left at the default value of 0. Now let's experiment with the effect of setting oom_adj. Restart the three processes; this time the PID of test2 is 12640.
Let's run the following command:
echo 15 > /proc/12640/oom_adj
After a while, the swap space dropped sharply and the OS's OOM Killer was about to kick in.
Sure enough, process 12640 was killed.
So, to prevent a process you need from being killed, you can set that process's oom_adj. Of course, some will say that all of this is caused by overcommitting memory: since Linux provides overcommit_memory to disable the overcommit feature, why not simply disable it? This has both advantages and disadvantages. Once overcommit is disabled, MySQL can no longer request more address space than the memory actually available. MySQL dynamically allocates memory in many places, and if such an allocation fails, MySQL crashes, which greatly increases the risk of MySQL downtime. That is also why Linux overcommits in the first place.
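As a sketch of the first approach, assuming an older kernel that still exposes /proc/<pid>/oom_adj, a small helper that protects a given process by writing -17 could look like this (on newer kernels the file is /proc/<pid>/oom_score_adj with a range of -1000 to 1000):

/* Sketch: make a process (e.g. mysqld) much less likely to be chosen by
 * the OOM Killer by writing -17 to its oom_adj. Requires root. */
#include <stdio.h>

int main(int argc, char *argv[])
{
    char path[64];
    FILE *fp;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }

    snprintf(path, sizeof(path), "/proc/%s/oom_adj", argv[1]);
    fp = fopen(path, "w");
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }
    fprintf(fp, "-17\n");
    fclose(fp);
    return 0;
}

Running it against the mysqld PID has the same effect as echo -17 > /proc/<pid>/oom_adj.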
With the above analysis, it is not hard to see that if oom_adj is not set, MySQL will usually be the OOM Killer's first choice, because MySQL is typically the largest consumer of memory. So, as MySQL, how can we reduce the risk of being killed? In the next chapter, we will focus on how to avoid OOM from MySQL's perspective.