
Linux kernel memory fragmentation prevention technology: in-depth understanding of memory management

WBOY
2024-02-12 09:54:15

Have you run into memory problems on Linux systems, such as memory leaks or memory fragmentation? Understanding how the Linux kernel prevents and repairs memory fragmentation makes such problems much easier to reason about.


The Linux kernel organizes and manages physical memory with the buddy system, and physical memory fragmentation is one of the buddy system's weaknesses. To prevent and repair fragmentation, the kernel has adopted several practical techniques, which are summarized here.

1 Consolidate fragments when memory is low

When pages are requested from the buddy allocator and no suitable page is found, the kernel performs two memory adjustment steps: compact and reclaim. The former consolidates fragments to obtain larger contiguous memory; the latter reclaims cache memory that does not have to stay resident. The focus here is on compact. The whole process is roughly as follows:

__alloc_pages_nodemask
  -> __alloc_pages_slowpath
    -> __alloc_pages_direct_compact
      -> try_to_compact_pages
        -> compact_zone_order
          -> compact_zone
            -> isolate_migratepages
            -> migrate_pages
            -> release_freepages
Not every failed allocation triggers compaction. First, order must be greater than 0 and gfp_mask must carry __GFP_FS and __GFP_IO. In addition, the free memory of the zone has to meet a condition that the kernel calls the "fragmentation index". This value lies between 0 and 1000, and by default compaction runs only when the index is greater than 500. The default can be adjusted through the proc file extfrag_threshold. The fragmentation index is calculated by the fragmentation_index function:

/*
 * Index is between 0 and 1000
 *
 * 0 => allocation would fail due to lack of memory
 * 1000 => allocation would fail due to fragmentation
 */
return 1000 - div_u64( (1000+(div_u64(info->free_pages * 1000ULL, requested))), info->free_blocks_total);
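To get a feel for how the formula behaves, the following user-space sketch repeats the same arithmetic with plain 64-bit integers. The names toy_free_info, toy_fragmentation_index and toy_index_allows_compaction are hypothetical and exist only for this illustration. The guard against an empty free list is an assumption about what the surrounding kernel code has to handle, since the excerpt above shows only the return statement.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's per-zone free-memory summary. */
struct toy_free_info {
	uint64_t free_pages;        /* total number of free pages */
	uint64_t free_blocks_total; /* number of free blocks of any order */
};

/* Same arithmetic as the return statement above, for a request of 2^order pages. */
int toy_fragmentation_index(unsigned int order, const struct toy_free_info *info)
{
	uint64_t requested = (uint64_t)1 << order;
	uint64_t scaled;

	/* Assumption: the caller deals with the "no free blocks" case. */
	assert(info->free_blocks_total > 0);

	scaled = (1000 + info->free_pages * 1000 / requested) / info->free_blocks_total;
	return 1000 - (int)scaled;
}

/* Compaction is considered only when the index exceeds the threshold (500 by default). */
int toy_index_allows_compaction(int index, int threshold)
{
	return index > threshold;
}
```

For an order-4 request (16 pages), 64 free pages scattered as 64 single-page blocks give an index of 922, which points to fragmentation. 8 free pages held in 2 blocks give 250, which points to a plain shortage of memory.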

While fragments are being consolidated, pages only move within their own zone, and pages at low addresses in the zone are moved toward the end of the zone as far as possible. The new location for each moved page is obtained through the compaction_alloc function.
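The direction of movement can be pictured with a toy model. The sketch below treats a zone as a small array and moves in-use movable pages from low indexes into free slots at high indexes until the two scan positions meet. It is only an illustration under simplifying assumptions: the type toy_page and both functions are hypothetical, and the real kernel works on whole pageblocks and migrates pages in batches rather than one array slot at a time.

```c
#include <assert.h>
#include <stddef.h>

enum toy_page { TOY_FREE, TOY_MOVABLE, TOY_UNMOVABLE };

/*
 * Move in-use movable pages from low indexes into free slots at high
 * indexes until the two scan positions meet. Returns the number of moves.
 */
size_t toy_compact(enum toy_page *zone, size_t n)
{
	size_t migrate = 0; /* next candidate page to move, scanning upward */
	size_t target = n;  /* one past the next candidate free slot, scanning downward */
	size_t moved = 0;

	while (migrate < target) {
		if (zone[migrate] != TOY_MOVABLE) {
			migrate++;
			continue;
		}
		/* Look for a free slot strictly above the page to be moved. */
		while (target - 1 > migrate && zone[target - 1] != TOY_FREE)
			target--;
		if (target - 1 <= migrate)
			break; /* the scanners met: nothing left to gain */

		zone[target - 1] = TOY_MOVABLE;
		zone[migrate] = TOY_FREE;
		target--;
		migrate++;
		moved++;
	}
	return moved;
}

/* Length of the longest run of consecutive free pages. */
size_t toy_longest_free_run(const enum toy_page *zone, size_t n)
{
	size_t best = 0, run = 0;

	for (size_t i = 0; i < n; i++) {
		run = (zone[i] == TOY_FREE) ? run + 1 : 0;
		if (run > best)
			best = run;
	}
	return best;
}
```

Starting from the layout movable, free, movable, free, unmovable, free, movable, free, two moves are enough to turn four isolated free pages into one free run of four pages at the start of the zone. The unmovable page stays where it is.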

Migration can run in synchronous or asynchronous mode. After an allocation fails, the first compaction attempt is asynchronous, and the attempt that follows reclaim is synchronous. The asynchronous mode only moves pages that can be migrated without blocking and skips the rest, whereas the synchronous mode walks the zone and waits for busy MOVABLE pages to become available before moving them.

2 Organize pages by mobility

Memory pages are divided into the following three types according to mobility:
UNMOVABLE: The location in memory is fixed and the page cannot be moved. Most memory allocated by the kernel itself belongs to this type;
RECLAIMABLE: Cannot be moved, but can be discarded and reclaimed, for example file-mapped memory;
MOVABLE: Can be moved freely. Most user-space memory belongs to this type.
When memory is requested, the allocator first looks in the free pages of the requested mobility type. The free memory of each zone is organized as follows:

struct zone {
    ......
    struct free_area free_area[MAX_ORDER];
    ......
};

struct free_area {
    struct list_head free_list[MIGRATE_TYPES];
    unsigned long nr_free;
};

When a request cannot be satisfied from the free_area lists of the requested type, pages can be taken from a fallback type. The pages taken this way are moved to the free list of the requesting type. The kernel calls this process "stealing".
The fallback priority list is defined as follows:

static int fallbacks[MIGRATE_TYPES][4] = {
    [MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
    [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
#ifdef CONFIG_CMA
    [MIGRATE_MOVABLE]     = { MIGRATE_CMA, MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
    [MIGRATE_CMA]         = { MIGRATE_RESERVE }, /* Never used */
#else
    [MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
#endif
    [MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* Never used */
#ifdef CONFIG_MEMORY_ISOLATION
    [MIGRATE_ISOLATE]     = { MIGRATE_RESERVE }, /* Never used */
#endif
};
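How such a table is consulted can be sketched in user space. The example below is a simplified model, not the kernel's implementation: it mirrors the non-CMA rows of the table, keeps only one free counter per type instead of per-order lists, and assumes that the walk stops at the reserve type, which would be handled separately. All toy_ names are hypothetical.

```c
#include <assert.h>

enum toy_type { TOY_UNMOVABLE, TOY_RECLAIMABLE, TOY_MOVABLE, TOY_RESERVE, TOY_TYPES };

/* Mirrors the non-CMA rows of the table above; each row ends at TOY_RESERVE. */
static const enum toy_type toy_fallbacks[TOY_TYPES][3] = {
	[TOY_UNMOVABLE]   = { TOY_RECLAIMABLE, TOY_MOVABLE,   TOY_RESERVE },
	[TOY_RECLAIMABLE] = { TOY_UNMOVABLE,   TOY_MOVABLE,   TOY_RESERVE },
	[TOY_MOVABLE]     = { TOY_RECLAIMABLE, TOY_UNMOVABLE, TOY_RESERVE },
	[TOY_RESERVE]     = { TOY_RESERVE,     TOY_RESERVE,   TOY_RESERVE },
};

/*
 * Take one block for `want`. Returns the type whose list supplied it,
 * or -1 if the preferred list and all fallbacks are empty.
 */
int toy_take_block(unsigned long nr_free[TOY_TYPES], enum toy_type want)
{
	if (nr_free[want] > 0) {
		nr_free[want]--;
		return (int)want;
	}
	for (int i = 0; i < 3; i++) {
		enum toy_type donor = toy_fallbacks[want][i];

		if (donor == TOY_RESERVE)
			break; /* assumption: the reserve is handled separately */
		if (nr_free[donor] > 0) {
			nr_free[donor]--;
			return (int)donor;
		}
	}
	return -1;
}
```

With no free unmovable or reclaimable blocks and five free movable blocks, a request for an unmovable block is served from the movable list, because that is the first non-empty entry in its fallback row.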

Note that organizing pages by mobility does not suit every system. When there is too little memory to give each type its own share, grouping by mobility should not be enabled. A global variable, set during memory initialization, records whether it is disabled:

void __ref build_all_zonelists(pg_data_t *pgdat, struct zone *zone)
{
    ......
    if (vm_total_pages < (pageblock_nr_pages * MIGRATE_TYPES))
        page_group_by_mobility_disabled = 1;
    else
        page_group_by_mobility_disabled = 0;
    ......
}

If page_group_by_mobility_disabled is set, all memory is treated as non-movable.
The parameter pageblock_nr_pages determines the minimum number of pages in each mobility-grouped memory area. It is defined as follows:

#ifdef CONFIG_HUGETLB_PAGE
#define pageblock_order HUGETLB_PAGE_ORDER
#else /* CONFIG_HUGETLB_PAGE */
/* If huge pages are not used, group by MAX_ORDER_NR_PAGES */
#define pageblock_order (MAX_ORDER-1)
#endif /* CONFIG_HUGETLB_PAGE */
#define pageblock_nr_pages (1UL << pageblock_order)
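A small worked example may help to show when a system counts as too small for grouping. The sketch assumes that the initialization check compares the total number of pages with one pageblock per migrate type, and it uses illustrative values only: a pageblock order of 9 and five migrate types, both of which depend on the architecture and kernel configuration. The toy_ functions are hypothetical helpers.

```c
#include <assert.h>

/* Pages per pageblock for a given pageblock order. */
unsigned long toy_pageblock_nr_pages(unsigned int pageblock_order)
{
	return 1UL << pageblock_order;
}

/*
 * Returns 1 when there are too few pages to give every migrate type
 * at least one pageblock, 0 otherwise.
 */
int toy_grouping_disabled(unsigned long total_pages,
			  unsigned int pageblock_order,
			  unsigned int migrate_types)
{
	return total_pages < toy_pageblock_nr_pages(pageblock_order) * migrate_types;
}
```

With these example values a pageblock holds 512 pages, so the threshold is 2560 pages, which is 10 MiB with 4 KiB pages. A machine with 2000 pages would run with grouping disabled, while one with 262144 pages (1 GiB) would not.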

During system initialization, all pages are marked MOVABLE:

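void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
        unsigned long start_pfn, enum memmap_context context)
{
    ......
    if ((z->zone_start_pfn <= pfn)
        && ......
        && !(pfn & (pageblock_nr_pages - 1)))
        set_pageblock_migratetype(page, MIGRATE_MOVABLE);
    ......
}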

Pages of the other mobility types appear later, through the "stealing" mentioned above. When this happens, the kernel usually steals the largest contiguous block available from the highest-priority fallback type, to avoid creating small fragments.

/* Remove an element from the buddy allocator from the fallback list */
static inline struct page *
__rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
{
    ......
    /* Find the largest possible block of pages in the other list */
    for (current_order = MAX_ORDER-1; current_order >= order;
                        --current_order) {
        for (i = 0;; i++) {
            migratetype = fallbacks[start_migratetype][i];
            ......
        }
    }
    ......
}
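The effect of searching from the largest order downward can be shown with a short sketch. It looks at a single donor list only and uses hypothetical names (TOY_MAX_ORDER, toy_pick_steal_order, toy_leftover_pages). The value 11 for the number of orders is an assumption chosen for the example.

```c
#include <assert.h>

#define TOY_MAX_ORDER 11 /* assumed number of orders, for illustration only */

/*
 * Search one donor list from the largest order down to `order`.
 * Returns the order of the block to steal, or -1 if none fits.
 */
int toy_pick_steal_order(const unsigned long nr_free[TOY_MAX_ORDER], int order)
{
	for (int current = TOY_MAX_ORDER - 1; current >= order; --current) {
		if (nr_free[current] > 0)
			return current;
	}
	return -1;
}

/* Pages left over after carving a 2^order request out of a 2^stolen block. */
unsigned long toy_leftover_pages(int stolen, int order)
{
	return (1UL << stolen) - (1UL << order);
}
```

If the donor list holds free blocks of order 1, 3 and 8, an order-2 request steals the order-8 block rather than the order-3 block. The remaining 252 pages stay together with the stolen pages, so later requests of the same type can be served there instead of nibbling at yet another block of a foreign type.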

You can view how pages of each type are currently distributed on the system through /proc/pagetypeinfo.
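One line of that file can be turned into a page count with a few lines of C. The sketch below assumes the line layout "Node N, zone NAME, type NAME" followed by one free-block count per order, starting at order 0. The function name toy_pagetypeinfo_free_pages is hypothetical, and lines in any other layout are simply rejected.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse one "Node N, zone NAME, type NAME c0 c1 c2 ..." line and return
 * the number of free pages it describes (sum of count << order).
 * The migrate type name is copied to type_out. Returns -1 for other lines.
 */
long toy_pagetypeinfo_free_pages(const char *line, char *type_out, size_t type_len)
{
	int node = 0, consumed = 0;
	char zone[32], type[32];
	long total = 0;

	assert(line != NULL && type_out != NULL && type_len > 0);

	if (sscanf(line, "Node %d, zone %31[^,], type %31s %n",
		   &node, zone, type, &consumed) < 3 || consumed == 0)
		return -1;
	snprintf(type_out, type_len, "%s", type);

	const char *p = line + consumed;
	for (unsigned int order = 0; order < 32; order++) {
		char *end;
		unsigned long count = strtoul(p, &end, 10);

		if (end == p)
			break; /* no more numbers on this line */
		total += (long)(count << order);
		p = end;
	}
	return total;
}
```

For a made-up sample line with the counts 3, 2, 1 and 0 for orders 0 to 3, the function reports 3 + 2*2 + 1*4 = 11 free pages for that migrate type.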

3 Virtual movable memory zone

Before grouping pages by mobility was introduced, another method had already been merged into the kernel: the virtual memory zone ZONE_MOVABLE. The basic idea is simple: divide memory into two parts, movable and non-movable.

enum zone_type {
#ifdef CONFIG_ZONE_DMA
    ZONE_DMA,
#endif
#ifdef CONFIG_ZONE_DMA32
    ZONE_DMA32,
#endif
    ZONE_NORMAL,
#ifdef CONFIG_HIGHMEM
    ZONE_HIGHMEM,
#endif
    ZONE_MOVABLE,
    __MAX_NR_ZONES
};

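Enabling ZONE_MOVABLE requires the kernel parameter kernelcore or movablecore. kernelcore specifies the amount of non-movable memory, and movablecore specifies the amount of movable memory. If both are given, the setting that yields the larger amount of non-movable memory wins. If neither is given, ZONE_MOVABLE is not enabled.
Unlike the other zones, ZONE_MOVABLE is not tied to any physical memory range. Its memory is taken from the highmem zone or the normal zone.
find_zone_movable_pfns_for_nodes computes the amount of ZONE_MOVABLE memory in each node. The memory usually comes from the highest zone of each node, as can be seen in the function find_usable_zone_for_movable.
When ZONE_MOVABLE memory is assigned to each node, kernelcore is spread evenly across the nodes: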
kernelcore_node = required_kernelcore / usable_nodes;
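The way the two parameters combine can be sketched as follows. This is a simplified model with hypothetical function names: all amounts are in pages, 0 stands for "parameter not given", and the rounding that a real kernel would apply is left out.

```c
#include <assert.h>

/*
 * Resolve kernelcore/movablecore (both in pages, 0 meaning "not given")
 * into the amount of non-movable memory. Returns 0 if ZONE_MOVABLE stays off.
 */
unsigned long toy_required_kernelcore(unsigned long total_pages,
				      unsigned long kernelcore,
				      unsigned long movablecore)
{
	if (movablecore) {
		/* Whatever is not movable has to be kernel core. */
		unsigned long corepages =
			movablecore < total_pages ? total_pages - movablecore : 0;

		if (corepages > kernelcore)
			kernelcore = corepages;
	}
	return kernelcore;
}

/* Even split across nodes, as in the kernelcore_node line above. */
unsigned long toy_kernelcore_per_node(unsigned long required_kernelcore,
				      unsigned int usable_nodes)
{
	assert(usable_nodes > 0);
	return required_kernelcore / usable_nodes;
}
```

With 1000000 pages in total, kernelcore set to 300000 and movablecore set to 500000, the non-movable amount becomes 500000 pages, because that is the larger of the two candidates. On a two-node system each node is then asked for 250000 pages of kernel core.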
When the kernel allocates pages, memory is taken from ZONE_MOVABLE if gfp_flag specifies both __GFP_HIGHMEM and __GFP_MOVABLE.
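The rule about the two flags can be written down as a tiny decision function. The bit values and names below are hypothetical and do not match the kernel's real flag encodings. The sketch covers only the two flags mentioned here and assumes that the movable flag on its own does not select ZONE_MOVABLE.

```c
#include <assert.h>

/* Hypothetical bit values; the kernel's real flag encodings differ. */
#define TOY_GFP_HIGHMEM 0x1u
#define TOY_GFP_MOVABLE 0x2u

enum toy_zone { TOY_ZONE_NORMAL, TOY_ZONE_HIGHMEM, TOY_ZONE_MOVABLE };

/* Pick the preferred zone for a request from its (toy) flags. */
enum toy_zone toy_gfp_zone(unsigned int flags)
{
	unsigned int both = TOY_GFP_HIGHMEM | TOY_GFP_MOVABLE;

	if ((flags & both) == both)
		return TOY_ZONE_MOVABLE;
	if (flags & TOY_GFP_HIGHMEM)
		return TOY_ZONE_HIGHMEM;
	return TOY_ZONE_NORMAL; /* assumption: movable alone stays in the normal zone */
}
```

Only a request that carries both flags lands in the movable zone, which matches the kind of request typically made for user-space pages.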

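In short, the Linux kernel's techniques for preventing memory fragmentation are an important topic, and understanding them gives a clearer picture of how memory management works in Linux systems.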


Statement:
This article is reproduced from lxlinux.net. In case of infringement, please contact admin@php.cn to request removal.