


The previous article analyzed the assembly startup process of RISC-V Linux, which mentioned that relocate redirection requires turning on the MMU. Today we analyze the page table creation of RISC-V Linux.
Note: This article is based on the linux5.10.111 kernel
##sv39 mmu
RISC-V Linux supportssv32,
sv39,
sv48 and other virtual address formats, which respectively represent 32-bit virtual address, 38-bit virtual address and 48-bit virtual address. address. RISC-V Linux also uses the sv39 format by default. The virtual address, physical address, and PTE format of sv39 are as follows:



The physical address is represented by 56 bits, the lower 12 bits represent the page offset, and the higher bits are the physical pages PPN[0], PPN[1] and PPN[2]
PTE saves the physical page PPN[0] , PPN[1] and PPN[2], corresponding to the PPN in the physical address; the lower 10 bits of the PTE represent the access rights of the physical address. When RWX is all 0, it means that the address stored in the PTE is The physical address of the next level page table, otherwise it means that the current page table is the last level page table .
Look at the page table format of sv39. sv39 uses a three-level page table, PGD
, PMD
and PTE
, each The level page table is represented by 9 bits, that is, each level page table has 512 page table entries.
In the code, create an array with 512 elements to represent a page table. A PTE has 512 page table entries, each page table entry occupies 8 bytes, 512*8=4096 bytes, so a PTE represents 4K. A PMD also has 512 page table entries, each entry can represent a PTE, 512 *4 K=2M, so a PMD represents 2M. By analogy, one PGD represents 512 * 2M = 1G.
Important conclusion: PGD represents 1G
, PMD represents 2M
, and PTE represents 4K
. The default page size of sv39 is 4K.
Schematic diagram of the process of converting the virtual address of the third-level page table to the physical address:
sv39 The process of converting the virtual address of the third-level page table to the physical address:
MMU passes satp The register obtains the physical address of PGD, and combines it with the PGD index (i.e., V PN[2]) to find the PMD; after finding the PMD, it then combines it with the PMD index (i.e., V PN[1]) to find the PTE, and then combines it with the PTE index (i.e., V PN[0] ]) Get the value of VA in the PTE index to get the physical address.
Finally, take out PPN[2], PPN[1] and PPN[0] from the PTE, and then add them to the low 12-bit offset of the virtual address to get the final physical address.
Temporary page table analysis
Before starting the MMU, you need to create kernel, dtb, trampoline and other page tables. So that after the MMU is turned on and before the memory management module is run, the kernel can be initialized normally and the dtb can be parsed normally. This part of the page table is a temporary page table, and the final page table is created in setup_vm_final().
Temporary page table creation sequence:
First create early PGD and PMD for fixmap. At this time, PGD uses early_pg_dir
. Then create a secondary page table for the first 2M of memory starting from the kernel. At this time, PGD uses trampoline_pg_dir
. The page table created for these 2M is also called superpage
. Then, create a secondary page table for the entire kernel. At this time, PGD uses early_pg_dir
. Finally, reserve 4M size for dtb to create a secondary page table.
Page table creation function
void __init create_pgd_mapping(pgd_t *pgdp,
uintptr_t va, phys_addr_t pa,
phys_addr_t sz, pgprot_t prot)
pgdp: PGD page table
va: virtual address
pa: physical address
sz
:映射大小,PGDIR_SIZE或PMD_SIZE或PTE_SIZE
prot
:PAGE_KERNEL_EXEC/PAGE_KERNEL表示当前是最后一级页表,否则pa代表下一级页表的物理地址
create_pmd_mapping()
static void __init create_pmd_mapping(pmd_t *pmdp, uintptr_t va, phys_addr_t pa, phys_addr_t sz, pgprot_t prot)
pmdp
:PMD页表
va
:虚拟地址
pa
:物理地址
sz
:映射大小,PMD_SIZE或PAGE_SIZE
prot
:权限,PAGE_KERNEL_EXEC/PAGE_KERNEL表示当前是最后一级页表,否则pa代表下一级页表的物理地址
create_pte_mapping()
static void __init create_pte_mapping(pte_t *ptep, uintptr_t va, phys_addr_t pa, phys_addr_t sz, pgprot_t prot)
ptep
:PTE页表
va
:虚拟地址
pa
:物理地址
sz
:映射大小,PAGE_SIZE
prot
:权限,PAGE_KERNEL_EXEC/PAGE_KERNEL表示当前是最后一级页表,否则pa代表下一级页表的物理地址
使用举例
例如,将虚拟地址PAGE_OFFSET映射到物理地址pa,映射大小为4K,创建三级页表PGD、PMD和PTE:
create_pgd_mapping(early_pg_dir,PAGE_OFFSET, (uintptr_t)early_pmd,PGDIR_SIZE,PAGE_TABLE); create_pmd_mapping(early_pmd,PAGE_OFFSET, (uintptr_t)early_pte,PGDIR_SIZE,PAGE_TABLE); create_pte_mapping(early_pte,PAGE_OFFSET, (uintptr_t)pa,PAGE_SIZE,PAGE_KERNEL_EXEC);
这样创建后,MMU就会根据PAGE_OFFSET在PGD中找到PMD,然后再PMD中找到PTE,最后取出物理地址。
页表创建源码分析
RISC-V Linux启动,经历了两次页表创建过程,第一次使用C函数setup_vm()
创建临时页表,第二次使用C函数setup_vm_final()
创建最终页表。
具体细节参考代码中的注释,下面的代码省略了一些不重要的部分。
setup_vm()
asmlinkage void __init setup_vm(uintptr_t dtb_pa) { uintptr_t va, pa, end_va; uintptr_t load_pa = (uintptr_t)(&_start); uintptr_t load_sz = (uintptr_t)(&_end) - load_pa; uintptr_t map_size; //load_pa就是kernel加载的其实物理地址 //load_sz就是kernel的实际大小 //page_offset就是kernel的起始物理地址对应的虚拟地址,va_pa_offset是他们的偏移量 va_pa_offset = PAGE_OFFSET - load_pa; //计算得到kernel起始物理地址的物理页,PFN_DOWN是将物理地址右移12位,因为sv39的物理地址的低12位是pa_offset,所以右移12位,得到pfn pfn_base = PFN_DOWN(load_pa); map_size = PMD_SIZE;//PMD_SIZE为2M,在当前,map_size只能为PGDIR_SIZE或PMD_SIZE。这时kernel默认不允许建立PTE。 //检查PAGE_OFFSET是否1G对齐,以及kernel入口地址是否2M对齐 BUG_ON((PAGE_OFFSET % PGDIR_SIZE) != 0); BUG_ON((load_pa % map_size) != 0); //allc_pte_early里面是BUG(),对于临时页表,kernel不允许我们建立PTE pt_ops.alloc_pte = alloc_pte_early; pt_ops.get_pte_virt = get_pte_virt_early; #ifndef __PAGETABLE_PMD_FOLDED pt_ops.alloc_pmd = alloc_pmd_early; pt_ops.get_pmd_virt = get_pmd_virt_early; #endif /* 设置 early PGD for fixmap */ create_pgd_mapping(early_pg_dir, FIXADDR_START, (uintptr_t)fixmap_pgd_next, PGDIR_SIZE, PAGE_TABLE); /* 设置 fixmap PMD */ create_pmd_mapping(fixmap_pmd, FIXADDR_START, (uintptr_t)fixmap_pte, PMD_SIZE, PAGE_TABLE); /* 设置 trampoline PGD and PMD */ create_pgd_mapping(trampoline_pg_dir, PAGE_OFFSET, (uintptr_t)trampoline_pmd, PGDIR_SIZE, PAGE_TABLE); create_pmd_mapping(trampoline_pmd, PAGE_OFFSET, load_pa, PMD_SIZE, PAGE_KERNEL_EXEC); /* * 设置覆盖整个内核的早期PGD,这将使我们能够达到paging_init()。 * 稍后在下面的 setup_vm_final() 中映射所有内存。 */ end_va = PAGE_OFFSET + load_sz; for (va = PAGE_OFFSET; va < end_va; va += map_size) create_pgd_mapping(early_pg_dir, va, load_pa + (va - PAGE_OFFSET), map_size, PAGE_KERNEL_EXEC); /* 为dtb创建早期的PMD */ create_pgd_mapping(early_pg_dir, DTB_EARLY_BASE_VA, (uintptr_t)early_dtb_pmd, PGDIR_SIZE, PAGE_TABLE); /* 为 FDT 早期扫描创建两个连续的 PMD 映射 */ pa = dtb_pa & ~(PMD_SIZE - 1); create_pmd_mapping(early_dtb_pmd, DTB_EARLY_BASE_VA, pa, PMD_SIZE, PAGE_KERNEL); create_pmd_mapping(early_dtb_pmd, DTB_EARLY_BASE_VA + PMD_SIZE, pa + PMD_SIZE, PMD_SIZE, PAGE_KERNEL); dtb_early_va = (void *)DTB_EARLY_BASE_VA + (dtb_pa & (PMD_SIZE - 1)); ...... }
setup_vm()在最开始就进行了kernel入口地址的对齐检查,要求入口地址2M对齐。假设内存起始地址为0x80000000,那么kernel只能放在0x80000000、0x80200000等2M对齐处。为什么会有这种对齐要求呢?
我猜测单纯是为给opensbi预留了2M空间,因为kernel之前还有opensbi,而opensbi运行完之后,默认跳转地址就是偏移2M,kernel只是为了跟opensbi对应,所以设置了2M对齐。
那opensbi需要占用2M这么大?实际上只需要几百KB,因此opensbi和kernel中间有一段内存是空闲的,没有人使用。这个问题我们下篇再讲。
setup_vm_final()
在该函数中开始为整个物理内存做内存映射,通过swapper
页表来管理,并且清除掉汇编阶段的页表。
static void __init setup_vm_final(void) { uintptr_t va, map_size; phys_addr_t pa, start, end; u64 i; /** * 此时MMU已经开启,但是页表还没完全建立。 */ pt_ops.alloc_pte = alloc_pte_fixmap; pt_ops.get_pte_virt = get_pte_virt_fixmap; #ifndef __PAGETABLE_PMD_FOLDED pt_ops.alloc_pmd = alloc_pmd_fixmap; pt_ops.get_pmd_virt = get_pmd_virt_fixmap; #endif /* Setup swapper PGD for fixmap */ create_pgd_mapping(swapper_pg_dir, FIXADDR_START, __pa_symbol(fixmap_pgd_next), PGDIR_SIZE, PAGE_TABLE); /* 为整个物理内存创建页表 */ for_each_mem_range(i, &start, &end) { if (start >= end) break; if (start <= __pa(PAGE_OFFSET) && __pa(PAGE_OFFSET) < end) start = __pa(PAGE_OFFSET); //best_map_size是选择合适的映射大小,kernel入口地址2M对齐或者kernel大小能被2M整除时,map_size就是2M,否则就是4K。 map_size = best_map_size(start, end - start); for (pa = start; pa < end; pa += map_size) { va = (uintptr_t)__va(pa); create_pgd_mapping(swapper_pg_dir, va, pa, map_size, PAGE_KERNEL_EXEC); } } /* 清除fixmap的PMD和PTE */ clear_fixmap(FIX_PTE); clear_fixmap(FIX_PMD); /* 切换到swapper页表,这个是最终的页表,汇编阶段relocate开启MMU的操作,跟下面这句是一样的。 */ csr_write(CSR_SATP, PFN_DOWN(__pa_symbol(swapper_pg_dir)) | SATP_MODE); local_flush_tlb_all();//刷新TLB ...... }
说明:
在setup_vm_final()函数中,通过swapper_pg_dir
页表来管理整个物理内存的访问。并且清除汇编阶段的页表fixmap_pte和early_pg_dir。(本质上就是把该页表项的内容清0,即赋值为0)
最终把swapper_pg_dir
页表的物理地址赋值给SATP
寄存器。这样CPU就可以通过该页表访问整个物理内存。
切换页表通过如下实现:
csr_write(CSR_SATP,PFN_DOWN(_pa(swapper_pg_dir))|SATP_MODE);
在swapper_pg_dir管理的kernel space中,其虚拟地址与物理地址空间的偏移是固定的,为va_pa_offset
(定义在arch/riscv/mm/init.c中的一个全局变量)
注意:swapper_pg_dir管理的是kernel space的页表,即它把物理内存映射到的虚拟地址空间是只能kernel访问的。user space不能访问,用户空间如果访问,必须自行建立页表,把物理地址映射到user space的虚拟地址空间。kernel线程共享这个swapper_pg_dir页表。
Summary
The page table creation when RISC-V Linux starts is relatively easy to understand. They are all created in C language, and the code is relatively small. The main two page table creation functions are setup_vm() and setup_vm_final(). After understanding some of the address formats of sv39, it will be easier to analyze the source code. However, the codes of different kernel versions are different and require detailed analysis of specific situations.
This article mentioned that setup_vm() will check whether the kernel entry address is 2M aligned. If it is not aligned, the kernel cannot start. But in fact, we can lift this 2M alignment restriction and make use of this part of the space. The next article will teach you Optimize this part of memory.
The above is the detailed content of RISC-V Linux startup page table creation analysis. For more information, please follow other related articles on the PHP Chinese website!

linux设备节点是应用程序和设备驱动程序沟通的一个桥梁;设备节点被创建在“/dev”,是连接内核与用户层的枢纽,相当于硬盘的inode一样的东西,记录了硬件设备的位置和信息。设备节点使用户可以与内核进行硬件的沟通,读写设备以及其他的操作。

区别:1、open是UNIX系统调用函数,而fopen是ANSIC标准中的C语言库函数;2、open的移植性没fopen好;3、fopen只能操纵普通正规文件,而open可以操作普通文件、网络套接字等;4、open无缓冲,fopen有缓冲。

端口映射又称端口转发,是指将外部主机的IP地址的端口映射到Intranet中的一台计算机,当用户访问外网IP的这个端口时,服务器自动将请求映射到对应局域网内部的机器上;可以通过使用动态或固定的公共网络IP路由ADSL宽带路由器来实现。

在linux中,eof是自定义终止符,是“END Of File”的缩写;因为是自定义的终止符,所以eof就不是固定的,可以随意的设置别名,linux中按“ctrl+d”就代表eof,eof一般会配合cat命令用于多行文本输出,指文件末尾。

在linux中,交叉编译是指在一个平台上生成另一个平台上的可执行代码,即编译源代码的平台和执行源代码编译后程序的平台是两个不同的平台。使用交叉编译的原因:1、目标系统没有能力在其上进行本地编译;2、有能力进行源代码编译的平台与目标平台不同。

在linux中,可以利用“rpm -qa pcre”命令判断pcre是否安装;rpm命令专门用于管理各项套件,使用该命令后,若结果中出现pcre的版本信息,则表示pcre已经安装,若没有出现版本信息,则表示没有安装pcre。

linux查询mac地址的方法:1、打开系统,在桌面中点击鼠标右键,选择“打开终端”;2、在终端中,执行“ifconfig”命令,查看输出结果,在输出信息第四行中紧跟“ether”单词后的字符串就是mac地址。

在linux中,rpc是远程过程调用的意思,是Reomote Procedure Call的缩写,特指一种隐藏了过程调用时实际通信细节的IPC方法;linux中通过RPC可以充分利用非共享内存的多处理器环境,提高系统资源的利用率。


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Safe Exam Browser
Safe Exam Browser is a secure browser environment for taking online exams securely. This software turns any computer into a secure workstation. It controls access to any utility and prevents students from using unauthorized resources.

SublimeText3 Linux new version
SublimeText3 Linux latest version

VSCode Windows 64-bit Download
A free and powerful IDE editor launched by Microsoft

Atom editor mac version download
The most popular open source editor

SublimeText3 Mac version
God-level code editing software (SublimeText3)
