Paging is a system which allows each process to see a full virtual address space, without actually requiring the full amount of physical memory to be available or present. 32-bit x86 processors support 32-bit virtual addresses and 4-GiB virtual address spaces, and current 64-bit processors support 48-bit virtual addressing and 256-TiB virtual address spaces.
- OSDev Wiki


Rationale

For a long time I struggled to visualize IA32e paging structures in my head. I always viewed paging as redundant. You’d think paging redundant on modern systems. Indeed, it is possible to implement virtual addressing, protection and session spacing with identity mapping.

The argument for memory efficiency is also not very convincing, but there aren’t many reasons for Microsoft to mess with things that already work well enough, not to mention that paging as a concept is more a vertical issue than just software alone.

From the malware perspective, acting on the underlying physical memory is far more powerful than using official API to adjust virtual memory. We are able to get around many of the restrictions placed on virtual memory. We can ignore session spacing and page protection. We can construct our own paging structures and allocate memory hidden from the Virtual Address Descriptor or VAD.

The following diagram describes the layout of an IA32e virtual address.

IA32e Paging Overview

Each page table is a list of 512 addresses. The first 256 are reserved for user addresses. There are four page tables. Also consider that 512 * sizeof(uintptr_t) == PAGE_SIZE. The base physical address of the PML4 is described by EPROCESS.DirectoryTableBase.


Address Translation

Paging relates to session spacing. When a thread goes from one address space to another via SwapContext, the CR3 is set. When the CPU wants to translate a physical address, it first checks if the translation already exists in the Translation Lookaside Buffer or TLB. If no translation exists, it walks the page tables.

Paging in long mode is similar to that of 32-bit paging, except Physical Address Extension (PAE) is required. Registers CR2 and CR3 are extended to 64 bits. Instead of just having to utilize 3 levels of page maps: page directory pointer table, page directory, and page table, a fourth page-map table is used: the level-4 page map table (PML4). This allows a processor to map 48-bit virtual addresses to 52-bit physical addresses.
- OSDev Wiki

As a sidenote, you can flush the TLB by toggling the CR4.PGE bit or firing the INVLPG instruction. User mode applications can use SwitchToThread() to force a context switch.

A word on naming conventions. Four-level paging is implemented in both IA32e and AMD64, but with different naming schemes. This often leads to confusion. In this paper we simplify to PML4, PML3, PML2 and PML1.

We translate by walking down the chain.

#define _pfn_to_page(pfn) (pfn << _page_shift)

u8* vtop(EPROCESS* proc, virt_addr_t va)
{
  // read from proc, but if it is currentproc we can obv __readcr3
  // procs can also obscure the DirectoryTableBase with tricks
  pml4e_t pml4e = read<pml4e_t>(_pfn_to_page(proc->DirectoryTableBase.pfn) + va.pml4_idx * sizeof(pml4e_t));
  if (!pml4e.present)
    return 0;
  
  pml3e_t pml3e = read<pml3e_t>(_pfn_to_page(pml4e.pfn) + va.pml3_idx * sizeof(pml3e_t))>
  if (!pml3e.present)
    return 0;

  pml2e_t pml2e = read<pml2e_t>(_pfn_to_page(pml3e.pfn) + va.pml2_idx * sizeof(pml2e_t))>
  if (!pml2e.present)
    return 0;

  pml1e_t pml1e = read<pml1e_t>(_pfn_to_page(pml2e.pfn) + va.pml1_idx * sizeof(pml1e_t))>
  if (!pml1e.present)
    return 0;
  
  // finally return pfn + page offset...
  return (u8*)(_pfn_to_page(pml1e.pfn) + va.offset);
}

Both PML3 and PML2 can have large page entries, describing 1GB and 2MB pages respectively if HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\LargePageDrivers is set. This is all outlined in my header file. We call these PML3e_LARGE and PML2e_LARGE. So in page table PML3, entries can describe the PML2 or a 1GB page directly. Similarly, entries in PML2 might describe the PML1 or a 2MB page directly. We can check for large page by checking the PageSize bit. I leave large page translation as an exercise to the reader.


Demonstration

PageView64 was created to visualize these structures. The program consists of the PageView64.sys driver and PageView64.exe client. The driver calls MmCopyMemory on IRP. We send the process ID and receive the 4KB page tables. The client uses ImGui with SFML.