
Linux and Huge Pages

Assuming one's kernel has been compiled with transparent huge page support, the current policy can be found in /sys/kernel/mm/transparent_hugepage/enabled.

~$ cat /sys/kernel/mm/transparent_hugepage/enabled 
[always] madvise never

The brackets indicate the current policy, and the file helpfully lists the three settings which root can write to that file to change the policy. There is some dispute over whether the better default is madvise, meaning that applications do not get huge pages unless they ask for them, or always, meaning that the kernel always tries to give applications huge pages.
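A program can also inspect the policy itself. The C sketch below extracts the bracketed word from the sysfs file shown above. The function names parse_thp_policy and read_thp_policy are hypothetical, chosen for illustration. The sketch returns an error rather than guessing if the file cannot be read, as will happen on a kernel without transparent huge page support.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Copy the word between the first '[' and the following ']' of a line
   such as "[always] madvise never" into out (outlen bytes).
   Returns 0 on success, -1 if there is no bracketed word or it does not fit. */
int parse_thp_policy(const char *line, char *out, size_t outlen)
{
    assert(line != NULL && out != NULL);
    const char *open = strchr(line, '[');
    if (open == NULL)
        return -1;
    const char *close = strchr(open + 1, ']');
    if (close == NULL)
        return -1;
    size_t n = (size_t)(close - open - 1);
    if (n + 1 > outlen)              /* need room for the terminating NUL */
        return -1;
    memcpy(out, open + 1, n);
    out[n] = '\0';
    return 0;
}

/* Read the current policy from sysfs into out. Returns -1 if the file
   cannot be read, e.g. on a kernel without transparent huge page support. */
int read_thp_policy(char *out, size_t outlen)
{
    char line[128];
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");
    if (f == NULL)
        return -1;
    char *got = fgets(line, sizeof line, f);
    fclose(f);
    return got != NULL ? parse_thp_policy(line, out, outlen) : -1;
}
```

Called from one's own main(), read_thp_policy() would yield "always" on the system in the example above. Changing the policy still requires root to write one of the three words to the same file.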

The Theory

A page, typically 4KB, is the granularity at which the OS can give a process memory. It is also the granularity of the page tables which describe the mapping of virtual addresses to physical addresses.
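The base page size need not be assumed: POSIX sysconf(_SC_PAGESIZE) reports it. This small sketch (the helper names are hypothetical) also rounds a request up to a whole number of pages, since that is the unit in which memory is handed out.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Size in bytes of the base page used for this process, as reported
   by POSIX sysconf(). */
size_t base_page_size(void)
{
    long sz = sysconf(_SC_PAGESIZE);
    assert(sz > 0);                  /* sysconf returns -1 on error */
    return (size_t)sz;
}

/* Number of pages needed to hold `bytes`: requests are rounded up to
   whole pages because memory is handed out in page-sized units. */
size_t pages_needed(size_t bytes, size_t page)
{
    return (bytes + page - 1) / page;
}
```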

Every memory access a program makes is via a virtual address, which must first be translated to a physical address. If this translation is not already cached in the TLB (translation lookaside buffer), there is a significant performance penalty, the size of which depends on whether the relevant parts of the page tables themselves are in the data cache. It is likely to be tens of clock cycles, and could be over a hundred.
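To make the translation concrete, the sketch below splits a virtual address into the indices the hardware uses when walking the page tables. It assumes x86-64 with four-level paging (9 index bits per level, so 512 entries per table), which the text does not state explicitly; the struct and function names are hypothetical. On a TLB miss a 4KB page needs four table lookups, whereas a 2MB page is found one level earlier, with the low 21 bits serving as the offset.

```c
#include <assert.h>
#include <stdint.h>

/* Indices used at each level of an x86-64 four-level page-table walk. */
struct pt_indices {
    unsigned pml4, pdpt, pd, pt;     /* 9 bits each: 512 entries per table */
    uint64_t offset;                 /* byte offset within the final page */
};

/* Split a virtual address as the hardware would for a 4KB page:
   four table lookups, then a 12-bit offset. */
struct pt_indices split_4k(uint64_t va)
{
    struct pt_indices ix;
    ix.pml4   = (unsigned)((va >> 39) & 0x1ff);
    ix.pdpt   = (unsigned)((va >> 30) & 0x1ff);
    ix.pd     = (unsigned)((va >> 21) & 0x1ff);
    ix.pt     = (unsigned)((va >> 12) & 0x1ff);
    ix.offset = va & 0xfff;
    assert(ix.pml4 < 512 && ix.pt < 512);
    return ix;
}

/* For a 2MB page the walk stops at the page directory: there is no
   page-table level, and the low 21 bits are the offset. */
struct pt_indices split_2m(uint64_t va)
{
    struct pt_indices ix = split_4k(va);
    ix.pt     = 0;                   /* unused for 2MB pages */
    ix.offset = va & 0x1fffff;
    return ix;
}
```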

Recent Intel CPUs (Haswell onwards) have last level TLBs whose entries can cache translations for either 4KB pages or 2MB "huge" pages. Haswell's has 1024 entries, so it can cover 4MB of address space with 4KB pages (less than the size of its last level data cache), or 2GB with huge pages.
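The arithmetic behind those figures is simply the number of entries multiplied by the page size, often called the TLB reach. A trivial C sketch, with the function and macro names invented for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define KB(x) ((uint64_t)(x) << 10)
#define MB(x) ((uint64_t)(x) << 20)
#define GB(x) ((uint64_t)(x) << 30)

/* TLB reach: the amount of address space whose translations the TLB
   can hold at once, i.e. entries multiplied by page size. */
uint64_t tlb_reach(uint64_t entries, uint64_t page_bytes)
{
    assert(entries > 0 && page_bytes > 0);
    return entries * page_bytes;
}
```

tlb_reach(1024, KB(4)) gives 4MB and tlb_reach(1024, MB(2)) gives 2GB, matching the Haswell figures above.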

Even when simply streaming through arrays far too large to be cached, 4KB pages lead to a TLB miss every 4KB (512 doubles) read, whereas with 2MB pages there is one miss every 262,144 doubles, which reduces the problem to insignificance.
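The same reasoning gives the number of misses for one pass over an array. This sketch assumes the array starts on a page boundary and that no translation is already cached, and it ignores any prefetching of translations by the hardware, so it is an estimate rather than a measurement; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Estimated TLB misses for one pass through n doubles starting on a page
   boundary: one miss per page touched, assuming no translation is cached
   beforehand and ignoring any hardware prefetching of translations. */
uint64_t streaming_tlb_misses(uint64_t n_doubles, uint64_t page_bytes)
{
    assert(page_bytes % sizeof(double) == 0);
    uint64_t per_page = page_bytes / sizeof(double);  /* 512 for 4KB, 262144 for 2MB */
    return (n_doubles + per_page - 1) / per_page;     /* pages touched, rounded up */
}
```

For 1GB of doubles it gives 262,144 misses with 4KB pages but only 512 with 2MB pages.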

The Problem

The use of huge pages requires the OS to merge consecutive 4KB pages to create them, and sometimes to split them back into 4KB pages again. This takes time. It also makes the granularity at which unused memory can be detected much coarser (2MB rather than 4KB), which can increase physical memory use.
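The effect on memory use can be seen with a worked example. Suppose a program writes one byte every 64KB across a freshly allocated 1GB region. The sketch below (hypothetical function name) counts the pages touched, under the simplifying assumption that each touched page is backed in full on first touch. Whether the kernel really backs a given 2MB region with a huge page depends on the policy, alignment and the availability of contiguous physical memory.

```c
#include <assert.h>
#include <stdint.h>

/* Physical memory committed when a program writes one byte every
   `stride` bytes across `len` bytes of fresh, page-aligned memory,
   assuming every touched page is backed in full on first touch. */
uint64_t resident_after_strided_touch(uint64_t len, uint64_t stride,
                                      uint64_t page_bytes)
{
    assert(stride > 0 && page_bytes > 0);
    uint64_t pages = 0, last = UINT64_MAX;
    for (uint64_t off = 0; off < len; off += stride) {
        uint64_t page = off / page_bytes;
        if (page != last) {          /* offsets increase, so a change means a new page */
            pages++;
            last = page;
        }
    }
    return pages * page_bytes;
}
```

With 4KB pages this program commits 64MB; if every touched 2MB region becomes a huge page, it commits the whole 1GB.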

The result is that some applications run faster with huge pages, and some slower. Depending on the sort of applications one cares about, one might conclude that huge pages should be unconditionally turned on, or that they should be enabled only on request.

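C/C++ programmers can call madvise(...,MADV_HUGEPAGE), or, more likely, mmap(...,MAP_ANONYMOUS|MAP_HUGETLB...) when they know that they want huge pages. Note that MAP_HUGETLB does not use transparent huge pages: it draws on a separate pool of huge pages which root must reserve in advance, and the mmap() fails if too few are available. It is less clear what Fortran (or Java or Python) programmers should do. (And C/C++ programmers should be aware that both MADV_HUGEPAGE and MAP_HUGETLB are Linux-specific, which is not ideal.)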
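A minimal sketch of requesting huge pages from C on Linux, assuming the common x86-64 huge page size of 2MB. The function alloc_huge is hypothetical. It first tries mmap() with MAP_HUGETLB, which only succeeds if root has reserved enough huge pages (for example via /proc/sys/vm/nr_hugepages). Otherwise it falls back to an ordinary anonymous mapping, aligned to 2MB and marked with madvise(MADV_HUGEPAGE), so that the kernel may use transparent huge pages under either the always or the madvise policy.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define HUGE_2MB ((size_t)2 << 20)   /* assumed huge page size */

/* Round x up to a multiple of a, which must be a power of two. */
static uintptr_t round_up(uintptr_t x, uintptr_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/* Allocate at least `bytes` of zeroed memory, preferring huge pages.
   On success returns the address and stores in *len_out the length to
   pass to munmap(); returns NULL on failure. */
void *alloc_huge(size_t bytes, size_t *len_out)
{
    if (bytes == 0)
        return NULL;
    size_t len = round_up(bytes, HUGE_2MB);

    /* 1. Explicit huge pages from the reserved pool; this fails with
          ENOMEM unless root has reserved enough of them. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED) {
        *len_out = len;
        return p;
    }

    /* 2. Ordinary pages, over-allocated by 2MB so that a 2MB-aligned
          window of len bytes can be kept and the ends returned. */
    size_t span = len + HUGE_2MB;
    char *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;
    char *aligned = (char *)round_up((uintptr_t)raw, HUGE_2MB);
    size_t head = (size_t)(aligned - raw);
    size_t tail = span - head - len;
    if (head > 0)
        munmap(raw, head);
    if (tail > 0)
        munmap(aligned + len, tail);

    /* 3. Ask for transparent huge pages; this is advisory only, so a
          failure (e.g. EINVAL on a kernel without THP) is ignored. */
#ifdef MADV_HUGEPAGE
    (void)madvise(aligned, len, MADV_HUGEPAGE);
#endif
    *len_out = len;
    return aligned;
}
```

The caller releases the memory with munmap() using the length returned. The 2MB alignment matters because a transparent huge page can only back a 2MB-aligned range of virtual addresses, so an unaligned mapping of the same size may end up with fewer huge pages.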