The previous non-rsvd max/limit_in_bytes does not account for reserved
huge page memory, making it possible for a process to reserve all the
huge page memory without being able to allocate it (due to hugetlb
cgroup page fault accounting restrictions).

In practice this makes it possible to successfully mmap more huge page
memory than the cgroup settings allow, but the process gets a SIGBUS
and crashes when it uses the memory. This is bad for applications that
mmap at startup: the mmap succeeds, but the program crashes once it
starts using the memory. PostgreSQL, for example, does this by default.

This patch updates and clarifies `LinuxResources.HugepageLimits` and
`LinuxHugepageLimit` by defaulting the configuration to the rsvd hugetlb
cgroup (when supported) and falling back to page fault accounting if
not supported.
Fixes #1050
Signed-off-by: Kailun Qin <[email protected]>
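
The failure mode described in the commit message can be illustrated with a toy model. This is only a sketch: the `group` type, its methods, and the error values are hypothetical names made up for illustration and do not correspond to any kernel or runtime API. It assumes a limit counted in pages and ignores memory mapped without a reservation.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical error values for the toy model; not real kernel errors.
var (
	errReserveDenied = errors.New("reservation denied: mmap fails up front")
	errFaultDenied   = errors.New("fault denied: process would get SIGBUS")
)

// group is a toy model of one cgroup with a huge page limit, in pages.
type group struct {
	limit    int  // maximum number of pages the group may be charged
	rsvd     bool // true: charge at reservation time; false: charge at fault time
	reserved int  // pages reserved (mmap'd) so far
	faulted  int  // pages actually touched so far
}

// reserve models an mmap of n huge pages.
// Only reservation accounting checks the limit at this point.
func (g *group) reserve(n int) error {
	if g.rsvd && g.reserved+n > g.limit {
		return errReserveDenied
	}
	g.reserved += n
	return nil
}

// touch models the first access to n previously reserved pages.
// Only page fault accounting checks the limit at this point.
func (g *group) touch(n int) error {
	if !g.rsvd && g.faulted+n > g.limit {
		return errFaultDenied
	}
	g.faulted += n
	return nil
}

// run reserves pages and then touches them, like an application that
// maps its memory at startup and uses it later.
func run(g *group, pages int) error {
	if err := g.reserve(pages); err != nil {
		return err // the application could fall back to regular memory here
	}
	return g.touch(pages)
}

func main() {
	for _, rsvd := range []bool{false, true} {
		err := run(&group{limit: 4, rsvd: rsvd}, 8)
		fmt.Printf("rsvd=%v: %v\n", rsvd, err)
	}
}
```

In this model, page fault accounting lets the oversized reservation succeed and fails only on use, while reservation accounting rejects the request at the point where the application can still choose an alternative.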
-**`hugepageLimits`** (array of objects, OPTIONAL) represents the `hugetlb` controller which allows to limit the
-HugeTLB usage per control group and enforces the controller limit during page fault.
+**`hugepageLimits`** (array of objects, OPTIONAL) represents the `hugetlb` controller, which allows limiting the HugeTLB reservations (if supported) or usage (page fault).
+By default, if supported by the kernel, `hugepageLimits` defines the hugepage sizes and limits for HugeTLB controller
+reservation accounting, which allows limiting the HugeTLB reservations per control group and enforces the controller
+limit at reservation time and at the fault of HugeTLB memory for which no reservation exists.
+Otherwise, if not supported by the kernel, this should fall back to page fault accounting, which allows users to limit
+the HugeTLB usage (page fault) per control group and enforces the limit during page fault.
+
+Note that reservation limits are superior to page fault limits, since reservation limits are enforced at reservation
+time (on mmap or shmget) and never cause the application to get a SIGBUS signal if the memory was reserved beforehand.
+This allows for easier fallback to alternatives such as non-HugeTLB memory. In the case of page fault
+accounting, it's very hard to avoid processes getting SIGBUS, since the sysadmin needs to know precisely the HugeTLB usage
+of all the tasks in the system and make sure there are enough pages to satisfy all requests. Avoiding tasks getting
+SIGBUS on overcommitted systems is practically impossible with page fault accounting.
+
 For more information, see the kernel cgroups documentation about [HugeTLB][cgroup-v1-hugetlb].