Commit 0ce20dd

ramosian-glider authored and torvalds committed
mm: add Kernel Electric-Fence infrastructure
Patch series "KFENCE: A low-overhead sampling-based memory safety error detector", v7.

This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a low-overhead sampling-based memory safety error detector of heap use-after-free, invalid-free, and out-of-bounds access errors. This series enables KFENCE for the x86 and arm64 architectures, and adds KFENCE hooks to the SLAB and SLUB allocators.

KFENCE is designed to be enabled in production kernels, and has near zero performance overhead. Compared to KASAN, KFENCE trades precision for performance. The main motivation behind KFENCE's design is that with enough total uptime KFENCE will detect bugs in code paths not typically exercised by non-production test workloads. One way to quickly achieve a large enough total uptime is to deploy the tool across a large fleet of machines.

KFENCE objects each reside on a dedicated page, at either the left or right page boundaries. The pages to the left and right of the object page are "guard pages", whose attributes are changed to a protected state, and cause page faults on any attempted access to them. Such page faults are then intercepted by KFENCE, which handles the fault gracefully by reporting a memory access error.

Guarded allocations are set up based on a sample interval (can be set via kfence.sample_interval). After expiration of the sample interval, the next allocation through the main allocator (SLAB or SLUB) returns a guarded allocation from the KFENCE object pool. At this point, the timer is reset, and the next allocation is set up after the expiration of the interval.

To enable/disable a KFENCE allocation through the main allocator's fast-path without overhead, KFENCE relies on static branches via the static keys infrastructure. The static branch is toggled to redirect the allocation to KFENCE.

The KFENCE memory pool is of fixed size, and if the pool is exhausted no further KFENCE allocations occur.
The default config is conservative with only 255 objects, resulting in a pool size of 2 MiB (with 4 KiB pages).

We have verified by running synthetic benchmarks (sysbench I/O, hackbench) and production server-workload benchmarks that a kernel with KFENCE (using sample intervals 100-500ms) is performance-neutral compared to a non-KFENCE baseline kernel.

KFENCE is inspired by GWP-ASan [1], a userspace tool with similar properties. The name "KFENCE" is a homage to the Electric Fence Malloc Debugger [2].

For more details, see Documentation/dev-tools/kfence.rst added in the series -- also viewable here:

  https://raw.githubusercontent.com/google/kasan/kfence/Documentation/dev-tools/kfence.rst

[1] http://llvm.org/docs/GwpAsan.html
[2] https://linux.die.net/man/3/efence

This patch (of 9):

This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a low-overhead sampling-based memory safety error detector of heap use-after-free, invalid-free, and out-of-bounds access errors.

KFENCE is designed to be enabled in production kernels, and has near zero performance overhead. Compared to KASAN, KFENCE trades precision for performance. The main motivation behind KFENCE's design is that with enough total uptime KFENCE will detect bugs in code paths not typically exercised by non-production test workloads. One way to quickly achieve a large enough total uptime is to deploy the tool across a large fleet of machines.

KFENCE objects each reside on a dedicated page, at either the left or right page boundaries. The pages to the left and right of the object page are "guard pages", whose attributes are changed to a protected state, and cause page faults on any attempted access to them. Such page faults are then intercepted by KFENCE, which handles the fault gracefully by reporting a memory access error. To detect out-of-bounds writes to memory within the object's page itself, KFENCE also uses pattern-based redzones.
The following figure illustrates the page layout:

  ---+-----------+-----------+-----------+-----------+-----------+---
     | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
     | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
     | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
     | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
     | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
     | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
  ---+-----------+-----------+-----------+-----------+-----------+---

Guarded allocations are set up based on a sample interval (can be set via kfence.sample_interval). After expiration of the sample interval, a guarded allocation from the KFENCE object pool is returned to the main allocator (SLAB or SLUB). At this point, the timer is reset, and the next allocation is set up after the expiration of the interval. To enable/disable a KFENCE allocation through the main allocator's fast-path without overhead, KFENCE relies on static branches via the static keys infrastructure. The static branch is toggled to redirect the allocation to KFENCE.

To date, we have verified by running synthetic benchmarks (sysbench I/O, hackbench) that a kernel compiled with KFENCE is performance-neutral compared to the non-KFENCE baseline.

For more details, see Documentation/dev-tools/kfence.rst (added later in the series).
[[email protected]: fix parameter description for kfence_object_start()]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: avoid stalling work queue task without allocations]
Link: https://lkml.kernel.org/r/CADYN=9J0DQhizAGB0-jz4HOBBh+05kMBXb4c0cXMS7Qi5NAJiw@mail.gmail.com
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: fix potential deadlock due to wake_up()]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: add option to use KFENCE without static keys]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: add missing copyright and description headers]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Marco Elver <[email protected]>
Signed-off-by: Alexander Potapenko <[email protected]>
Reviewed-by: Dmitry Vyukov <[email protected]>
Reviewed-by: SeongJae Park <[email protected]>
Co-developed-by: Marco Elver <[email protected]>
Reviewed-by: Jann Horn <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christopher Lameter <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Joern Engel <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
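
The pool size quoted in the message (255 objects, 2 MiB with 4 KiB pages) can be checked with a little arithmetic. What follows is a minimal userspace sketch, not kernel code. All `sketch_`/`SKETCH_` names are hypothetical, the 4 KiB page size is an assumption, and the page-index layout (two leading guard pages, objects on even pages, guards on odd pages) is an assumed reading of the figure and of the KFENCE_POOL_SIZE definition, not something the message states.

```c
#include <assert.h>

/* Assumed for illustration: 4 KiB pages and the default object count. */
#define SKETCH_PAGE_SIZE   4096UL
#define SKETCH_NUM_OBJECTS 255UL

/* Two pages per object plus one extra leading pair, as in KFENCE_POOL_SIZE. */
unsigned long sketch_pool_size(unsigned long num_objects)
{
	return (num_objects + 1) * 2 * SKETCH_PAGE_SIZE;
}

/* Assumption: object i occupies page 2 * (i + 1) of the pool. */
unsigned long sketch_object_page(unsigned long index)
{
	return 2 * (index + 1);
}

/* Assumption: pages 0 and 1, and every odd page after them, are guards. */
int sketch_is_guard_page(unsigned long page)
{
	return page < 2 || (page % 2) == 1;
}

void sketch_self_check(void)
{
	/* 256 page pairs of 8 KiB each give the 2 MiB quoted in the message. */
	assert(sketch_pool_size(SKETCH_NUM_OBJECTS) == 2UL * 1024 * 1024);
	/* The last object page is followed by one final guard page. */
	assert(sketch_is_guard_page(sketch_object_page(SKETCH_NUM_OBJECTS - 1) + 1));
}
```

Under these assumptions every object page has a guard page on both sides, which is why an access just past either end of an object page faults.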
1 parent 8700539 commit 0ce20dd

9 files changed

+1484
-0
lines changed

include/linux/kfence.h

Lines changed: 216 additions & 0 deletions
@@ -0,0 +1,216 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Kernel Electric-Fence (KFENCE). Public interface for allocator and fault
+ * handler integration. For more info see Documentation/dev-tools/kfence.rst.
+ *
+ * Copyright (C) 2020, Google LLC.
+ */
+
+#ifndef _LINUX_KFENCE_H
+#define _LINUX_KFENCE_H
+
+#include <linux/mm.h>
+#include <linux/types.h>
+
+#ifdef CONFIG_KFENCE
+
+/*
+ * We allocate an even number of pages, as it simplifies calculations to map
+ * address to metadata indices; effectively, the very first page serves as an
+ * extended guard page, but otherwise has no special purpose.
+ */
+#define KFENCE_POOL_SIZE ((CONFIG_KFENCE_NUM_OBJECTS + 1) * 2 * PAGE_SIZE)
+extern char *__kfence_pool;
+
+#ifdef CONFIG_KFENCE_STATIC_KEYS
+#include <linux/static_key.h>
+DECLARE_STATIC_KEY_FALSE(kfence_allocation_key);
+#else
+#include <linux/atomic.h>
+extern atomic_t kfence_allocation_gate;
+#endif
+
+/**
+ * is_kfence_address() - check if an address belongs to KFENCE pool
+ * @addr: address to check
+ *
+ * Return: true or false depending on whether the address is within the KFENCE
+ * object range.
+ *
+ * KFENCE objects live in a separate page range and are not to be intermixed
+ * with regular heap objects (e.g. KFENCE objects must never be added to the
+ * allocator freelists). Failing to do so may and will result in heap
+ * corruptions, therefore is_kfence_address() must be used to check whether
+ * an object requires specific handling.
+ *
+ * Note: This function may be used in fast-paths, and is performance critical.
+ * Future changes should take this into account; for instance, we want to avoid
+ * introducing another load and therefore need to keep KFENCE_POOL_SIZE a
+ * constant (until immediate patching support is added to the kernel).
+ */
+static __always_inline bool is_kfence_address(const void *addr)
+{
+	/*
+	 * The non-NULL check is required in case the __kfence_pool pointer was
+	 * never initialized; keep it in the slow-path after the range-check.
+	 */
+	return unlikely((unsigned long)((char *)addr - __kfence_pool) < KFENCE_POOL_SIZE && addr);
+}
+
+/**
+ * kfence_alloc_pool() - allocate the KFENCE pool via memblock
+ */
+void __init kfence_alloc_pool(void);
+
+/**
+ * kfence_init() - perform KFENCE initialization at boot time
+ *
+ * Requires that kfence_alloc_pool() was called before. This sets up the
+ * allocation gate timer, and requires that workqueues are available.
+ */
+void __init kfence_init(void);
+
+/**
+ * kfence_shutdown_cache() - handle shutdown_cache() for KFENCE objects
+ * @s: cache being shut down
+ *
+ * Before shutting down a cache, one must ensure there are no remaining objects
+ * allocated from it. Because KFENCE objects are not referenced from the cache
+ * directly, we need to check them here.
+ *
+ * Note that shutdown_cache() is internal to SL*B, and kmem_cache_destroy() does
+ * not return if allocated objects still exist: it prints an error message and
+ * simply aborts destruction of a cache, leaking memory.
+ *
+ * If the only such objects are KFENCE objects, we will not leak the entire
+ * cache, but instead try to provide more useful debug info by making allocated
+ * objects "zombie allocations". Objects may then still be used or freed (which
+ * is handled gracefully), but usage will result in showing KFENCE error reports
+ * which include stack traces to the user of the object, the original allocation
+ * site, and caller to shutdown_cache().
+ */
+void kfence_shutdown_cache(struct kmem_cache *s);
+
+/*
+ * Allocate a KFENCE object. Allocators must not call this function directly,
+ * use kfence_alloc() instead.
+ */
+void *__kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags);
+
+/**
+ * kfence_alloc() - allocate a KFENCE object with a low probability
+ * @s: struct kmem_cache with object requirements
+ * @size: exact size of the object to allocate (can be less than @s->size
+ * e.g. for kmalloc caches)
+ * @flags: GFP flags
+ *
+ * Return:
+ * * NULL - must proceed with allocating as usual,
+ * * non-NULL - pointer to a KFENCE object.
+ *
+ * kfence_alloc() should be inserted into the heap allocation fast path,
+ * allowing it to transparently return KFENCE-allocated objects with a low
+ * probability using a static branch (the probability is controlled by the
+ * kfence.sample_interval boot parameter).
+ */
+static __always_inline void *kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags)
+{
+#ifdef CONFIG_KFENCE_STATIC_KEYS
+	if (static_branch_unlikely(&kfence_allocation_key))
+#else
+	if (unlikely(!atomic_read(&kfence_allocation_gate)))
+#endif
+		return __kfence_alloc(s, size, flags);
+	return NULL;
+}
+
+/**
+ * kfence_ksize() - get actual amount of memory allocated for a KFENCE object
+ * @addr: pointer to a heap object
+ *
+ * Return:
+ * * 0 - not a KFENCE object, must call __ksize() instead,
+ * * non-0 - this many bytes can be accessed without causing a memory error.
+ *
+ * kfence_ksize() returns the number of bytes requested for a KFENCE object at
+ * allocation time. This number may be less than the object size of the
+ * corresponding struct kmem_cache.
+ */
+size_t kfence_ksize(const void *addr);
+
+/**
+ * kfence_object_start() - find the beginning of a KFENCE object
+ * @addr: address within a KFENCE-allocated object
+ *
+ * Return: address of the beginning of the object.
+ *
+ * SL[AU]B-allocated objects are laid out within a page one by one, so it is
+ * easy to calculate the beginning of an object given a pointer inside it and
+ * the object size. The same is not true for KFENCE, which places a single
+ * object at either end of the page. This helper function is used to find the
+ * beginning of a KFENCE-allocated object.
+ */
+void *kfence_object_start(const void *addr);
+
+/**
+ * __kfence_free() - release a KFENCE heap object to KFENCE pool
+ * @addr: object to be freed
+ *
+ * Requires: is_kfence_address(addr)
+ *
+ * Release a KFENCE object and mark it as freed.
+ */
+void __kfence_free(void *addr);
+
+/**
+ * kfence_free() - try to release an arbitrary heap object to KFENCE pool
+ * @addr: object to be freed
+ *
+ * Return:
+ * * false - object doesn't belong to KFENCE pool and was ignored,
+ * * true - object was released to KFENCE pool.
+ *
+ * Release a KFENCE object and mark it as freed. May be called on any object,
+ * even non-KFENCE objects, to simplify integration of the hooks into the
+ * allocator's free codepath. The allocator must check the return value to
+ * determine if it was a KFENCE object or not.
+ */
+static __always_inline __must_check bool kfence_free(void *addr)
+{
+	if (!is_kfence_address(addr))
+		return false;
+	__kfence_free(addr);
+	return true;
+}
+
+/**
+ * kfence_handle_page_fault() - perform page fault handling for KFENCE pages
+ * @addr: faulting address
+ *
+ * Return:
+ * * false - address outside KFENCE pool,
+ * * true - page fault handled by KFENCE, no additional handling required.
+ *
+ * A page fault inside KFENCE pool indicates a memory error, such as an
+ * out-of-bounds access, a use-after-free or an invalid memory access. In these
+ * cases KFENCE prints an error message and marks the offending page as
+ * present, so that the kernel can proceed.
+ */
+bool __must_check kfence_handle_page_fault(unsigned long addr);
+
+#else /* CONFIG_KFENCE */
+
+static inline bool is_kfence_address(const void *addr) { return false; }
+static inline void kfence_alloc_pool(void) { }
+static inline void kfence_init(void) { }
+static inline void kfence_shutdown_cache(struct kmem_cache *s) { }
+static inline void *kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags) { return NULL; }
+static inline size_t kfence_ksize(const void *addr) { return 0; }
+static inline void *kfence_object_start(const void *addr) { return NULL; }
+static inline void __kfence_free(void *addr) { }
+static inline bool __must_check kfence_free(void *addr) { return false; }
+static inline bool __must_check kfence_handle_page_fault(unsigned long addr) { return false; }
+
+#endif
+
+#endif /* _LINUX_KFENCE_H */
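
Two patterns in this header may be easier to follow in isolation: the single-comparison range check in is_kfence_address(), and the way an allocator's free path is expected to consult kfence_free() first. The following is a minimal userspace sketch, not kernel code. Every `demo_` name is hypothetical, the 64-byte pool stands in for the real one, and the offset is computed through `uintptr_t` (not by the pointer subtraction the header uses) so the sketch stays well-defined in portable C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_POOL_SIZE 64u

static char demo_pool_storage[DEMO_POOL_SIZE];
char *demo_pool = demo_pool_storage;	/* hypothetical stand-in for __kfence_pool */
int demo_pool_frees;			/* counts objects released to the pool */

bool demo_is_pool_address(const void *addr)
{
	/*
	 * Unsigned wrap-around turns "addr below the pool" into a huge offset,
	 * so one comparison covers both the lower and the upper bound.
	 */
	uintptr_t offset = (uintptr_t)addr - (uintptr_t)demo_pool;

	return offset < DEMO_POOL_SIZE && addr != NULL;
}

/* Mirrors the shape of kfence_free(): report whether the pool took the object. */
bool demo_pool_free(void *addr)
{
	if (!demo_is_pool_address(addr))
		return false;
	demo_pool_frees++;
	return true;
}

/* Hypothetical allocator free path: pool objects never reach free(). */
void demo_allocator_free(void *addr)
{
	if (demo_pool_free(addr))
		return;
	free(addr);
}

void demo_self_check(void)
{
	assert(demo_is_pool_address(demo_pool));
	assert(!demo_is_pool_address(demo_pool + DEMO_POOL_SIZE));
	assert(!demo_is_pool_address(NULL));
}
```

The early return in `demo_allocator_free()` is the point of the `__must_check` annotation on kfence_free(): ignoring the result would hand a pool object to the regular allocator, which the header comment warns leads to heap corruption.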

init/main.c

Lines changed: 3 additions & 0 deletions
@@ -40,6 +40,7 @@
 #include <linux/security.h>
 #include <linux/smp.h>
 #include <linux/profile.h>
+#include <linux/kfence.h>
 #include <linux/rcupdate.h>
 #include <linux/moduleparam.h>
 #include <linux/kallsyms.h>
@@ -824,6 +825,7 @@ static void __init mm_init(void)
 	 */
 	page_ext_init_flatmem();
 	init_mem_debugging_and_hardening();
+	kfence_alloc_pool();
 	report_meminit();
 	mem_init();
 	/* page_owner must be initialized after buddy is ready */
@@ -955,6 +957,7 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void)
 	hrtimers_init();
 	softirq_init();
 	timekeeping_init();
+	kfence_init();

 	/*
 	 * For best initial stack canary entropy, prepare it after:

lib/Kconfig.debug

Lines changed: 1 addition & 0 deletions
@@ -938,6 +938,7 @@ config DEBUG_STACKOVERFLOW
 	  If in doubt, say "N".

 source "lib/Kconfig.kasan"
+source "lib/Kconfig.kfence"

 endmenu # "Memory Debugging"

lib/Kconfig.kfence

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+config HAVE_ARCH_KFENCE
+	bool
+
+menuconfig KFENCE
+	bool "KFENCE: low-overhead sampling-based memory safety error detector"
+	depends on HAVE_ARCH_KFENCE && !KASAN && (SLAB || SLUB)
+	select STACKTRACE
+	help
+	  KFENCE is a low-overhead sampling-based detector of heap out-of-bounds
+	  access, use-after-free, and invalid-free errors. KFENCE is designed
+	  to have negligible cost to permit enabling it in production
+	  environments.
+
+	  Note that, KFENCE is not a substitute for explicit testing with tools
+	  such as KASAN. KFENCE can detect a subset of bugs that KASAN can
+	  detect, albeit at very different performance profiles. If you can
+	  afford to use KASAN, continue using KASAN, for example in test
+	  environments. If your kernel targets production use, and cannot
+	  enable KASAN due to its cost, consider using KFENCE.
+
+if KFENCE
+
+config KFENCE_STATIC_KEYS
+	bool "Use static keys to set up allocations"
+	default y
+	depends on JUMP_LABEL # To ensure performance, require jump labels
+	help
+	  Use static keys (static branches) to set up KFENCE allocations. Using
+	  static keys is normally recommended, because it avoids a dynamic
+	  branch in the allocator's fast path. However, with very low sample
+	  intervals, or on systems that do not support jump labels, a dynamic
+	  branch may still be an acceptable performance trade-off.
+
+config KFENCE_SAMPLE_INTERVAL
+	int "Default sample interval in milliseconds"
+	default 100
+	help
+	  The KFENCE sample interval determines the frequency with which heap
+	  allocations will be guarded by KFENCE. May be overridden via boot
+	  parameter "kfence.sample_interval".
+
+	  Set this to 0 to disable KFENCE by default, in which case only
+	  setting "kfence.sample_interval" to a non-zero value enables KFENCE.
+
+config KFENCE_NUM_OBJECTS
+	int "Number of guarded objects available"
+	range 1 65535
+	default 255
+	help
+	  The number of guarded objects available. For each KFENCE object, 2
+	  pages are required; with one containing the object and two adjacent
+	  ones used as guard pages.
+
+config KFENCE_STRESS_TEST_FAULTS
+	int "Stress testing of fault handling and error reporting" if EXPERT
+	default 0
+	help
+	  The inverse probability with which to randomly protect KFENCE object
+	  pages, resulting in spurious use-after-frees. The main purpose of
+	  this option is to stress test KFENCE with concurrent error reports
+	  and allocations/frees. A value of 0 disables stress testing logic.
+
+	  Only for KFENCE testing; set to 0 if you are not a KFENCE developer.
+
+endif # KFENCE
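
When KFENCE_STATIC_KEYS is disabled, the fast path falls back to a dynamic branch on an atomic gate, reopened once per sample interval. The following is a minimal userspace sketch of that idea using C11 atomics, not the kernel implementation. The `sketch_` names are hypothetical. It also assumes that the slow path closes the gate by incrementing it, so that only the first caller after each interval is sampled. That detail lives in code not shown on this page.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* 0 means "the next allocation may be sampled"; non-zero means closed. */
atomic_int sketch_gate = 1;

/* Stand-in for the periodic timer: reopen the gate after each interval. */
void sketch_open_gate(void)
{
	atomic_store(&sketch_gate, 0);
}

/* Fast-path check: true only for the first caller after the gate opens. */
bool sketch_should_sample(void)
{
	/* Common case: the gate is closed and a single load decides. */
	if (atomic_load(&sketch_gate) != 0)
		return false;
	/* Several callers may see 0; only the one that moves 0 -> 1 wins. */
	return atomic_fetch_add(&sketch_gate, 1) == 0;
}

void sketch_gate_self_check(void)
{
	sketch_open_gate();
	assert(sketch_should_sample());		/* first caller is sampled */
	assert(!sketch_should_sample());	/* gate is closed again */
}
```

With a 100 ms interval the closed-gate branch is taken on nearly every allocation, so its cost is one atomic load and a predictable branch. Static keys remove even that load, which is the trade-off the help text describes.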

mm/Makefile

Lines changed: 1 addition & 0 deletions
@@ -81,6 +81,7 @@ obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
 obj-$(CONFIG_SLUB) += slub.o
 obj-$(CONFIG_KASAN) += kasan/
+obj-$(CONFIG_KFENCE) += kfence/
 obj-$(CONFIG_FAILSLAB) += failslab.o
 obj-$(CONFIG_MEMORY_HOTPLUG) += memory_hotplug.o
 obj-$(CONFIG_MEMTEST) += memtest.o

mm/kfence/Makefile

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_KFENCE) := core.o report.o
