Commit 492e639

yonghong-song authored and Alexei Starovoitov committed
bpf: Add bpf_seq_printf and bpf_seq_write helpers
Two helpers bpf_seq_printf and bpf_seq_write, are added for
writing data to the seq_file buffer.

bpf_seq_printf supports common format string flag/width/type
fields so at least I can get identical results for
netlink and ipv6_route targets.

For bpf_seq_printf and bpf_seq_write, return value -EOVERFLOW
specifically indicates a write failure due to overflow, which
means the object will be repeated in the next bpf invocation
if object collection stays the same. Note that if the object
collection is changed, depending how collection traversal is
done, even if the object still in the collection, it may not
be visited.

For bpf_seq_printf, format %s, %p{i,I}{4,6} needs to
read kernel memory. Reading kernel memory may fail in
the following two cases:
  - invalid kernel address, or
  - valid kernel address but requiring a major fault
If reading kernel memory failed, the %s string will be
an empty string and %p{i,I}{4,6} will be all 0.
Not returning error to bpf program is consistent with
what bpf_trace_printk() does for now.

bpf_seq_printf may return -EBUSY meaning that internal percpu
buffer for memory copy of strings or other pointees is
not available. Bpf program can return 1 to indicate it
wants the same object to be repeated. Right now, this should not
happen on no-RT kernels since migrate_disable(), which guards
bpf prog call, calls preempt_disable().

Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
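The -EOVERFLOW behavior described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct model_seq`, `model_seq_write()`, `model_seq_flush()` and `model_dump()` are hypothetical names, the 16-byte buffer is arbitrary, and the real seq_file bookkeeping in the kernel is more involved. The point it shows is that a write which does not fit leaves the object unemitted, so the same object is visited again once the buffer has been drained.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for the kernel's struct seq_file. */
struct model_seq {
	char buf[16];
	size_t count;		/* bytes currently held in buf */
	int overflowed;		/* set when a write did not fit */
};

/* Models bpf_seq_write(): all-or-nothing append, -EOVERFLOW if it does not fit. */
static int model_seq_write(struct model_seq *m, const void *data, size_t len)
{
	if (len > sizeof(m->buf) - m->count) {
		m->overflowed = 1;
		return -EOVERFLOW;
	}
	memcpy(m->buf + m->count, data, len);
	m->count += len;
	return 0;
}

/* Models the reader draining the buffer before the next invocation. */
static void model_seq_flush(struct model_seq *m)
{
	m->count = 0;
	m->overflowed = 0;
}

/*
 * Emits n_objs records of rec_len bytes each. On overflow the buffer is
 * flushed and the same object is retried. Returns the number of flushes.
 */
static int model_dump(struct model_seq *m, int n_objs, size_t rec_len)
{
	static const char rec[16] = "0123456789abcde";
	int flushes = 0, i = 0;

	assert(rec_len <= sizeof(m->buf));	/* a record must fit an empty buffer */
	while (i < n_objs) {
		if (model_seq_write(m, rec, rec_len) == -EOVERFLOW) {
			model_seq_flush(m);
			flushes++;
			continue;		/* same object is visited again */
		}
		i++;
	}
	return flushes;
}
```

In this model an object is never lost on overflow because the loop index only advances after a successful write, which mirrors the "object will be repeated" guarantee for an unchanged collection.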
1 parent b121b34 commit 492e639

4 files changed, +292 -2 lines changed

include/uapi/linux/bpf.h

+38 -1
@@ -3077,6 +3077,41 @@ union bpf_attr {
  *		See: clock_gettime(CLOCK_BOOTTIME)
  *	Return
  *		Current *ktime*.
+ *
+ * int bpf_seq_printf(struct seq_file *m, const char *fmt, u32 fmt_size, const void *data, u32 data_len)
+ *	Description
+ *		seq_printf uses seq_file seq_printf() to print out the format string.
+ *		The *m* represents the seq_file. The *fmt* and *fmt_size* are for
+ *		the format string itself. The *data* and *data_len* are format string
+ *		arguments. The *data* are a u64 array and corresponding format string
+ *		values are stored in the array. For strings and pointers where pointees
+ *		are accessed, only the pointer values are stored in the *data* array.
+ *		The *data_len* is the *data* size in term of bytes.
+ *
+ *		Formats **%s**, **%p{i,I}{4,6}** requires to read kernel memory.
+ *		Reading kernel memory may fail due to either invalid address or
+ *		valid address but requiring a major memory fault. If reading kernel memory
+ *		fails, the string for **%s** will be an empty string, and the ip
+ *		address for **%p{i,I}{4,6}** will be 0. Not returning error to
+ *		bpf program is consistent with what bpf_trace_printk() does for now.
+ *	Return
+ *		0 on success, or a negative errno in case of failure.
+ *
+ *		* **-EBUSY** Percpu memory copy buffer is busy, can try again
+ *		  by returning 1 from bpf program.
+ *		* **-EINVAL** Invalid arguments, or invalid/unsupported formats.
+ *		* **-E2BIG** Too many format specifiers.
+ *		* **-EOVERFLOW** Overflow happens, the same object will be tried again.
+ *
+ * int bpf_seq_write(struct seq_file *m, const void *data, u32 len)
+ *	Description
+ *		seq_write uses seq_file seq_write() to write the data.
+ *		The *m* represents the seq_file. The *data* and *len* represent the
+ *		data to write in bytes.
+ *	Return
+ *		0 on success, or a negative errno in case of failure.
+ *
+ *		* **-EOVERFLOW** Overflow happens, the same object will be tried again.
  */
 #define __BPF_FUNC_MAPPER(FN) \
 	FN(unspec), \
@@ -3204,7 +3239,9 @@ union bpf_attr {
 	FN(get_netns_cookie), \
 	FN(get_current_ancestor_cgroup_id), \
 	FN(sk_assign), \
-	FN(ktime_get_boot_ns),
+	FN(ktime_get_boot_ns), \
+	FN(seq_printf), \
+	FN(seq_write),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
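The helper documentation above says that *data* is a u64 array and that *data_len* is its size in bytes. The following userspace sketch shows one way a caller might lay out such an array for a format like "%u %s". The names `model_pack()` and `model_num_args()` are hypothetical and exist only for illustration. The shape checks (length must be a multiple of 8, argument count is `data_len / 8`) mirror what the helper does in this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Mirrors the helper's argument-shape checks: data_len must be a
 * multiple of 8, and the number of arguments is data_len / 8.
 * Returns the argument count, or -1 for a malformed length.
 */
static int model_num_args(uint32_t data_len)
{
	if (data_len & 7)
		return -1;
	return (int)(data_len / 8);
}

/*
 * Packs one integer and one string as a caller would for "%u %s".
 * Only the pointer value of the string goes into the array; the helper
 * is responsible for reading the pointee. Returns data_len in bytes.
 */
static uint32_t model_pack(uint64_t *slots, size_t n_slots,
			   uint32_t value, const char *name)
{
	assert(n_slots >= 2);
	slots[0] = value;			/* integer argument, widened to u64 */
	slots[1] = (uint64_t)(uintptr_t)name;	/* pointer value only */
	return 2 * sizeof(uint64_t);
}
```

Every argument occupies one 8-byte slot regardless of its natural width, which is why a length that is not a multiple of 8 is rejected before the format string is examined.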

kernel/trace/bpf_trace.c

+214
@@ -457,6 +457,212 @@ const struct bpf_func_proto *bpf_get_trace_printk_proto(void)
 	return &bpf_trace_printk_proto;
 }
 
+#define MAX_SEQ_PRINTF_VARARGS		12
+#define MAX_SEQ_PRINTF_MAX_MEMCPY	6
+#define MAX_SEQ_PRINTF_STR_LEN		128
+
+struct bpf_seq_printf_buf {
+	char buf[MAX_SEQ_PRINTF_MAX_MEMCPY][MAX_SEQ_PRINTF_STR_LEN];
+};
+static DEFINE_PER_CPU(struct bpf_seq_printf_buf, bpf_seq_printf_buf);
+static DEFINE_PER_CPU(int, bpf_seq_printf_buf_used);
+
+BPF_CALL_5(bpf_seq_printf, struct seq_file *, m, char *, fmt, u32, fmt_size,
+	   const void *, data, u32, data_len)
+{
+	int err = -EINVAL, fmt_cnt = 0, memcpy_cnt = 0;
+	int i, buf_used, copy_size, num_args;
+	u64 params[MAX_SEQ_PRINTF_VARARGS];
+	struct bpf_seq_printf_buf *bufs;
+	const u64 *args = data;
+
+	buf_used = this_cpu_inc_return(bpf_seq_printf_buf_used);
+	if (WARN_ON_ONCE(buf_used > 1)) {
+		err = -EBUSY;
+		goto out;
+	}
+
+	bufs = this_cpu_ptr(&bpf_seq_printf_buf);
+
+	/*
+	 * bpf_check()->check_func_arg()->check_stack_boundary()
+	 * guarantees that fmt points to bpf program stack,
+	 * fmt_size bytes of it were initialized and fmt_size > 0
+	 */
+	if (fmt[--fmt_size] != 0)
+		goto out;
+
+	if (data_len & 7)
+		goto out;
+
+	for (i = 0; i < fmt_size; i++) {
+		if (fmt[i] == '%') {
+			if (fmt[i + 1] == '%')
+				i++;
+			else if (!data || !data_len)
+				goto out;
+		}
+	}
+
+	num_args = data_len / 8;
+
+	/* check format string for allowed specifiers */
+	for (i = 0; i < fmt_size; i++) {
+		/* only printable ascii for now. */
+		if ((!isprint(fmt[i]) && !isspace(fmt[i])) || !isascii(fmt[i])) {
+			err = -EINVAL;
+			goto out;
+		}
+
+		if (fmt[i] != '%')
+			continue;
+
+		if (fmt[i + 1] == '%') {
+			i++;
+			continue;
+		}
+
+		if (fmt_cnt >= MAX_SEQ_PRINTF_VARARGS) {
+			err = -E2BIG;
+			goto out;
+		}
+
+		if (fmt_cnt >= num_args) {
+			err = -EINVAL;
+			goto out;
+		}
+
+		/* fmt[i] != 0 && fmt[last] == 0, so we can access fmt[i + 1] */
+		i++;
+
+		/* skip optional "[0 +-][num]" width formating field */
+		while (fmt[i] == '0' || fmt[i] == '+' || fmt[i] == '-' ||
+		       fmt[i] == ' ')
+			i++;
+		if (fmt[i] >= '1' && fmt[i] <= '9') {
+			i++;
+			while (fmt[i] >= '0' && fmt[i] <= '9')
+				i++;
+		}
+
+		if (fmt[i] == 's') {
+			/* try our best to copy */
+			if (memcpy_cnt >= MAX_SEQ_PRINTF_MAX_MEMCPY) {
+				err = -E2BIG;
+				goto out;
+			}
+
+			err = strncpy_from_unsafe(bufs->buf[memcpy_cnt],
+						  (void *) (long) args[fmt_cnt],
+						  MAX_SEQ_PRINTF_STR_LEN);
+			if (err < 0)
+				bufs->buf[memcpy_cnt][0] = '\0';
+			params[fmt_cnt] = (u64)(long)bufs->buf[memcpy_cnt];
+
+			fmt_cnt++;
+			memcpy_cnt++;
+			continue;
+		}
+
+		if (fmt[i] == 'p') {
+			if (fmt[i + 1] == 0 ||
+			    fmt[i + 1] == 'K' ||
+			    fmt[i + 1] == 'x') {
+				/* just kernel pointers */
+				params[fmt_cnt] = args[fmt_cnt];
+				fmt_cnt++;
+				continue;
+			}
+
+			/* only support "%pI4", "%pi4", "%pI6" and "%pi6". */
+			if (fmt[i + 1] != 'i' && fmt[i + 1] != 'I') {
+				err = -EINVAL;
+				goto out;
+			}
+			if (fmt[i + 2] != '4' && fmt[i + 2] != '6') {
+				err = -EINVAL;
+				goto out;
+			}
+
+			if (memcpy_cnt >= MAX_SEQ_PRINTF_MAX_MEMCPY) {
+				err = -E2BIG;
+				goto out;
+			}
+
+
+			copy_size = (fmt[i + 2] == '4') ? 4 : 16;
+
+			err = probe_kernel_read(bufs->buf[memcpy_cnt],
+						(void *) (long) args[fmt_cnt],
+						copy_size);
+			if (err < 0)
+				memset(bufs->buf[memcpy_cnt], 0, copy_size);
+			params[fmt_cnt] = (u64)(long)bufs->buf[memcpy_cnt];
+
+			i += 2;
+			fmt_cnt++;
+			memcpy_cnt++;
+			continue;
+		}
+
+		if (fmt[i] == 'l') {
+			i++;
+			if (fmt[i] == 'l')
+				i++;
+		}
+
+		if (fmt[i] != 'i' && fmt[i] != 'd' &&
+		    fmt[i] != 'u' && fmt[i] != 'x') {
+			err = -EINVAL;
+			goto out;
+		}
+
+		params[fmt_cnt] = args[fmt_cnt];
+		fmt_cnt++;
+	}
+
+	/* Maximumly we can have MAX_SEQ_PRINTF_VARARGS parameter, just give
+	 * all of them to seq_printf().
+	 */
+	seq_printf(m, fmt, params[0], params[1], params[2], params[3],
+		   params[4], params[5], params[6], params[7], params[8],
+		   params[9], params[10], params[11]);
+
+	err = seq_has_overflowed(m) ? -EOVERFLOW : 0;
+out:
+	this_cpu_dec(bpf_seq_printf_buf_used);
+	return err;
+}
+
+static int bpf_seq_printf_btf_ids[5];
+static const struct bpf_func_proto bpf_seq_printf_proto = {
+	.func		= bpf_seq_printf,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_BTF_ID,
+	.arg2_type	= ARG_PTR_TO_MEM,
+	.arg3_type	= ARG_CONST_SIZE,
+	.arg4_type	= ARG_PTR_TO_MEM_OR_NULL,
+	.arg5_type	= ARG_CONST_SIZE_OR_ZERO,
+	.btf_id		= bpf_seq_printf_btf_ids,
+};
+
+BPF_CALL_3(bpf_seq_write, struct seq_file *, m, const void *, data, u32, len)
+{
+	return seq_write(m, data, len) ? -EOVERFLOW : 0;
+}
+
+static int bpf_seq_write_btf_ids[5];
+static const struct bpf_func_proto bpf_seq_write_proto = {
+	.func		= bpf_seq_write,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_BTF_ID,
+	.arg2_type	= ARG_PTR_TO_MEM,
+	.arg3_type	= ARG_CONST_SIZE_OR_ZERO,
+	.btf_id		= bpf_seq_write_btf_ids,
+};
+
 static __always_inline int
 get_map_perf_counter(struct bpf_map *map, u64 flags,
 		     u64 *value, u64 *enabled, u64 *running)
@@ -1226,6 +1432,14 @@ tracing_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_xdp_output:
 		return &bpf_xdp_output_proto;
 #endif
+	case BPF_FUNC_seq_printf:
+		return prog->expected_attach_type == BPF_TRACE_ITER ?
+		       &bpf_seq_printf_proto :
+		       NULL;
+	case BPF_FUNC_seq_write:
+		return prog->expected_attach_type == BPF_TRACE_ITER ?
+		       &bpf_seq_write_proto :
+		       NULL;
 	default:
 		return raw_tp_prog_func_proto(func_id, prog);
 	}
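To make it easier to reason about which format strings bpf_seq_printf accepts, the specifier check in the hunk above can be ported to userspace roughly as follows. This is a sketch, not the kernel code: `model_check_fmt()` and the `MODEL_*` constants are hypothetical names, the function takes a NUL-terminated string instead of *fmt*/*fmt_size*, and it only counts arguments rather than copying any memory. The control flow is intended to follow the diff: flags and width are skipped, `%s` and `%p{i,I}{4,6}` each consume one of the 6 copy buffers, and at most 12 specifiers are allowed.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_MAX_VARARGS 12	/* mirrors MAX_SEQ_PRINTF_VARARGS */
#define MODEL_MAX_MEMCPY   6	/* mirrors MAX_SEQ_PRINTF_MAX_MEMCPY */

/*
 * Userspace sketch of the format check. fmt must be NUL-terminated.
 * Returns the number of arguments consumed, or a negative errno.
 */
static int model_check_fmt(const char *fmt, int num_args)
{
	int fmt_cnt = 0, memcpy_cnt = 0;
	size_t i;

	assert(fmt != NULL);
	for (i = 0; fmt[i]; i++) {
		unsigned char c = (unsigned char)fmt[i];

		/* only printable ASCII and whitespace are allowed */
		if (c > 127 || (!isprint(c) && !isspace(c)))
			return -EINVAL;
		if (c != '%')
			continue;
		if (fmt[i + 1] == '%') {	/* literal percent sign */
			i++;
			continue;
		}
		if (fmt_cnt >= MODEL_MAX_VARARGS)
			return -E2BIG;
		if (fmt_cnt >= num_args)
			return -EINVAL;
		i++;

		/* skip optional "[0 +-][num]" flags and width */
		while (fmt[i] == '0' || fmt[i] == '+' || fmt[i] == '-' ||
		       fmt[i] == ' ')
			i++;
		if (fmt[i] >= '1' && fmt[i] <= '9') {
			i++;
			while (fmt[i] >= '0' && fmt[i] <= '9')
				i++;
		}

		if (fmt[i] == 's') {		/* needs one copy buffer */
			if (memcpy_cnt >= MODEL_MAX_MEMCPY)
				return -E2BIG;
			fmt_cnt++;
			memcpy_cnt++;
			continue;
		}
		if (fmt[i] == 'p') {
			if (fmt[i + 1] == 0 || fmt[i + 1] == 'K' ||
			    fmt[i + 1] == 'x') {
				fmt_cnt++;	/* plain pointer value */
				continue;
			}
			if (fmt[i + 1] != 'i' && fmt[i + 1] != 'I')
				return -EINVAL;
			if (fmt[i + 2] != '4' && fmt[i + 2] != '6')
				return -EINVAL;
			if (memcpy_cnt >= MODEL_MAX_MEMCPY)
				return -E2BIG;
			i += 2;
			fmt_cnt++;
			memcpy_cnt++;
			continue;
		}
		if (fmt[i] == 'l') {		/* optional "l" or "ll" */
			i++;
			if (fmt[i] == 'l')
				i++;
		}
		if (fmt[i] != 'i' && fmt[i] != 'd' &&
		    fmt[i] != 'u' && fmt[i] != 'x')
			return -EINVAL;
		fmt_cnt++;
	}
	return fmt_cnt;
}
```

Under this model, `"%d %s\n"` with two arguments is accepted, `"%f"` is rejected with -EINVAL, and a seventh `%s` in one format string yields -E2BIG because only six copy buffers exist.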

scripts/bpf_helpers_doc.py

+2
@@ -414,6 +414,7 @@ class PrinterHelpers(Printer):
             'struct sk_reuseport_md',
             'struct sockaddr',
             'struct tcphdr',
+            'struct seq_file',
 
             'struct __sk_buff',
             'struct sk_msg_md',
@@ -450,6 +451,7 @@ class PrinterHelpers(Printer):
             'struct sk_reuseport_md',
             'struct sockaddr',
             'struct tcphdr',
+            'struct seq_file',
     }
     mapped_types = {
             'u8': '__u8',
tools/include/uapi/linux/bpf.h

+38 -1
@@ -3077,6 +3077,41 @@ union bpf_attr {
  *		See: clock_gettime(CLOCK_BOOTTIME)
  *	Return
  *		Current *ktime*.
+ *
+ * int bpf_seq_printf(struct seq_file *m, const char *fmt, u32 fmt_size, const void *data, u32 data_len)
+ *	Description
+ *		seq_printf uses seq_file seq_printf() to print out the format string.
+ *		The *m* represents the seq_file. The *fmt* and *fmt_size* are for
+ *		the format string itself. The *data* and *data_len* are format string
+ *		arguments. The *data* are a u64 array and corresponding format string
+ *		values are stored in the array. For strings and pointers where pointees
+ *		are accessed, only the pointer values are stored in the *data* array.
+ *		The *data_len* is the *data* size in term of bytes.
+ *
+ *		Formats **%s**, **%p{i,I}{4,6}** requires to read kernel memory.
+ *		Reading kernel memory may fail due to either invalid address or
+ *		valid address but requiring a major memory fault. If reading kernel memory
+ *		fails, the string for **%s** will be an empty string, and the ip
+ *		address for **%p{i,I}{4,6}** will be 0. Not returning error to
+ *		bpf program is consistent with what bpf_trace_printk() does for now.
+ *	Return
+ *		0 on success, or a negative errno in case of failure.
+ *
+ *		* **-EBUSY** Percpu memory copy buffer is busy, can try again
+ *		  by returning 1 from bpf program.
+ *		* **-EINVAL** Invalid arguments, or invalid/unsupported formats.
+ *		* **-E2BIG** Too many format specifiers.
+ *		* **-EOVERFLOW** Overflow happens, the same object will be tried again.
+ *
+ * int bpf_seq_write(struct seq_file *m, const void *data, u32 len)
+ *	Description
+ *		seq_write uses seq_file seq_write() to write the data.
+ *		The *m* represents the seq_file. The *data* and *len* represent the
+ *		data to write in bytes.
+ *	Return
+ *		0 on success, or a negative errno in case of failure.
+ *
+ *		* **-EOVERFLOW** Overflow happens, the same object will be tried again.
  */
 #define __BPF_FUNC_MAPPER(FN) \
 	FN(unspec), \
@@ -3204,7 +3239,9 @@ union bpf_attr {
 	FN(get_netns_cookie), \
 	FN(get_current_ancestor_cgroup_id), \
 	FN(sk_assign), \
-	FN(ktime_get_boot_ns),
+	FN(ktime_get_boot_ns), \
+	FN(seq_printf), \
+	FN(seq_write),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
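The -EBUSY return documented for bpf_seq_printf comes from an "in use" counter that guards the per-CPU copy buffers. A minimal userspace sketch of that guard follows. It assumes a single global counter in place of the per-CPU variable, and `model_helper()` with its `reenter` flag is a hypothetical stand-in that simulates a nested call on the same CPU. It is meant to show only the increment, check, and unconditional decrement pattern.

```c
#include <assert.h>
#include <errno.h>

/* Stands in for the per-CPU bpf_seq_printf_buf_used counter. */
static int model_buf_used;

/*
 * Models the guard at the top and bottom of the helper. If reenter is
 * nonzero, a nested call is simulated while the buffers are still held.
 * Returns 0 on success or -EBUSY when the buffers were already in use.
 */
static int model_helper(int reenter)
{
	int err = 0;

	if (++model_buf_used > 1) {	/* someone already holds the buffers */
		err = -EBUSY;
		goto out;
	}
	if (reenter)
		err = model_helper(0);	/* nested use on the same "CPU" */
out:
	model_buf_used--;		/* always undo the increment */
	assert(model_buf_used >= 0);
	return err;
}
```

Because the decrement sits on the common exit path, a rejected nested call leaves the counter balanced, so a later retry of the same object can succeed.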
