diff options
author | Yonghong Song <yonghong.song@linux.dev> | 2025-02-24 15:01:16 -0800 |
---|---|---|
committer | Alexei Starovoitov <ast@kernel.org> | 2025-03-15 11:48:25 -0700 |
commit | 4b82b181a26cff8bf7adc3a85a88d121d92edeaf (patch) | |
tree | 34e5d98d08a388af82b452e57ad18123f7608208 /kernel/bpf/syscall.c | |
parent | 7c2f207a524e8ca150cb8d8a72254e05dc72e4d8 (diff) | |
download | linux-4b82b181a26cff8bf7adc3a85a88d121d92edeaf.tar.gz linux-4b82b181a26cff8bf7adc3a85a88d121d92edeaf.tar.bz2 linux-4b82b181a26cff8bf7adc3a85a88d121d92edeaf.zip |
bpf: Allow pre-ordering for bpf cgroup progs
Currently for bpf progs in a cgroup hierarchy, the effective prog array
is computed from bottom cgroup to upper cgroups (post-ordering). For
example, the following cgroup hierarchy
root cgroup: p1, p2
subcgroup: p3, p4
have BPF_F_ALLOW_MULTI for both cgroup levels.
The effective cgroup array ordering looks like
p3 p4 p1 p2
and at run time, progs will execute based on that order.
But in some cases, it is desirable to have root prog executes earlier than
children progs (pre-ordering). For example,
- prog p1 intends to collect original pkt dest addresses.
- prog p3 will modify original pkt dest addresses to a proxy address for
security reason.
The end result is that prog p1 gets proxy address which is not what it
wants. Putting p1 to every child cgroup is not desirable either as it
will duplicate itself in many child cgroups. And this is exactly a use case
we are encountering in Meta.
To fix this issue, let us introduce a flag BPF_F_PREORDER. If the flag
is specified at attachment time, the prog has higher priority and the
ordering with that flag will be from top to bottom (pre-ordering).
For example, in the above example,
root cgroup: p1, p2
subcgroup: p3, p4
Let us say p2 and p4 are marked with BPF_F_PREORDER. The final
effective array ordering will be
p2 p4 p3 p1
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250224230116.283071-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Diffstat (limited to 'kernel/bpf/syscall.c')
-rw-r--r-- | kernel/bpf/syscall.c | 3 |
1 files changed, 2 insertions, 1 deletions
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index dbd89c13dd32..d799fe8f568e 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -4170,7 +4170,8 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog, #define BPF_F_ATTACH_MASK_BASE \ (BPF_F_ALLOW_OVERRIDE | \ BPF_F_ALLOW_MULTI | \ - BPF_F_REPLACE) + BPF_F_REPLACE | \ + BPF_F_PREORDER) #define BPF_F_ATTACH_MASK_MPROG \ (BPF_F_REPLACE | \ |