<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/kernel/bpf/devmap.c, branch v6.12.80</title>
<subtitle>Clone of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git</subtitle>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/'/>
<entry>
<title>bpf: Fix stack-out-of-bounds write in devmap</title>
<updated>2026-03-13T16:20:20+00:00</updated>
<author>
<name>Kohei Enju</name>
<email>kohei@enjuk.jp</email>
</author>
<published>2026-02-25T05:34:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=d2c31d8e03d05edc16656e5ffe187f0d1da763d7'/>
<id>d2c31d8e03d05edc16656e5ffe187f0d1da763d7</id>
<content type='text'>
[ Upstream commit b7bf516c3ecd9a2aae2dc2635178ab87b734fef1 ]

get_upper_ifindexes() iterates over all upper devices and writes their
indices into an array without checking bounds.

Also the callers assume that the max number of upper devices is
MAX_NEST_DEV and allocate excluded_devices[1+MAX_NEST_DEV] on the stack,
but that assumption is not correct and the number of upper devices could
be larger than MAX_NEST_DEV (e.g., many macvlans), causing a
stack-out-of-bounds write.

Add a max parameter to get_upper_ifindexes() to avoid the issue.
When there are too many upper devices, return -EOVERFLOW and abort the
redirect.

To reproduce, create more than MAX_NEST_DEV(8) macvlans on a device with
an XDP program attached using BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS.
Then send a packet to the device to trigger the XDP redirect path.

Reported-by: syzbot+10cc7f13760b31bd2e61@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/698c4ce3.050a0220.340abe.000b.GAE@google.com/T/
Fixes: aeea1b86f936 ("bpf, devmap: Exclude XDP broadcast to master device")
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Signed-off-by: Kohei Enju &lt;kohei@enjuk.jp&gt;
Link: https://lore.kernel.org/r/20260225053506.4738-1-kohei@enjuk.jp
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit b7bf516c3ecd9a2aae2dc2635178ab87b734fef1 ]

get_upper_ifindexes() iterates over all upper devices and writes their
indices into an array without checking bounds.

Also the callers assume that the max number of upper devices is
MAX_NEST_DEV and allocate excluded_devices[1+MAX_NEST_DEV] on the stack,
but that assumption is not correct and the number of upper devices could
be larger than MAX_NEST_DEV (e.g., many macvlans), causing a
stack-out-of-bounds write.

Add a max parameter to get_upper_ifindexes() to avoid the issue.
When there are too many upper devices, return -EOVERFLOW and abort the
redirect.

To reproduce, create more than MAX_NEST_DEV(8) macvlans on a device with
an XDP program attached using BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS.
Then send a packet to the device to trigger the XDP redirect path.

Reported-by: syzbot+10cc7f13760b31bd2e61@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/698c4ce3.050a0220.340abe.000b.GAE@google.com/T/
Fixes: aeea1b86f936 ("bpf, devmap: Exclude XDP broadcast to master device")
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Signed-off-by: Kohei Enju &lt;kohei@enjuk.jp&gt;
Link: https://lore.kernel.org/r/20260225053506.4738-1-kohei@enjuk.jp
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: fix OOB devmap writes when deleting elements</title>
<updated>2024-12-14T19:03:30+00:00</updated>
<author>
<name>Maciej Fijalkowski</name>
<email>maciej.fijalkowski@intel.com</email>
</author>
<published>2024-11-22T12:10:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=178e31df1fb3d9e0890eb471da16709cbc82edee'/>
<id>178e31df1fb3d9e0890eb471da16709cbc82edee</id>
<content type='text'>
commit ab244dd7cf4c291f82faacdc50b45cc0f55b674d upstream.

Jordy reported issue against XSKMAP which also applies to DEVMAP - the
index used for accessing map entry, due to being a signed integer,
causes the OOB writes. Fix is simple as changing the type from int to
u32, however, when compared to XSKMAP case, one more thing needs to be
addressed.

When map is released from system via dev_map_free(), we iterate through
all of the entries and an iterator variable is also an int, which
implies OOB accesses. Again, change it to be u32.

Example splat below:

[  160.724676] BUG: unable to handle page fault for address: ffffc8fc2c001000
[  160.731662] #PF: supervisor read access in kernel mode
[  160.736876] #PF: error_code(0x0000) - not-present page
[  160.742095] PGD 0 P4D 0
[  160.744678] Oops: Oops: 0000 [#1] PREEMPT SMP
[  160.749106] CPU: 1 UID: 0 PID: 520 Comm: kworker/u145:12 Not tainted 6.12.0-rc1+ #487
[  160.757050] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[  160.767642] Workqueue: events_unbound bpf_map_free_deferred
[  160.773308] RIP: 0010:dev_map_free+0x77/0x170
[  160.777735] Code: 00 e8 fd 91 ed ff e8 b8 73 ed ff 41 83 7d 18 19 74 6e 41 8b 45 24 49 8b bd f8 00 00 00 31 db 85 c0 74 48 48 63 c3 48 8d 04 c7 &lt;48&gt; 8b 28 48 85 ed 74 30 48 8b 7d 18 48 85 ff 74 05 e8 b3 52 fa ff
[  160.796777] RSP: 0018:ffffc9000ee1fe38 EFLAGS: 00010202
[  160.802086] RAX: ffffc8fc2c001000 RBX: 0000000080000000 RCX: 0000000000000024
[  160.809331] RDX: 0000000000000000 RSI: 0000000000000024 RDI: ffffc9002c001000
[  160.816576] RBP: 0000000000000000 R08: 0000000000000023 R09: 0000000000000001
[  160.823823] R10: 0000000000000001 R11: 00000000000ee6b2 R12: dead000000000122
[  160.831066] R13: ffff88810c928e00 R14: ffff8881002df405 R15: 0000000000000000
[  160.838310] FS:  0000000000000000(0000) GS:ffff8897e0c40000(0000) knlGS:0000000000000000
[  160.846528] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  160.852357] CR2: ffffc8fc2c001000 CR3: 0000000005c32006 CR4: 00000000007726f0
[  160.859604] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  160.866847] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  160.874092] PKRU: 55555554
[  160.876847] Call Trace:
[  160.879338]  &lt;TASK&gt;
[  160.881477]  ? __die+0x20/0x60
[  160.884586]  ? page_fault_oops+0x15a/0x450
[  160.888746]  ? search_extable+0x22/0x30
[  160.892647]  ? search_bpf_extables+0x5f/0x80
[  160.896988]  ? exc_page_fault+0xa9/0x140
[  160.900973]  ? asm_exc_page_fault+0x22/0x30
[  160.905232]  ? dev_map_free+0x77/0x170
[  160.909043]  ? dev_map_free+0x58/0x170
[  160.912857]  bpf_map_free_deferred+0x51/0x90
[  160.917196]  process_one_work+0x142/0x370
[  160.921272]  worker_thread+0x29e/0x3b0
[  160.925082]  ? rescuer_thread+0x4b0/0x4b0
[  160.929157]  kthread+0xd4/0x110
[  160.932355]  ? kthread_park+0x80/0x80
[  160.936079]  ret_from_fork+0x2d/0x50
[  160.943396]  ? kthread_park+0x80/0x80
[  160.950803]  ret_from_fork_asm+0x11/0x20
[  160.958482]  &lt;/TASK&gt;

Fixes: 546ac1ffb70d ("bpf: add devmap, a map for storing net device references")
CC: stable@vger.kernel.org
Reported-by: Jordy Zomer &lt;jordyzomer@google.com&gt;
Suggested-by: Jordy Zomer &lt;jordyzomer@google.com&gt;
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Maciej Fijalkowski &lt;maciej.fijalkowski@intel.com&gt;
Link: https://lore.kernel.org/r/20241122121030.716788-3-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit ab244dd7cf4c291f82faacdc50b45cc0f55b674d upstream.

Jordy reported issue against XSKMAP which also applies to DEVMAP - the
index used for accessing map entry, due to being a signed integer,
causes the OOB writes. Fix is simple as changing the type from int to
u32, however, when compared to XSKMAP case, one more thing needs to be
addressed.

When map is released from system via dev_map_free(), we iterate through
all of the entries and an iterator variable is also an int, which
implies OOB accesses. Again, change it to be u32.

Example splat below:

[  160.724676] BUG: unable to handle page fault for address: ffffc8fc2c001000
[  160.731662] #PF: supervisor read access in kernel mode
[  160.736876] #PF: error_code(0x0000) - not-present page
[  160.742095] PGD 0 P4D 0
[  160.744678] Oops: Oops: 0000 [#1] PREEMPT SMP
[  160.749106] CPU: 1 UID: 0 PID: 520 Comm: kworker/u145:12 Not tainted 6.12.0-rc1+ #487
[  160.757050] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[  160.767642] Workqueue: events_unbound bpf_map_free_deferred
[  160.773308] RIP: 0010:dev_map_free+0x77/0x170
[  160.777735] Code: 00 e8 fd 91 ed ff e8 b8 73 ed ff 41 83 7d 18 19 74 6e 41 8b 45 24 49 8b bd f8 00 00 00 31 db 85 c0 74 48 48 63 c3 48 8d 04 c7 &lt;48&gt; 8b 28 48 85 ed 74 30 48 8b 7d 18 48 85 ff 74 05 e8 b3 52 fa ff
[  160.796777] RSP: 0018:ffffc9000ee1fe38 EFLAGS: 00010202
[  160.802086] RAX: ffffc8fc2c001000 RBX: 0000000080000000 RCX: 0000000000000024
[  160.809331] RDX: 0000000000000000 RSI: 0000000000000024 RDI: ffffc9002c001000
[  160.816576] RBP: 0000000000000000 R08: 0000000000000023 R09: 0000000000000001
[  160.823823] R10: 0000000000000001 R11: 00000000000ee6b2 R12: dead000000000122
[  160.831066] R13: ffff88810c928e00 R14: ffff8881002df405 R15: 0000000000000000
[  160.838310] FS:  0000000000000000(0000) GS:ffff8897e0c40000(0000) knlGS:0000000000000000
[  160.846528] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  160.852357] CR2: ffffc8fc2c001000 CR3: 0000000005c32006 CR4: 00000000007726f0
[  160.859604] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  160.866847] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  160.874092] PKRU: 55555554
[  160.876847] Call Trace:
[  160.879338]  &lt;TASK&gt;
[  160.881477]  ? __die+0x20/0x60
[  160.884586]  ? page_fault_oops+0x15a/0x450
[  160.888746]  ? search_extable+0x22/0x30
[  160.892647]  ? search_bpf_extables+0x5f/0x80
[  160.896988]  ? exc_page_fault+0xa9/0x140
[  160.900973]  ? asm_exc_page_fault+0x22/0x30
[  160.905232]  ? dev_map_free+0x77/0x170
[  160.909043]  ? dev_map_free+0x58/0x170
[  160.912857]  bpf_map_free_deferred+0x51/0x90
[  160.917196]  process_one_work+0x142/0x370
[  160.921272]  worker_thread+0x29e/0x3b0
[  160.925082]  ? rescuer_thread+0x4b0/0x4b0
[  160.929157]  kthread+0xd4/0x110
[  160.932355]  ? kthread_park+0x80/0x80
[  160.936079]  ret_from_fork+0x2d/0x50
[  160.943396]  ? kthread_park+0x80/0x80
[  160.950803]  ret_from_fork_asm+0x11/0x20
[  160.958482]  &lt;/TASK&gt;

Fixes: 546ac1ffb70d ("bpf: add devmap, a map for storing net device references")
CC: stable@vger.kernel.org
Reported-by: Jordy Zomer &lt;jordyzomer@google.com&gt;
Suggested-by: Jordy Zomer &lt;jordyzomer@google.com&gt;
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Maciej Fijalkowski &lt;maciej.fijalkowski@intel.com&gt;
Link: https://lore.kernel.org/r/20241122121030.716788-3-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: devmap: provide rxq after redirect</title>
<updated>2024-10-02T20:48:26+00:00</updated>
<author>
<name>Florian Kauer</name>
<email>florian.kauer@linutronix.de</email>
</author>
<published>2024-09-11T08:41:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=ca9984c5f0ab3690d98b13937b2485a978c8dd73'/>
<id>ca9984c5f0ab3690d98b13937b2485a978c8dd73</id>
<content type='text'>
rxq contains a pointer to the device from where
the redirect happened. Currently, the BPF program
that was executed after a redirect via BPF_MAP_TYPE_DEVMAP*
does not have it set.

This is particularly bad since accessing ingress_ifindex, e.g.

SEC("xdp")
int prog(struct xdp_md *pkt)
{
        return bpf_redirect_map(&amp;dev_redirect_map, 0, 0);
}

SEC("xdp/devmap")
int prog_after_redirect(struct xdp_md *pkt)
{
        bpf_printk("ifindex %i", pkt-&gt;ingress_ifindex);
        return XDP_PASS;
}

depends on access to rxq, so a NULL pointer gets dereferenced:

&lt;1&gt;[  574.475170] BUG: kernel NULL pointer dereference, address: 0000000000000000
&lt;1&gt;[  574.475188] #PF: supervisor read access in kernel mode
&lt;1&gt;[  574.475194] #PF: error_code(0x0000) - not-present page
&lt;6&gt;[  574.475199] PGD 0 P4D 0
&lt;4&gt;[  574.475207] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
&lt;4&gt;[  574.475217] CPU: 4 UID: 0 PID: 217 Comm: kworker/4:1 Not tainted 6.11.0-rc5-reduced-00859-g780801200300 #23
&lt;4&gt;[  574.475226] Hardware name: Intel(R) Client Systems NUC13ANHi7/NUC13ANBi7, BIOS ANRPL357.0026.2023.0314.1458 03/14/2023
&lt;4&gt;[  574.475231] Workqueue: mld mld_ifc_work
&lt;4&gt;[  574.475247] RIP: 0010:bpf_prog_5e13354d9cf5018a_prog_after_redirect+0x17/0x3c
&lt;4&gt;[  574.475257] Code: cc cc cc cc cc cc cc 80 00 00 00 cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 66 90 55 48 89 e5 f3 0f 1e fa 48 8b 57 20 &lt;48&gt; 8b 52 00 8b 92 e0 00 00 00 48 bf f8 a6 d5 c4 5d a0 ff ff be 0b
&lt;4&gt;[  574.475263] RSP: 0018:ffffa62440280c98 EFLAGS: 00010206
&lt;4&gt;[  574.475269] RAX: ffffa62440280cd8 RBX: 0000000000000001 RCX: 0000000000000000
&lt;4&gt;[  574.475274] RDX: 0000000000000000 RSI: ffffa62440549048 RDI: ffffa62440280ce0
&lt;4&gt;[  574.475278] RBP: ffffa62440280c98 R08: 0000000000000002 R09: 0000000000000001
&lt;4&gt;[  574.475281] R10: ffffa05dc8b98000 R11: ffffa05f577fca40 R12: ffffa05dcab24000
&lt;4&gt;[  574.475285] R13: ffffa62440280ce0 R14: ffffa62440549048 R15: ffffa62440549000
&lt;4&gt;[  574.475289] FS:  0000000000000000(0000) GS:ffffa05f4f700000(0000) knlGS:0000000000000000
&lt;4&gt;[  574.475294] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
&lt;4&gt;[  574.475298] CR2: 0000000000000000 CR3: 000000025522e000 CR4: 0000000000f50ef0
&lt;4&gt;[  574.475303] PKRU: 55555554
&lt;4&gt;[  574.475306] Call Trace:
&lt;4&gt;[  574.475313]  &lt;IRQ&gt;
&lt;4&gt;[  574.475318]  ? __die+0x23/0x70
&lt;4&gt;[  574.475329]  ? page_fault_oops+0x180/0x4c0
&lt;4&gt;[  574.475339]  ? skb_pp_cow_data+0x34c/0x490
&lt;4&gt;[  574.475346]  ? kmem_cache_free+0x257/0x280
&lt;4&gt;[  574.475357]  ? exc_page_fault+0x67/0x150
&lt;4&gt;[  574.475368]  ? asm_exc_page_fault+0x26/0x30
&lt;4&gt;[  574.475381]  ? bpf_prog_5e13354d9cf5018a_prog_after_redirect+0x17/0x3c
&lt;4&gt;[  574.475386]  bq_xmit_all+0x158/0x420
&lt;4&gt;[  574.475397]  __dev_flush+0x30/0x90
&lt;4&gt;[  574.475407]  veth_poll+0x216/0x250 [veth]
&lt;4&gt;[  574.475421]  __napi_poll+0x28/0x1c0
&lt;4&gt;[  574.475430]  net_rx_action+0x32d/0x3a0
&lt;4&gt;[  574.475441]  handle_softirqs+0xcb/0x2c0
&lt;4&gt;[  574.475451]  do_softirq+0x40/0x60
&lt;4&gt;[  574.475458]  &lt;/IRQ&gt;
&lt;4&gt;[  574.475461]  &lt;TASK&gt;
&lt;4&gt;[  574.475464]  __local_bh_enable_ip+0x66/0x70
&lt;4&gt;[  574.475471]  __dev_queue_xmit+0x268/0xe40
&lt;4&gt;[  574.475480]  ? selinux_ip_postroute+0x213/0x420
&lt;4&gt;[  574.475491]  ? alloc_skb_with_frags+0x4a/0x1d0
&lt;4&gt;[  574.475502]  ip6_finish_output2+0x2be/0x640
&lt;4&gt;[  574.475512]  ? nf_hook_slow+0x42/0xf0
&lt;4&gt;[  574.475521]  ip6_finish_output+0x194/0x300
&lt;4&gt;[  574.475529]  ? __pfx_ip6_finish_output+0x10/0x10
&lt;4&gt;[  574.475538]  mld_sendpack+0x17c/0x240
&lt;4&gt;[  574.475548]  mld_ifc_work+0x192/0x410
&lt;4&gt;[  574.475557]  process_one_work+0x15d/0x380
&lt;4&gt;[  574.475566]  worker_thread+0x29d/0x3a0
&lt;4&gt;[  574.475573]  ? __pfx_worker_thread+0x10/0x10
&lt;4&gt;[  574.475580]  ? __pfx_worker_thread+0x10/0x10
&lt;4&gt;[  574.475587]  kthread+0xcd/0x100
&lt;4&gt;[  574.475597]  ? __pfx_kthread+0x10/0x10
&lt;4&gt;[  574.475606]  ret_from_fork+0x31/0x50
&lt;4&gt;[  574.475615]  ? __pfx_kthread+0x10/0x10
&lt;4&gt;[  574.475623]  ret_from_fork_asm+0x1a/0x30
&lt;4&gt;[  574.475635]  &lt;/TASK&gt;
&lt;4&gt;[  574.475637] Modules linked in: veth br_netfilter bridge stp llc iwlmvm x86_pkg_temp_thermal iwlwifi efivarfs nvme nvme_core
&lt;4&gt;[  574.475662] CR2: 0000000000000000
&lt;4&gt;[  574.475668] ---[ end trace 0000000000000000 ]---

Therefore, provide it to the program by setting rxq properly.

Fixes: cb261b594b41 ("bpf: Run devmap xdp_prog on flush instead of bulk enqueue")
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Signed-off-by: Florian Kauer &lt;florian.kauer@linutronix.de&gt;
Acked-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Link: https://lore.kernel.org/r/20240911-devel-koalo-fix-ingress-ifindex-v4-1-5c643ae10258@linutronix.de
Signed-off-by: Martin KaFai Lau &lt;martin.lau@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
rxq contains a pointer to the device from where
the redirect happened. Currently, the BPF program
that was executed after a redirect via BPF_MAP_TYPE_DEVMAP*
does not have it set.

This is particularly bad since accessing ingress_ifindex, e.g.

SEC("xdp")
int prog(struct xdp_md *pkt)
{
        return bpf_redirect_map(&amp;dev_redirect_map, 0, 0);
}

SEC("xdp/devmap")
int prog_after_redirect(struct xdp_md *pkt)
{
        bpf_printk("ifindex %i", pkt-&gt;ingress_ifindex);
        return XDP_PASS;
}

depends on access to rxq, so a NULL pointer gets dereferenced:

&lt;1&gt;[  574.475170] BUG: kernel NULL pointer dereference, address: 0000000000000000
&lt;1&gt;[  574.475188] #PF: supervisor read access in kernel mode
&lt;1&gt;[  574.475194] #PF: error_code(0x0000) - not-present page
&lt;6&gt;[  574.475199] PGD 0 P4D 0
&lt;4&gt;[  574.475207] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
&lt;4&gt;[  574.475217] CPU: 4 UID: 0 PID: 217 Comm: kworker/4:1 Not tainted 6.11.0-rc5-reduced-00859-g780801200300 #23
&lt;4&gt;[  574.475226] Hardware name: Intel(R) Client Systems NUC13ANHi7/NUC13ANBi7, BIOS ANRPL357.0026.2023.0314.1458 03/14/2023
&lt;4&gt;[  574.475231] Workqueue: mld mld_ifc_work
&lt;4&gt;[  574.475247] RIP: 0010:bpf_prog_5e13354d9cf5018a_prog_after_redirect+0x17/0x3c
&lt;4&gt;[  574.475257] Code: cc cc cc cc cc cc cc 80 00 00 00 cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 66 90 55 48 89 e5 f3 0f 1e fa 48 8b 57 20 &lt;48&gt; 8b 52 00 8b 92 e0 00 00 00 48 bf f8 a6 d5 c4 5d a0 ff ff be 0b
&lt;4&gt;[  574.475263] RSP: 0018:ffffa62440280c98 EFLAGS: 00010206
&lt;4&gt;[  574.475269] RAX: ffffa62440280cd8 RBX: 0000000000000001 RCX: 0000000000000000
&lt;4&gt;[  574.475274] RDX: 0000000000000000 RSI: ffffa62440549048 RDI: ffffa62440280ce0
&lt;4&gt;[  574.475278] RBP: ffffa62440280c98 R08: 0000000000000002 R09: 0000000000000001
&lt;4&gt;[  574.475281] R10: ffffa05dc8b98000 R11: ffffa05f577fca40 R12: ffffa05dcab24000
&lt;4&gt;[  574.475285] R13: ffffa62440280ce0 R14: ffffa62440549048 R15: ffffa62440549000
&lt;4&gt;[  574.475289] FS:  0000000000000000(0000) GS:ffffa05f4f700000(0000) knlGS:0000000000000000
&lt;4&gt;[  574.475294] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
&lt;4&gt;[  574.475298] CR2: 0000000000000000 CR3: 000000025522e000 CR4: 0000000000f50ef0
&lt;4&gt;[  574.475303] PKRU: 55555554
&lt;4&gt;[  574.475306] Call Trace:
&lt;4&gt;[  574.475313]  &lt;IRQ&gt;
&lt;4&gt;[  574.475318]  ? __die+0x23/0x70
&lt;4&gt;[  574.475329]  ? page_fault_oops+0x180/0x4c0
&lt;4&gt;[  574.475339]  ? skb_pp_cow_data+0x34c/0x490
&lt;4&gt;[  574.475346]  ? kmem_cache_free+0x257/0x280
&lt;4&gt;[  574.475357]  ? exc_page_fault+0x67/0x150
&lt;4&gt;[  574.475368]  ? asm_exc_page_fault+0x26/0x30
&lt;4&gt;[  574.475381]  ? bpf_prog_5e13354d9cf5018a_prog_after_redirect+0x17/0x3c
&lt;4&gt;[  574.475386]  bq_xmit_all+0x158/0x420
&lt;4&gt;[  574.475397]  __dev_flush+0x30/0x90
&lt;4&gt;[  574.475407]  veth_poll+0x216/0x250 [veth]
&lt;4&gt;[  574.475421]  __napi_poll+0x28/0x1c0
&lt;4&gt;[  574.475430]  net_rx_action+0x32d/0x3a0
&lt;4&gt;[  574.475441]  handle_softirqs+0xcb/0x2c0
&lt;4&gt;[  574.475451]  do_softirq+0x40/0x60
&lt;4&gt;[  574.475458]  &lt;/IRQ&gt;
&lt;4&gt;[  574.475461]  &lt;TASK&gt;
&lt;4&gt;[  574.475464]  __local_bh_enable_ip+0x66/0x70
&lt;4&gt;[  574.475471]  __dev_queue_xmit+0x268/0xe40
&lt;4&gt;[  574.475480]  ? selinux_ip_postroute+0x213/0x420
&lt;4&gt;[  574.475491]  ? alloc_skb_with_frags+0x4a/0x1d0
&lt;4&gt;[  574.475502]  ip6_finish_output2+0x2be/0x640
&lt;4&gt;[  574.475512]  ? nf_hook_slow+0x42/0xf0
&lt;4&gt;[  574.475521]  ip6_finish_output+0x194/0x300
&lt;4&gt;[  574.475529]  ? __pfx_ip6_finish_output+0x10/0x10
&lt;4&gt;[  574.475538]  mld_sendpack+0x17c/0x240
&lt;4&gt;[  574.475548]  mld_ifc_work+0x192/0x410
&lt;4&gt;[  574.475557]  process_one_work+0x15d/0x380
&lt;4&gt;[  574.475566]  worker_thread+0x29d/0x3a0
&lt;4&gt;[  574.475573]  ? __pfx_worker_thread+0x10/0x10
&lt;4&gt;[  574.475580]  ? __pfx_worker_thread+0x10/0x10
&lt;4&gt;[  574.475587]  kthread+0xcd/0x100
&lt;4&gt;[  574.475597]  ? __pfx_kthread+0x10/0x10
&lt;4&gt;[  574.475606]  ret_from_fork+0x31/0x50
&lt;4&gt;[  574.475615]  ? __pfx_kthread+0x10/0x10
&lt;4&gt;[  574.475623]  ret_from_fork_asm+0x1a/0x30
&lt;4&gt;[  574.475635]  &lt;/TASK&gt;
&lt;4&gt;[  574.475637] Modules linked in: veth br_netfilter bridge stp llc iwlmvm x86_pkg_temp_thermal iwlwifi efivarfs nvme nvme_core
&lt;4&gt;[  574.475662] CR2: 0000000000000000
&lt;4&gt;[  574.475668] ---[ end trace 0000000000000000 ]---

Therefore, provide it to the program by setting rxq properly.

Fixes: cb261b594b41 ("bpf: Run devmap xdp_prog on flush instead of bulk enqueue")
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Signed-off-by: Florian Kauer &lt;florian.kauer@linutronix.de&gt;
Acked-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Link: https://lore.kernel.org/r/20240911-devel-koalo-fix-ingress-ifindex-v4-1-5c643ae10258@linutronix.de
Signed-off-by: Martin KaFai Lau &lt;martin.lau@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next</title>
<updated>2024-07-09T15:01:46+00:00</updated>
<author>
<name>Paolo Abeni</name>
<email>pabeni@redhat.com</email>
</author>
<published>2024-07-09T15:01:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=7b769adc2612b495d94a4b4537ffaa725861d763'/>
<id>7b769adc2612b495d94a4b4537ffaa725861d763</id>
<content type='text'>
Daniel Borkmann says:

====================
pull-request: bpf-next 2024-07-08

The following pull-request contains BPF updates for your *net-next* tree.

We've added 102 non-merge commits during the last 28 day(s) which contain
a total of 127 files changed, 4606 insertions(+), 980 deletions(-).

The main changes are:

1) Support resilient split BTF which cuts down on duplication and makes BTF
   as compact as possible wrt BTF from modules, from Alan Maguire &amp; Eduard Zingerman.

2) Add support for dumping kfunc prototypes from BTF which enables both detecting
   as well as dumping compilable prototypes for kfuncs, from Daniel Xu.

3) Batch of s390x BPF JIT improvements to add support for BPF arena and to implement
   support for BPF exceptions, from Ilya Leoshkevich.

4) Batch of riscv64 BPF JIT improvements in particular to add 12-argument support
   for BPF trampolines and to utilize bpf_prog_pack for the latter, from Pu Lehui.

5) Extend BPF test infrastructure to add a CHECKSUM_COMPLETE validation option
   for skbs and add coverage along with it, from Vadim Fedorenko.

6) Inline bpf_get_current_task/_btf() helpers in the arm64 BPF JIT which gives
   a small 1% performance improvement in micro-benchmarks, from Puranjay Mohan.

7) Extend the BPF verifier to track the delta between linked registers in order
   to better deal with recent LLVM code optimizations, from Alexei Starovoitov.

8) Fix bpf_wq_set_callback_impl() kfunc signature where the third argument should
   have been a pointer to the map value, from Benjamin Tissoires.

9) Extend BPF selftests to add regular expression support for test output matching
   and adjust some of the selftest when compiled under gcc, from Cupertino Miranda.

10) Simplify task_file_seq_get_next() and remove an unnecessary loop which always
    iterates exactly once anyway, from Dan Carpenter.

11) Add the capability to offload the netfilter flowtable in XDP layer through
    kfuncs, from Florian Westphal &amp; Lorenzo Bianconi.

12) Various cleanups in networking helpers in BPF selftests to shave off a few
    lines of open-coded functions on client/server handling, from Geliang Tang.

13) Properly propagate prog-&gt;aux-&gt;tail_call_reachable out of BPF verifier, so
    that x86 JIT does not need to implement detection, from Leon Hwang.

14) Fix BPF verifier to add a missing check_func_arg_reg_off() to prevent an
    out-of-bounds memory access for dynpointers, from Matt Bobrowski.

15) Fix bpf_session_cookie() kfunc to return __u64 instead of long pointer as
    it might lead to problems on 32-bit archs, from Jiri Olsa.

16) Enhance traffic validation and dynamic batch size support in xsk selftests,
    from Tushar Vyavahare.

bpf-next-for-netdev

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (102 commits)
  selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
  selftests/bpf: amend for wrong bpf_wq_set_callback_impl signature
  bpf: helpers: fix bpf_wq_set_callback_impl signature
  libbpf: Add NULL checks to bpf_object__{prev_map,next_map}
  selftests/bpf: Remove exceptions tests from DENYLIST.s390x
  s390/bpf: Implement exceptions
  s390/bpf: Change seen_reg to a mask
  bpf: Remove unnecessary loop in task_file_seq_get_next()
  riscv, bpf: Optimize stack usage of trampoline
  bpf, devmap: Add .map_alloc_check
  selftests/bpf: Remove arena tests from DENYLIST.s390x
  selftests/bpf: Add UAF tests for arena atomics
  selftests/bpf: Introduce __arena_global
  s390/bpf: Support arena atomics
  s390/bpf: Enable arena
  s390/bpf: Support address space cast instruction
  s390/bpf: Support BPF_PROBE_MEM32
  s390/bpf: Land on the next JITed instruction after exception
  s390/bpf: Introduce pre- and post- probe functions
  s390/bpf: Get rid of get_probe_mem_regno()
  ...
====================

Link: https://patch.msgid.link/20240708221438.10974-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Daniel Borkmann says:

====================
pull-request: bpf-next 2024-07-08

The following pull-request contains BPF updates for your *net-next* tree.

We've added 102 non-merge commits during the last 28 day(s) which contain
a total of 127 files changed, 4606 insertions(+), 980 deletions(-).

The main changes are:

1) Support resilient split BTF which cuts down on duplication and makes BTF
   as compact as possible wrt BTF from modules, from Alan Maguire &amp; Eduard Zingerman.

2) Add support for dumping kfunc prototypes from BTF which enables both detecting
   as well as dumping compilable prototypes for kfuncs, from Daniel Xu.

3) Batch of s390x BPF JIT improvements to add support for BPF arena and to implement
   support for BPF exceptions, from Ilya Leoshkevich.

4) Batch of riscv64 BPF JIT improvements in particular to add 12-argument support
   for BPF trampolines and to utilize bpf_prog_pack for the latter, from Pu Lehui.

5) Extend BPF test infrastructure to add a CHECKSUM_COMPLETE validation option
   for skbs and add coverage along with it, from Vadim Fedorenko.

6) Inline bpf_get_current_task/_btf() helpers in the arm64 BPF JIT which gives
   a small 1% performance improvement in micro-benchmarks, from Puranjay Mohan.

7) Extend the BPF verifier to track the delta between linked registers in order
   to better deal with recent LLVM code optimizations, from Alexei Starovoitov.

8) Fix bpf_wq_set_callback_impl() kfunc signature where the third argument should
   have been a pointer to the map value, from Benjamin Tissoires.

9) Extend BPF selftests to add regular expression support for test output matching
   and adjust some of the selftest when compiled under gcc, from Cupertino Miranda.

10) Simplify task_file_seq_get_next() and remove an unnecessary loop which always
    iterates exactly once anyway, from Dan Carpenter.

11) Add the capability to offload the netfilter flowtable in XDP layer through
    kfuncs, from Florian Westphal &amp; Lorenzo Bianconi.

12) Various cleanups in networking helpers in BPF selftests to shave off a few
    lines of open-coded functions on client/server handling, from Geliang Tang.

13) Properly propagate prog-&gt;aux-&gt;tail_call_reachable out of BPF verifier, so
    that x86 JIT does not need to implement detection, from Leon Hwang.

14) Fix BPF verifier to add a missing check_func_arg_reg_off() to prevent an
    out-of-bounds memory access for dynpointers, from Matt Bobrowski.

15) Fix bpf_session_cookie() kfunc to return __u64 instead of long pointer as
    it might lead to problems on 32-bit archs, from Jiri Olsa.

16) Enhance traffic validation and dynamic batch size support in xsk selftests,
    from Tushar Vyavahare.

bpf-next-for-netdev

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (102 commits)
  selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
  selftests/bpf: amend for wrong bpf_wq_set_callback_impl signature
  bpf: helpers: fix bpf_wq_set_callback_impl signature
  libbpf: Add NULL checks to bpf_object__{prev_map,next_map}
  selftests/bpf: Remove exceptions tests from DENYLIST.s390x
  s390/bpf: Implement exceptions
  s390/bpf: Change seen_reg to a mask
  bpf: Remove unnecessary loop in task_file_seq_get_next()
  riscv, bpf: Optimize stack usage of trampoline
  bpf, devmap: Add .map_alloc_check
  selftests/bpf: Remove arena tests from DENYLIST.s390x
  selftests/bpf: Add UAF tests for arena atomics
  selftests/bpf: Introduce __arena_global
  s390/bpf: Support arena atomics
  s390/bpf: Enable arena
  s390/bpf: Support address space cast instruction
  s390/bpf: Support BPF_PROBE_MEM32
  s390/bpf: Land on the next JITed instruction after exception
  s390/bpf: Introduce pre- and post- probe functions
  s390/bpf: Get rid of get_probe_mem_regno()
  ...
====================

Link: https://patch.msgid.link/20240708221438.10974-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf, devmap: Add .map_alloc_check</title>
<updated>2024-07-02T17:05:25+00:00</updated>
<author>
<name>Florian Lehner</name>
<email>dev@der-flo.net</email>
</author>
<published>2024-06-15T10:11:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=fd8db07705c55a995c42b1e71afc42faad675b0b'/>
<id>fd8db07705c55a995c42b1e71afc42faad675b0b</id>
<content type='text'>
Use the .map_allock_check callback to perform allocation checks before
allocating memory for the devmap.

Signed-off-by: Florian Lehner &lt;dev@der-flo.net&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/20240615101158.57889-1-dev@der-flo.net
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use the .map_allock_check callback to perform allocation checks before
allocating memory for the devmap.

Signed-off-by: Florian Lehner &lt;dev@der-flo.net&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/20240615101158.57889-1-dev@der-flo.net
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Move flush list retrieval to where it is used.</title>
<updated>2024-07-02T13:26:57+00:00</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2024-06-28T10:18:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=e3d69f585d651aba877e18866de7e8cfa2476caa'/>
<id>e3d69f585d651aba877e18866de7e8cfa2476caa</id>
<content type='text'>
The bpf_net_ctx_get_.*_flush_list() are used at the top of the function.
This means the variable is always assigned even if unused. By moving the
function to where it is used, it is possible to delay the initialisation
until it is unavoidable.
Not sure how much this gains in reality but by looking at bq_enqueue()
(in devmap.c) gcc pushes one register less to the stack. \o/.

 Move flush list retrieval to where it is used.

Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Acked-by: Jesper Dangaard Brouer &lt;hawk@kernel.org&gt;
Reviewed-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The bpf_net_ctx_get_.*_flush_list() are used at the top of the function.
This means the variable is always assigned even if unused. By moving the
function to where it is used, it is possible to delay the initialisation
until it is unavoidable.
Not sure how much this gains in reality but by looking at bq_enqueue()
(in devmap.c) gcc pushes one register less to the stack. \o/.

 Move flush list retrieval to where it is used.

Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Acked-by: Jesper Dangaard Brouer &lt;hawk@kernel.org&gt;
Reviewed-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Optimize xdp_do_flush() with bpf_net_context infos.</title>
<updated>2024-07-02T13:26:57+00:00</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2024-06-28T10:18:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=d839a73179ae91c07f5f2f97ccb9c69b2b7c3306'/>
<id>d839a73179ae91c07f5f2f97ccb9c69b2b7c3306</id>
<content type='text'>
Every NIC driver utilizing XDP should invoke xdp_do_flush() after
processing all packages. With the introduction of the bpf_net_context
logic the flush lists (for dev, CPU-map and xsk) are lazy initialized
only if used. However xdp_do_flush() tries to flush all three of them so
all three lists are always initialized and the likely empty lists are
"iterated".
Without the usage of XDP but with CONFIG_DEBUG_NET the lists are also
initialized due to xdp_do_check_flushed().

Jakub suggest to utilize the hints in bpf_net_context and avoid invoking
the flush function. This will also avoiding initializing the lists which
are otherwise unused.

Introduce bpf_net_ctx_get_all_used_flush_lists() to return the
individual list if not-empty. Use the logic in xdp_do_flush() and
xdp_do_check_flushed(). Remove the not needed .*_check_flush().

Suggested-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Reviewed-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Every NIC driver utilizing XDP should invoke xdp_do_flush() after
processing all packages. With the introduction of the bpf_net_context
logic the flush lists (for dev, CPU-map and xsk) are lazy initialized
only if used. However xdp_do_flush() tries to flush all three of them so
all three lists are always initialized and the likely empty lists are
"iterated".
Without the usage of XDP but with CONFIG_DEBUG_NET the lists are also
initialized due to xdp_do_check_flushed().

Jakub suggest to utilize the hints in bpf_net_context and avoid invoking
the flush function. This will also avoiding initializing the lists which
are otherwise unused.

Introduce bpf_net_ctx_get_all_used_flush_lists() to return the
individual list if not-empty. Use the logic in xdp_do_flush() and
xdp_do_check_flushed(). Remove the not needed .*_check_flush().

Suggested-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Reviewed-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Move per-CPU flush-lists to bpf_net_context on PREEMPT_RT.</title>
<updated>2024-06-24T23:41:24+00:00</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2024-06-20T13:22:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=3f9fe37d9e16a6cfd5f4d1f536686ea71db3196f'/>
<id>3f9fe37d9e16a6cfd5f4d1f536686ea71db3196f</id>
<content type='text'>
The per-CPU flush lists, which are accessed from within the NAPI callback
(xdp_do_flush() for instance), are per-CPU. There are subject to the
same problem as struct bpf_redirect_info.

Add the per-CPU lists cpu_map_flush_list, dev_map_flush_list and
xskmap_map_flush_list to struct bpf_net_context. Add wrappers for the
access. The lists initialized on first usage (similar to
bpf_net_ctx_get_ri()).

Cc: "Björn Töpel" &lt;bjorn@kernel.org&gt;
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Cc: Eduard Zingerman &lt;eddyz87@gmail.com&gt;
Cc: Hao Luo &lt;haoluo@google.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: John Fastabend &lt;john.fastabend@gmail.com&gt;
Cc: Jonathan Lemon &lt;jonathan.lemon@gmail.com&gt;
Cc: KP Singh &lt;kpsingh@kernel.org&gt;
Cc: Maciej Fijalkowski &lt;maciej.fijalkowski@intel.com&gt;
Cc: Magnus Karlsson &lt;magnus.karlsson@intel.com&gt;
Cc: Martin KaFai Lau &lt;martin.lau@linux.dev&gt;
Cc: Song Liu &lt;song@kernel.org&gt;
Cc: Stanislav Fomichev &lt;sdf@google.com&gt;
Cc: Yonghong Song &lt;yonghong.song@linux.dev&gt;
Acked-by: Jesper Dangaard Brouer &lt;hawk@kernel.org&gt;
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Link: https://patch.msgid.link/20240620132727.660738-16-bigeasy@linutronix.de
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The per-CPU flush lists, which are accessed from within the NAPI callback
(xdp_do_flush() for instance), are per-CPU. There are subject to the
same problem as struct bpf_redirect_info.

Add the per-CPU lists cpu_map_flush_list, dev_map_flush_list and
xskmap_map_flush_list to struct bpf_net_context. Add wrappers for the
access. The lists initialized on first usage (similar to
bpf_net_ctx_get_ri()).

Cc: "Björn Töpel" &lt;bjorn@kernel.org&gt;
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Cc: Eduard Zingerman &lt;eddyz87@gmail.com&gt;
Cc: Hao Luo &lt;haoluo@google.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: John Fastabend &lt;john.fastabend@gmail.com&gt;
Cc: Jonathan Lemon &lt;jonathan.lemon@gmail.com&gt;
Cc: KP Singh &lt;kpsingh@kernel.org&gt;
Cc: Maciej Fijalkowski &lt;maciej.fijalkowski@intel.com&gt;
Cc: Magnus Karlsson &lt;magnus.karlsson@intel.com&gt;
Cc: Martin KaFai Lau &lt;martin.lau@linux.dev&gt;
Cc: Song Liu &lt;song@kernel.org&gt;
Cc: Stanislav Fomichev &lt;sdf@google.com&gt;
Cc: Yonghong Song &lt;yonghong.song@linux.dev&gt;
Acked-by: Jesper Dangaard Brouer &lt;hawk@kernel.org&gt;
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Link: https://patch.msgid.link/20240620132727.660738-16-bigeasy@linutronix.de
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.</title>
<updated>2024-06-24T23:41:24+00:00</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2024-06-20T13:22:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=401cb7dae8130fd34eb84648e02ab4c506df7d5e'/>
<id>401cb7dae8130fd34eb84648e02ab4c506df7d5e</id>
<content type='text'>
The XDP redirect process is two staged:
- bpf_prog_run_xdp() is invoked to run a eBPF program which inspects the
  packet and makes decisions. While doing that, the per-CPU variable
  bpf_redirect_info is used.

- Afterwards xdp_do_redirect() is invoked and accesses bpf_redirect_info
  and it may also access other per-CPU variables like xskmap_flush_list.

At the very end of the NAPI callback, xdp_do_flush() is invoked which
does not access bpf_redirect_info but will touch the individual per-CPU
lists.

The per-CPU variables are only used in the NAPI callback hence disabling
bottom halves is the only protection mechanism. Users from preemptible
context (like cpu_map_kthread_run()) explicitly disable bottom halves
for protections reasons.
Without locking in local_bh_disable() on PREEMPT_RT this data structure
requires explicit locking.

PREEMPT_RT has forced-threaded interrupts enabled and every
NAPI-callback runs in a thread. If each thread has its own data
structure then locking can be avoided.

Create a struct bpf_net_context which contains struct bpf_redirect_info.
Define the variable on stack, use bpf_net_ctx_set() to save a pointer to
it, bpf_net_ctx_clear() removes it again.
The bpf_net_ctx_set() may nest. For instance a function can be used from
within NET_RX_SOFTIRQ/ net_rx_action which uses bpf_net_ctx_set() and
NET_TX_SOFTIRQ which does not. Therefore only the first invocations
updates the pointer.
Use bpf_net_ctx_get_ri() as a wrapper to retrieve the current struct
bpf_redirect_info. The returned data structure is zero initialized to
ensure nothing is leaked from stack. This is done on first usage of the
struct. bpf_net_ctx_set() sets bpf_redirect_info::kern_flags to 0 to
note that initialisation is required. First invocation of
bpf_net_ctx_get_ri() will memset() the data structure and update
bpf_redirect_info::kern_flags.
bpf_redirect_info::nh is excluded from memset because it is only used
once BPF_F_NEIGH is set which also sets the nh member. The kern_flags is
moved past nh to exclude it from memset.

The pointer to bpf_net_context is saved task's task_struct. Using
always the bpf_net_context approach has the advantage that there is
almost zero differences between PREEMPT_RT and non-PREEMPT_RT builds.

Cc: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Cc: Eduard Zingerman &lt;eddyz87@gmail.com&gt;
Cc: Hao Luo &lt;haoluo@google.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: John Fastabend &lt;john.fastabend@gmail.com&gt;
Cc: KP Singh &lt;kpsingh@kernel.org&gt;
Cc: Martin KaFai Lau &lt;martin.lau@linux.dev&gt;
Cc: Song Liu &lt;song@kernel.org&gt;
Cc: Stanislav Fomichev &lt;sdf@google.com&gt;
Cc: Yonghong Song &lt;yonghong.song@linux.dev&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Jesper Dangaard Brouer &lt;hawk@kernel.org&gt;
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Link: https://patch.msgid.link/20240620132727.660738-15-bigeasy@linutronix.de
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The XDP redirect process is two staged:
- bpf_prog_run_xdp() is invoked to run a eBPF program which inspects the
  packet and makes decisions. While doing that, the per-CPU variable
  bpf_redirect_info is used.

- Afterwards xdp_do_redirect() is invoked and accesses bpf_redirect_info
  and it may also access other per-CPU variables like xskmap_flush_list.

At the very end of the NAPI callback, xdp_do_flush() is invoked which
does not access bpf_redirect_info but will touch the individual per-CPU
lists.

The per-CPU variables are only used in the NAPI callback hence disabling
bottom halves is the only protection mechanism. Users from preemptible
context (like cpu_map_kthread_run()) explicitly disable bottom halves
for protections reasons.
Without locking in local_bh_disable() on PREEMPT_RT this data structure
requires explicit locking.

PREEMPT_RT has forced-threaded interrupts enabled and every
NAPI-callback runs in a thread. If each thread has its own data
structure then locking can be avoided.

Create a struct bpf_net_context which contains struct bpf_redirect_info.
Define the variable on stack, use bpf_net_ctx_set() to save a pointer to
it, bpf_net_ctx_clear() removes it again.
The bpf_net_ctx_set() may nest. For instance a function can be used from
within NET_RX_SOFTIRQ/ net_rx_action which uses bpf_net_ctx_set() and
NET_TX_SOFTIRQ which does not. Therefore only the first invocations
updates the pointer.
Use bpf_net_ctx_get_ri() as a wrapper to retrieve the current struct
bpf_redirect_info. The returned data structure is zero initialized to
ensure nothing is leaked from stack. This is done on first usage of the
struct. bpf_net_ctx_set() sets bpf_redirect_info::kern_flags to 0 to
note that initialisation is required. First invocation of
bpf_net_ctx_get_ri() will memset() the data structure and update
bpf_redirect_info::kern_flags.
bpf_redirect_info::nh is excluded from memset because it is only used
once BPF_F_NEIGH is set which also sets the nh member. The kern_flags is
moved past nh to exclude it from memset.

The pointer to bpf_net_context is saved task's task_struct. Using
always the bpf_net_context approach has the advantage that there is
almost zero differences between PREEMPT_RT and non-PREEMPT_RT builds.

Cc: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Cc: Eduard Zingerman &lt;eddyz87@gmail.com&gt;
Cc: Hao Luo &lt;haoluo@google.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: John Fastabend &lt;john.fastabend@gmail.com&gt;
Cc: KP Singh &lt;kpsingh@kernel.org&gt;
Cc: Martin KaFai Lau &lt;martin.lau@linux.dev&gt;
Cc: Song Liu &lt;song@kernel.org&gt;
Cc: Stanislav Fomichev &lt;sdf@google.com&gt;
Cc: Yonghong Song &lt;yonghong.song@linux.dev&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Jesper Dangaard Brouer &lt;hawk@kernel.org&gt;
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Link: https://patch.msgid.link/20240620132727.660738-15-bigeasy@linutronix.de
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf, devmap: Remove unnecessary if check in for loop</title>
<updated>2024-06-03T15:09:23+00:00</updated>
<author>
<name>Thorsten Blum</name>
<email>thorsten.blum@toblux.com</email>
</author>
<published>2024-05-29T10:19:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=2317dc2c22cc353b699c7d1db47b2fe91f54055c'/>
<id>2317dc2c22cc353b699c7d1db47b2fe91f54055c</id>
<content type='text'>
The iterator variable dst cannot be NULL and the if check can be removed.
Remove it and fix the following Coccinelle/coccicheck warning reported
by itnull.cocci:

	ERROR: iterator variable bound on line 762 cannot be NULL

Signed-off-by: Thorsten Blum &lt;thorsten.blum@toblux.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20240529101900.103913-2-thorsten.blum@toblux.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The iterator variable dst cannot be NULL and the if check can be removed.
Remove it and fix the following Coccinelle/coccicheck warning reported
by itnull.cocci:

	ERROR: iterator variable bound on line 762 cannot be NULL

Signed-off-by: Thorsten Blum &lt;thorsten.blum@toblux.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20240529101900.103913-2-thorsten.blum@toblux.com
</pre>
</div>
</content>
</entry>
</feed>
