<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/kernel/bpf/devmap.c, branch v6.1.168</title>
<subtitle>Clone of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git</subtitle>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/'/>
<entry>
<title>bpf: Fix stack-out-of-bounds write in devmap</title>
<updated>2026-03-25T10:02:54+00:00</updated>
<author>
<name>Kohei Enju</name>
<email>kohei@enjuk.jp</email>
</author>
<published>2026-02-25T05:34:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=5000e40acc8d0c36ab709662e32120986ac22e7e'/>
<id>5000e40acc8d0c36ab709662e32120986ac22e7e</id>
<content type='text'>
[ Upstream commit b7bf516c3ecd9a2aae2dc2635178ab87b734fef1 ]

get_upper_ifindexes() iterates over all upper devices and writes their
indices into an array without checking bounds.

Also the callers assume that the max number of upper devices is
MAX_NEST_DEV and allocate excluded_devices[1+MAX_NEST_DEV] on the stack,
but that assumption is not correct and the number of upper devices could
be larger than MAX_NEST_DEV (e.g., many macvlans), causing a
stack-out-of-bounds write.

Add a max parameter to get_upper_ifindexes() to avoid the issue.
When there are too many upper devices, return -EOVERFLOW and abort the
redirect.

To reproduce, create more than MAX_NEST_DEV(8) macvlans on a device with
an XDP program attached using BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS.
Then send a packet to the device to trigger the XDP redirect path.

Reported-by: syzbot+10cc7f13760b31bd2e61@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/698c4ce3.050a0220.340abe.000b.GAE@google.com/T/
Fixes: aeea1b86f936 ("bpf, devmap: Exclude XDP broadcast to master device")
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Signed-off-by: Kohei Enju &lt;kohei@enjuk.jp&gt;
Link: https://lore.kernel.org/r/20260225053506.4738-1-kohei@enjuk.jp
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit b7bf516c3ecd9a2aae2dc2635178ab87b734fef1 ]

get_upper_ifindexes() iterates over all upper devices and writes their
indices into an array without checking bounds.

Also the callers assume that the max number of upper devices is
MAX_NEST_DEV and allocate excluded_devices[1+MAX_NEST_DEV] on the stack,
but that assumption is not correct and the number of upper devices could
be larger than MAX_NEST_DEV (e.g., many macvlans), causing a
stack-out-of-bounds write.

Add a max parameter to get_upper_ifindexes() to avoid the issue.
When there are too many upper devices, return -EOVERFLOW and abort the
redirect.

To reproduce, create more than MAX_NEST_DEV(8) macvlans on a device with
an XDP program attached using BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS.
Then send a packet to the device to trigger the XDP redirect path.

Reported-by: syzbot+10cc7f13760b31bd2e61@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/698c4ce3.050a0220.340abe.000b.GAE@google.com/T/
Fixes: aeea1b86f936 ("bpf, devmap: Exclude XDP broadcast to master device")
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Signed-off-by: Kohei Enju &lt;kohei@enjuk.jp&gt;
Link: https://lore.kernel.org/r/20260225053506.4738-1-kohei@enjuk.jp
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: fix OOB devmap writes when deleting elements</title>
<updated>2024-12-14T18:54:35+00:00</updated>
<author>
<name>Maciej Fijalkowski</name>
<email>maciej.fijalkowski@intel.com</email>
</author>
<published>2024-11-22T12:10:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=98c03d05936d846073df8f550e9e8bf0dde1d77f'/>
<id>98c03d05936d846073df8f550e9e8bf0dde1d77f</id>
<content type='text'>
commit ab244dd7cf4c291f82faacdc50b45cc0f55b674d upstream.

Jordy reported issue against XSKMAP which also applies to DEVMAP - the
index used for accessing map entry, due to being a signed integer,
causes the OOB writes. Fix is simple as changing the type from int to
u32, however, when compared to XSKMAP case, one more thing needs to be
addressed.

When map is released from system via dev_map_free(), we iterate through
all of the entries and an iterator variable is also an int, which
implies OOB accesses. Again, change it to be u32.

Example splat below:

[  160.724676] BUG: unable to handle page fault for address: ffffc8fc2c001000
[  160.731662] #PF: supervisor read access in kernel mode
[  160.736876] #PF: error_code(0x0000) - not-present page
[  160.742095] PGD 0 P4D 0
[  160.744678] Oops: Oops: 0000 [#1] PREEMPT SMP
[  160.749106] CPU: 1 UID: 0 PID: 520 Comm: kworker/u145:12 Not tainted 6.12.0-rc1+ #487
[  160.757050] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[  160.767642] Workqueue: events_unbound bpf_map_free_deferred
[  160.773308] RIP: 0010:dev_map_free+0x77/0x170
[  160.777735] Code: 00 e8 fd 91 ed ff e8 b8 73 ed ff 41 83 7d 18 19 74 6e 41 8b 45 24 49 8b bd f8 00 00 00 31 db 85 c0 74 48 48 63 c3 48 8d 04 c7 &lt;48&gt; 8b 28 48 85 ed 74 30 48 8b 7d 18 48 85 ff 74 05 e8 b3 52 fa ff
[  160.796777] RSP: 0018:ffffc9000ee1fe38 EFLAGS: 00010202
[  160.802086] RAX: ffffc8fc2c001000 RBX: 0000000080000000 RCX: 0000000000000024
[  160.809331] RDX: 0000000000000000 RSI: 0000000000000024 RDI: ffffc9002c001000
[  160.816576] RBP: 0000000000000000 R08: 0000000000000023 R09: 0000000000000001
[  160.823823] R10: 0000000000000001 R11: 00000000000ee6b2 R12: dead000000000122
[  160.831066] R13: ffff88810c928e00 R14: ffff8881002df405 R15: 0000000000000000
[  160.838310] FS:  0000000000000000(0000) GS:ffff8897e0c40000(0000) knlGS:0000000000000000
[  160.846528] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  160.852357] CR2: ffffc8fc2c001000 CR3: 0000000005c32006 CR4: 00000000007726f0
[  160.859604] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  160.866847] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  160.874092] PKRU: 55555554
[  160.876847] Call Trace:
[  160.879338]  &lt;TASK&gt;
[  160.881477]  ? __die+0x20/0x60
[  160.884586]  ? page_fault_oops+0x15a/0x450
[  160.888746]  ? search_extable+0x22/0x30
[  160.892647]  ? search_bpf_extables+0x5f/0x80
[  160.896988]  ? exc_page_fault+0xa9/0x140
[  160.900973]  ? asm_exc_page_fault+0x22/0x30
[  160.905232]  ? dev_map_free+0x77/0x170
[  160.909043]  ? dev_map_free+0x58/0x170
[  160.912857]  bpf_map_free_deferred+0x51/0x90
[  160.917196]  process_one_work+0x142/0x370
[  160.921272]  worker_thread+0x29e/0x3b0
[  160.925082]  ? rescuer_thread+0x4b0/0x4b0
[  160.929157]  kthread+0xd4/0x110
[  160.932355]  ? kthread_park+0x80/0x80
[  160.936079]  ret_from_fork+0x2d/0x50
[  160.943396]  ? kthread_park+0x80/0x80
[  160.950803]  ret_from_fork_asm+0x11/0x20
[  160.958482]  &lt;/TASK&gt;

Fixes: 546ac1ffb70d ("bpf: add devmap, a map for storing net device references")
CC: stable@vger.kernel.org
Reported-by: Jordy Zomer &lt;jordyzomer@google.com&gt;
Suggested-by: Jordy Zomer &lt;jordyzomer@google.com&gt;
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Maciej Fijalkowski &lt;maciej.fijalkowski@intel.com&gt;
Link: https://lore.kernel.org/r/20241122121030.716788-3-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit ab244dd7cf4c291f82faacdc50b45cc0f55b674d upstream.

Jordy reported issue against XSKMAP which also applies to DEVMAP - the
index used for accessing map entry, due to being a signed integer,
causes the OOB writes. Fix is simple as changing the type from int to
u32, however, when compared to XSKMAP case, one more thing needs to be
addressed.

When map is released from system via dev_map_free(), we iterate through
all of the entries and an iterator variable is also an int, which
implies OOB accesses. Again, change it to be u32.

Example splat below:

[  160.724676] BUG: unable to handle page fault for address: ffffc8fc2c001000
[  160.731662] #PF: supervisor read access in kernel mode
[  160.736876] #PF: error_code(0x0000) - not-present page
[  160.742095] PGD 0 P4D 0
[  160.744678] Oops: Oops: 0000 [#1] PREEMPT SMP
[  160.749106] CPU: 1 UID: 0 PID: 520 Comm: kworker/u145:12 Not tainted 6.12.0-rc1+ #487
[  160.757050] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[  160.767642] Workqueue: events_unbound bpf_map_free_deferred
[  160.773308] RIP: 0010:dev_map_free+0x77/0x170
[  160.777735] Code: 00 e8 fd 91 ed ff e8 b8 73 ed ff 41 83 7d 18 19 74 6e 41 8b 45 24 49 8b bd f8 00 00 00 31 db 85 c0 74 48 48 63 c3 48 8d 04 c7 &lt;48&gt; 8b 28 48 85 ed 74 30 48 8b 7d 18 48 85 ff 74 05 e8 b3 52 fa ff
[  160.796777] RSP: 0018:ffffc9000ee1fe38 EFLAGS: 00010202
[  160.802086] RAX: ffffc8fc2c001000 RBX: 0000000080000000 RCX: 0000000000000024
[  160.809331] RDX: 0000000000000000 RSI: 0000000000000024 RDI: ffffc9002c001000
[  160.816576] RBP: 0000000000000000 R08: 0000000000000023 R09: 0000000000000001
[  160.823823] R10: 0000000000000001 R11: 00000000000ee6b2 R12: dead000000000122
[  160.831066] R13: ffff88810c928e00 R14: ffff8881002df405 R15: 0000000000000000
[  160.838310] FS:  0000000000000000(0000) GS:ffff8897e0c40000(0000) knlGS:0000000000000000
[  160.846528] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  160.852357] CR2: ffffc8fc2c001000 CR3: 0000000005c32006 CR4: 00000000007726f0
[  160.859604] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  160.866847] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  160.874092] PKRU: 55555554
[  160.876847] Call Trace:
[  160.879338]  &lt;TASK&gt;
[  160.881477]  ? __die+0x20/0x60
[  160.884586]  ? page_fault_oops+0x15a/0x450
[  160.888746]  ? search_extable+0x22/0x30
[  160.892647]  ? search_bpf_extables+0x5f/0x80
[  160.896988]  ? exc_page_fault+0xa9/0x140
[  160.900973]  ? asm_exc_page_fault+0x22/0x30
[  160.905232]  ? dev_map_free+0x77/0x170
[  160.909043]  ? dev_map_free+0x58/0x170
[  160.912857]  bpf_map_free_deferred+0x51/0x90
[  160.917196]  process_one_work+0x142/0x370
[  160.921272]  worker_thread+0x29e/0x3b0
[  160.925082]  ? rescuer_thread+0x4b0/0x4b0
[  160.929157]  kthread+0xd4/0x110
[  160.932355]  ? kthread_park+0x80/0x80
[  160.936079]  ret_from_fork+0x2d/0x50
[  160.943396]  ? kthread_park+0x80/0x80
[  160.950803]  ret_from_fork_asm+0x11/0x20
[  160.958482]  &lt;/TASK&gt;

Fixes: 546ac1ffb70d ("bpf: add devmap, a map for storing net device references")
CC: stable@vger.kernel.org
Reported-by: Jordy Zomer &lt;jordyzomer@google.com&gt;
Suggested-by: Jordy Zomer &lt;jordyzomer@google.com&gt;
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Maciej Fijalkowski &lt;maciej.fijalkowski@intel.com&gt;
Link: https://lore.kernel.org/r/20241122121030.716788-3-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: devmap: provide rxq after redirect</title>
<updated>2024-11-01T00:55:56+00:00</updated>
<author>
<name>Florian Kauer</name>
<email>florian.kauer@linutronix.de</email>
</author>
<published>2024-09-11T08:41:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=a778fbe087c19f4ece5f5fc14173328f070c3803'/>
<id>a778fbe087c19f4ece5f5fc14173328f070c3803</id>
<content type='text'>
[ Upstream commit ca9984c5f0ab3690d98b13937b2485a978c8dd73 ]

rxq contains a pointer to the device from where
the redirect happened. Currently, the BPF program
that was executed after a redirect via BPF_MAP_TYPE_DEVMAP*
does not have it set.

This is particularly bad since accessing ingress_ifindex, e.g.

SEC("xdp")
int prog(struct xdp_md *pkt)
{
        return bpf_redirect_map(&amp;dev_redirect_map, 0, 0);
}

SEC("xdp/devmap")
int prog_after_redirect(struct xdp_md *pkt)
{
        bpf_printk("ifindex %i", pkt-&gt;ingress_ifindex);
        return XDP_PASS;
}

depends on access to rxq, so a NULL pointer gets dereferenced:

&lt;1&gt;[  574.475170] BUG: kernel NULL pointer dereference, address: 0000000000000000
&lt;1&gt;[  574.475188] #PF: supervisor read access in kernel mode
&lt;1&gt;[  574.475194] #PF: error_code(0x0000) - not-present page
&lt;6&gt;[  574.475199] PGD 0 P4D 0
&lt;4&gt;[  574.475207] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
&lt;4&gt;[  574.475217] CPU: 4 UID: 0 PID: 217 Comm: kworker/4:1 Not tainted 6.11.0-rc5-reduced-00859-g780801200300 #23
&lt;4&gt;[  574.475226] Hardware name: Intel(R) Client Systems NUC13ANHi7/NUC13ANBi7, BIOS ANRPL357.0026.2023.0314.1458 03/14/2023
&lt;4&gt;[  574.475231] Workqueue: mld mld_ifc_work
&lt;4&gt;[  574.475247] RIP: 0010:bpf_prog_5e13354d9cf5018a_prog_after_redirect+0x17/0x3c
&lt;4&gt;[  574.475257] Code: cc cc cc cc cc cc cc 80 00 00 00 cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 66 90 55 48 89 e5 f3 0f 1e fa 48 8b 57 20 &lt;48&gt; 8b 52 00 8b 92 e0 00 00 00 48 bf f8 a6 d5 c4 5d a0 ff ff be 0b
&lt;4&gt;[  574.475263] RSP: 0018:ffffa62440280c98 EFLAGS: 00010206
&lt;4&gt;[  574.475269] RAX: ffffa62440280cd8 RBX: 0000000000000001 RCX: 0000000000000000
&lt;4&gt;[  574.475274] RDX: 0000000000000000 RSI: ffffa62440549048 RDI: ffffa62440280ce0
&lt;4&gt;[  574.475278] RBP: ffffa62440280c98 R08: 0000000000000002 R09: 0000000000000001
&lt;4&gt;[  574.475281] R10: ffffa05dc8b98000 R11: ffffa05f577fca40 R12: ffffa05dcab24000
&lt;4&gt;[  574.475285] R13: ffffa62440280ce0 R14: ffffa62440549048 R15: ffffa62440549000
&lt;4&gt;[  574.475289] FS:  0000000000000000(0000) GS:ffffa05f4f700000(0000) knlGS:0000000000000000
&lt;4&gt;[  574.475294] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
&lt;4&gt;[  574.475298] CR2: 0000000000000000 CR3: 000000025522e000 CR4: 0000000000f50ef0
&lt;4&gt;[  574.475303] PKRU: 55555554
&lt;4&gt;[  574.475306] Call Trace:
&lt;4&gt;[  574.475313]  &lt;IRQ&gt;
&lt;4&gt;[  574.475318]  ? __die+0x23/0x70
&lt;4&gt;[  574.475329]  ? page_fault_oops+0x180/0x4c0
&lt;4&gt;[  574.475339]  ? skb_pp_cow_data+0x34c/0x490
&lt;4&gt;[  574.475346]  ? kmem_cache_free+0x257/0x280
&lt;4&gt;[  574.475357]  ? exc_page_fault+0x67/0x150
&lt;4&gt;[  574.475368]  ? asm_exc_page_fault+0x26/0x30
&lt;4&gt;[  574.475381]  ? bpf_prog_5e13354d9cf5018a_prog_after_redirect+0x17/0x3c
&lt;4&gt;[  574.475386]  bq_xmit_all+0x158/0x420
&lt;4&gt;[  574.475397]  __dev_flush+0x30/0x90
&lt;4&gt;[  574.475407]  veth_poll+0x216/0x250 [veth]
&lt;4&gt;[  574.475421]  __napi_poll+0x28/0x1c0
&lt;4&gt;[  574.475430]  net_rx_action+0x32d/0x3a0
&lt;4&gt;[  574.475441]  handle_softirqs+0xcb/0x2c0
&lt;4&gt;[  574.475451]  do_softirq+0x40/0x60
&lt;4&gt;[  574.475458]  &lt;/IRQ&gt;
&lt;4&gt;[  574.475461]  &lt;TASK&gt;
&lt;4&gt;[  574.475464]  __local_bh_enable_ip+0x66/0x70
&lt;4&gt;[  574.475471]  __dev_queue_xmit+0x268/0xe40
&lt;4&gt;[  574.475480]  ? selinux_ip_postroute+0x213/0x420
&lt;4&gt;[  574.475491]  ? alloc_skb_with_frags+0x4a/0x1d0
&lt;4&gt;[  574.475502]  ip6_finish_output2+0x2be/0x640
&lt;4&gt;[  574.475512]  ? nf_hook_slow+0x42/0xf0
&lt;4&gt;[  574.475521]  ip6_finish_output+0x194/0x300
&lt;4&gt;[  574.475529]  ? __pfx_ip6_finish_output+0x10/0x10
&lt;4&gt;[  574.475538]  mld_sendpack+0x17c/0x240
&lt;4&gt;[  574.475548]  mld_ifc_work+0x192/0x410
&lt;4&gt;[  574.475557]  process_one_work+0x15d/0x380
&lt;4&gt;[  574.475566]  worker_thread+0x29d/0x3a0
&lt;4&gt;[  574.475573]  ? __pfx_worker_thread+0x10/0x10
&lt;4&gt;[  574.475580]  ? __pfx_worker_thread+0x10/0x10
&lt;4&gt;[  574.475587]  kthread+0xcd/0x100
&lt;4&gt;[  574.475597]  ? __pfx_kthread+0x10/0x10
&lt;4&gt;[  574.475606]  ret_from_fork+0x31/0x50
&lt;4&gt;[  574.475615]  ? __pfx_kthread+0x10/0x10
&lt;4&gt;[  574.475623]  ret_from_fork_asm+0x1a/0x30
&lt;4&gt;[  574.475635]  &lt;/TASK&gt;
&lt;4&gt;[  574.475637] Modules linked in: veth br_netfilter bridge stp llc iwlmvm x86_pkg_temp_thermal iwlwifi efivarfs nvme nvme_core
&lt;4&gt;[  574.475662] CR2: 0000000000000000
&lt;4&gt;[  574.475668] ---[ end trace 0000000000000000 ]---

Therefore, provide it to the program by setting rxq properly.

Fixes: cb261b594b41 ("bpf: Run devmap xdp_prog on flush instead of bulk enqueue")
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Signed-off-by: Florian Kauer &lt;florian.kauer@linutronix.de&gt;
Acked-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Link: https://lore.kernel.org/r/20240911-devel-koalo-fix-ingress-ifindex-v4-1-5c643ae10258@linutronix.de
Signed-off-by: Martin KaFai Lau &lt;martin.lau@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit ca9984c5f0ab3690d98b13937b2485a978c8dd73 ]

rxq contains a pointer to the device from where
the redirect happened. Currently, the BPF program
that was executed after a redirect via BPF_MAP_TYPE_DEVMAP*
does not have it set.

This is particularly bad since accessing ingress_ifindex, e.g.

SEC("xdp")
int prog(struct xdp_md *pkt)
{
        return bpf_redirect_map(&amp;dev_redirect_map, 0, 0);
}

SEC("xdp/devmap")
int prog_after_redirect(struct xdp_md *pkt)
{
        bpf_printk("ifindex %i", pkt-&gt;ingress_ifindex);
        return XDP_PASS;
}

depends on access to rxq, so a NULL pointer gets dereferenced:

&lt;1&gt;[  574.475170] BUG: kernel NULL pointer dereference, address: 0000000000000000
&lt;1&gt;[  574.475188] #PF: supervisor read access in kernel mode
&lt;1&gt;[  574.475194] #PF: error_code(0x0000) - not-present page
&lt;6&gt;[  574.475199] PGD 0 P4D 0
&lt;4&gt;[  574.475207] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
&lt;4&gt;[  574.475217] CPU: 4 UID: 0 PID: 217 Comm: kworker/4:1 Not tainted 6.11.0-rc5-reduced-00859-g780801200300 #23
&lt;4&gt;[  574.475226] Hardware name: Intel(R) Client Systems NUC13ANHi7/NUC13ANBi7, BIOS ANRPL357.0026.2023.0314.1458 03/14/2023
&lt;4&gt;[  574.475231] Workqueue: mld mld_ifc_work
&lt;4&gt;[  574.475247] RIP: 0010:bpf_prog_5e13354d9cf5018a_prog_after_redirect+0x17/0x3c
&lt;4&gt;[  574.475257] Code: cc cc cc cc cc cc cc 80 00 00 00 cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 66 90 55 48 89 e5 f3 0f 1e fa 48 8b 57 20 &lt;48&gt; 8b 52 00 8b 92 e0 00 00 00 48 bf f8 a6 d5 c4 5d a0 ff ff be 0b
&lt;4&gt;[  574.475263] RSP: 0018:ffffa62440280c98 EFLAGS: 00010206
&lt;4&gt;[  574.475269] RAX: ffffa62440280cd8 RBX: 0000000000000001 RCX: 0000000000000000
&lt;4&gt;[  574.475274] RDX: 0000000000000000 RSI: ffffa62440549048 RDI: ffffa62440280ce0
&lt;4&gt;[  574.475278] RBP: ffffa62440280c98 R08: 0000000000000002 R09: 0000000000000001
&lt;4&gt;[  574.475281] R10: ffffa05dc8b98000 R11: ffffa05f577fca40 R12: ffffa05dcab24000
&lt;4&gt;[  574.475285] R13: ffffa62440280ce0 R14: ffffa62440549048 R15: ffffa62440549000
&lt;4&gt;[  574.475289] FS:  0000000000000000(0000) GS:ffffa05f4f700000(0000) knlGS:0000000000000000
&lt;4&gt;[  574.475294] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
&lt;4&gt;[  574.475298] CR2: 0000000000000000 CR3: 000000025522e000 CR4: 0000000000f50ef0
&lt;4&gt;[  574.475303] PKRU: 55555554
&lt;4&gt;[  574.475306] Call Trace:
&lt;4&gt;[  574.475313]  &lt;IRQ&gt;
&lt;4&gt;[  574.475318]  ? __die+0x23/0x70
&lt;4&gt;[  574.475329]  ? page_fault_oops+0x180/0x4c0
&lt;4&gt;[  574.475339]  ? skb_pp_cow_data+0x34c/0x490
&lt;4&gt;[  574.475346]  ? kmem_cache_free+0x257/0x280
&lt;4&gt;[  574.475357]  ? exc_page_fault+0x67/0x150
&lt;4&gt;[  574.475368]  ? asm_exc_page_fault+0x26/0x30
&lt;4&gt;[  574.475381]  ? bpf_prog_5e13354d9cf5018a_prog_after_redirect+0x17/0x3c
&lt;4&gt;[  574.475386]  bq_xmit_all+0x158/0x420
&lt;4&gt;[  574.475397]  __dev_flush+0x30/0x90
&lt;4&gt;[  574.475407]  veth_poll+0x216/0x250 [veth]
&lt;4&gt;[  574.475421]  __napi_poll+0x28/0x1c0
&lt;4&gt;[  574.475430]  net_rx_action+0x32d/0x3a0
&lt;4&gt;[  574.475441]  handle_softirqs+0xcb/0x2c0
&lt;4&gt;[  574.475451]  do_softirq+0x40/0x60
&lt;4&gt;[  574.475458]  &lt;/IRQ&gt;
&lt;4&gt;[  574.475461]  &lt;TASK&gt;
&lt;4&gt;[  574.475464]  __local_bh_enable_ip+0x66/0x70
&lt;4&gt;[  574.475471]  __dev_queue_xmit+0x268/0xe40
&lt;4&gt;[  574.475480]  ? selinux_ip_postroute+0x213/0x420
&lt;4&gt;[  574.475491]  ? alloc_skb_with_frags+0x4a/0x1d0
&lt;4&gt;[  574.475502]  ip6_finish_output2+0x2be/0x640
&lt;4&gt;[  574.475512]  ? nf_hook_slow+0x42/0xf0
&lt;4&gt;[  574.475521]  ip6_finish_output+0x194/0x300
&lt;4&gt;[  574.475529]  ? __pfx_ip6_finish_output+0x10/0x10
&lt;4&gt;[  574.475538]  mld_sendpack+0x17c/0x240
&lt;4&gt;[  574.475548]  mld_ifc_work+0x192/0x410
&lt;4&gt;[  574.475557]  process_one_work+0x15d/0x380
&lt;4&gt;[  574.475566]  worker_thread+0x29d/0x3a0
&lt;4&gt;[  574.475573]  ? __pfx_worker_thread+0x10/0x10
&lt;4&gt;[  574.475580]  ? __pfx_worker_thread+0x10/0x10
&lt;4&gt;[  574.475587]  kthread+0xcd/0x100
&lt;4&gt;[  574.475597]  ? __pfx_kthread+0x10/0x10
&lt;4&gt;[  574.475606]  ret_from_fork+0x31/0x50
&lt;4&gt;[  574.475615]  ? __pfx_kthread+0x10/0x10
&lt;4&gt;[  574.475623]  ret_from_fork_asm+0x1a/0x30
&lt;4&gt;[  574.475635]  &lt;/TASK&gt;
&lt;4&gt;[  574.475637] Modules linked in: veth br_netfilter bridge stp llc iwlmvm x86_pkg_temp_thermal iwlwifi efivarfs nvme nvme_core
&lt;4&gt;[  574.475662] CR2: 0000000000000000
&lt;4&gt;[  574.475668] ---[ end trace 0000000000000000 ]---

Therefore, provide it to the program by setting rxq properly.

Fixes: cb261b594b41 ("bpf: Run devmap xdp_prog on flush instead of bulk enqueue")
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Signed-off-by: Florian Kauer &lt;florian.kauer@linutronix.de&gt;
Acked-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Link: https://lore.kernel.org/r/20240911-devel-koalo-fix-ingress-ifindex-v4-1-5c643ae10258@linutronix.de
Signed-off-by: Martin KaFai Lau &lt;martin.lau@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: Fix DEVMAP_HASH overflow check on 32-bit arches</title>
<updated>2024-03-26T22:20:41+00:00</updated>
<author>
<name>Toke Høiland-Jørgensen</name>
<email>toke@redhat.com</email>
</author>
<published>2024-03-07T12:03:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=edf7990baa48de5097daa9ac02e06cb4c798a737'/>
<id>edf7990baa48de5097daa9ac02e06cb4c798a737</id>
<content type='text'>
[ Upstream commit 281d464a34f540de166cee74b723e97ac2515ec3 ]

The devmap code allocates a number hash buckets equal to the next power
of two of the max_entries value provided when creating the map. When
rounding up to the next power of two, the 32-bit variable storing the
number of buckets can overflow, and the code checks for overflow by
checking if the truncated 32-bit value is equal to 0. However, on 32-bit
arches the rounding up itself can overflow mid-way through, because it
ends up doing a left-shift of 32 bits on an unsigned long value. If the
size of an unsigned long is four bytes, this is undefined behaviour, so
there is no guarantee that we'll end up with a nice and tidy 0-value at
the end.

Syzbot managed to turn this into a crash on arm32 by creating a
DEVMAP_HASH with max_entries &gt; 0x80000000 and then trying to update it.
Fix this by moving the overflow check to before the rounding up
operation.

Fixes: 6f9d451ab1a3 ("xdp: Add devmap_hash map type for looking up devices by hashed index")
Link: https://lore.kernel.org/r/000000000000ed666a0611af6818@google.com
Reported-and-tested-by: syzbot+8cd36f6b65f3cafd400a@syzkaller.appspotmail.com
Signed-off-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Message-ID: &lt;20240307120340.99577-2-toke@redhat.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 281d464a34f540de166cee74b723e97ac2515ec3 ]

The devmap code allocates a number hash buckets equal to the next power
of two of the max_entries value provided when creating the map. When
rounding up to the next power of two, the 32-bit variable storing the
number of buckets can overflow, and the code checks for overflow by
checking if the truncated 32-bit value is equal to 0. However, on 32-bit
arches the rounding up itself can overflow mid-way through, because it
ends up doing a left-shift of 32 bits on an unsigned long value. If the
size of an unsigned long is four bytes, this is undefined behaviour, so
there is no guarantee that we'll end up with a nice and tidy 0-value at
the end.

Syzbot managed to turn this into a crash on arm32 by creating a
DEVMAP_HASH with max_entries &gt; 0x80000000 and then trying to update it.
Fix this by moving the overflow check to before the rounding up
operation.

Fixes: 6f9d451ab1a3 ("xdp: Add devmap_hash map type for looking up devices by hashed index")
Link: https://lore.kernel.org/r/000000000000ed666a0611af6818@google.com
Reported-and-tested-by: syzbot+8cd36f6b65f3cafd400a@syzkaller.appspotmail.com
Signed-off-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Message-ID: &lt;20240307120340.99577-2-toke@redhat.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: Use bpf_map_area_alloc consistently on bpf map creation</title>
<updated>2022-08-10T18:50:43+00:00</updated>
<author>
<name>Yafang Shao</name>
<email>laoar.shao@gmail.com</email>
</author>
<published>2022-08-10T15:18:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=73cf09a36bf7bfb3e5a3ff23755c36d49137c44d'/>
<id>73cf09a36bf7bfb3e5a3ff23755c36d49137c44d</id>
<content type='text'>
Let's use the generic helper bpf_map_area_alloc() instead of the
open-coded kzalloc helpers in bpf maps creation path.

Signed-off-by: Yafang Shao &lt;laoar.shao@gmail.com&gt;
Link: https://lore.kernel.org/r/20220810151840.16394-5-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Let's use the generic helper bpf_map_area_alloc() instead of the
open-coded kzalloc helpers in bpf maps creation path.

Signed-off-by: Yafang Shao &lt;laoar.shao@gmail.com&gt;
Link: https://lore.kernel.org/r/20220810151840.16394-5-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: Make __GFP_NOWARN consistent in bpf map creation</title>
<updated>2022-08-10T18:49:25+00:00</updated>
<author>
<name>Yafang Shao</name>
<email>laoar.shao@gmail.com</email>
</author>
<published>2022-08-10T15:18:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=992c9e13f5939437037627c67bcb51e674b64265'/>
<id>992c9e13f5939437037627c67bcb51e674b64265</id>
<content type='text'>
Some of the bpf maps are created with __GFP_NOWARN, i.e. arraymap,
bloom_filter, bpf_local_storage, bpf_struct_ops, lpm_trie,
queue_stack_maps, reuseport_array, stackmap and xskmap, while others are
created without __GFP_NOWARN, i.e. cpumap, devmap, hashtab,
local_storage, offload, ringbuf and sock_map. But there are not key
differences between the creation of these maps. So let make this
allocation flag consistent in all bpf maps creation. Then we can use a
generic helper to alloc all bpf maps.

Signed-off-by: Yafang Shao &lt;laoar.shao@gmail.com&gt;
Link: https://lore.kernel.org/r/20220810151840.16394-4-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Some of the bpf maps are created with __GFP_NOWARN, i.e. arraymap,
bloom_filter, bpf_local_storage, bpf_struct_ops, lpm_trie,
queue_stack_maps, reuseport_array, stackmap and xskmap, while others are
created without __GFP_NOWARN, i.e. cpumap, devmap, hashtab,
local_storage, offload, ringbuf and sock_map. But there are not key
differences between the creation of these maps. So let make this
allocation flag consistent in all bpf maps creation. Then we can use a
generic helper to alloc all bpf maps.

Signed-off-by: Yafang Shao &lt;laoar.shao@gmail.com&gt;
Link: https://lore.kernel.org/r/20220810151840.16394-4-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf, devmap: Compute proper xdp_frame len redirecting frames</title>
<updated>2022-07-26T14:26:19+00:00</updated>
<author>
<name>Lorenzo Bianconi</name>
<email>lorenzo@kernel.org</email>
</author>
<published>2022-07-23T17:17:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=bd82ea52f0ee2890b698155a6d7bf9ea5bd8930d'/>
<id>bd82ea52f0ee2890b698155a6d7bf9ea5bd8930d</id>
<content type='text'>
Even if it is currently forbidden to XDP_REDIRECT a multi-frag xdp_frame into
a devmap, compute proper xdp_frame length in __xdp_enqueue and is_valid_dst
routines running xdp_get_frame_len().

Signed-off-by: Lorenzo Bianconi &lt;lorenzo@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/894d99c01139e921bdb6868158ff8e67f661c072.1658596075.git.lorenzo@kernel.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Even if it is currently forbidden to XDP_REDIRECT a multi-frag xdp_frame into
a devmap, compute proper xdp_frame length in __xdp_enqueue and is_valid_dst
routines running xdp_get_frame_len().

Signed-off-by: Lorenzo Bianconi &lt;lorenzo@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/894d99c01139e921bdb6868158ff8e67f661c072.1658596075.git.lorenzo@kernel.org
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: Make non-preallocated allocation low priority</title>
<updated>2022-07-13T00:44:27+00:00</updated>
<author>
<name>Yafang Shao</name>
<email>laoar.shao@gmail.com</email>
</author>
<published>2022-07-09T15:44:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=ace2bee839e08df324cb320763258dfd72e6120e'/>
<id>ace2bee839e08df324cb320763258dfd72e6120e</id>
<content type='text'>
GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially
if we allocate too much GFP_ATOMIC memory. For example, when we set the
memcg limit to limit a non-preallocated bpf memory, the GFP_ATOMIC can
easily break the memcg limit by force charge. So it is very dangerous to
use GFP_ATOMIC in non-preallocated case. One way to make it safe is to
remove __GFP_HIGH from GFP_ATOMIC, IOW, use (__GFP_ATOMIC |
__GFP_KSWAPD_RECLAIM) instead, then it will be limited if we allocate
too much memory. There's a plan to completely remove __GFP_ATOMIC in the
mm side[1], so let's use GFP_NOWAIT instead.

We introduced BPF_F_NO_PREALLOC is because full map pre-allocation is
too memory expensive for some cases. That means removing __GFP_HIGH
doesn't break the rule of BPF_F_NO_PREALLOC, but has the same goal with
it-avoiding issues caused by too much memory. So let's remove it.

This fix can also apply to other run-time allocations, for example, the
allocation in lpm trie, local storage and devmap. So let fix it
consistently over the bpf code

It also fixes a typo in the comment.

[1]. https://lore.kernel.org/linux-mm/163712397076.13692.4727608274002939094@noble.neil.brown.name/

Cc: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Yafang Shao &lt;laoar.shao@gmail.com&gt;
Reviewed-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Link: https://lore.kernel.org/r/20220709154457.57379-2-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially
if we allocate too much GFP_ATOMIC memory. For example, when we set the
memcg limit to limit a non-preallocated bpf memory, the GFP_ATOMIC can
easily break the memcg limit by force charge. So it is very dangerous to
use GFP_ATOMIC in non-preallocated case. One way to make it safe is to
remove __GFP_HIGH from GFP_ATOMIC, IOW, use (__GFP_ATOMIC |
__GFP_KSWAPD_RECLAIM) instead, then it will be limited if we allocate
too much memory. There's a plan to completely remove __GFP_ATOMIC in the
mm side[1], so let's use GFP_NOWAIT instead.

We introduced BPF_F_NO_PREALLOC is because full map pre-allocation is
too memory expensive for some cases. That means removing __GFP_HIGH
doesn't break the rule of BPF_F_NO_PREALLOC, but has the same goal with
it-avoiding issues caused by too much memory. So let's remove it.

This fix can also apply to other run-time allocations, for example, the
allocation in lpm trie, local storage and devmap. So let fix it
consistently over the bpf code

It also fixes a typo in the comment.

[1]. https://lore.kernel.org/linux-mm/163712397076.13692.4727608274002939094@noble.neil.brown.name/

Cc: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Yafang Shao &lt;laoar.shao@gmail.com&gt;
Reviewed-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Link: https://lore.kernel.org/r/20220709154457.57379-2-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: Compute map_btf_id during build time</title>
<updated>2022-04-26T18:35:21+00:00</updated>
<author>
<name>Menglong Dong</name>
<email>imagedong@tencent.com</email>
</author>
<published>2022-04-25T13:32:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=c317ab71facc2cd0a94145973318a4c914e11acc'/>
<id>c317ab71facc2cd0a94145973318a4c914e11acc</id>
<content type='text'>
For now, the field 'map_btf_id' in 'struct bpf_map_ops' for all map
types are computed during vmlinux-btf init:

  btf_parse_vmlinux() -&gt; btf_vmlinux_map_ids_init()

It will lookup the btf_type according to the 'map_btf_name' field in
'struct bpf_map_ops'. This process can be done during build time,
thanks to Jiri's resolve_btfids.

selftest of map_ptr has passed:

  $96 map_ptr:OK
  Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Signed-off-by: Menglong Dong &lt;imagedong@tencent.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For now, the field 'map_btf_id' in 'struct bpf_map_ops' for all map
types are computed during vmlinux-btf init:

  btf_parse_vmlinux() -&gt; btf_vmlinux_map_ids_init()

It will lookup the btf_type according to the 'map_btf_name' field in
'struct bpf_map_ops'. This process can be done during build time,
thanks to Jiri's resolve_btfids.

selftest of map_ptr has passed:

  $96 map_ptr:OK
  Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Signed-off-by: Menglong Dong &lt;imagedong@tencent.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: generalise tail call map compatibility check</title>
<updated>2022-01-21T22:14:03+00:00</updated>
<author>
<name>Toke Hoiland-Jorgensen</name>
<email>toke@redhat.com</email>
</author>
<published>2022-01-21T10:10:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=f45d5b6ce2e835834c94b8b700787984f02cd662'/>
<id>f45d5b6ce2e835834c94b8b700787984f02cd662</id>
<content type='text'>
The check for tail call map compatibility ensures that tail calls only
happen between maps of the same type. To ensure backwards compatibility for
XDP frags we need a similar type of check for cpumap and devmap
programs, so move the state from bpf_array_aux into bpf_map, add
xdp_has_frags to the check, and apply the same check to cpumap and devmap.

Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Co-developed-by: Lorenzo Bianconi &lt;lorenzo@kernel.org&gt;
Signed-off-by: Lorenzo Bianconi &lt;lorenzo@kernel.org&gt;
Signed-off-by: Toke Hoiland-Jorgensen &lt;toke@redhat.com&gt;
Link: https://lore.kernel.org/r/f19fd97c0328a39927f3ad03e1ca6b43fd53cdfd.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The check for tail call map compatibility ensures that tail calls only
happen between maps of the same type. To ensure backwards compatibility for
XDP frags we need a similar type of check for cpumap and devmap
programs, so move the state from bpf_array_aux into bpf_map, add
xdp_has_frags to the check, and apply the same check to cpumap and devmap.

Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Co-developed-by: Lorenzo Bianconi &lt;lorenzo@kernel.org&gt;
Signed-off-by: Lorenzo Bianconi &lt;lorenzo@kernel.org&gt;
Signed-off-by: Toke Hoiland-Jorgensen &lt;toke@redhat.com&gt;
Link: https://lore.kernel.org/r/f19fd97c0328a39927f3ad03e1ca6b43fd53cdfd.1642758637.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
