summaryrefslogtreecommitdiff
path: root/net/bridge
AgeCommit message (Collapse)AuthorFilesLines
2024-10-17bridge: Handle error of rtnl_register_module().Kuniyuki Iwashima3-13/+17
[ Upstream commit cba5e43b0b757734b1e79f624d93a71435e31136 ] Since introduced, br_vlan_rtnl_init() has been ignoring the returned value of rtnl_register_module(), which could fail silently. Handling the error allows users to view a module as an all-or-nothing thing in terms of the rtnetlink functionality. This prevents syzkaller from reporting spurious errors from its tests, where OOM often occurs and module is automatically loaded. Let's handle the errors by rtnl_register_many(). Fixes: 8dcea187088b ("net: bridge: vlan: add rtm definitions and dump support") Fixes: f26b296585dc ("net: bridge: vlan: add new rtm message support") Fixes: adb3ce9bcb0f ("net: bridge: vlan: add del rtm message support") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-17netfilter: br_netfilter: fix panic with metadata_dst skbAndy Roulin1-0/+5
[ Upstream commit f9ff7665cd128012868098bbd07e28993e314fdb ] Fix a kernel panic in the br_netfilter module when sending untagged traffic via a VxLAN device. This happens during the check for fragmentation in br_nf_dev_queue_xmit. It is dependent on: 1) the br_netfilter module being loaded; 2) net.bridge.bridge-nf-call-iptables set to 1; 3) a bridge with a VxLAN (single-vxlan-device) netdevice as a bridge port; 4) untagged frames with size higher than the VxLAN MTU forwarded/flooded When forwarding the untagged packet to the VxLAN bridge port, before the netfilter hooks are called, br_handle_egress_vlan_tunnel is called and changes the skb_dst to the tunnel dst. The tunnel_dst is a metadata type of dst, i.e., skb_valid_dst(skb) is false, and metadata->dst.dev is NULL. Then in the br_netfilter hooks, in br_nf_dev_queue_xmit, there's a check for frames that needs to be fragmented: frames with higher MTU than the VxLAN device end up calling br_nf_ip_fragment, which in turns call ip_skb_dst_mtu. The ip_dst_mtu tries to use the skb_dst(skb) as if it was a valid dst with valid dst->dev, thus the crash. This case was never supported in the first place, so drop the packet instead. PING 10.0.0.2 (10.0.0.2) from 0.0.0.0 h1-eth0: 2000(2028) bytes of data. [ 176.291791] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000110 [ 176.292101] Mem abort info: [ 176.292184] ESR = 0x0000000096000004 [ 176.292322] EC = 0x25: DABT (current EL), IL = 32 bits [ 176.292530] SET = 0, FnV = 0 [ 176.292709] EA = 0, S1PTW = 0 [ 176.292862] FSC = 0x04: level 0 translation fault [ 176.293013] Data abort info: [ 176.293104] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 [ 176.293488] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 176.293787] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 176.293995] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000043ef5000 [ 176.294166] [0000000000000110] pgd=0000000000000000, p4d=0000000000000000 [ 176.294827] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP [ 176.295252] Modules linked in: vxlan ip6_udp_tunnel udp_tunnel veth br_netfilter bridge stp llc ipv6 crct10dif_ce [ 176.295923] CPU: 0 PID: 188 Comm: ping Not tainted 6.8.0-rc3-g5b3fbd61b9d1 #2 [ 176.296314] Hardware name: linux,dummy-virt (DT) [ 176.296535] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 176.296808] pc : br_nf_dev_queue_xmit+0x390/0x4ec [br_netfilter] [ 176.297382] lr : br_nf_dev_queue_xmit+0x2ac/0x4ec [br_netfilter] [ 176.297636] sp : ffff800080003630 [ 176.297743] x29: ffff800080003630 x28: 0000000000000008 x27: ffff6828c49ad9f8 [ 176.298093] x26: ffff6828c49ad000 x25: 0000000000000000 x24: 00000000000003e8 [ 176.298430] x23: 0000000000000000 x22: ffff6828c4960b40 x21: ffff6828c3b16d28 [ 176.298652] x20: ffff6828c3167048 x19: ffff6828c3b16d00 x18: 0000000000000014 [ 176.298926] x17: ffffb0476322f000 x16: ffffb7e164023730 x15: 0000000095744632 [ 176.299296] x14: ffff6828c3f1c880 x13: 0000000000000002 x12: ffffb7e137926a70 [ 176.299574] x11: 0000000000000001 x10: ffff6828c3f1c898 x9 : 0000000000000000 [ 176.300049] x8 : ffff6828c49bf070 x7 : 0008460f18d5f20e x6 : f20e0100bebafeca [ 176.300302] x5 : ffff6828c7f918fe x4 : ffff6828c49bf070 x3 : 0000000000000000 [ 176.300586] x2 : 0000000000000000 x1 : ffff6828c3c7ad00 x0 : ffff6828c7f918f0 [ 176.300889] Call trace: [ 176.301123] br_nf_dev_queue_xmit+0x390/0x4ec [br_netfilter] [ 176.301411] br_nf_post_routing+0x2a8/0x3e4 [br_netfilter] [ 176.301703] nf_hook_slow+0x48/0x124 [ 176.302060] br_forward_finish+0xc8/0xe8 [bridge] [ 176.302371] br_nf_hook_thresh+0x124/0x134 [br_netfilter] [ 176.302605] br_nf_forward_finish+0x118/0x22c [br_netfilter] [ 176.302824] br_nf_forward_ip.part.0+0x264/0x290 [br_netfilter] [ 176.303136] br_nf_forward+0x2b8/0x4e0 [br_netfilter] [ 176.303359] nf_hook_slow+0x48/0x124 [ 176.303803] __br_forward+0xc4/0x194 [bridge] [ 176.304013] br_flood+0xd4/0x168 [bridge] [ 176.304300] br_handle_frame_finish+0x1d4/0x5c4 [bridge] [ 176.304536] br_nf_hook_thresh+0x124/0x134 [br_netfilter] [ 176.304978] br_nf_pre_routing_finish+0x29c/0x494 [br_netfilter] [ 176.305188] br_nf_pre_routing+0x250/0x524 [br_netfilter] [ 176.305428] br_handle_frame+0x244/0x3cc [bridge] [ 176.305695] __netif_receive_skb_core.constprop.0+0x33c/0xecc [ 176.306080] __netif_receive_skb_one_core+0x40/0x8c [ 176.306197] __netif_receive_skb+0x18/0x64 [ 176.306369] process_backlog+0x80/0x124 [ 176.306540] __napi_poll+0x38/0x17c [ 176.306636] net_rx_action+0x124/0x26c [ 176.306758] __do_softirq+0x100/0x26c [ 176.307051] ____do_softirq+0x10/0x1c [ 176.307162] call_on_irq_stack+0x24/0x4c [ 176.307289] do_softirq_own_stack+0x1c/0x2c [ 176.307396] do_softirq+0x54/0x6c [ 176.307485] __local_bh_enable_ip+0x8c/0x98 [ 176.307637] __dev_queue_xmit+0x22c/0xd28 [ 176.307775] neigh_resolve_output+0xf4/0x1a0 [ 176.308018] ip_finish_output2+0x1c8/0x628 [ 176.308137] ip_do_fragment+0x5b4/0x658 [ 176.308279] ip_fragment.constprop.0+0x48/0xec [ 176.308420] __ip_finish_output+0xa4/0x254 [ 176.308593] ip_finish_output+0x34/0x130 [ 176.308814] ip_output+0x6c/0x108 [ 176.308929] ip_send_skb+0x50/0xf0 [ 176.309095] ip_push_pending_frames+0x30/0x54 [ 176.309254] raw_sendmsg+0x758/0xaec [ 176.309568] inet_sendmsg+0x44/0x70 [ 176.309667] __sys_sendto+0x110/0x178 [ 176.309758] __arm64_sys_sendto+0x28/0x38 [ 176.309918] invoke_syscall+0x48/0x110 [ 176.310211] el0_svc_common.constprop.0+0x40/0xe0 [ 176.310353] do_el0_svc+0x1c/0x28 [ 176.310434] el0_svc+0x34/0xb4 [ 176.310551] el0t_64_sync_handler+0x120/0x12c [ 176.310690] el0t_64_sync+0x190/0x194 [ 176.311066] Code: f9402e61 79402aa2 927ff821 f9400023 (f9408860) [ 176.315743] ---[ end trace 0000000000000000 ]--- [ 176.316060] Kernel panic - not syncing: Oops: Fatal exception in interrupt [ 176.316371] Kernel Offset: 0x37e0e3000000 from 0xffff800080000000 [ 176.316564] PHYS_OFFSET: 0xffff97d780000000 [ 176.316782] CPU features: 0x0,88000203,3c020000,0100421b [ 176.317210] Memory Limit: none [ 176.317527] ---[ end Kernel panic - not syncing: Oops: Fatal Exception in interrupt ]---\ Fixes: 11538d039ac6 ("bridge: vlan dst_metadata hooks in ingress and egress paths") Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Andy Roulin <aroulin@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20241001154400.22787-2-aroulin@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-10bridge: mcast: Fail MDB get request on empty entryIdo Schimmel1-1/+1
[ Upstream commit 555f45d24ba7cd5527716553031641cdebbe76c7 ] When user space deletes a port from an MDB entry, the port is removed synchronously. If this was the last port in the entry and the entry is not joined by the host itself, then the entry is scheduled for deletion via a timer. The above means that it is possible for the MDB get netlink request to retrieve an empty entry which is scheduled for deletion. This is problematic as after deleting the last port in an entry, user space cannot rely on a non-zero return code from the MDB get request as an indication that the port was successfully removed. Fix by returning an error when the entry's port list is empty and the entry is not joined by the host. Fixes: 68b380a395a7 ("bridge: mcast: Add MDB get support") Reported-by: Jamie Bainbridge <jamie.bainbridge@gmail.com> Closes: https://lore.kernel.org/netdev/c92569919307749f879b9482b0f3e125b7d9d2e3.1726480066.git.jamie.bainbridge@gmail.com/ Tested-by: Jamie Bainbridge <jamie.bainbridge@gmail.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20240929123640.558525-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-04net: bridge: br_fdb_external_learn_add(): always set EXT_LEARNJonas Gorski1-4/+2
When userspace wants to take over a fdb entry by setting it as EXTERN_LEARNED, we set both flags BR_FDB_ADDED_BY_EXT_LEARN and BR_FDB_ADDED_BY_USER in br_fdb_external_learn_add(). If the bridge updates the entry later because its port changed, we clear the BR_FDB_ADDED_BY_EXT_LEARN flag, but leave the BR_FDB_ADDED_BY_USER flag set. If userspace then wants to take over the entry again, br_fdb_external_learn_add() sees that BR_FDB_ADDED_BY_USER and skips setting the BR_FDB_ADDED_BY_EXT_LEARN flags, thus silently ignores the update. Fix this by always allowing to set BR_FDB_ADDED_BY_EXT_LEARN regardless if this was a user fdb entry or not. Fixes: 710ae7287737 ("net: bridge: Mark FDB entries that were added by user as such") Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20240903081958.29951-1-jonas.gorski@bisdn.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-14netfilter: nf_queue: drop packets with cloned unconfirmed conntracksFlorian Westphal1-1/+5
Conntrack assumes an unconfirmed entry (not yet committed to global hash table) has a refcount of 1 and is not visible to other cores. With multicast forwarding this assumption breaks down because such skbs get cloned after being picked up, i.e. ct->use refcount is > 1. Likewise, bridge netfilter will clone broad/mutlicast frames and all frames in case they need to be flood-forwarded during learning phase. For ip multicast forwarding or plain bridge flood-forward this will "work" because packets don't leave softirq and are implicitly serialized. With nfqueue this no longer holds true, the packets get queued and can be reinjected in arbitrary ways. Disable this feature, I see no other solution. After this patch, nfqueue cannot queue packets except the last multicast/broadcast packet. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-08-05net: bridge: mcast: wait for previous gc cycles when removing portNikolay Aleksandrov1-3/+1
syzbot hit a use-after-free[1] which is caused because the bridge doesn't make sure that all previous garbage has been collected when removing a port. What happens is: CPU 1 CPU 2 start gc cycle remove port acquire gc lock first wait for lock call br_multicasg_gc() directly acquire lock now but free port the port can be freed while grp timers still running Make sure all previous gc cycles have finished by using flush_work before freeing the port. [1] BUG: KASAN: slab-use-after-free in br_multicast_port_group_expired+0x4c0/0x550 net/bridge/br_multicast.c:861 Read of size 8 at addr ffff888071d6d000 by task syz.5.1232/9699 CPU: 1 PID: 9699 Comm: syz.5.1232 Not tainted 6.10.0-rc5-syzkaller-00021-g24ca36a562d6 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:114 print_address_description mm/kasan/report.c:377 [inline] print_report+0xc3/0x620 mm/kasan/report.c:488 kasan_report+0xd9/0x110 mm/kasan/report.c:601 br_multicast_port_group_expired+0x4c0/0x550 net/bridge/br_multicast.c:861 call_timer_fn+0x1a3/0x610 kernel/time/timer.c:1792 expire_timers kernel/time/timer.c:1843 [inline] __run_timers+0x74b/0xaf0 kernel/time/timer.c:2417 __run_timer_base kernel/time/timer.c:2428 [inline] __run_timer_base kernel/time/timer.c:2421 [inline] run_timer_base+0x111/0x190 kernel/time/timer.c:2437 Reported-by: syzbot+263426984509be19c9a0@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=263426984509be19c9a0 Fixes: e12cec65b554 ("net: bridge: mcast: destroy all entries via gc") Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20240802080730.3206303-1-razor@blackwall.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-24sysctl: treewide: constify the ctl_table argument of proc_handlersJoel Granados1-1/+1
const qualify the struct ctl_table argument in the proc_handler function signatures. This is a prerequisite to moving the static ctl_table structs into .rodata data which will ensure that proc_handler function pointers cannot be modified. This patch has been generated by the following coccinelle script: ``` virtual patch @r1@ identifier ctl, write, buffer, lenp, ppos; identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)"; @@ int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int write, void *buffer, size_t *lenp, loff_t *ppos); @r2@ identifier func, ctl, write, buffer, lenp, ppos; @@ int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int write, void *buffer, size_t *lenp, loff_t *ppos) { ... } @r3@ identifier func; @@ int func( - struct ctl_table * + const struct ctl_table * ,int , void *, size_t *, loff_t *); @r4@ identifier func, ctl; @@ int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int , void *, size_t *, loff_t *); @r5@ identifier func, write, buffer, lenp, ppos; @@ int func( - struct ctl_table * + const struct ctl_table * ,int write, void *buffer, size_t *lenp, loff_t *ppos); ``` * Code formatting was adjusted in xfs_sysctl.c to comply with code conventions. The xfs_stats_clear_proc_handler, xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax where adjusted. * The ctl_table argument in proc_watchdog_common was const qualified. This is called from a proc_handler itself and is calling back into another proc_handler, making it necessary to change it as part of the proc_handler migration. Co-developed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Co-developed-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Joel Granados <j.granados@samsung.com>
2024-07-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-2/+2
Merge in late fixes to prepare for the 6.11 net-next PR. Conflicts: 93c3a96c301f ("net: pse-pd: Do not return EOPNOSUPP if config is null") 4cddb0f15ea9 ("net: ethtool: pse-pd: Fix possible null-deref") 30d7b6727724 ("net: ethtool: Add new power limit get and set features") https://lore.kernel.org/20240715123204.623520bb@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-15net: bridge: mst: Check vlan state for egress decisionElliot Ayrey1-2/+2
If a port is blocking in the common instance but forwarding in an MST instance, traffic egressing the bridge will be dropped because the state of the common instance is overriding that of the MST instance. Fix this by skipping the port state check in MST mode to allow checking the vlan state via br_allowed_egress(). This is similar to what happens in br_handle_frame_finish() when checking ingress traffic, which was introduced in the change below. Fixes: ec7328b59176 ("net: bridge: mst: Multiple Spanning Tree (MST) mode") Signed-off-by: Elliot Ayrey <elliot.ayrey@alliedtelesis.co.nz> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-24netfilter: br_netfilter: Use nested-BH locking for brnf_frag_data_storage.Sebastian Andrzej Siewior1-4/+16
brnf_frag_data_storage is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Add a local_lock_t to the data structure and use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Cc: Florian Westphal <fw@strlen.de> Cc: Jozsef Kadlecsik <kadlec@netfilter.org> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: bridge@lists.linux.dev Cc: coreteam@netfilter.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-8-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-7/+6
Cross-merge networking fixes after downstream PR. No conflicts, no adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-12net: bridge: mst: fix suspicious rcu usage in br_mst_set_stateNikolay Aleksandrov1-1/+1
I converted br_mst_set_state to RCU to avoid a vlan use-after-free but forgot to change the vlan group dereference helper. Switch to vlan group RCU deref helper to fix the suspicious rcu usage warning. Fixes: 3a7c1661ae13 ("net: bridge: mst: fix vlan use-after-free") Reported-by: syzbot+9bbe2de1bc9d470eb5fe@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=9bbe2de1bc9d470eb5fe Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20240609103654.914987-3-razor@blackwall.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-12net: bridge: mst: pass vlan group directly to br_mst_vlan_set_stateNikolay Aleksandrov1-6/+5
Pass the already obtained vlan group pointer to br_mst_vlan_set_state() instead of dereferencing it again. Each caller has already correctly dereferenced it for their context. This change is required for the following suspicious RCU dereference fix. No functional changes intended. Fixes: 3a7c1661ae13 ("net: bridge: mst: fix vlan use-after-free") Reported-by: syzbot+9bbe2de1bc9d470eb5fe@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=9bbe2de1bc9d470eb5fe Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20240609103654.914987-2-razor@blackwall.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05net: bridge: fix an inconsistent indentationChen Hanxiao1-2/+2
Smatch complains: net/bridge/br_netlink_tunnel.c: 318 br_process_vlan_tunnel_info() warn: inconsistent indenting Fix it with a proper indenting Signed-off-by: Chen Hanxiao <chenhx.fnst@fujitsu.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-23net: Rename mono_delivery_time to tstamp_type for scalabiltyAbhishek Chauhan1-3/+3
mono_delivery_time was added to check if skb->tstamp has delivery time in mono clock base (i.e. EDT) otherwise skb->tstamp has timestamp in ingress and delivery_time at egress. Renaming the bitfield from mono_delivery_time to tstamp_type is for extensibilty for other timestamps such as userspace timestamp (i.e. SO_TXTIME) set via sock opts. As we are renaming the mono_delivery_time to tstamp_type, it makes sense to start assigning tstamp_type based on enum defined in this commit. Earlier we used bool arg flag to check if the tstamp is mono in function skb_set_delivery_time, Now the signature of the functions accepts tstamp_type to distinguish between mono and real time. Also skb_set_delivery_type_by_clockid is a new function which accepts clockid to determine the tstamp_type. In future tstamp_type:1 can be extended to support userspace timestamp by increasing the bitfield. Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20240509211834.3235191-2-quic_abchauha@quicinc.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextJakub Kicinski3-8/+11
Cross merge. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-15net: bridge: mst: fix vlan use-after-freeNikolay Aleksandrov1-6/+10
syzbot reported a suspicious rcu usage[1] in bridge's mst code. While fixing it I noticed that nothing prevents a vlan to be freed while walking the list from the same path (br forward delay timer). Fix the rcu usage and also make sure we are not accessing freed memory by making br_mst_vlan_set_state use rcu read lock. [1] WARNING: suspicious RCU usage 6.9.0-rc6-syzkaller #0 Not tainted ----------------------------- net/bridge/br_private.h:1599 suspicious rcu_dereference_protected() usage! ... stack backtrace: CPU: 1 PID: 8017 Comm: syz-executor.1 Not tainted 6.9.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 lockdep_rcu_suspicious+0x221/0x340 kernel/locking/lockdep.c:6712 nbp_vlan_group net/bridge/br_private.h:1599 [inline] br_mst_set_state+0x1ea/0x650 net/bridge/br_mst.c:105 br_set_state+0x28a/0x7b0 net/bridge/br_stp.c:47 br_forward_delay_timer_expired+0x176/0x440 net/bridge/br_stp_timer.c:88 call_timer_fn+0x18e/0x650 kernel/time/timer.c:1793 expire_timers kernel/time/timer.c:1844 [inline] __run_timers kernel/time/timer.c:2418 [inline] __run_timer_base+0x66a/0x8e0 kernel/time/timer.c:2429 run_timer_base kernel/time/timer.c:2438 [inline] run_timer_softirq+0xb7/0x170 kernel/time/timer.c:2448 __do_softirq+0x2c6/0x980 kernel/softirq.c:554 invoke_softirq kernel/softirq.c:428 [inline] __irq_exit_rcu+0xf2/0x1c0 kernel/softirq.c:633 irq_exit_rcu+0x9/0x30 kernel/softirq.c:645 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline] sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702 RIP: 0010:lock_acquire+0x264/0x550 kernel/locking/lockdep.c:5758 Code: 2b 00 74 08 4c 89 f7 e8 ba d1 84 00 f6 44 24 61 02 0f 85 85 01 00 00 41 f7 c7 00 02 00 00 74 01 fb 48 c7 44 24 40 0e 36 e0 45 <4b> c7 44 25 00 00 00 00 00 43 c7 44 25 09 00 00 00 00 43 c7 44 25 RSP: 0018:ffffc90013657100 EFLAGS: 00000206 RAX: 0000000000000001 RBX: 1ffff920026cae2c RCX: 0000000000000001 RDX: dffffc0000000000 RSI: ffffffff8bcaca00 RDI: ffffffff8c1eaa60 RBP: ffffc90013657260 R08: ffffffff92efe507 R09: 1ffffffff25dfca0 R10: dffffc0000000000 R11: fffffbfff25dfca1 R12: 1ffff920026cae28 R13: dffffc0000000000 R14: ffffc90013657160 R15: 0000000000000246 Fixes: ec7328b59176 ("net: bridge: mst: Multiple Spanning Tree (MST) mode") Reported-by: syzbot+fa04eb8a56fd923fc5d8@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=fa04eb8a56fd923fc5d8 Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-15net: bridge: xmit: make sure we have at least eth header len bytesNikolay Aleksandrov1-0/+6
syzbot triggered an uninit value[1] error in bridge device's xmit path by sending a short (less than ETH_HLEN bytes) skb. To fix it check if we can actually pull that amount instead of assuming. Tested with dropwatch: drop at: br_dev_xmit+0xb93/0x12d0 [bridge] (0xffffffffc06739b3) origin: software timestamp: Mon May 13 11:31:53 2024 778214037 nsec protocol: 0x88a8 length: 2 original length: 2 drop reason: PKT_TOO_SMALL [1] BUG: KMSAN: uninit-value in br_dev_xmit+0x61d/0x1cb0 net/bridge/br_device.c:65 br_dev_xmit+0x61d/0x1cb0 net/bridge/br_device.c:65 __netdev_start_xmit include/linux/netdevice.h:4903 [inline] netdev_start_xmit include/linux/netdevice.h:4917 [inline] xmit_one net/core/dev.c:3531 [inline] dev_hard_start_xmit+0x247/0xa20 net/core/dev.c:3547 __dev_queue_xmit+0x34db/0x5350 net/core/dev.c:4341 dev_queue_xmit include/linux/netdevice.h:3091 [inline] __bpf_tx_skb net/core/filter.c:2136 [inline] __bpf_redirect_common net/core/filter.c:2180 [inline] __bpf_redirect+0x14a6/0x1620 net/core/filter.c:2187 ____bpf_clone_redirect net/core/filter.c:2460 [inline] bpf_clone_redirect+0x328/0x470 net/core/filter.c:2432 ___bpf_prog_run+0x13fe/0xe0f0 kernel/bpf/core.c:1997 __bpf_prog_run512+0xb5/0xe0 kernel/bpf/core.c:2238 bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline] __bpf_prog_run include/linux/filter.h:657 [inline] bpf_prog_run include/linux/filter.h:664 [inline] bpf_test_run+0x499/0xc30 net/bpf/test_run.c:425 bpf_prog_test_run_skb+0x14ea/0x1f20 net/bpf/test_run.c:1058 bpf_prog_test_run+0x6b7/0xad0 kernel/bpf/syscall.c:4269 __sys_bpf+0x6aa/0xd90 kernel/bpf/syscall.c:5678 __do_sys_bpf kernel/bpf/syscall.c:5767 [inline] __se_sys_bpf kernel/bpf/syscall.c:5765 [inline] __x64_sys_bpf+0xa0/0xe0 kernel/bpf/syscall.c:5765 x64_sys_call+0x96b/0x3b50 arch/x86/include/generated/asm/syscalls_64.h:322 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+a63a1f6a062033cf0f40@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a63a1f6a062033cf0f40 Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-2/+7
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 35d92abfbad8 ("net: hns3: fix kernel crash when devlink reload during initialization") 2a1a1a7b5fd7 ("net: hns3: add command queue trace for hns3") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08net: bridge: fix corrupted ethernet header on multicast-to-unicastFelix Fietkau1-2/+7
The change from skb_copy to pskb_copy unfortunately changed the data copying to omit the ethernet header, since it was pulled before reaching this point. Fix this by calling __skb_push/pull around pskb_copy. Fixes: 59c878cbcdd8 ("net: bridge: fix multicast-to-unicast with fraglist GSO") Signed-off-by: Felix Fietkau <nbd@nbd.name> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-07net: annotate writes on dev->mtu from ndo_change_mtu()Eric Dumazet1-1/+1
Simon reported that ndo_change_mtu() methods were never updated to use WRITE_ONCE(dev->mtu, new_mtu) as hinted in commit 501a90c94510 ("inet: protect against too small mtu values.") We read dev->mtu without holding RTNL in many places, with READ_ONCE() annotations. It is time to take care of ndo_change_mtu() methods to use corresponding WRITE_ONCE() Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Simon Horman <horms@kernel.org> Closes: https://lore.kernel.org/netdev/20240505144608.GB67882@kernel.org/ Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20240506102812.3025432-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-03netfilter: Remove the now superfluous sentinel elements from ctl_table arrayJoel Granados1-1/+0
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/) * Remove sentinel elements from ctl_table structs * Remove instances where an array element is zeroed out to make it look like a sentinel. This is not longer needed and is safe after commit c899710fe7f9 ("networking: Update to register_net_sysctl_sz") added the array size to the ctl_table registration * Remove the need for having __NF_SYSCTL_CT_LAST_SYSCTL as the sysctl array size is now in NF_SYSCTL_CT_LAST_SYSCTL * Remove extra element in ctl_table arrays declarations Acked-by: Kees Cook <keescook@chromium.org> # loadpin & yama Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+1
Cross-merge networking fixes after downstream PR. Conflicts: include/linux/filter.h kernel/bpf/core.c 66e13b615a0c ("bpf: verifier: prevent userspace memory access") d503a04f8bc0 ("bpf: Add support for certain atomics in bpf_arena to x86 JIT") https://lore.kernel.org/all/20240429114939.210328b0@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-01net: bridge: fix multicast-to-unicast with fraglist GSOFelix Fietkau1-1/+1
Calling skb_copy on a SKB_GSO_FRAGLIST skb is not valid, since it returns an invalid linearized skb. This code only needs to change the ethernet header, so pskb_copy is the right function to call here. Fixes: 6db6f0eae605 ("bridge: multicast to unicast") Signed-off-by: Felix Fietkau <nbd@nbd.name> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+1
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/ti/icssg/icssg_prueth.c net/mac80211/chan.c 89884459a0b9 ("wifi: mac80211: fix idle calculation with multi-link") 87f5500285fb ("wifi: mac80211: simplify ieee80211_assign_link_chanctx()") https://lore.kernel.org/all/20240422105623.7b1fbda2@canb.auug.org.au/ net/unix/garbage.c 1971d13ffa84 ("af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().") 4090fa373f0e ("af_unix: Replace garbage collection algorithm.") drivers/net/ethernet/ti/icssg/icssg_prueth.c drivers/net/ethernet/ti/icssg/icssg_common.c 4dcd0e83ea1d ("net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()") e2dc7bfd677f ("net: ti: icssg-prueth: Move common functions into a separate file") No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-25net: bridge: remove redundant check of f->dstlinke li1-1/+1
In br_fill_forward_path(), f->dst is checked not to be NULL, then immediately read using READ_ONCE and checked again. The first check is useless, so this patch aims to remove the redundant check of f->dst. Signed-off-by: linke li <lilinke99@qq.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-22bridge/br_netlink.c: no need to return void functionHangbin Liu1-1/+1
br_info_notify is a void function. There is no need to return. Fixes: b6d0425b816e ("bridge: cfm: Netlink Notifications.") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-22sysctl: treewide: constify ctl_table_header::ctl_table_argThomas Weißschuh1-1/+1
To be able to constify instances of struct ctl_tables it is necessary to remove ways through which non-const versions are exposed from the sysctl core. One of these is the ctl_table_arg member of struct ctl_table_header. Constify this reference as a prerequisite for the full constification of struct ctl_table instances. No functional change. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski4-8/+28
Cross-merge networking fixes after downstream PR. Conflicts: include/trace/events/rpcgss.h 386f4a737964 ("trace: events: cleanup deprecated strncpy uses") a4833e3abae1 ("SUNRPC: Fix rpcgss_context trace event acceptor field") Adjacent changes: drivers/net/ethernet/intel/ice/ice_tc_lib.c 2cca35f5dd78 ("ice: Fix checking for unsupported keys on non-tunnel device") 784feaa65dfd ("ice: Add support for PFCP hardware offload in switchdev") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-11netfilter: br_netfilter: skip conntrack input hook for promisc packetsPablo Neira Ayuso4-8/+28
For historical reasons, when bridge device is in promisc mode, packets that are directed to the taps follow bridge input hook path. This patch adds a workaround to reset conntrack for these packets. Jianbo Liu reports warning splats in their test infrastructure where cloned packets reach the br_netfilter input hook to confirm the conntrack object. Scratch one bit from BR_INPUT_SKB_CB to annotate that this packet has reached the input hook because it is passed up to the bridge device to reach the taps. [ 57.571874] WARNING: CPU: 1 PID: 0 at net/bridge/br_netfilter_hooks.c:616 br_nf_local_in+0x157/0x180 [br_netfilter] [ 57.572749] Modules linked in: xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_isc si ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core mlx5ctl mlx5_core [ 57.575158] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.8.0+ #19 [ 57.575700] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 57.576662] RIP: 0010:br_nf_local_in+0x157/0x180 [br_netfilter] [ 57.577195] Code: fe ff ff 41 bd 04 00 00 00 be 04 00 00 00 e9 4a ff ff ff be 04 00 00 00 48 89 ef e8 f3 a9 3c e1 66 83 ad b4 00 00 00 04 eb 91 <0f> 0b e9 f1 fe ff ff 0f 0b e9 df fe ff ff 48 89 df e8 b3 53 47 e1 [ 57.578722] RSP: 0018:ffff88885f845a08 EFLAGS: 00010202 [ 57.579207] RAX: 0000000000000002 RBX: ffff88812dfe8000 RCX: 0000000000000000 [ 57.579830] RDX: ffff88885f845a60 RSI: ffff8881022dc300 RDI: 0000000000000000 [ 57.580454] RBP: ffff88885f845a60 R08: 0000000000000001 R09: 0000000000000003 [ 57.581076] R10: 00000000ffff1300 R11: 0000000000000002 R12: 0000000000000000 [ 57.581695] R13: ffff8881047ffe00 R14: ffff888108dbee00 R15: ffff88814519b800 [ 57.582313] FS: 0000000000000000(0000) GS:ffff88885f840000(0000) knlGS:0000000000000000 [ 57.583040] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 57.583564] CR2: 000000c4206aa000 CR3: 0000000103847001 CR4: 0000000000370eb0 [ 57.584194] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 57.584820] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 57.585440] Call Trace: [ 57.585721] <IRQ> [ 57.585976] ? __warn+0x7d/0x130 [ 57.586323] ? br_nf_local_in+0x157/0x180 [br_netfilter] [ 57.586811] ? report_bug+0xf1/0x1c0 [ 57.587177] ? handle_bug+0x3f/0x70 [ 57.587539] ? exc_invalid_op+0x13/0x60 [ 57.587929] ? asm_exc_invalid_op+0x16/0x20 [ 57.588336] ? br_nf_local_in+0x157/0x180 [br_netfilter] [ 57.588825] nf_hook_slow+0x3d/0xd0 [ 57.589188] ? br_handle_vlan+0x4b/0x110 [ 57.589579] br_pass_frame_up+0xfc/0x150 [ 57.589970] ? br_port_flags_change+0x40/0x40 [ 57.590396] br_handle_frame_finish+0x346/0x5e0 [ 57.590837] ? ipt_do_table+0x32e/0x430 [ 57.591221] ? br_handle_local_finish+0x20/0x20 [ 57.591656] br_nf_hook_thresh+0x4b/0xf0 [br_netfilter] [ 57.592286] ? br_handle_local_finish+0x20/0x20 [ 57.592802] br_nf_pre_routing_finish+0x178/0x480 [br_netfilter] [ 57.593348] ? br_handle_local_finish+0x20/0x20 [ 57.593782] ? nf_nat_ipv4_pre_routing+0x25/0x60 [nf_nat] [ 57.594279] br_nf_pre_routing+0x24c/0x550 [br_netfilter] [ 57.594780] ? br_nf_hook_thresh+0xf0/0xf0 [br_netfilter] [ 57.595280] br_handle_frame+0x1f3/0x3d0 [ 57.595676] ? br_handle_local_finish+0x20/0x20 [ 57.596118] ? br_handle_frame_finish+0x5e0/0x5e0 [ 57.596566] __netif_receive_skb_core+0x25b/0xfc0 [ 57.597017] ? __napi_build_skb+0x37/0x40 [ 57.597418] __netif_receive_skb_list_core+0xfb/0x220 Fixes: 62e7151ae3eb ("netfilter: bridge: confirm multicast packets before passing them up the stack") Reported-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-04-08ipv4: Set scope explicitly in ip_route_output().Guillaume Nault1-1/+2
Add a "scope" parameter to ip_route_output() so that callers don't have to override the tos parameter with the RTO_ONLINK flag if they want a local scope. This will allow converting flowi4_tos to dscp_t in the future, thus allowing static analysers to flag invalid interactions between "tos" (the DSCP bits) and ECN. Only three users ask for local scope (bonding, arp and atm). The others continue to use RT_SCOPE_UNIVERSE. While there, add a comment to warn users about the limitations of ip_route_output(). Signed-off-by: Guillaume Nault <gnault@redhat.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> # infiniband Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-0/+6
Cross-merge networking fixes after downstream PR. Conflicts: net/ipv4/ip_gre.c 17af420545a7 ("erspan: make sure erspan_base_hdr is present in skb->head") 5832c4a77d69 ("ip_tunnel: convert __be16 tunnel flags to bitmaps") https://lore.kernel.org/all/20240402103253.3b54a1cf@canb.auug.org.au/ Adjacent changes: net/ipv6/ip6_fib.c d21d40605bca ("ipv6: Fix infinite recursion in fib6_dump_done().") 5fc68320c1fb ("ipv6: remove RTNL protection from inet6_dump_fib()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04netfilter: validate user input for expected lengthEric Dumazet1-0/+6
I got multiple syzbot reports showing old bugs exposed by BPF after commit 20f2505fb436 ("bpf: Try to avoid kzalloc in cgroup/{s,g}etsockopt") setsockopt() @optlen argument should be taken into account before copying data. BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline] BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline] BUG: KASAN: slab-out-of-bounds in do_replace net/ipv4/netfilter/ip_tables.c:1111 [inline] BUG: KASAN: slab-out-of-bounds in do_ipt_set_ctl+0x902/0x3dd0 net/ipv4/netfilter/ip_tables.c:1627 Read of size 96 at addr ffff88802cd73da0 by task syz-executor.4/7238 CPU: 1 PID: 7238 Comm: syz-executor.4 Not tainted 6.9.0-rc2-next-20240403-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 kasan_check_range+0x282/0x290 mm/kasan/generic.c:189 __asan_memcpy+0x29/0x70 mm/kasan/shadow.c:105 copy_from_sockptr_offset include/linux/sockptr.h:49 [inline] copy_from_sockptr include/linux/sockptr.h:55 [inline] do_replace net/ipv4/netfilter/ip_tables.c:1111 [inline] do_ipt_set_ctl+0x902/0x3dd0 net/ipv4/netfilter/ip_tables.c:1627 nf_setsockopt+0x295/0x2c0 net/netfilter/nf_sockopt.c:101 do_sock_setsockopt+0x3af/0x720 net/socket.c:2311 __sys_setsockopt+0x1ae/0x250 net/socket.c:2334 __do_sys_setsockopt net/socket.c:2343 [inline] __se_sys_setsockopt net/socket.c:2340 [inline] __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340 do_syscall_64+0xfb/0x240 entry_SYSCALL_64_after_hwframe+0x72/0x7a RIP: 0033:0x7fd22067dde9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fd21f9ff0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036 RAX: ffffffffffffffda RBX: 00007fd2207abf80 RCX: 00007fd22067dde9 RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 00007fd2206ca47a R08: 0000000000000001 R09: 0000000000000000 R10: 0000000020000880 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000000000b R14: 00007fd2207abf80 R15: 00007ffd2d0170d8 </TASK> Allocated by task 7238: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 poison_kmalloc_redzone mm/kasan/common.c:370 [inline] __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:387 kasan_kmalloc include/linux/kasan.h:211 [inline] __do_kmalloc_node mm/slub.c:4069 [inline] __kmalloc_noprof+0x200/0x410 mm/slub.c:4082 kmalloc_noprof include/linux/slab.h:664 [inline] __cgroup_bpf_run_filter_setsockopt+0xd47/0x1050 kernel/bpf/cgroup.c:1869 do_sock_setsockopt+0x6b4/0x720 net/socket.c:2293 __sys_setsockopt+0x1ae/0x250 net/socket.c:2334 __do_sys_setsockopt net/socket.c:2343 [inline] __se_sys_setsockopt net/socket.c:2340 [inline] __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340 do_syscall_64+0xfb/0x240 entry_SYSCALL_64_after_hwframe+0x72/0x7a The buggy address belongs to the object at ffff88802cd73da0 which belongs to the cache kmalloc-8 of size 8 The buggy address is located 0 bytes inside of allocated 1-byte region [ffff88802cd73da0, ffff88802cd73da1) The buggy address belongs to the physical page: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88802cd73020 pfn:0x2cd73 flags: 0xfff80000000000(node=0|zone=1|lastcpupid=0xfff) page_type: 0xffffefff(slab) raw: 00fff80000000000 ffff888015041280 dead000000000100 dead000000000122 raw: ffff88802cd73020 000000008080007f 00000001ffffefff 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY), pid 5103, tgid 2119833701 (syz-executor.4), ts 5103, free_ts 70804600828 set_page_owner include/linux/page_owner.h:32 [inline] post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1490 prep_new_page mm/page_alloc.c:1498 [inline] get_page_from_freelist+0x2e7e/0x2f40 mm/page_alloc.c:3454 __alloc_pages_noprof+0x256/0x6c0 mm/page_alloc.c:4712 __alloc_pages_node_noprof include/linux/gfp.h:244 [inline] alloc_pages_node_noprof include/linux/gfp.h:271 [inline] alloc_slab_page+0x5f/0x120 mm/slub.c:2249 allocate_slab+0x5a/0x2e0 mm/slub.c:2412 new_slab mm/slub.c:2465 [inline] ___slab_alloc+0xcd1/0x14b0 mm/slub.c:3615 __slab_alloc+0x58/0xa0 mm/slub.c:3705 __slab_alloc_node mm/slub.c:3758 [inline] slab_alloc_node mm/slub.c:3936 [inline] __do_kmalloc_node mm/slub.c:4068 [inline] kmalloc_node_track_caller_noprof+0x286/0x450 mm/slub.c:4089 kstrdup+0x3a/0x80 mm/util.c:62 device_rename+0xb5/0x1b0 drivers/base/core.c:4558 dev_change_name+0x275/0x860 net/core/dev.c:1232 do_setlink+0xa4b/0x41f0 net/core/rtnetlink.c:2864 __rtnl_newlink net/core/rtnetlink.c:3680 [inline] rtnl_newlink+0x180b/0x20a0 net/core/rtnetlink.c:3727 rtnetlink_rcv_msg+0x89b/0x10d0 net/core/rtnetlink.c:6594 netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2559 netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline] netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361 page last free pid 5146 tgid 5146 stack trace: reset_page_owner include/linux/page_owner.h:25 [inline] free_pages_prepare mm/page_alloc.c:1110 [inline] free_unref_page+0xd3c/0xec0 mm/page_alloc.c:2617 discard_slab mm/slub.c:2511 [inline] __put_partials+0xeb/0x130 mm/slub.c:2980 put_cpu_partial+0x17c/0x250 mm/slub.c:3055 __slab_free+0x2ea/0x3d0 mm/slub.c:4254 qlink_free mm/kasan/quarantine.c:163 [inline] qlist_free_all+0x9e/0x140 mm/kasan/quarantine.c:179 kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286 __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:322 kasan_slab_alloc include/linux/kasan.h:201 [inline] slab_post_alloc_hook mm/slub.c:3888 [inline] slab_alloc_node mm/slub.c:3948 [inline] __do_kmalloc_node mm/slub.c:4068 [inline] __kmalloc_node_noprof+0x1d7/0x450 mm/slub.c:4076 kmalloc_node_noprof include/linux/slab.h:681 [inline] kvmalloc_node_noprof+0x72/0x190 mm/util.c:634 bucket_table_alloc lib/rhashtable.c:186 [inline] rhashtable_rehash_alloc+0x9e/0x290 lib/rhashtable.c:367 rht_deferred_worker+0x4e1/0x2440 lib/rhashtable.c:427 process_one_work kernel/workqueue.c:3218 [inline] process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299 worker_thread+0x86d/0xd70 kernel/workqueue.c:3380 kthread+0x2f0/0x390 kernel/kthread.c:388 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243 Memory state around the buggy address: ffff88802cd73c80: 07 fc fc fc 05 fc fc fc 05 fc fc fc fa fc fc fc ffff88802cd73d00: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc >ffff88802cd73d80: fa fc fc fc 01 fc fc fc fa fc fc fc fa fc fc fc ^ ffff88802cd73e00: fa fc fc fc fa fc fc fc 05 fc fc fc 07 fc fc fc ffff88802cd73e80: 07 fc fc fc 07 fc fc fc 07 fc fc fc 07 fc fc fc Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org> Link: https://lore.kernel.org/r/20240404122051.2303764-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-01ip_tunnel: convert __be16 tunnel flags to bitmapsAlexander Lobakin1-3/+6
Historically, tunnel flags like TUNNEL_CSUM or TUNNEL_ERSPAN_OPT have been defined as __be16. Now all of those 16 bits are occupied and there's no more free space for new flags. It can't be simply switched to a bigger container with no adjustments to the values, since it's an explicit Endian storage, and on LE systems (__be16)0x0001 equals to (__be64)0x0001000000000000. We could probably define new 64-bit flags depending on the Endianness, i.e. (__be64)0x0001 on BE and (__be64)0x00010000... on LE, but that would introduce an Endianness dependency and spawn a ton of Sparse warnings. To mitigate them, all of those places which were adjusted with this change would be touched anyway, so why not define stuff properly if there's no choice. Define IP_TUNNEL_*_BIT counterparts as a bit number instead of the value already coded and a fistful of <16 <-> bitmap> converters and helpers. The two flags which have a different bit position are SIT_ISATAP_BIT and VTI_ISVTI_BIT, as they were defined not as __cpu_to_be16(), but as (__force __be16), i.e. had different positions on LE and BE. Now they both have strongly defined places. Change all __be16 fields which were used to store those flags, to IP_TUNNEL_DECLARE_FLAGS() -> DECLARE_BITMAP(__IP_TUNNEL_FLAG_NUM) -> unsigned long[1] for now, and replace all