Age | Commit message (Collapse) | Author | Files | Lines |
|
[ Upstream commit cba5e43b0b757734b1e79f624d93a71435e31136 ]
Since introduced, br_vlan_rtnl_init() has been ignoring the returned
value of rtnl_register_module(), which could fail silently.
Handling the error allows users to view a module as an all-or-nothing
thing in terms of the rtnetlink functionality. This prevents syzkaller
from reporting spurious errors from its tests, where OOM often occurs
and module is automatically loaded.
Let's handle the errors by rtnl_register_many().
Fixes: 8dcea187088b ("net: bridge: vlan: add rtm definitions and dump support")
Fixes: f26b296585dc ("net: bridge: vlan: add new rtm message support")
Fixes: adb3ce9bcb0f ("net: bridge: vlan: add del rtm message support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f9ff7665cd128012868098bbd07e28993e314fdb ]
Fix a kernel panic in the br_netfilter module when sending untagged
traffic via a VxLAN device.
This happens during the check for fragmentation in br_nf_dev_queue_xmit.
It is dependent on:
1) the br_netfilter module being loaded;
2) net.bridge.bridge-nf-call-iptables set to 1;
3) a bridge with a VxLAN (single-vxlan-device) netdevice as a bridge port;
4) untagged frames with size higher than the VxLAN MTU forwarded/flooded
When forwarding the untagged packet to the VxLAN bridge port, before
the netfilter hooks are called, br_handle_egress_vlan_tunnel is called and
changes the skb_dst to the tunnel dst. The tunnel_dst is a metadata type
of dst, i.e., skb_valid_dst(skb) is false, and metadata->dst.dev is NULL.
Then in the br_netfilter hooks, in br_nf_dev_queue_xmit, there's a check
for frames that needs to be fragmented: frames with higher MTU than the
VxLAN device end up calling br_nf_ip_fragment, which in turns call
ip_skb_dst_mtu.
The ip_dst_mtu tries to use the skb_dst(skb) as if it was a valid dst
with valid dst->dev, thus the crash.
This case was never supported in the first place, so drop the packet
instead.
PING 10.0.0.2 (10.0.0.2) from 0.0.0.0 h1-eth0: 2000(2028) bytes of data.
[ 176.291791] Unable to handle kernel NULL pointer dereference at
virtual address 0000000000000110
[ 176.292101] Mem abort info:
[ 176.292184] ESR = 0x0000000096000004
[ 176.292322] EC = 0x25: DABT (current EL), IL = 32 bits
[ 176.292530] SET = 0, FnV = 0
[ 176.292709] EA = 0, S1PTW = 0
[ 176.292862] FSC = 0x04: level 0 translation fault
[ 176.293013] Data abort info:
[ 176.293104] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 176.293488] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 176.293787] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 176.293995] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000043ef5000
[ 176.294166] [0000000000000110] pgd=0000000000000000,
p4d=0000000000000000
[ 176.294827] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[ 176.295252] Modules linked in: vxlan ip6_udp_tunnel udp_tunnel veth
br_netfilter bridge stp llc ipv6 crct10dif_ce
[ 176.295923] CPU: 0 PID: 188 Comm: ping Not tainted
6.8.0-rc3-g5b3fbd61b9d1 #2
[ 176.296314] Hardware name: linux,dummy-virt (DT)
[ 176.296535] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS
BTYPE=--)
[ 176.296808] pc : br_nf_dev_queue_xmit+0x390/0x4ec [br_netfilter]
[ 176.297382] lr : br_nf_dev_queue_xmit+0x2ac/0x4ec [br_netfilter]
[ 176.297636] sp : ffff800080003630
[ 176.297743] x29: ffff800080003630 x28: 0000000000000008 x27:
ffff6828c49ad9f8
[ 176.298093] x26: ffff6828c49ad000 x25: 0000000000000000 x24:
00000000000003e8
[ 176.298430] x23: 0000000000000000 x22: ffff6828c4960b40 x21:
ffff6828c3b16d28
[ 176.298652] x20: ffff6828c3167048 x19: ffff6828c3b16d00 x18:
0000000000000014
[ 176.298926] x17: ffffb0476322f000 x16: ffffb7e164023730 x15:
0000000095744632
[ 176.299296] x14: ffff6828c3f1c880 x13: 0000000000000002 x12:
ffffb7e137926a70
[ 176.299574] x11: 0000000000000001 x10: ffff6828c3f1c898 x9 :
0000000000000000
[ 176.300049] x8 : ffff6828c49bf070 x7 : 0008460f18d5f20e x6 :
f20e0100bebafeca
[ 176.300302] x5 : ffff6828c7f918fe x4 : ffff6828c49bf070 x3 :
0000000000000000
[ 176.300586] x2 : 0000000000000000 x1 : ffff6828c3c7ad00 x0 :
ffff6828c7f918f0
[ 176.300889] Call trace:
[ 176.301123] br_nf_dev_queue_xmit+0x390/0x4ec [br_netfilter]
[ 176.301411] br_nf_post_routing+0x2a8/0x3e4 [br_netfilter]
[ 176.301703] nf_hook_slow+0x48/0x124
[ 176.302060] br_forward_finish+0xc8/0xe8 [bridge]
[ 176.302371] br_nf_hook_thresh+0x124/0x134 [br_netfilter]
[ 176.302605] br_nf_forward_finish+0x118/0x22c [br_netfilter]
[ 176.302824] br_nf_forward_ip.part.0+0x264/0x290 [br_netfilter]
[ 176.303136] br_nf_forward+0x2b8/0x4e0 [br_netfilter]
[ 176.303359] nf_hook_slow+0x48/0x124
[ 176.303803] __br_forward+0xc4/0x194 [bridge]
[ 176.304013] br_flood+0xd4/0x168 [bridge]
[ 176.304300] br_handle_frame_finish+0x1d4/0x5c4 [bridge]
[ 176.304536] br_nf_hook_thresh+0x124/0x134 [br_netfilter]
[ 176.304978] br_nf_pre_routing_finish+0x29c/0x494 [br_netfilter]
[ 176.305188] br_nf_pre_routing+0x250/0x524 [br_netfilter]
[ 176.305428] br_handle_frame+0x244/0x3cc [bridge]
[ 176.305695] __netif_receive_skb_core.constprop.0+0x33c/0xecc
[ 176.306080] __netif_receive_skb_one_core+0x40/0x8c
[ 176.306197] __netif_receive_skb+0x18/0x64
[ 176.306369] process_backlog+0x80/0x124
[ 176.306540] __napi_poll+0x38/0x17c
[ 176.306636] net_rx_action+0x124/0x26c
[ 176.306758] __do_softirq+0x100/0x26c
[ 176.307051] ____do_softirq+0x10/0x1c
[ 176.307162] call_on_irq_stack+0x24/0x4c
[ 176.307289] do_softirq_own_stack+0x1c/0x2c
[ 176.307396] do_softirq+0x54/0x6c
[ 176.307485] __local_bh_enable_ip+0x8c/0x98
[ 176.307637] __dev_queue_xmit+0x22c/0xd28
[ 176.307775] neigh_resolve_output+0xf4/0x1a0
[ 176.308018] ip_finish_output2+0x1c8/0x628
[ 176.308137] ip_do_fragment+0x5b4/0x658
[ 176.308279] ip_fragment.constprop.0+0x48/0xec
[ 176.308420] __ip_finish_output+0xa4/0x254
[ 176.308593] ip_finish_output+0x34/0x130
[ 176.308814] ip_output+0x6c/0x108
[ 176.308929] ip_send_skb+0x50/0xf0
[ 176.309095] ip_push_pending_frames+0x30/0x54
[ 176.309254] raw_sendmsg+0x758/0xaec
[ 176.309568] inet_sendmsg+0x44/0x70
[ 176.309667] __sys_sendto+0x110/0x178
[ 176.309758] __arm64_sys_sendto+0x28/0x38
[ 176.309918] invoke_syscall+0x48/0x110
[ 176.310211] el0_svc_common.constprop.0+0x40/0xe0
[ 176.310353] do_el0_svc+0x1c/0x28
[ 176.310434] el0_svc+0x34/0xb4
[ 176.310551] el0t_64_sync_handler+0x120/0x12c
[ 176.310690] el0t_64_sync+0x190/0x194
[ 176.311066] Code: f9402e61 79402aa2 927ff821 f9400023 (f9408860)
[ 176.315743] ---[ end trace 0000000000000000 ]---
[ 176.316060] Kernel panic - not syncing: Oops: Fatal exception in
interrupt
[ 176.316371] Kernel Offset: 0x37e0e3000000 from 0xffff800080000000
[ 176.316564] PHYS_OFFSET: 0xffff97d780000000
[ 176.316782] CPU features: 0x0,88000203,3c020000,0100421b
[ 176.317210] Memory Limit: none
[ 176.317527] ---[ end Kernel panic - not syncing: Oops: Fatal
Exception in interrupt ]---\
Fixes: 11538d039ac6 ("bridge: vlan dst_metadata hooks in ingress and egress paths")
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Andy Roulin <aroulin@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20241001154400.22787-2-aroulin@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 555f45d24ba7cd5527716553031641cdebbe76c7 ]
When user space deletes a port from an MDB entry, the port is removed
synchronously. If this was the last port in the entry and the entry is
not joined by the host itself, then the entry is scheduled for deletion
via a timer.
The above means that it is possible for the MDB get netlink request to
retrieve an empty entry which is scheduled for deletion. This is
problematic as after deleting the last port in an entry, user space
cannot rely on a non-zero return code from the MDB get request as an
indication that the port was successfully removed.
Fix by returning an error when the entry's port list is empty and the
entry is not joined by the host.
Fixes: 68b380a395a7 ("bridge: mcast: Add MDB get support")
Reported-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Closes: https://lore.kernel.org/netdev/c92569919307749f879b9482b0f3e125b7d9d2e3.1726480066.git.jamie.bainbridge@gmail.com/
Tested-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20240929123640.558525-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
When userspace wants to take over a fdb entry by setting it as
EXTERN_LEARNED, we set both flags BR_FDB_ADDED_BY_EXT_LEARN and
BR_FDB_ADDED_BY_USER in br_fdb_external_learn_add().
If the bridge updates the entry later because its port changed, we clear
the BR_FDB_ADDED_BY_EXT_LEARN flag, but leave the BR_FDB_ADDED_BY_USER
flag set.
If userspace then wants to take over the entry again,
br_fdb_external_learn_add() sees that BR_FDB_ADDED_BY_USER and skips
setting the BR_FDB_ADDED_BY_EXT_LEARN flags, thus silently ignores the
update.
Fix this by always allowing to set BR_FDB_ADDED_BY_EXT_LEARN regardless
if this was a user fdb entry or not.
Fixes: 710ae7287737 ("net: bridge: Mark FDB entries that were added by user as such")
Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20240903081958.29951-1-jonas.gorski@bisdn.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Conntrack assumes an unconfirmed entry (not yet committed to global hash
table) has a refcount of 1 and is not visible to other cores.
With multicast forwarding this assumption breaks down because such
skbs get cloned after being picked up, i.e. ct->use refcount is > 1.
Likewise, bridge netfilter will clone broad/mutlicast frames and
all frames in case they need to be flood-forwarded during learning
phase.
For ip multicast forwarding or plain bridge flood-forward this will
"work" because packets don't leave softirq and are implicitly
serialized.
With nfqueue this no longer holds true, the packets get queued
and can be reinjected in arbitrary ways.
Disable this feature, I see no other solution.
After this patch, nfqueue cannot queue packets except the last
multicast/broadcast packet.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
syzbot hit a use-after-free[1] which is caused because the bridge doesn't
make sure that all previous garbage has been collected when removing a
port. What happens is:
CPU 1 CPU 2
start gc cycle remove port
acquire gc lock first
wait for lock
call br_multicasg_gc() directly
acquire lock now but free port
the port can be freed
while grp timers still
running
Make sure all previous gc cycles have finished by using flush_work before
freeing the port.
[1]
BUG: KASAN: slab-use-after-free in br_multicast_port_group_expired+0x4c0/0x550 net/bridge/br_multicast.c:861
Read of size 8 at addr ffff888071d6d000 by task syz.5.1232/9699
CPU: 1 PID: 9699 Comm: syz.5.1232 Not tainted 6.10.0-rc5-syzkaller-00021-g24ca36a562d6 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:114
print_address_description mm/kasan/report.c:377 [inline]
print_report+0xc3/0x620 mm/kasan/report.c:488
kasan_report+0xd9/0x110 mm/kasan/report.c:601
br_multicast_port_group_expired+0x4c0/0x550 net/bridge/br_multicast.c:861
call_timer_fn+0x1a3/0x610 kernel/time/timer.c:1792
expire_timers kernel/time/timer.c:1843 [inline]
__run_timers+0x74b/0xaf0 kernel/time/timer.c:2417
__run_timer_base kernel/time/timer.c:2428 [inline]
__run_timer_base kernel/time/timer.c:2421 [inline]
run_timer_base+0x111/0x190 kernel/time/timer.c:2437
Reported-by: syzbot+263426984509be19c9a0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=263426984509be19c9a0
Fixes: e12cec65b554 ("net: bridge: mcast: destroy all entries via gc")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20240802080730.3206303-1-razor@blackwall.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
const qualify the struct ctl_table argument in the proc_handler function
signatures. This is a prerequisite to moving the static ctl_table
structs into .rodata data which will ensure that proc_handler function
pointers cannot be modified.
This patch has been generated by the following coccinelle script:
```
virtual patch
@r1@
identifier ctl, write, buffer, lenp, ppos;
identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)";
@@
int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int write, void *buffer, size_t *lenp, loff_t *ppos);
@r2@
identifier func, ctl, write, buffer, lenp, ppos;
@@
int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int write, void *buffer, size_t *lenp, loff_t *ppos)
{ ... }
@r3@
identifier func;
@@
int func(
- struct ctl_table *
+ const struct ctl_table *
,int , void *, size_t *, loff_t *);
@r4@
identifier func, ctl;
@@
int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int , void *, size_t *, loff_t *);
@r5@
identifier func, write, buffer, lenp, ppos;
@@
int func(
- struct ctl_table *
+ const struct ctl_table *
,int write, void *buffer, size_t *lenp, loff_t *ppos);
```
* Code formatting was adjusted in xfs_sysctl.c to comply with code
conventions. The xfs_stats_clear_proc_handler,
xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax where
adjusted.
* The ctl_table argument in proc_watchdog_common was const qualified.
This is called from a proc_handler itself and is calling back into
another proc_handler, making it necessary to change it as part of the
proc_handler migration.
Co-developed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Co-developed-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: Joel Granados <j.granados@samsung.com>
|
|
Merge in late fixes to prepare for the 6.11 net-next PR.
Conflicts:
93c3a96c301f ("net: pse-pd: Do not return EOPNOSUPP if config is null")
4cddb0f15ea9 ("net: ethtool: pse-pd: Fix possible null-deref")
30d7b6727724 ("net: ethtool: Add new power limit get and set features")
https://lore.kernel.org/20240715123204.623520bb@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If a port is blocking in the common instance but forwarding in an MST
instance, traffic egressing the bridge will be dropped because the
state of the common instance is overriding that of the MST instance.
Fix this by skipping the port state check in MST mode to allow
checking the vlan state via br_allowed_egress(). This is similar to
what happens in br_handle_frame_finish() when checking ingress
traffic, which was introduced in the change below.
Fixes: ec7328b59176 ("net: bridge: mst: Multiple Spanning Tree (MST) mode")
Signed-off-by: Elliot Ayrey <elliot.ayrey@alliedtelesis.co.nz>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
brnf_frag_data_storage is a per-CPU variable and relies on disabled BH
for its locking. Without per-CPU locking in local_bh_disable() on
PREEMPT_RT this data structure requires explicit locking.
Add a local_lock_t to the data structure and use local_lock_nested_bh()
for locking. This change adds only lockdep coverage and does not alter
the functional behaviour for !PREEMPT_RT.
Cc: Florian Westphal <fw@strlen.de>
Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Roopa Prabhu <roopa@nvidia.com>
Cc: bridge@lists.linux.dev
Cc: coreteam@netfilter.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20240620132727.660738-8-bigeasy@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts, no adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
I converted br_mst_set_state to RCU to avoid a vlan use-after-free
but forgot to change the vlan group dereference helper. Switch to vlan
group RCU deref helper to fix the suspicious rcu usage warning.
Fixes: 3a7c1661ae13 ("net: bridge: mst: fix vlan use-after-free")
Reported-by: syzbot+9bbe2de1bc9d470eb5fe@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=9bbe2de1bc9d470eb5fe
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20240609103654.914987-3-razor@blackwall.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pass the already obtained vlan group pointer to br_mst_vlan_set_state()
instead of dereferencing it again. Each caller has already correctly
dereferenced it for their context. This change is required for the
following suspicious RCU dereference fix. No functional changes
intended.
Fixes: 3a7c1661ae13 ("net: bridge: mst: fix vlan use-after-free")
Reported-by: syzbot+9bbe2de1bc9d470eb5fe@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=9bbe2de1bc9d470eb5fe
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20240609103654.914987-2-razor@blackwall.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Smatch complains:
net/bridge/br_netlink_tunnel.c:
318 br_process_vlan_tunnel_info() warn: inconsistent indenting
Fix it with a proper indenting
Signed-off-by: Chen Hanxiao <chenhx.fnst@fujitsu.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
mono_delivery_time was added to check if skb->tstamp has delivery
time in mono clock base (i.e. EDT) otherwise skb->tstamp has
timestamp in ingress and delivery_time at egress.
Renaming the bitfield from mono_delivery_time to tstamp_type is for
extensibilty for other timestamps such as userspace timestamp
(i.e. SO_TXTIME) set via sock opts.
As we are renaming the mono_delivery_time to tstamp_type, it makes
sense to start assigning tstamp_type based on enum defined
in this commit.
Earlier we used bool arg flag to check if the tstamp is mono in
function skb_set_delivery_time, Now the signature of the functions
accepts tstamp_type to distinguish between mono and real time.
Also skb_set_delivery_type_by_clockid is a new function which accepts
clockid to determine the tstamp_type.
In future tstamp_type:1 can be extended to support userspace timestamp
by increasing the bitfield.
Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240509211834.3235191-2-quic_abchauha@quicinc.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Cross merge.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot reported a suspicious rcu usage[1] in bridge's mst code. While
fixing it I noticed that nothing prevents a vlan to be freed while
walking the list from the same path (br forward delay timer). Fix the rcu
usage and also make sure we are not accessing freed memory by making
br_mst_vlan_set_state use rcu read lock.
[1]
WARNING: suspicious RCU usage
6.9.0-rc6-syzkaller #0 Not tainted
-----------------------------
net/bridge/br_private.h:1599 suspicious rcu_dereference_protected() usage!
...
stack backtrace:
CPU: 1 PID: 8017 Comm: syz-executor.1 Not tainted 6.9.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
lockdep_rcu_suspicious+0x221/0x340 kernel/locking/lockdep.c:6712
nbp_vlan_group net/bridge/br_private.h:1599 [inline]
br_mst_set_state+0x1ea/0x650 net/bridge/br_mst.c:105
br_set_state+0x28a/0x7b0 net/bridge/br_stp.c:47
br_forward_delay_timer_expired+0x176/0x440 net/bridge/br_stp_timer.c:88
call_timer_fn+0x18e/0x650 kernel/time/timer.c:1793
expire_timers kernel/time/timer.c:1844 [inline]
__run_timers kernel/time/timer.c:2418 [inline]
__run_timer_base+0x66a/0x8e0 kernel/time/timer.c:2429
run_timer_base kernel/time/timer.c:2438 [inline]
run_timer_softirq+0xb7/0x170 kernel/time/timer.c:2448
__do_softirq+0x2c6/0x980 kernel/softirq.c:554
invoke_softirq kernel/softirq.c:428 [inline]
__irq_exit_rcu+0xf2/0x1c0 kernel/softirq.c:633
irq_exit_rcu+0x9/0x30 kernel/softirq.c:645
instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
RIP: 0010:lock_acquire+0x264/0x550 kernel/locking/lockdep.c:5758
Code: 2b 00 74 08 4c 89 f7 e8 ba d1 84 00 f6 44 24 61 02 0f 85 85 01 00 00 41 f7 c7 00 02 00 00 74 01 fb 48 c7 44 24 40 0e 36 e0 45 <4b> c7 44 25 00 00 00 00 00 43 c7 44 25 09 00 00 00 00 43 c7 44 25
RSP: 0018:ffffc90013657100 EFLAGS: 00000206
RAX: 0000000000000001 RBX: 1ffff920026cae2c RCX: 0000000000000001
RDX: dffffc0000000000 RSI: ffffffff8bcaca00 RDI: ffffffff8c1eaa60
RBP: ffffc90013657260 R08: ffffffff92efe507 R09: 1ffffffff25dfca0
R10: dffffc0000000000 R11: fffffbfff25dfca1 R12: 1ffff920026cae28
R13: dffffc0000000000 R14: ffffc90013657160 R15: 0000000000000246
Fixes: ec7328b59176 ("net: bridge: mst: Multiple Spanning Tree (MST) mode")
Reported-by: syzbot+fa04eb8a56fd923fc5d8@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=fa04eb8a56fd923fc5d8
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzbot triggered an uninit value[1] error in bridge device's xmit path
by sending a short (less than ETH_HLEN bytes) skb. To fix it check if
we can actually pull that amount instead of assuming.
Tested with dropwatch:
drop at: br_dev_xmit+0xb93/0x12d0 [bridge] (0xffffffffc06739b3)
origin: software
timestamp: Mon May 13 11:31:53 2024 778214037 nsec
protocol: 0x88a8
length: 2
original length: 2
drop reason: PKT_TOO_SMALL
[1]
BUG: KMSAN: uninit-value in br_dev_xmit+0x61d/0x1cb0 net/bridge/br_device.c:65
br_dev_xmit+0x61d/0x1cb0 net/bridge/br_device.c:65
__netdev_start_xmit include/linux/netdevice.h:4903 [inline]
netdev_start_xmit include/linux/netdevice.h:4917 [inline]
xmit_one net/core/dev.c:3531 [inline]
dev_hard_start_xmit+0x247/0xa20 net/core/dev.c:3547
__dev_queue_xmit+0x34db/0x5350 net/core/dev.c:4341
dev_queue_xmit include/linux/netdevice.h:3091 [inline]
__bpf_tx_skb net/core/filter.c:2136 [inline]
__bpf_redirect_common net/core/filter.c:2180 [inline]
__bpf_redirect+0x14a6/0x1620 net/core/filter.c:2187
____bpf_clone_redirect net/core/filter.c:2460 [inline]
bpf_clone_redirect+0x328/0x470 net/core/filter.c:2432
___bpf_prog_run+0x13fe/0xe0f0 kernel/bpf/core.c:1997
__bpf_prog_run512+0xb5/0xe0 kernel/bpf/core.c:2238
bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline]
__bpf_prog_run include/linux/filter.h:657 [inline]
bpf_prog_run include/linux/filter.h:664 [inline]
bpf_test_run+0x499/0xc30 net/bpf/test_run.c:425
bpf_prog_test_run_skb+0x14ea/0x1f20 net/bpf/test_run.c:1058
bpf_prog_test_run+0x6b7/0xad0 kernel/bpf/syscall.c:4269
__sys_bpf+0x6aa/0xd90 kernel/bpf/syscall.c:5678
__do_sys_bpf kernel/bpf/syscall.c:5767 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5765 [inline]
__x64_sys_bpf+0xa0/0xe0 kernel/bpf/syscall.c:5765
x64_sys_call+0x96b/0x3b50 arch/x86/include/generated/asm/syscalls_64.h:322
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+a63a1f6a062033cf0f40@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=a63a1f6a062033cf0f40
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
35d92abfbad8 ("net: hns3: fix kernel crash when devlink reload during initialization")
2a1a1a7b5fd7 ("net: hns3: add command queue trace for hns3")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The change from skb_copy to pskb_copy unfortunately changed the data
copying to omit the ethernet header, since it was pulled before reaching
this point. Fix this by calling __skb_push/pull around pskb_copy.
Fixes: 59c878cbcdd8 ("net: bridge: fix multicast-to-unicast with fraglist GSO")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Simon reported that ndo_change_mtu() methods were never
updated to use WRITE_ONCE(dev->mtu, new_mtu) as hinted
in commit 501a90c94510 ("inet: protect against too small
mtu values.")
We read dev->mtu without holding RTNL in many places,
with READ_ONCE() annotations.
It is time to take care of ndo_change_mtu() methods
to use corresponding WRITE_ONCE()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Simon Horman <horms@kernel.org>
Closes: https://lore.kernel.org/netdev/20240505144608.GB67882@kernel.org/
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240506102812.3025432-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels) which will
reduce the overall build time size of the kernel and run time memory
bloat by ~64 bytes per sentinel (further information Link :
https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/)
* Remove sentinel elements from ctl_table structs
* Remove instances where an array element is zeroed out to make it look
like a sentinel. This is not longer needed and is safe after commit
c899710fe7f9 ("networking: Update to register_net_sysctl_sz") added
the array size to the ctl_table registration
* Remove the need for having __NF_SYSCTL_CT_LAST_SYSCTL as the
sysctl array size is now in NF_SYSCTL_CT_LAST_SYSCTL
* Remove extra element in ctl_table arrays declarations
Acked-by: Kees Cook <keescook@chromium.org> # loadpin & yama
Signed-off-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
include/linux/filter.h
kernel/bpf/core.c
66e13b615a0c ("bpf: verifier: prevent userspace memory access")
d503a04f8bc0 ("bpf: Add support for certain atomics in bpf_arena to x86 JIT")
https://lore.kernel.org/all/20240429114939.210328b0@canb.auug.org.au/
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Calling skb_copy on a SKB_GSO_FRAGLIST skb is not valid, since it returns
an invalid linearized skb. This code only needs to change the ethernet
header, so pskb_copy is the right function to call here.
Fixes: 6db6f0eae605 ("bridge: multicast to unicast")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
drivers/net/ethernet/ti/icssg/icssg_prueth.c
net/mac80211/chan.c
89884459a0b9 ("wifi: mac80211: fix idle calculation with multi-link")
87f5500285fb ("wifi: mac80211: simplify ieee80211_assign_link_chanctx()")
https://lore.kernel.org/all/20240422105623.7b1fbda2@canb.auug.org.au/
net/unix/garbage.c
1971d13ffa84 ("af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().")
4090fa373f0e ("af_unix: Replace garbage collection algorithm.")
drivers/net/ethernet/ti/icssg/icssg_prueth.c
drivers/net/ethernet/ti/icssg/icssg_common.c
4dcd0e83ea1d ("net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()")
e2dc7bfd677f ("net: ti: icssg-prueth: Move common functions into a separate file")
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In br_fill_forward_path(), f->dst is checked not to be NULL, then
immediately read using READ_ONCE and checked again. The first check is
useless, so this patch aims to remove the redundant check of f->dst.
Signed-off-by: linke li <lilinke99@qq.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
br_info_notify is a void function. There is no need to return.
Fixes: b6d0425b816e ("bridge: cfm: Netlink Notifications.")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To be able to constify instances of struct ctl_tables it is necessary to
remove ways through which non-const versions are exposed from the
sysctl core.
One of these is the ctl_table_arg member of struct ctl_table_header.
Constify this reference as a prerequisite for the full constification of
struct ctl_table instances.
No functional change.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
include/trace/events/rpcgss.h
386f4a737964 ("trace: events: cleanup deprecated strncpy uses")
a4833e3abae1 ("SUNRPC: Fix rpcgss_context trace event acceptor field")
Adjacent changes:
drivers/net/ethernet/intel/ice/ice_tc_lib.c
2cca35f5dd78 ("ice: Fix checking for unsupported keys on non-tunnel device")
784feaa65dfd ("ice: Add support for PFCP hardware offload in switchdev")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
For historical reasons, when bridge device is in promisc mode, packets
that are directed to the taps follow bridge input hook path. This patch
adds a workaround to reset conntrack for these packets.
Jianbo Liu reports warning splats in their test infrastructure where
cloned packets reach the br_netfilter input hook to confirm the
conntrack object.
Scratch one bit from BR_INPUT_SKB_CB to annotate that this packet has
reached the input hook because it is passed up to the bridge device to
reach the taps.
[ 57.571874] WARNING: CPU: 1 PID: 0 at net/bridge/br_netfilter_hooks.c:616 br_nf_local_in+0x157/0x180 [br_netfilter]
[ 57.572749] Modules linked in: xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_isc si ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core mlx5ctl mlx5_core
[ 57.575158] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.8.0+ #19
[ 57.575700] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 57.576662] RIP: 0010:br_nf_local_in+0x157/0x180 [br_netfilter]
[ 57.577195] Code: fe ff ff 41 bd 04 00 00 00 be 04 00 00 00 e9 4a ff ff ff be 04 00 00 00 48 89 ef e8 f3 a9 3c e1 66 83 ad b4 00 00 00 04 eb 91 <0f> 0b e9 f1 fe ff ff 0f 0b e9 df fe ff ff 48 89 df e8 b3 53 47 e1
[ 57.578722] RSP: 0018:ffff88885f845a08 EFLAGS: 00010202
[ 57.579207] RAX: 0000000000000002 RBX: ffff88812dfe8000 RCX: 0000000000000000
[ 57.579830] RDX: ffff88885f845a60 RSI: ffff8881022dc300 RDI: 0000000000000000
[ 57.580454] RBP: ffff88885f845a60 R08: 0000000000000001 R09: 0000000000000003
[ 57.581076] R10: 00000000ffff1300 R11: 0000000000000002 R12: 0000000000000000
[ 57.581695] R13: ffff8881047ffe00 R14: ffff888108dbee00 R15: ffff88814519b800
[ 57.582313] FS: 0000000000000000(0000) GS:ffff88885f840000(0000) knlGS:0000000000000000
[ 57.583040] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 57.583564] CR2: 000000c4206aa000 CR3: 0000000103847001 CR4: 0000000000370eb0
[ 57.584194] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 57.584820] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 57.585440] Call Trace:
[ 57.585721] <IRQ>
[ 57.585976] ? __warn+0x7d/0x130
[ 57.586323] ? br_nf_local_in+0x157/0x180 [br_netfilter]
[ 57.586811] ? report_bug+0xf1/0x1c0
[ 57.587177] ? handle_bug+0x3f/0x70
[ 57.587539] ? exc_invalid_op+0x13/0x60
[ 57.587929] ? asm_exc_invalid_op+0x16/0x20
[ 57.588336] ? br_nf_local_in+0x157/0x180 [br_netfilter]
[ 57.588825] nf_hook_slow+0x3d/0xd0
[ 57.589188] ? br_handle_vlan+0x4b/0x110
[ 57.589579] br_pass_frame_up+0xfc/0x150
[ 57.589970] ? br_port_flags_change+0x40/0x40
[ 57.590396] br_handle_frame_finish+0x346/0x5e0
[ 57.590837] ? ipt_do_table+0x32e/0x430
[ 57.591221] ? br_handle_local_finish+0x20/0x20
[ 57.591656] br_nf_hook_thresh+0x4b/0xf0 [br_netfilter]
[ 57.592286] ? br_handle_local_finish+0x20/0x20
[ 57.592802] br_nf_pre_routing_finish+0x178/0x480 [br_netfilter]
[ 57.593348] ? br_handle_local_finish+0x20/0x20
[ 57.593782] ? nf_nat_ipv4_pre_routing+0x25/0x60 [nf_nat]
[ 57.594279] br_nf_pre_routing+0x24c/0x550 [br_netfilter]
[ 57.594780] ? br_nf_hook_thresh+0xf0/0xf0 [br_netfilter]
[ 57.595280] br_handle_frame+0x1f3/0x3d0
[ 57.595676] ? br_handle_local_finish+0x20/0x20
[ 57.596118] ? br_handle_frame_finish+0x5e0/0x5e0
[ 57.596566] __netif_receive_skb_core+0x25b/0xfc0
[ 57.597017] ? __napi_build_skb+0x37/0x40
[ 57.597418] __netif_receive_skb_list_core+0xfb/0x220
Fixes: 62e7151ae3eb ("netfilter: bridge: confirm multicast packets before passing them up the stack")
Reported-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Add a "scope" parameter to ip_route_output() so that callers don't have
to override the tos parameter with the RTO_ONLINK flag if they want a
local scope.
This will allow converting flowi4_tos to dscp_t in the future, thus
allowing static analysers to flag invalid interactions between
"tos" (the DSCP bits) and ECN.
Only three users ask for local scope (bonding, arp and atm). The others
continue to use RT_SCOPE_UNIVERSE. While there, add a comment to warn
users about the limitations of ip_route_output().
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com> # infiniband
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
net/ipv4/ip_gre.c
17af420545a7 ("erspan: make sure erspan_base_hdr is present in skb->head")
5832c4a77d69 ("ip_tunnel: convert __be16 tunnel flags to bitmaps")
https://lore.kernel.org/all/20240402103253.3b54a1cf@canb.auug.org.au/
Adjacent changes:
net/ipv6/ip6_fib.c
d21d40605bca ("ipv6: Fix infinite recursion in fib6_dump_done().")
5fc68320c1fb ("ipv6: remove RTNL protection from inet6_dump_fib()")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
I got multiple syzbot reports showing old bugs exposed
by BPF after commit 20f2505fb436 ("bpf: Try to avoid kzalloc
in cgroup/{s,g}etsockopt")
setsockopt() @optlen argument should be taken into account
before copying data.
BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline]
BUG: KASAN: slab-out-of-bounds in do_replace net/ipv4/netfilter/ip_tables.c:1111 [inline]
BUG: KASAN: slab-out-of-bounds in do_ipt_set_ctl+0x902/0x3dd0 net/ipv4/netfilter/ip_tables.c:1627
Read of size 96 at addr ffff88802cd73da0 by task syz-executor.4/7238
CPU: 1 PID: 7238 Comm: syz-executor.4 Not tainted 6.9.0-rc2-next-20240403-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
print_address_description mm/kasan/report.c:377 [inline]
print_report+0x169/0x550 mm/kasan/report.c:488
kasan_report+0x143/0x180 mm/kasan/report.c:601
kasan_check_range+0x282/0x290 mm/kasan/generic.c:189
__asan_memcpy+0x29/0x70 mm/kasan/shadow.c:105
copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
copy_from_sockptr include/linux/sockptr.h:55 [inline]
do_replace net/ipv4/netfilter/ip_tables.c:1111 [inline]
do_ipt_set_ctl+0x902/0x3dd0 net/ipv4/netfilter/ip_tables.c:1627
nf_setsockopt+0x295/0x2c0 net/netfilter/nf_sockopt.c:101
do_sock_setsockopt+0x3af/0x720 net/socket.c:2311
__sys_setsockopt+0x1ae/0x250 net/socket.c:2334
__do_sys_setsockopt net/socket.c:2343 [inline]
__se_sys_setsockopt net/socket.c:2340 [inline]
__x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x72/0x7a
RIP: 0033:0x7fd22067dde9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fd21f9ff0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 00007fd2207abf80 RCX: 00007fd22067dde9
RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 00007fd2206ca47a R08: 0000000000000001 R09: 0000000000000000
R10: 0000000020000880 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007fd2207abf80 R15: 00007ffd2d0170d8
</TASK>
Allocated by task 7238:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
poison_kmalloc_redzone mm/kasan/common.c:370 [inline]
__kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:387
kasan_kmalloc include/linux/kasan.h:211 [inline]
__do_kmalloc_node mm/slub.c:4069 [inline]
__kmalloc_noprof+0x200/0x410 mm/slub.c:4082
kmalloc_noprof include/linux/slab.h:664 [inline]
__cgroup_bpf_run_filter_setsockopt+0xd47/0x1050 kernel/bpf/cgroup.c:1869
do_sock_setsockopt+0x6b4/0x720 net/socket.c:2293
__sys_setsockopt+0x1ae/0x250 net/socket.c:2334
__do_sys_setsockopt net/socket.c:2343 [inline]
__se_sys_setsockopt net/socket.c:2340 [inline]
__x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x72/0x7a
The buggy address belongs to the object at ffff88802cd73da0
which belongs to the cache kmalloc-8 of size 8
The buggy address is located 0 bytes inside of
allocated 1-byte region [ffff88802cd73da0, ffff88802cd73da1)
The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88802cd73020 pfn:0x2cd73
flags: 0xfff80000000000(node=0|zone=1|lastcpupid=0xfff)
page_type: 0xffffefff(slab)
raw: 00fff80000000000 ffff888015041280 dead000000000100 dead000000000122
raw: ffff88802cd73020 000000008080007f 00000001ffffefff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY), pid 5103, tgid 2119833701 (syz-executor.4), ts 5103, free_ts 70804600828
set_page_owner include/linux/page_owner.h:32 [inline]
post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1490
prep_new_page mm/page_alloc.c:1498 [inline]
get_page_from_freelist+0x2e7e/0x2f40 mm/page_alloc.c:3454
__alloc_pages_noprof+0x256/0x6c0 mm/page_alloc.c:4712
__alloc_pages_node_noprof include/linux/gfp.h:244 [inline]
alloc_pages_node_noprof include/linux/gfp.h:271 [inline]
alloc_slab_page+0x5f/0x120 mm/slub.c:2249
allocate_slab+0x5a/0x2e0 mm/slub.c:2412
new_slab mm/slub.c:2465 [inline]
___slab_alloc+0xcd1/0x14b0 mm/slub.c:3615
__slab_alloc+0x58/0xa0 mm/slub.c:3705
__slab_alloc_node mm/slub.c:3758 [inline]
slab_alloc_node mm/slub.c:3936 [inline]
__do_kmalloc_node mm/slub.c:4068 [inline]
kmalloc_node_track_caller_noprof+0x286/0x450 mm/slub.c:4089
kstrdup+0x3a/0x80 mm/util.c:62
device_rename+0xb5/0x1b0 drivers/base/core.c:4558
dev_change_name+0x275/0x860 net/core/dev.c:1232
do_setlink+0xa4b/0x41f0 net/core/rtnetlink.c:2864
__rtnl_newlink net/core/rtnetlink.c:3680 [inline]
rtnl_newlink+0x180b/0x20a0 net/core/rtnetlink.c:3727
rtnetlink_rcv_msg+0x89b/0x10d0 net/core/rtnetlink.c:6594
netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2559
netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
page last free pid 5146 tgid 5146 stack trace:
reset_page_owner include/linux/page_owner.h:25 [inline]
free_pages_prepare mm/page_alloc.c:1110 [inline]
free_unref_page+0xd3c/0xec0 mm/page_alloc.c:2617
discard_slab mm/slub.c:2511 [inline]
__put_partials+0xeb/0x130 mm/slub.c:2980
put_cpu_partial+0x17c/0x250 mm/slub.c:3055
__slab_free+0x2ea/0x3d0 mm/slub.c:4254
qlink_free mm/kasan/quarantine.c:163 [inline]
qlist_free_all+0x9e/0x140 mm/kasan/quarantine.c:179
kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286
__kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:322
kasan_slab_alloc include/linux/kasan.h:201 [inline]
slab_post_alloc_hook mm/slub.c:3888 [inline]
slab_alloc_node mm/slub.c:3948 [inline]
__do_kmalloc_node mm/slub.c:4068 [inline]
__kmalloc_node_noprof+0x1d7/0x450 mm/slub.c:4076
kmalloc_node_noprof include/linux/slab.h:681 [inline]
kvmalloc_node_noprof+0x72/0x190 mm/util.c:634
bucket_table_alloc lib/rhashtable.c:186 [inline]
rhashtable_rehash_alloc+0x9e/0x290 lib/rhashtable.c:367
rht_deferred_worker+0x4e1/0x2440 lib/rhashtable.c:427
process_one_work kernel/workqueue.c:3218 [inline]
process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299
worker_thread+0x86d/0xd70 kernel/workqueue.c:3380
kthread+0x2f0/0x390 kernel/kthread.c:388
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
Memory state around the buggy address:
ffff88802cd73c80: 07 fc fc fc 05 fc fc fc 05 fc fc fc fa fc fc fc
ffff88802cd73d00: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
>ffff88802cd73d80: fa fc fc fc 01 fc fc fc fa fc fc fc fa fc fc fc
^
ffff88802cd73e00: fa fc fc fc fa fc fc fc 05 fc fc fc 07 fc fc fc
ffff88802cd73e80: 07 fc fc fc 07 fc fc fc 07 fc fc fc 07 fc fc fc
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Link: https://lore.kernel.org/r/20240404122051.2303764-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Historically, tunnel flags like TUNNEL_CSUM or TUNNEL_ERSPAN_OPT
have been defined as __be16. Now all of those 16 bits are occupied
and there's no more free space for new flags.
It can't be simply switched to a bigger container with no
adjustments to the values, since it's an explicit Endian storage,
and on LE systems (__be16)0x0001 equals to
(__be64)0x0001000000000000.
We could probably define new 64-bit flags depending on the
Endianness, i.e. (__be64)0x0001 on BE and (__be64)0x00010000... on
LE, but that would introduce an Endianness dependency and spawn a
ton of Sparse warnings. To mitigate them, all of those places which
were adjusted with this change would be touched anyway, so why not
define stuff properly if there's no choice.
Define IP_TUNNEL_*_BIT counterparts as a bit number instead of the
value already coded and a fistful of <16 <-> bitmap> converters and
helpers. The two flags which have a different bit position are
SIT_ISATAP_BIT and VTI_ISVTI_BIT, as they were defined not as
__cpu_to_be16(), but as (__force __be16), i.e. had different
positions on LE and BE. Now they both have strongly defined places.
Change all __be16 fields which were used to store those flags, to
IP_TUNNEL_DECLARE_FLAGS() -> DECLARE_BITMAP(__IP_TUNNEL_FLAG_NUM) ->
unsigned long[1] for now, and replace all |