summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2021-10-13net: prefer socket bound to interface when not in VRFMike Manning4-4/+8
[ Upstream commit 8d6c414cd2fb74aa6812e9bfec6178f8246c4f3a ] The commit 6da5b0f027a8 ("net: ensure unbound datagram socket to be chosen when not in a VRF") modified compute_score() so that a device match is always made, not just in the case of an l3mdev skb, then increments the score also for unbound sockets. This ensures that sockets bound to an l3mdev are never selected when not in a VRF. But as unbound and bound sockets are now scored equally, this results in the last opened socket being selected if there are matches in the default VRF for an unbound socket and a socket bound to a dev that is not an l3mdev. However, handling prior to this commit was to always select the bound socket in this case. Reinstate this handling by incrementing the score only for bound sockets. The required isolation due to choosing between an unbound socket and a socket bound to an l3mdev remains in place due to the device match always being made. The same approach is taken for compute_score() for stream sockets. Fixes: 6da5b0f027a8 ("net: ensure unbound datagram socket to be chosen when not in a VRF") Fixes: e78190581aff ("net: ensure unbound stream socket to be chosen when not in a VRF") Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/cf0a8523-b362-1edf-ee78-eef63cbbb428@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-13rtnetlink: fix if_nlmsg_stats_size() under estimationEric Dumazet1-1/+1
[ Upstream commit d34367991933d28bd7331f67a759be9a8c474014 ] rtnl_fill_statsinfo() is filling skb with one mandatory if_stats_msg structure. nlmsg_put(skb, pid, seq, type, sizeof(struct if_stats_msg), flags); But if_nlmsg_stats_size() never considered the needed storage. This bug did not show up because alloc_skb(X) allocates skb with extra tailroom, because of added alignments. This could very well be changed in the future to have deterministic behavior. Fixes: 10c9ead9f3c6 ("rtnetlink: add new RTM_GETSTATS message to dump link stats") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Roopa Prabhu <roopa@nvidia.com> Acked-by: Roopa Prabhu <roopa@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-13netlink: annotate data races around nlk->boundEric Dumazet1-4/+10
[ Upstream commit 7707a4d01a648e4c655101a469c956cb11273655 ] While existing code is correct, KCSAN is reporting a data-race in netlink_insert / netlink_sendmsg [1] It is correct to read nlk->bound without a lock, as netlink_autobind() will acquire all needed locks. [1] BUG: KCSAN: data-race in netlink_insert / netlink_sendmsg write to 0xffff8881031c8b30 of 1 bytes by task 18752 on cpu 0: netlink_insert+0x5cc/0x7f0 net/netlink/af_netlink.c:597 netlink_autobind+0xa9/0x150 net/netlink/af_netlink.c:842 netlink_sendmsg+0x479/0x7c0 net/netlink/af_netlink.c:1892 sock_sendmsg_nosec net/socket.c:703 [inline] sock_sendmsg net/socket.c:723 [inline] ____sys_sendmsg+0x360/0x4d0 net/socket.c:2392 ___sys_sendmsg net/socket.c:2446 [inline] __sys_sendmsg+0x1ed/0x270 net/socket.c:2475 __do_sys_sendmsg net/socket.c:2484 [inline] __se_sys_sendmsg net/socket.c:2482 [inline] __x64_sys_sendmsg+0x42/0x50 net/socket.c:2482 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff8881031c8b30 of 1 bytes by task 18751 on cpu 1: netlink_sendmsg+0x270/0x7c0 net/netlink/af_netlink.c:1891 sock_sendmsg_nosec net/socket.c:703 [inline] sock_sendmsg net/socket.c:723 [inline] __sys_sendto+0x2a8/0x370 net/socket.c:2019 __do_sys_sendto net/socket.c:2031 [inline] __se_sys_sendto net/socket.c:2027 [inline] __x64_sys_sendto+0x74/0x90 net/socket.c:2027 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0x00 -> 0x01 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 18751 Comm: syz-executor.0 Not tainted 5.14.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: da314c9923fe ("netlink: Replace rhash_portid with bound") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-13net/sched: sch_taprio: properly cancel timer from taprio_destroy()Eric Dumazet1-0/+4
[ Upstream commit a56d447f196fa9973c568f54c0d76d5391c3b0c0 ] There is a comment in qdisc_create() about us not calling ops->reset() in some cases. err_out4: /* * Any broken qdiscs that would require a ops->reset() here? * The qdisc was never in action so it shouldn't be necessary. */ As taprio sets a timer before actually receiving a packet, we need to cancel it from ops->destroy, just in case ops->reset has not been called. syzbot reported: ODEBUG: free active (active state 0) object type: hrtimer hint: advance_sched+0x0/0x9a0 arch/x86/include/asm/atomic64_64.h:22 WARNING: CPU: 0 PID: 8441 at lib/debugobjects.c:505 debug_print_object+0x16e/0x250 lib/debugobjects.c:505 Modules linked in: CPU: 0 PID: 8441 Comm: syz-executor813 Not tainted 5.14.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:505 Code: ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 af 00 00 00 48 8b 14 dd e0 d3 e3 89 4c 89 ee 48 c7 c7 e0 c7 e3 89 e8 5b 86 11 05 <0f> 0b 83 05 85 03 92 09 01 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e c3 RSP: 0018:ffffc9000130f330 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000 RDX: ffff88802baeb880 RSI: ffffffff815d87b5 RDI: fffff52000261e58 RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff815d25ee R11: 0000000000000000 R12: ffffffff898dd020 R13: ffffffff89e3ce20 R14: ffffffff81653630 R15: dffffc0000000000 FS: 0000000000f0d300(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffb64b3e000 CR3: 0000000036557000 CR4: 00000000001506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __debug_check_no_obj_freed lib/debugobjects.c:987 [inline] debug_check_no_obj_freed+0x301/0x420 lib/debugobjects.c:1018 slab_free_hook mm/slub.c:1603 [inline] slab_free_freelist_hook+0x171/0x240 mm/slub.c:1653 slab_free mm/slub.c:3213 [inline] kfree+0xe4/0x540 mm/slub.c:4267 qdisc_create+0xbcf/0x1320 net/sched/sch_api.c:1299 tc_modify_qdisc+0x4c8/0x1a60 net/sched/sch_api.c:1663 rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5571 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340 netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:724 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2403 ___sys_sendmsg+0xf3/0x170 net/socket.c:2457 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2486 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 Fixes: 44d4775ca518 ("net/sched: sch_taprio: reset child qdiscs before freeing them") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Davide Caratti <dcaratti@redhat.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Acked-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-13net: bridge: fix under estimation in br_get_linkxstats_size()Eric Dumazet1-0/+1
[ Upstream commit 0854a0513321cf70bea5fa483ebcaa983cc7c62e ] Commit de1799667b00 ("net: bridge: add STP xstats") added an additional nla_reserve_64bit() in br_fill_linkxstats(), but forgot to update br_get_linkxstats_size() accordingly. This can trigger the following in rtnl_stats_get() WARN_ON(err == -EMSGSIZE); Fixes: de1799667b00 ("net: bridge: add STP xstats") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vivien Didelot <vivien.didelot@gmail.com> Cc: Nikolay Aleksandrov <nikolay@nvidia.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-13net: bridge: use nla_total_size_64bit() in br_get_linkxstats_size()Eric Dumazet1-1/+1
[ Upstream commit dbe0b88064494b7bb6a9b2aa7e085b14a3112d44 ] bridge_fill_linkxstats() is using nla_reserve_64bit(). We must use nla_total_size_64bit() instead of nla_total_size() for corresponding data structure. Fixes: 1080ab95e3c7 ("net: bridge: add support for IGMP/MLD stats and export them via netlink") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Nikolay Aleksandrov <nikolay@nvidia.com> Cc: Vivien Didelot <vivien.didelot@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-13net_sched: fix NULL deref in fifo_set_limit()Eric Dumazet1-0/+3
[ Upstream commit 560ee196fe9e5037e5015e2cdb14b3aecb1cd7dc ] syzbot reported another NULL deref in fifo_set_limit() [1] I could repro the issue with : unshare -n tc qd add dev lo root handle 1:0 tbf limit 200000 burst 70000 rate 100Mbit tc qd replace dev lo parent 1:0 pfifo_fast tc qd change dev lo root handle 1:0 tbf limit 300000 burst 70000 rate 100Mbit pfifo_fast does not have a change() operation. Make fifo_set_limit() more robust about this. [1] BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD 1cf99067 P4D 1cf99067 PUD 7ca49067 PMD 0 Oops: 0010 [#1] PREEMPT SMP KASAN CPU: 1 PID: 14443 Comm: syz-executor959 Not tainted 5.15.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:0x0 Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. RSP: 0018:ffffc9000e2f7310 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffffffff8d6ecc00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff888024c27910 RDI: ffff888071e34000 RBP: ffff888071e34000 R08: 0000000000000001 R09: ffffffff8fcfb947 R10: 0000000000000001 R11: 0000000000000000 R12: ffff888024c27910 R13: ffff888071e34018 R14: 0000000000000000 R15: ffff88801ef74800 FS: 00007f321d897700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 00000000722c3000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: fifo_set_limit net/sched/sch_fifo.c:242 [inline] fifo_set_limit+0x198/0x210 net/sched/sch_fifo.c:227 tbf_change+0x6ec/0x16d0 net/sched/sch_tbf.c:418 qdisc_change net/sched/sch_api.c:1332 [inline] tc_modify_qdisc+0xd9a/0x1a60 net/sched/sch_api.c:1634 rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5572 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340 netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:724 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409 ___sys_sendmsg+0xf3/0x170 net/socket.c:2463 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: fb0305ce1b03 ("net-sched: consolidate default fifo qdisc setup") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20210930212239.3430364-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-13SUNRPC: fix sign error causing rpcsec_gss dropsJ. Bruce Fields1-1/+1
commit 2ba5acfb34957e8a7fe47cd78c77ca88e9cc2b03 upstream. If sd_max is unsigned, then sd_max - GSS_SEQ_WIN is a very large number whenever sd_max is less than GSS_SEQ_WIN, and the comparison: seq_num <= sd->sd_max - GSS_SEQ_WIN in gss_check_seq_num is pretty much always true, even when that's clearly not what was intended. This was causing pynfs to hang when using krb5, because pynfs uses zero as the initial gss sequence number. That's perfectly legal, but this logic error causes knfsd to drop the rpc in that case. Out-of-order sequence IDs in the first GSS_SEQ_WIN (128) calls will also cause this. Fixes: 10b9d99a3dbb ("SUNRPC: Augment server-side rpcgss tracepoints") Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-06netfilter: nf_tables: Fix oversized kvmalloc() callsPablo Neira Ayuso1-1/+1
commit 45928afe94a094bcda9af858b96673d59bc4a0e9 upstream. The commit 7661809d493b ("mm: don't allow oversized kvmalloc() calls") limits the max allocatable memory via kvmalloc() to MAX_INT. Reported-by: syzbot+cd43695a64bcd21b8596@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-06netfilter: conntrack: serialize hash resizes and cleanupsEric Dumazet1-33/+37
commit e9edc188fc76499b0b9bd60364084037f6d03773 upstream. Syzbot was able to trigger the following warning [1] No repro found by syzbot yet but I was able to trigger similar issue by having 2 scripts running in parallel, changing conntrack hash sizes, and: for j in `seq 1 1000` ; do unshare -n /bin/true >/dev/null ; done It would take more than 5 minutes for net_namespace structures to be cleaned up. This is because nf_ct_iterate_cleanup() has to restart everytime a resize happened. By adding a mutex, we can serialize hash resizes and cleanups and also make get_next_corpse() faster by skipping over empty buckets. Even without resizes in the picture, this patch considerably speeds up network namespace dismantles. [1] INFO: task syz-executor.0:8312 can't die for more than 144 seconds. task:syz-executor.0 state:R running task stack:25672 pid: 8312 ppid: 6573 flags:0x00004006 Call Trace: context_switch kernel/sched/core.c:4955 [inline] __schedule+0x940/0x26f0 kernel/sched/core.c:6236 preempt_schedule_common+0x45/0xc0 kernel/sched/core.c:6408 preempt_schedule_thunk+0x16/0x18 arch/x86/entry/thunk_64.S:35 __local_bh_enable_ip+0x109/0x120 kernel/softirq.c:390 local_bh_enable include/linux/bottom_half.h:32 [inline] get_next_corpse net/netfilter/nf_conntrack_core.c:2252 [inline] nf_ct_iterate_cleanup+0x15a/0x450 net/netfilter/nf_conntrack_core.c:2275 nf_conntrack_cleanup_net_list+0x14c/0x4f0 net/netfilter/nf_conntrack_core.c:2469 ops_exit_list+0x10d/0x160 net/core/net_namespace.c:171 setup_net+0x639/0xa30 net/core/net_namespace.c:349 copy_net_ns+0x319/0x760 net/core/net_namespace.c:470 create_new_namespaces+0x3f6/0xb20 kernel/nsproxy.c:110 unshare_nsproxy_namespaces+0xc1/0x1f0 kernel/nsproxy.c:226 ksys_unshare+0x445/0x920 kernel/fork.c:3128 __do_sys_unshare kernel/fork.c:3202 [inline] __se_sys_unshare kernel/fork.c:3200 [inline] __x64_sys_unshare+0x2d/0x40 kernel/fork.c:3200 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f63da68e739 RSP: 002b:00007f63d7c05188 EFLAGS: 00000246 ORIG_RAX: 0000000000000110 RAX: ffffffffffffffda RBX: 00007f63da792f80 RCX: 00007f63da68e739 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000040000000 RBP: 00007f63da6e8cc4 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f63da792f80 R13: 00007fff50b75d3f R14: 00007f63d7c05300 R15: 0000000000022000 Showing all locks held in the system: 1 lock held by khungtaskd/27: #0: ffffffff8b980020 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6446 2 locks held by kworker/u4:2/153: #0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline] #0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: arch_atomic_long_set include/linux/atomic/atomic-long.h:41 [inline] #0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic_long_set include/linux/atomic/atomic-instrumented.h:1198 [inline] #0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:634 [inline] #0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:661 [inline] #0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x896/0x1690 kernel/workqueue.c:2268 #1: ffffc9000140fdb0 ((kfence_timer).work){+.+.}-{0:0}, at: process_one_work+0x8ca/0x1690 kernel/workqueue.c:2272 1 lock held by systemd-udevd/2970: 1 lock held by in:imklog/6258: #0: ffff88807f970ff0 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:990 3 locks held by kworker/1:6/8158: 1 lock held by syz-executor.0/8312: 2 locks held by kworker/u4:13/9320: 1 lock held by syz-executor.5/10178: 1 lock held by syz-executor.4/10217: Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-06netfilter: ipset: Fix oversized kvmalloc() callsJozsef Kadlecsik1-2/+2
commit 7bbc3d385bd813077acaf0e6fdb2a86a901f5382 upstream. The commit commit 7661809d493b426e979f39ab512e3adf41fbcc69 Author: Linus Torvalds <torvalds@linux-foundation.org> Date: Wed Jul 14 09:45:49 2021 -0700 mm: don't allow oversized kvmalloc() calls limits the max allocatable memory via kvmalloc() to MAX_INT. Apply the same limit in ipset. Reported-by: syzbot+3493b1873fb3ea827986@syzkaller.appspotmail.com Reported-by: syzbot+2b8443c35458a617c904@syzkaller.appspotmail.com Reported-by: syzbot+ee5cb15f4a0e85e0d54e@syzkaller.appspotmail.com Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-06net: udp: annotate data race around udp_sk(sk)->corkflagEric Dumazet2-6/+6
commit a9f5970767d11eadc805d5283f202612c7ba1f59 upstream. up->corkflag field can be read or written without any lock. Annotate accesses to avoid possible syzbot/KCSAN reports. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-06af_unix: fix races in sk_peer_pid and sk_peer_cred accessesEric Dumazet2-12/+54
[ Upstream commit 35306eb23814444bd4021f8a1c3047d3cb0c8b2b ] Jann Horn reported that SO_PEERCRED and SO_PEERGROUPS implementations are racy, as af_unix can concurrently change sk_peer_pid and sk_peer_cred. In order to fix this issue, this patch adds a new spinlock that needs to be used whenever these fields are read or written. Jann also pointed out that l2cap_sock_get_peer_pid_cb() is currently reading sk->sk_peer_pid which makes no sense, as this field is only possibly set by AF_UNIX sockets. We will have to clean this in a separate patch. This could be done by reverting b48596d1dc25 "Bluetooth: L2CAP: Add get_peer_pid callback" or implementing what was truly expected. Fixes: 109f6e39fa07 ("af_unix: Allow SO_PEERCRED to work across namespaces.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jann Horn <jannh@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Cc: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-06net: sched: flower: protect fl_walk() with rcuVlad Buslov1-0/+6
[ Upstream commit d5ef190693a7d76c5c192d108e8dec48307b46ee ] Patch that refactored fl_walk() to use idr_for_each_entry_continue_ul() also removed rcu protection of individual filters which causes following use-after-free when filter is deleted concurrently. Fix fl_walk() to obtain rcu read lock while iterating and taking the filter reference and temporary release the lock while calling arg->fn() callback that can sleep. KASAN trace: [ 352.773640] ================================================================== [ 352.775041] BUG: KASAN: use-after-free in fl_walk+0x159/0x240 [cls_flower] [ 352.776304] Read of size 4 at addr ffff8881c8251480 by task tc/2987 [ 352.777862] CPU: 3 PID: 2987 Comm: tc Not tainted 5.15.0-rc2+ #2 [ 352.778980] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 352.781022] Call Trace: [ 352.781573] dump_stack_lvl+0x46/0x5a [ 352.782332] print_address_description.constprop.0+0x1f/0x140 [ 352.783400] ? fl_walk+0x159/0x240 [cls_flower] [ 352.784292] ? fl_walk+0x159/0x240 [cls_flower] [ 352.785138] kasan_report.cold+0x83/0xdf [ 352.785851] ? fl_walk+0x159/0x240 [cls_flower] [ 352.786587] kasan_check_range+0x145/0x1a0 [ 352.787337] fl_walk+0x159/0x240 [cls_flower] [ 352.788163] ? fl_put+0x10/0x10 [cls_flower] [ 352.789007] ? __mutex_unlock_slowpath.constprop.0+0x220/0x220 [ 352.790102] tcf_chain_dump+0x231/0x450 [ 352.790878] ? tcf_chain_tp_delete_empty+0x170/0x170 [ 352.791833] ? __might_sleep+0x2e/0xc0 [ 352.792594] ? tfilter_notify+0x170/0x170 [ 352.793400] ? __mutex_unlock_slowpath.constprop.0+0x220/0x220 [ 352.794477] tc_dump_tfilter+0x385/0x4b0 [ 352.795262] ? tc_new_tfilter+0x1180/0x1180 [ 352.796103] ? __mod_node_page_state+0x1f/0xc0 [ 352.796974] ? __build_skb_around+0x10e/0x130 [ 352.797826] netlink_dump+0x2c0/0x560 [ 352.798563] ? netlink_getsockopt+0x430/0x430 [ 352.799433] ? __mutex_unlock_slowpath.constprop.0+0x220/0x220 [ 352.800542] __netlink_dump_start+0x356/0x440 [ 352.801397] rtnetlink_rcv_msg+0x3ff/0x550 [ 352.802190] ? tc_new_tfilter+0x1180/0x1180 [ 352.802872] ? rtnl_calcit.isra.0+0x1f0/0x1f0 [ 352.803668] ? tc_new_tfilter+0x1180/0x1180 [ 352.804344] ? _copy_from_iter_nocache+0x800/0x800 [ 352.805202] ? kasan_set_track+0x1c/0x30 [ 352.805900] netlink_rcv_skb+0xc6/0x1f0 [ 352.806587] ? rht_deferred_worker+0x6b0/0x6b0 [ 352.807455] ? rtnl_calcit.isra.0+0x1f0/0x1f0 [ 352.808324] ? netlink_ack+0x4d0/0x4d0 [ 352.809086] ? netlink_deliver_tap+0x62/0x3d0 [ 352.809951] netlink_unicast+0x353/0x480 [ 352.810744] ? netlink_attachskb+0x430/0x430 [ 352.811586] ? __alloc_skb+0xd7/0x200 [ 352.812349] netlink_sendmsg+0x396/0x680 [ 352.813132] ? netlink_unicast+0x480/0x480 [ 352.813952] ? __import_iovec+0x192/0x210 [ 352.814759] ? netlink_unicast+0x480/0x480 [ 352.815580] sock_sendmsg+0x6c/0x80 [ 352.816299] ____sys_sendmsg+0x3a5/0x3c0 [ 352.817096] ? kernel_sendmsg+0x30/0x30 [ 352.817873] ? __ia32_sys_recvmmsg+0x150/0x150 [ 352.818753] ___sys_sendmsg+0xd8/0x140 [ 352.819518] ? sendmsg_copy_msghdr+0x110/0x110 [ 352.820402] ? ___sys_recvmsg+0xf4/0x1a0 [ 352.821110] ? __copy_msghdr_from_user+0x260/0x260 [ 352.821934] ? _raw_spin_lock+0x81/0xd0 [ 352.822680] ? __handle_mm_fault+0xef3/0x1b20 [ 352.823549] ? rb_insert_color+0x2a/0x270 [ 352.824373] ? copy_page_range+0x16b0/0x16b0 [ 352.825209] ? perf_event_update_userpage+0x2d0/0x2d0 [ 352.826190] ? __fget_light+0xd9/0xf0 [ 352.826941] __sys_sendmsg+0xb3/0x130 [ 352.827613] ? __sys_sendmsg_sock+0x20/0x20 [ 352.828377] ? do_user_addr_fault+0x2c5/0x8a0 [ 352.829184] ? fpregs_assert_state_consistent+0x52/0x60 [ 352.830001] ? exit_to_user_mode_prepare+0x32/0x160 [ 352.830845] do_syscall_64+0x35/0x80 [ 352.831445] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 352.832331] RIP: 0033:0x7f7bee973c17 [ 352.833078] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 [ 352.836202] RSP: 002b:00007ffcbb368e28 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 352.837524] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7bee973c17 [ 352.838715] RDX: 0000000000000000 RSI: 00007ffcbb368e50 RDI: 0000000000000003 [ 352.839838] RBP: 00007ffcbb36d090 R08: 00000000cea96d79 R09: 00007f7beea34a40 [ 352.841021] R10: 00000000004059bb R11: 0000000000000246 R12: 000000000046563f [ 352.842208] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffcbb36d088 [ 352.843784] Allocated by task 2960: [ 352.844451] kasan_save_stack+0x1b/0x40 [ 352.845173] __kasan_kmalloc+0x7c/0x90 [ 352.845873] fl_change+0x282/0x22db [cls_flower] [ 352.846696] tc_new_tfilter+0x6cf/0x1180 [ 352.847493] rtnetlink_rcv_msg+0x471/0x550 [ 352.848323] netlink_rcv_skb+0xc6/0x1f0 [ 352.849097] netlink_unicast+0x353/0x480 [ 352.849886] netlink_sendmsg+0x396/0x680 [ 352.850678] sock_sendmsg+0x6c/0x80 [ 352.851398] ____sys_sendmsg+0x3a5/0x3c0 [ 352.852202] ___sys_sendmsg+0xd8/0x140 [ 352.852967] __sys_sendmsg+0xb3/0x130 [ 352.853718] do_syscall_64+0x35/0x80 [ 352.854457] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 352.855830] Freed by task 7: [ 352.856421] kasan_save_stack+0x1b/0x40 [ 352.857139] kasan_set_track+0x1c/0x30 [ 352.857854] kasan_set_free_info+0x20/0x30 [ 352.858609] __kasan_slab_free+0xed/0x130 [ 352.859348] kfree+0xa7/0x3c0 [ 352.859951] process_one_work+0x44d/0x780 [ 352.860685] worker_thread+0x2e2/0x7e0 [ 352.861390] kthread+0x1f4/0x220 [ 352.862022] ret_from_fork+0x1f/0x30 [ 352.862955] Last potentially related work creation: [ 352.863758] kasan_save_stack+0x1b/0x40 [ 352.864378] kasan_record_aux_stack+0xab/0xc0 [ 352.865028] insert_work+0x30/0x160 [ 352.865617] __queue_work+0x351/0x670 [ 352.866261] rcu_work_rcufn+0x30/0x40 [ 352.866917] rcu_core+0x3b2/0xdb0 [ 352.867561] __do_softirq+0xf6/0x386 [ 352.868708] Second to last potentially related work creation: [ 352.869779] kasan_save_stack+0x1b/0x40 [ 352.870560] kasan_record_aux_stack+0xab/0xc0 [ 352.871426] call_rcu+0x5f/0x5c0 [ 352.872108] queue_rcu_work+0x44/0x50 [ 352.872855] __fl_put+0x17c/0x240 [cls_flower] [ 352.873733] fl_delete+0xc7/0x100 [cls_flower] [ 352.874607] tc_del_tfilter+0x510/0xb30 [ 352.886085] rtnetlink_rcv_msg+0x471/0x550 [ 352.886875] netlink_rcv_skb+0xc6/0x1f0 [ 352.887636] netlink_unicast+0x353/0x480 [ 352.888285] netlink_sendmsg+0x396/0x680 [ 352.888942] sock_sendmsg+0x6c/0x80 [ 352.889583] ____sys_sendmsg+0x3a5/0x3c0 [ 352.890311] ___sys_sendmsg+0xd8/0x140 [ 352.891019] __sys_sendmsg+0xb3/0x130 [ 352.891716] do_syscall_64+0x35/0x80 [ 352.892395] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 352.893666] The buggy address belongs to the object at ffff8881c8251000 which belongs to the cache kmalloc-2k of size 2048 [ 352.895696] The buggy address is located 1152 bytes inside of 2048-byte region [ffff8881c8251000, ffff8881c8251800) [ 352.897640] The buggy address belongs to the page: [ 352.898492] page:00000000213bac35 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1c8250 [ 352.900110] head:00000000213bac35 order:3 compound_mapcount:0 compound_pincount:0 [ 352.901541] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff) [ 352.902908] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042f00 [ 352.904391] raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000 [ 352.905861] page dumped because: kasan: bad access detected [ 352.907323] Memory state around the buggy address: [ 352.908218] ffff8881c8251380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 352.909471] ffff8881c8251400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 352.910735] >ffff8881c8251480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 352.912012] ^ [ 352.912642] ffff8881c8251500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 352.913919] ffff8881c8251580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 352.915185] ================================================================== Fixes: d39d714969cd ("idr: introduce idr_for_each_entry_continue_ul()") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Acked-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-06net: ipv4: Fix rtnexthop len when RTA_FLOW is presentXiao Liang2-9/+12
[ Upstream commit 597aa16c782496bf74c5dc3b45ff472ade6cee64 ] Multipath RTA_FLOW is embedded in nexthop. Dump it in fib_add_nexthop() to get the length of rtnexthop correct. Fixes: b0f60193632e ("ipv4: Refactor nexthop attributes in fib_dump_info") Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-06mptcp: don't return sockets in foreign netnsFlorian Westphal6-24/+20
[ Upstream commit ea1300b9df7c8e8b65695a08b8f6aaf4b25fec9c ] mptcp_token_get_sock() may return a mptcp socket that is in a different net namespace than the socket that received the token value. The mptcp syncookie code path had an explicit check for this, this moves the test into mptcp_token_get_sock() function. Eventually token.c should be converted to pernet storage, but such change is not suitable for net tree. Fixes: 2c5ebd001d4f0 ("mptcp: refactor token container") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-06sctp: break out if skb_header_pointer returns NULL in sctp_rcv_ootbXin Long1-1/+1
[ Upstream commit f7e745f8e94492a8ac0b0a26e25f2b19d342918f ] We should always check if skb_header_pointer's return is NULL before using it, otherwise it may cause null-ptr-deref, as syzbot reported: KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] RIP: 0010:sctp_rcv_ootb net/sctp/input.c:705 [inline] RIP: 0010:sctp_rcv+0x1d84/0x3220 net/sctp/input.c:196 Call Trace: <IRQ> sctp6_rcv+0x38/0x60 net/sctp/ipv6.c:1109 ip6_protocol_deliver_rcu+0x2e9/0x1ca0 net/ipv6/ip6_input.c:422 ip6_input_finish+0x62/0x170 net/ipv6/ip6_input.c:463 NF_HOOK include/linux/netfilter.h:307 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:472 dst_input include/net/dst.h:460 [inline] ip6_rcv_finish net/ipv6/ip6_input.c:76 [inline] NF_HOOK include/linux/netfilter.h:307 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] ipv6_rcv+0x28c/0x3c0 net/ipv6/ip6_input.c:297 Fixes: 3acb50c18d8d ("sctp: delay as much as possible skb_linearize") Reported-by: syzbot+581aff2ae6b860625116@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-06mac80211: mesh: fix potentially unaligned accessJohannes Berg1-1/+2
[ Upstream commit b9731062ce8afd35cf723bf3a8ad55d208f915a5 ] The pointer here points directly into the frame, so the access is potentially unaligned. Use get_unaligned_le16 to avoid that. Fixes: 3f52b7e328c5 ("mac80211: mesh power save basics") Link: https://lore.kernel.org/r/20210920154009.3110ff75be0c.Ib6a2ff9e9cc9bc6fca50fce631ec1ce725cc926b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-06mac80211: limit injected vht mcs/nss in ieee80211_parse_tx_radiotapLorenzo Bianconi1-0/+4
[ Upstream commit 13cb6d826e0ac0d144b0d48191ff1a111d32f0c6 ] Limit max values for vht mcs and nss in ieee80211_parse_tx_radiotap routine in order to fix the following warning reported by syzbot: WARNING: CPU: 0 PID: 10717 at include/net/mac80211.h:989 ieee80211_rate_set_vht include/net/mac80211.h:989 [inline] WARNING: CPU: 0 PID: 10717 at include/net/mac80211.h:989 ieee80211_parse_tx_radiotap+0x101e/0x12d0 net/mac80211/tx.c:2244 Modules linked in: CPU: 0 PID: 10717 Comm: syz-executor.5 Not tainted 5.14.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:ieee80211_rate_set_vht include/net/mac80211.h:989 [inline] RIP: 0010:ieee80211_parse_tx_radiotap+0x101e/0x12d0 net/mac80211/tx.c:2244 RSP: 0018:ffffc9000186f3e8 EFLAGS: 00010216 RAX: 0000000000000618 RBX: ffff88804ef76500 RCX: ffffc900143a5000 RDX: 0000000000040000 RSI: ffffffff888f478e RDI: 0000000000000003 RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000100 R10: ffffffff888f46f9 R11: 0000000000000000 R12: 00000000fffffff8 R13: ffff88804ef7653c R14: 0000000000000001 R15: 0000000000000004 FS: 00007fbf5718f700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2de23000 CR3: 000000006a671000 CR4: 00000000001506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600 Call Trace: ieee80211_monitor_select_queue+0xa6/0x250 net/mac80211/iface.c:740 netdev_core_pick_tx+0x169/0x2e0 net/core/dev.c:4089 __dev_queue_xmit+0x6f9/0x3710 net/core/dev.c:4165 __bpf_tx_skb net/core/filter.c:2114 [inline] __bpf_redirect_no_mac net/core/filter.c:2139 [inline] __bpf_redirect+0x5ba/0xd20 net/core/filter.c:2162 ____bpf_clone_redirect net/core/filter.c:2429 [inline] bpf_clone_redirect+0x2ae/0x420 net/core/filter.c:2401 bpf_prog_eeb6f53a69e5c6a2+0x59/0x234 bpf_dispatcher_nop_func include/linux/bpf.h:717 [inline] __bpf_prog_run include/linux/filter.h:624 [inline] bpf_prog_run include/linux/filter.h:631 [inline] bpf_test_run+0x381/0xa30 net/bpf/test_run.c:119 bpf_prog_test_run_skb+0xb84/0x1ee0 net/bpf/test_run.c:663 bpf_prog_test_run kernel/bpf/syscall.c:3307 [inline] __sys_bpf+0x2137/0x5df0 kernel/bpf/syscall.c:4605 __do_sys_bpf kernel/bpf/syscall.c:4691 [inline] __se_sys_bpf kernel/bpf/syscall.c:4689 [inline] __x64_sys_bpf+0x75/0xb0 kernel/bpf/syscall.c:4689 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x4665f9 Reported-by: syzbot+0196ac871673f0c20f68@syzkaller.appspotmail.com Fixes: 646e76bb5daf4 ("mac80211: parse VHT info in injected frames") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/c26c3f02dcb38ab63b2f2534cb463d95ee81bb13.1632141760.git.lorenzo@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-06mac80211: Fix ieee80211_amsdu_aggregate frag_tail bugChih-Kang Chang1-0/+8
[ Upstream commit fe94bac626d9c1c5bc98ab32707be8a9d7f8adba ] In ieee80211_amsdu_aggregate() set a pointer frag_tail point to the end of skb_shinfo(head)->frag_list, and use it to bind other skb in the end of this function. But when execute ieee80211_amsdu_aggregate() ->ieee80211_amsdu_realloc_pad()->pskb_expand_head(), the address of skb_shinfo(head)->frag_list will be changed. However, the ieee80211_amsdu_aggregate() not update frag_tail after call pskb_expand_head(). That will cause the second skb can't bind to the head skb appropriately.So we update the address of frag_tail to fix it. Fixes: 6e0456b54545 ("mac80211: add A-MSDU tx support") Signed-off-by: Chih-Kang Chang <gary.chang@realtek.com> Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Link: https://lore.kernel.org/r/20210830073240.12736-1-pkshih@realtek.com [reword comment] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-06ipvs: check that ip_vs_conn_tab_bits is between 8 and 20Andrea Claudi1-0/+4
[ Upstream commit 69e73dbfda14fbfe748d3812da1244cce2928dcb ] ip_vs_conn_tab_bits may be provided by the user through the conn_tab_bits module parameter. If this value is greater than 31, or less than 0, the shift operator used to derive tab_size causes undefined behaviour. Fix this checking ip_vs_conn_tab_bits value to be in the range specified in ipvs Kconfig. If not, simply use default value. Fixes: 6f7edb4881bf ("IPVS: Allow boot time change of hash size") Reported-by: Yi Chen <yiche@redhat.com> Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-06mac80211: fix use-after-free in CCMP/GCMP RXJohannes Berg1-0/+6
commit 94513069eb549737bcfc3d988d6ed4da948a2de8 upstream. When PN checking is done in mac80211, for fragmentation we need to copy the PN to the RX struct so we can later use it to do a comparison, since commit bf30ca922a0c ("mac80211: check defrag PN against current frame"). Unfortunately, in that commit I used the 'hdr' variable without it being necessarily valid, so use-after-free could occur if it was necessary to reallocate (parts of) the frame. Fix this by reloading the variable after the code that results in the reallocations, if any. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=214401. Cc: stable@vger.kernel.org Fixes: bf30ca922a0c ("mac80211: check defrag PN against current frame") Link: https://lore.kernel.org/r/20210927115838.12b9ac6bb233.I1d066acd5408a662c3b6e828122cd314fcb28cdb@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-30ipv6: delay fib6_sernum increase in fib6_addzhang kai1-2/+1
[ Upstream commit e87b5052271e39d62337ade531992b7e5d8c2cfa ] only increase fib6_sernum in net namespace after add fib6_info successfully. Signed-off-by: zhang kai <zhangkaiheb@126.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-30treewide: Change list_sort to use const pointersSami Tolvanen1-2/+2
[ Upstream commit 4f0f586bf0c898233d8f316f471a21db2abd522d ] list_sort() internally casts the comparison function passed to it to a different type with constant struct list_head pointers, and uses this pointer to call the functions, which trips indirect call Control-Flow Integrity (CFI) checking. Instead of removing the consts, this change defines the list_cmp_func_t type and changes the comparison function types of all list_sort() callers to use const pointers, thus avoiding type mismatches. Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210408182843.1754385-10-samitolvanen@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-30net: dsa: don't allocate the slave_mii_bus using devresVladimir Oltean1-3/+9
[ Upstream commit 5135e96a3dd2f4555ae6981c3155a62bcf3227f6 ] The Linux device model permits both the ->shutdown and ->remove driver methods to get called during a shutdown procedure. Example: a DSA switch which sits on an SPI bus, and the SPI bus driver calls this on its ->shutdown method: spi_unregister_controller -> device_for_each_child(&ctlr->dev, NULL, __unregister); -> spi_unregister_device(to_spi_device(dev)); -> device_del(&spi->dev); So this is a simple pattern which can theoretically appear on any bus, although the only other buses on which I've been able to find it are I2C: i2c_del_adapter -> device_for_each_child(&adap->dev, NULL, __unregister_client); -> i2c_unregister_device(client); -> device_unregister(&client->dev); The implication of this pattern is that devices on these buses can be unregistered after having been shut down. The drivers for these devices might choose to return early either from ->remove or ->shutdown if the other callback has already run once, and they might choose that the ->shutdown method should only perform a subset of the teardown done by ->remove (to avoid unnecessary delays when rebooting). So in other words, the device driver may choose on ->remove to not do anything (therefore to not unregister an MDIO bus it has registered on ->probe), because this ->remove is actually triggered by the device_shutdown path, and its ->shutdown method has already run and done the minimally required cleanup. This used to be fine until the blamed commit, but now, the following BUG_ON triggers: void mdiobus_free(struct mii_bus *bus) { /* For compatibility with error handling in drivers. */ if (bus->state == MDIOBUS_ALLOCATED) { kfree(bus); return; } BUG_ON(bus->state != MDIOBUS_UNREGISTERED); bus->state = MDIOBUS_RELEASED; put_device(&bus->dev); } In other words, there is an attempt to free an MDIO bus which was not unregistered. The attempt to free it comes from the devres release callbacks of the SPI device, which are executed after the device is unregistered. I'm not saying that the fact that MDIO buses allocated using devres would automatically get unregistered wasn't strange. I'm just saying that the commit didn't care about auditing existing call paths in the kernel, and now, the following code sequences are potentially buggy: (a) devm_mdiobus_alloc followed by plain mdiobus_register, for a device located on a bus that unregisters its children on shutdown. After the blamed patch, either both the alloc and the register should use devres, or none should. (b) devm_mdiobus_alloc followed by plain mdiobus_register, and then no mdiobus_unregister at all in the remove path. After the blamed patch, nobody unregisters the MDIO bus anymore, so this is even more buggy than the previous case which needs a specific bus configuration to be seen, this one is an unconditional bug. In this case, DSA falls into category (a), it tries to be helpful and registers an MDIO bus on behalf of the switch, which might be on such a bus. I've no idea why it does it under devres. It does this on probe: if (!ds->slave_mii_bus && ds->ops->phy_read) alloc and register mdio bus and this on remove: if (ds->slave_mii_bus && ds->ops->phy_read) unregister mdio bus I _could_ imagine using devres because the condition used on remove is different than the condition used on probe. So strictly speaking, DSA cannot determine whether the ds->slave_mii_bus it sees on remove is the ds->slave_mii_bus that _it_ has allocated on probe. Using devres would have solved that problem. But nonetheless, the existing code already proceeds to unregister the MDIO bus, even though it might be unregistering an MDIO bus it has never registered. So I can only guess that no driver that implements ds->ops->phy_read also allocates and registers ds->slave_mii_bus itself. So in that case, if unregistering is fine, freeing must be fine too. Stop using devres and free the MDIO bus manually. This will make devres stop attempting to free a still registered MDIO bus on ->shutdown. Fixes: ac3a68d56651 ("net: phy: don't abuse devres in devm_mdiobus_register()") Reported-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-30net/smc: fix 'workqueue leaked lock' in smc_conn_abort_workKarsten Graul1-0/+2
[ Upstream commit a18cee4791b1123d0a6579a7c89f4b87e48abe03 ] The abort_work is scheduled when a connection was detected to be out-of-sync after a link failure. The work calls smc_conn_kill(), which calls smc_close_active_abort() and that might end up calling smc_close_cancel_work(). smc_close_cancel_work() cancels any pending close_work and tx_work but needs to release the sock_lock before and acquires the sock_lock again afterwards. So when the sock_lock was NOT acquired before then it may be held after the abort_work completes. Thats why the sock_lock is acquired before the call to smc_conn_kill() in __smc_lgr_terminate(), but this is missing in smc_conn_abort_work(). Fix that by acquiring the sock_lock first and release it after the call to smc_conn_kill(). Fixes: b286a0651e44 ("net/smc: handle incoming CDC validation message") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-30net/smc: add missing error check in smc_clc_prfx_set()Karsten Graul1-1/+2
[ Upstream commit 6c90731980655280ea07ce4b21eb97457bf86286 ] Coverity stumbled over a missing error check in smc_clc_prfx_set(): *** CID 1475954: Error handling issues (CHECKED_RETURN) /net/smc/smc_clc.c: 233 in smc_clc_prfx_set() >>> CID 1475954: Error handling issues (CHECKED_RETURN) >>> Calling "kernel_getsockname" without checking return value (as is done elsewhere 8 out of 10 times). 233 kernel_getsockname(clcsock, (struct sockaddr *)&addrs); Add the return code check in smc_clc_prfx_set(). Fixes: c246d942eabc ("net/smc: restructure netinfo for CLC proposal msgs") Reported-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-269p/trans_virtio: Remove sysfs file on probe failureXie Yongji1-1/+3
commit f997ea3b7afc108eb9761f321b57de2d089c7c48 upstream. This ensures we don't leak the sysfs file if we failed to allocate chan->vc_wq during probe. Link: http://lkml.kernel.org/r/20210517083557.172-1-xieyongji@bytedance.com Fixes: 86c8437383ac ("net/9p: Add sysfs mount_tag file for virtio 9P device") Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-26sctp: add param size validation for SCTP_PARAM_SET_PRIMARYMarcelo Ricardo Leitner1-3/+10
commit ef6c8d6ccf0c1dccdda092ebe8782777cd7803c9 upstream. When SCTP handles an INIT chunk, it calls for example: sctp_sf_do_5_1B_init sctp_verify_init sctp_verify_param sctp_process_init sctp_process_param handling of SCTP_PARAM_SET_PRIMARY sctp_verify_init() wasn't doing proper size validation and neither the later handling, allowing it to work over the chunk itself, possibly being uninitialized memory. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-26sctp: validate chunk size in __rcv_asconf_lookupMarcelo Ricardo Leitner1-0/+3
commit b6ffe7671b24689c09faa5675dd58f93758a97ae upstream. In one of the fallbacks that SCTP has for identifying an association for an incoming packet, it looks for AddIp chunk (from ASCONF) and take a peek. Thing is, at this stage nothing was validating that the chunk actually had enough content for that, allowing the peek to happen over uninitialized memory. Similar check already exists in actual asconf handling in sctp_verify_asconf(). Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22ip_gre: validate csum_start only on pullWillem de Bruijn1-3/+6
[ Upstream commit 8a0ed250f911da31a2aef52101bc707846a800ff ] The GRE tunnel device can pull existing outer headers in ipge_xmit. This is a rare path, apparently unique to this device. The below commit ensured that pulling does not move skb->data beyond csum_start. But it has a false positive if ip_summed is not CHECKSUM_PARTIAL and thus csum_start is irrelevant. Refine to exclude this. At the same time simplify and strengt