summaryrefslogtreecommitdiff
path: root/net/sched
AgeCommit message (Collapse)AuthorFilesLines
2022-02-16net: sched: Clarify error message when qdisc kind is unknownVictor Nogueira1-1/+1
[ Upstream commit 973bf8fdd12f0e70ea351c018e68edd377a836d1 ] When adding a tc rule with a qdisc kind that is not supported or not compiled into the kernel, the kernel emits the following error: "Error: Specified qdisc not found.". Found via tdc testing when ETS qdisc was not compiled in and it was not obvious right away what the message meant without looking at the kernel code. Change the error message to be more explicit and say the qdisc kind is unknown. Signed-off-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-05net: sched: fix use-after-free in tc_new_tfilter()Eric Dumazet1-4/+7
commit 04c2a47ffb13c29778e2a14e414ad4cb5a5db4b5 upstream. Whenever tc_new_tfilter() jumps back to replay: label, we need to make sure @q and @chain local variables are cleared again, or risk use-after-free as in [1] For consistency, apply the same fix in tc_ctl_chain() BUG: KASAN: use-after-free in mini_qdisc_pair_swap+0x1b9/0x1f0 net/sched/sch_generic.c:1581 Write of size 8 at addr ffff8880985c4b08 by task syz-executor.4/1945 CPU: 0 PID: 1945 Comm: syz-executor.4 Not tainted 5.17.0-rc1-syzkaller-00495-gff58831fa02d #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description.constprop.0.cold+0x8d/0x336 mm/kasan/report.c:255 __kasan_report mm/kasan/report.c:442 [inline] kasan_report.cold+0x83/0xdf mm/kasan/report.c:459 mini_qdisc_pair_swap+0x1b9/0x1f0 net/sched/sch_generic.c:1581 tcf_chain_head_change_item net/sched/cls_api.c:372 [inline] tcf_chain0_head_change.isra.0+0xb9/0x120 net/sched/cls_api.c:386 tcf_chain_tp_insert net/sched/cls_api.c:1657 [inline] tcf_chain_tp_insert_unique net/sched/cls_api.c:1707 [inline] tc_new_tfilter+0x1e67/0x2350 net/sched/cls_api.c:2086 rtnetlink_rcv_msg+0x80d/0xb80 net/core/rtnetlink.c:5583 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343 netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:725 ____sys_sendmsg+0x331/0x810 net/socket.c:2413 ___sys_sendmsg+0xf3/0x170 net/socket.c:2467 __sys_sendmmsg+0x195/0x470 net/socket.c:2553 __do_sys_sendmmsg net/socket.c:2582 [inline] __se_sys_sendmmsg net/socket.c:2579 [inline] __x64_sys_sendmmsg+0x99/0x100 net/socket.c:2579 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f2647172059 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f2645aa5168 EFLAGS: 00000246 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 00007f2647285100 RCX: 00007f2647172059 RDX: 040000000000009f RSI: 00000000200002c0 RDI: 0000000000000006 RBP: 00007f26471cc08d R08: 0000000000000000 R09: 0000000000000000 R10: 9e00000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fffb3f7f02f R14: 00007f2645aa5300 R15: 0000000000022000 </TASK> Allocated by task 1944: kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38 kasan_set_track mm/kasan/common.c:45 [inline] set_alloc_info mm/kasan/common.c:436 [inline] ____kasan_kmalloc mm/kasan/common.c:515 [inline] ____kasan_kmalloc mm/kasan/common.c:474 [inline] __kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:524 kmalloc_node include/linux/slab.h:604 [inline] kzalloc_node include/linux/slab.h:726 [inline] qdisc_alloc+0xac/0xa10 net/sched/sch_generic.c:941 qdisc_create.constprop.0+0xce/0x10f0 net/sched/sch_api.c:1211 tc_modify_qdisc+0x4c5/0x1980 net/sched/sch_api.c:1660 rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5592 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343 netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:725 ____sys_sendmsg+0x331/0x810 net/socket.c:2413 ___sys_sendmsg+0xf3/0x170 net/socket.c:2467 __sys_sendmmsg+0x195/0x470 net/socket.c:2553 __do_sys_sendmmsg net/socket.c:2582 [inline] __se_sys_sendmmsg net/socket.c:2579 [inline] __x64_sys_sendmmsg+0x99/0x100 net/socket.c:2579 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae Freed by task 3609: kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38 kasan_set_track+0x21/0x30 mm/kasan/common.c:45 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370 ____kasan_slab_free mm/kasan/common.c:366 [inline] ____kasan_slab_free+0x130/0x160 mm/kasan/common.c:328 kasan_slab_free include/linux/kasan.h:236 [inline] slab_free_hook mm/slub.c:1728 [inline] slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1754 slab_free mm/slub.c:3509 [inline] kfree+0xcb/0x280 mm/slub.c:4562 rcu_do_batch kernel/rcu/tree.c:2527 [inline] rcu_core+0x7b8/0x1540 kernel/rcu/tree.c:2778 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558 Last potentially related work creation: kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38 __kasan_record_aux_stack+0xbe/0xd0 mm/kasan/generic.c:348 __call_rcu kernel/rcu/tree.c:3026 [inline] call_rcu+0xb1/0x740 kernel/rcu/tree.c:3106 qdisc_put_unlocked+0x6f/0x90 net/sched/sch_generic.c:1109 tcf_block_release+0x86/0x90 net/sched/cls_api.c:1238 tc_new_tfilter+0xc0d/0x2350 net/sched/cls_api.c:2148 rtnetlink_rcv_msg+0x80d/0xb80 net/core/rtnetlink.c:5583 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343 netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:725 ____sys_sendmsg+0x331/0x810 net/socket.c:2413 ___sys_sendmsg+0xf3/0x170 net/socket.c:2467 __sys_sendmmsg+0x195/0x470 net/socket.c:2553 __do_sys_sendmmsg net/socket.c:2582 [inline] __se_sys_sendmmsg net/socket.c:2579 [inline] __x64_sys_sendmmsg+0x99/0x100 net/socket.c:2579 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae The buggy address belongs to the object at ffff8880985c4800 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 776 bytes inside of 1024-byte region [ffff8880985c4800, ffff8880985c4c00) The buggy address belongs to the page: page:ffffea0002617000 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x985c0 head:ffffea0002617000 order:3 compound_mapcount:0 compound_pincount:0 flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff) raw: 00fff00000010200 0000000000000000 dead000000000122 ffff888010c41dc0 raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 3, migratetype Unmovable, gfp_mask 0x1d20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL), pid 1941, ts 1038999441284, free_ts 1033444432829 prep_new_page mm/page_alloc.c:2434 [inline] get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4165 __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5389 alloc_pages+0x1aa/0x310 mm/mempolicy.c:2271 alloc_slab_page mm/slub.c:1799 [inline] allocate_slab mm/slub.c:1944 [inline] new_slab+0x28a/0x3b0 mm/slub.c:2004 ___slab_alloc+0x87c/0xe90 mm/slub.c:3018 __slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3105 slab_alloc_node mm/slub.c:3196 [inline] slab_alloc mm/slub.c:3238 [inline] __kmalloc+0x2fb/0x340 mm/slub.c:4420 kmalloc include/linux/slab.h:586 [inline] kzalloc include/linux/slab.h:715 [inline] __register_sysctl_table+0x112/0x1090 fs/proc/proc_sysctl.c:1335 neigh_sysctl_register+0x2c8/0x5e0 net/core/neighbour.c:3787 devinet_sysctl_register+0xb1/0x230 net/ipv4/devinet.c:2618 inetdev_init+0x286/0x580 net/ipv4/devinet.c:278 inetdev_event+0xa8a/0x15d0 net/ipv4/devinet.c:1532 notifier_call_chain+0xb5/0x200 kernel/notifier.c:84 call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:1919 call_netdevice_notifiers_extack net/core/dev.c:1931 [inline] call_netdevice_notifiers net/core/dev.c:1945 [inline] register_netdevice+0x1073/0x1500 net/core/dev.c:9698 veth_newlink+0x59c/0xa90 drivers/net/veth.c:1722 page last free stack trace: reset_page_owner include/linux/page_owner.h:24 [inline] free_pages_prepare mm/page_alloc.c:1352 [inline] free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1404 free_unref_page_prepare mm/page_alloc.c:3325 [inline] free_unref_page+0x19/0x690 mm/page_alloc.c:3404 release_pages+0x748/0x1220 mm/swap.c:956 tlb_batch_pages_flush mm/mmu_gather.c:50 [inline] tlb_flush_mmu_free mm/mmu_gather.c:243 [inline] tlb_flush_mmu+0xe9/0x6b0 mm/mmu_gather.c:250 zap_pte_range mm/memory.c:1441 [inline] zap_pmd_range mm/memory.c:1490 [inline] zap_pud_range mm/memory.c:1519 [inline] zap_p4d_range mm/memory.c:1540 [inline] unmap_page_range+0x1d1d/0x2a30 mm/memory.c:1561 unmap_single_vma+0x198/0x310 mm/memory.c:1606 unmap_vmas+0x16b/0x2f0 mm/memory.c:1638 exit_mmap+0x201/0x670 mm/mmap.c:3178 __mmput+0x122/0x4b0 kernel/fork.c:1114 mmput+0x56/0x60 kernel/fork.c:1135 exit_mm kernel/exit.c:507 [inline] do_exit+0xa3c/0x2a30 kernel/exit.c:793 do_group_exit+0xd2/0x2f0 kernel/exit.c:935 __do_sys_exit_group kernel/exit.c:946 [inline] __se_sys_exit_group kernel/exit.c:944 [inline] __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:944 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae Memory state around the buggy address: ffff8880985c4a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880985c4a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff8880985c4b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8880985c4b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880985c4c00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc Fixes: 470502de5bdb ("net: sched: unlock rules update API") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vlad Buslov <vladbu@mellanox.com> Cc: Jiri Pirko <jiri@mellanox.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20220131172018.3704490-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-27net_sched: restore "mpu xxx" handlingKevin Bracey1-0/+1
commit fb80445c438c78b40b547d12b8d56596ce4ccfeb upstream. commit 56b765b79e9a ("htb: improved accuracy at high rates") broke "overhead X", "linklayer atm" and "mpu X" attributes. "overhead X" and "linklayer atm" have already been fixed. This restores the "mpu X" handling, as might be used by DOCSIS or Ethernet shaping: tc class add ... htb rate X overhead 4 mpu 64 The code being fixed is used by htb, tbf and act_police. Cake has its own mpu handling. qdisc_calculate_pkt_len still uses the size table containing values adjusted for mpu by user space. iproute2 tc has always passed mpu into the kernel via a tc_ratespec structure, but the kernel never directly acted on it, merely stored it so that it could be read back by `tc class show`. Rather, tc would generate length-to-time tables that included the mpu (and linklayer) in their construction, and the kernel used those tables. Since v3.7, the tables were no longer used. Along with "mpu", this also broke "overhead" and "linklayer" which were fixed in 01cb71d2d47b ("net_sched: restore "overhead xxx" handling", v3.10) and 8a8e3d84b171 ("net_sched: restore "linklayer atm" handling", v3.11). "overhead" was fixed by simply restoring use of tc_ratespec::overhead - this had originally been used by the kernel but was initially omitted from the new non-table-based calculations. "linklayer" had been handled in the table like "mpu", but the mode was not originally passed in tc_ratespec. The new implementation was made to handle it by getting new versions of tc to pass the mode in an extended tc_ratespec, and for older versions of tc the table contents were analysed at load time to deduce linklayer. As "mpu" has always been given to the kernel in tc_ratespec, accompanying the mpu-based table, we can restore system functionality with no userspace change by making the kernel act on the tc_ratespec value. Fixes: 56b765b79e9a ("htb: improved accuracy at high rates") Signed-off-by: Kevin Bracey <kevin@bracey.fi> Cc: Eric Dumazet <edumazet@google.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: Vimalkumar <j.vimal@gmail.com> Link: https://lore.kernel.org/r/20220112170210.1014351-1-kevin@bracey.fi Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-11sch_qfq: prevent shift-out-of-bounds in qfq_init_qdiscEric Dumazet1-4/+2
commit 7d18a07897d07495ee140dd319b0e9265c0f68ba upstream. tx_queue_len can be set to ~0U, we need to be more careful about overflows. __fls(0) is undefined, as this report shows: UBSAN: shift-out-of-bounds in net/sched/sch_qfq.c:1430:24 shift exponent 51770272 is too large for 32-bit type 'int' CPU: 0 PID: 25574 Comm: syz-executor.0 Not tainted 5.16.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x201/0x2d8 lib/dump_stack.c:106 ubsan_epilogue lib/ubsan.c:151 [inline] __ubsan_handle_shift_out_of_bounds+0x494/0x530 lib/ubsan.c:330 qfq_init_qdisc+0x43f/0x450 net/sched/sch_qfq.c:1430 qdisc_create+0x895/0x1430 net/sched/sch_api.c:1253 tc_modify_qdisc+0x9d9/0x1e20 net/sched/sch_api.c:1660 rtnetlink_rcv_msg+0x934/0xe60 net/core/rtnetlink.c:5571 netlink_rcv_skb+0x200/0x470 net/netlink/af_netlink.c:2496 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x814/0x9f0 net/netlink/af_netlink.c:1345 netlink_sendmsg+0xaea/0xe60 net/netlink/af_netlink.c:1921 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg net/socket.c:724 [inline] ____sys_sendmsg+0x5b9/0x910 net/socket.c:2409 ___sys_sendmsg net/socket.c:2463 [inline] __sys_sendmsg+0x280/0x370 net/socket.c:2492 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-22flow_offload: return EOPNOTSUPP for the unsupported mpls action typeBaowen Zheng1-0/+1
[ Upstream commit 166b6a46b78bf8b9559a6620c3032f9fe492e082 ] We need to return EOPNOTSUPP for the unsupported mpls action type when setup the flow action. In the original implement, we will return 0 for the unsupported mpls action type, actually we do not setup it and the following actions to the flow action entry. Fixes: 9838b20a7fb2 ("net: sched: take rtnl lock in tc_setup_flow_action()") Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-22net/sched: sch_ets: don't remove idle classes from the round-robin listDavide Caratti1-2/+2
[ Upstream commit c062f2a0b04d86c5b8c9d973bea43493eaca3d32 ] Shuang reported that the following script: 1) tc qdisc add dev ddd0 handle 10: parent 1: ets bands 8 strict 4 priomap 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 2) mausezahn ddd0 -A 10.10.10.1 -B 10.10.10.2 -c 0 -a own -b 00:c1:a0:c1:a0:00 -t udp & 3) tc qdisc change dev ddd0 handle 10: ets bands 4 strict 2 quanta 2500 2500 priomap 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 crashes systematically when line 2) is commented: list_del corruption, ffff8e028404bd30->next is LIST_POISON1 (dead000000000100) ------------[ cut here ]------------ kernel BUG at lib/list_debug.c:47! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 954 Comm: tc Not tainted 5.16.0-rc4+ #478 Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014 RIP: 0010:__list_del_entry_valid.cold.1+0x12/0x47 Code: fe ff 0f 0b 48 89 c1 4c 89 c6 48 c7 c7 08 42 1b 87 e8 1d c5 fe ff 0f 0b 48 89 fe 48 89 c2 48 c7 c7 98 42 1b 87 e8 09 c5 fe ff <0f> 0b 48 c7 c7 48 43 1b 87 e8 fb c4 fe ff 0f 0b 48 89 f2 48 89 fe RSP: 0018:ffffae46807a3888 EFLAGS: 00010246 RAX: 000000000000004e RBX: 0000000000000007 RCX: 0000000000000202 RDX: 0000000000000000 RSI: ffffffff871ac536 RDI: 00000000ffffffff RBP: ffffae46807a3a10 R08: 0000000000000000 R09: c0000000ffff7fff R10: 0000000000000001 R11: ffffae46807a36a8 R12: ffff8e028404b800 R13: ffff8e028404bd30 R14: dead000000000100 R15: ffff8e02fafa2400 FS: 00007efdc92e4480(0000) GS:ffff8e02fb600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000682f48 CR3: 00000001058be000 CR4: 0000000000350ef0 Call Trace: <TASK> ets_qdisc_change+0x58b/0xa70 [sch_ets] tc_modify_qdisc+0x323/0x880 rtnetlink_rcv_msg+0x169/0x4a0 netlink_rcv_skb+0x50/0x100 netlink_unicast+0x1a5/0x280 netlink_sendmsg+0x257/0x4d0 sock_sendmsg+0x5b/0x60 ____sys_sendmsg+0x1f2/0x260 ___sys_sendmsg+0x7c/0xc0 __sys_sendmsg+0x57/0xa0 do_syscall_64+0x3a/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7efdc8031338 Code: 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 43 2c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 41 89 d4 55 RSP: 002b:00007ffdf1ce9828 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000061b37a97 RCX: 00007efdc8031338 RDX: 0000000000000000 RSI: 00007ffdf1ce9890 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000078a940 R10: 000000000000000c R11: 0000000000000246 R12: 0000000000000001 R13: 0000000000688880 R14: 0000000000000000 R15: 0000000000000000 </TASK> Modules linked in: sch_ets sch_tbf dummy rfkill iTCO_wdt iTCO_vendor_support intel_rapl_msr intel_rapl_common joydev pcspkr i2c_i801 virtio_balloon i2c_smbus lpc_ich ip_tables xfs libcrc32c crct10dif_pclmul crc32_pclmul crc32c_intel serio_raw ghash_clmulni_intel ahci libahci libata virtio_blk virtio_console virtio_net net_failover failover sunrpc dm_mirror dm_region_hash dm_log dm_mod [last unloaded: sch_ets] ---[ end trace f35878d1912655c2 ]--- RIP: 0010:__list_del_entry_valid.cold.1+0x12/0x47 Code: fe ff 0f 0b 48 89 c1 4c 89 c6 48 c7 c7 08 42 1b 87 e8 1d c5 fe ff 0f 0b 48 89 fe 48 89 c2 48 c7 c7 98 42 1b 87 e8 09 c5 fe ff <0f> 0b 48 c7 c7 48 43 1b 87 e8 fb c4 fe ff 0f 0b 48 89 f2 48 89 fe RSP: 0018:ffffae46807a3888 EFLAGS: 00010246 RAX: 000000000000004e RBX: 0000000000000007 RCX: 0000000000000202 RDX: 0000000000000000 RSI: ffffffff871ac536 RDI: 00000000ffffffff RBP: ffffae46807a3a10 R08: 0000000000000000 R09: c0000000ffff7fff R10: 0000000000000001 R11: ffffae46807a36a8 R12: ffff8e028404b800 R13: ffff8e028404bd30 R14: dead000000000100 R15: ffff8e02fafa2400 FS: 00007efdc92e4480(0000) GS:ffff8e02fb600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000682f48 CR3: 00000001058be000 CR4: 0000000000350ef0 Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: 0x4e00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- we can remove 'q->classes[i].alist' only if DRR class 'i' was part of the active list. In the ETS scheduler DRR classes belong to that list only if the queue length is greater than zero: we need to test for non-zero value of 'q->classes[i].qdisc->q.qlen' before removing from the list, similarly to what has been done elsewhere in the ETS code. Fixes: de6d25924c2a ("net/sched: sch_ets: don't peek at classes beyond 'nbands'") Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-22sch_cake: do not call cake_destroy() from cake_init()Eric Dumazet1-5/+1
[ Upstream commit ab443c53916730862cec202078d36fd4008bea79 ] qdiscs are not supposed to call their own destroy() method from init(), because core stack already does that. syzbot was able to trigger use after free: DEBUG_LOCKS_WARN_ON(lock->magic != lock) WARNING: CPU: 0 PID: 21902 at kernel/locking/mutex.c:586 __mutex_lock_common kernel/locking/mutex.c:586 [inline] WARNING: CPU: 0 PID: 21902 at kernel/locking/mutex.c:586 __mutex_lock+0x9ec/0x12f0 kernel/locking/mutex.c:740 Modules linked in: CPU: 0 PID: 21902 Comm: syz-executor189 Not tainted 5.16.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__mutex_lock_common kernel/locking/mutex.c:586 [inline] RIP: 0010:__mutex_lock+0x9ec/0x12f0 kernel/locking/mutex.c:740 Code: 08 84 d2 0f 85 19 08 00 00 8b 05 97 38 4b 04 85 c0 0f 85 27 f7 ff ff 48 c7 c6 20 00 ac 89 48 c7 c7 a0 fe ab 89 e8 bf 76 ba ff <0f> 0b e9 0d f7 ff ff 48 8b 44 24 40 48 8d b8 c8 08 00 00 48 89 f8 RSP: 0018:ffffc9000627f290 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff88802315d700 RSI: ffffffff815f1db8 RDI: fffff52000c4fe44 RBP: ffff88818f28e000 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff815ebb5e R11: 0000000000000000 R12: 0000000000000000 R13: dffffc0000000000 R14: ffffc9000627f458 R15: 0000000093c30000 FS: 0000555556abc400(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fda689c3303 CR3: 000000001cfbb000 CR4: 0000000000350ef0 Call Trace: <TASK> tcf_chain0_head_change_cb_del+0x2e/0x3d0 net/sched/cls_api.c:810 tcf_block_put_ext net/sched/cls_api.c:1381 [inline] tcf_block_put_ext net/sched/cls_api.c:1376 [inline] tcf_block_put+0xbc/0x130 net/sched/cls_api.c:1394 cake_destroy+0x3f/0x80 net/sched/sch_cake.c:2695 qdisc_create.constprop.0+0x9da/0x10f0 net/sched/sch_api.c:1293 tc_modify_qdisc+0x4c5/0x1980 net/sched/sch_api.c:1660 rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5571 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2496 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x904/0xdf0 net/netlink/af_netlink.c:1921 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:724 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409 ___sys_sendmsg+0xf3/0x170 net/socket.c:2463 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f1bb06badb9 Code: Unable to access opcode bytes at RIP 0x7f1bb06bad8f. RSP: 002b:00007fff3012a658 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f1bb06badb9 RDX: 0000000000000000 RSI: 00000000200007c0 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000003 R10: 0000000000000003 R11: 0000000000000246 R12: 00007fff3012a688 R13: 00007fff3012a6a0 R14: 00007fff3012a6e0 R15: 00000000000013c2 </TASK> Fixes: 046f6fd5daef ("sched: Add Common Applications Kept Enhanced (cake) qdisc") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://lore.kernel.org/r/20211210142046.698336-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-14net/sched: fq_pie: prevent dismantle issueEric Dumazet1-0/+1
commit 61c2402665f1e10c5742033fce18392e369931d7 upstream. For some reason, fq_pie_destroy() did not copy working code from pie_destroy() and other qdiscs, thus causing elusive bug. Before calling del_timer_sync(&q->adapt_timer), we need to ensure timer will not rearm itself. rcu: INFO: rcu_preempt self-detected stall on CPU rcu: 0-....: (4416 ticks this GP) idle=60d/1/0x4000000000000000 softirq=10433/10434 fqs=2579 (t=10501 jiffies g=13085 q=3989) NMI backtrace for cpu 0 CPU: 0 PID: 13 Comm: ksoftirqd/0 Not tainted 5.16.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:111 nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline] rcu_dump_cpu_stacks+0x25e/0x3f0 kernel/rcu/tree_stall.h:343 print_cpu_stall kernel/rcu/tree_stall.h:627 [inline] check_cpu_stall kernel/rcu/tree_stall.h:711 [inline] rcu_pending kernel/rcu/tree.c:3878 [inline] rcu_sched_clock_irq.cold+0x9d/0x746 kernel/rcu/tree.c:2597 update_process_times+0x16d/0x200 kernel/time/timer.c:1785 tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:226 tick_sched_timer+0x1b0/0x2d0 kernel/time/tick-sched.c:1428 __run_hrtimer kernel/time/hrtimer.c:1685 [inline] __hrtimer_run_queues+0x1c0/0xe50 kernel/time/hrtimer.c:1749 hrtimer_interrupt+0x31c/0x790 kernel/time/hrtimer.c:1811 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline] __sysvec_apic_timer_interrupt+0x146/0x530 arch/x86/kernel/apic/apic.c:1103 sysvec_apic_timer_interrupt+0x8e/0xc0 arch/x86/kernel/apic/apic.c:1097 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638 RIP: 0010:write_comp_data kernel/kcov.c:221 [inline] RIP: 0010:__sanitizer_cov_trace_const_cmp1+0x1d/0x80 kernel/kcov.c:273 Code: 54 c8 20 48 89 10 c3 66 0f 1f 44 00 00 53 41 89 fb 41 89 f1 bf 03 00 00 00 65 48 8b 0c 25 40 70 02 00 48 89 ce 4c 8b 54 24 08 <e8> 4e f7 ff ff 84 c0 74 51 48 8b 81 88 15 00 00 44 8b 81 84 15 00 RSP: 0018:ffffc90000d27b28 EFLAGS: 00000246 RAX: 0000000000000000 RBX: ffff888064bf1bf0 RCX: ffff888011928000 RDX: ffff888011928000 RSI: ffff888011928000 RDI: 0000000000000003 RBP: ffff888064bf1c28 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff875d8295 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8880783dd300 R14: 0000000000000000 R15: 0000000000000000 pie_calculate_probability+0x405/0x7c0 net/sched/sch_pie.c:418 fq_pie_timer+0x170/0x2a0 net/sched/sch_fq_pie.c:383 call_timer_fn+0x1a5/0x6b0 kernel/time/timer.c:1421 expire_timers kernel/time/timer.c:1466 [inline] __run_timers.part.0+0x675/0xa20 kernel/time/timer.c:1734 __run_timers kernel/time/timer.c:1715 [inline] run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1747 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558 run_ksoftirqd kernel/softirq.c:921 [inline] run_ksoftirqd+0x2d/0x60 kernel/softirq.c:913 smpboot_thread_fn+0x645/0x9c0 kernel/smpboot.c:164 kthread+0x405/0x4f0 kernel/kthread.c:327 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 </TASK> Fixes: ec97ecf1ebe4 ("net: sched: add Flow Queue PIE packet scheduler") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Cc: Sachin D. Patil <sdp.sachin@gmail.com> Cc: V. Saicharan <vsaicharan1998@gmail.com> Cc: Mohit Bhasi <mohitbhasi1998@gmail.com> Cc: Leslie Monis <lesliemonis@gmail.com> Cc: Gautam Ramakrishnan <gautamramk@gmail.com> Link: https://lore.kernel.org/r/20211209084937.3500020-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01net/sched: sch_ets: don't peek at classes beyond 'nbands'Davide Caratti1-3/+5
[ Upstream commit de6d25924c2a8c2988c6a385990cafbe742061bf ] when the number of DRR classes decreases, the round-robin active list can contain elements that have already been freed in ets_qdisc_change(). As a consequence, it's possible to see a NULL dereference crash, caused by the attempt to call cl->qdisc->ops->peek(cl->qdisc) when cl->qdisc is NULL: BUG: kernel NULL pointer dereference, address: 0000000000000018 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 910 Comm: mausezahn Not tainted 5.16.0-rc1+ #475 Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014 RIP: 0010:ets_qdisc_dequeue+0x129/0x2c0 [sch_ets] Code: c5 01 41 39 ad e4 02 00 00 0f 87 18 ff ff ff 49 8b 85 c0 02 00 00 49 39 c4 0f 84 ba 00 00 00 49 8b ad c0 02 00 00 48 8b 7d 10 <48> 8b 47 18 48 8b 40 38 0f ae e8 ff d0 48 89 c3 48 85 c0 0f 84 9d RSP: 0000:ffffbb36c0b5fdd8 EFLAGS: 00010287 RAX: ffff956678efed30 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000002 RSI: ffffffff9b938dc9 RDI: 0000000000000000 RBP: ffff956678efed30 R08: e2f3207fe360129c R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffff956678efeac0 R13: ffff956678efe800 R14: ffff956611545000 R15: ffff95667ac8f100 FS: 00007f2aa9120740(0000) GS:ffff95667b800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000018 CR3: 000000011070c000 CR4: 0000000000350ee0 Call Trace: <TASK> qdisc_peek_dequeued+0x29/0x70 [sch_ets] tbf_dequeue+0x22/0x260 [sch_tbf] __qdisc_run+0x7f/0x630 net_tx_action+0x290/0x4c0 __do_softirq+0xee/0x4f8 irq_exit_rcu+0xf4/0x130 sysvec_apic_timer_interrupt+0x52/0xc0 asm_sysvec_apic_timer_interrupt+0x12/0x20 RIP: 0033:0x7f2aa7fc9ad4 Code: b9 ff ff 48 8b 54 24 18 48 83 c4 08 48 89 ee 48 89 df 5b 5d e9 ed fc ff ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa <53> 48 83 ec 10 48 8b 05 10 64 33 00 48 8b 00 48 85 c0 0f 85 84 00 RSP: 002b:00007ffe5d33fab8 EFLAGS: 00000202 RAX: 0000000000000002 RBX: 0000561f72c31460 RCX: 0000561f72c31720 RDX: 0000000000000002 RSI: 0000561f72c31722 RDI: 0000561f72c31720 RBP: 000000000000002a R08: 00007ffe5d33fa40 R09: 0000000000000014 R10: 0000000000000000 R11: 0000000000000246 R12: 0000561f7187e380 R13: 0000000000000000 R14: 0000000000000000 R15: 0000561f72c31460 </TASK> Modules linked in: sch_ets sch_tbf dummy rfkill iTCO_wdt intel_rapl_msr iTCO_vendor_support intel_rapl_common joydev virtio_balloon lpc_ich i2c_i801 i2c_smbus pcspkr ip_tables xfs libcrc32c crct10dif_pclmul crc32_pclmul crc32c_intel ahci libahci ghash_clmulni_intel serio_raw libata virtio_blk virtio_console virtio_net net_failover failover sunrpc dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000000000018 Ensuring that 'alist' was never zeroed [1] was not sufficient, we need to remove from the active list those elements that are no more SP nor DRR. [1] https://lore.kernel.org/netdev/60d274838bf09777f0371253416e8af71360bc08.1633609148.git.dcaratti@redhat.com/ v3: fix race between ets_qdisc_change() and ets_qdisc_dequeue() delisting DRR classes beyond 'nbands' in ets_qdisc_change() with the qdisc lock acquired, thanks to Cong Wang. v2: when a NULL qdisc is found in the DRR active list, try to dequeue skb from the next list item. Reported-by: Hangbin Liu <liuhangbin@gmail.com> Fixes: dcc68b4d8084 ("net: sch_ets: Add a new Qdisc") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/7a5c496eed2d62241620bdbb83eb03fb9d571c99.1637762721.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-26net: sched: act_mirred: drop dst for the direction from egress to ingressXin Long1-3/+8
[ Upstream commit f799ada6bf2397c351220088b9b0980125c77280 ] Without dropping dst, the packets sent from local mirred/redirected to ingress will may still use the old dst. ip_rcv() will drop it as the old dst is for output and its .input is dst_discard. This patch is to fix by also dropping dst for those packets that are mirred or redirected from egress to ingress in act_mirred. Note that we don't drop it for the direction change from ingress to egress, as on which there might be a user case attaching a metadata dst by act_tunnel_key that would be used later. Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Cong Wang <cong.wang@bytedance.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18net/sched: sch_taprio: fix undefined behavior in ktime_mono_to_anyEric Dumazet1-10/+17
[ Upstream commit 6dc25401cba4d428328eade8ceae717633fdd702 ] 1) if q->tk_offset == TK_OFFS_MAX, then get_tcp_tstamp() calls ktime_mono_to_any() with out-of-bound value. 2) if q->tk_offset is changed in taprio_parse_clockid(), taprio_get_time() might also call ktime_mono_to_any() with out-of-bound value as sysbot found: UBSAN: array-index-out-of-bounds in kernel/time/timekeeping.c:908:27 index 3 is out of range for type 'ktime_t *[3]' CPU: 1 PID: 25668 Comm: kworker/u4:0 Not tainted 5.15.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: bat_events batadv_iv_send_outstanding_bat_ogm_packet Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 ubsan_epilogue+0xb/0x5a lib/ubsan.c:151 __ubsan_handle_out_of_bounds.cold+0x62/0x6c lib/ubsan.c:291 ktime_mono_to_any+0x1d4/0x1e0 kernel/time/timekeeping.c:908 get_tcp_tstamp net/sched/sch_taprio.c:322 [inline] get_packet_txtime net/sched/sch_taprio.c:353 [inline] taprio_enqueue_one+0x5b0/0x1460 net/sched/sch_taprio.c:420 taprio_enqueue+0x3b1/0x730 net/sched/sch_taprio.c:485 dev_qdisc_enqueue+0x40/0x300 net/core/dev.c:3785 __dev_xmit_skb net/core/dev.c:3869 [inline] __dev_queue_xmit+0x1f6e/0x3630 net/core/dev.c:4194 batadv_send_skb_packet+0x4a9/0x5f0 net/batman-adv/send.c:108 batadv_iv_ogm_send_to_if net/batman-adv/bat_iv_ogm.c:393 [inline] batadv_iv_ogm_emit net/batman-adv/bat_iv_ogm.c:421 [inline] batadv_iv_send_outstanding_bat_ogm_packet+0x6d7/0x8e0 net/batman-adv/bat_iv_ogm.c:1701 process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298 worker_thread+0x658/0x11f0 kernel/workqueue.c:2445 kthread+0x405/0x4f0 kernel/kthread.c:327 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 Fixes: 7ede7b03484b ("taprio: make clock reference conversions easier") Fixes: 54002066100b ("taprio: Adjust timestamps for TCP packets") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vedang Patel <vedang.patel@intel.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Link: https://lore.kernel.org/r/20211108180815.1822479-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18net: sched: update default qdisc visibility after Tx queue cnt changesJakub Kicinski3-0/+56
[ Upstream commit 1e080f17750d1083e8a32f7b350584ae1cd7ff20 ] mq / mqprio make the default child qdiscs visible. They only do so for the qdiscs which are within real_num_tx_queues when the device is registered. Depending on order of calls in the driver, or if user space changes config via ethtool -L the number of qdiscs visible under tc qdisc show will differ from the number of queues. This is confusing to users and potentially to system configuration scripts which try to make sure qdiscs have the right parameters. Add a new Qdisc_ops callback and make relevant qdiscs TTRT. Note that this uncovers the "shortcut" created by commit 1f27cde313d7 ("net: sched: use pfifo_fast for non real queues") The default child qdiscs beyond initial real_num_tx are always pfifo_fast, no matter what the sysfs setting is. Fixing this gets a little tricky because we'd need to keep a reference on whatever the default qdisc was at the time of creation. In practice this is likely an non-issue the qdiscs likely have to be configured to non-default settings, so whatever user space is doing such configuration can replace the pfifos... now that it will see them. Reported-by: Matthew Massey <matthewmassey@fb.com> Reviewed-by: Dave Taht <dave.taht@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-20mqprio: Correct stats in mqprio_dump_class_stats().Sebastian Andrzej Siewior1-12/+18
commit 14132690860e4d06aa3e1c4d7d8e9866ba7756dd upstream. Introduction of lockless subqueues broke the class statistics. Before the change stats were accumulated in `bstats' and `qstats' on the stack which was then copied to struct gnet_dump. After the change the `bstats' and `qstats' are initialized to 0 and never updated, yet still fed to gnet_dump. The code updates the global qdisc->cpu_bstats and qdisc->cpu_qstats instead, clobbering them. Most likely a copy-paste error from the code in mqprio_dump(). __gnet_stats_copy_basic() and __gnet_stats_copy_queue() accumulate the values for per-CPU case but for global stats they overwrite the value, so only stats from the last loop iteration / tc end up in sch->[bq]stats. Use the on-stack [bq]stats variables again and add the stats manually in the global case. Fixes: ce679e8df7ed2 ("net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio") Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> https://lore.kernel.org/all/20211007175000.2334713-2-bigeasy@linutronix.de/ Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-17net: prevent user from passing illegal stab size王贇1-0/+6
[ Upstream commit b193e15ac69d56f35e1d8e2b5d16cbd47764d053 ] We observed below report when playing with netlink sock: UBSAN: shift-out-of-bounds in net/sched/sch_api.c:580:10 shift exponent 249 is too large for 32-bit type CPU: 0 PID: 685 Comm: a.out Not tainted Call Trace: dump_stack_lvl+0x8d/0xcf ubsan_epilogue+0xa/0x4e __ubsan_handle_shift_out_of_bounds+0x161/0x182 __qdisc_calculate_pkt_len+0xf0/0x190 __dev_queue_xmit+0x2ed/0x15b0 it seems like kernel won't check the stab log value passing from user, and will use the insane value later to calculate pkt_len. This patch just add a check on the size/cell_log to avoid insane calculation. Reported-by: Abaci <abaci@linux.alibaba.com> Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-13net/sched: sch_taprio: properly cancel timer from taprio_destroy()Eric Dumazet1-0/+4
[ Upstream commit a56d447f196fa9973c568f54c0d76d5391c3b0c0 ] There is a comment in qdisc_create() about us not calling ops->reset() in some cases. err_out4: /* * Any broken qdiscs that would require a ops->reset() here? * The qdisc was never in action so it shouldn't be necessary. */ As taprio sets a timer before actually receiving a packet, we need to cancel it from ops->destroy, just in case ops->reset has not been called. syzbot reported: ODEBUG: free active (active state 0) object type: hrtimer hint: advance_sched+0x0/0x9a0 arch/x86/include/asm/atomic64_64.h:22 WARNING: CPU: 0 PID: 8441 at lib/debugobjects.c:505 debug_print_object+0x16e/0x250 lib/debugobjects.c:505 Modules linked in: CPU: 0 PID: 8441 Comm: syz-executor813 Not tainted 5.14.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:505 Code: ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 af 00 00 00 48 8b 14 dd e0 d3 e3 89 4c 89 ee 48 c7 c7 e0 c7 e3 89 e8 5b 86 11 05 <0f> 0b 83 05 85 03 92 09 01 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e c3 RSP: 0018:ffffc9000130f330 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000 RDX: ffff88802baeb880 RSI: ffffffff815d87b5 RDI: fffff52000261e58 RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff815d25ee R11: 0000000000000000 R12: ffffffff898dd020 R13: ffffffff89e3ce20 R14: ffffffff81653630 R15: dffffc0000000000 FS: 0000000000f0d300(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffb64b3e000 CR3: 0000000036557000 CR4: 00000000001506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __debug_check_no_obj_freed lib/debugobjects.c:987 [inline] debug_check_no_obj_freed+0x301/0x420 lib/debugobjects.c:1018 slab_free_hook mm/slub.c:1603 [inline] slab_free_freelist_hook+0x171/0x240 mm/slub.c:1653 slab_free mm/slub.c:3213 [inline] kfree+0xe4/0x540 mm/slub.c:4267 qdisc_create+0xbcf/0x1320 net/sched/sch_api.c:1299 tc_modify_qdisc+0x4c8/0x1a60 net/sched/sch_api.c:1663 rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5571 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340 netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:724 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2403 ___sys_sendmsg+0xf3/0x170 net/socket.c:2457 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2486 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 Fixes: 44d4775ca518 ("net/sched: sch_taprio: reset child qdiscs before freeing them") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Davide Caratti <dcaratti@redhat.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Acked-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-13net_sched: fix NULL deref in fifo_set_limit()Eric Dumazet1-0/+3
[ Upstream commit 560ee196fe9e5037e5015e2cdb14b3aecb1cd7dc ] syzbot reported another NULL deref in fifo_set_limit() [1] I could repro the issue with : unshare -n tc qd add dev lo root handle 1:0 tbf limit 200000 burst 70000 rate 100Mbit tc qd replace dev lo parent 1:0 pfifo_fast tc qd change dev lo root handle 1:0 tbf limit 300000 burst 70000 rate 100Mbit pfifo_fast does not have a change() operation. Make fifo_set_limit() more robust about this. [1] BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD 1cf99067 P4D 1cf99067 PUD 7ca49067 PMD 0 Oops: 0010 [#1] PREEMPT SMP KASAN CPU: 1 PID: 14443 Comm: syz-executor959 Not tainted 5.15.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:0x0 Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. RSP: 0018:ffffc9000e2f7310 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffffffff8d6ecc00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff888024c27910 RDI: ffff888071e34000 RBP: ffff888071e34000 R08: 0000000000000001 R09: ffffffff8fcfb947 R10: 0000000000000001 R11: 0000000000000000 R12: ffff888024c27910 R13: ffff888071e34018 R14: 0000000000000000 R15: ffff88801ef74800 FS: 00007f321d897700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 00000000722c3000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: fifo_set_limit net/sched/sch_fifo.c:242 [inline] fifo_set_limit+0x198/0x210 net/sched/sch_fifo.c:227 tbf_change+0x6ec/0x16d0 net/sched/sch_tbf.c:418 qdisc_change net/sched/sch_api.c:1332 [inline] tc_modify_qdisc+0xd9a/0x1a60 net/sched/sch_api.c:1634 rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5572 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340 netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:724 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409 ___sys_sendmsg+0xf3/0x170 net/socket.c:2463 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: fb0305ce1b03 ("net-sched: consolidate default fifo qdisc setup") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20210930212239.3430364-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-06net: sched: flower: protect fl_walk() with rcuVlad Buslov1-0/+6
[ Upstream commit d5ef190693a7d76c5c192d108e8dec48307b46ee ] Patch that refactored fl_walk() to use idr_for_each_entry_continue_ul() also removed rcu protection of individual filters which causes following use-after-free when filter is deleted concurrently. Fix fl_walk() to obtain rcu read lock while iterating and taking the filter reference and temporary release the lock while calling arg->fn() callback that can sleep. KASAN trace: [ 352.773640] ================================================================== [ 352.775041] BUG: KASAN: use-after-free in fl_walk+0x159/0x240 [cls_flower] [ 352.776304] Read of size 4 at addr ffff8881c8251480 by task tc/2987 [ 352.777862] CPU: 3 PID: 2987 Comm: tc Not tainted 5.15.0-rc2+ #2 [ 352.778980] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 352.781022] Call Trace: [ 352.781573] dump_stack_lvl+0x46/0x5a [ 352.782332] print_address_description.constprop.0+0x1f/0x140 [ 352.783400] ? fl_walk+0x159/0x240 [cls_flower] [ 352.784292] ? fl_walk+0x159/0x240 [cls_flower] [ 352.785138] kasan_report.cold+0x83/0xdf [ 352.785851] ? fl_walk+0x159/0x240 [cls_flower] [ 352.786587] kasan_check_range+0x145/0x1a0 [ 352.787337] fl_walk+0x159/0x240 [cls_flower] [ 352.788163] ? fl_put+0x10/0x10 [cls_flower] [ 352.789007] ? __mutex_unlock_slowpath.constprop.0+0x220/0x220 [ 352.790102] tcf_chain_dump+0x231/0x450 [ 352.790878] ? tcf_chain_tp_delete_empty+0x170/0x170 [ 352.791833] ? __might_sleep+0x2e/0xc0 [ 352.792594] ? tfilter_notify+0x170/0x170 [ 352.793400] ? __mutex_unlock_slowpath.constprop.0+0x220/0x220 [ 352.794477] tc_dump_tfilter+0x385/0x4b0 [ 352.795262] ? tc_new_tfilter+0x1180/0x1180 [ 352.796103] ? __mod_node_page_state+0x1f/0xc0 [ 352.796974] ? __build_skb_around+0x10e/0x130 [ 352.797826] netlink_dump+0x2c0/0x560 [ 352.798563] ? netlink_getsockopt+0x430/0x430 [ 352.799433] ? __mutex_unlock_slowpath.constprop.0+0x220/0x220 [ 352.800542] __netlink_dump_start+0x356/0x440 [ 352.801397] rtnetlink_rcv_msg+0x3ff/0x550 [ 352.802190] ? tc_new_tfilter+0x1180/0x1180 [ 352.802872] ? rtnl_calcit.isra.0+0x1f0/0x1f0 [ 352.803668] ? tc_new_tfilter+0x1180/0x1180 [ 352.804344] ? _copy_from_iter_nocache+0x800/0x800 [ 352.805202] ? kasan_set_track+0x1c/0x30 [ 352.805900] netlink_rcv_skb+0xc6/0x1f0 [ 352.806587] ? rht_deferred_worker+0x6b0/0x6b0 [ 352.807455] ? rtnl_calcit.isra.0+0x1f0/0x1f0 [ 352.808324] ? netlink_ack+0x4d0/0x4d0 [ 352.809086] ? netlink_deliver_tap+0x62/0x3d0 [ 352.809951] netlink_unicast+0x353/0x480 [ 352.810744] ? netlink_attachskb+0x430/0x430 [ 352.811586] ? __alloc_skb+0xd7/0x200 [ 352.812349] netlink_sendmsg+0x396/0x680 [ 352.813132] ? netlink_unicast+0x480/0x480 [ 352.813952] ? __import_iovec+0x192/0x210 [ 352.814759] ? netlink_unicast+0x480/0x480 [ 352.815580] sock_sendmsg+0x6c/0x80 [ 352.816299] ____sys_sendmsg+0x3a5/0x3c0 [ 352.817096] ? kernel_sendmsg+0x30/0x30 [ 352.817873] ? __ia32_sys_recvmmsg+0x150/0x150 [ 352.818753] ___sys_sendmsg+0xd8/0x140 [ 352.819518] ? sendmsg_copy_msghdr+0x110/0x110 [ 352.820402] ? ___sys_recvmsg+0xf4/0x1a0 [ 352.821110] ? __copy_msghdr_from_user+0x260/0x260 [ 352.821934] ? _raw_spin_lock+0x81/0xd0 [ 352.822680] ? __handle_mm_fault+0xef3/0x1b20 [ 352.823549] ? rb_insert_color+0x2a/0x270 [ 352.824373] ? copy_page_range+0x16b0/0x16b0 [ 352.825209] ? perf_event_update_userpage+0x2d0/0x2d0 [ 352.826190] ? __fget_light+0xd9/0xf0 [ 352.826941] __sys_sendmsg+0xb3/0x130 [ 352.827613] ? __sys_sendmsg_sock+0x20/0x20 [ 352.828377] ? do_user_addr_fault+0x2c5/0x8a0 [ 352.829184] ? fpregs_assert_state_consistent+0x52/0x60 [ 352.830001] ? exit_to_user_mode_prepare+0x32/0x160 [ 352.830845] do_syscall_64+0x35/0x80 [ 352.831445] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 352.832331] RIP: 0033:0x7f7bee973c17 [ 352.833078] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 [ 352.836202] RSP: 002b:00007ffcbb368e28 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 352.837524] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7bee973c17 [ 352.838715] RDX: 0000000000000000 RSI: 00007ffcbb368e50 RDI: 0000000000000003 [ 352.839838] RBP: 00007ffcbb36d090 R08: 00000000cea96d79 R09: 00007f7beea34a40 [ 352.841021] R10: 00000000004059bb R11: 0000000000000246 R12: 000000000046563f [ 352.842208] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffcbb36d088 [ 352.843784] Allocated by task 2960: [ 352.844451] kasan_save_stack+0x1b/0x40 [ 352.845173] __kasan_kmalloc+0x7c/0x90 [ 352.845873] fl_change+0x282/0x22db [cls_flower] [ 352.846696] tc_new_tfilter+0x6cf/0x1180 [ 352.847493] rtnetlink_rcv_msg+0x471/0x550 [ 352.848323] netlink_rcv_skb+0xc6/0x1f0 [ 352.849097] netlink_unicast+0x353/0x480 [ 352.849886] netlink_sendmsg+0x396/0x680 [ 352.850678] sock_sendmsg+0x6c/0x80 [ 352.851398] ____sys_sendmsg+0x3a5/0x3c0 [ 352.852202] ___sys_sendmsg+0xd8/0x140 [ 352.852967] __sys_sendmsg+0xb3/0x130 [ 352.853718] do_syscall_64+0x35/0x80 [ 352.854457] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 352.855830] Freed by task 7: [ 352.856421] kasan_save_stack+0x1b/0x40 [ 352.857139] kasan_set_track+0x1c/0x30 [ 352.857854] kasan_set_free_info+0x20/0x30 [ 352.858609] __kasan_slab_free+0xed/0x130 [ 352.859348] kfree+0xa7/0x3c0 [ 352.859951] process_one_work+0x44d/0x780 [ 352.860685] worker_thread+0x2e2/0x7e0 [ 352.861390] kthread+0x1f4/0x220 [ 352.862022] ret_from_fork+0x1f/0x30 [ 352.862955] Last potentially related work creation: [ 352.863758] kasan_save_stack+0x1b/0x40 [ 352.864378] kasan_record_aux_stack+0xab/0xc0 [ 352.865028] insert_work+0x30/0x160 [ 352.865617] __queue_work+0x351/0x670 [ 352.866261] rcu_work_rcufn+0x30/0x40 [ 352.866917] rcu_core+0x3b2/0xdb0 [ 352.867561] __do_softirq+0xf6/0x386 [ 352.868708] Second to last potentially related work creation: [ 352.869779] kasan_save_stack+0x1b/0x40 [ 352.870560] kasan_record_aux_stack+0xab/0xc0 [ 352.871426] call_rcu+0x5f/0x5c0 [ 352.872108] queue_rcu_work+0x44/0x50 [ 352.872855] __fl_put+0x17c/0x240 [cls_flower] [ 352.873733] fl_delete+0xc7/0x100 [cls_flower] [ 352.874607] tc_del_tfilter+0x510/0xb30 [ 352.886085] rtnetlink_rcv_msg+0x471/0x550 [ 352.886875] netlink_rcv_skb+0xc6/0x1f0 [ 352.887636] netlink_unicast+0x353/0x480 [ 352.888285] netlink_sendmsg+0x396/0x680 [ 352.888942] sock_sendmsg+0x6c/0x80 [ 352.889583] ____sys_sendmsg+0x3a5/0x3c0 [ 352.890311] ___sys_sendmsg+0xd8/0x140 [ 352.891019] __sys_sendmsg+0xb3/0x130 [ 352.891716] do_syscall_64+0x35/0x80 [ 352.892395] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 352.893666] The buggy address belongs to the object at ffff8881c8251000 which belongs to the cache kmalloc-2k of size 2048 [ 352.895696] The buggy address is located 1152 bytes inside of 2048-byte region [ffff8881c8251000, ffff8881c8251800) [ 352.897640] The buggy address belongs to the page: [ 352.898492] page:00000000213bac35 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1c8250 [ 352.900110] head:00000000213bac35 order:3 compound_mapcount:0 compound_pincount:0 [ 352.901541] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff) [ 352.902908] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042f00 [ 352.904391] raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000 [ 352.905861] page dumped because: kasan: bad access detected [ 352.907323] Memory state around the buggy address: [ 352.908218] ffff8881c8251380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 352.909471] ffff8881c8251400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 352.910735] >ffff8881c8251480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 352.912012] ^ [ 352.912642] ffff8881c8251500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 352.913919] ffff8881c8251580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 352.915185] ================================================================== Fixes: d39d714969cd ("idr: introduce idr_for_each_entry_continue_ul()") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Acked-by: Cong Wang <cong.wang@bytedance.com> Sig