summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
31 hoursnetfilter: ctnetlink: use netlink policy range checksDavid Carlier1-0/+4
[ Upstream commit 8f15b5071b4548b0aafc03b366eb45c9c6566704 ] Replace manual range and mask validations with netlink policy annotations in ctnetlink code paths, so that the netlink core rejects invalid values early and can generate extack errors. - CTA_PROTOINFO_TCP_STATE: reject values > TCP_CONNTRACK_SYN_SENT2 at policy level, removing the manual >= TCP_CONNTRACK_MAX check. - CTA_PROTOINFO_TCP_WSCALE_ORIGINAL/REPLY: reject values > TCP_MAX_WSCALE (14). The normal TCP option parsing path already clamps to this value, but the ctnetlink path accepted 0-255, causing undefined behavior when used as a u32 shift count. - CTA_FILTER_ORIG_FLAGS/REPLY_FLAGS: use NLA_POLICY_MASK with CTA_FILTER_F_ALL, removing the manual mask checks. - CTA_EXPECT_FLAGS: use NLA_POLICY_MASK with NF_CT_EXPECT_MASK, adding a new mask define grouping all valid expect flags. Extracted from a broader nf-next patch by Florian Westphal, scoped to ctnetlink for the fixes tree. Fixes: c8e2078cfe41 ("[NETFILTER]: ctnetlink: add support for internal tcp connection tracking flags handling") Signed-off-by: David Carlier <devnexen@gmail.com> Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
31 hoursBluetooth: hci_sync: Remove remaining dependencies of hci_requestLuiz Augusto von Dentz1-0/+17
[ Upstream commit f2d89775358606c7ab6b6b6c4a02fe1e8cd270b1 ] This removes the dependencies of hci_req_init and hci_request_cancel_all from hci_sync.c. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Stable-dep-of: 94d8e6fe5d08 ("Bluetooth: btintel: serialize btintel_hw_error() with hci_req_sync_lock") Signed-off-by: Sasha Levin <sashal@kernel.org>
31 hoursdma-mapping: add missing `inline` for `dma_free_attrs`Miguel Ojeda1-2/+2
[ Upstream commit 2cdaff22ed26f1e619aa2b43f27bb84f2c6ef8f8 ] Under an UML build for an upcoming series [1], I got `-Wstatic-in-inline` for `dma_free_attrs`: BINDGEN rust/bindings/bindings_generated.rs - due to target missing In file included from rust/helpers/helpers.c:59: rust/helpers/dma.c:17:2: warning: static function 'dma_free_attrs' is used in an inline function with external linkage [-Wstatic-in-inline] 17 | dma_free_attrs(dev, size, cpu_addr, dma_handle, attrs); | ^ rust/helpers/dma.c:12:1: note: use 'static' to give inline function 'rust_helper_dma_free_attrs' internal linkage 12 | __rust_helper void rust_helper_dma_free_attrs(struct device *dev, size_t size, | ^ | static The issue is that `dma_free_attrs` was not marked `inline` when it was introduced alongside the rest of the stubs. Thus mark it. Fixes: ed6ccf10f24b ("dma-mapping: properly stub out the DMA API for !CONFIG_HAS_DMA") Closes: https://lore.kernel.org/rust-for-linux/20260322194616.89847-1-ojeda@kernel.org/ [1] Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20260325015548.70912-1-ojeda@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
31 hoursudp: Fix wildcard bind conflict check when using hash2Martin KaFai Lau1-0/+14
[ Upstream commit e537dd15d0d4ad989d56a1021290f0c674dd8b28 ] When binding a udp_sock to a local address and port, UDP uses two hashes (udptable->hash and udptable->hash2) for collision detection. The current code switches to "hash2" when hslot->count > 10. "hash2" is keyed by local address and local port. "hash" is keyed by local port only. The issue can be shown in the following bind sequence (pseudo code): bind(fd1, "[fd00::1]:8888") bind(fd2, "[fd00::2]:8888") bind(fd3, "[fd00::3]:8888") bind(fd4, "[fd00::4]:8888") bind(fd5, "[fd00::5]:8888") bind(fd6, "[fd00::6]:8888") bind(fd7, "[fd00::7]:8888") bind(fd8, "[fd00::8]:8888") bind(fd9, "[fd00::9]:8888") bind(fd10, "[fd00::10]:8888") /* Correctly return -EADDRINUSE because "hash" is used * instead of "hash2". udp_lib_lport_inuse() detects the * conflict. */ bind(fail_fd, "[::]:8888") /* After one more socket is bound to "[fd00::11]:8888", * hslot->count exceeds 10 and "hash2" is used instead. */ bind(fd11, "[fd00::11]:8888") bind(fail_fd, "[::]:8888") /* succeeds unexpectedly */ The same issue applies to the IPv4 wildcard address "0.0.0.0" and the IPv4-mapped wildcard address "::ffff:0.0.0.0". For example, if there are existing sockets bound to "192.168.1.[1-11]:8888", then binding "0.0.0.0:8888" or "[::ffff:0.0.0.0]:8888" can also miss the conflict when hslot->count > 10. TCP inet_csk_get_port() already has the correct check in inet_use_bhash2_on_bind(). Rename it to inet_use_hash2_on_bind() and move it to inet_hashtables.h so udp.c can reuse it in this fix. Fixes: 30fff9231fad ("udp: bind() optimisation") Reported-by: Andrew Onyshchuk <oandrew@meta.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260319181817.1901357-1-martin.lau@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
31 hoursipv6: Don't remove permanent routes with exceptions from tb6_gc_hlist.Kuniyuki Iwashima1-1/+20
[ Upstream commit 4be7b99c253f0c85a255cc1db7127ba3232dfa30 ] The cited commit mechanically put fib6_remove_gc_list() just after every fib6_clean_expires() call. When a temporary route is promoted to a permanent route, there may already be exception routes tied to it. If fib6_remove_gc_list() removes the route from tb6_gc_hlist, such exception routes will no longer be aged. Let's replace fib6_remove_gc_list() with a new helper fib6_may_remove_gc_list() and use fib6_age_exceptions() there. Note that net->ipv6 is only compiled when CONFIG_IPV6 is enabled, so fib6_{add,remove,may_remove}_gc_list() are guarded. Fixes: 5eb902b8e719 ("net/ipv6: Remove expired routes with a separated list of routes.") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20260320072317.2561779-3-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
31 hoursusb: core: new quirk to handle devices with zero configurationsJie Deng1-0/+3
[ Upstream commit 9f6a983cfa22ac662c86e60816d3a357d4b551e9 ] Some USB devices incorrectly report bNumConfigurations as 0 in their device descriptor, which causes the USB core to reject them during enumeration. logs: usb 1-2: device descriptor read/64, error -71 usb 1-2: no configurations usb 1-2: can't read configurations, error -22 However, these devices actually work correctly when treated as having a single configuration. Add a new quirk USB_QUIRK_FORCE_ONE_CONFIG to handle such devices. When this quirk is set, assume the device has 1 configuration instead of failing with -EINVAL. This quirk is applied to the device with VID:PID 5131:2007 which exhibits this behavior. Signed-off-by: Jie Deng <dengjie03@kylinos.cn> Link: https://patch.msgid.link/20260227084931.1527461-1-dengjie03@kylinos.cn Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
31 hoursdma-buf: Include ioctl.h in UAPI headerIsaac J. Manjarres1-0/+1
[ Upstream commit a116bac87118903925108e57781bbfc7a7eea27b ] include/uapi/linux/dma-buf.h uses several macros from ioctl.h to define its ioctl commands. However, it does not include ioctl.h itself. So, if userspace source code tries to include the dma-buf.h file without including ioctl.h, it can result in build failures. Therefore, include ioctl.h in the dma-buf UAPI header. Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com> Reviewed-by: T.J. Mercier <tjmercier@google.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20260303002309.1401849-1-isaacmanjarres@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>
31 hoursnet: usb: r8152: add TRENDnet TUC-ET2GValentin Spreckels1-0/+1
[ Upstream commit 15fba71533bcdfaa8eeba69a5a5a2927afdf664a ] The TRENDnet TUC-ET2G is a RTL8156 based usb ethernet adapter. Add its vendor and product IDs. Signed-off-by: Valentin Spreckels <valentin@spreckels.dev> Link: https://patch.msgid.link/20260226195409.7891-2-valentin@spreckels.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
9 daysxen/privcmd: add boot control for restricted usage in domUJuergen Gross1-0/+1
commit 1613462be621ad5103ec338a7b0ca0746ec4e5f1 upstream. When running in an unprivileged domU under Xen, the privcmd driver is restricted to allow only hypercalls against a target domain, for which the current domU is acting as a device model. Add a boot parameter "unrestricted" to allow all hypercalls (the hypervisor will still refuse destructive hypercalls affecting other guests). Make this new parameter effective only in case the domU wasn't started using secure boot, as otherwise hypercalls targeting the domU itself might result in violating the secure boot functionality. This is achieved by adding another lockdown reason, which can be tested to not being set when applying the "unrestricted" option. This is part of XSA-482 Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysnetfilter: nft_set_pipapo: split gc into unlink and reclaim phaseFlorian Westphal1-0/+5
commit 9df95785d3d8302f7c066050117b04cd3c2048c2 upstream. Yiming Qian reports Use-after-free in the pipapo set type: Under a large number of expired elements, commit-time GC can run for a very long time in a non-preemptible context, triggering soft lockup warnings and RCU stall reports (local denial of service). We must split GC in an unlink and a reclaim phase. We cannot queue elements for freeing until pointers have been swapped. Expired elements are still exposed to both the packet path and userspace dumpers via the live copy of the data structure. call_rcu() does not protect us: dump operations or element lookups starting after call_rcu has fired can still observe the free'd element, unless the commit phase has made enough progress to swap the clone and live pointers before any new reader has picked up the old version. This a similar approach as done recently for the rbtree backend in commit 35f83a75529a ("netfilter: nft_set_rbtree: don't gc elements on insert"). Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Reported-by: Yiming Qian <yimingqian591@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysudp_tunnel: fix NULL deref caused by udp_sock_create6 when CONFIG_IPV6=nXiang Mei1-1/+1
[ Upstream commit b3a6df291fecf5f8a308953b65ca72b7fc9e015d ] When CONFIG_IPV6 is disabled, the udp_sock_create6() function returns 0 (success) without actually creating a socket. Callers such as fou_create() then proceed to dereference the uninitialized socket pointer, resulting in a NULL pointer dereference. The captured NULL deref crash: BUG: kernel NULL pointer dereference, address: 0000000000000018 RIP: 0010:fou_nl_add_doit (net/ipv4/fou_core.c:590 net/ipv4/fou_core.c:764) [...] Call Trace: <TASK> genl_family_rcv_msg_doit.constprop.0 (net/netlink/genetlink.c:1114) genl_rcv_msg (net/netlink/genetlink.c:1194 net/netlink/genetlink.c:1209) [...] netlink_rcv_skb (net/netlink/af_netlink.c:2550) genl_rcv (net/netlink/genetlink.c:1219) netlink_unicast (net/netlink/af_netlink.c:1319 net/netlink/af_netlink.c:1344) netlink_sendmsg (net/netlink/af_netlink.c:1894) __sock_sendmsg (net/socket.c:727 (discriminator 1) net/socket.c:742 (discriminator 1)) __sys_sendto (./include/linux/file.h:62 (discriminator 1) ./include/linux/file.h:83 (discriminator 1) net/socket.c:2183 (discriminator 1)) __x64_sys_sendto (net/socket.c:2213 (discriminator 1) net/socket.c:2209 (discriminator 1) net/socket.c:2209 (discriminator 1)) do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1)) entry_SYSCALL_64_after_hwframe (net/arch/x86/entry/entry_64.S:130) This patch makes udp_sock_create6 return -EPFNOSUPPORT instead, so callers correctly take their error paths. There is only one caller of the vulnerable function and only privileged users can trigger it. Fixes: fd384412e199b ("udp_tunnel: Seperate ipv6 functions into its own file.") Reported-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Xiang Mei <xmei5@asu.edu> Link: https://patch.msgid.link/20260317010241.1893893-1-xmei5@asu.edu Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
9 daysclsact: Fix use-after-free in init/destroy rollback asymmetryDaniel Borkmann1-0/+5
[ Upstream commit a0671125d4f55e1e98d9bde8a0b671941987e208 ] Fix a use-after-free in the clsact qdisc upon init/destroy rollback asymmetry. The latter is achieved by first fully initializing a clsact instance, and then in a second step having a replacement failure for the new clsact qdisc instance. clsact_init() initializes ingress first and then takes care of the egress part. This can fail midway, for example, via tcf_block_get_ext(). Upon failure, the kernel will trigger the clsact_destroy() callback. Commit 1cb6f0bae504 ("bpf: Fix too early release of tcx_entry") details the way how the transition is happening. If tcf_block_get_ext on the q->ingress_block ends up failing, we took the tcx_miniq_inc reference count on the ingress side, but not yet on the egress side. clsact_destroy() tests whether the {ingress,egress}_entry was non-NULL. However, even in midway failure on the replacement, both are in fact non-NULL with a valid egress_entry from the previous clsact instance. What we really need to test for is whether the qdisc instance-specific ingress or egress side previously got initialized. This adds a small helper for checking the miniq initialization called mini_qdisc_pair_inited, and utilizes that upon clsact_destroy() in order to fix the use-after-free scenario. Convert the ingress_destroy() side as well so both are consistent to each other. Fixes: 1cb6f0bae504 ("bpf: Fix too early release of tcx_entry") Reported-by: Keenan Dong <keenanat2000@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260313065531.98639-1-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
9 daysnet/sched: teql: Fix double-free in teql_master_xmitJamal Hadi Salim1-0/+28
[ Upstream commit 66360460cab63c248ca5b1070a01c0c29133b960 ] Whenever a TEQL devices has a lockless Qdisc as root, qdisc_reset should be called using the seq_lock to avoid racing with the datapath. Failure to do so may cause crashes like the following: [ 238.028993][ T318] BUG: KASAN: double-free in skb_release_data (net/core/skbuff.c:1139) [ 238.029328][ T318] Free of addr ffff88810c67ec00 by task poc_teql_uaf_ke/318 [ 238.029749][ T318] [ 238.029900][ T318] CPU: 3 UID: 0 PID: 318 Comm: poc_teql_ke Not tainted 7.0.0-rc3-00149-ge5b31d988a41 #704 PREEMPT(full) [ 238.029906][ T318] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 238.029910][ T318] Call Trace: [ 238.029913][ T318] <TASK> [ 238.029916][ T318] dump_stack_lvl (lib/dump_stack.c:122) [ 238.029928][ T318] print_report (mm/kasan/report.c:379 mm/kasan/report.c:482) [ 238.029940][ T318] ? skb_release_data (net/core/skbuff.c:1139) [ 238.029944][ T318] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221) ... [ 238.029957][ T318] ? skb_release_data (net/core/skbuff.c:1139) [ 238.029969][ T318] kasan_report_invalid_free (mm/kasan/report.c:221 mm/kasan/report.c:563) [ 238.029979][ T318] ? skb_release_data (net/core/skbuff.c:1139) [ 238.029989][ T318] check_slab_allocation (mm/kasan/common.c:231) [ 238.029995][ T318] kmem_cache_free (mm/slub.c:2637 (discriminator 1) mm/slub.c:6168 (discriminator 1) mm/slub.c:6298 (discriminator 1)) [ 238.030004][ T318] skb_release_data (net/core/skbuff.c:1139) ... [ 238.030025][ T318] sk_skb_reason_drop (net/core/skbuff.c:1256) [ 238.030032][ T318] pfifo_fast_reset (./include/linux/ptr_ring.h:171 ./include/linux/ptr_ring.h:309 ./include/linux/skb_array.h:98 net/sched/sch_generic.c:827) [ 238.030039][ T318] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221) ... [ 238.030054][ T318] qdisc_reset (net/sched/sch_generic.c:1034) [ 238.030062][ T318] teql_destroy (./include/linux/spinlock.h:395 net/sched/sch_teql.c:157) [ 238.030071][ T318] __qdisc_destroy (./include/net/pkt_sched.h:328 net/sched/sch_generic.c:1077) [ 238.030077][ T318] qdisc_graft (net/sched/sch_api.c:1062 net/sched/sch_api.c:1053 net/sched/sch_api.c:1159) [ 238.030089][ T318] ? __pfx_qdisc_graft (net/sched/sch_api.c:1091) [ 238.030095][ T318] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221) [ 238.030102][ T318] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221) [ 238.030106][ T318] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221) [ 238.030114][ T318] tc_get_qdisc (net/sched/sch_api.c:1529 net/sched/sch_api.c:1556) ... [ 238.072958][ T318] Allocated by task 303 on cpu 5 at 238.026275s: [ 238.073392][ T318] kasan_save_stack (mm/kasan/common.c:58) [ 238.073884][ T318] kasan_save_track (mm/kasan/common.c:64 (discriminator 5) mm/kasan/common.c:79 (discriminator 5)) [ 238.074230][ T318] __kasan_slab_alloc (mm/kasan/common.c:369) [ 238.074578][ T318] kmem_cache_alloc_node_noprof (./include/linux/kasan.h:253 mm/slub.c:4542 mm/slub.c:4869 mm/slub.c:4921) [ 238.076091][ T318] kmalloc_reserve (net/core/skbuff.c:616 (discriminator 107)) [ 238.076450][ T318] __alloc_skb (net/core/skbuff.c:713) [ 238.076834][ T318] alloc_skb_with_frags (./include/linux/skbuff.h:1383 net/core/skbuff.c:6763) [ 238.077178][ T318] sock_alloc_send_pskb (net/core/sock.c:2997) [ 238.077520][ T318] packet_sendmsg (net/packet/af_packet.c:2926 net/packet/af_packet.c:3019 net/packet/af_packet.c:3108) [ 238.081469][ T318] [ 238.081870][ T318] Freed by task 299 on cpu 1 at 238.028496s: [ 238.082761][ T318] kasan_save_stack (mm/kasan/common.c:58) [ 238.083481][ T318] kasan_save_track (mm/kasan/common.c:64 (discriminator 5) mm/kasan/common.c:79 (discriminator 5)) [ 238.085348][ T318] kasan_save_free_info (mm/kasan/generic.c:587 (discriminator 1)) [ 238.085900][ T318] __kasan_slab_free (mm/kasan/common.c:287) [ 238.086439][ T318] kmem_cache_free (mm/slub.c:6168 (discriminator 3) mm/slub.c:6298 (discriminator 3)) [ 238.087007][ T318] skb_release_data (net/core/skbuff.c:1139) [ 238.087491][ T318] consume_skb (net/core/skbuff.c:1451) [ 238.087757][ T318] teql_master_xmit (net/sched/sch_teql.c:358) [ 238.088116][ T318] dev_hard_start_xmit (./include/linux/netdevice.h:5324 ./include/linux/netdevice.h:5333 net/core/dev.c:3871 net/core/dev.c:3887) [ 238.088468][ T318] sch_direct_xmit (net/sched/sch_generic.c:347) [ 238.088820][ T318] __qdisc_run (net/sched/sch_generic.c:420 (discriminator 1)) [ 238.089166][ T318] __dev_queue_xmit (./include/net/sch_generic.h:229 ./include/net/pkt_sched.h:121 ./include/net/pkt_sched.h:117 net/core/dev.c:4196 net/core/dev.c:4802) Workflow to reproduce: 1. Initialize a TEQL topology (dummy0 and ifb0 as slaves, teql0 up). 2. Start multiple sender workers continuously transmitting packets through teql0 to drive teql_master_xmit(). 3. In parallel, repeatedly delete and re-add the root qdisc on dummy0 and ifb0 via RTNETLINK, forcing frequent teardown and reset activity (teql_destroy() / qdisc_reset()). 4. After running both workloads concurrently for several iterations, KASAN reports slab-use-after-free or double-free in the skb free path. Fix this by moving dev_reset_queue to sch_generic.h and calling it, instead of qdisc_reset, in teql_destroy since it handles both the lock and lockless cases correctly for root qdiscs. Fixes: 96009c7d500e ("sched: replace __QDISC_STATE_RUNNING bit with a spin lock") Reported-by: Xianrui Dong <keenanat2000@gmail.com> Tested-by: Xianrui Dong <keenanat2000@gmail.com> Co-developed-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260315155422.147256-1-jhs@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
9 daysnet: stmmac: remove support for lpi_intr_oRussell King (Oracle)1-1/+0
commit 14eb64db8ff07b58a35b98375f446d9e20765674 upstream. The dwmac databook for v3.74a states that lpi_intr_o is a sideband signal which should be used to ungate the application clock, and this signal is synchronous to the receive clock. The receive clock can run at 2.5, 25 or 125MHz depending on the media speed, and can stop under the control of the link partner. This means that the time it takes to clear is dependent on the negotiated media speed, and thus can be 8, 40, or 400ns after reading the LPI control and status register. It has been observed with some aggressive link partners, this clock can stop while lpi_intr_o is still asserted, meaning that the signal remains asserted for an indefinite period that the local system has no direct control over. The LPI interrupts will still be signalled through the main interrupt path in any case, and this path is not dependent on the receive clock. This, since we do not gate the application clock, and the chances of adding clock gating in the future are slim due to the clocks being ill-defined, lpi_intr_o serves no useful purpose. Remove the code which requests the interrupt, and all associated code. Reported-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com> Tested-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com> # Renesas RZ/V2H board Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vnJbt-00000007YYN-28nm@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysx86/uprobes: Fix XOL allocation failure for 32-bit tasksOleg Nesterov1-0/+1
[ Upstream commit d55c571e4333fac71826e8db3b9753fadfbead6a ] This script #!/usr/bin/bash echo 0 > /proc/sys/kernel/randomize_va_space echo 'void main(void) {}' > TEST.c # -fcf-protection to ensure that the 1st endbr32 insn can't be emulated gcc -m32 -fcf-protection=branch TEST.c -o test bpftrace -e 'uprobe:./test:main {}' -c ./test "hangs", the probed ./test task enters an endless loop. The problem is that with randomize_va_space == 0 get_unmapped_area(TASK_SIZE - PAGE_SIZE) called by xol_add_vma() can not just return the "addr == TASK_SIZE - PAGE_SIZE" hint, this addr is used by the stack vma. arch_get_unmapped_area_topdown() doesn't take TIF_ADDR32 into account and in_32bit_syscall() is false, this leads to info.high_limit > TASK_SIZE. vm_unmapped_area() happily returns the high address > TASK_SIZE and then get_unmapped_area() returns -ENOMEM after the "if (addr > TASK_SIZE - len)" check. handle_swbp() doesn't report this failure (probably it should) and silently restarts the probed insn. Endless loop. I think that the right fix should change the x86 get_unmapped_area() paths to rely on TIF_ADDR32 rather than in_32bit_syscall(). Note also that if CONFIG_X86_X32_ABI=y, in_x32_syscall() falsely returns true in this case because ->orig_ax = -1. But we need a simple fix for -stable, so this patch just sets TS_COMPAT if the probed task is 32-bit to make in_ia32_syscall() true. Fixes: 1b028f784e8c ("x86/mm: Introduce mmap_compat_base() for 32-bit mmap()") Reported-by: Paulo Andrade <pandrade@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/all/aV5uldEvV7pb4RA8@redhat.com/ Cc: stable@vger.kernel.org Link: https://patch.msgid.link/aWO7Fdxn39piQnxu@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysnet: dsa: properly keep track of conduit referenceVladimir Oltean1-0/+1
commit 06e219f6a706c367c93051f408ac61417643d2f9 upstream. Problem description ------------------- DSA has a mumbo-jumbo of reference handling of the conduit net device and its kobject which, sadly, is just wrong and doesn't make sense. There are two distinct problems. 1. The OF path, which uses of_find_net_device_by_node(), never releases the elevated refcount on the conduit's kobject. Nominally, the OF and non-OF paths should result in objects having identical reference counts taken, and it is already suspicious that dsa_dev_to_net_device() has a put_device() call which is missing in dsa_port_parse_of(), but we can actually even verify that an issue exists. With CONFIG_DEBUG_KOBJECT_RELEASE=y, if we run this command "before" and "after" applying this patch: (unbind the conduit driver for net device eno2) echo 0000:00:00.2 > /sys/bus/pci/drivers/fsl_enetc/unbind we see these lines in the output diff which appear only with the patch applied: kobject: 'eno2' (ffff002009a3a6b8): kobject_release, parent 0000000000000000 (delayed 1000) kobject: '109' (ffff0020099d59a0): kobject_release, parent 0000000000000000 (delayed 1000) 2. After we find the conduit interface one way (OF) or another (non-OF), it can get unregistered at any time, and DSA remains with a long-lived, but in this case stale, cpu_dp->conduit pointer. Holding the net device's underlying kobject isn't actually of much help, it just prevents it from being freed (but we never need that kobject directly). What helps us to prevent the net device from being unregistered is the parallel netdev reference mechanism (dev_hold() and dev_put()). Actually we actually use that netdev tracker mechanism implicitly on user ports since commit 2f1e8ea726e9 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings"), via netdev_upper_dev_link(). But time still passes at DSA switch probe time between the initial of_find_net_device_by_node() code and the user port creation time, time during which the conduit could unregister itself and DSA wouldn't know about it. So we have to run of_find_net_device_by_node() under rtnl_lock() to prevent that from happening, and release the lock only with the netdev tracker having acquired the reference. Do we need to keep the reference until dsa_unregister_switch() / dsa_switch_shutdown()? 1: Maybe yes. A switch device will still be registered even if all user ports failed to probe, see commit 86f8b1c01a0a ("net: dsa: Do not make user port errors fatal"), and the cpu_dp->conduit pointers remain valid. I haven't audited all call paths to see whether they will actually use the conduit in lack of any user port, but if they do, it seems safer to not rely on user ports for that reference. 2. Definitely yes. We support changing the conduit which a user port is associated to, and we can get into a situation where we've moved all user ports away from a conduit, thus no longer hold any reference to it via the net device tracker. But we shouldn't let it go nonetheless - see the next change in relation to dsa_tree_find_first_conduit() and LAG conduits which disappear. We have to be prepared to return to the physical conduit, so the CPU port must explicitly keep another reference to it. This is also to say: the user ports and their CPU ports may not always keep a reference to the same conduit net device, and both are needed. As for the conduit's kobject for the /sys/class/net/ entry, we don't care about it, we can release it as soon as we hold the net device object itself. History and blame attribution ----------------------------- The code has been refactored so many times, it is very difficult to follow and properly attribute a blame, but I'll try to make a short history which I hope to be correct. We have two distinct probing paths: - one for OF, introduced in 2016 in commit 83c0afaec7b7 ("net: dsa: Add new binding implementation") - one for non-OF, introduced in 2017 in commit 71e0bbde0d88 ("net: dsa: Add support for platform data") These are both complete rewrites of the original probing paths (which used struct dsa_switch_driver and other weird stuff, instead of regular devices on their respective buses for register access, like MDIO, SPI, I2C etc): - one for OF, introduced in 2013 in commit 5e95329b701c ("dsa: add device tree bindings to register DSA switches") - one for non-OF, introduced in 2008 in commit 91da11f870f0 ("net: Distributed Switch Architecture protocol support") except for tiny bits and pieces like dsa_dev_to_net_device() which were seemingly carried over since the original commit, and used to this day. The point is that the original probing paths received a fix in 2015 in the form of commit 679fb46c5785 ("net: dsa: Add missing master netdev dev_put() calls"), but the fix never made it into the "new" (dsa2) probing paths that can still be traced to today, and the fixed probing path was later deleted in 2019 in commit 93e86b3bc842 ("net: dsa: Remove legacy probing support"). That is to say, the new probing paths were never quite correct in this area. The existence of the legacy probing support which was deleted in 2019 explains why dsa_dev_to_net_device() returns a conduit with elevated refcount (because it was supposed to be released during dsa_remove_dst()). After the removal of the legacy code, the only user of dsa_dev_to_net_device() calls dev_put(conduit) immediately after this function returns. This pattern makes no sense today, and can only be interpreted historically to understand why dev_hold() was there in the first place. Change details -------------- Today we have a better netdev tracking infrastructure which we should use. Logically netdev_hold() belongs in common code (dsa_port_parse_cpu(), where dp->conduit is assigned), but there is a tradeoff to be made with the rtnl_lock() section which would become a bit too long if we did that - dsa_port_parse_cpu() also calls request_module(). So we duplicate a bit of logic in order for the callers of dsa_port_parse_cpu() to be the ones responsible of holding the conduit reference and releasing it on error. This shortens the rtnl_lock() section significantly. In the dsa_switch_probe() error path, dsa_switch_release_ports() will be called in a number of situations, one being where dsa_port_parse_cpu() maybe didn't get the chance to run at all (a different port failed earlier, etc). So we have to test for the conduit being NULL prior to calling netdev_put(). There have still been so many transformations to the code since the blamed commits (rename master -> conduit, commit 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown")), that it only makes sense to fix the code using the best methods available today and see how it can be backported to stable later. I suspect the fix cannot even be backported to kernels which lack dsa_switch_shutdown(), and I suspect this is also maybe why the long-lived conduit reference didn't make it into the new DSA probing paths at the time (problems during shutdown). Because dsa_dev_to_net_device() has a single call site and has to be changed anyway, the logic was just absorbed into the non-OF dsa_port_parse(). Tested on the ocelot/felix switch and on dsa_loop, both on the NXP LS1028A with CONFIG_DEBUG_KOBJECT_RELEASE=y. Reported-by: Ma Ke <make24@iscas.ac.cn> Closes: https://lore.kernel.org/netdev/20251214131204.4684-1-make24@iscas.ac.cn/ Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation") Fixes: 71e0bbde0d88 ("net: dsa: Add support for platform data") Reviewed-by: Jonas Gorski <jonas.gorski@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20251215150236.3931670-1-vladimir.oltean@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> [ backport: "conduit" -> "master" in code, kept original commit message ] Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daystracing: Add recursion protection in kernel stack trace recordingSteven Rostedt1-0/+9
[ Upstream commit 5f1ef0dfcb5b7f4a91a9b0e0ba533efd9f7e2cdb ] A bug was reported about an infinite recursion caused by tracing the rcu events with the kernel stack trace trigger enabled. The stack trace code called back into RCU which then called the stack trace again. Expand the ftrace recursion protection to add a set of bits to protect events from recursion. Each bit represents the context that the event is in (normal, softirq, interrupt and NMI). Have the stack trace code use the interrupt context to protect against recursion. Note, the bug showed an issue in both the RCU code as well as the tracing stacktrace code. This only handles the tracing stack trace side of the bug. The RCU fix will be handled separately. Link: https://lore.kernel.org/all/20260102122807.7025fc87@gandalf.local.home/ Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Boqun Feng <boqun.feng@gmail.com> Link: https://patch.msgid.link/20260105203141.515cd49f@gandalf.local.home Reported-by: Yao Kai <yaokai34@huawei.com> Tested-by: Yao Kai <yaokai34@huawei.com> Fixes: 5f5fa7ea89dc ("rcu: Don't use negative nesting depth in __rcu_read_unlock()") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Leon Chen <leonchen.oss@139.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysrxrpc: Fix recvmsg() unconditional requeueDavid Howells1-0/+4
[ Upstream commit 2c28769a51deb6022d7fbd499987e237a01dd63a ] If rxrpc_recvmsg() fails because MSG_DONTWAIT was specified but the call at the front of the recvmsg queue already has its mutex locked, it requeues the call - whether or not the call is already queued. The call may be on the queue because MSG_PEEK was also passed and so the call was not dequeued or because the I/O thread requeued it. The unconditional requeue may then corrupt the recvmsg queue, leading to things like UAFs or refcount underruns. Fix this by only requeuing the call if it isn't already on the queue - and moving it to the front if it is already queued. If we don't queue it, we have to put the ref we obtained by dequeuing it. Also, MSG_PEEK doesn't dequeue the call so shouldn't call rxrpc_notify_socket() for the call if we didn't use up all the data on the queue, so fix that also. Fixes: 540b1c48c37a ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg") Reported-by: Faith <faith@zellic.io> Reported-by: Pumpkin Chang <pumpkin@devco.re> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Marc Dionne <marc.dionne@auristor.com> cc: Nir Ohfeld <niro@wiz.io> cc: Willy Tarreau <w@1wt.eu> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/95163.1768428203@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org> [Use spin_unlock instead of spin_unlock_irq to maintain context consistency.] Signed-off-by: Robert Garcia <rob_garcia@163.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysNFS: Fix a deadlock involving nfs_release_folio()Trond Myklebust1-0/+1
[ Upstream commit cce0be6eb4971456b703aaeafd571650d314bcca ] Wang Zhaolong reports a deadlock involving NFSv4.1 state recovery waiting on kthreadd, which is attempting to reclaim memory by calling nfs_release_folio(). The latter cannot make progress due to state recovery being needed. It seems that the only safe thing to do here is to kick off a writeback of the folio, without waiting for completion, or else kicking off an asynchronous commit. Reported-by: Wang Zhaolong <wangzhaolong@huaweicloud.com> Fixes: 96780ca55e3c ("NFS: fix up nfs_release_folio() to try to release the page") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> [ Minor conflict resolved. ] Signed-off-by: Li hongliang <1468888505@139.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysbtrfs: fix NULL dereference on root when tracing inode evictionMiquel Sabaté Solà1-2/+2
[ Upstream commit f157dd661339fc6f5f2b574fe2429c43bd309534 ] When evicting an inode the first thing we do is to setup tracing for it, which implies fetching the root's id. But in btrfs_evict_inode() the root might be NULL, as implied in the next check that we do in btrfs_evict_inode(). Hence, we either should set the ->root_objectid to 0 in case the root is NULL, or we move tracing setup after checking that the root is not NULL. Setting the rootid to 0 at least gives us the possibility to trace this call even in the case when the root is NULL, so that's the solution taken here. Fixes: 1abe9b8a138c ("Btrfs: add initial tracepoint support for btrfs") Reported-by: syzbot+d991fea1b4b23b1f6bf8@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d991fea1b4b23b1f6bf8 Signed-off-by: Miquel Sabaté Solà <mssola@mssola.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> [ Adjust context ] Signed-off-by: Bin Lan <lanbincn@139.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysnet/sched: act_gate: snapshot parameters with RCU on replacePaul Moses1-7/+26
[ Upstream commit 62413a9c3cb183afb9bb6e94dd68caf4e4145f4c ] The gate action can be replaced while the hrtimer callback or dump path is walking the schedule list. Convert the parameters to an RCU-protected snapshot and swap updates under tcf_lock, freeing the previous snapshot via call_rcu(). When REPLACE omits the entry list, preserve the existing schedule so the effective state is unchanged. Fixes: a51c328df310 ("net: qos: introduce a gate control flow action") Cc: stable@vger.kernel.org Signed-off-by: Paul Moses <p@1g4.org> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Victor Nogueira <victor@mojatatu.com> Link: https://patch.msgid.link/20260223150512.2251594-2-p@1g4.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> [ hrtimer_setup() => hrtimer_init() + keep is_tcf_gate() ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysirqchip/gic-v3-its: Limit number of per-device MSIs to the range the ITS ↵Marc Zyngier1-0/+1
supports commit ce9e40a9a5e5cff0b1b0d2fa582b3d71a8ce68e8 upstream. The ITS driver blindly assumes that EventIDs are in abundant supply, to the point where it never checks how many the hardware actually supports. It turns out that some pretty esoteric integrations make it so that only a few bits are available, all the way down to a single bit. Enforce the advertised limitation at the point of allocating the device structure, and hope that the endpoint driver can deal with such limitation. Fixes: 84a6a2e7fc18d ("irqchip: GICv3: ITS: device allocation and configuration") Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Zenghui Yu <zenghui.yu@linux.dev> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260206154816.3582887-1-maz@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysmmc: core: Avoid bitfield RMW for claim/retune flagsPenghe Geng1-4/+5
commit 901084c51a0a8fb42a3f37d2e9c62083c495f824 upstream. Move claimed and retune control flags out of the bitfield word to avoid unrelated RMW side effects in asynchronous contexts. The host->claimed bit shared a word with retune flags. Writes to claimed in __mmc_claim_host() or retune_now in mmc_mq_queue_rq() can overwrite other bits when concurrent updates happen in other contexts, triggering spurious WARN_ON(!host->claimed). Convert claimed, can_retune, retune_now and retune_paused to bool to remove shared-word coupling. Fixes: 6c0cedd1ef952 ("mmc: core: Introduce host claiming by context") Fixes: 1e8e55b67030c ("mmc: block: Add CQE support") Cc: stable@vger.kernel.org Suggested-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Penghe Geng <pgeng@nvidia.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysmm/tracing: rss_stat: ensure curr is false from kthread contextKalesh Singh1-1/+7
commit 079c24d5690262e83ee476e2a548e416f3237511 upstream. The rss_stat trace event allows userspace tools, like Perfetto [1], to inspect per-process RSS metric changes over time. The curr field was introduced to rss_stat in commit e4dcad204d3a ("rss_stat: add support to detect RSS updates of external mm"). Its intent is to indicate whether the RSS update is for the mm_struct of the current execution context; and is set to false when operating on a remote mm_struct (e.g., via kswapd or a direct reclaimer). However, an issue arises when a kernel thread temporarily adopts a user process's mm_struct. Kernel threads do not have their own mm_struct and normally have current->mm set to NULL. To operate on user memory, they can "borrow" a memory context using kthread_use_mm(), which sets current->mm to the user process's mm. This can be observed, for example, in the USB Function Filesystem (FFS) driver. The ffs_user_copy_worker() handles AIO completions and uses kthread_use_mm() to copy data to a user-space buffer. If a page fault occurs during this copy, the fault handler executes in the kthread's context. At this point, current is the kthread, but current->mm points to the user process's mm. Since the rss_stat event (from the page fault) is for that same mm, the condition current->mm == mm becomes true, causing curr to be incorrectly set to true when the trace event is emitted. This is misleading because it suggests the mm belongs to the kthread, confusing userspace tools that track per-process RSS changes and corrupting their mm_id-to-process association. Fix this by ensuring curr is always false when the trace event is emitted from a kthread context by checking for the PF_KTHREAD flag. Link: https://lkml.kernel.org/r/20260219233708.1971199-1-kaleshsingh@google.com Link: https://perfetto.dev/ [1] Fixes: e4dcad204d3a ("rss_stat: add support to detect RSS updates of external mm") Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: SeongJae Park <sj@kernel.org> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Cc: "David Hildenbrand (Arm)" <david@kernel.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: <stable@vger.kernel.org> [5.10+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysUSB: core: Limit the length of unkillable synchronous timeoutsAlan Stern1-0/+3
commit 1015c27a5e1a63efae2b18a9901494474b4d1dc3 upstream. The usb_control_msg(), usb_bulk_msg(), and usb_interrupt_msg() APIs in usbcore allow unlimited timeout durations. And since they use uninterruptible waits, this leaves open the possibility of hanging a task for an indefinitely long time, with no way to kill it short of unplugging the target device. To prevent this sort of problem, enforce a maximum limit on the length of these unkillable timeouts. The limit chosen here, somewhat arbitrarily, is 60 seconds. On many systems (although not all) this is short enough to avoid triggering the kernel's hung-task detector. In addition, clear up the ambiguity of negative timeout values by treating them the same as 0, i.e., using the maximum allowed timeout. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/linux-usb/3acfe838-6334-4f6d-be7c-4bb01704b33d@rowland.harvard.edu/ Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") CC: stable@vger.kernel.org Link: https://patch.msgid.link/15fc9773-a007-47b0-a703-df89a8cf83dd@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysUSB: usbcore: Introduce usb_bulk_msg_killable()Alan Stern1-2/+3
commit 416909962e7cdf29fd01ac523c953f37708df93d upstream. The synchronous message API in usbcore (usb_control_msg(), usb_bulk_msg(), and so on) uses uninterruptible waits. However, drivers may call these routines in the context of a user thread, which means it ought to be possible to at least kill them. For this reason, introduce a new usb_bulk_msg_killable() function which behaves the same as usb_bulk_msg() except for using wait_for_completion_killable_timeout() instead of wait_for_completion_timeout(). The same can be done later for usb_control_msg() later on, if it turns out to be needed. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Suggested-by: Oliver Neukum <oneukum@suse.com> Link: https://lore.kernel.org/linux-usb/3acfe838-6334-4f6d-be7c-4bb01704b33d@rowland.harvard.edu/ Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") CC: stable@vger.kernel.org Link: https://patch.msgid.link/248628b4-cc83-4e81-a620-3ce4e0376d41@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysnet/mlx5: IFC updates for disabled host PFDaniel Jurgens1-1/+3
[ Upstream commit cd1746cb6555a2238c4aae9f9d60b637a61bf177 ] The port 2 host PF can be disabled, this bit reflects that setting. Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1752064867-16874-3-git-send-email-tariqt@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Stable-dep-of: aed763abf0e9 ("net/mlx5: Fix deadlock between devlink lock and esw->wq") Signed-off-by: Sasha Levin <sashal@kernel.org>
9 daysnet/sched: Only allow act_ct to bind to clsact/ingress qdiscs and shared blocksVictor Nogueira1-0/+1
commit 11cb63b0d1a0685e0831ae3c77223e002ef18189 upstream. As Paolo said earlier [1]: "Since the blamed commit below, classify can return TC_ACT_CONSUMED while the current skb being held by the defragmentation engine. As reported by GangMin Kim, if such packet is that may cause a UaF when the defrag engine later on tries to tuch again such packet." act_ct was never meant to be used in the egress path, however some users are attaching it to egress today [2]. Attempting to reach a middle ground, we noticed that, while most qdiscs are not handling TC_ACT_CONSUMED, clsact/ingress qdiscs are. With that in mind, we address the issue by only allowing act_ct to bind to clsact/ingress qdiscs and shared blocks. That way it's still possible to attach act_ct to egress (albeit only with clsact). [1] https://lore.kernel.org/netdev/674b8cbfc385c6f37fb29a1de08d8fe5c2b0fbee.1771321118.git.pabeni@redhat.com/ [2] https://lore.kernel.org/netdev/cc6bfb4a-4a2b-42d8-b9ce-7ef6644fb22b@ovn.org/ Reported-by: GangMin Kim <km.kim1503@gmail.com> Fixes: 3f14b377d01d ("net/sched: act_ct: fix skb leak and crash on ooo frags") CC: stable@vger.kernel.org Signed-off-by: Victor Nogueira <victor@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260225134349.1287037-1-victor@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysxsk: introduce helper to determine rxq->frag_sizeLarysa Zaremba1-0/+10
[ Upstream commit 16394d80539937d348dd3b9ea32415c54e67a81b ] rxq->frag_size is basically a step between consecutive strictly aligned frames. In ZC mode, chunk size fits exactly, but if chunks are unaligned, there is no safe way to determine accessible space to grow tailroom. Report frag_size to be zero, if chunks are unaligned, chunk_size otherwise. Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://patch.msgid.link/20260305111253.2