summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
35 hoursfutex: Fix UaF between futex_key_to_node_opt() and vma_replace_policy()Hao-Yu Yang1-1/+1
[ Upstream commit 190a8c48ff623c3d67cb295b4536a660db2012aa ] During futex_key_to_node_opt() execution, vma->vm_policy is read under speculative mmap lock and RCU. Concurrently, mbind() may call vma_replace_policy() which frees the old mempolicy immediately via kmem_cache_free(). This creates a race where __futex_key_to_node() dereferences a freed mempolicy pointer, causing a use-after-free read of mpol->mode. [ 151.412631] BUG: KASAN: slab-use-after-free in __futex_key_to_node (kernel/futex/core.c:349) [ 151.414046] Read of size 2 at addr ffff888001c49634 by task e/87 [ 151.415969] Call Trace: [ 151.416732] __asan_load2 (mm/kasan/generic.c:271) [ 151.416777] __futex_key_to_node (kernel/futex/core.c:349) [ 151.416822] get_futex_key (kernel/futex/core.c:374 kernel/futex/core.c:386 kernel/futex/core.c:593) Fix by adding rcu to __mpol_put(). Fixes: c042c505210d ("futex: Implement FUTEX2_MPOL") Reported-by: Hao-Yu Yang <naup96721@gmail.com> Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Hao-Yu Yang <naup96721@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Link: https://patch.msgid.link/20260324174418.GB1850007@noisy.programming.kicks-ass.net Signed-off-by: Sasha Levin <sashal@kernel.org>
35 hoursfutex: Require sys_futex_requeue() to have identical flagsPeter Zijlstra1-0/+8
[ Upstream commit 19f94b39058681dec64a10ebeb6f23fe7fc3f77a ] Nicholas reported that his LLM found it was possible to create a UaF when sys_futex_requeue() is used with different flags. The initial motivation for allowing different flags was the variable sized futex, but since that hasn't been merged (yet), simply mandate the flags are identical, as is the case for the old style sys_futex() requeue operations. Fixes: 0f4b5f972216 ("futex: Add sys_futex_requeue()") Reported-by: Nicholas Carlini <npc@anthropic.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
35 hoursfutex: Clear stale exiting pointer in futex_lock_pi() retry pathDavidlohr Bueso1-1/+2
commit 210d36d892de5195e6766c45519dfb1e65f3eb83 upstream. Fuzzying/stressing futexes triggered: WARNING: kernel/futex/core.c:825 at wait_for_owner_exiting+0x7a/0x80, CPU#11: futex_lock_pi_s/524 When futex_lock_pi_atomic() sees the owner is exiting, it returns -EBUSY and stores a refcounted task pointer in 'exiting'. After wait_for_owner_exiting() consumes that reference, the local pointer is never reset to nil. Upon a retry, if futex_lock_pi_atomic() returns a different error, the bogus pointer is passed to wait_for_owner_exiting(). CPU0 CPU1 CPU2 futex_lock_pi(uaddr) // acquires the PI futex exit() futex_cleanup_begin() futex_state = EXITING; futex_lock_pi(uaddr) futex_lock_pi_atomic() attach_to_pi_owner() // observes EXITING *exiting = owner; // takes ref return -EBUSY wait_for_owner_exiting(-EBUSY, owner) put_task_struct(); // drops ref // exiting still points to owner goto retry; futex_lock_pi_atomic() lock_pi_update_atomic() cmpxchg(uaddr) *uaddr ^= WAITERS // whatever // value changed return -EAGAIN; wait_for_owner_exiting(-EAGAIN, exiting) // stale WARN_ON_ONCE(exiting) Fix this by resetting upon retry, essentially aligning it with requeue_pi. Fixes: 3ef240eaff36 ("futex: Prevent exit livelock") Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260326001759.4129680-1-dave@stgolabs.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
35 hoursalarmtimer: Fix argument order in alarm_timer_forward()Zhan Xusheng1-1/+1
commit 5d16467ae56343b9205caedf85e3a131e0914ad8 upstream. alarm_timer_forward() passes arguments to alarm_forward() in the wrong order: alarm_forward(alarm, timr->it_interval, now); However, alarm_forward() is defined as: u64 alarm_forward(struct alarm *alarm, ktime_t now, ktime_t interval); and uses the second argument as the current time: delta = ktime_sub(now, alarm->node.expires); Passing the interval as "now" results in incorrect delta computation, which can lead to missed expirations or incorrect overrun accounting. This issue has been present since the introduction of alarm_timer_forward(). Fix this by swapping the arguments. Fixes: e7561f1633ac ("alarmtimer: Implement forward callback") Signed-off-by: Zhan Xusheng <zhanxusheng@xiaomi.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260323061130.29991-1-zhanxusheng@xiaomi.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
35 hourstracing: Fix potential deadlock in cpu hotplug with osnoiseLuo Haiyang1-5/+5
commit 1f9885732248d22f788e4992c739a98c88ab8a55 upstream. The following sequence may leads deadlock in cpu hotplug: task1 task2 task3 ----- ----- ----- mutex_lock(&interface_lock) [CPU GOING OFFLINE] cpus_write_lock(); osnoise_cpu_die(); kthread_stop(task3); wait_for_completion(); osnoise_sleep(); mutex_lock(&interface_lock); cpus_read_lock(); [DEAD LOCK] Fix by swap the order of cpus_read_lock() and mutex_lock(&interface_lock). Cc: stable@vger.kernel.org Cc: <mathieu.desnoyers@efficios.com> Cc: <zhang.run@zte.com.cn> Cc: <yang.tao172@zte.com.cn> Cc: <ran.xiaokai@zte.com.cn> Fixes: bce29ac9ce0bb ("trace: Add osnoise tracer") Link: https://patch.msgid.link/20260326141953414bVSj33dAYktqp9Oiyizq8@zte.com.cn Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Luo Haiyang <luo.haiyang@zte.com.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
35 hourssysctl: fix uninitialized variable in proc_do_large_bitmapMarc Buerg1-1/+1
[ Upstream commit f63a9df7e3f9f842945d292a19d9938924f066f9 ] proc_do_large_bitmap() does not initialize variable c, which is expected to be set to a trailing character by proc_get_long(). However, proc_get_long() only sets c when the input buffer contains a trailing character after the parsed value. If c is not initialized it may happen to contain a '-'. If this is the case proc_do_large_bitmap() expects to be able to parse a second part of the input buffer. If there is no second part an unjustified -EINVAL will be returned. Initialize c to 0 to prevent returning -EINVAL on valid input. Fixes: 9f977fb7ae9d ("sysctl: add proc_do_large_bitmap") Signed-off-by: Marc Buerg <buermarc@googlemail.com> Reviewed-by: Joel Granados <joel.granados@kernel.org> Signed-off-by: Joel Granados <joel.granados@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
35 hoursPM: sleep: Drop spurious WARN_ON() from pm_restore_gfp_mask()Youngjun Park1-1/+1
[ Upstream commit a8d51efb5929ae308895455a3e496b5eca2cd143 ] Commit 35e4a69b2003f ("PM: sleep: Allow pm_restrict_gfp_mask() stacking") introduced refcount-based GFP mask management that warns when pm_restore_gfp_mask() is called with saved_gfp_count == 0. Some hibernation paths call pm_restore_gfp_mask() defensively where the GFP mask may or may not be restricted depending on the execution path. For example, the uswsusp interface invokes it in SNAPSHOT_CREATE_IMAGE, SNAPSHOT_UNFREEZE, and snapshot_release(). Before the stacking change this was a silent no-op; it now triggers a spurious WARNING. Remove the WARN_ON() wrapper from the !saved_gfp_count check while retaining the check itself, so that defensive calls remain harmless without producing false warnings. Fixes: 35e4a69b2003f ("PM: sleep: Allow pm_restrict_gfp_mask() stacking") Signed-off-by: Youngjun Park <youngjun.park@lge.com> [ rjw: Subject tweak ] Link: https://patch.msgid.link/20260322120528.750178-1-youngjun.park@lge.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
35 hoursPM: hibernate: Drain trailing zero pages on userspace restoreAlberto Garcia1-0/+11
[ Upstream commit 734eba62cd32cb9ceffa09e57cdc03d761528525 ] Commit 005e8dddd497 ("PM: hibernate: don't store zero pages in the image file") added an optimization to skip zero-filled pages in the hibernation image. On restore, zero pages are handled internally by snapshot_write_next() in a loop that processes them without returning to the caller. With the userspace restore interface, writing the last non-zero page to /dev/snapshot is followed by the SNAPSHOT_ATOMIC_RESTORE ioctl. At this point there are no more calls to snapshot_write_next() so any trailing zero pages are not processed, snapshot_image_loaded() fails because handle->cur is smaller than expected, the ioctl returns -EPERM and the image is not restored. The in-kernel restore path is not affected by this because the loop in load_image() in swap.c calls snapshot_write_next() until it returns 0. It is this final call that drains any trailing zero pages. Fixed by calling snapshot_write_next() in snapshot_write_finalize(), giving the kernel the chance to drain any trailing zero pages. Fixes: 005e8dddd497 ("PM: hibernate: don't store zero pages in the image file") Signed-off-by: Alberto Garcia <berto@igalia.com> Acked-by: Brian Geffon <bgeffon@google.com> Link: https://patch.msgid.link/ef5a7c5e3e3dbd17dcb20efaa0c53a47a23498bb.1773075892.git.berto@igalia.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
35 hoursdma: swiotlb: add KMSAN annotations to swiotlb_bounce()Shigeru Yoshida1-2/+19
[ Upstream commit 6f770b73d0311a5b099277653199bb6421c4fed2 ] When a device performs DMA to a bounce buffer, KMSAN is unaware of the write and does not mark the data as initialized. When swiotlb_bounce() later copies the bounce buffer back to the original buffer, memcpy propagates the uninitialized shadow to the original buffer, causing false positive uninit-value reports. Fix this by calling kmsan_unpoison_memory() on the bounce buffer before copying it back in the DMA_FROM_DEVICE path, so that memcpy naturally propagates initialized shadow to the destination. Suggested-by: Alexander Potapenko <glider@google.com> Link: https://lore.kernel.org/CAG_fn=WUGta-paG1BgsGRoAR+fmuCgh3xo=R3XdzOt_-DqSdHw@mail.gmail.com/ Fixes: 7ade4f10779c ("dma: kmsan: unpoison DMA mappings") Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20260315082750.2375581-1-syoshida@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
35 hourssched_ext: Use WRITE_ONCE() for the write side of dsq->seq updatezhidao su1-1/+1
[ Upstream commit 7a8464555d2e5f038758bb19e72ab4710b79e9cd ] bpf_iter_scx_dsq_new() reads dsq->seq via READ_ONCE() without holding any lock, making dsq->seq a lock-free concurrently accessed variable. However, dispatch_enqueue(), the sole writer of dsq->seq, uses a plain increment without the matching WRITE_ONCE() on the write side: dsq->seq++; ^^^^^^^^^^^ plain write -- KCSAN data race The KCSAN documentation requires that if one accessor uses READ_ONCE() or WRITE_ONCE() on a variable to annotate lock-free access, all other accesses must also use the appropriate accessor. A plain write leaves the pair incomplete and will trigger KCSAN warnings. Fix by using WRITE_ONCE() for the write side of the update: WRITE_ONCE(dsq->seq, dsq->seq + 1); This is consistent with bpf_iter_scx_dsq_new() and makes the concurrent access annotation complete and KCSAN-clean. Signed-off-by: zhidao su <suzhidao@xiaomi.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
35 hoursbpf: Fix u32/s32 bounds when ranges cross min/max boundaryEduard Zingerman1-0/+24
[ Upstream commit fbc7aef517d8765e4c425d2792409bb9bf2e1f13 ] Same as in __reg64_deduce_bounds(), refine s32/u32 ranges in __reg32_deduce_bounds() in the following situations: - s32 range crosses U32_MAX/0 boundary, positive part of the s32 range overlaps with u32 range: 0 U32_MAX | [xxxxxxxxxxxxxx u32 range xxxxxxxxxxxxxx] | |----------------------------|----------------------------| |xxxxx s32 range xxxxxxxxx] [xxxxxxx| 0 S32_MAX S32_MIN -1 - s32 range crosses U32_MAX/0 boundary, negative part of the s32 range overlaps with u32 range: 0 U32_MAX | [xxxxxxxxxxxxxx u32 range xxxxxxxxxxxxxx] | |----------------------------|----------------------------| |xxxxxxxxx] [xxxxxxxxxxxx s32 range | 0 S32_MAX S32_MIN -1 - No refinement if ranges overlap in two intervals. This helps for e.g. consider the following program: call %[bpf_get_prandom_u32]; w0 &= 0xffffffff; if w0 < 0x3 goto 1f; // on fall-through u32 range [3..U32_MAX] if w0 s> 0x1 goto 1f; // on fall-through s32 range [S32_MIN..1] if w0 s< 0x0 goto 1f; // range can be narrowed to [S32_MIN..-1] r10 = 0; 1: ...; The reg_bounds.c selftest is updated to incorporate identical logic, refinement based on non-overflowing range halves: ((x ∩ [0, smax]) ∩ (y ∩ [0, smax])) ∪ ((x ∩ [smin,-1]) ∩ (y ∩ [smin,-1])) Reported-by: Andrea Righi <arighi@nvidia.com> Reported-by: Emil Tsalapatis <emil@etsalapatis.com> Closes: https://lore.kernel.org/bpf/aakqucg4vcujVwif@gpd4/T/ Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260306-bpf-32-bit-range-overflow-v3-1-f7f67e060a6b@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
35 hoursmodule: Fix kernel panic when a symbol st_shndx is out of boundsIhor Solodrai1-0/+7
[ Upstream commit f9d69d5e7bde2295eb7488a56f094ac8f5383b92 ] The module loader doesn't check for bounds of the ELF section index in simplify_symbols(): for (i = 1; i < symsec->sh_size / sizeof(Elf_Sym); i++) { const char *name = info->strtab + sym[i].st_name; switch (sym[i].st_shndx) { case SHN_COMMON: [...] default: /* Divert to percpu allocation if a percpu var. */ if (sym[i].st_shndx == info->index.pcpu) secbase = (unsigned long)mod_percpu(mod); else /** HERE --> **/ secbase = info->sechdrs[sym[i].st_shndx].sh_addr; sym[i].st_value += secbase; break; } } A symbol with an out-of-bounds st_shndx value, for example 0xffff (known as SHN_XINDEX or SHN_HIRESERVE), may cause a kernel panic: BUG: unable to handle page fault for address: ... RIP: 0010:simplify_symbols+0x2b2/0x480 ... Kernel panic - not syncing: Fatal exception This can happen when module ELF is legitimately using SHN_XINDEX or when it is corrupted. Add a bounds check in simplify_symbols() to validate that st_shndx is within the valid range before using it. This issue was discovered due to a bug in llvm-objcopy, see relevant discussion for details [1]. [1] https://lore.kernel.org/linux-modules/20251224005752.201911-1-ihor.solodrai@linux.dev/ Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Reviewed-by: Petr Pavlu <petr.pavlu@suse.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
35 hoursbpf: Fix unsound scalar forking in maybe_fork_scalars() for BPF_ORDaniel Wade1-1/+1
[ Upstream commit c845894ebd6fb43226b3118d6b017942550910c5 ] maybe_fork_scalars() is called for both BPF_AND and BPF_OR when the source operand is a constant. When dst has signed range [-1, 0], it forks the verifier state: the pushed path gets dst = 0, the current path gets dst = -1. For BPF_AND this is correct: 0 & K == 0. For BPF_OR this is wrong: 0 | K == K, not 0. The pushed path therefore tracks dst as 0 when the runtime value is K, producing an exploitable verifier/runtime divergence that allows out-of-bounds map access. Fix this by passing env->insn_idx (instead of env->insn_idx + 1) to push_stack(), so the pushed path re-executes the ALU instruction with dst = 0 and naturally computes the correct result for any opcode. Fixes: bffacdb80b93 ("bpf: Recognize special arithmetic shift in the verifier") Signed-off-by: Daniel Wade <danjwade95@gmail.com> Reviewed-by: Amery Hung <ameryhung@gmail.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260314021521.128361-2-danjwade95@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
35 hoursbpf: Fix undefined behavior in interpreter sdiv/smod for INT_MINJenny Guanni Qu1-8/+14
[ Upstream commit c77b30bd1dcb61f66c640ff7d2757816210c7cb0 ] The BPF interpreter's signed 32-bit division and modulo handlers use the kernel abs() macro on s32 operands. The abs() macro documentation (include/linux/math.h) explicitly states the result is undefined when the input is the type minimum. When DST contains S32_MIN (0x80000000), abs((s32)DST) triggers undefined behavior and returns S32_MIN unchanged on arm64/x86. This value is then sign-extended to u64 as 0xFFFFFFFF80000000, causing do_div() to compute the wrong result. The verifier's abstract interpretation (scalar32_min_max_sdiv) computes the mathematically correct result for range tracking, creating a verifier/interpreter mismatch that can be exploited for out-of-bounds map value access. Introduce abs_s32() which handles S32_MIN correctly by casting to u32 before negating, avoiding signed overflow entirely. Replace all 8 abs((s32)...) call sites in the interpreter's sdiv32/smod32 handlers. s32 is the only affected case -- the s64 division/modulo handlers do not use abs(). Fixes: ec0e2da95f72 ("bpf: Support new signed div/mod instructions.") Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Jenny Guanni Qu <qguanni@gmail.com> Link: https://lore.kernel.org/r/20260311011116.2108005-2-qguanni@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
35 hoursbpf: Fix exception exit lock checking for subprogsIhor Solodrai1-1/+2
[ Upstream commit 6c2128505f61b504c79a20b89596feba61388112 ] process_bpf_exit_full() passes check_lock = !curframe to check_resource_leak(), which is false in cases when bpf_throw() is called from a static subprog. This makes check_resource_leak() to skip validation of active_rcu_locks, active_preempt_locks, and active_irq_id on exception exits from subprogs. At runtime bpf_throw() unwinds the stack via ORC without releasing any user-acquired locks, which may cause various issues as the result. Fix by setting check_lock = true for exception exits regardless of curframe, since exceptions bypass all intermediate frame cleanup. Update the error message prefix to "bpf_throw" for exception exits to distinguish them from normal BPF_EXIT. Fix reject_subprog_with_rcu_read_lock test which was previously passing for the wrong reason. Test program returned directly from the subprog call without closing the RCU section, so the error was triggered by the unclosed RCU lock on normal exit, not by bpf_throw. Update __msg annotations for affected tests to match the new "bpf_throw" error prefix. The spin_lock case is not affected because they are already checked [1] at the call site in do_check_insn() before bpf_throw can run. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/verifier.c?h=v7.0-rc4#n21098 Assisted-by: Claude:claude-opus-4-6 Fixes: f18b03fabaa9 ("bpf: Implement BPF exceptions") Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20260320000809.643798-1-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
35 hoursbpf: Release module BTF IDR before module unloadKumar Kartikeya Dwivedi1-4/+20
[ Upstream commit 146bd2a87a65aa407bb17fac70d8d583d19aba06 ] Gregory reported in [0] that the global_map_resize test when run in repeatedly ends up failing during program load. This stems from the fact that BTF reference has not dropped to zero after the previous run's module is unloaded, and the older module's BTF is still discoverable and visible. Later, in libbpf, load_module_btfs() will find the ID for this stale BTF, open its fd, and then it will be used during program load where later steps taking module reference using btf_try_get_module() fail since the underlying module for the BTF is gone. Logically, once a module is unloaded, it's associated BTF artifacts should become hidden. The BTF object inside the kernel may still remain alive as long its reference counts are alive, but it should no longer be discoverable. To fix this, let us call btf_free_id() from the MODULE_STATE_GOING case for the module unload to free the BTF associated IDR entry, and disable its discovery once module unload returns to user space. If a race happens during unload, the outcome is non-deterministic anyway. However, user space should be able to rely on the guarantee that once it has synchronously established a successful module unload, no more stale artifacts associated with this module can be obtained subsequently. Note that we must be careful to not invoke btf_free_id() in btf_put() when btf_is_module() is true now. There could be a window where the module unload drops a non-terminal reference, frees the IDR, but the same ID gets reused and the second unconditional btf_free_id() ends up releasing an unrelated entry. To avoid a special case for btf_is_module() case, set btf->id to zero to make btf_free_id() idempotent, such that we can unconditionally invoke it from btf_put(), and also from the MODULE_STATE_GOING case. Since zero is an invalid IDR, the idr_remove() should be a noop. Note that we can be sure that by the time we reach final btf_put() for btf_is_module() case, the btf_free_id() is already done, since the module itself holds the BTF reference, and it will call this function for the BTF before dropping its own reference. [0]: https://lore.kernel.org/bpf/cover.1773170190.git.grbell@redhat.com Fixes: 36e68442d1af ("bpf: Load and verify kernel module BTFs") Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Suggested-by: Martin KaFai Lau <martin.lau@kernel.org> Reported-by: Gregory Bell <grbell@redhat.com> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20260312205307.1346991-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
35 hoursperf: Make sure to use pmu_ctx->pmu for groupsPeter Zijlstra1-11/+8
[ Upstream commit 4b9ce671960627b2505b3f64742544ae9801df97 ] Oliver reported that x86_pmu_del() ended up doing an out-of-bound memory access when group_sched_in() fails and needs to roll back. This *should* be handled by the transaction callbacks, but he found that when the group leader is a software event, the transaction handlers of the wrong PMU are used. Despite the move_group case in perf_event_open() and group_sched_in() using pmu_ctx->pmu. Turns out, inherit uses event->pmu to clone the events, effectively undoing the move_group case for all inherited contexts. Fix this by also making inherit use pmu_ctx->pmu, ensuring all inherited counters end up in the same pmu context. Similarly, __perf_event_read() should use equally use pmu_ctx->pmu for the group case. Fixes: bd2756811766 ("perf: Rewrite core context handling") Reported-by: Oliver Rosenberg <olrose55@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://patch.msgid.link/20260309133713.GB606826@noisy.programming.kicks-ass.net Signed-off-by: Sasha Levin <sashal@kernel.org>
35 hoursbpf: Fix constant blinding for PROBE_MEM32 storesSachin Kumar1-0/+21
[ Upstream commit 2321a9596d2260310267622e0ad8fbfa6f95378f ] BPF_ST | BPF_PROBE_MEM32 immediate stores are not handled by bpf_jit_blind_insn(), allowing user-controlled 32-bit immediates to survive unblinded into JIT-compiled native code when bpf_jit_harden >= 1. The root cause is that convert_ctx_accesses() rewrites BPF_ST|BPF_MEM to BPF_ST|BPF_PROBE_MEM32 for arena pointer stores during verification, before bpf_jit_blind_constants() runs during JIT compilation. The blinding switch only matches BPF_ST|BPF_MEM (mode 0x60), not BPF_ST|BPF_PROBE_MEM32 (mode 0xa0). The instruction falls through unblinded. Add BPF_ST|BPF_PROBE_MEM32 cases to bpf_jit_blind_insn() alongside the existing BPF_ST|BPF_MEM cases. The blinding transformation is identical: load the blinded immediate into BPF_REG_AX via mov+xor, then convert the immediate store to a register store (BPF_STX). The rewritten STX instruction must preserve the BPF_PROBE_MEM32 mode so the architecture JIT emits the correct arena addressing (R12-based on x86-64). Cannot use the BPF_STX_MEM() macro here because it hardcodes BPF_MEM mode; construct the instruction directly instead. Fixes: 6082b6c328b5 ("bpf: Recognize addr_space_cast instruction in the verifier.") Reviewed-by: Puranjay Mohan <puranjay@kernel.org> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Sachin Kumar <xcyfun@protonmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/Y6IT5VvNRchPBLI5D7JZHBzZrU9rb0ycRJPJzJSXGj7kJlX8RJwZFSM2YZjcDxoQKABkxt1T8Os2gi23PYyFuQe6KkZGWVyfz8K5afdy9ak=@protonmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
35 hoursbpf: Reset register ID for BPF_END value trackingYazhou Tang1-0/+7
[ Upstream commit a3125bc01884431d30d731461634c8295b6f0529 ] When a register undergoes a BPF_END (byte swap) operation, its scalar value is mutated in-place. If this register previously shared a scalar ID with another register (e.g., after an `r1 = r0` assignment), this tie must be broken. Currently, the verifier misses resetting `dst_reg->id` to 0 for BPF_END. Consequently, if a conditional jump checks the swapped register, the verifier incorrectly propagates the learned bounds to the linked register, leading to false confidence in the linked register's value and potentially allowing out-of-bounds memory accesses. Fix this by explicitly resetting `dst_reg->id` to 0 in the BPF_END case to break the scalar tie, similar to how BPF_NEG handles it via `__mark_reg_known`. Fixes: 9d2119984224 ("bpf: Add bitwise tracking for BPF_END") Closes: https://lore.kernel.org/bpf/AMBPR06MB108683CFEB1CB8D9E02FC95ECF17EA@AMBPR06MB10868.eurprd06.prod.outlook.com/ Link: https://lore.kernel.org/bpf/4be25f7442a52244d0dd1abb47bc6750e57984c9.camel@gmail.com/ Reported-by: Guillaume Laporte <glapt.pro@outlook.com> Co-developed-by: Tianci Cao <ziye@zju.edu.cn> Signed-off-by: Tianci Cao <ziye@zju.edu.cn> Co-developed-by: Shenghao Yuan <shenghaoyuan0928@163.com> Signed-off-by: Shenghao Yuan <shenghaoyuan0928@163.com> Signed-off-by: Yazhou Tang <tangyazhou518@outlook.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260304083228.142016-2-tangyazhou@zju.edu.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
9 daystracing: Fix trace_marker copy link list updatesSteven Rostedt1-9/+10
[ Upstream commit 07183aac4a6828e474f00b37c9d795d0d99e18a7 ] When the "copy_trace_marker" option is enabled for an instance, anything written into /sys/kernel/tracing/trace_marker is also copied into that instances buffer. When the option is set, that instance's trace_array descriptor is added to the marker_copies link list. This list is protected by RCU, as all iterations uses an RCU protected list traversal. When the instance is deleted, all the flags that were enabled are cleared. This also clears the copy_trace_marker flag and removes the trace_array descriptor from the list. The issue is after the flags are called, a direct call to update_marker_trace() is performed to clear the flag. This function returns true if the state of the flag changed and false otherwise. If it returns true here, synchronize_rcu() is called to make sure all readers see that its removed from the list. But since the flag was already cleared, the state does not change and the synchronization is never called, leaving a possible UAF bug. Move the clearing of all flags below the updating of the copy_trace_marker option which then makes sure the synchronization is performed. Also use the flag for checking the state in update_marker_trace() instead of looking at if the list is empty. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260318185512.1b6c7db4@gandalf.local.home Fixes: 7b382efd5e8a ("tracing: Allow the top level trace_marker to write into another instances") Reported-by: Sasha Levin <sashal@kernel.org> Closes: https://lore.kernel.org/all/20260225133122.237275-1-sashal@kernel.org/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daystracing: Fix failure to read user space from system call trace eventsSteven Rostedt1-0/+17
commit edca33a56297d5741ccf867669debec116681987 upstream. The system call trace events call trace_user_fault_read() to read the user space part of some system calls. This is done by grabbing a per-cpu buffer, disabling migration, enabling preemption, calling copy_from_user(), disabling preemption, enabling migration and checking if the task was preempted while preemption was enabled. If it was, the buffer is considered corrupted and it tries again. There's a safety mechanism that will fail out of this loop if it fails 100 times (with a warning). That warning message was triggered in some pi_futex stress tests. Enabling the sched_switch trace event and traceoff_on_warning, showed the problem: pi_mutex_hammer-1375 [006] d..21 138.981648: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981651: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981656: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981659: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981664: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981667: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981671: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981675: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981679: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981682: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981687: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981690: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981695: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981698: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981703: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981706: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981711: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981714: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981719: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981722: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981727: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981730: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981735: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981738: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 What happened was the task 1375 was flagged to be migrated. When preemption was enabled, the migration thread woke up to migrate that task, but failed because migration for that task was disabled. This caused the loop to fail to exit because the task scheduled out while trying to read user space. Every time the task enabled preemption the migration thread would schedule in, try to migrate the task, fail and let the task continue. But because the loop would only enable preemption with migration disabled, it would always fail because each time it enabled preemption to read user space, the migration thread would try to migrate it. To solve this, when the loop fails to read user space without being scheduled out, enabled and disable preemption with migration enabled. This will allow the migration task to successfully migrate the task and the next loop should succeed to read user space without being scheduled out. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260316130734.1858a998@gandalf.local.home Fixes: 64cf7d058a005 ("tracing: Have trace_marker use per-cpu data to read user space") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysring-buffer: Fix to update per-subbuf entries of persistent ring bufferMasami Hiramatsu (Google)1-1/+1
commit f35dbac6942171dc4ce9398d1d216a59224590a9 upstream. Since the validation loop in rb_meta_validate_events() updates the same cpu_buffer->head_page->entries, the other subbuf entries are not updated. Fix to use head_page to update the entries field, since it is the cursor in this loop. Cc: stable@vger.kernel.org Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Ian Rogers <irogers@google.com> Fixes: 5f3b6e839f3c ("ring-buffer: Validate boot range memory events") Link: https://patch.msgid.link/177391153882.193994.17158784065013676533.stgit@mhiramat.tok.corp.google.com Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 dayssched: idle: Consolidate the handling of two special casesRafael J. Wysocki1-9/+21
[ Upstream commit f4c31b07b136839e0fb3026f8a5b6543e3b14d2f ] There are two special cases in the idle loop that are handled inconsistently even though they are analogous. The first one is when a cpuidle driver is absent and the default CPU idle time power management implemented by the architecture code is used. In that case, the scheduler tick is stopped every time before invoking default_idle_call(). The second one is when a cpuidle driver is present, but there is only one idle state in its table. In that case, the scheduler tick is never stopped at all. Since each of these approaches has its drawbacks, reconcile them with the help of one simple heuristic. Namely, stop the tick if the CPU has been woken up by it in the previous iteration of the idle loop, or let it tick otherwise. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Qais Yousef <qyousef@layalina.io> Reviewed-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Fixes: ed98c3491998 ("sched: idle: Do not stop the tick before cpuidle_idle_call()") [ rjw: Added Fixes tag, changelog edits ] Link: https://patch.msgid.link/4741364.LvFx2qVVIh@rafael.j.wysocki Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
9 dayssched_ext: Disable preemption between scx_claim_exit() and kicking helper workTejun Heo1-0/+12
[ Upstream commit 83236b2e43dba00bee5b82eb5758816b1a674f6a ] scx_claim_exit() atomically sets exit_kind, which prevents scx_error() from triggering further error handling. After claiming exit, the caller must kick the helper kthread work which initiates bypass mode and teardown. If the calling task gets preempted between claiming exit and kicking the helper work, and the BPF scheduler fails to schedule it back (since error handling is now disabled), the helper work is never queued, bypass mode never activates, tasks stop being dispatched, and the system wedges. Disable preemption across scx_claim_exit() and the subsequent work kicking in all callers - scx_disable() and scx_vexit(). Add lockdep_assert_preemption_disabled() to scx_claim_exit() to enforce the requirement. Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class") Cc: stable@vger.kernel.org # v6.12+ Reviewed-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 dayssched_ext: Simplify breather mechanism with scx_aborting flagTejun Heo1-29/+25
[ Upstream commit a69040ed57f50156e5452474d25c79b9e62075d0 ] The breather mechanism was introduced in 62dcbab8b0ef ("sched_ext: Avoid live-locking bypass mode switching") and e32c260195e6 ("sched_ext: Enable the ops breather and eject BPF scheduler on softlockup") to prevent live-locks by injecting delays when CPUs are trapped in dispatch paths. Currently, it uses scx_breather_depth (atomic_t) and scx_in_softlockup (unsigned long) with separate increment/decrement and cleanup operations. The breather is only activated when aborting, so tie it directly to the exit mechanism. Replace both variables with scx_aborting flag set when exit is claimed and cleared after bypass is enabled. Introduce scx_claim_exit() to consolidate exit_kind claiming and breather enablement. This eliminates scx_clear_softlockup() and simplifies scx_softlockup() and scx_bypass(). The breather mechanism will be replaced by a different abort mechanism in a future patch. This simplification prepares for that change. Reviewed-by: Dan Schatzberg <schatzberg.dan@gmail.com> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Acked-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org> Stable-dep-of: 83236b2e43db ("sched_ext: Disable preemption between scx_claim_exit() and kicking helper work") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 dayssched_ext: Fix starvation of scx_enable() under fair-class saturationTejun Heo1-10/+56
[ Upstream commit b06ccbabe2506fd70b9167a644978b049150224a ] During scx_enable(), the READY -> ENABLED task switching loop changes the calling thread's sched_class from fair to ext. Since fair has higher priority than ext, saturating fair-class workloads can indefinitely starve the enable thread, hanging the system. This was introduced when the enable path switched from preempt_disable() to scx_bypass() which doesn't protect against fair-class starvation. Note that the original preempt_disable() protection wasn't complete either - in partial switch modes, the calling thread could still be starved after preempt_enable() as it may have been switched to ext class. Fix it by offloading the enable body to a dedicated system-wide RT (SCHED_FIFO) kthread which cannot be starved by either fair or ext class tasks. scx_enable() lazily creates the kthread on first use and passes the ops pointer through a struct scx_enable_cmd containing the kthread_work, then synchronously waits for completion. The workfn runs on a different kthread from sch->helper (which runs disable_work), so it can safely flush disable_work on the error path without deadlock. Fixes: 8c2090c504e9 ("sched_ext: Initialize in bypass mode") Cc: stable@vger.kernel.org # v6.12+ Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysnsfs: tighten permission checks for ns iteration ioctlsChristian Brauner1-0/+6
[ Upstream commit e6b899f08066e744f89df16ceb782e06868bd148 ] Even privileged services should not necessarily be able to see other privileged service's namespaces so they can't leak information to each other. Use may_see_all_namespaces() helper that centralizes this policy until the nstree adapts. Link: https://patch.msgid.link/20260226-work-visibility-fixes-v1-1-d2c2853313bd@kernel.org Fixes: a1d220d9dafa ("nsfs: iterate through mount namespaces") Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@kernel.org # v6.12+ Signed-off-by: Christian Brauner <brauner@kernel.org> [ context ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysfgraph: Fix thresh_return nosleeptime double-adjustShengming Hu1-4/+11
[ Upstream commit b96d0c59cdbb2a22b2545f6f3d5c6276b05761dd ] trace_graph_thresh_return() called handle_nosleeptime() and then delegated to trace_graph_return(), which calls handle_nosleeptime() again. When sleep-time accounting is disabled this double-adjusts calltime and can produce bogus durations (including underflow). Fix this by computing rettime once, applying handle_nosleeptime() only once, using the adjusted calltime for threshold comparison, and writing the return event directly via __trace_graph_return() when the threshold is met. Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260221113314048jE4VRwIyZEALiYByGK0My@zte.com.cn Fixes: 3c9880f3ab52b ("ftrace: Use a running sleeptime instead of saving on shadow stack") Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 dayscrash_dump: don't log dm-crypt key bytes in read_key_from_user_keyingThorsten Blum1-2/+2
commit 36f46b0e36892eba08978eef7502ff3c94ddba77 upstream. When debug logging is enabled, read_key_from_user_keying() logs the first 8 bytes of the key payload and partially exposes the dm-crypt key. Stop logging any key bytes. Link: https://lkml.kernel.org/r/20260227230008.858641-2-thorsten.blum@linux.dev Fixes: 479e58549b0f ("crash_dump: store dm crypt keys in kdump reserved memory") Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Cc: Baoquan He <bhe@redhat.com> Cc: Coiby Xu <coxu@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-19bpf: drop kthread_exit from noreturn_denyChristian Loehle1-1/+0
commit 7fe44c4388146bdbb3c5932d81a26d9fa0fd3ec9 upstream. kthread_exit became a macro to do_exit in commit 28aaa9c39945 ("kthread: consolidate kthread exit paths to prevent use-after-free"), so there is no kthread_exit function BTF ID to resolve. Remove it from noreturn_deny to avoid resolve_btfids unresolved symbol warnings. Signed-off-by: Christian Loehle <christian.loehle@arm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-19kprobes: Remove unneeded warnings from __arm_kprobe_ftrace()Masami Hiramatsu (Google)1-2/+2
commit 5ef268cb7a0aac55521fd9881f1939fa94a8988e upstream. Remove unneeded warnings for handled errors from __arm_kprobe_ftrace() because all caller handled the error correctly. Link: https://lore.kernel.org/all/177261531182.1312989.8737778408503961141.stgit@mhiramat.tok.corp.google.com/ Reported-by: Zw Tang <shicenci@gmail.com> Closes: https://lore.kernel.org/all/CAPHJ_V+J6YDb_wX2nhXU6kh466Dt_nyDSas-1i_Y8s7tqY-Mzw@mail.gmail.com/ Fixes: 9c89bb8e3272 ("kprobes: treewide: Cleanup the error messages for kprobes") Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-19sched_ext: Fix enqueue_task_scx() truncation of upper enqueue flagsTejun Heo1-3/+2
commit 57ccf5ccdc56954f2a91a7f66684fd31c566bde5 upstream. enqueue_task_scx() takes int enq_flags from the sched_class interface. SCX enqueue flags starting at bit 32 (SCX_ENQ_PREEMPT and above) are silently truncated when passed through activate_task(). extra_enq_flags was added as a workaround - storing high bits in rq->scx.extra_enq_flags and O