summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2025-10-19Linux 6.6.113v6.6.113Greg Kroah-Hartman1-1/+1
Link: https://lore.kernel.org/r/20251017145134.710337454@linuxfoundation.org Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Hardik Garg <hargar@linux.microsoft.com> Tested-by: Shuah Khan <skhan@linuxfoundation.org> Tested-by: Peter Schneider <pschneider1968@googlemail.com> Tested-by: Florian Fainelli <florian.fainelli@broadcom.com> Tested-by: Brett A C Sheffield <bacs@librecast.net> Tested-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19perf test stat: Avoid hybrid assumption when virtualizedIan Rogers1-1/+5
commit f9c506fb69bdcfb9d7138281378129ff037f2aa1 upstream. The cycles event will fallback to task-clock in the hybrid test when running virtualized. Change the test to not fail for this. Fixes: 65d11821910bd910 ("perf test: Add a test for default perf stat command") Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241212173354.9860-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19writeback: Avoid excessively long inode switching timesJan Kara1-10/+11
[ Upstream commit 9a6ebbdbd41235ea3bc0c4f39e2076599b8113cc ] With lazytime mount option enabled we can be switching many dirty inodes on cgroup exit to the parent cgroup. The numbers observed in practice when systemd slice of a large cron job exits can easily reach hundreds of thousands or millions. The logic in inode_do_switch_wbs() which sorts the inode into appropriate place in b_dirty list of the target wb however has linear complexity in the number of dirty inodes thus overall time complexity of switching all the inodes is quadratic leading to workers being pegged for hours consuming 100% of the CPU and switching inodes to the parent wb. Simple reproducer of the issue: FILES=10000 # Filesystem mounted with lazytime mount option MNT=/mnt/ echo "Creating files and switching timestamps" for (( j = 0; j < 50; j ++ )); do mkdir $MNT/dir$j for (( i = 0; i < $FILES; i++ )); do echo "foo" >$MNT/dir$j/file$i done touch -a -t 202501010000 $MNT/dir$j/file* done wait echo "Syncing and flushing" sync echo 3 >/proc/sys/vm/drop_caches echo "Reading all files from a cgroup" mkdir /sys/fs/cgroup/unified/mycg1 || exit echo $$ >/sys/fs/cgroup/unified/mycg1/cgroup.procs || exit for (( j = 0; j < 50; j ++ )); do cat /mnt/dir$j/file* >/dev/null & done wait echo "Switching wbs" # Now rmdir the cgroup after the script exits We need to maintain b_dirty list ordering to keep writeback happy so instead of sorting inode into appropriate place just append it at the end of the list and clobber dirtied_time_when. This may result in inode writeback starting later after cgroup switch however cgroup switches are rare so it shouldn't matter much. Since the cgroup had write access to the inode, there are no practical concerns of the possible DoS issues. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-19writeback: Avoid softlockup when switching many inodesJan Kara1-1/+10
[ Upstream commit 66c14dccd810d42ec5c73bb8a9177489dfd62278 ] process_inode_switch_wbs_work() can be switching over 100 inodes to a different cgroup. Since switching an inode requires counting all dirty & under-writeback pages in the address space of each inode, this can take a significant amount of time. Add a possibility to reschedule after processing each inode to avoid softlockups. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-19cramfs: Verify inode mode when loading from diskTetsuo Handa1-1/+10
[ Upstream commit 7f9d34b0a7cb93d678ee7207f0634dbf79e47fe5 ] The inode mode loaded from corrupted disk can be invalid. Do like what commit 0a9e74051313 ("isofs: Verify inode mode when loading from disk") does. Reported-by: syzbot <syzbot+895c23f6917da440ed0d@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=895c23f6917da440ed0d Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: https://lore.kernel.org/429b3ef1-13de-4310-9a8e-c2dc9a36234a@I-love.SAKURA.ne.jp Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-19fs: Add 'initramfs_options' to set initramfs mount optionsLichen Liu2-1/+13
[ Upstream commit 278033a225e13ec21900f0a92b8351658f5377f2 ] When CONFIG_TMPFS is enabled, the initial root filesystem is a tmpfs. By default, a tmpfs mount is limited to using 50% of the available RAM for its content. This can be problematic in memory-constrained environments, particularly during a kdump capture. In a kdump scenario, the capture kernel boots with a limited amount of memory specified by the 'crashkernel' parameter. If the initramfs is large, it may fail to unpack into the tmpfs rootfs due to insufficient space. This is because to get X MB of usable space in tmpfs, 2*X MB of memory must be available for the mount. This leads to an OOM failure during the early boot process, preventing a successful crash dump. This patch introduces a new kernel command-line parameter, initramfs_options, which allows passing specific mount options directly to the rootfs when it is first mounted. This gives users control over the rootfs behavior. For example, a user can now specify initramfs_options=size=75% to allow the tmpfs to use up to 75% of the available memory. This can significantly reduce the memory pressure for kdump. Consider a practical example: To unpack a 48MB initramfs, the tmpfs needs 48MB of usable space. With the default 50% limit, this requires a memory pool of 96MB to be available for the tmpfs mount. The total memory requirement is therefore approximately: 16MB (vmlinuz) + 48MB (loaded initramfs) + 48MB (unpacked kernel) + 96MB (for tmpfs) + 12MB (runtime overhead) ≈ 220MB. By using initramfs_options=size=75%, the memory pool required for the 48MB tmpfs is reduced to 48MB / 0.75 = 64MB. This reduces the total memory requirement by 32MB (96MB - 64MB), allowing the kdump to succeed with a smaller crashkernel size, such as 192MB. An alternative approach of reusing the existing rootflags parameter was considered. However, a new, dedicated initramfs_options parameter was chosen to avoid altering the current behavior of rootflags (which applies to the final root filesystem) and to prevent any potential regressions. Also add documentation for the new kernel parameter "initramfs_options" This approach is inspired by prior discussions and patches on the topic. Ref: https://www.lightofdawn.org/blog/?viewDetailed=00128 Ref: https://landley.net/notes-2015.html#01-01-2015 Ref: https://lkml.org/lkml/2021/6/29/783 Ref: https://www.kernel.org/doc/html/latest/filesystems/ramfs-rootfs-initramfs.html#what-is-rootfs Signed-off-by: Lichen Liu <lichliu@redhat.com> Link: https://lore.kernel.org/20250815121459.3391223-1-lichliu@redhat.com Tested-by: Rob Landley <rob@landley.net> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-19pid: Add a judgment for ns null in pid_nr_nsgaoxiang171-1/+1
[ Upstream commit 006568ab4c5ca2309ceb36fa553e390b4aa9c0c7 ] __task_pid_nr_ns ns = task_active_pid_ns(current); pid_nr_ns(rcu_dereference(*task_pid_ptr(task, type)), ns); if (pid && ns->level <= pid->level) { Sometimes null is returned for task_active_pid_ns. Then it will trigger kernel panic in pid_nr_ns. For example: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058 Mem abort info: ESR = 0x0000000096000007 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x07: level 3 translation fault Data abort info: ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, 39-bit VAs, pgdp=00000002175aa000 [0000000000000058] pgd=08000002175ab003, p4d=08000002175ab003, pud=08000002175ab003, pmd=08000002175be003, pte=0000000000000000 pstate: 834000c5 (Nzcv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--) pc : __task_pid_nr_ns+0x74/0xd0 lr : __task_pid_nr_ns+0x24/0xd0 sp : ffffffc08001bd10 x29: ffffffc08001bd10 x28: ffffffd4422b2000 x27: 0000000000000001 x26: ffffffd442821168 x25: ffffffd442821000 x24: 00000f89492eab31 x23: 00000000000000c0 x22: ffffff806f5693c0 x21: ffffff806f5693c0 x20: 0000000000000001 x19: 0000000000000000 x18: 0000000000000000 x17: 00000000529c6ef0 x16: 00000000529c6ef0 x15: 00000000023a1adc x14: 0000000000000003 x13: 00000000007ef6d8 x12: 001167c391c78800 x11: 00ffffffffffffff x10: 0000000000000000 x9 : 0000000000000001 x8 : ffffff80816fa3c0 x7 : 0000000000000000 x6 : 49534d702d535449 x5 : ffffffc080c4c2c0 x4 : ffffffd43ee128c8 x3 : ffffffd43ee124dc x2 : 0000000000000000 x1 : 0000000000000001 x0 : ffffff806f5693c0 Call trace: __task_pid_nr_ns+0x74/0xd0 ... __handle_irq_event_percpu+0xd4/0x284 handle_irq_event+0x48/0xb0 handle_fasteoi_irq+0x160/0x2d8 generic_handle_domain_irq+0x44/0x60 gic_handle_irq+0x4c/0x114 call_on_irq_stack+0x3c/0x74 do_interrupt_handler+0x4c/0x84 el1_interrupt+0x34/0x58 el1h_64_irq_handler+0x18/0x24 el1h_64_irq+0x68/0x6c account_kernel_stack+0x60/0x144 exit_task_stack_account+0x1c/0x80 do_exit+0x7e4/0xaf8 ... get_signal+0x7bc/0x8d8 do_notify_resume+0x128/0x828 el0_svc+0x6c/0x70 el0t_64_sync_handler+0x68/0xbc el0t_64_sync+0x1a8/0x1ac Code: 35fffe54 911a02a8 f9400108 b4000128 (b9405a69) ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Oops: Fatal exception in interrupt Signed-off-by: gaoxiang17 <gaoxiang17@xiaomi.com> Link: https://lore.kernel.org/20250802022123.3536934-1-gxxa03070307@gmail.com Reviewed-by: Baoquan He <bhe@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-19minixfs: Verify inode mode when loading from diskTetsuo Handa1-1/+7
[ Upstream commit 73861970938ad1323eb02bbbc87f6fbd1e5bacca ] The inode mode loaded from corrupted disk can be invalid. Do like what commit 0a9e74051313 ("isofs: Verify inode mode when loading from disk") does. Reported-by: syzbot <syzbot+895c23f6917da440ed0d@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=895c23f6917da440ed0d Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: https://lore.kernel.org/ec982681-84b8-4624-94fa-8af15b77cbd2@I-love.SAKURA.ne.jp Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-19irqchip/sifive-plic: Avoid interrupt ID 0 handling during suspend/resumeLucas Zampieri1-2/+4
[ Upstream commit f75e07bf5226da640fa99a0594687c780d9bace4 ] According to the PLIC specification[1], global interrupt sources are assigned small unsigned integer identifiers beginning at the value 1. An interrupt ID of 0 is reserved to mean "no interrupt". The current plic_irq_resume() and plic_irq_suspend() functions incorrectly start the loop from index 0, which accesses the register space for the reserved interrupt ID 0. Change the loop to start from index 1, skipping the reserved interrupt ID 0 as per the PLIC specification. This prevents potential undefined behavior when accessing the reserved register space during suspend/resume cycles. Fixes: e80f0b6a2cf3 ("irqchip/irq-sifive-plic: Add syscore callbacks for hibernation") Co-developed-by: Jia Wang <wangjia@ultrarisc.com> Signed-off-by: Jia Wang <wangjia@ultrarisc.com> Co-developed-by: Charles Mirabile <cmirabil@redhat.com> Signed-off-by: Charles Mirabile <cmirabil@redhat.com> Signed-off-by: Lucas Zampieri <lzampier@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://github.com/riscv/riscv-plic-spec/releases/tag/1.0.0 Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-19irqchip/sifive-plic: Make use of __assign_bit()Hongbo Li1-5/+4
[ Upstream commit 40d7af5375a4e27d8576d9d11954ac213d06f09e ] Replace the open coded if (foo) __set_bit(n, bar); else __clear_bit(n, bar); with __assign_bit(). No functional change intended. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/all/20240902130824.2878644-1-lihongbo22@huawei.com Stable-dep-of: f75e07bf5226 ("irqchip/sifive-plic: Avoid interrupt ID 0 handling during suspend/resume") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-19mptcp: pm: in-kernel: usable client side with C-flagMatthieu Baerts (NGI0)3-3/+64
commit 4b1ff850e0c1aacc23e923ed22989b827b9808f9 upstream. When servers set the C-flag in their MP_CAPABLE to tell clients not to create subflows to the initial address and port, clients will likely not use their other endpoints. That's because the in-kernel path-manager uses the 'subflow' endpoints to create subflows only to the initial address and port. If the limits have not been modified to accept ADD_ADDR, the client doesn't try to establish new subflows. If the limits accept ADD_ADDR, the routing routes will be used to select the source IP. The C-flag is typically set when the server is operating behind a legacy Layer 4 load balancer, or using anycast IP address. Clients having their different 'subflow' endpoints setup, don't end up creating multiple subflows as expected, and causing some deployment issues. A special case is then added here: when servers set the C-flag in the MPC and directly sends an ADD_ADDR, this single ADD_ADDR is accepted. The 'subflows' endpoints will then be used with this new remote IP and port. This exception is only allowed when the ADD_ADDR is sent immediately after the 3WHS, and makes the client switching to the 'fully established' mode. After that, 'select_local_address()' will not be able to find any subflows, because 'id_avail_bitmap' will be filled in mptcp_pm_create_subflow_or_signal_addr(), when switching to 'fully established' mode. Fixes: df377be38725 ("mptcp: add deny_join_id0 in mptcp_options_received") Cc: stable@vger.kernel.org Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/536 Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250925-net-next-mptcp-c-flag-laminar-v1-1-ad126cc47c6b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> [ Conflict in pm.c, because commit 498d7d8b75f1 ("mptcp: pm: remove '_nl' from mptcp_pm_nl_is_init_remote_addr") renamed an helper in the context, and it is not in this version. The same new code can be applied at the same place. Conflict in pm_kernel.c, because the modified code has been moved from pm_netlink.c to pm_kernel.c in commit 8617e85e04bd ("mptcp: pm: split in-kernel PM specific code"), which is not in this version. The resolution is easy: simply by applying the patch where 'pm_kernel.c' has been replaced 'pm_netlink.c'. Conflict in pm_netlink.c, because commit b83fbca1b4c9 ("mptcp: pm: reduce entries iterations on connect") is not in this version. Instead of using the 'locals' variable (struct mptcp_pm_local *) from the new version and embedding a "struct mptcp_addr_info", we can simply continue to use the 'addrs' variable (struct mptcp_addr_info *). Conflict in protocol.h, because commit af3dc0ad3167 ("mptcp: Remove unused declaration mptcp_sockopt_sync()") is not in this version and it removed one line in the context. The resolution is easy because the new function can still be added at the same place. ] Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19selftests/mm: skip soft-dirty tests when CONFIG_MEM_SOFT_DIRTY is disabledLance Yang4-20/+84
commit 0389c305ef56cbadca4cbef44affc0ec3213ed30 upstream. The madv_populate and soft-dirty kselftests currently fail on systems where CONFIG_MEM_SOFT_DIRTY is disabled. Introduce a new helper softdirty_supported() into vm_util.c/h to ensure tests are properly skipped when the feature is not enabled. Link: https://lkml.kernel.org/r/20250917133137.62802-1-lance.yang@linux.dev Fixes: 9f3265db6ae8 ("selftests: vm: add test for Soft-Dirty PTE bit") Signed-off-by: Lance Yang <lance.yang@linux.dev> Acked-by: David Hildenbrand <david@redhat.com> Suggested-by: David Hildenbrand <david@redhat.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Gabriel Krisman Bertazi <krisman@collabora.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19s390/bpf: Write back tail call counter for BPF_TRAMP_F_CALL_ORIGIlya Leoshkevich1-0/+3
commit bc3905a71f02511607d3ccf732360580209cac4c upstream. The tailcall_bpf2bpf_hierarchy_fentry test hangs on s390. Its call graph is as follows: entry() subprog_tail() trampoline() fentry() the rest of subprog_tail() # via BPF_TRAMP_F_CALL_ORIG return to entry() The problem is that the rest of subprog_tail() increments the tail call counter, but the trampoline discards the incremented value. This results in an astronomically large number of tail calls. Fix by making the trampoline write the incremented tail call counter back. Fixes: 528eb2cb87bc ("s390/bpf: Implement arch_prepare_bpf_trampoline()") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20250813121016.163375-4-iii@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19s390/bpf: Write back tail call counter for BPF_PSEUDO_CALLIlya Leoshkevich1-7/+16
commit c861a6b147137d10b5ff88a2c492ba376cd1b8b0 upstream. The tailcall_bpf2bpf_hierarchy_1 test hangs on s390. Its call graph is as follows: entry() subprog_tail() bpf_tail_call_static(0) -> entry + tail_call_start subprog_tail() bpf_tail_call_static(0) -> entry + tail_call_start entry() copies its tail call counter to the subprog_tail()'s frame, which then increments it. However, the incremented result is discarded, leading to an astronomically large number of tail calls. Fix by writing the incremented counter back to the entry()'s frame. Fixes: dd691e847d28 ("s390/bpf: Implement bpf_jit_supports_subprog_tailcalls()") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20250813121016.163375-3-iii@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19s390/bpf: Describe the frame using a struct instead of constantsIlya Leoshkevich2-77/+47
commit e26d523edf2a62b142d2dd2dd9b87f61ed92f33a upstream. Currently the caller-allocated portion of the stack frame is described using constants, hardcoded values, and an ASCII drawing, making it harder than necessary to ensure that everything is in sync. Declare a struct and use offsetof() and offsetofend() macros to refer to various values stored within the frame. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20250624121501.50536-3-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19s390/bpf: Centralize frame offset calculationsIlya Leoshkevich1-30/+26
commit b2268d550d20ff860bddfe3a91b1aec00414689a upstream. The calculation of the distance from %r15 to the caller-allocated portion of the stack frame is copy-pasted into multiple places in the JIT code. Move it to bpf_jit_prog() and save the result into bpf_jit::frame_off, so that the other parts of the JIT can use it. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20250624121501.50536-2-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19s390/bpf: Change seen_reg to a maskIlya Leoshkevich1-16/+16
commit 7ba4f43e16de351fe9821de80e15d88c884b2967 upstream. Using a mask instead of an array saves a small amount of memory and allows marking multiple registers as seen with a simple "or". Another positive side-effect is that it speeds up verification with jitterbug. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240703005047.40915-2-iii@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19ACPI: property: Do not pass NULL handles to acpi_attach_data()Rafael J. Wysocki1-0/+12
[ Upstream commit baf60d5cb8bc6b85511c5df5f0ad7620bb66d23c ] In certain circumstances, the ACPI handle of a data-only node may be NULL, in which case it does not make sense to attempt to attach that node to an ACPI namespace object, so update the code to avoid attempts to do so. This prevents confusing and unuseful error messages from being printed. Also document the fact that the ACPI handle of a data-only node may be NULL and when that happens in a code comment. In addition, make acpi_add_nondev_subnodes() print a diagnostic message for each data-only node with an unknown ACPI namespace scope. Fixes: 1d52f10917a7 ("ACPI: property: Tie data nodes to acpi handles") Cc: 6.0+ <stable@vger.kernel.org> # 6.0+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Tested-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19ACPI: property: Add code comments explaining what is going onRafael J. Wysocki1-2/+44
[ Upstream commit 737c3a09dcf69ba2814f3674947ccaec1861c985 ] In some places in the ACPI device properties handling code, it is unclear why the code is what it is. Some assumptions are not documented and some pieces of code are based on knowledge that is not mentioned anywhere. Add code comments explaining these things. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Tested-by: Sakari Ailus <sakari.ailus@linux.intel.com> Stable-dep-of: baf60d5cb8bc ("ACPI: property: Do not pass NULL handles to acpi_attach_data()") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19ACPI: property: Disregard references in data-only subnode listsRafael J. Wysocki1-29/+22
[ Upstream commit d06118fe9b03426484980ed4c189a8c7b99fa631 ] Data-only subnode links following the ACPI data subnode GUID in a _DSD package are expected to point to named objects returning _DSD-equivalent packages. If a reference to such an object is used in the target field of any of those links, that object will be evaluated in place (as a named object) and its return data will be embedded in the outer _DSD package. For this reason, it is not expected to see a subnode link with the target field containing a local reference (that would mean pointing to a device or another object that cannot be evaluated in place and therefore cannot return a _DSD-equivalent package). Accordingly, simplify the code parsing data-only subnode links to simply print a message when it encounters a local reference in the target field of one of those links. Moreover, since acpi_nondev_subnode_data_ok() would only have one caller after the change above, fold it into that caller. Link: https://lore.kernel.org/linux-acpi/CAJZ5v0jVeSrDO6hrZhKgRZrH=FpGD4vNUjFD8hV9WwN9TLHjzQ@mail.gmail.com/ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Tested-by: Sakari Ailus <sakari.ailus@linux.intel.com> Stable-dep-of: baf60d5cb8bc ("ACPI: property: Do not pass NULL handles to acpi_attach_data()") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19ACPI: battery: Add synchronization between interface updatesRafael J. Wysocki1-14/+29
[ Upstream commit 399dbcadc01ebf0035f325eaa8c264f8b5cd0a14 ] There is no synchronization between different code paths in the ACPI battery driver that update its sysfs interface or its power supply class device interface. In some cases this results to functional failures due to race conditions. One example of this is when two ACPI notifications: - ACPI_BATTERY_NOTIFY_STATUS (0x80) - ACPI_BATTERY_NOTIFY_INFO (0x81) are triggered (by the platform firmware) in a row with a little delay in between after removing and reinserting a laptop battery. Both notifications cause acpi_battery_update() to be called and if the delay between them is sufficiently small, sysfs_add_battery() can be re-entered before battery->bat is set which leads to a duplicate sysfs entry error: sysfs: cannot create duplicate filename '/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0A:00/power_supply/BAT1' CPU: 1 UID: 0 PID: 185 Comm: kworker/1:4 Kdump: loaded Not tainted 6.12.38+deb13-amd64 #1 Debian 6.12.38-1 Hardware name: Gateway NV44 /SJV40-MV , BIOS V1.3121 04/08/2009 Workqueue: kacpi_notify acpi_os_execute_deferred Call Trace: <TASK> dump_stack_lvl+0x5d/0x80 sysfs_warn_dup.cold+0x17/0x23 sysfs_create_dir_ns+0xce/0xe0 kobject_add_internal+0xba/0x250 kobject_add+0x96/0xc0 ? get_device_parent+0xde/0x1e0 device_add+0xe2/0x870 __power_supply_register.part.0+0x20f/0x3f0 ? wake_up_q+0x4e/0x90 sysfs_add_battery+0xa4/0x1d0 [battery] acpi_battery_update+0x19e/0x290 [battery] acpi_battery_notify+0x50/0x120 [battery] acpi_ev_notify_dispatch+0x49/0x70 acpi_os_execute_deferred+0x1a/0x30 process_one_work+0x177/0x330 worker_thread+0x251/0x390 ? __pfx_worker_thread+0x10/0x10 kthread+0xd2/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> kobject: kobject_add_internal failed for BAT1 with -EEXIST, don't try to register things with the same name in the same directory. There are also other scenarios in which analogous issues may occur. Address this by using a common lock in all of the code paths leading to updates of driver interfaces: ACPI Notify () handler, system resume callback and post-resume notification, device addition and removal. This new lock replaces sysfs_lock that has been used only in sysfs_remove_battery() which now is going to be always called under the new lock, so it doesn't need any internal locking any more. Fixes: 10666251554c ("ACPI: battery: Install Notify() handler directly") Closes: https://lore.kernel.org/linux-acpi/20250910142653.313360-1-luogf2025@163.com/ Reported-by: GuangFei Luo <luogf2025@163.com> Tested-by: GuangFei Luo <luogf2025@163.com> Cc: 6.6+ <stable@vger.kernel.org> # 6.6+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19ACPI: battery: Check for error code from devm_mutex_init() callAndy Shevchenko1-2/+8
[ Upstream commit 815daedc318b2f9f1b956d0631377619a0d69d96 ] Even if it's not critical, the avoidance of checking the error code from devm_mutex_init() call today diminishes the point of using devm variant of it. Tomorrow it may even leak something. Add the missed check. Fixes: 0710c1ce5045 ("ACPI: battery: initialize mutexes through devm_ APIs") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://patch.msgid.link/20241030162754.2110946-1-andriy.shevchenko@linux.intel.com [ rjw: Added 2 empty code lines ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Stable-dep-of: 399dbcadc01e ("ACPI: battery: Add synchronization between interface updates") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19ACPI: battery: initialize mutexes through devm_ APIsThomas Weißschuh1-7/+2
[ Upstream commit 0710c1ce50455ed0db91bffa0eebbaa4f69b1773 ] Simplify the cleanup logic a bit. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://patch.msgid.link/20240904-acpi-battery-cleanups-v1-3-a3bf74f22d40@weissschuh.net Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Stable-dep-of: 399dbcadc01e ("ACPI: battery: Add synchronization between interface updates") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19ACPI: battery: allocate driver data through devm_ APIsThomas Weißschuh1-3/+1
[ Upstream commit 909dfc60692331e1599d5e28a8f08a611f353aef ] Simplify the cleanup logic a bit. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://patch.msgid.link/20240904-acpi-battery-cleanups-v1-2-a3bf74f22d40@weissschuh.net Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Stable-dep-of: 399dbcadc01e ("ACPI: battery: Add synchronization between interface updates") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19arm64: mte: Do not flag the zero page as PG_mte_taggedCatalin Marinas2-4/+9
[ Upstream commit f620d66af3165838bfa845dcf9f5f9b4089bf508 ] Commit 68d54ceeec0e ("arm64: mte: Allow PTRACE_PEEKMTETAGS access to the zero page") attempted to fix ptrace() reading of tags from the zero page by marking it as PG_mte_tagged during cpu_enable_mte(). The same commit also changed the ptrace() tag access permission check to the VM_MTE vma flag while turning the page flag test into a WARN_ON_ONCE(). Attempting to set the PG_mte_tagged flag early with CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled may either hang (after commit d77e59a8fccd "arm64: mte: Lock a page for MTE tag initialisation") or have the flags cleared later during page_alloc_init_late(). In addition, pages_identical() -> memcmp_pages() will reject any comparison with the zero page as it is marked as tagged. Partially revert the above commit to avoid setting PG_mte_tagged on the zero page. Update the __access_remote_tags() warning on untagged pages to ignore the zero page since it is known to have the tags initialised. Note that all user mapping of the zero page are marked as pte_special(). The arm64 set_pte_at() will not call mte_sync_tags() on such pages, so PG_mte_tagged will remain cleared. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Fixes: 68d54ceeec0e ("arm64: mte: Allow PTRACE_PEEKMTETAGS access to the zero page") Reported-by: Gergely Kovacs <Gergely.Kovacs2@arm.com> Cc: stable@vger.kernel.org # 5.10.x Cc: Will Deacon <will@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Lance Yang <lance.yang@linux.dev> Acked-by: Lance Yang <lance.yang@linux.dev> Reviewed-by: David Hildenbrand <david@redhat.com> Tested-by: Lance Yang <lance.yang@linux.dev> Signed-off-by: Will Deacon <will@kernel.org> [ Adjust context ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19arm64: kprobes: call set_memory_rox() for kprobe pageYang Shi1-1/+7
[ Upstream commit 195a1b7d8388c0ec2969a39324feb8bebf9bb907 ] The kprobe page is allocated by execmem allocator with ROX permission. It needs to call set_memory_rox() to set proper permission for the direct map too. It was missed. Fixes: 10d5e97c1bf8 ("arm64: use PAGE_KERNEL_ROX directly in alloc_insn_page") Cc: <stable@vger.kernel.org> Signed-off-by: Yang Shi <yang@os.amperecomputing.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will@kernel.org> [ kept existing __vmalloc_node_range() instead of upstream's execmem_alloc() ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19ipmi: Fix handling of messages with provided receive message pointerGuenter Roeck1-1/+4
commit e2c69490dda5d4c9f1bfbb2898989c8f3530e354 upstream Prior to commit b52da4054ee0 ("ipmi: Rework user message limit handling"), i_ipmi_request() used to increase the user reference counter if the receive message is provided by the caller of IPMI API functions. This is no longer the case. However, ipmi_free_recv_msg() is still called and decreases the reference counter. This results in the reference counter reaching zero, the user data pointer is released, and all kinds of interesting crashes are seen. Fix the problem by increasing user reference counter if the receive message has been provided by the caller. Fixes: b52da4054ee0 ("ipmi: Rework user message limit handling") Reported-by: Eric Dumazet <edumazet@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Thelen <gthelen@google.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Message-ID: <20251006201857.3433837-1-linux@roeck-us.net> Signed-off-by: Corey Minyard <corey@minyard.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19ipmi: Rework user message limit handlingCorey Minyard1-217/+198
commit b52da4054ee0bf9ecb44996f2c83236ff50b3812 upstream This patch required quite a bit of work to backport due to a number of unrelated changes that do not make sense to backport. This has been run against my test suite and passes all tests. The limit on the number of user messages had a number of issues, improper counting in some cases and a use after free. Restructure how this is all done to handle more in the receive message allocation routine, so all refcouting and user message limit counts are done in that routine. It's a lot cleaner and safer. Reported-by: Gilles BULOZ <gilles.buloz@kontron.com> Closes: https://lore.kernel.org/lkml/aLsw6G0GyqfpKs2S@mail.minyard.net/ Fixes: 8e76741c3d8b ("ipmi: Add a limit on the number of users that may use IPMI") Cc: <stable@vger.kernel.org> # 4.19 Signed-off-by: Corey Minyard <corey@minyard.net> Tested-by: Gilles BULOZ <gilles.buloz@kontron.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19KVM: SVM: Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2Sean Christopherson4-0/+9
[ Upstream commit 68e61f6fd65610e73b17882f86fedfd784d99229 ] Emulate PERF_CNTR_GLOBAL_STATUS_SET when PerfMonV2 is enumerated to the guest, as the MSR is supposed to exist in all AMD v2 PMUs. Fixes: 4a2771895ca6 ("KVM: x86/svm/pmu: Add AMD PerfMonV2 support") Cc: stable@vger.kernel.org Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20250711172746.1579423-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> [ changed global_status_rsvd field to global_status_mask ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19rseq: Protect event mask against membarrier IPIThomas Gleixner2-8/+13
[ Upstream commit 6eb350a2233100a283f882c023e5ad426d0ed63b ] rseq_need_restart() reads and clears task::rseq_event_mask with preemption disabled to guard against the scheduler. But membarrier() uses an IPI and sets the PREEMPT bit in the event mask from the IPI, which leaves that RMW operation unprotected. Use guard(irq) if CONFIG_MEMBARRIER is enabled to fix that. Fixes: 2a36ab717e8f ("rseq/membarrier: Add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: stable@vger.kernel.org [ Applied changes to include/linux/sched.h instead of include/linux/rseq.h ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19btrfs: fix the incorrect max_bytes value for find_lock_delalloc_range()Qu Wenruo1-3/+11
[ Upstream commit 7b26da407420e5054e3f06c5d13271697add9423 ] [BUG] With my local branch to enable bs > ps support for btrfs, sometimes I hit the following ASSERT() inside submit_one_sector(): ASSERT(block_start != EXTENT_MAP_HOLE); Please note that it's not yet possible to hit this ASSERT() in the wild yet, as it requires btrfs bs > ps support, which is not even in the development branch. But on the other hand, there is also a very low chance to hit above ASSERT() with bs < ps cases, so this is an existing bug affect not only the incoming bs > ps support but also the existing bs < ps support. [CAUSE] Firstly that ASSERT() means we're trying to submit a dirty block but without a real extent map nor ordered extent map backing it. Furthermore with extra debugging, the folio triggering such ASSERT() is always larger than the fs block size in my bs > ps case. (8K block size, 4K page size) After some more debugging, the ASSERT() is trigger by the following sequence: extent_writepage() | We got a 32K folio (4 fs blocks) at file offset 0, and the fs block | size is 8K, page size is 4K. | And there is another 8K folio at file offset 32K, which is also | dirty. | So the filemap layout looks like the following: | | "||" is the filio boundary in the filemap. | "//| is the dirty range. | | 0 8K 16K 24K 32K 40K | |////////| |//////////////////////||////////| | |- writepage_delalloc() | |- find_lock_delalloc_range() for [0, 8K) | | Now range [0, 8K) is properly locked. | | | |- find_lock_delalloc_range() for [16K, 40K) | | |- btrfs_find_delalloc_range() returned range [16K, 40K) | | |- lock_delalloc_folios() locked folio 0 successfully | | | | | | The filemap range [32K, 40K) got dropped from filemap. | | | | | |- lock_delalloc_folios() failed with -EAGAIN on folio 32K | | | As the folio at 32K is dropped. | | | | | |- loops = 1; | | |- max_bytes = PAGE_SIZE; | | |- goto again; | | | This will re-do the lookup for dirty delalloc ranges. | | | | | |- btrfs_find_delalloc_range() called with @max_bytes == 4K | | | This is smaller than block size, so | | | btrfs_find_delalloc_range() is unable to return any range. | | \- return false; | | | \- Now only range [0, 8K) has an OE for it, but for dirty range | [16K, 32K) it's dirty without an OE. | This breaks the assumption that writepage_delalloc() will find | and lock all dirty ranges inside the folio. | |- extent_writepage_io() |- submit_one_sector() for [0, 8K) | Succeeded | |- submit_one_sector() for [16K, 24K) Triggering the ASSERT(), as there is no OE, and the original extent map is a hole. Please note that, this also exposed the same problem for bs < ps support. E.g. with 64K page size and 4K block size. If we failed to lock a folio, and falls back into the "loops = 1;" branch, we will re-do the search using 64K as max_bytes. Which may fail again to lock the next folio, and exit early without handling all dirty blocks inside the folio. [FIX] Instead of using the fixed size PAGE_SIZE as @max_bytes, use @sectorsize, so that we are ensured to find and lock any remaining blocks inside the folio. And since we're here, add an extra ASSERT() to before calling btrfs_find_delalloc_range() to make sure the @max_bytes is at least no smaller than a block to avoid false negative. Cc: stable@vger.kernel.org # 5.15+ Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> [ adapted folio terminology and API calls to page-based equivalents ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19PCI: endpoint: pci-epf-test: Add NULL check for DMA channels before releaseShin'ichiro Kawasaki1-6/+11
[ Upstream commit 85afa9ea122dd9d4a2ead104a951d318975dcd25 ] The fields dma_chan_tx and dma_chan_rx of the struct pci_epf_test can be NULL even after EPF initialization. Then it is prudent to check that they have non-NULL values before releasing the channels. Add the checks in pci_epf_test_clean_dma_chan(). Without the checks, NULL pointer dereferences happen and they can lead to a kernel panic in some cases: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000050 Call trace: dma_release_channel+0x2c/0x120 (P) pci_epf_test_epc_deinit+0x94/0xc0 [pci_epf_test] pci_epc_deinit_notify+0x74/0xc0 tegra_pcie_ep_pex_rst_irq+0x250/0x5d8 irq_thread_fn+0x34/0xb8 irq_thread+0x18c/0x2e8 kthread+0x14c/0x210 ret_from_fork+0x10/0x20 Fixes: 8353813c88ef ("PCI: endpoint: Enable DMA tests for endpoints with DMA capabilities") Fixes: 5ebf3fc59bd2 ("PCI: endpoint: functions/pci-epf-test: Add DMA support to transfer data") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> [mani: trimmed the stack trace] Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20250916025756.34807-1-shinichiro.kawasaki@wdc.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19PCI: endpoint: Remove surplus return statement from ↵Wang Jiang1-2/+0
pci_epf_test_clean_dma_chan() [ Upstream commit 9b80bdb10aee04ce7289896e6bdad13e33972636 ] Remove a surplus return statement from the void function that has been added in the commit commit 8353813c88ef ("PCI: endpoint: Enable DMA tests for endpoints with DMA capabilities"). Especially, as an empty return statements at the end of a void functions serve little purpose. This fixes the following checkpatch.pl script warning: WARNING: void function return statements are not generally useful #296: FILE: drivers/pci/endpoint/functions/pci-epf-test.c:296: + return; +} Link: https://lore.kernel.org/r/tencent_F250BEE2A65745A524E2EFE70CF615CA8F06@qq.com Signed-off-by: Wang Jiang <jiangwang@kylinos.cn> [kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Stable-dep-of: 85afa9ea122d ("PCI: endpoint: pci-epf-test: Add NULL check for DMA channels before release") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19misc: fastrpc: Save actual DMA size in fastrpc_map structureLing Xu1-9/+18
[ Upstream commit 8b5b456222fd604079b5cf2af1f25ad690f54a25 ] For user passed fd buffer, map is created using DMA calls. The map related information is stored in fastrpc_map structure. The actual DMA size is not stored in the structure. Store the actual size of buffer and check it against the user passed size. Fixes: c68cfb718c8f ("misc: fastrpc: Add support for context Invoke method") Cc: stable@kernel.org Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Co-developed-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com> Signed-off-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com> Signed-off-by: Ling Xu <quic_lxu5@quicinc.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Srinivas Kandagatla <srini@kernel.org> Link: https://lore.kernel.org/r/20250912131236.303102-2-srini@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19misc: fastrpc: Add missing dev_err newlinesEkansh Gupta1-5/+5
[ Upstream commit a150c68ae6369ea65b786fefd0b8aa0b075c041a ] Few dev_err calls are missing newlines. This can result in unrelated lines getting appended which might make logs difficult to understand. Add trailing newlines to avoid this. Signed-off-by: Ekansh Gupta <quic_ekangupt@quicinc.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Caleb Connolly <caleb.connolly@linaro.org> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20240705075900.424100-3-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: 8b5b456222fd ("misc: fastrpc: Save actual DMA size in fastrpc_map structure") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19ksmbd: add max ip connections parameterNamjae Jeon4-13/+24
[ Upstream commit d8b6dc9256762293048bf122fc11c4e612d0ef5d ] This parameter set the maximum number of connections per ip address. The default is 8. Cc: stable@vger.kernel.org Fixes: c0d41112f1a5 ("ksmbd: extend the connection limiting mechanism to support IPv6") Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> [ Adjust reserved room ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19KVM: SVM: Skip fastpath emulation on VM-Exit if next RIP isn't validSean Christopherson1-2/+11
[ Upstream commit 0910dd7c9ad45a2605c45fd2bf3d1bcac087687c ] Skip the WRMSR and HLT fastpaths in SVM's VM-Exit handler if the next RIP isn't valid, e.g. because KVM is running with nrips=false. SVM must decode and emulate to skip the instruction if the CPU doesn't provide the next RIP, and getting the instruction bytes to decode requires reading guest memory. Reading guest memory through the emulator can fault, i.e. can sleep, which is disallowed since the fastpath handlers run with IRQs disabled. BUG: sleeping function called from invalid context at ./include/linux/uaccess.h:106 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 32611, name: qemu preempt_count: 1, expected: 0 INFO: lockdep is turned off. irq event stamp: 30580 hardirqs last enabled at (30579): [<ffffffffc08b2527>] vcpu_run+0x1787/0x1db0 [kvm] hardirqs last disabled at (30580): [<ffffffffb4f62e32>] __schedule+0x1e2/0xed0 softirqs last enabled at (30570): [<ffffffffb4247a64>] fpu_swap_kvm_fpstate+0x44/0x210 softirqs last disabled at (30568): [<ffffffffb4247a64>] fpu_swap_kvm_fpstate+0x44/0x210 CPU: 298 UID: 0 PID: 32611 Comm: qemu Tainted: G U 6.16.0-smp--e6c618b51cfe-sleep #782 NONE Tainted: [U]=USER Hardware name: Google Astoria-Turin/astoria, BIOS 0.20241223.2-0 01/17/2025 Call Trace: <TASK> dump_stack_lvl+0x7d/0xb0 __might_resched+0x271/0x290 __might_fault+0x28/0x80 kvm_vcpu_read_guest_page+0x8d/0xc0 [kvm] kvm_fetch_guest_virt+0x92/0xc0 [kvm] __do_insn_fetch_bytes+0xf3/0x1e0 [kvm] x86_decode_insn+0xd1/0x1010 [kvm] x86_emulate_instruction+0x105/0x810 [kvm] __svm_skip_emulated_instruction+0xc4/0x140 [kvm_amd] handle_fastpath_invd+0xc4/0x1a0 [kvm] vcpu_run+0x11a1/0x1db0 [kvm] kvm_arch_vcpu_ioctl_run+0x5cc/0x730 [kvm] kvm_vcpu_ioctl+0x578/0x6a0 [kvm] __se_sys_ioctl+0x6d/0xb0 do_syscall_64+0x8a/0x2c0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7f479d57a94b </TASK> Note, this is essentially a reapply of commit 5c30e8101e8d ("KVM: SVM: Skip WRMSR fastpath on VM-Exit if next RIP isn't valid"), but with different justification (KVM now grabs SRCU when skipping the instruction for other reasons). Fixes: b439eb8ab578 ("Revert "KVM: SVM: Skip WRMSR fastpath on VM-Exit if next RIP isn't valid"") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250805190526.1453366-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> [ adapted switch-based MSR/HLT fastpath to if-based MSR-only check ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19mm/ksm: fix incorrect KSM counter handling in mm_struct during forkDonet Tom1-0/+6
[ Upstream commit 4d6fc29f36341d7795db1d1819b4c15fe9be7b23 ] Patch series "mm/ksm: Fix incorrect accounting of KSM counters during fork", v3. The first patch in this series fixes the incorrect accounting of KSM counters such as ksm_merging_pages, ksm_rmap_items, and the global ksm_zero_pages during fork. The following patch add a selftest to verify the ksm_merging_pages counter was updated correctly during fork. Test Results ============ Without the first patch ----------------------- # [RUN] test_fork_ksm_merging_page_count not ok 10 ksm_merging_page in child: 32 With the first patch -------------------- # [RUN] test_fork_ksm_merging_page_count ok 10 ksm_merging_pages is not inherited after fork This patch (of 2): Currently, the KSM-related counters in `mm_struct`, such as `ksm_merging_pages`, `ksm_rmap_items`, and `ksm_zero_pages`, are inherited by the child process during fork. This results in inconsistent accounting. When a process uses KSM, identical pages are merged and an rmap item is created for each merged page. The `ksm_merging_pages` and `ksm_rmap_items` counters are updated accordingly. However, after a fork, these counters are copied to the child while the corresponding rmap items are not. As a result, when the child later triggers an unmerge, there are no rmap items present in the child, so the counters remain stale, leading to incorrect accounting. A similar issue exists with `ksm_zero_pages`, which maintains both a global counter and a per-process counter. During fork, the per-process counter is inherited by the child, but the global counter is not incremented. Since the child also references zero pages, the global counter should be updated as well. Otherwise, during zero-page unmerge, both the global and per-process counters are decremented, causing the global counter to become inconsistent. To fix this, ksm_merging_pages and ksm_rmap_items are reset to 0 during fork, and the global ksm_zero_pages counter is updated with the per-process ksm_zero_pages value inherited by the child. This ensures that KSM statistics remain accurate and reflect the activity of each process correctly. Link: https://lkml.kernel.org/r/cover.1758648700.git.donettom@linux.ibm.com Link: https://lkml.kernel.org/r/7b9870eb67ccc0d79593940d9dbd4a0b39b5d396.1758648700.git.donettom@linux.ibm.com Fixes: 7609385337a4 ("ksm: count ksm merging pages for each process") Fixes: cb4df4cae4f2 ("ksm: count allocated ksm rmap_items for each process") Fixes: e2942062e01d ("ksm: count all zero pages placed by KSM") Signed-off-by: Donet Tom <donettom@linux.ibm.com> Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev> Acked-by: David Hildenbrand <david@redhat.com> Cc: Aboorva Devarajan <aboorvad@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: xu xin <xu.xin16@zte.com.cn> Cc: <stable@vger.kernel.org> [6.6+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ changed mm_flags_test() to test_bit() ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19tracing: Fix race condition in kprobe initialization causing NULL pointer ↵Yuan Chen4-14/+29
dereference [ Upstream commit 9cf9aa7b0acfde7545c1a1d912576e9bab28dc6f ] There is a critical race condition in kprobe initialization that can lead to NULL pointer dereference and kernel crash. [1135630.084782] Unable to handle kernel paging request at virtual address 0000710a04630000 ... [1135630.260314] pstate: 404003c9 (nZcv DAIF +PAN -UAO) [1135630.269239] pc : kprobe_perf_func+0x30/0x260 [1135630.277643] lr : kprobe_dispatcher+0x44/0x60 [1135630.286041] sp : ffffaeff4977fa40 [1135630.293441] x29: ffffaeff4977fa40 x28: ffffaf015340e400 [1135630.302837] x27: 0000000000000000 x26: 0000000000000000 [1135630.312257] x25: ffffaf029ed108a8 x24: ffffaf015340e528 [1135630.321705] x23: ffffaeff4977fc50 x22: ffffaeff4977fc50 [1135630.331154] x21: 0000000000000000 x20: ffffaeff4977fc50 [1135630.340586] x19: ffffaf015340e400 x18: 0000000000000000 [1135630.349985] x17: 0000000000000000 x16: 0000000000000000 [1135630.359285] x15: 0000000000000000 x14: 0000000000000000 [1135630.368445] x13: 0000000000000000 x12: 0000000000000000 [1135630.377473] x11: 0000000000000000 x10: 0000000000000000 [1135630.386411] x9 : 0000000000000000 x8 : 0000000000000000 [1135630.395252] x7 : 0000000000000000 x6 : 0000000000000000 [1135630.403963] x5 : 0000000000000000 x4 : 0000000000000000 [1135630.412545] x3 : 0000710a04630000 x2 : 0000000000000006 [1135630.421021] x1 : ffffaeff4977fc50 x0 : 0000710a04630000 [1135630.429410] Call trace: [1135630.434828] kprobe_perf_func+0x30/0x260 [1135630.441661] kprobe_dispatcher+0x44/0x60 [1135630.448396] aggr_pre_handler+0x70/0xc8 [1135630.454959] kprobe_breakpoint_handler+0x140/0x1e0 [1135630.462435] brk_handler+0xbc/0xd8 [1135630.468437] do_debug_exception+0x84/0x138 [1135630.475074] el1_dbg+0x18/0x8c [1135630.480582] security_file_permission+0x0/0xd0 [1135630.487426] vfs_write+0x70/0x1c0 [1135630.493059] ksys_write+0x5c/0xc8 [1135630.498638] __arm64_sys_write+0x24/0x30 [1135630.504821] el0_svc_common+0x78/0x130 [1135630.510838] el0_svc_handler+0x38/0x78 [1135630.516834] el0_svc+0x8/0x1b0 kernel/trace/trace_kprobe.c: 1308 0xffff3df8995039ec <kprobe_perf_func+0x2c>: ldr x21, [x24,#120] include/linux/compiler.h: 294 0xffff3df8995039f0 <kprobe_perf_func+0x30>: ldr x1, [x21,x0] kernel/trace/trace_kprobe.c 1308: head = this_cpu_ptr(call->perf_events); 1309: if (hlist_empty(head)) 1310: return 0; crash> struct trace_event_call -o struct trace_event_call { ... [120] struct hlist_head *perf_events; //(call->perf_event) ... } crash> struct trace_event_call ffffaf015340e528 struct trace_event_call { ... perf_events = 0xffff0ad5fa89f088, //this value is correct, but x21 = 0 ... } Race Condition Analysis: The race occurs between kprobe activation and perf_events initialization: CPU0 CPU1 ==== ==== perf_kprobe_init perf_trace_event_init tp_event->perf_events = list;(1) tp_event->class->reg (2)← KPROBE ACTIVE Debug exception triggers ... kprobe_dispatcher kprobe_perf_func (tk->tp.flags & TP_FLAG_PROFILE) head = this_cpu_ptr(call->perf_events)(3) (perf_events is still NULL) Problem: 1. CPU0 executes (1) assigning tp_event->perf_events = list 2. CPU0 executes (2) enabling kprobe functionality via class->reg() 3. CPU1 triggers and reaches kprobe_dispatcher 4. CPU1 checks TP_FLAG_PROFILE - condition passes (step 2 completed) 5. CPU1 calls kprobe_perf_func() and crashes at (3) because call->perf_events is still NULL CPU1 sees that kprobe functionality is enabled but does not see that perf_events has been assigned. Add pairing read and write memory barriers to guarantee that if CPU1 sees that kprobe functionality is enabled, it must also see that perf_events has been assigned. Link: https://lore.kernel.org/all/20251001022025.44626-1-chenyuan_fl@163.com/ Fixes: 50d780560785 ("tracing/kprobes: Add probe handler dispatcher to support perf and ftrace concurrent use") Cc: stable@vger.kernel.org Signed-off-by: Yuan Chen <chenyuan@kylinos.cn> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> [ Adjust context ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19mfd: intel_soc_pmic_chtdc_ti: Set use_single_read regmap_config flagHans de Goede1-0/+2
[ Upstream commit 64e0d839c589f4f2ecd2e3e5bdb5cee6ba6bade9 ] Testing has shown that reading multiple registers at once (for 10-bit ADC values) does not work. Set the use_single_read regmap_config flag to make regmap split these for us. This should fix temperature opregion accesses done by drivers/acpi/pmic/intel_pmic_chtdc_ti.c and is also necessary for the upcoming drivers for the ADC and battery MFD cells. Fixes: 6bac0606fdba ("mfd: Add support for Cherry Trail Dollar Cove TI PMIC") Cc: stable@vger.kernel.org Reviewed-by: Andy Shevchenko <andy@kernel.org> Signed-off-by: Hans de Goede <hansg@kernel.org> Link: https://lore.kernel.org/r/20250804133240.312383-1-hansg@kernel.org Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19mfd: intel_soc_pmic_chtdc_ti: Drop unneeded assignment for cache_typeAndy Shevchenko1-1/+0
[ Upstream commit 9eb99c08508714906db078b5efbe075329a3fb06 ] REGCACHE_NONE is the default type of the cache when not provided. Drop unneeded explicit assignment to it. Note, it's defined to 0, and if ever be redefined, it will break literally a lot of the drivers, so it very unlikely to happen. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20250129152823.1802273-1-andriy.shevchenko@linux.intel.com Signed-off-by: Lee Jones <lee@kernel.org> Stable-dep-of: 64e0d839c589 ("mfd: intel_soc_pmic_chtdc_ti: Set use_single_read regmap_config flag") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19mfd: intel_soc_pmic_chtdc_ti: Fix invalid regmap-config max_register valueHans de Goede1-1/+1
[ Upstream commit 70e997e0107e5ed85c1a3ef2adfccbe351c29d71 ] The max_register = 128 setting in the regmap config is not valid. The Intel Dollar Cove TI PMIC has an eeprom unlock register at address 0x88 and a number of EEPROM registers at 0xF?. Increase max_register to 0xff so that these registers can be accessed. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Andy Shevchenko <andy@kernel.org> Link: https://lore.kernel.org/r/20241208150028.325349-1-hdegoede@redhat.com Signed-off-by: Lee Jones <lee@kernel.org> Stable-dep-of: 64e0d839c589 ("mfd: intel_soc_pmic_chtdc_ti: Set use_single_read regmap_config flag") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19media: mc: Clear minor number before put deviceEdward Adam Davis1-5/+1
[ Upstream commit 8cfc8cec1b4da88a47c243a11f384baefd092a50 ] The device minor should not be cleared after the device is released. Fixes: 9e14868dc952 ("media: mc: Clear minor number reservation at unregistration time") Cc: stable@vger.kernel.org Reported-by: syzbot+031d0cfd7c362817963f@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=031d0cfd7c362817963f Tested-by: syzbot+031d0cfd7c362817963f@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org> [ moved clear_bit from media_devnode_release callback to media_devnode_unregister before put_device ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19Squashfs: reject negative file sizes in squashfs_read_inode()Phillip Lougher1-0/+4
[ Upstream commit 9f1c14c1de1bdde395f6cc893efa4f80a2ae3b2b ] Syskaller reports a "WARNING in ovl_copy_up_file" in overlayfs. This warning is ultimately caused because the underlying Squashfs file system returns a file with a negative file size. This commit checks for a negative file size and returns EINVAL. [phillip@squashfs.org.uk: only need to check 64 bit quantity] Link: https://lkml.kernel.org/r/20250926222305.110103-1-phillip@squashfs.org.uk Link: https://lkml.kernel.org/r/20250926215935.107233-1-phillip@squashfs.org.uk Fixes: 6545b246a2c8 ("Squashfs: inode operations") Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reported-by: syzbot+f754e01116421e9754b9@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/68d580e5.a00a0220.303701.0019.GAE@google.com/ Cc: Amir Goldstein <amir73il@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19Squashfs: add additional inode sanity checkingPhillip Lougher1-2/+18
[ Upstream commit 9ee94bfbe930a1b39df53fa2d7b31141b780eb5a ] Patch series "Squashfs: performance improvement and a sanity check". This patchset adds an additional sanity check when reading regular file inodes, and adds support for SEEK_DATA/SEEK_HOLE lseek() whence values. This patch (of 2): Add an additional sanity check when reading regular file inodes. A regular file if the file size is an exact multiple of the filesystem block size cannot have a fragment. This is because by definition a fragment block stores tailends which are not a whole block in size. Link: https://lkml.kernel.org/r/20250923220652.568416-1-phillip@squashfs.org.uk Link: https://lkml.kernel.org/r/20250923220652.568416-2-phillip@squashfs.org.uk Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: 9f1c14c1de1b ("Squashfs: reject negative file sizes in squashfs_read_inode()") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19lib/crypto/curve25519-hacl64: Disable KASAN with clang-17 and olderNathan Chancellor1-0/+4
commit 2f13daee2a72bb962f5fd356c3a263a6f16da965 upstream. After commit 6f110a5e4f99 ("Disable SLUB_TINY for build testing"), which causes CONFIG_KASAN to be enabled in allmodconfig again, arm64 allmodconfig builds with clang-17 and older show an instance of -Wframe-larger-than (which breaks the build with CONFIG_WERROR=y): lib/crypto/curve25519-hacl64.c:757:6: error: stack frame size (2336) exceeds limit (2048) in 'curve25519_generic' [-Werror,-Wframe-larger-than] 757 | void curve25519_generic(u8 mypublic[CURVE25519_KEY_SIZE], | ^ When KASAN is disabled, the stack usage is roughly quartered: lib/crypto/curve25519-hacl64.c:757:6: error: stack frame size (608) exceeds limit (128) in 'curve25519_generic' [-Werror,-Wframe-larger-than] 757 | void curve25519_generic(u8 mypublic[CURVE25519_KEY_SIZE], | ^ Using '-Rpass-analysis=stack-frame-layout' shows the following variables and many, many 8-byte spills when KASAN is enabled: Offset: [SP-144], Type: Variable, Align: 8, Size: 40 Offset: [SP-464], Type: Variable, Align: 8, Size: 320 Offset: [SP-784], Type: Variable, Align: 8, Size: 320 Offset: [SP-864], Type: Variable, Align: 32, Size: 80 Offset: [SP-896], Type: Variable, Align: 32, Size: 32 Offset: [SP-1016], Type: Variable, Align: 8, Size: 120 When KASAN is disabled, there are still spills but not at many and the variables list is smaller: Offset: [SP-192], Type: Variable, Align: 32, Size: 80 Offset: [SP-224], Type: Variable, Align: 32, Size: 32 Offset: [SP-344], Type: Variable, Align: 8, Size: 120 Disable KASAN for this file when using clang-17 or older to avoid blowing out the stack, clearing up the warning. Signed-off-by: Nathan Chancellor <nathan@kernel.org> Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250609-curve25519-hacl64-disable-kasan-clang-v1-1-08ea0ac5ccff@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19ext4: free orphan info with kvfreeJan Kara1-2/+2
commit 971843c511c3c2f6eda96c6b03442913bfee6148 upstream. Orphan info is now getting allocated with kvmalloc_array(). Free it with kvfree() instead of kfree() to avoid complaints from mm. Reported-by: Chris Mason <clm@meta.com> Fixes: 0a6ce20c1564 ("ext4: verify orphan file size is not too big") Cc: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Message-ID: <20251007134936.7291-2-jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19ACPICA: Allow to skip Global Lock initializationHuacai Chen2-0/+10
commit feb8ae81b2378b75a99c81d315602ac8918ed382 upstream. Introduce acpi_gbl_use_global_lock, which allows to skip the Global Lock initialization. This is useful for systems without Global Lock (such as loong_arch), so as to avoid error messages during boot phase: ACPI Error: Could not enable global_lock event (20240827/evxfevnt-182) ACPI Error: No response from Global Lock hardware, disabling lock (20240827/evglock-59) Link: https://github.com/acpica/acpica/commit/463cb0fe Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19ext4: validate ea_ino and size in check_xattrsDeepanshu Kartikey1-0/+4
commit 44d2a72f4d64655f906ba47a5e108733f59e6f28 upstream. During xattr block validation, check_xattrs() processes xattr entries without validating that entries claiming to use EA inodes have non-zero sizes. Corrupted filesystems may contain xattr entries where e_value_size is zero but e_value_inum is non-zero, indicating invalid xattr data. Add validation in check_xattrs() to detect this corruption pattern early and return -EFSCORRUPTED, preventing invalid xattr entries from causing issues throughout the ext4 codebase. Cc: stable@kernel.org Suggested-by: Theodore Ts'o <tytso@mit.edu> Reported-by: syzbot+4c9d23743a2409b80293@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=4c9d23743a2409b80293 Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Message-ID: <20250923133245.1091761-1-kartikey406@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19ext4: guard against EA inode refcount underflow in xattr updateAhmet Eray Karadag1-7/+8
commit 57295e835408d8d425bef58da5253465db3d6888 upstream. syzkaller found a path where ext4_xattr_inode_update_ref() reads an EA inode refcount that is already <= 0 and then applies ref_change (often -1). That lets the refcount underflow and we proceed with a bogus value, triggering errors like: EXT4-fs error: EA inode <n> ref underflow: ref_count=-1 ref_change=-1 EXT4-fs warning: ea_inode dec ref err=-117 Make the invariant explicit: if the current refcount is non-positive, treat this as on-disk corruption, emit ext4_error_inode(), and fail the operation with -EFSCORRUPTED instead of updating the refcount. Delete the WARN_ONCE() as negative refcounts are now impossible; keep error reporting in ext4_error_inode(). This prevents the underflow and the follow-on orphan/cleanup churn. Reported-by: syzbot+0be4f339a8218d2a5bb1@syzkaller.appspotmail.com Fixes: https://syzbot.org/bug?extid=0be4f339a8218d2a5bb1 Cc: stable@kernel.org Co-developed-by: Albin Babu Varghese <albinbabuvarghese20@gmail.com> Signed-off-by: Albin Babu Varghese <albinbabuvarghese20@gmail.com> Signed-off-by: Ahmet Eray Karadag <eraykrdg1@gmail.com> Message-ID: <20250920021342.45575-1-eraykrdg1@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>