summaryrefslogtreecommitdiff
path: root/arch/powerpc/include
AgeCommit message (Collapse)AuthorFilesLines
2026-03-19powerpc/uaccess: Fix inline assembly for clang build on PPC32Christophe Leroy (CS GROUP)1-1/+1
[ Upstream commit 0ee95a1d458630272d0415d0ffa9424fcb606c90 ] Test robot reports the following error with clang-16.0.6: In file included from kernel/rseq.c:75: include/linux/rseq_entry.h:141:3: error: invalid operand for instruction unsafe_get_user(offset, &ucs->post_commit_offset, efault); ^ include/linux/uaccess.h:608:2: note: expanded from macro 'unsafe_get_user' arch_unsafe_get_user(x, ptr, local_label); \ ^ arch/powerpc/include/asm/uaccess.h:518:2: note: expanded from macro 'arch_unsafe_get_user' __get_user_size_goto(__gu_val, __gu_addr, sizeof(*(p)), e); \ ^ arch/powerpc/include/asm/uaccess.h:284:2: note: expanded from macro '__get_user_size_goto' __get_user_size_allowed(x, ptr, size, __gus_retval); \ ^ arch/powerpc/include/asm/uaccess.h:275:10: note: expanded from macro '__get_user_size_allowed' case 8: __get_user_asm2(x, (u64 __user *)ptr, retval); break; \ ^ arch/powerpc/include/asm/uaccess.h:258:4: note: expanded from macro '__get_user_asm2' " li %1+1,0\n" \ ^ <inline asm>:7:5: note: instantiated into assembly here li 31+1,0 ^ 1 error generated. On PPC32, for 64 bits vars a pair of registers is used. Usually the lower register in the pair is the high part and the higher register is the low part. GCC uses r3/r4 ... r11/r12 ... r14/r15 ... r30/r31 In older kernel code inline assembly was using %1 and %1+1 to represent 64 bits values. However here it looks like clang uses r31 as high part, allthough r32 doesn't exist hence the error. Allthoug %1+1 should work, most places now use %L1 instead of %1+1, so let's do the same here. With that change, the build doesn't fail anymore and a disassembly shows clang uses r17/r18 and r31/r14 pair when GCC would have used r16/r17 and r30/r31: Disassembly of section .fixup: 00000000 <.fixup>: 0: 38 a0 ff f2 li r5,-14 4: 3a 20 00 00 li r17,0 8: 3a 40 00 00 li r18,0 c: 48 00 00 00 b c <.fixup+0xc> c: R_PPC_REL24 .text+0xbc 10: 38 a0 ff f2 li r5,-14 14: 3b e0 00 00 li r31,0 18: 39 c0 00 00 li r14,0 1c: 48 00 00 00 b 1c <.fixup+0x1c> 1c: R_PPC_REL24 .text+0x144 Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202602021825.otcItxGi-lkp@intel.com/ Fixes: c20beffeec3c ("powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()") Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Acked-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/8ca3a657a650e497a96bfe7acde2f637dadab344.1770103646.git.chleroy@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-26powerpc/eeh: fix recursive pci_lock_rescan_remove locking in EEH event handlingNarayana Murty N1-0/+2
[ Upstream commit 815a8d2feb5615ae7f0b5befd206af0b0160614c ] The recent commit 1010b4c012b0 ("powerpc/eeh: Make EEH driver device hotplug safe") restructured the EEH driver to improve synchronization with the PCI hotplug layer. However, it inadvertently moved pci_lock_rescan_remove() outside its intended scope in eeh_handle_normal_event(), leading to broken PCI error reporting and improper EEH event triggering. Specifically, eeh_handle_normal_event() acquired pci_lock_rescan_remove() before calling eeh_pe_bus_get(), but eeh_pe_bus_get() itself attempts to acquire the same lock internally, causing nested locking and disrupting normal EEH event handling paths. This patch adds a boolean parameter do_lock to _eeh_pe_bus_get(), with two public wrappers: eeh_pe_bus_get() with locking enabled. eeh_pe_bus_get_nolock() that skips locking. Callers that already hold pci_lock_rescan_remove() now use eeh_pe_bus_get_nolock() to avoid recursive lock acquisition. Additionally, pci_lock_rescan_remove() calls are restored to the correct position—after eeh_pe_bus_get() and immediately before iterating affected PEs and devices. This ensures EEH-triggered PCI removes occur under proper bus rescan locking without recursive lock contention. The eeh_pe_loc_get() function has been split into two functions: eeh_pe_loc_get(struct eeh_pe *pe) which retrieves the loc for given PE. eeh_pe_loc_get_bus(struct pci_bus *bus) which retrieves the location code for given bus. This resolves lockdep warnings such as: <snip> [ 84.964298] [ T928] ============================================ [ 84.964304] [ T928] WARNING: possible recursive locking detected [ 84.964311] [ T928] 6.18.0-rc3 #51 Not tainted [ 84.964315] [ T928] -------------------------------------------- [ 84.964320] [ T928] eehd/928 is trying to acquire lock: [ 84.964324] [ T928] c000000003b29d58 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_lock_rescan_remove+0x28/0x40 [ 84.964342] [ T928] but task is already holding lock: [ 84.964347] [ T928] c000000003b29d58 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_lock_rescan_remove+0x28/0x40 [ 84.964357] [ T928] other info that might help us debug this: [ 84.964363] [ T928] Possible unsafe locking scenario: [ 84.964367] [ T928] CPU0 [ 84.964370] [ T928] ---- [ 84.964373] [ T928] lock(pci_rescan_remove_lock); [ 84.964378] [ T928] lock(pci_rescan_remove_lock); [ 84.964383] [ T928] *** DEADLOCK *** [ 84.964388] [ T928] May be due to missing lock nesting notation [ 84.964393] [ T928] 1 lock held by eehd/928: [ 84.964397] [ T928] #0: c000000003b29d58 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_lock_rescan_remove+0x28/0x40 [ 84.964408] [ T928] stack backtrace: [ 84.964414] [ T928] CPU: 2 UID: 0 PID: 928 Comm: eehd Not tainted 6.18.0-rc3 #51 VOLUNTARY [ 84.964417] [ T928] Hardware name: IBM,9080-HEX POWER10 (architected) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_022) hv:phyp pSeries [ 84.964419] [ T928] Call Trace: [ 84.964420] [ T928] [c0000011a7157990] [c000000001705de4] dump_stack_lvl+0xc8/0x130 (unreliable) [ 84.964424] [ T928] [c0000011a71579d0] [c0000000002f66e0] print_deadlock_bug+0x430/0x440 [ 84.964428] [ T928] [c0000011a7157a70] [c0000000002fd0c0] __lock_acquire+0x1530/0x2d80 [ 84.964431] [ T928] [c0000011a7157ba0] [c0000000002fea54] lock_acquire+0x144/0x410 [ 84.964433] [ T928] [c0000011a7157cb0] [c0000011a7157cb0] __mutex_lock+0xf4/0x1050 [ 84.964436] [ T928] [c0000011a7157e00] [c000000000de21d8] pci_lock_rescan_remove+0x28/0x40 [ 84.964439] [ T928] [c0000011a7157e20] [c00000000004ed98] eeh_pe_bus_get+0x48/0xc0 [ 84.964442] [ T928] [c0000011a7157e50] [c000000000050434] eeh_handle_normal_event+0x64/0xa60 [ 84.964446] [ T928] [c0000011a7157f30] [c000000000051de8] eeh_event_handler+0xf8/0x190 [ 84.964450] [ T928] [c0000011a7157f90] [c0000000002747ac] kthread+0x16c/0x180 [ 84.964453] [ T928] [c0000011a7157fe0] [c00000000000ded8] start_kernel_thread+0x14/0x18 </snip> Fixes: 1010b4c012b0 ("powerpc/eeh: Make EEH driver device hotplug safe") Signed-off-by: Narayana Murty N <nnmlinux@linux.ibm.com> Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20251210142559.8874-1-nnmlinux@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-26powerpc/uaccess: Move barrier_nospec() out of allow_read_{from/write}_user()Christophe Leroy2-2/+4
[ Upstream commit 5fbc09eb0b4f4b1a4b33abebacbeee0d29f195e9 ] Commit 74e19ef0ff80 ("uaccess: Add speculation barrier to copy_from_user()") added a redundant barrier_nospec() in copy_from_user(), because powerpc is already calling barrier_nospec() in allow_read_from_user() and allow_read_write_user(). But on other architectures that call to barrier_nospec() was missing. So change powerpc instead of reverting the above commit and having to fix other architectures one by one. This is now possible because barrier_nospec() has also been added in copy_from_user_iter(). Move barrier_nospec() out of allow_read_from_user() and allow_read_write_user(). This will also allow reuse of those functions when implementing masked user access which doesn't require barrier_nospec(). Don't add it back in raw_copy_from_user() as it is already called by copy_from_user() and copy_from_user_iter(). Fixes: 74e19ef0ff80 ("uaccess: Add speculation barrier to copy_from_user()") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/f29612105c5fcbc8ceb7303808ddc1a781f0f6b5.1766574657.git.chleroy@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-22powerpc/32: Restore disabling of interrupts at interrupt/syscall exitChristophe Leroy (CS GROUP)2-1/+2
Commit 2997876c4a1a ("powerpc/32: Restore clearing of MSR[RI] at interrupt/syscall exit") delayed clearing of MSR[RI], but missed that both MSR[RI] and MSR[EE] are cleared at the same time, so the commit also delayed the disabling of interrupts, leading to unexpected behaviour. To fix that, mostly revert the blamed commit and restore the clearing of MSR[RI] in interrupt_exit_kernel_prepare() instead. For 8xx it implies adding a synchronising instruction after the mtspr in order to make sure no instruction counter interrupt (used for perf events) will fire just after clearing MSR[RI]. Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Closes: https://lore.kernel.org/all/4d0bd05d-6158-1323-3509-744d3fbe8fc7@xenosoft.de/ Reported-by: Guenter Roeck <linux@roeck-us.net> Closes: https://lore.kernel.org/all/6b05eb1c-fdef-44e0-91a7-8286825e68f1@roeck-us.net/ Fixes: 2997876c4a1a ("powerpc/32: Restore clearing of MSR[RI] at interrupt/syscall exit") Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/585ea521b2be99d293b539bbfae148366cfb3687.1766146895.git.chleroy@kernel.org
2025-12-06Merge tag 'mm-nonmm-stable-2025-12-06-11-14' of ↵Linus Torvalds1-0/+8
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "panic: sys_info: Refactor and fix a potential issue" (Andy Shevchenko) fixes a build issue and does some cleanup in ib/sys_info.c - "Implement mul_u64_u64_div_u64_roundup()" (David Laight) enhances the 64-bit math code on behalf of a PWM driver and beefs up the test module for these library functions - "scripts/gdb/symbols: make BPF debug info available to GDB" (Ilya Leoshkevich) makes BPF symbol names, sizes, and line numbers available to the GDB debugger - "Enable hung_task and lockup cases to dump system info on demand" (Feng Tang) adds a sysctl which can be used to cause additional info dumping when the hung-task and lockup detectors fire - "lib/base64: add generic encoder/decoder, migrate users" (Kuan-Wei Chiu) adds a general base64 encoder/decoder to lib/ and migrates several users away from their private implementations - "rbree: inline rb_first() and rb_last()" (Eric Dumazet) makes TCP a little faster - "liveupdate: Rework KHO for in-kernel users" (Pasha Tatashin) reworks the KEXEC Handover interfaces in preparation for Live Update Orchestrator (LUO), and possibly for other future clients - "kho: simplify state machine and enable dynamic updates" (Pasha Tatashin) increases the flexibility of KEXEC Handover. Also preparation for LUO - "Live Update Orchestrator" (Pasha Tatashin) is a major new feature targeted at cloud environments. Quoting the cover letter: This series introduces the Live Update Orchestrator, a kernel subsystem designed to facilitate live kernel updates using a kexec-based reboot. This capability is critical for cloud environments, allowing hypervisors to be updated with minimal downtime for running virtual machines. LUO achieves this by preserving the state of selected resources, such as memory, devices and their dependencies, across the kernel transition. As a key feature, this series includes support for preserving memfd file descriptors, which allows critical in-memory data, such as guest RAM or any other large memory region, to be maintained in RAM across the kexec reboot. Mike Rappaport merits a mention here, for his extensive review and testing work. - "kexec: reorganize kexec and kdump sysfs" (Sourabh Jain) moves the kexec and kdump sysfs entries from /sys/kernel/ to /sys/kernel/kexec/ and adds back-compatibility symlinks which can hopefully be removed one day - "kho: fixes for vmalloc restoration" (Mike Rapoport) fixes a BUG which was being hit during KHO restoration of vmalloc() regions * tag 'mm-nonmm-stable-2025-12-06-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (139 commits) calibrate: update header inclusion Reinstate "resource: avoid unnecessary lookups in find_next_iomem_res()" vmcoreinfo: track and log recoverable hardware errors kho: fix restoring of contiguous ranges of order-0 pages kho: kho_restore_vmalloc: fix initialization of pages array MAINTAINERS: TPM DEVICE DRIVER: update the W-tag init: replace simple_strtoul with kstrtoul to improve lpj_setup KHO: fix boot failure due to kmemleak access to non-PRESENT pages Documentation/ABI: new kexec and kdump sysfs interface Documentation/ABI: mark old kexec sysfs deprecated kexec: move sysfs entries to /sys/kernel/kexec test_kho: always print restore status kho: free chunks using free_page() instead of kfree() selftests/liveupdate: add kexec test for multiple and empty sessions selftests/liveupdate: add simple kexec-based selftest for LUO selftests/liveupdate: add userspace API selftests docs: add documentation for memfd preservation via LUO mm: memfd_luo: allow preserving memfd liveupdate: luo_file: add private argument to store runtime state mm: shmem: export some functions to internal.h ...
2025-12-06Merge tag 'dma-mapping-6.19-2025-12-05' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping updates from Marek Szyprowski: - More DMA mapping API refactoring to physical addresses as the primary interface instead of page+offset parameters. This time dma_map_ops callbacks are converted to physical addresses, what in turn results also in some simplification of architecture specific code (Leon Romanovsky and Jason Gunthorpe) - Clarify that dma_map_benchmark is not a kernel self-test, but standalone tool (Qinxin Xia) * tag 'dma-mapping-6.19-2025-12-05' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: dma-mapping: remove unused map_page callback xen: swiotlb: Convert mapping routine to rely on physical address x86: Use physical address for DMA mapping sparc: Use physical address DMA mapping powerpc: Convert to physical address DMA mapping parisc: Convert DMA map_page to map_phys interface MIPS/jazzdma: Provide physical address directly alpha: Convert mapping routine to rely on physical address dma-mapping: remove unused mapping resource callbacks xen: swiotlb: Switch to physical address mapping callbacks ARM: dma-mapping: Switch to physical address mapping callbacks ARM: dma-mapping: Reduce struct page exposure in arch_sync_dma*() dma-mapping: convert dummy ops to physical address mapping dma-mapping: prepare dma_map_ops to conversion to physical address tools/dma: move dma_map_benchmark from selftests to tools/dma
2025-12-05Merge tag 'powerpc-6.19-1' of ↵Linus Torvalds4-6/+13
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Restore clearing of MSR[RI] at interrupt/syscall exit on 32-bit - Fix unpaired stwcx on interrupt exit on 32-bit - Fix race condition leading to double list-add in mac_hid_toggle_emumouse() - Fix mprotect on book3s 32-bit - Fix SLB multihit issue during SLB preload with 64-bit hash MMU - Add support for crashkernel CMA reservation - Add die_id and die_cpumask for Power10 & later to expose chip hemispheres - A series of minor fixes and improvements to the hash SLB code Thanks to Antonio Alvarez Feijoo, Ben Collins, Bhaskar Chowdhury, Christophe Leroy, Daniel Thompson, Dave Vasilevsky, Donet Tom, J. Neuschäfer, Kunwu Chan, Long Li, Naresh Kamboju, Nathan Chancellor, Ritesh Harjani (IBM), Shirisha G, Shrikanth Hegde, Sourabh Jain, Srikar Dronamraju, Stephen Rothwell, Thomas Zimmermann, Venkat Rao Bagalkote, and Vishal Chourasia. * tag 'powerpc-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (32 commits) macintosh/via-pmu-backlight: Include <linux/fb.h> and <linux/of.h> powerpc/powermac: backlight: Include <linux/of.h> powerpc/64s/slb: Add no_slb_preload early cmdline param powerpc/64s/slb: Make preload_add return type as void powerpc/ptdump: Dump PXX level info for kernel_page_tables powerpc/64s/pgtable: Enable directMap counters in meminfo for Hash powerpc/64s/hash: Update directMap page counters for Hash powerpc/64s/hash: Hash hpt_order should be only available with Hash MMU powerpc/64s/hash: Improve hash mmu printk messages powerpc/64s/hash: Fix phys_addr_t printf format in htab_initialize() powerpc/64s/ptdump: Fix kernel_hash_pagetable dump for ISA v3.00 HPTE format powerpc/64s/hash: Restrict stress_hpt_struct memblock region to within RMA limit powerpc/64s/slb: Fix SLB multihit issue during SLB preload powerpc, mm: Fix mprotect on book3s 32-bit powerpc/smp: Expose die_id and die_cpumask powerpc/83xx: Add a null pointer check to mcu_gpiochip_add arch:powerpc:tools This file was missing shebang line, so added it kexec: Include kernel-end even without crashkernel powerpc: p2020: Rename wdt@ nodes to watchdog@ powerpc: 86xx: Rename wdt@ nodes to watchdog@ ...
2025-12-04Merge tag 'iommu-updates-v6.19' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu updates from Joerg Roedel: - Introduction of the generic IO page-table framework with support for Intel and AMD IOMMU formats from Jason. This has good potential for unifying more IO page-table implementations and making future enhancements more easy. But this also needed quite some fixes during development. All known issues have been fixed, but my feeling is that there is a higher potential than usual that more might be needed. - Intel VT-d updates: - Use right invalidation hint in qi_desc_iotlb() - Reduce the scope of INTEL_IOMMU_FLOPPY_WA - ARM-SMMU updates: - Qualcomm device-tree binding updates for Kaanapali and Glymur SoCs and a new clock for the TBU. - Fix error handling if level 1 CD table allocation fails. - Permit more than the architectural maximum number of SMRs for funky Qualcomm mis-implementations of SMMUv2. - Mediatek driver: - MT8189 iommu support - Move ARM IO-pgtable selftests to kunit - Device leak fixes for a couple of drivers - Random smaller fixes and improvements * tag 'iommu-updates-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (81 commits) iommupt/vtd: Support mgaw's less than a 4 level walk for first stage iommupt/vtd: Allow VT-d to have a larger table top than the vasz requires powerpc/pseries/svm: Make mem_encrypt.h self contained genpt: Make GENERIC_PT invisible iommupt: Avoid a compiler bug with sw_bit iommu/arm-smmu-qcom: Enable use of all SMR groups when running bare-metal iommupt: Fix unlikely flows in increase_top() iommu/amd: Propagate the error code returned by __modify_irte_ga() in modify_irte_ga() MAINTAINERS: Update my email address iommu/arm-smmu-v3: Fix error check in arm_smmu_alloc_cd_tables dt-bindings: iommu: qcom_iommu: Allow 'tbu' clock iommu/vt-d: Restore previous domain::aperture_end calculation iommu/vt-d: Fix unused invalidation hint in qi_desc_iotlb iommu/vt-d: Set INTEL_IOMMU_FLOPPY_WA depend on BLK_DEV_FD iommu/tegra: fix device leak on probe_device() iommu/sun50i: fix device leak on of_xlate() iommu/omap: simplify probe_device() error handling iommu/omap: fix device leaks on probe_device() iommu/mediatek-v1: add missing larb count sanity check iommu/mediatek-v1: fix device leaks on probe() ...
2025-12-02Merge tag 'core-uaccess-2025-11-30' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scoped user access updates from Thomas Gleixner: "Scoped user mode access and related changes: - Implement the missing u64 user access function on ARM when CONFIG_CPU_SPECTRE=n. This makes it possible to access a 64bit value in generic code with [unsafe_]get_user(). All other architectures and ARM variants provide the relevant accessors already. - Ensure that ASM GOTO jump label usage in the user mode access helpers always goes through a local C scope label indirection inside the helpers. This is required because compilers are not supporting that a ASM GOTO target leaves a auto cleanup scope. GCC silently fails to emit the cleanup invocation and CLANG fails the build. [ Editor's note: gcc-16 will have fixed the code generation issue in commit f68fe3ddda4 ("eh: Invoke cleanups/destructors in asm goto jumps [PR122835]"). But we obviously have to deal with clang and older versions of gcc, so.. - Linus ] This provides generic wrapper macros and the conversion of affected architecture code to use them. - Scoped user mode access with auto cleanup Access to user mode memory can be required in hot code paths, but if it has to be done with user controlled pointers, the access is shielded with a speculation barrier, so that the CPU cannot speculate around the address range check. Those speculation barriers impact performance quite significantly. This cost can be avoided by "masking" the provided pointer so it is guaranteed to be in the valid user memory access range and otherwise to point to a guaranteed unpopulated address space. This has to be done without branches so it creates an address dependency for the access, which the CPU cannot speculate ahead. This results in repeating and error prone programming patterns: if (can_do_masked_user_access()) from = masked_user_read_access_begin((from)); else if (!user_read_access_begin(from, sizeof(*from))) return -EFAULT; unsafe_get_user(val, from, Efault); user_read_access_end(); return 0; Efault: user_read_access_end(); return -EFAULT; which can be replaced with scopes and automatic cleanup: scoped_user_read_access(from, Efault) unsafe_get_user(val, from, Efault); return 0; Efault: return -EFAULT; - Convert code which implements the above pattern over to scope_user.*.access(). This also corrects a couple of imbalanced masked_*_begin() instances which are harmless on most architectures, but prevent PowerPC from implementing the masking optimization. - Add a missing speculation barrier in copy_from_user_iter()" * tag 'core-uaccess-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: lib/strn*,uaccess: Use masked_user_{read/write}_access_begin when required scm: Convert put_cmsg() to scoped user access iov_iter: Add missing speculation barrier to copy_from_user_iter() iov_iter: Convert copy_from_user_iter() to masked user access select: Convert to scoped user access x86/futex: Convert to scoped user access futex: Convert to get/put_user_inline() uaccess: Provide put/get_user_inline() uaccess: Provide scoped user access regions arm64: uaccess: Use unsafe wrappers for ASM GOTO s390/uaccess: Use unsafe wrappers for ASM GOTO riscv/uaccess: Use unsafe wrappers for ASM GOTO powerpc/uaccess: Use unsafe wrappers for ASM GOTO x86/uaccess: Use unsafe wrappers for ASM GOTO uaccess: Provide ASM GOTO safe wrappers for unsafe_*_user() ARM: uaccess: Implement missing __get_user_asm_dword()
2025-11-28powerpc/pseries/svm: Make mem_encrypt.h self containedJason Gunthorpe1-0/+3
Add the missing forward declarations and includes so it does not have implicit dependencies. mem_encrypt.h is a public header imported by drivers. Users should not have to guess what include files are needed. Resolves a kbuild splat: In file included from drivers/iommu/generic_pt/fmt/iommu_amdv1.c:15: In file included from drivers/iommu/generic_pt/fmt/iommu_template.h:36: In file included from drivers/iommu/generic_pt/fmt/amdv1.h:23: In file included from include/linux/mem_encrypt.h:17: >> arch/powerpc/include/asm/mem_encrypt.h:13:49: warning: declaration of 'struct device' will not be visible outside of this function [-Wvisibility] 13 | static inline bool force_dma_unencrypted(struct device *dev) Fixes: 879ced2bab1b ("iommupt: Add the AMD IOMMU v1 page table format") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202511161358.rS5pSb3U-lkp@intel.com/ Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-11-21Merge branch 'objtool/core'Peter Zijlstra108-289/+346
Bring in the UDB and objtool data annotations to avoid conflicts while further extending the bug exceptions. Signed-off-by: Peter Zijlstra <peterz@infradead.org>
2025-11-18powerpc/64s/slb: Fix SLB multihit issue during SLB preloadDonet Tom1-1/+0
On systems using the hash MMU, there is a software SLB preload cache that mirrors the entries loaded into the hardware SLB buffer. This preload cache is subject to periodic eviction — typically after every 256 context switches — to remove old entry. To optimize performance, the kernel skips switch_mmu_context() in switch_mm_irqs_off() when the prev and next mm_struct are the same. However, on hash MMU systems, this can lead to inconsistencies between the hardware SLB and the software preload cache. If an SLB entry for a process is evicted from the software cache on one CPU, and the same process later runs on another CPU without executing switch_mmu_context(), the hardware SLB may retain stale entries. If the kernel then attempts to reload that entry, it can trigger an SLB multi-hit error. The following timeline shows how stale SLB entries are created and can cause a multi-hit error when a process moves between CPUs without a MMU context switch. CPU 0 CPU 1 ----- ----- Process P exec swapper/1 load_elf_binary begin_new_exc activate_mm switch_mm_irqs_off switch_mmu_context switch_slb /* * This invalidates all * the entries in the HW * and setup the new HW * SLB entries as per the * preload cache. */ context_switch sched_migrate_task migrates process P to cpu-1 Process swapper/0 context switch (to process P) (uses mm_struct of Process P) switch_mm_irqs_off() switch_slb load_slb++ /* * load_slb becomes 0 here * and we evict an entry from * the preload cache with * preload_age(). We still * keep HW SLB and preload * cache in sync, that is * because all HW SLB entries * anyways gets evicted in * switch_slb during SLBIA. * We then only add those * entries back in HW SLB, * which are currently * present in preload_cache * (after eviction). */ load_elf_binary continues... setup_new_exec() slb_setup_new_exec() sched_switch event sched_migrate_task migrates process P to cpu-0 context_switch from swapper/0 to Process P switch_mm_irqs_off() /* * Since both prev and next mm struct are same we don't call * switch_mmu_context(). This will cause the HW SLB and SW preload * cache to go out of sync in preload_new_slb_context. Because there * was an SLB entry which was evicted from both HW and preload cache * on cpu-1. Now later in preload_new_slb_context(), when we will try * to add the same preload entry again, we will add this to the SW * preload cache and then will add it to the HW SLB. Since on cpu-0 * this entry was never invalidated, hence adding this entry to the HW * SLB will cause a SLB multi-hit error. */ load_elf_binary continues... START_THREAD start_thread preload_new_slb_context /* * This tries to add a new EA to preload cache which was earlier * evicted from both cpu-1 HW SLB and preload cache. This caused the * HW SLB of cpu-0 to go out of sync with the SW preload cache. The * reason for this was, that when we context switched back on CPU-0, * we should have ideally called switch_mmu_context() which will * bring the HW SLB entries on CPU-0 in sync with SW preload cache * entries by setting up the mmu context properly. But we didn't do * that since the prev mm_struct running on cpu-0 was same as the * next mm_struct (which is true for swapper / kernel threads). So * now when we try to add this new entry into the HW SLB of cpu-0, * we hit a SLB multi-hit error. */ WARNING: CPU: 0 PID: 1810970 at arch/powerpc/mm/book3s64/slb.c:62 assert_slb_presence+0x2c/0x50(48 results) 02:47:29 [20157/42149] Modules linked in: CPU: 0 UID: 0 PID: 1810970 Comm: dd Not tainted 6.16.0-rc3-dirty #12 VOLUNTARY Hardware name: IBM pSeries (emulated by qemu) POWER8 (architected) 0x4d0200 0xf000004 of:SLOF,HEAD hv:linux,kvm pSeries NIP: c00000000015426c LR: c0000000001543b4 CTR: 0000000000000000 REGS: c0000000497c77e0 TRAP: 0700 Not tainted (6.16.0-rc3-dirty) MSR: 8000000002823033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE> CR: 28888482 XER: 00000000 CFAR: c0000000001543b0 IRQMASK: 3 <...> NIP [c00000000015426c] assert_slb_presence+0x2c/0x50 LR [c0000000001543b4] slb_insert_entry+0x124/0x390 Call Trace: 0x7fffceb5ffff (unreliable) preload_new_slb_context+0x100/0x1a0 start_thread+0x26c/0x420 load_elf_binary+0x1b04/0x1c40 bprm_execve+0x358/0x680 do_execveat_common+0x1f8/0x240 sys_execve+0x58/0x70 system_call_exception+0x114/0x300 system_call_common+0x160/0x2c4 >From the above analysis, during early exec the hardware SLB is cleared, and entries from the software preload cache are reloaded into hardware by switch_slb. However, preload_new_slb_context and slb_setup_new_exec also attempt to load some of the same entries, which can trigger a multi-hit. In most cases, these additional preloads simply hit existing entries and add nothing new. Removing these functions avoids redundant preloads and eliminates the multi-hit issue. This patch removes these two functions. We tested process switching performance using the context_switch benchmark on POWER9/hash, and observed no regression. Without this patch: 129041 ops/sec With this patch: 129341 ops/sec We also measured SLB faults during boot, and the counts are essentially the same with and without this patch. SLB faults without this patch: 19727 SLB faults with this patch: 19786 Fixes: 5434ae74629a ("powerpc/64s/hash: Add a SLB preload cache") cc: stable@vger.kernel.org Suggested-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Donet Tom <donettom@linux.ibm.com> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/0ac694ae683494fe8cadbd911a1a5018d5d3c541.1761834163.git.ritesh.list@gmail.com
2025-11-18powerpc, mm: Fix mprotect on book3s 32-bitDave Vasilevsky1-1/+4
On 32-bit book3s with hash-MMUs, tlb_flush() was a no-op. This was unnoticed because all uses until recently were for unmaps, and thus handled by __tlb_remove_tlb_entry(). After commit 4a18419f71cd ("mm/mprotect: use mmu_gather") in kernel 5.19, tlb_gather_mmu() started being used for mprotect as well. This caused mprotect to simply not work on these machines: int *ptr = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0); *ptr = 1; // force HPTE to be created mprotect(ptr, 4096, PROT_READ); *ptr = 2; // should segfault, but succeeds Fixed by making tlb_flush() actually flush TLB pages. This finally agrees with the behaviour of boot3s64's tlb_flush(). Fixes: 4a18419f71cd ("mm/mprotect: use mmu_gather") Cc: stable@vger.kernel.org Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Dave Vasilevsky <dave@vasilevsky.ca> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20251116-vasi-mprotect-g3-v3-1-59a9bd33ba00@vasilevsky.ca
2025-11-14powerpc/smp: Expose die_id and die_cpumaskSrikar Dronamraju1-4/+7
>From Power10 processors onwards, each chip has 2 hemispheres. For LPARs running on PowerVM Hypervisor, hypervisor determines the allocation of CPU groups to each LPAR, resulting in two LPARs with the same number of CPUs potentially having different numbers of CPUs from each hemisphere. Additionally, it is not feasible to ascertain the hemisphere based solely on the CPU number. Users wishing to assign their workload to all CPUs, or a subset of CPUs within a specific hemisphere, encounter difficulties in identifying the cpumask. To address this, it is proposed to expose hemisphere information as a die in sysfs. This aligns with other architectures and facilitates the identification of CPUs within the same hemisphere. Tools such as lstopo can also access this information. Please note: The hypervisor reveals the locality of the CPUs to hemispheres only in dedicated mode. Consequently, in systems where hemisphere information is unavailable, such as shared LPARs, the die_cpus information in sysfs will mirror package_cpus, with die_id set to -1. Without this change. $ grep . /sys/devices/system/cpu/cpu16/topology/{die*,package*} 2>/dev/null /sys/devices/system/cpu/cpu16/topology/package_cpus:000000,000000ff,ffff0000 /sys/devices/system/cpu/cpu16/topology/package_cpus_list:16-39 With this change. $ grep . /sys/devices/system/cpu/cpu16/topology/{die*,package*} 2>/dev/null /sys/devices/system/cpu/cpu16/topology/die_cpus:000000,00000000,00ff0000 /sys/devices/system/cpu/cpu16/topology/die_cpus_list:16-23 /sys/devices/system/cpu/cpu16/topology/die_id:2 /sys/devices/system/cpu/cpu16/topology/package_cpus:000000,000000ff,ffff0000 /sys/devices/system/cpu/cpu16/topology/package_cpus_list:16-39 snipped lstopo-no-graphics o/p Group0 L#0 (total=8747584KB) Package L#0 (total=3564096KB CPUModel="POWER10 (architected), altivec supported" CPURevision="2.0 (pvr 0080 0200)") NUMANode L#0 (P#0 local=3564096KB total=3564096KB) Die L#0 (P#0) Core L#0 (P#0) <snipped> Package L#1 (total=5183488KB CPUModel="POWER10 (architected), altivec supported" CPURevision="2.0 (pvr 0080 0200)") NUMANode L#1 (P#1 local=5183488KB total=5183488KB) Die L#2 (P#2) Core L#2 (P#16) L3Cache L#4 (size=4096KB linesize=128 ways=16) L2Cache L#4 (size=1024KB linesize=128 ways=8) L1dCache L#4 (size=32KB linesize=128 ways=8) L1iCache L#4 (size=48KB linesize=128 ways=6) PU L#16 (P#16) PU L#17 (P#18) PU L#18 (P#20) PU L#19 (P#22) L3Cache L#5 (size=4096KB linesize=128 ways=16) L2Cache L#5 (size=1024KB linesize=128 ways=8) L1dCache L#5 (size=32KB linesize=128 ways=8) L1iCache L#5 (size=48KB linesize=128 ways=6) PU L#20 (P#17) PU L#21 (P#19) PU L#22 (P#21) PU L#23 (P#23) Die L#3 (P#3) Core L#3 (P#24) L3Cache L#6 (size=4096KB linesize=128 ways=16) L2Cache L#6 (size=1024KB linesize=128 ways=8) L1dCache L#6 (size=32KB linesize=128 ways=8) L1iCache L#6 (size=48KB linesize=128 ways=6) PU L#24 (P#24) PU L#25 (P#26) PU L#26 (P#28) PU L#27 (P#30) L3Cache L#7 (size=4096KB linesize=128 ways=16) L2Cache L#7 (size=1024KB linesize=128 ways=8) L1dCache L#7 (size=32KB linesize=128 ways=8) L1iCache L#7 (size=48KB linesize=128 ways=6) PU L#28 (P#25) PU L#29 (P#27) PU L#30 (P#29) PU L#31 (P#31) Core L#4 (P#32) L3Cache L#8 (size=4096KB linesize=128 ways=16) L2Cache L#8 (size=1024KB linesize=128 ways=8) L1dCache L#8 (size=32KB linesize=128 ways=8) L1iCache L#8 (size=48KB linesize=128 ways=6) PU L#32 (P#32) PU L#33 (P#34) PU L#34 (P#36) PU L#35 (P#38) L3Cache L#9 (size=4096KB linesize=128 ways=16) L2Cache L#9 (size=1024KB linesize=128 ways=8) L1dCache L#9 (size=32KB linesize=128 ways=8) L1iCache L#9 (size=48KB linesize=128 ways=6) PU L#36 (P#33) PU L#37 (P#35) PU L#38 (P#37) PU L#39 (P#39) Group0 L#1 (total=7736896KB) Package L#2 (total=5170880KB CPUModel="POWER10 (architected), altivec supported" CPURevision="2.0 (pvr 0080 0200)") NUMANode L#2 (P#2 local=5170880KB total=5170880KB) Die L#4 (P#4) <snipped> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> Signed-off-by: Srikar Dronamraju <srikar@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20251112074859.814087-1-srikar@linux.ibm.com
2025-11-12crash: let architecture decide crash memory export to iomem_resourceSourabh Jain1-0/+8
With the generic crashkernel reservation, the kernel emits the following warning on powerpc: WARNING: CPU: 0 PID: 1 at arch/powerpc/mm/mem.c:341 add_system_ram_resources+0xfc/0x180 Modules linked in: CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.17.0-auto-12607-g5472d60c129f #1 VOLUNTARY Hardware name: IBM,9080-HEX Power11 (architected) 0x820200 0xf000007 of:IBM,FW1110.01 (NH1110_069) hv:phyp pSeries NIP: c00000000201de3c LR: c00000000201de34 CTR: 0000000000000000 REGS: c000000127cef8a0 TRAP: 0700 Not tainted (6.17.0-auto-12607-g5472d60c129f) MSR: 8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 84000840 XER: 20040010 CFAR: c00000000017eed0 IRQMASK: 0 GPR00: c00000000201de34 c000000127cefb40 c0000000016a8100 0000000000000001 GPR04: c00000012005aa00 0000000020000000 c000000002b705c8 0000000000000000 GPR08: 000000007fffffff fffffffffffffff0 c000000002db8100 000000011fffffff GPR12: c00000000201dd40 c000000002ff0000 c0000000000112bc 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 c0000000015a3808 GPR24: c00000000200468c c000000001699888 0000000000000106 c0000000020d1950 GPR28: c0000000014683f8 0000000081000200 c0000000015c1868 c000000002b9f710 NIP [c00000000201de3c] add_system_ram_resources+0xfc/0x180 LR [c00000000201de34] add_system_ram_resources+0xf4/0x180 Call Trace: add_system_ram_resources+0xf4/0x180 (unreliable) do_one_initcall+0x60/0x36c do_initcalls+0x120/0x220 kernel_init_freeable+0x23c/0x390 kernel_init+0x34/0x26c ret_from_kernel_user_thread+0x14/0x1c This warning occurs due to a conflict between crashkernel and System RAM iomem resources. The generic crashkernel reservation adds the crashkernel memory range to /proc/iomem during early initialization. Later, all memblock ranges are added to /proc/iomem as System RAM. If the crashkernel region overlaps with any memblock range, it causes a conflict while adding those memblock regions as iomem resources, triggering the above warning. The conflicting memblock regions are then omitted from /proc/iomem. For example, if the following crashkernel region is added to /proc/iomem: 20000000-11fffffff : Crash kernel then the following memblock regions System RAM regions fail to be inserted: 00000000-7fffffff : System RAM 80000000-257fffffff : System RAM Fix this by not adding the crashkernel memory to /proc/iomem on powerpc. Introduce an architecture hook to let each architecture decide whether to export the crashkernel region to /proc/iomem. For more info checkout commit c40dd2f766440 ("powerpc: Add System RAM to /proc/iomem") and commit bce074bdbc36 ("powerpc: insert System RAM resource to prevent crashkernel conflict") Note: Before switching to the generic crashkernel reservation, powerpc never exported the crashkernel region to /proc/iomem. Link: https://lkml.kernel.org/r/20251016142831.144515-1-sourabhjain@linux.ibm.com Fixes: e3185ee438c2 ("powerpc/crash: use generic crashkernel reservation"). Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Closes: https://lore.kernel.org/all/90937fe0-2e76-4c82-b27e-7b8a7fe3ac69@linux.ibm.com/ Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Baoquan he <bhe@redhat.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-11powerpc/kdump: Add support for crashkernel CMA reservationSourabh Jain1-0/+2
Commit 35c18f2933c5 ("Add a new optional ",cma" suffix to the crashkernel= command line option") and commit ab475510e042 ("kdump: implement reserve_crashkernel_cma") added CMA support for kdump crashkernel reservation. Extend crashkernel CMA reservation support to powerpc. The following changes are made to enable CMA reservation on powerpc: - Parse and obtain the CMA reservation size along with other crashkernel parameters - Call reserve_crashkernel_cma() to allocate the CMA region for kdump - Include the CMA-reserved ranges in the usable memory ranges for the kdump kernel to use. - Exclude the CMA-reserved ranges from the crash kernel memory to prevent them from being exported through /proc/vmcore. With the introduction of the CMA crashkernel regions, crash_exclude_mem_range() needs to be called multiple times to exclude both crashk_res and crashk_cma_ranges from the crash memory ranges. To avoid repetitive logic for validating mem_ranges size and handling reallocation when required, this functionality is moved to a new wrapper function crash_exclude_mem_range_guarded(). To ensure proper CMA reservation, reserve_crashkernel_cma() is called after pageblock_order is initialized. Update kernel-parameters.txt to document CMA support for crashkernel on powerpc architecture. Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20251107080334.708028-1-sourabhjain@linux.ibm.com
2025-11-03powerpc/uaccess: Use unsafe wrappers for ASM GOTOThomas Gleixner1-4/+4
ASM GOTO is miscompiled by GCC when it is used inside a auto cleanup scope: bool foo(u32 __user *p, u32 val) { scoped_guard(pagefault) unsafe_put_user(val, p, efault); return true; efault: return false; } It ends up leaking the pagefault disable counter in the fault path. clang at least fails the build. Rename unsafe_*_user() to arch_unsafe_*_user() which makes the generic uaccess header wrap it with a local label that makes both compilers emit correct code. Same for the kernel_nofault() variants. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027083745.356628509@linutronix.de
2025-10-29powerpc: Convert to physical address DMA mappingLeon Romanovsky1-4/+4
Adapt PowerPC DMA to use physical addresses in order to prepare code to removal .map_page and .unmap_page. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20251015-remove-map-page-v5-10-3bbfe3a25cdf@kernel.org
2025-10-06Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2-1/+15
Pull x86 kvm updates from Paolo Bonzini: "Generic: - Rework almost all of KVM's exports to expose symbols only to KVM's x86 vendor modules (kvm-{amd,intel}.ko and PPC's kvm-{pr,hv}.ko x86: - Rework almost all of KVM x86's exports to expose symbols only to KVM's vendor modules, i.e. to kvm-{amd,intel}.ko - Add support for virtualizing Control-flow Enforcement Technology (CET) on Intel (Shadow Stacks and Indirect Branch Tracking) and AMD (Shadow Stacks). It is worth noting that while SHSTK and IBT can be enabled separately in CPUID, it is not really possible to virtualize them separately. Therefore, Intel processors will really allow both SHSTK and IBT under the hood if either is made visible in the guest's CPUID. The alternative would be to intercept XSAVES/XRSTORS, which is not feasible for performance reasons - Fix a variety of fuzzing WARNs all caused by checking L1 intercepts when completing userspace I/O. KVM has already committed to allowing L2 to to perform I/O at that point - Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 guests, as the MSR is supposed to exist for v2 PMUs - Allow Centaur CPU leaves (base 0xC000_0000) for Zhaoxin CPUs - Add support for the immediate forms of RDMSR and WRMSRNS, sans full emulator support (KVM should never need to emulate the MSRs outside of forced emulation and other contrived testing scenarios) - Clean up the MSR APIs in preparation for CET and FRED virtualization, as well as mediated vPMU support - Clean up a pile of PMU code in anticipation of adding support for mediated vPMUs - Reject in-kernel IOAPIC/PIT for TDX VMs, as KVM can't obtain EOI vmexits needed to faithfully emulate an I/O APIC for such guests - Many cleanups and minor fixes - Recover possible NX huge pages within the TDP MMU under read lock to reduce guest jitter when restoring NX huge pages - Return -EAGAIN during prefault if userspace concurrently deletes/moves the relevant memslot, to fix an issue where prefaulting could deadlock with the memslot update x86 (AMD): - Enable AVIC by default for Zen4+ if x2AVIC (and other prereqs) is supported - Require a minimum GHCB version of 2 when starting SEV-SNP guests via KVM_SEV_INIT2 so that invalid GHCB versions result in immediate errors instead of latent guest failures - Add support for SEV-SNP's CipherText Hiding, an opt-in feature that prevents unauthorized CPU accesses from reading the ciphertext of SNP guest private memory, e.g. to attempt an offline attack. This feature splits the shared SEV-ES/SEV-SNP ASID space into separate ranges for SEV-ES and SEV-SNP guests, therefore a new module parameter is needed to control the number of ASIDs that can be used for VMs with CipherText Hiding vs. how many can be used to run SEV-ES guests - Add support for Secure TSC for SEV-SNP guests, which prevents the untrusted host from tampering with the guest's TSC frequency, while still allowing the the VMM to configure the guest's TSC frequency prior to launch - Validate the XCR0 provided by the guest (via the GHCB) to avoid bugs resulting from bogus XCR0 values - Save an SEV guest's policy if and only if LAUNCH_START fully succeeds to avoid leaving behind stale state (thankfully not consumed in KVM) - Explicitly reject non-positive effective lengths during SNP's LAUNCH_UPDATE instead of subtly relying on guest_memfd to deal with them - Reload the pre-VMRUN TSC_AUX on #VMEXIT for SEV-ES guests, not the host's desired TSC_AUX, to fix a bug where KVM was keeping a different vCPU's TSC_AUX in the host MSR until return to userspace KVM (Intel): - Preparation for FRED support - Don't retry in TDX's anti-zero-step mitigation if the target memslot is invalid, i.e. is being deleted or moved, to fix a deadlock scenario similar to the aforementioned prefaulting case - Misc bugfixes and minor cleanups" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (142 commits) KVM: x86: Export KVM-internal symbols for sub-modules only KVM: x86: Drop pointless exports of kvm_arch_xxx() hooks KVM: x86: Move kvm_intr_is_single_vcpu() to lapic.c KVM: Export KVM-internal symbols for sub-modules only KVM: s390/vfio-ap: Use kvm_is_gpa_in_memslot() instead of open coded equivalent KVM: VMX: Make CR4.CET a guest owned bit KVM: selftests: Verify MSRs are (not) in save/restore list when (un)supported KVM: selftests: Add coverage for KVM-defined registers in MSRs test KVM: selftests: Add KVM_{G,S}ET_ONE_REG coverage to MSRs test KVM: selftests: Extend MSRs test to validate vCPUs without supported features KVM: selftests: Add support for MSR_IA32_{S,U}_CET to MSRs test KVM: selftests: Add an MSR test to exercise guest/host and read/write KVM: x86: Define AMD's #HV, #VC, and #SX exception vectors KVM: x86: Define Control Protection Exception (#CP) vector KVM: x86: Add human friendly formatting for #XM, and #VE KVM: SVM: Enable shadow stack virtualization for SVM KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid KVM: SVM: Pass through shadow stack MSRs as appropriate KVM: SVM: Update dump_vmcb with shadow stack save area additions KVM: nSVM: Save/load CET Shadow Stack state to/from vmcb12/vmcb02 ...
2025-10-02Merge tag 'mm-stable-2025-10-01-19-00' of ↵Linus Torvalds3-16/+4
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "mm, swap: improve cluster scan strategy" from Kairui Song improves performance and reduces the failure rate of swap cluster allocation - "support large align and nid in Rust allocators" from Vitaly Wool permits Rust allocators to set NUMA node and large alignment when perforning slub and vmalloc reallocs - "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extend DAMOS_STAT's handling of the DAMON operations sets for virtual address spaces for