summaryrefslogtreecommitdiff
path: root/arch/powerpc/include
AgeCommit message (Collapse)AuthorFilesLines
2024-10-10powerpc/vdso: Fix VDSO data access when running in a non-root time namespaceChristophe Leroy1-0/+15
[ Upstream commit c73049389e58c01e2e3bbfae900c8daeee177191 ] When running in a non-root time namespace, the global VDSO data page is replaced by a dedicated namespace data page and the global data page is mapped next to it. Detailed explanations can be found at commit 660fd04f9317 ("lib/vdso: Prepare for time namespace support"). When it happens, __kernel_get_syscall_map and __kernel_get_tbfreq and __kernel_sync_dicache don't work anymore because they read 0 instead of the data they need. To address that, clock_mode has to be read. When it is set to VDSO_CLOCKMODE_TIMENS, it means it is a dedicated namespace data page and the global data is located on the following page. Add a macro called get_realdatapage which reads clock_mode and add PAGE_SIZE to the pointer provided by get_datapage macro when clock_mode is equal to VDSO_CLOCKMODE_TIMENS. Use this new macro instead of get_datapage macro except for time functions as they handle it internally. Fixes: 74205b3fc2ef ("powerpc/vdso: Add support for time namespaces") Reported-by: Jason A. Donenfeld <Jason@zx2c4.com> Closes: https://lore.kernel.org/all/ZtnYqZI-nrsNslwy@zx2c4.com/ Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-04powerpc/atomic: Use YZ constraints for DS-form instructionsMichael Ellerman3-8/+10
commit 39190ac7cff1fd15135fa8e658030d9646fdb5f2 upstream. The 'ld' and 'std' instructions require a 4-byte aligned displacement because they are DS-form instructions. But the "m" asm constraint doesn't enforce that. That can lead to build errors if the compiler chooses a non-aligned displacement, as seen with GCC 14: /tmp/ccuSzwiR.s: Assembler messages: /tmp/ccuSzwiR.s:2579: Error: operand out of domain (39 is not a multiple of 4) make[5]: *** [scripts/Makefile.build:229: net/core/page_pool.o] Error 1 Dumping the generated assembler shows: ld 8,39(8) # MEM[(const struct atomic64_t *)_29].counter, t Use the YZ constraints to tell the compiler either to generate a DS-form displacement, or use an X-form instruction, either of which prevents the build error. See commit 2d43cc701b96 ("powerpc/uaccess: Fix build errors seen with GCC 13/14") for more details on the constraint letters. Fixes: 9f0cbea0d8cc ("[POWERPC] Implement atomic{, 64}_{read, write}() without volatile") Cc: stable@vger.kernel.org # v2.6.24+ Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/all/20240913125302.0a06b4c7@canb.auug.org.au Tested-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240916120510.2017749-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-22powerpc/mm: Fix return type of pgd_val()Christophe Leroy2-5/+11
Commit 6b0e82791bd0 ("powerpc/e500: switch to 64 bits PGD on 85xx (32 bits)") switched PGD entries to 64 bits, but pgd_val() returns an unsigned long which is 32 bits on PPC32. This is not a problem for regular PMD entries because the upper part is always NULL, but when PMD entries are leaf they contain 64 bits values, so pgd_val() must return an unsigned long long instead of an unsigned long. Also change the condition to CONFIG_PPC_85xx instead of CONFIG_PPC_E500 as the change was meant for 32 bits only. Allthough this should be harmless on PPC64, it generates a warning with pgd_ERROR print. Fixes: 6b0e82791bd0 ("powerpc/e500: switch to 64 bits PGD on 85xx (32 bits)") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/45f8fdf298ec3df7573b66d21b03a5cda92e2cb1.1724313510.git.christophe.leroy@csgroup.eu
2024-08-13powerpc/topology: Check if a core is onlineNysal Jan K.A1-0/+13
topology_is_core_online() checks if the core a CPU belongs to is online. The core is online if at least one of the sibling CPUs is online. The first CPU of an online core is also online in the common case, so this should be fairly quick. Fixes: 73c58e7e1412 ("powerpc: Add HOTPLUG_SMT support") Signed-off-by: Nysal Jan K.A <nysal@linux.ibm.com> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240731030126.956210-3-nysal@linux.ibm.com
2024-07-25Merge tag 'driver-core-6.11-rc1' of ↵Linus Torvalds2-10/+2
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big set of driver core changes for 6.11-rc1. Lots of stuff in here, with not a huge diffstat, but apis are evolving which required lots of files to be touched. Highlights of the changes in here are: - platform remove callback api final fixups (Uwe took many releases to get here, finally!) - Rust bindings for basic firmware apis and initial driver-core interactions. It's not all that useful for a "write a whole driver in rust" type of thing, but the firmware bindings do help out the phy rust drivers, and the driver core bindings give a solid base on which others can start their work. There is still a long way to go here before we have a multitude of rust drivers being added, but it's a great first step. - driver core const api changes. This reached across all bus types, and there are some fix-ups for some not-common bus types that linux-next and 0-day testing shook out. This work is being done to help make the rust bindings more safe, as well as the C code, moving toward the end-goal of allowing us to put driver structures into read-only memory. We aren't there yet, but are getting closer. - minor devres cleanups and fixes found by code inspection - arch_topology minor changes - other minor driver core cleanups All of these have been in linux-next for a very long time with no reported problems" * tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (55 commits) ARM: sa1100: make match function take a const pointer sysfs/cpu: Make crash_hotplug attribute world-readable dio: Have dio_bus_match() callback take a const * zorro: make match function take a const pointer driver core: module: make module_[add|remove]_driver take a const * driver core: make driver_find_device() take a const * driver core: make driver_[create|remove]_file take a const * firmware_loader: fix soundness issue in `request_internal` firmware_loader: annotate doctests as `no_run` devres: Correct code style for functions that return a pointer type devres: Initialize an uninitialized struct member devres: Fix memory leakage caused by driver API devm_free_percpu() devres: Fix devm_krealloc() wasting memory driver core: platform: Switch to use kmemdup_array() driver core: have match() callback in struct bus_type take a const * MAINTAINERS: add Rust device abstractions to DRIVER CORE device: rust: improve safety comments MAINTAINERS: add Danilo as FIRMWARE LOADER maintainer MAINTAINERS: add Rust FW abstractions to FIRMWARE LOADER firmware: rust: improve safety comments ...
2024-07-21Merge tag 'mm-stable-2024-07-21-14-50' of ↵Linus Torvalds22-329/+195
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - In the series "mm: Avoid possible overflows in dirty throttling" Jan Kara addresses a couple of issues in the writeback throttling code. These fixes are also targetted at -stable kernels. - Ryusuke Konishi's series "nilfs2: fix potential issues related to reserved inodes" does that. This should actually be in the mm-nonmm-stable tree, along with the many other nilfs2 patches. My bad. - More folio conversions from Kefeng Wang in the series "mm: convert to folio_alloc_mpol()" - Kemeng Shi has sent some cleanups to the writeback code in the series "Add helper functions to remove repeated code and improve readability of cgroup writeback" - Kairui Song has made the swap code a little smaller and a little faster in the series "mm/swap: clean up and optimize swap cache index". - In the series "mm/memory: cleanly support zeropage in vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David Hildenbrand has reworked the rather sketchy handling of the use of the zeropage in MAP_SHARED mappings. I don't see any runtime effects here - more a cleanup/understandability/maintainablity thing. - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of higher addresses, for aarch64. The (poorly named) series is "Restructure va_high_addr_switch". - The core TLB handling code gets some cleanups and possible slight optimizations in Bang Li's series "Add update_mmu_tlb_range() to simplify code". - Jane Chu has improved the handling of our fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the series "Enhance soft hwpoison handling and injection". - Jeff Johnson has sent a billion patches everywhere to add MODULE_DESCRIPTION() to everything. Some landed in this pull. - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has simplified migration's use of hardware-offload memory copying. - Yosry Ahmed performs more folio API conversions in his series "mm: zswap: trivial folio conversions". - In the series "large folios swap-in: handle refault cases first", Chuanhua Han inches us forward in the handling of large pages in the swap code. This is a cleanup and optimization, working toward the end objective of full support of large folio swapin/out. - In the series "mm,swap: cleanup VMA based swap readahead window calculation", Huang Ying has contributed some cleanups and a possible fixlet to his VMA based swap readahead code. - In the series "add mTHP support for anonymous shmem" Baolin Wang has taught anonymous shmem mappings to use multisize THP. By default this is a no-op - users must opt in vis sysfs controls. Dramatic improvements in pagefault latency are realized. - David Hildenbrand has some cleanups to our remaining use of page_mapcount() in the series "fs/proc: move page_mapcount() to fs/proc/internal.h". - David also has some highmem accounting cleanups in the series "mm/highmem: don't track highmem pages manually". - Build-time fixes and cleanups from John Hubbard in the series "cleanups, fixes, and progress towards avoiding "make headers"". - Cleanups and consolidation of the core pagemap handling from Barry Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers and utilize them". - Lance Yang's series "Reclaim lazyfree THP without splitting" has reduced the latency of the reclaim of pmd-mapped THPs under fairly common circumstances. A 10x speedup is seen in a microbenchmark. It does this by punting to aother CPU but I guess that's a win unless all CPUs are pegged. - hugetlb_cgroup cleanups from Xiu Jianfeng in the series "mm/hugetlb_cgroup: rework on cftypes". - Miaohe Lin's series "Some cleanups for memory-failure" does just that thing. - Someone other than SeongJae has developed a DAMON feature in Honggyu Kim's series "DAMON based tiered memory management for CXL memory". This adds DAMON features which may be used to help determine the efficiency of our placement of CXL/PCIe attached DRAM. - DAMON user API centralization and simplificatio work in SeongJae Park's series "mm/damon: introduce DAMON parameters online commit function". - In the series "mm: page_type, zsmalloc and page_mapcount_reset()" David Hildenbrand does some maintenance work on zsmalloc - partially modernizing its use of pageframe fields. - Kefeng Wang provides more folio conversions in the series "mm: remove page_maybe_dma_pinned() and page_mkclean()". - More cleanup from David Hildenbrand, this time in the series "mm/memory_hotplug: use PageOffline() instead of PageReserved() for !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline() pages" and permits the removal of some virtio-mem hacks. - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap()" is a cleanup to the anon folio handling in preparation for mTHP (multisize THP) swapin. - Kefeng Wang's series "mm: improve clear and copy user folio" implements more folio conversions, this time in the area of large folio userspace copying. - The series "Docs/mm/damon/maintaier-profile: document a mailing tool and community meetup series" tells people how to get better involved with other DAMON developers. From SeongJae Park. - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does that. - David Hildenbrand sends along more cleanups, this time against the migration code. The series is "mm/migrate: move NUMA hinting fault folio isolation + checks under PTL". - Jan Kara has found quite a lot of strangenesses and minor errors in the readahead code. He addresses this in the series "mm: Fix various readahead quirks". - SeongJae Park's series "selftests/damon: test DAMOS tried regions and {min,max}_nr_regions" adds features and addresses errors in DAMON's self testing code. - Gavin Shan has found a userspace-triggerable WARN in the pagecache code. The series "mm/filemap: Limit page cache size to that supported by xarray" addresses this. The series is marked cc:stable. - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations and cleanup" cleans up and slightly optimizes KSM. - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of code motion. The series (which also makes the memcg-v1 code Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put under config option" and "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1" - Dan Schatzberg's series "Add swappiness argument to memory.reclaim" adds an additional feature to this cgroup-v2 control file. - The series "Userspace controls soft-offline pages" from Jiaqi Yan permits userspace to stop the kernel's automatic treatment of excessive correctable memory errors. In order to permit userspace to monitor and handle this situation. - Kefeng Wang's series "mm: migrate: support poison recover from migrate folio" teaches the kernel to appropriately handle migration from poisoned source folios rather than simply panicing. - SeongJae Park's series "Docs/damon: minor fixups and improvements" does those things. - In the series "mm/zsmalloc: change back to per-size_class lock" Chengming Zhou improves zsmalloc's scalability and memory utilization. - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare refcount increments. So these paes can first be moved aside if they reside in the movable zone or a CMA block. - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps for much faster reading of vma information. The series is "query VMAs from /proc/<pid>/maps". - In the series "mm: introduce per-order mTHP split counters" Lance Yang improves the kernel's presentation of developer information related to multisize THP splitting. - Michael Ellerman has developed the series "Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)". This permits userspace to use all available huge page sizes. - In the series "revert unconditional slab and page allocator fault injection calls" Vlastimil Babka removes a performance-affecting and not very useful feature from slab fault injection. * tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits) mm/mglru: fix ineffective protection calculation mm/zswap: fix a white space issue mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio mm/hugetlb: fix possible recursive locking detected warning mm/gup: clear the LRU flag of a page before adding to LRU batch mm/numa_balancing: teach mpol_to_str about the balancing mode mm: memcg1: convert charge move flags to unsigned long long alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting lib: reuse page_ext_data() to obtain codetag_ref lib: add missing newline character in the warning message mm/mglru: fix overshooting shrinker memory mm/mglru: fix div-by-zero in vmpressure_calc_level() mm/kmemleak: replace strncpy() with strscpy() mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB mm: ignore data-race in __swap_writepage hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr mm: shmem: rename mTHP shmem counters mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async() mm/migrate: putback split folios when numa hint migration fails ...
2024-07-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-1/+0
Pull kvm updates from Paolo Bonzini: "ARM: - Initial infrastructure for shadow stage-2 MMUs, as part of nested virtualization enablement - Support for userspace changes to the guest CTR_EL0 value, enabling (in part) migration of VMs between heterogenous hardware - Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1 of the protocol - FPSIMD/SVE support for nested, including merged trap configuration and exception routing - New command-line parameter to control the WFx trap behavior under KVM - Introduce kCFI hardening in the EL2 hypervisor - Fixes + cleanups for handling presence/absence of FEAT_TCRX - Miscellaneous fixes + documentation updates LoongArch: - Add paravirt steal time support - Add support for KVM_DIRTY_LOG_INITIALLY_SET - Add perf kvm-stat support for loongarch RISC-V: - Redirect AMO load/store access fault traps to guest - perf kvm stat support - Use guest files for IMSIC virtualization, when available s390: - Assortment of tiny fixes which are not time critical x86: - Fixes for Xen emulation - Add a global struct to consolidate tracking of host values, e.g. EFER - Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC bus frequency, because TDX - Print the name of the APICv/AVIC inhibits in the relevant tracepoint - Clean up KVM's handling of vendor specific emulation to consistently act on "compatible with Intel/AMD", versus checking for a specific vendor - Drop MTRR virtualization, and instead always honor guest PAT on CPUs that support self-snoop - Update to the newfangled Intel CPU FMS infrastructure - Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as it reads '0' and writes from userspace are ignored - Misc cleanups x86 - MMU: - Small cleanups, renames and refactoring extracted from the upcoming Intel TDX support - Don't allocate kvm_mmu_page.shadowed_translation for shadow pages that can't hold leafs SPTEs - Unconditionally drop mmu_lock when allocating TDP MMU page tables for eager page splitting, to avoid stalling vCPUs when splitting huge pages - Bug the VM instead of simply warning if KVM tries to split a SPTE that is non-present or not-huge. KVM is guaranteed to end up in a broken state because the callers fully expect a valid SPTE, it's all but dangerous to let more MMU changes happen afterwards x86 - AMD: - Make per-CPU save_area allocations NUMA-aware - Force sev_es_host_save_area() to be inlined to avoid calling into an instrumentable function from noinstr code - Base support for running SEV-SNP guests. API-wise, this includes a new KVM_X86_SNP_VM type, encrypting/measure the initial image into guest memory, and finalizing it before launching it. Internally, there are some gmem/mmu hooks needed to prepare gmem-allocated pages before mapping them into guest private memory ranges This includes basic support for attestation guest requests, enough to say that KVM supports the GHCB 2.0 specification There is no support yet for loading into the firmware those signing keys to be used for attestation requests, and therefore no need yet for the host to provide certificate data for those keys. To support fetching certificate data from userspace, a new KVM exit type will be needed to handle fetching the certificate from userspace. An attempt to define a new KVM_EXIT_COCO / KVM_EXIT_COCO_REQ_CERTS exit type to handle this was introduced in v1 of this patchset, but is still being discussed by community, so for now this patchset only implements a stub version of SNP Extended Guest Requests that does not provide certificate data x86 - Intel: - Remove an unnecessary EPT TLB flush when enabling hardware - Fix a series of bugs that cause KVM to fail to detect nested pending posted interrupts as valid wake eents for a vCPU executing HLT in L2 (with HLT-exiting disable by L1) - KVM: x86: Suppress MMIO that is triggered during task switch emulation Explicitly suppress userspace emulated MMIO exits that are triggered when emulating a task switch as KVM doesn't support userspace MMIO during complex (multi-step) emulation Silently ignoring the exit request can result in the WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to userspace for some other reason prior to purging mmio_needed See commit 0dc902267cb3 ("KVM: x86: Suppress pending MMIO write exits if emulator detects exception") for more details on KVM's limitations with respect to emulated MMIO during complex emulator flows Generic: - Rename the AS_UNMOVABLE flag that was introduced for KVM to AS_INACCESSIBLE, because the special casing needed by these pages is not due to just unmovability (and in fact they are only unmovable because the CPU cannot access them) - New ioctl to populate the KVM page tables in advance, which is useful to mitigate KVM page faults during guest boot or after live migration. The code will also be used by TDX, but (probably) not through the ioctl - Enable halt poll shrinking by default, as Intel found it to be a clear win - Setup empty IRQ routing when creating a VM to avoid having to synchronize SRCU when creating a split IRQCHIP on x86 - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag that arch code can use for hooking both sched_in() and sched_out() - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid truncating a bogus value from userspace, e.g. to help userspace detect bugs - Mark a vCPU as preempted if and only if it's scheduled out while in the KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest memory when retrieving guest state during live migration blackout Selftests: - Remove dead code in the memslot modification stress test - Treat "branch instructions retired" as supported on all AMD Family 17h+ CPUs - Print the guest pseudo-RNG seed only when it changes, to avoid spamming the log for tests that create lots of VMs - Make the PMU counters test less flaky when counting LLC cache misses by doing CLFLUSH{OPT} in every loop iteration" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits) crypto: ccp: Add the SNP_VLEK_LOAD command KVM: x86/pmu: Add kvm_pmu_call() to simplify static calls of kvm_pmu_ops KVM: x86: Introduce kvm_x86_call() to simplify static calls of kvm_x86_ops KVM: x86: Replace static_call_cond() with static_call() KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event x86/sev: Move sev_guest.h into common SEV header KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event KVM: x86: Suppress MMIO that is triggered during task switch emulation KVM: x86/mmu: Clean up make_huge_page_split_spte() definition and intro KVM: x86/mmu: Bug the VM if KVM tries to split a !hugepage SPTE KVM: selftests: x86: Add test for KVM_PRE_FAULT_MEMORY KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory() KVM: x86/mmu: Make kvm_mmu_do_page_fault() return mapped level KVM: x86/mmu: Account pf_{fixed,emulate,spurious} in callers of "do page fault" KVM: x86/mmu: Bump pf_taken stat only in the "real" page fault handler KVM: Add KVM_PRE_FAULT_MEMORY vcpu ioctl to pre-populate guest memory KVM: Document KVM_PRE_FAULT_MEMORY ioctl mm, virt: merge AS_UNMOVABLE and AS_INACCESSIBLE perf kvm: Add kvm-stat for loongarch64 LoongArch: KVM: Add PV steal time support in guest side ...
2024-07-19Merge tag 'powerpc-6.11-1' of ↵Linus Torvalds30-360/+70
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Remove support for 40x CPUs & platforms - Add support to the 64-bit BPF JIT for cpu v4 instructions - Fix PCI hotplug driver crash on powernv - Fix doorbell emulation for KVM on PAPR guests (nestedv2) - Fix KVM nested guest handling of some less used SPRs - Online NUMA nodes with no CPU/memory if they have a PCI device attached - Reduce memory overhead of enabling kfence on 64-bit Radix MMU kernels - Reimplement the iommu table_group_ops for pseries for VFIO SPAPR TCE Thanks to: Anjali K, Artem Savkov, Athira Rajeev, Breno Leitao, Brian King, Celeste Liu, Christophe Leroy, Esben Haabendal, Gaurav Batra, Gautam Menghani, Haren Myneni, Hari Bathini, Jeff Johnson, Krishna Kumar, Krzysztof Kozlowski, Nathan Lynch, Nicholas Piggin, Nick Bowler, Nilay Shroff, Rob Herring (Arm), Shawn Anastasio, Shivaprasad G Bhat, Sourabh Jain, Srikar Dronamraju, Timothy Pearson, Uwe Kleine-König, and Vaibhav Jain. * tag 'powerpc-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (57 commits) Documentation/powerpc: Mention 40x is removed powerpc: Remove 40x leftovers macintosh/therm_windtunnel: fix module unload. powerpc: Check only single values are passed to CPU/MMU feature checks powerpc/xmon: Fix disassembly CPU feature checks powerpc: Drop clang workaround for builtin constant checks powerpc64/bpf: jit support for signed division and modulo powerpc64/bpf: jit support for sign extended mov powerpc64/bpf: jit support for sign extended load powerpc64/bpf: jit support for unconditional byte swap powerpc64/bpf: jit support for 32bit offset jmp instruction powerpc/pci: Hotplug driver bridge support pci/hotplug/pnv_php: Fix hotplug driver crash on Powernv powerpc/configs: Update defconfig with now user-visible CONFIG_FSL_IFC powerpc: add missing MODULE_DESCRIPTION() macros macintosh/mac_hid: add MODULE_DESCRIPTION() KVM: PPC: add missing MODULE_DESCRIPTION() macros powerpc/kexec: Use of_property_read_reg() powerpc/64s/radix/kfence: map __kfence_pool at page granularity powerpc/pseries/iommu: Define spapr_tce_table_group_ops only with CONFIG_IOMMU_API ...
2024-07-18Merge tag 'ftrace-v6.11' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ftrace updates from Steven Rostedt: "Rewrite of function graph tracer to allow multiple users Up until now, the function graph tracer could only have a single user attached to it. If another user tried to attach to the function graph tracer while one was already attached, it would fail. Allowing function graph tracer to have more than one user has been asked for since 2009, but it required a rewrite to the logic to pull it off so it never happened. Until now! There's three systems that trace the return of a function. That is kretprobes, function graph tracer, and BPF. kretprobes and function graph tracing both do it similarly. The difference is that kretprobes uses a shadow stack per callback and function graph tracer creates a shadow stack for all tasks. The function graph tracer method makes it possible to trace the return of all functions. As kretprobes now needs that feature too, allowing it to use function graph tracer was needed. BPF also wants to trace the return of many probes and its method doesn't scale either. Having it use function graph tracer would improve that. By allowing function graph tracer to have multiple users allows both kretprobes and BPF to use function graph tracer in these cases. This will allow kretprobes code to be removed in the future as it's version will no longer be needed. Note, function graph tracer is only limited to 16 simultaneous users, due to shadow stack size and allocated slots" * tag 'ftrace-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (49 commits) fgraph: Use str_plural() in test_graph_storage_single() function_graph: Add READ_ONCE() when accessing fgraph_array[] ftrace: Add missing kerneldoc parameters to unregister_ftrace_direct() function_graph: Everyone uses HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, remove it function_graph: Fix up ftrace_graph_ret_addr() function_graph: Make fgraph_update_pid_func() a stub for !DYNAMIC_FTRACE function_graph: Rename BYTE_NUMBER to CHAR_NUMBER in selftests fgraph: Remove some unused functions ftrace: Hide one more entry in stack trace when ftrace_pid is enabled function_graph: Do not update pid func if CONFIG_DYNAMIC_FTRACE not enabled function_graph: Make fgraph_do_direct static key static ftrace: Fix prototypes for ftrace_startup/shutdown_subops() ftrace: Assign RCU list variable with rcu_assign_ptr() ftrace: Assign ftrace_list_end to ftrace_ops_list type cast to RCU ftrace: Declare function_trace_op in header to quiet sparse warning ftrace: Add comments to ftrace_hash_move() and friends ftrace: Convert "inc" parameter to bool in ftrace_hash_rec_update_modify() ftrace: Add comments to ftrace_hash_rec_disable/enable() ftrace: Remove "filter_hash" parameter from __ftrace_hash_rec_update() ftrace: Rename dup_hash() and comment it ...
2024-07-18Merge branch 'topic/ppc-kvm' into nextMichael Ellerman6-4/+22
Merge the powerpc KVM topic branch.
2024-07-12powerpc/mm: remove hugepd leftoversChristophe Leroy4-32/+0
All targets have now opted out of CONFIG_ARCH_HAS_HUGEPD so remove left over code. Link: https://lkml.kernel.org/r/39c0d0adee6790fc42cee9f458e05fb95136c3dd.1719928057.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12powerpc/64s: use contiguous PMD/PUD instead of HUGEPDChristophe Leroy9-142/+56
On book3s/64, the only user of hugepd is hash in 4k mode. All other setups (hash-64, radix-4, radix-64) use leaf PMD/PUD. Rework hash-4k to use contiguous PMD and PUD instead. In that setup there are only two huge page sizes: 16M and 16G. 16M sits at PMD level and 16G at PUD level. pte_update doesn't know page size, lets use the same trick as hpte_need_flush() to get page size from segment properties. That's not the most efficient way but let's do that until callers of pte_update() provide page size instead of just a huge flag. Link: https://lkml.kernel.org/r/7448f60a9b3efd396595f4f735d1e0babc5ae379.1719928057.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12powerpc/e500: use contiguous PMD instead of hugepdChristophe Leroy5-56/+68
e500 supports many page sizes among which the following size are implemented in the kernel at the time being: 4M, 16M, 64M, 256M, 1G. On e500, TLB miss for hugepages is exclusively handled by SW even on e6500 which has HW assistance for 4k pages, so there are no constraints like on the 8xx. On e500/32, all are at PGD/PMD level and can be handled as cont-PMD. On e500/64, smaller ones are on PMD while bigger ones are on PUD. Again, they can easily be handled as cont-PMD and cont-PUD instead of hugepd. On e500/32, use the pagesize bits in PTE to know if it is a PMD or a leaf entry. This works because the pagesize bits are in the last 12 bits and page tables are 4k aligned. On e500/64, use highest bit which is always 1 on PxD (Because PxD contains virtual address of a kernel memory) and always 0 on PTEs because not all bits of RPN are used/possible. Link: https://lkml.kernel.org/r/dd085987816ed2a0c70adb7e34966cb833fc03e1.1719928057.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12powerpc/e500: encode hugepage size in PTE bitsChristophe Leroy2-15/+22
Use PTE page size bits to encode hugepage size with the following format corresponding to the values expected in bits 52-55 in MAS1 register. Those bits are called TSIZE: 0001 4 Kbyte 0010 16 Kbyte 0011 64 Kbyte 0100 256 Kbyte 0101 1 Mbyte 0110 4 Mbyte 0111 16 Mbyte 1000 64 Mbyte 1001 256 Mbyte 1010 1 Gbyte 1011 4 Gbyte 1100 16 Gbyte 1101 64 Gbyte 1110 256 Gbyte 1111 1 Tbyte It corresponds to shift value minus 10 with lowest bit removed. It is not the value expected in the PTE in that field, but only e6500 performs HW based TLB loading and the e6500 reference manual explicitely says that this field is ignored. Also add pte_huge_size() which will be used later. Link: https://lkml.kernel.org/r/6f7ce82fa8c381d55f65342d77060fc55802e612.1719928057.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12powerpc/e500: switch to 64 bits PGD on 85xx (32 bits)Christophe Leroy1-0/+4
At the time being when CONFIG_PTE_64BIT is selected, PTE entries are 64 bits but PGD entries are still 32 bits. In order to allow leaf PMD entries, switch the PGD to 64 bits entries. Link: https://lkml.kernel.org/r/ca85397df02564e5edc3a3c27b55cf43af3e4ef3.1719928057.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12powerpc/e500: remove enc and ind fields from struct mmu_psize_defChristophe Leroy1-3/+0
enc field is hidden behind BOOK3E_PAGESZ_XX macros, and when you look closer you realise that this field is nothing else than the value of shift minus ten. So remove enc field and calculate tsize from shift field. Also remove inc field which is unused. Link: https://lkml.kernel.org/r/e99136779b5b0829c2c60d37f305a1410c65cf9b.1719928057.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12powerpc/8xx: simplify struct mmu_psize_defChristophe Leroy1-7/+2
On 8xx, only the shift field is used in struct mmu_psize_def Remove other fields and related macros. Link: https://lkml.kernel.org/r/dd0587a9e8354005858c7f8c9a775ad05523b314.1719928057.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12powerpc/8xx: rework support for 8M pages using contiguous PTE entriesChristophe Leroy5-58/+45
In order to fit better with standard Linux page tables layout, add support for 8M pages using contiguous PTE entries in a standard page table. Page tables will then be populated with 1024 similar entries and two PMD entries will point to that page table. The PMD entries also get a flag to tell it is addressing an 8M page, this is required for the HW tablewalk assistance. Link: https://lkml.kernel.org/r/8693d9a0408371043ca63bf9e4a9c140667af63e.1719928057.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12powerpc/mm: allow hugepages without hugepdChristophe Leroy3-11/+3
In preparation of implementing huge pages on powerpc 8xx without hugepd, enclose hugepd related code inside an ifdef CONFIG_ARCH_HAS_HUGEPD This also allows removing some stubs. Link: https://lkml.kernel.org/r/ada097ca8a4fa85a77f51719516ef2478800d77a.1719928057.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12powerpc/mm: remove _PAGE_PSIZEChristophe Leroy4-12/+3
_PAGE_PSIZE macro is never used outside the place it is defined and is used only on 8xx and e500. Remove indirection, remove it and use its content directly. Link: https://lkml.kernel.org/r/c41da3b0ceda7311a50f0391cc4d54302ae15b74.1719928057.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12powerpc/64e: remove unused IBM HTW codeMichael Ellerman1-2/+1
Patch series "Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)", v7. Unlike most architectures, powerpc 8xx HW requires a two-level pagetable topology for all page sizes. So a leaf PMD-contig approach is not feasible as such. Possible sizes on 8xx are 4k, 16k, 512k and 8M. First level (PGD/PMD) covers 4M per entry. For 8M pages, two PMD entries must point to a single entry level-2 page table. Until now that was done using hugepd. This series changes it to use standard page tables where the entry is replicated 1024 times on each of the two pagetables refered by the two associated PMD entries for that 8M page. For e500 and book3s/64 there are less constraints because it is not tied to the HW assisted tablewalk like on 8xx, so it is easier to use leaf PMDs (and PUDs). On e500 the supported page sizes are 4M, 16M, 64M, 256M and 1G. All at PMD level on e500/32 (mpc85xx) and mix of PMD and PUD for e500/64. We encode page size with 4 available bits in PTE entries. On e300/32 PGD entries size is increases to 64 bits in order to allow leaf-PMD entries because PTE are 64 bits on e500. On book3s/64 only the hash-4k mode is concerned. It supports 16M pages as cont-PMD and 16G pages as cont-PUD. In other modes (radix-4k, radix-6k and hash-64k) the sizes match with PMD and PUD sizes so that's just leaf entries. The hash processing make things a bit more complex. To ease things, __hash_page_huge() is modified to bail out when DIRTY or ACCESSED bits are missing, leaving it to mm core to fix it. This patch (of 23): The nohash HTW_IBM (Hardware Table Walk) code is unused since support for A2 was removed in commit fb5a515704d7 ("powerpc: Remove platforms/ wsp and associated pieces") (2014). The remaining supported CPUs use either no HTW (data_tlb_miss_bolted), or the e6500 HTW (data_tlb_miss_e6500). Link: https://lkml.kernel.org/r/cover.1719928057.git.christophe.leroy@csgroup.eu Link: https://lkml.kernel.org/r/820dd1385ecc931f07b0d7a0fa827b1613917ab6.1719928057.git.christophe.leroy@csgroup.eu Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-11powerpc: Check only single values are passed to CPU/MMU feature checksMichael Ellerman2-0/+2
cpu_has_feature()/mmu_has_feature() are only able to check a single feature at a time, but there is no enforcement of that. In fact, as fixed in the previous commit, there was code that was passing multiple values to cpu_has_feature(). So add a check that only a single feature is passed using popcount. Note that the test allows 0 or 1 bits to be set, because some code relies on cpu_has_feature(0) being false, the check with CPU_FTRS_POSSIBLE ensures that. See for example CPU_FTR_PPC_LE. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240509121248.270878-3-mpe@ellerman.id.au
2024-07-11powerpc: Drop clang workaround for builtin constant checksMichael Ellerman2-4/+0
The CPU/MMU feature code has build-time checks that the feature value is a builtin constant. Back when the code was added clang wasn't able to compile the checks, so an ifdef was added to avoid the checks for clang builds. See commit b5fa0f7f88ed ("powerpc: Fix build failure with clang due to BUILD_BUG_ON()") These days clang 13 and later are able to build the checks successfully, so drop the workaround. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240509121248.270878-1-mpe@ellerman.id.au
2024-07-11powerpc64/bpf: jit support for signed division and moduloArtem Savkov1-0/+1
Add jit support for sign division and modulo. Tested using test_bpf module. Signed-off-by: Artem Savkov <asavkov@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240517075650.248801-6-asavkov@redhat.com
2024-07-11powerpc64/bpf: jit support for sign extended loadArtem Savkov1-0/+1
Add jit support for sign extended load. Tested using test_bpf module. Signed-off-by: Artem Savkov <asavkov@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240517075650.248801-4-asavkov@redhat.com
2024-07-10clone3: drop __ARCH_WANT_SYS_CLONE3 macroArnd Bergmann1-1/+0
When clone3() was introduced, it was not obvious how each architecture deals with setting up the stack and keeping the register contents in a fork()-like system call, so this was left for the architecture maintainers to implement, with __ARCH_WANT_SYS_CLONE3 defined by those that already implement it. Five years later, we still have a few architectures left that are missing clone3(), and the macro keeps getting in the way as it's fundamentally different from all the other __ARCH_WANT_SYS_* macros that are meant to provide backwards-compatibility with applications using older syscalls that are no longer provided by default. Address this by reversing the polarity of the macro, adding an __ARCH_BROKEN_SYS_CLONE3 macro to all architectures that don't already provide the syscall, and remove __ARCH_WANT_SYS_CLONE3 from all the other ones. Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-07-04powerpc/64s/radix/kfence: map __kfence_pool at page granularityHari Bathini1-1/+10
When KFENCE is enabled, total system memory is mapped at page level granularity. But in radix MMU mode, ~3GB additional memory is needed to map 100GB of system memory at page level granularity when compared to using 2MB direct mapping.This is not desired considering KFENCE is designed to be enabled in production kernels [1]. Mapping only the memory allocated for KFENCE pool at page granularity is sufficient to enable KFENCE support. So, allocate __kfence_pool during bootup and map it at page granularity instead of mapping all system memory at page granularity. Without patch: # cat /proc/meminfo MemTotal: 101201920 kB With patch: # cat /proc/meminfo MemTotal: 104483904 kB Note that enabling KFENCE at runtime is disabled for radix MMU for now, as it depends on the ability to split page table mappings and such APIs are not currently implemented for radix MMU. All kfence_test.c testcases passed with this patch. [1] https://lore.kernel.org/all/20201103175841.3495947-2-elver@google.com/ Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240701130021.578240-1-hbathini@linux.ibm.com
2024-07-03driver core: have match() callback in struct bus_type take a const *Greg Kroah-Hartman2-10/+2
In the match() callback, the struct device_driver * should not be changed, so change the function callback to be a const *. This is one step of many towards making the driver core safe to have struct device_driver in read-only memory. Because the match() callback is in all busses, all busses are modified to handle this properly. This does entail switching some container_of() calls to container_of_const() to properly handle the constant *. For some busses, like PCI and USB and HV, the const * is cast away in the match callback as those busses do want to modify those structures at this point in time (they have a local lock in the driver structure.) That will have to be changed in the future if they wish to have their struct device * in read-only-memory. Cc: Rafael J. Wysocki <rafael@kernel.org> Reviewed-by: Alex Elder <elder@kernel.org> Acked-by: Sumit Garg <sumit.garg@linaro.org> Link: https://lore.kernel.org/r/2024070136-wrongdoer-busily-01e8@gregkh Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-28powerpc: Replace CONFIG_4xx with CONFIG_44xMichael Ellerman2-2/+2
Replace 4xx usage with 44x, and replace 4xx_SOC with 44x. Also, as pointed out by Christophe, if 44x || BOOKE can be simplified to just test BOOKE, because 44x always selects BOOKE. Retain the CONFIG_4xx symbol, as there are drivers that use it to mean 4xx || 44x, those will need updating before CONFIG_4xx can be removed. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240628121201.130802-6-mpe@ellerman.id.au
2024-06-28powerpc/4xx: Remove CONFIG_BOOKE_OR_40xMichael Ellerman6-9/+9
Now that 40x is gone, replace CONFIG_BOOKE_OR_40x by CONFIG_BOOKE. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240628121201.130802-5-mpe@ellerman.id.au
2024-06-28powerpc: Remove core support for 40xChristophe Leroy10-304/+6
Now that 40x platforms have gone, remove support for 40x in the core of powerpc arch. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240628121201.130802-4-mpe@ellerman.id.au
2024-06-28