summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2024-08-27irqchip/riscv-intc: Add ACPI support for AIASunil V L1-0/+33
The RINTC subtype structure in MADT also has information about other interrupt controllers. Save this information and provide interfaces to retrieve them when required by corresponding drivers. Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Tested-by: Björn Töpel <bjorn@rivosinc.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20240812005929.113499-14-sunilvl@ventanamicro.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-08-27ACPI: RISC-V: Initialize GSI mapping structuresSunil V L1-0/+22
RISC-V has PLIC and APLIC in MADT as well as namespace devices. Initialize the list of those structures using MADT and namespace devices to create mapping between the ACPI handle and the GSI ranges. This will be used later to add dependencies. Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> Link: https://patch.msgid.link/20240812005929.113499-12-sunilvl@ventanamicro.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-08-27ACPI: RISC-V: Implement PCI related functionalitySunil V L2-17/+16
Replace the dummy implementation for PCI related functions with actual implementation. This needs ECAM and MCFG CONFIG options to be enabled for RISC-V. Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> Link: https://patch.msgid.link/20240812005929.113499-10-sunilvl@ventanamicro.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-08-27arm64: PCI: Migrate ACPI related functions to pci-acpi.cSunil V L1-191/+0
The functions defined in arm64 for ACPI support are required for RISC-V also. To avoid duplication, move these functions to common location. Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Will Deacon <will@kernel.org> Tested-by: Björn Töpel <bjorn@rivosinc.com> Link: https://patch.msgid.link/20240812005929.113499-2-sunilvl@ventanamicro.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-08-27arm64: Constify struct kobj_typeHuang Xiaojia1-1/+1
'struct kobj_type' is not modified. It is only used in kobject_init() which takes a 'const struct kobj_type *ktype' parameter. Constifying this structure moves some data to a read-only section, so increase over all security. On a x86_64, compiled with arm defconfig: Before: ====== text data bss dec hex filename 5602 548 352 6502 1966 arch/arm64/kernel/cpuinfo.o After: ====== text data bss dec hex filename 5650 500 352 6502 1966 arch/arm64/kernel/cpuinfo.o Signed-off-by: Huang Xiaojia <huangxiaojia2@huawei.com> Link: https://lore.kernel.org/r/20240826151250.3500302-1-huangxiaojia2@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2024-08-27arm64: Implement prctl(PR_{G,S}ET_TSC)Peter Collingbourne4-18/+82
On arm64, this prctl controls access to CNTVCT_EL0, CNTVCTSS_EL0 and CNTFRQ_EL0 via CNTKCTL_EL1.EL0VCTEN. Since this bit is also used to implement various erratum workarounds, check whether the CPU needs a workaround whenever we potentially need to change it. This is needed for a correct implementation of non-instrumenting record-replay debugging on arm64 (i.e. rr; https://rr-project.org/). rr must trap and record any sources of non-determinism from the userspace program's perspective so it can be replayed later. This includes the results of syscalls as well as the results of access to architected timers exposed directly to the program. This prctl was originally added for x86 by commit 8fb402bccf20 ("generic, x86: add prctl commands PR_GET_TSC and PR_SET_TSC"), and rr uses it to trap RDTSC on x86 for the same reason. We also considered exposing this as a PTRACE_EVENT. However, prctl seems like a better choice for these reasons: 1) In general an in-process control seems more useful than an out-of-process control, since anything that you would be able to do with ptrace could also be done with prctl (tracer can inject a call to the prctl and handle signal-delivery-stops), and it avoids needing an additional process (which will complicate debugging of the ptraced process since it cannot have more than one tracer, and will be incompatible with ptrace_scope=3) in cases where that is not otherwise necessary. 2) Consistency with x86_64. Note that on x86_64, RDTSC has been there since the start, so it's the same situation as on arm64. Signed-off-by: Peter Collingbourne <pcc@google.com> Link: https://linux-review.googlesource.com/id/I233a1867d1ccebe2933a347552e7eae862344421 Link: https://lore.kernel.org/r/20240824015415.488474-1-pcc@google.com Signed-off-by: Will Deacon <will@kernel.org>
2024-08-27Merge v6.11-rc5 into drm-nextDaniel Vetter79-254/+379
amdgpu pr conconflicts due to patches cherry-picked to -fixes, I might as well catch up with a backmerge and handle them all. Plus both misc and intel maintainers asked for a backmerge anyway. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2024-08-27virt: sev-guest: Ensure the SNP guest messages do not exceed a pageNikunj A Dadhania1-1/+1
Currently, struct snp_guest_msg includes a message header (96 bytes) and a payload (4000 bytes). There is an implicit assumption here that the SNP message header will always be 96 bytes, and with that assumption the payload array size has been set to 4000 bytes - a magic number. If any new member is added to the SNP message header, the SNP guest message will span more than a page. Instead of using a magic number for the payload, declare struct snp_guest_msg in a way that payload plus the message header do not exceed a page. [ bp: Massage. ] Suggested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240731150811.156771-5-nikunj@amd.com
2024-08-27powerpc/xmon: Fix tmpstr length check in scanhexMadhavan Srinivasan1-1/+1
If a function name is greater than 63 characters long, xmon command may not find them. For example, here is a test that executed an illegal instruction in a kernel function and one of call stack function has a name greater than 63 characters long: cpu 0x0: Vector: 700 (Program Check) at [c00000000a6577e0] pc: c0000000001aacb8: check__allowed__function__name__for__symbol__r4+0x8/0x10 lr: c00000000019c1e0: check__allowed__function__name__for__symbol__r1+0x20/0x40 sp: c00000000a657a80 msr: 800000000288b033 current = 0xc00000000a439900 paca = 0xc000000003e90000 irqmask: 0x03 irq_happened: 0x01 ..... [link register ] c00000000019c1e0 check__allowed__function__name__for__symbol__r1+0x20/0x40 [c00000000a657a80] c00000000a439900 (unreliable) [c00000000a657aa0] c0000000001021d8 check__allowed__function__name__for__symbol__r2_resolution_symbol+0x38/0x4c [c00000000a657ac0] c00000000019b424 power_pmu_event_init+0xa4/0xa50 and when executing a dump instruction (di) command for long function name, xmon fails to find the function symbol: 0:mon> di $check__allowed__function__name__for__symbol__r2_resolution_symbol unknown symbol 'check__allowed__function__name__for__symbol__r2_resolution_symb' 0000000000000000 ******** This is because in scanhex(), tmpstr loop index is checked only for a upper bound of 63. Fix it by replacing the upper bound value with (KSYM_NAME_LEN-1). With fix: 0:mon> di $check__allowed__function__name__for__symbol__r2_resolution_symbol c0000000001021a0 3c4c0249 addis r2,r12,585 c0000000001021a4 3842ae60 addi r2,r2,-20896 c0000000001021a8 7c0802a6 mflr r0 c0000000001021ac 60000000 nop ..... Reported-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Closes: https://lore.kernel.org/linuxppc-dev/CANiq72=QeTgtZL4k9=4CJP6C_Hv=rh3fsn3B9S3KFoPXkyWk3w@mail.gmail.com/ Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240826064217.46658-1-maddy@linux.ibm.com
2024-08-27KVM: arm64: Expose ID_AA64PFR2_EL1 to userspace and guestsMarc Zyngier1-1/+5
Everything is now in place for a guest to "enjoy" FP8 support. Expose ID_AA64PFR2_EL1 to both userspace and guests, with the explicit restriction of only being able to clear FPMR. All other features (MTE* at the time of writing) are hidden and not writable. Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240820131802.3547589-9-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-27KVM: arm64: Enable FP8 support when available and configuredMarc Zyngier1-0/+3
If userspace has enabled FP8 support (by setting ID_AA64PFR2_EL1.FPMR to 1), let's enable the feature by setting HCRX_EL2.EnFPM for the vcpu. Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240820131802.3547589-8-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-27KVM: arm64: Expose ID_AA64FPFR0_EL1 as a writable ID regMarc Zyngier1-1/+1
ID_AA64FPFR0_EL1 contains all sort of bits that contain a description of which FP8 subfeatures are implemented. We don't really care about them, so let's just expose that register and allow userspace to disable subfeatures at will. Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240820131802.3547589-7-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-27KVM: arm64: Honor trap routing for FPMRMarc Zyngier1-0/+8
HCRX_EL2.EnFPM controls the trapping of FPMR (as well as the validity of any FP8 instruction, but we don't really care about this last part). Describe the trap bit so that the exception can be reinjected in a NV guest. Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240820131802.3547589-6-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-27KVM: arm64: Add save/restore support for FPMRMarc Zyngier6-0/+35
Just like the rest of the FP/SIMD state, FPMR needs to be context switched. The only interesting thing here is that we need to treat the pKVM part a bit differently, as the host FP state is never written back to the vcpu thread, but instead stored locally and eagerly restored. Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240820131802.3547589-5-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-27KVM: arm64: Move FPMR into the sysreg arrayMarc Zyngier3-2/+12
Just like SVCR, FPMR is currently stored at the wrong location. Let's move it where it belongs. Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240820131802.3547589-4-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-27KVM: arm64: Add predicate for FPMR support in a VMMarc Zyngier1-0/+4
As we are about to check for the advertisement of FPMR support to a guest in a number of places, add a predicate that will gate most of the support code for FPMR. Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240820131802.3547589-3-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-27KVM: arm64: Move SVCR into the sysreg arrayMarc Zyngier3-3/+14
SVCR is just a system register, and has no purpose being outside of the sysreg array. If anything, it only makes it more difficult to eventually support SME one day. If ever. Move it into the array with its little friends, and associate it with a visibility predicate. Although this is dead code, it at least paves the way for the next set of FP-related extensions. Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240820131802.3547589-2-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-27ARM: dts: aspeed: catalina: Update io expander line namesPotin Lai1-2/+2
io_expander7 - P1-5: MCU_GPIO - P1-6: MCU_RST_N - P1-7: MCU_RECOVERY_N io_expander8 - P1-5: SEC_MCU_GPIO - P1-6: SEC_MCU_RST_N - P1-7: SEC_MCU_RECOVERY_N Signed-off-by: Potin Lai <potin.lai.pt@gmail.com> Link: https://lore.kernel.org/r/20240823-catalina-ioexp-update-v1-2-4bfd8dad819c@gmail.com Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au> Signed-off-by: Joel Stanley <joel@jms.id.au>
2024-08-27ARM: dts: aspeed: catalina: Add pdb cpld io expanderPotin Lai1-0/+131
Add more IO expanders which are emulated by the PDB CPLD. Signed-off-by: Potin Lai <potin.lai.pt@gmail.com> Link: https://lore.kernel.org/r/20240823-catalina-ioexp-update-v1-1-4bfd8dad819c@gmail.com Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au> Signed-off-by: Joel Stanley <joel@jms.id.au>
2024-08-26x86/tdx: Fix data leak in mmio_read()Kirill A. Shutemov1-1/+0
The mmio_read() function makes a TDVMCALL to retrieve MMIO data for an address from the VMM. Sean noticed that mmio_read() unintentionally exposes the value of an initialized variable (val) on the stack to the VMM. This variable is only needed as an output value. It did not need to be passed to the VMM in the first place. Do not send the original value of *val to the VMM. [ dhansen: clarify what 'val' is used for. ] Fixes: 31d58c4e557d ("x86/tdx: Handle in-kernel MMIO") Reported-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20240826125304.1566719-1-kirill.shutemov%40linux.intel.com
2024-08-26x86/ioremap: Improve iounmap() address range checksMax Ramanouski1-1/+2
Allowing iounmap() on memory that was not ioremap()'d in the first place is obviously a bad idea. There is currently a feeble attempt to avoid errant iounmap()s by checking to see if the address is below "high_memory". But that's imprecise at best because there are plenty of high addresses that are also invalid to call iounmap() on. Thankfully, there is a more precise helper: is_ioremap_addr(). x86 just does not use it in iounmap(). Restrict iounmap() to addresses in the ioremap region, by using is_ioremap_addr(). This aligns x86 closer to the generic iounmap() implementation. Additionally, add a warning in case there is an attempt to iounmap() invalid memory. This replaces an existing silent return and will help alert folks to any incorrect usage of iounmap(). Due to VMALLOC_START on i386 not being present in asm/pgtable.h, include for asm/vmalloc.h had to be added to include/linux/ioremap.h. [ dhansen: tweak subject and changelog ] Signed-off-by: Max Ramanouski <max8rr8@gmail.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alistair Popple <apopple@nvidia.com> Link: https://lore.kernel.org/all/20240824220111.84441-1-max8rr8%40gmail.com
2024-08-26arm64: dts: qcom: Add support for X1-based Surface Laptop 7 devicesKonrad Dybcio4-0/+863
Add support for Surface Laptop 7 machines, based on X1E80100. The feature status is mostly on par with other X Elite machines, notably lacking: - USB-A and probably USB-over-Surface-connector (pending NXP retimer support) - SD card reader (Realtek RTS5261 connected over PCIe) - Touchscreen and touchpad support (hid-over-SPI [1]) - Audio (a quick look suggests the setup is very close to the one in X1E CRD) The two Surface Laptop 7 SKUs (13.8" and 15") only have very minor differences, amounting close to none on the software side. Even the MBN firmware files and ACPI tables are shared between the two machines. With that in mind, support is added for both, although only the larger one was physically tested. Display differences will be taken care of through fused-in EDID and other matters should be solved within the EC and boot firmware. [1] https://www.microsoft.com/en-us/download/details.aspx?id=103325 Signed-off-by: Konrad Dybcio <quic_kdybcio@quicinc.com> Link: https://lore.kernel.org/r/20240826-topic-sl7-v2-5-c32ebae78789@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2024-08-26arm64: dts: qcom: x1e80100: Add UART2Konrad Dybcio1-5/+65
GENI SE2 within QUP0 is used as UART on some devices, describe it. While at it, rewrite the adjacent UART21 pins node to make it more easily modifiable. Signed-off-by: Konrad Dybcio <quic_kdybcio@quicinc.com> Link: https://lore.kernel.org/r/20240826-topic-sl7-v2-4-c32ebae78789@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2024-08-26arm64: dts: qcom: x1e80100-pmics: Add PMC8380C PWMKonrad Dybcio1-0/+8
The PMC8380C (PM8550) has a PWM block, describe it. Signed-off-by: Konrad Dybcio <quic_kdybcio@quicinc.com> Link: https://lore.kernel.org/r/20240826-topic-sl7-v2-3-c32ebae78789@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2024-08-26LoongArch: KVM: Invalidate guest steal time address on vCPU resetBibo Mao3-9/+1
If ParaVirt steal time feature is enabled, there is a percpu gpa address passed from guest vCPU and host modifies guest memory space with this gpa address. When vCPU is reset normally, it will notify host and invalidate gpa address. However if VM is crashed and VMM reboots VM forcely, the vCPU reboot notification callback will not be called in VM. Host needs invalidate the gpa address, else host will modify guest memory during VM reboots. Here it is invalidated from the vCPU KVM_REG_LOONGARCH_VCPU_RESET ioctl interface. Also funciton kvm_reset_timer() is removed at vCPU reset stage, since SW emulated timer is only used in vCPU block state. When a vCPU is removed from the block waiting queue, kvm_restore_timer() is called and SW timer is cancelled. And the timer register is also cleared at VMM when a vCPU is reset. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-08-26LoongArch: Add ifdefs to fix LSX and LASX related warningsTiezhu Yang2-0/+8
There exist some warnings when building kernel if CONFIG_CPU_HAS_LBT is set but CONFIG_CPU_HAS_LSX and CONFIG_CPU_HAS_LASX are not set. In this case, there are no definitions of _restore_lsx & _restore_lasx and there are also no definitions of kvm_restore_lsx & kvm_restore_lasx in fpu.S and switch.S respectively, just add some ifdefs to fix these warnings. AS arch/loongarch/kernel/fpu.o arch/loongarch/kernel/fpu.o: warning: objtool: unexpected relocation symbol type in .rela.discard.func_stack_frame_non_standard: 0 arch/loongarch/kernel/fpu.o: warning: objtool: unexpected relocation symbol type in .rela.discard.func_stack_frame_non_standard: 0 AS [M] arch/loongarch/kvm/switch.o arch/loongarch/kvm/switch.o: warning: objtool: unexpected relocation symbol type in .rela.discard.func_stack_frame_non_standard: 0 arch/loongarch/kvm/switch.o: warning: objtool: unexpected relocation symbol type in .rela.discard.func_stack_frame_non_standard: 0 MODPOST Module.symvers ERROR: modpost: "kvm_restore_lsx" [arch/loongarch/kvm/kvm.ko] undefined! ERROR: modpost: "kvm_restore_lasx" [arch/loongarch/kvm/kvm.ko] undefined! Cc: stable@vger.kernel.org # 6.9+ Fixes: cb8a2ef0848c ("LoongArch: Add ORC stack unwinder support") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202408120955.qls5oNQY-lkp@intel.com/ Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-08-26LoongArch: Define ARCH_IRQ_INIT_FLAGS as IRQ_NOPROBEHuacai Chen2-3/+2
Currently we call irq_set_noprobe() in a loop for all IRQs, but indeed it only works for IRQs below NR_IRQS_LEGACY because at init_IRQ() only legacy interrupts have been allocated. Instead, we can define ARCH_IRQ_INIT_FLAGS as IRQ_NOPROBE in asm/hwirq.h and the core will automatically set the flag for all interrupts. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn>
2024-08-26m68k: defconfig: Update defconfigs for v6.11-rc1Geert Uytterhoeven12-24/+0
- Drop CONFIG_CRYPTO_SM2=m (removed in commit 46b3ff73afc815f1 ("crypto: sm2 - Remove sm2 algorithm")), - Drop CONFIG_TEST_USER_COPY=m (replaced by auto-modular CONFIG_USERCOPY_KUNIT_TEST in commit cf6219ee889fb304 ("usercopy: Convert test_user_copy to KUnit test")). Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/bfe0530e290cee9d350f89c4d41436f3de7cb2a5.1722248695.git.geert@linux-m68k.org
2024-08-26m68k: Fix kernel_clone_args.flags in m68k_clone()Finn Thain1-1/+1
Stan Johnson recently reported a failure from the 'dump' command: DUMP: Date of this level 0 dump: Fri Aug 9 23:37:15 2024 DUMP: Dumping /dev/sda (an unlisted file system) to /dev/null DUMP: Label: none DUMP: Writing 10 Kilobyte records DUMP: mapping (Pass I) [regular files] DUMP: mapping (Pass II) [directories] DUMP: estimated 3595695 blocks. DUMP: Context save fork fails in parent 671 The dump program uses the clone syscall with the CLONE_IO flag, that is, flags == 0x80000000. When that value is promoted from long int to u64 by m68k_clone(), it undergoes sign-extension. The new value includes CLONE_INTO_CGROUP so the validation in cgroup_css_set_fork() fails and the syscall returns -EBADF. Avoid sign-extension by casting to u32. Reported-by: Stan Johnson <userm57@yahoo.com> Closes: https://lists.debian.org/debian-68k/2024/08/msg00000.html Fixes: 6aabc1facdb2 ("m68k: Implement copy_thread_tls()") Signed-off-by: Finn Thain <fthain@linux-m68k.org> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/3463f1e5d4e95468dc9f3368f2b78ffa7b72199b.1723335149.git.fthain@linux-m68k.org Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2024-08-26m68k: cmpxchg: Use swap() to improve codeThorsten Blum1-10/+5
Remove the local variable tmp and use the swap() macro instead. Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/20240730234506.492743-2-thorsten.blum@toblux.com Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2024-08-25Revert "MIPS: csrc-r4k: Apply verification clocksource flags"Guenter Roeck1-3/+1
This reverts commit 7190401fc56fb5f02ee3d04476778ab000bbaf32. Verifying the clock source sometimes deems the MIPS clock to be unstable, at least in qemu. clocksource: timekeeping watchdog on CPU0: Marking clocksource 'MIPS' as unstable because the skew is too large: clocksource: 'jiffies' wd_nsec: 500000000 wd_now: ffff8bde wd_last: ffff8bac mask: ffffffff clocksource: 'MIPS' cs_nsec: 940634468 cs_now: 310181c4 cs_last: 28090a09 mask: ffffffff clocksource: Clocksource 'MIPS' skewed 440634468 ns (440 ms) over watchdog 'jiffies' interval of 500000000 ns (500 ms) clocksource: 'MIPS' is current clocksource. If this happens, network interfaces fail to come online. Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-08-25microblaze: don't treat zero reserved memory regions as errorMike Rapoport1-5/+0
Before commit 721f4a6526da ("mm/memblock: remove empty dummy entry") the check for non-zero of memblock.reserved.cnt in mmu_init() would always be true either because memblock.reserved.cnt is initialized to 1 or because there were memory reservations earlier. The removal of dummy empty entry in memblock caused this check to fail because now memblock.reserved.cnt is initialized to 0. Remove the check for non-zero of memblock.reserved.cnt because it's perfectly fine to have an empty memblock.reserved array that early in boot. Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Mike Rapoport <rppt@kernel.org> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20240729053327.4091459-1-rppt@kernel.org Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-08-25x86/entry: Set FRED RSP0 on return to userspace instead of context switchXin Li (Intel)4-5/+27
The FRED RSP0 MSR points to the top of the kernel stack for user level event delivery. As this is the task stack it needs to be updated when a task is scheduled in. The update is done at context switch. That means it's also done when switching to kernel threads, which is pointless as those never go out to user space. For KVM threads this means there are two writes to FRED_RSP0 as KVM has to switch to the guest value before VMENTER. Defer the update to the exit to user space path and cache the per CPU FRED_RSP0 value, so redundant writes can be avoided. Provide fred_sync_rsp0() for KVM to keep the cache in sync with the actual MSR value after returning from guest to host mode. [ tglx: Massage change log ] Suggested-by: Sean Christopherson <seanjc@google.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20240822073906.2176342-4-xin@zytor.com
2024-08-25x86/msr: Switch between WRMSRNS and WRMSR with the alternatives mechanismAndrew Cooper3-16/+11
Per the discussion about FRED MSR writes with WRMSRNS instruction [1], use the alternatives mechanism to choose WRMSRNS when it's available, otherwise fallback to WRMSR. Remove the dependency on X86_FEATURE_WRMSRNS as WRMSRNS is no longer dependent on FRED. [1] https://lore.kernel.org/lkml/15f56e6a-6edd-43d0-8e83-bb6430096514@citrix.com/ Use DS prefix to pad WRMSR instead of a NOP. The prefix is ignored. At least that's the current information from the hardware folks. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20240822073906.2176342-3-xin@zytor.com
2024-08-25x86/entry: Test ti_work for zero before processing individual bitsXin Li (Intel)1-2/+8
In most cases, ti_work values passed to arch_exit_to_user_mode_prepare() are zeros, e.g., 99% in kernel build tests. So an obvious optimization is to test ti_work for zero before processing individual bits in it. Omit the optimization when FPU debugging is enabled, otherwise the FPU consistency check is never executed. Intel 0day tests did not find a perfermance regression with this change. Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20240822073906.2176342-2-xin@zytor.com
2024-08-25KVM: arm64: Allow userspace to change ID_AA64PFR1_EL1Shaoqin Huang1-1/+13
Allow userspace to change the guest-visible value of the register with different way of handling: - Since the RAS and MPAM is not writable in the ID_AA64PFR0_EL1 register, RAS_frac and MPAM_frac are also not writable in the ID_AA64PFR1_EL1 register. - The MTE is controlled by a separate UAPI (KVM_CAP_ARM_MTE) with an internal flag (KVM_ARCH_FLAG_MTE_ENABLED). So it's not writable. - For those fields which KVM doesn't know how to handle, they are not exposed to the guest (being disabled in the register read accessor), those fields value will always be 0. Those fields don't have a known behavior now, so don't advertise them to the userspace. Thus still not writable. Those fields include SME, RNDR_trap, NMI, GCS, THE, DF2, PFAR, MTE_frac, MTEX. - The BT, SSBS, CSV2_frac don't introduce any new registers which KVM doesn't know how to handle, they can be written without ill effect. So let them writable. Besides, we don't do the crosscheck in KVM about the CSV2_frac even if it depends on the value of CSV2, it should be made sure by the VMM instead of KVM. Signed-off-by: Shaoqin Huang <shahuang@redhat.com> Link: https://lore.kernel.org/r/20240723072004.1470688-4-shahuang@redhat.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-25KVM: arm64: Use kvm_has_feat() to check if FEAT_SSBS is advertised to the guestShaoqin Huang1-6/+6
Currently KVM use cpus_have_final_cap() to check if FEAT_SSBS is advertised to the guest. But if FEAT_SSBS is writable and isn't advertised to the guest, this is wrong. Update it to use kvm_has_feat() to check if FEAT_SSBS is advertised to the guest, thus the KVM can do the right thing if FEAT_SSBS isn't advertised to the guest. Signed-off-by: Shaoqin Huang <shahuang@redhat.com> Link: https://lore.kernel.org/r/20240723072004.1470688-3-shahuang@redhat.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-25KVM: arm64: Disable fields that KVM doesn't know how to handle in ↵Shaoqin Huang1-0/+8
ID_AA64PFR1_EL1 For some of the fields in the ID_AA64PFR1_EL1 register, KVM doesn't know how to handle them right now. So explicitly disable them in the register accessor, then those fields value will be masked to 0 even if on the hardware the field value is 1. This is safe because from a UAPI point of view that read_sanitised_ftr_reg() doesn't yet return a nonzero value for any of those fields. This will benifit the migration if the host and VM have different values when restoring a VM. Those fields include RNDR_trap, NMI, MTE_frac, GCS, THE, MTEX, DF2, PFAR. Signed-off-by: Shaoqin Huang <shahuang@redhat.com> Link: https://lore.kernel.org/r/20240723072004.1470688-2-shahuang@redhat.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-25perf/x86/intel: Limit the period on HaswellKan Liang1-2/+21
Running the ltp test cve-2015-3290 concurrently reports the following warnings. perfevents: irq loop stuck! WARNING: CPU: 31 PID: 32438 at arch/x86/events/intel/core.c:3174 intel_pmu_handle_irq+0x285/0x370 Call Trace: <NMI> ? __warn+0xa4/0x220 ? intel_pmu_handle_irq+0x285/0x370 ? __report_bug+0x123/0x130 ? intel_pmu_handle_irq+0x285/0x370 ? __report_bug+0x123/0x130 ? intel_pmu_handle_irq+0x285/0x370 ? report_bug+0x3e/0xa0 ? handle_bug+0x3c/0x70 ? exc_invalid_op+0x18/0x50 ? asm_exc_invalid_op+0x1a/0x20 ? irq_work_claim+0x1e/0x40 ? intel_pmu_handle_irq+0x285/0x370 perf_event_nmi_handler+0x3d/0x60 nmi_handle+0x104/0x330 Thanks to Thomas Gleixner's analysis, the issue is caused by the low initial period (1) of the frequency estimation algorithm, which triggers the defects of the HW, specifically erratum HSW11 and HSW143. (For the details, please refer https://lore.kernel.org/lkml/87plq9l5d2.ffs@tglx/) The HSW11 requires a period larger than 100 for the INST_RETIRED.ALL event, but the initial period in the freq mode is 1. The erratum is the same as the BDM11, which has been supported in the kernel. A minimum period of 128 is enforced as well on HSW. HSW143 is regarding that the fixed counter 1 may overcount 32 with the Hyper-Threading is enabled. However, based on the test, the hardware has more issues than it tells. Besides the fixed counter 1, the message 'interrupt took too long' can be observed on any counter which was armed with a period < 32 and two events expired in the same NMI. A minimum period of 32 is enforced for the rest of the events. The recommended workaround code of the HSW143 is not implemented. Because it only addresses the issue for the fixed counter. It brings extra overhead through extra MSR writing. No related overcounting issue has been reported so far. Fixes: 3a632cb229bf ("perf/x86/intel: Add simple Haswell PMU support") Reported-by: Li Huafei <lihuafei1@huawei.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20240819183004.3132920-1-kan.liang@linux.intel.com Closes: https://lore.kernel.org/lkml/20240729223328.327835-1-lihuafei1@huawei.com/
2024-08-25x86/fred: Set SS to __KERNEL_DS when enabling FREDXin Li (Intel)1-0/+14
SS is initialized to NULL during boot time and not explicitly set to __KERNEL_DS. With FRED enabled, if a kernel event is delivered before a CPU goes to user level for the first time, its SS is NULL thus NULL is pushed into the SS field of the FRED stack frame. But before ERETS is executed, the CPU may context switch to another task and go to user level. Then when the CPU comes back to kernel mode, SS is changed to __KERNEL_DS. Later when ERETS is executed to return from the kernel event handler, a #GP fault is generated because SS doesn't match the SS saved in the FRED stack frame. Initialize SS to __KERNEL_DS when enabling FRED to prevent that. Note, IRET doesn't check if SS matches the SS saved in its stack frame, thus IDT doesn't have this problem. For IDT it doesn't matter whether SS is set to __KERNEL_DS or not, because it's set to NULL upon interrupt or exception delivery and __KERNEL_DS upon SYSCALL. Thus it's pointless to initialize SS for IDT. Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20240816104316.2276968-1-xin@zytor.com
2024-08-25LoongArch: Remove the unused dma-direct.hMiao Wang1-11/+0
dma-direct.h is introduced in commit d4b6f1562a3c3284 ("LoongArch: Add Non-Uniform Memory Access (NUMA) support"). In commit c78c43fe7d42524c ("LoongArch: Use acpi_arch_dma_setup() and remove ARCH_HAS_PHYS_TO_DMA"), ARCH_HAS_PHYS_TO_DMA was deselected and the coresponding phys_to_dma()/ dma_to_phys() functions were removed. However, the unused dma-direct.h was left behind, which is removed by this patch. Cc: <stable@vger.kernel.org> Fixes: c78c43fe7d42 ("LoongArch: Use acpi_arch_dma_setup() and remove ARCH_HAS_PHYS_TO_DMA") Signed-off-by: Miao Wang <shankerwangmiao@gmail.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-08-25x86/amd_nb: Add new PCI IDs for AMD family 1Ah model 60h-70hRichard Gong1-0/+4
Add new PCI IDs for Device 18h and Function 4 to enable the amd_atl driver on those systems. Signed-off-by: Richard Gong <richard.gong@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lore.kernel.org/all/20240819123041.915734-1-richard.gong@amd.com
2024-08-25x86/extable: Remove unused declaration fixup_bug()Yue Haibing1-1/+0
Commit 15a416e8aaa7 ("x86/entry: Treat BUG/WARN as NMI-like entries") removed the implementation but left the declaration. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20240816102219.883297-1-yuehaibing@huawei.com
2024-08-25x86/boot/64: Strip percpu address space when setting up GDT descriptorsUros Bizjak1-1/+2
init_per_cpu_var() returns a pointer in the percpu address space while rip_rel_ptr() expects a pointer in the generic address space. When strict address space checks are enabled, GCC's named address space checks fail: asm.h:124:63: error: passing argument 1 of 'rip_rel_ptr' from pointer to non-enclosed address space Add a explicit cast to remove address space of the returned pointer. Fixes: 11e36b0f7c21 ("x86/boot/64: Load the final kernel GDT during early boot directly, remove startup_gdt[]") Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20240819083334.148536-1-ubizjak@gmail.com
2024-08-25x86/cpu: Clarify the error message when BIOS does not support SGXWangYuli1-1/+1
When SGX is not supported by the BIOS, the kernel log contains the error 'SGX disabled by BIOS', which can be confusing since there might not be an SGX-related option in the BIOS settings. For the kernel it's difficult to distinguish between the BIOS not supporting SGX and the BIOS supporting SGX but having it disabled. Therefore, update the error message to 'SGX disabled or unsupported by BIOS' to make it easier for those reading kernel logs to understand what's happening. Reported-by: Bo Wu <wubo@uniontech.com> Co-developed-by: Zelong Xiang <xiangzelong@uniontech.com> Signed-off-by: Zelong Xiang <xiangzelong@uniontech.com> Signed-off-by: WangYuli <wangyuli@uniontech.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/all/F8D977CB368423F3+20240825104653.1294624-1-wangyuli@uniontech.com Closes: https://github.com/linuxdeepin/developer-center/issues/10032
2024-08-25x86/kexec: Add comments around swap_pages() assembly to improve readabilityKai Huang1-2/+6
The current assembly around swap_pages() in the relocate_kernel() takes some time to follow because the use of registers can be easily lost when the line of assembly goes long. Add a couple of comments to clarify the code around swap_pages() to improve readability. Signed-off-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Link: https://lore.kernel.org/all/8b52b0b8513a34b2a02fb4abb05c6700c2821475.1724573384.git.kai.huang@intel.com
2024-08-25x86/kexec: Fix a comment of swap_pages() assemblyKai Huang1-1/+1
When relocate_kernel() gets called, %rdi holds 'indirection_page' and %rsi holds 'page_list'. And %rdi always holds 'indirection_page' when swap_pages() is called. Therefore the comment of the first line code of swap_pages() movq %rdi, %rcx /* Put the page_list in %rcx */ .. isn't correct because it actually moves the 'indirection_page' to the %rcx. Fix it. Signed-off-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Link: https://lore.kernel.org/all/adafdfb1421c88efce04420fc9a996c0e2ca1b34.1724573384.git.kai.huang@intel.com
2024-08-25x86/sgx: Fix a W=1 build warning in function commentKai Huang1-1/+1
Building the SGX code with W=1 generates below warning: arch/x86/kernel/cpu/sgx/main.c:741: warning: Function parameter or struct member 'low' not described in 'sgx_calc_section_metric' arch/x86/kernel/cpu/sgx/main.c:741: warning: Function parameter or struct member 'high' not described in 'sgx_calc_section_metric' ... The function sgx_calc_section_metric() is a simple helper which is only used in sgx/main.c. There's no need to use kernel-doc style comment for it. Downgrade to a normal comment. Signed-off-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20240825080649.145250-1-kai.huang@intel.com
2024-08-25x86/EISA: Use memremap() to probe for the EISA BIOS signatureMaciej W. Rozycki1-3/+3
The area at the 0x0FFFD9 physical location in the PC memory space is regular memory, traditionally ROM BIOS and more recently a copy of BIOS code and data in RAM, write-protected. Therefore use memremap() to get access to it rather than ioremap(), avoiding issues in virtualization scenarios and complementing changes such as commit f7750a795687 ("x86, mpparse, x86/acpi, x86/PCI, x86/dmi, SFI: Use memremap() for RAM mappings") or commit 5997efb96756 ("x86/boot: Use memremap() to map the MPF and MPC data"). Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/alpine.DEB.2.21.2408242025210.30766@angie.orcam.me.uk Closes: https://lore.kernel.org/r/20240822095122.736522-1-kirill.shutemov@linux.intel.com
2024-08-25x86/mtrr: Remove obsolete declaration for mtrr_bp_restore()Gaosheng Cui1-2/+0
mtrr_bp_restore() has been removed in commit 0b9a6a8bedbf ("x86/mtrr: Add a stop_machine() handler calling only cache_cpu_init()"), but the declaration was left behind. Remove it. Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20240824120234.2516830-1-cuigaosheng1@huawei.com