path: root/arch
Age | Commit message | Author | Files | Lines
2024-09-01 | Merge branch 'fixes' of ↵ | Linus Torvalds | 2 | -8/+1
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull misc fixes from Guenter Roeck. These are fixes for regressions that Guenter has been reporting and that the maintainers haven't picked up and sent in. With rc6 fairly imminent, I'm taking them directly from Guenter. * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: apparmor: fix policy_unpack_test on big endian systems Revert "MIPS: csrc-r4k: Apply verification clocksource flags" microblaze: don't treat zero reserved memory regions as error
2024-09-01 | Merge tag 'arm-fixes-6.11-2' of ↵ | Linus Torvalds | 22 | -70/+244
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "There is a fairly large number of bug fixes for Qualcomm platforms, most of them addressing issues with the devicetree files for the newly added Snapdragon X1 based laptops to make them more reliable. The Qualcomm driver changes address a few build-time issues as well as runtime problems in the tzmem and scm firmware, the USB Type-C driver, and the cmd-db and pmic_glink soc drivers. The NXP i.MX usually gets a bunch of devicetree fixes that is proportional to the number of supported machines. This includes both warning fixes and correctness for the 64-bit i.MX9, i.MX8 and layerscape platforms, as well as a single fix for a 32-bit i.MX6 based board. The other changes are the usual minor changes, including an update to the MAINTAINERS file, an omap3 dts file and a SoC driver for mpfs (risc-v)" * tag 'arm-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (50 commits) firmware: microchip: fix incorrect error report of programming:timeout on success soc: qcom: pd-mapper: Fix singleton refcount firmware: qcom: tzmem: disable sdm670 platform soc: qcom: pmic_glink: Actually communicate when remote goes down usb: typec: ucsi: Move unregister out of atomic section soc: qcom: pmic_glink: Fix race during initialization firmware: qcom: qseecom: remove unused functions firmware: qcom: tzmem: fix virtual-to-physical address conversion firmware: qcom: scm: Mark get_wq_ctx() as atomic call arm64: dts: qcom: x1e80100: Fix Adreno SMMU global interrupt arm64: dts: qcom: disable GPU on x1e80100 by default arm64: dts: imx8mm-phygate: fix typo pinctrcl-0 arm64: dts: imx95: correct L3Cache cache-sets arm64: dts: imx95: correct a55 power-domains arm64: dts: freescale: imx93-tqma9352-mba93xxla: fix typo arm64: dts: freescale: imx93-tqma9352: fix CMA alloc-ranges ARM: dts: imx6dl-yapp43: Increase LED current to match the yapp4 HW design arm64: dts: imx93: update default value for snps,clk-csr arm64: dts: freescale: tqma9352: Fix watchdog reset arm64: dts: imx8mp-beacon-kit: Fix Stereo Audio on WM8962 ...
2024-08-29 | Merge patch series "riscv: mm: Do not restrict mmap address based on hint" | Palmer Dabbelt | 1 | -24/+2
Charlie Jenkins <charlie@rivosinc.com> says: There have been a couple of reports that using the hint address to restrict the address returned by mmap has caused issues in applications. A different solution for restricting addresses returned by mmap is necessary to avoid breakages. [Palmer: This also just wasn't doing the right thing in the first place, as it didn't handle the sv39 cases we were trying to deal with.] * b4-shazam-merge: riscv: mm: Do not restrict mmap address based on hint riscv: selftests: Remove mmap hint address checks Revert "RISC-V: mm: Document mmap changes" Link: https://lore.kernel.org/r/20240826-riscv_mmap-v1-0-cd8962afe47f@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-29 | riscv: mm: Do not restrict mmap address based on hint | Charlie Jenkins | 1 | -24/+2
The hint address should not forcefully restrict the addresses returned by mmap as this causes mmap to report ENOMEM when there is memory still available. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Fixes: b5b4287accd7 ("riscv: mm: Use hint address in mmap if available") Fixes: add2cc6b6515 ("RISC-V: mm: Restrict address space for sv39,sv48,sv57") Closes: https://lore.kernel.org/linux-kernel/ZbxTNjQPFKBatMq+@ghost/T/#mccb1890466bf5a488c9ce7441e57e42271895765 Link: https://lore.kernel.org/r/20240826-riscv_mmap-v1-3-cd8962afe47f@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-29 | arm64: dts: rockchip: Fix compatibles for RK3588 VO{0,1}_GRF | Cristian Ciocaltea | 1 | -3/+3
RK3588 VO0 and VO1 GRFs are not identical (though quite similar in terms of layout) and, therefore, incorrectly shared the compatible string. Since the related binding document has been updated to use dedicated strings, update the compatibles for vo{0,1}_grf DT nodes accordingly. Additionally, for consistency, set the full region size (16KB) for VO1_GRF. Reported-by: Conor Dooley <conor@kernel.org> Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20240828-rk3588-vo-grf-compat-v2-2-4db2f791593f@collabora.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2024-08-29 | powerpc/qspinlock: Fix deadlock in MCS queue | Nysal Jan K.A. | 1 | -1/+9
If an interrupt occurs in queued_spin_lock_slowpath() after we increment qnodesp->count and before node->lock is initialized, another CPU might see stale lock values in get_tail_qnode(). If the stale lock value happens to match the lock on that CPU, then we write to the "next" pointer of the wrong qnode. This causes a deadlock as the former CPU, once it becomes the head of the MCS queue, will spin indefinitely until its "next" pointer is set by its successor in the queue. Running stress-ng on a 16-core (16EC/16VP) shared LPAR results in occasional lockups similar to the following: $ stress-ng --all 128 --vm-bytes 80% --aggressive \ --maximize --oomable --verify --syslog \ --metrics --times --timeout 5m watchdog: CPU 15 Hard LOCKUP ...... NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 Call Trace: 0xc000002cfffa3bf0 (unreliable) _raw_spin_lock+0x6c/0x90 raw_spin_rq_lock_nested.part.135+0x4c/0xd0 sched_ttwu_pending+0x60/0x1f0 __flush_smp_call_function_queue+0x1dc/0x670 smp_ipi_demux_relaxed+0xa4/0x100 xive_muxed_ipi_action+0x20/0x40 __handle_irq_event_percpu+0x80/0x240 handle_irq_event_percpu+0x2c/0x80 handle_percpu_irq+0x84/0xd0 generic_handle_irq+0x54/0x80 __do_irq+0xac/0x210 __do_IRQ+0x74/0xd0 0x0 do_IRQ+0x8c/0x170 hardware_interrupt_common_virt+0x29c/0x2a0 --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490 ...... NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 --- interrupt: 500 0xc0000029c1a41d00 (unreliable) _raw_spin_lock+0x6c/0x90 futex_wake+0x100/0x260 do_futex+0x21c/0x2a0 sys_futex+0x98/0x270 system_call_exception+0x14c/0x2f0 system_call_vectored_common+0x15c/0x2ec The following code flow illustrates how the deadlock occurs. For the sake of brevity, assume that both locks (A and B) are contended and we call the queued_spin_lock_slowpath() function. CPU0 CPU1 ---- ---- spin_lock_irqsave(A) | spin_unlock_irqrestore(A) | spin_lock(B) | | | ▼ | id = qnodesp->count++; | (Note that nodes[0].lock == A) | | | ▼ | Interrupt | (happens before "nodes[0].lock = B") | | | ▼ | spin_lock_irqsave(A) | | | ▼ | id = qnodesp->count++ | nodes[1].lock = A | | | ▼ | Tail of MCS queue | | spin_lock_irqsave(A) ▼ | Head of MCS queue ▼ | CPU0 is previous tail ▼ | Spin indefinitely ▼ (until "nodes[1].next != NULL") prev = get_tail_qnode(A, CPU0) | ▼ prev == &qnodes[CPU0].nodes[0] (as qnodes[CPU0].nodes[0].lock == A) | ▼ WRITE_ONCE(prev->next, node) | ▼ Spin indefinitely (until nodes[0].locked == 1) Thanks to Saket Kumar Bhaskar for help with recreating the issue Fixes: 84990b169557 ("powerpc/qspinlock: add mcs queueing for contended waiters") Cc: stable@vger.kernel.org # v6.2+ Reported-by: Geetika Moolchandani <geetika@linux.ibm.com> Reported-by: Vaishnavi Bhat <vaish123@in.ibm.com> Reported-by: Jijo Varghese <vargjijo@in.ibm.com> Signed-off-by: Nysal Jan K.A. <nysal@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240829022830.1164355-1-nysal@linux.ibm.com
2024-08-28 | Merge tag 'qcom-arm64-fixes-for-6.11' of ↵ | Arnd Bergmann | 6 | -35/+209
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes Qualcomm Arm64 DeviceTree fixes for v6.11 On X1E the GPU node is disabled by default, to be enabled in the individual devices once the developers install the required firmware. The generic EDP panel driver used on the X1E CRD is replaced with the Samsung ATNA45AF01 driver, in order to ensure backlight is brought back up after being turned off. The pin configuration for PCIe-related pins are corrected across all the X1E targets. The PCIe controllers gain a minimum OPP vote, and PCIe domain numbers are corrected. WiFi calibration variant information is added to the Lenovo Yoga Slim 7x, to pick the right data from the firmware packages. The incorrect Adreno SMMU global interrupt is corrected. For IPQ5332, the IRQ triggers for the USB controller are corrected. * tag 'qcom-arm64-fixes-for-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: (23 commits) arm64: dts: qcom: x1e80100: Fix Adreno SMMU global interrupt arm64: dts: qcom: disable GPU on x1e80100 by default arm64: dts: qcom: x1e80100-crd: Fix backlight arm64: dts: qcom: x1e80100-yoga-slim7x: fix missing PCIe4 gpios arm64: dts: qcom: x1e80100-yoga-slim7x: disable PCIe6a perst pull down arm64: dts: qcom: x1e80100-yoga-slim7x: fix up PCIe6a pinctrl node arm64: dts: qcom: x1e80100-yoga-slim7x: fix PCIe4 PHY supply arm64: dts: qcom: x1e80100-vivobook-s15: fix missing PCIe4 gpios arm64: dts: qcom: x1e80100-vivobook-s15: disable PCIe6a perst pull down arm64: dts: qcom: x1e80100-vivobook-s15: fix up PCIe6a pinctrl node arm64: dts: qcom: x1e80100-vivobook-s15: fix PCIe4 PHY supply arm64: dts: qcom: x1e80100-qcp: fix missing PCIe4 gpios arm64: dts: qcom: x1e80100-qcp: disable PCIe6a perst pull down arm64: dts: qcom: x1e80100-qcp: fix up PCIe6a pinctrl node arm64: dts: qcom: x1e80100-qcp: fix PCIe4 PHY supply arm64: dts: qcom: x1e80100-crd: fix missing PCIe4 gpios arm64: dts: qcom: x1e80100-crd: disable PCIe6a perst pull down arm64: dts: qcom: x1e80100-crd: fix up PCIe6a pinctrl node arm64: dts: qcom: x1e80100: add missing PCIe minimum OPP arm64: dts: qcom: x1e80100: fix PCIe domain numbers ... Link: https://lore.kernel.org/r/20240826152426.1648383-1-andersson@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-08-28 | Merge tag 'qcom-arm64-defconfig-fixes-for-6.11' of ↵ | Arnd Bergmann | 1 | -0/+1
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes Qualcomm Arm64 defconfig fix for 6.11 Enable the Samsung ATNA33XC20 display panel driver, as we switched from the generic EDP panel for some of the X1E devices in v6.11. * tag 'qcom-arm64-defconfig-fixes-for-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: arm64: defconfig: Add CONFIG_DRM_PANEL_SAMSUNG_ATNA33XC20 Link: https://lore.kernel.org/r/20240826145736.1646729-1-andersson@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-08-28 | Merge tag 'imx-fixes-6.11' of ↵ | Arnd Bergmann | 14 | -34/+33
https://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes i.MX fixes for 6.11: - One imx8mp-beacon-kit change from Adam Ford to fix the broken WM8962 audio support - One pinctrl property typo fix for imx8mm-phygate - One layerscape fix from Krzysztof Kozlowski to get thermal nodes correct name length - A couple of imx93-tqma9352 fixes from Markus Niebel, one on CMA alloc-ranges and the other on SD-Card cd-gpios typo - One change from Michal Vokáč to fix imx6dl-yapp43 LED current to match the HW design - A couple of imx95 fixes from Peng Fan, one to correct a55 power domains and the other to correct L3Cache cache-sets - One tqma9352 watchdog reset fix from Sascha Hauer - One imx93 change from Shenwei Wang to fix the default value for STMMAC EQOS snps,clk-csr * tag 'imx-fixes-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: arm64: dts: imx8mm-phygate: fix typo pinctrcl-0 arm64: dts: imx95: correct L3Cache cache-sets arm64: dts: imx95: correct a55 power-domains arm64: dts: freescale: imx93-tqma9352-mba93xxla: fix typo arm64: dts: freescale: imx93-tqma9352: fix CMA alloc-ranges ARM: dts: imx6dl-yapp43: Increase LED current to match the yapp4 HW design arm64: dts: imx93: update default value for snps,clk-csr arm64: dts: freescale: tqma9352: Fix watchdog reset arm64: dts: imx8mp-beacon-kit: Fix Stereo Audio on WM8962 arm64: dts: layerscape: fix thermal node names length Link: https://lore.kernel.org/r/ZrtsTO1+jXhJ6GSM@dragon Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-08-28 | Merge tag 'omap-for-v6.11/fixes-signed' of ↵ | Arnd Bergmann | 1 | -1/+1
https://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-omap into arm/fixes OMAP fixes for v6.11-rc - omap3-n900: fix accelerometer orientation * tag 'omap-for-v6.11/fixes-signed' of https://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-omap: ARM: dts: omap3-n900: correct the accelerometer orientation Link: https://lore.kernel.org/r/7h4j7eyhyh.fsf@baylibre.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-08-28 | KVM: SEV: Update KVM_AMD_SEV Kconfig entry and mention SEV-SNP | Vitaly Kuznetsov | 1 | -2/+4
SEV-SNP support is present since commit 1dfe571c12cf ("KVM: SEV: Add initial SEV-SNP support") but Kconfig entry wasn't updated and still mentions SEV and SEV-ES only. Add SEV-SNP there and, while on it, expand 'SEV' in the description as 'Encrypted VMs' is not what 'SEV' stands for. No functional change. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20240828122111.160273-1-vkuznets@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-08-28 | x86/resctrl: Fix arch_mbm_* array overrun on SNC | Peter Newman | 2 | -6/+8
When using resctrl on systems with Sub-NUMA Clustering enabled, monitoring groups may be allocated RMID values which would overrun the arch_mbm_{local,total} arrays. This is due to inconsistencies in whether the SNC-adjusted num_rmid value or the unadjusted value in resctrl_arch_system_num_rmid_idx() is used. The num_rmid value for the L3 resource is currently: resctrl_arch_system_num_rmid_idx() / snc_nodes_per_l3_cache As a simple fix, make resctrl_arch_system_num_rmid_idx() return the SNC-adjusted, L3 num_rmid value on x86. Fixes: e13db55b5a0d ("x86/resctrl: Introduce snc_nodes_per_l3_cache") Signed-off-by: Peter Newman <peternewman@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/r/20240822190212.1848788-1-peternewman@google.com
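As a rough sketch of the shape of that fix (assuming the x86 implementation derives the count from boot_cpu_data; illustrative only, not the upstream diff):

    /* x86: report the SNC-adjusted number of RMIDs, so the arch_mbm_*
     * arrays are sized and indexed with the same value. */
    u32 resctrl_arch_system_num_rmid_idx(void)
    {
            /* RMIDs form a flat index space on x86 */
            return (boot_cpu_data.x86_cache_max_rmid + 1) / snc_nodes_per_l3_cache;
    }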
2024-08-26 | x86/tdx: Fix data leak in mmio_read() | Kirill A. Shutemov | 1 | -1/+0
The mmio_read() function makes a TDVMCALL to retrieve MMIO data for an address from the VMM. Sean noticed that mmio_read() unintentionally exposes the value of an uninitialized variable (val) on the stack to the VMM. This variable is only needed as an output value. It did not need to be passed to the VMM in the first place. Do not send the original value of *val to the VMM. [ dhansen: clarify what 'val' is used for. ] Fixes: 31d58c4e557d ("x86/tdx: Handle in-kernel MMIO") Reported-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20240826125304.1566719-1-kirill.shutemov%40linux.intel.com
2024-08-26 | LoongArch: KVM: Invalidate guest steal time address on vCPU reset | Bibo Mao | 3 | -9/+1
If the ParaVirt steal time feature is enabled, there is a percpu gpa address passed from the guest vCPU and the host modifies guest memory space with this gpa address. When a vCPU is reset normally, it will notify the host and invalidate the gpa address. However, if the VM crashes and the VMM reboots the VM forcibly, the vCPU reboot notification callback will not be called in the VM. The host then needs to invalidate the gpa address, else the host will modify guest memory during VM reboot. Here it is invalidated from the vCPU KVM_REG_LOONGARCH_VCPU_RESET ioctl interface. Also, the function kvm_reset_timer() is removed at vCPU reset stage, since the SW emulated timer is only used in vCPU block state. When a vCPU is removed from the block waiting queue, kvm_restore_timer() is called and the SW timer is cancelled. The timer register is also cleared at the VMM side when a vCPU is reset. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-08-26 | LoongArch: Add ifdefs to fix LSX and LASX related warnings | Tiezhu Yang | 2 | -0/+8
There exist some warnings when building kernel if CONFIG_CPU_HAS_LBT is set but CONFIG_CPU_HAS_LSX and CONFIG_CPU_HAS_LASX are not set. In this case, there are no definitions of _restore_lsx & _restore_lasx and there are also no definitions of kvm_restore_lsx & kvm_restore_lasx in fpu.S and switch.S respectively, just add some ifdefs to fix these warnings. AS arch/loongarch/kernel/fpu.o arch/loongarch/kernel/fpu.o: warning: objtool: unexpected relocation symbol type in .rela.discard.func_stack_frame_non_standard: 0 arch/loongarch/kernel/fpu.o: warning: objtool: unexpected relocation symbol type in .rela.discard.func_stack_frame_non_standard: 0 AS [M] arch/loongarch/kvm/switch.o arch/loongarch/kvm/switch.o: warning: objtool: unexpected relocation symbol type in .rela.discard.func_stack_frame_non_standard: 0 arch/loongarch/kvm/switch.o: warning: objtool: unexpected relocation symbol type in .rela.discard.func_stack_frame_non_standard: 0 MODPOST Module.symvers ERROR: modpost: "kvm_restore_lsx" [arch/loongarch/kvm/kvm.ko] undefined! ERROR: modpost: "kvm_restore_lasx" [arch/loongarch/kvm/kvm.ko] undefined! Cc: stable@vger.kernel.org # 6.9+ Fixes: cb8a2ef0848c ("LoongArch: Add ORC stack unwinder support") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202408120955.qls5oNQY-lkp@intel.com/ Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-08-26 | LoongArch: Define ARCH_IRQ_INIT_FLAGS as IRQ_NOPROBE | Huacai Chen | 2 | -3/+2
Currently we call irq_set_noprobe() in a loop for all IRQs, but indeed it only works for IRQs below NR_IRQS_LEGACY because at init_IRQ() only legacy interrupts have been allocated. Instead, we can define ARCH_IRQ_INIT_FLAGS as IRQ_NOPROBE in asm/hwirq.h and the core will automatically set the flag for all interrupts. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn>
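The mechanism leans on the generic IRQ core applying ARCH_IRQ_INIT_FLAGS when descriptors are set up; a sketch of the two pieces involved (the arch header path is taken from the message, the generic hook paraphrased from include/linux/irq.h):

    /* arch side (asm/hwirq.h per the message): */
    #define ARCH_IRQ_INIT_FLAGS     IRQ_NOPROBE

    /* generic side, already provided by include/linux/irq.h: */
    #ifndef ARCH_IRQ_INIT_FLAGS
    # define ARCH_IRQ_INIT_FLAGS    0
    #endif
    #define IRQ_DEFAULT_INIT_FLAGS  ARCH_IRQ_INIT_FLAGS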
2024-08-25 | Revert "MIPS: csrc-r4k: Apply verification clocksource flags" | Guenter Roeck | 1 | -3/+1
This reverts commit 7190401fc56fb5f02ee3d04476778ab000bbaf32. Verifying the clock source sometimes deems the MIPS clock to be unstable, at least in qemu. clocksource: timekeeping watchdog on CPU0: Marking clocksource 'MIPS' as unstable because the skew is too large: clocksource: 'jiffies' wd_nsec: 500000000 wd_now: ffff8bde wd_last: ffff8bac mask: ffffffff clocksource: 'MIPS' cs_nsec: 940634468 cs_now: 310181c4 cs_last: 28090a09 mask: ffffffff clocksource: Clocksource 'MIPS' skewed 440634468 ns (440 ms) over watchdog 'jiffies' interval of 500000000 ns (500 ms) clocksource: 'MIPS' is current clocksource. If this happens, network interfaces fail to come online. Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-08-25 | microblaze: don't treat zero reserved memory regions as error | Mike Rapoport | 1 | -5/+0
Before commit 721f4a6526da ("mm/memblock: remove empty dummy entry") the check for non-zero of memblock.reserved.cnt in mmu_init() would always be true either because memblock.reserved.cnt is initialized to 1 or because there were memory reservations earlier. The removal of dummy empty entry in memblock caused this check to fail because now memblock.reserved.cnt is initialized to 0. Remove the check for non-zero of memblock.reserved.cnt because it's perfectly fine to have an empty memblock.reserved array that early in boot. Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Mike Rapoport <rppt@kernel.org> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20240729053327.4091459-1-rppt@kernel.org Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-08-25 | perf/x86/intel: Limit the period on Haswell | Kan Liang | 1 | -2/+21
Running the ltp test cve-2015-3290 concurrently reports the following warnings. perfevents: irq loop stuck! WARNING: CPU: 31 PID: 32438 at arch/x86/events/intel/core.c:3174 intel_pmu_handle_irq+0x285/0x370 Call Trace: <NMI> ? __warn+0xa4/0x220 ? intel_pmu_handle_irq+0x285/0x370 ? __report_bug+0x123/0x130 ? intel_pmu_handle_irq+0x285/0x370 ? __report_bug+0x123/0x130 ? intel_pmu_handle_irq+0x285/0x370 ? report_bug+0x3e/0xa0 ? handle_bug+0x3c/0x70 ? exc_invalid_op+0x18/0x50 ? asm_exc_invalid_op+0x1a/0x20 ? irq_work_claim+0x1e/0x40 ? intel_pmu_handle_irq+0x285/0x370 perf_event_nmi_handler+0x3d/0x60 nmi_handle+0x104/0x330 Thanks to Thomas Gleixner's analysis, the issue is caused by the low initial period (1) of the frequency estimation algorithm, which triggers the defects of the HW, specifically erratum HSW11 and HSW143. (For the details, please refer to https://lore.kernel.org/lkml/87plq9l5d2.ffs@tglx/) The HSW11 requires a period larger than 100 for the INST_RETIRED.ALL event, but the initial period in the freq mode is 1. The erratum is the same as the BDM11, which has been supported in the kernel. A minimum period of 128 is enforced as well on HSW. HSW143 is regarding that the fixed counter 1 may overcount by 32 when Hyper-Threading is enabled. However, based on the test, the hardware has more issues than it tells. Besides the fixed counter 1, the message 'interrupt took too long' can be observed on any counter which was armed with a period < 32 and two events expired in the same NMI. A minimum period of 32 is enforced for the rest of the events. The recommended workaround code of the HSW143 is not implemented. Because it only addresses the issue for the fixed counter. It brings extra overhead through extra MSR writing. No related overcounting issue has been reported so far. Fixes: 3a632cb229bf ("perf/x86/intel: Add simple Haswell PMU support") Reported-by: Li Huafei <lihuafei1@huawei.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20240819183004.3132920-1-kan.liang@linux.intel.com Closes: https://lore.kernel.org/lkml/20240729223328.327835-1-lihuafei1@huawei.com/
2024-08-25 | LoongArch: Remove the unused dma-direct.h | Miao Wang | 1 | -11/+0
dma-direct.h is introduced in commit d4b6f1562a3c3284 ("LoongArch: Add Non-Uniform Memory Access (NUMA) support"). In commit c78c43fe7d42524c ("LoongArch: Use acpi_arch_dma_setup() and remove ARCH_HAS_PHYS_TO_DMA"), ARCH_HAS_PHYS_TO_DMA was deselected and the corresponding phys_to_dma()/dma_to_phys() functions were removed. However, the unused dma-direct.h was left behind, which is removed by this patch. Cc: <stable@vger.kernel.org> Fixes: c78c43fe7d42 ("LoongArch: Use acpi_arch_dma_setup() and remove ARCH_HAS_PHYS_TO_DMA") Signed-off-by: Miao Wang <shankerwangmiao@gmail.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-08-25 | Merge tag 's390-6.11-4' of ↵ | Linus Torvalds | 8 | -33/+85
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Vasily Gorbik: - Fix KASLR base offset to account for symbol offsets in the vmlinux ELF file, preventing tool breakages like the drgn debugger - Fix potential memory corruption of physmem_info during kernel physical address randomization - Fix potential memory corruption due to overlap between the relocated lowcore and identity mapping by correctly reserving lowcore memory - Fix performance regression and avoid randomizing identity mapping base by default - Fix unnecessary delay of AP bus binding complete uevent to prevent startup lag in KVM guests using AP * tag 's390-6.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/boot: Fix KASLR base offset off by __START_KERNEL bytes s390/boot: Avoid possible physmem_info segment corruption s390/ap: Refine AP bus bindings complete processing s390/mm: Pin identity mapping base to zero s390/mm: Prevent lowcore vs identity mapping overlap
2024-08-24 | Merge tag 'mips-fixes_6.11_1' of ↵ | Linus Torvalds | 2 | -8/+11
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fixes from Thomas Bogendoerfer: - Set correct timer mode on Loongson64 - Only request r4k clockevent interrupt on one CPU * tag 'mips-fixes_6.11_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: cevt-r4k: Don't call get_c0_compare_int if timer irq is installed MIPS: Loongson64: Set timer mode in cpu-probe
2024-08-24 | Merge tag 'arm64-fixes' of ↵ | Linus Torvalds | 6 | -5/+33
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 kvm fixes from Catalin Marinas: - Don't drop references on LPIs that weren't visited by the vgic-debug iterator - Cure lock ordering issue when unregistering vgic redistributors - Fix for misaligned stage-2 mappings when VMs are backed by hugetlb pages - Treat SGI registers as UNDEFINED if a VM hasn't been configured for GICv3 * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: KVM: arm64: Make ICC_*SGI*_EL1 undef in the absence of a vGICv3 KVM: arm64: Ensure canonical IPA is hugepage-aligned when handling fault KVM: arm64: vgic: Don't hold config_lock while unregistering redistributors KVM: arm64: vgic-debug: Don't put unmarked LPIs
2024-08-23 | Merge tag 'kvmarm-fixes-6.11-2' of ↵ | Catalin Marinas | 18 | -38/+72
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into for-next/fixes KVM/arm64 fixes for 6.11, round #2 - Don't drop references on LPIs that weren't visited by the vgic-debug iterator - Cure lock ordering issue when unregistering vgic redistributors - Fix for misaligned stage-2 mappings when VMs are backed by hugetlb pages - Treat SGI registers as UNDEFINED if a VM hasn't been configured for GICv3 * tag 'kvmarm-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm: KVM: arm64: Make ICC_*SGI*_EL1 undef in the absence of a vGICv3 KVM: arm64: Ensure canonical IPA is hugepage-aligned when handling fault KVM: arm64: vgic: Don't hold config_lock while unregistering redistributors KVM: arm64: vgic-debug: Don't put unmarked LPIs KVM: arm64: vgic: Hold config_lock while tearing down a CPU interface KVM: selftests: arm64: Correct feature test for S1PIE in get-reg-list KVM: arm64: Tidying up PAuth code in KVM KVM: arm64: vgic-debug: Exit the iterator properly w/o LPI KVM: arm64: Enforce dependency on an ARMv8.4-aware toolchain docs: KVM: Fix register ID of SPSR_FIQ KVM: arm64: vgic: fix unexpected unlock sparse warnings KVM: arm64: fix kdoc warnings in W=1 builds KVM: arm64: fix override-init warnings in W=1 builds KVM: arm64: free kvm->arch.nested_mmus with kvfree()
2024-08-22 | KVM: SVM: Don't advertise Bus Lock Detect to guest if SVM support is missing | Ravi Bangoria | 1 | -0/+3
If host supports Bus Lock Detect, KVM advertises it to guests even if SVM support is absent. Additionally, guest wouldn't be able to use it despite guest CPUID bit being set. Fix it by unconditionally clearing the feature bit in KVM cpu capability. Reported-by: Jim Mattson <jmattson@google.com> Closes: https://lore.kernel.org/r/CALMp9eRet6+v8Y1Q-i6mqPm4hUow_kJNhmVHfOV8tMfuSS=tVg@mail.gmail.com Fixes: 76ea438b4afc ("KVM: X86: Expose bus lock debug exception to guest") Cc: stable@vger.kernel.org Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20240808062937.1149-4-ravi.bangoria@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
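A minimal sketch of the described change, assuming it lands in svm_set_cpu_caps() (illustrative, not the literal diff):

    static void svm_set_cpu_caps(void)
    {
            /* ... existing capability setup ... */

            /* KVM cannot virtualize Bus Lock Detect on SVM, so never
             * advertise it to the guest, even if the host has it. */
            kvm_cpu_cap_clear(X86_FEATURE_BUS_LOCK_DETECT);
    }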
2024-08-22 | KVM: SVM: fix emulation of msr reads/writes of MSR_FS_BASE and MSR_GS_BASE | Maxim Levitsky | 1 | -0/+12
If these MSRs are read by the emulator (e.g. due to the 'force emulation' prefix), SVM code currently fails to extract the corresponding segment bases and return them to the emulator. Fix that. Cc: stable@vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20240802151608.72896-3-mlevitsk@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>
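Roughly what the read side needs, assuming the bases are served from the VMCB save area (a sketch; the write side mirrors it, and field names may differ from the actual patch):

    /* in svm_get_msr(): */
    case MSR_FS_BASE:
            msr_info->data = svm->vmcb01.ptr->save.fs.base;
            break;
    case MSR_GS_BASE:
            msr_info->data = svm->vmcb01.ptr->save.gs.base;
            break;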
2024-08-22 | KVM: x86: Acquire kvm->srcu when handling KVM_SET_VCPU_EVENTS | Sean Christopherson | 1 | -0/+2
Grab kvm->srcu when processing KVM_SET_VCPU_EVENTS, as KVM will forcibly leave nested VMX/SVM if SMM mode is being toggled, and leaving nested VMX reads guest memory. Note, kvm_vcpu_ioctl_x86_set_vcpu_events() can also be called from KVM_RUN via sync_regs(), which already holds SRCU. I.e. trying to precisely use kvm_vcpu_srcu_read_lock() around the problematic SMM code would cause problems. Acquiring SRCU isn't all that expensive, so for simplicity, grab it unconditionally for KVM_SET_VCPU_EVENTS. ============================= WARNING: suspicious RCU usage 6.10.0-rc7-332d2c1d713e-next-vm #552 Not tainted ----------------------------- include/linux/kvm_host.h:1027 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by repro/1071: #0: ffff88811e424430 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x7d/0x970 [kvm] stack backtrace: CPU: 15 PID: 1071 Comm: repro Not tainted 6.10.0-rc7-332d2c1d713e-next-vm #552 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: <TASK> dump_stack_lvl+0x7f/0x90 lockdep_rcu_suspicious+0x13f/0x1a0 kvm_vcpu_gfn_to_memslot+0x168/0x190 [kvm] kvm_vcpu_read_guest+0x3e/0x90 [kvm] nested_vmx_load_msr+0x6b/0x1d0 [kvm_intel] load_vmcs12_host_state+0x432/0xb40 [kvm_intel] vmx_leave_nested+0x30/0x40 [kvm_intel] kvm_vcpu_ioctl_x86_set_vcpu_events+0x15d/0x2b0 [kvm] kvm_arch_vcpu_ioctl+0x1107/0x1750 [kvm] ? mark_held_locks+0x49/0x70 ? kvm_vcpu_ioctl+0x7d/0x970 [kvm] ? kvm_vcpu_ioctl+0x497/0x970 [kvm] kvm_vcpu_ioctl+0x497/0x970 [kvm] ? lock_acquire+0xba/0x2d0 ? find_held_lock+0x2b/0x80 ? do_user_addr_fault+0x40c/0x6f0 ? lock_release+0xb7/0x270 __x64_sys_ioctl+0x82/0xb0 do_syscall_64+0x6c/0x170 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7ff11eb1b539 </TASK> Fixes: f7e570780efc ("KVM: x86: Forcibly leave nested virt when SMM state is toggled") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240723232055.3643811-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
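The change amounts to wrapping this ioctl's handler in the vCPU's SRCU read side; a simplified sketch of the kvm_arch_vcpu_ioctl() case (not the literal diff):

    case KVM_SET_VCPU_EVENTS: {
            struct kvm_vcpu_events events;

            r = -EFAULT;
            if (copy_from_user(&events, argp, sizeof(events)))
                    break;

            kvm_vcpu_srcu_read_lock(vcpu);  /* leaving nested mode may read guest memory */
            r = kvm_vcpu_ioctl_x86_set_vcpu_events(vcpu, &events);
            kvm_vcpu_srcu_read_unlock(vcpu);
            break;
    }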
2024-08-22 | KVM: x86/mmu: Check that root is valid/loaded when pre-faulting SPTEs | Sean Christopherson | 1 | -1/+3
Error out if kvm_mmu_reload() fails when pre-faulting memory, as trying to fault-in SPTEs will fail miserably due to root.hpa pointing at garbage. Note, kvm_mmu_reload() can return -EIO and thus trigger the WARN on -EIO in kvm_vcpu_pre_fault_memory(), but all such paths also WARN, i.e. the WARN isn't user-triggerable and won't run afoul of warn-on-panic because the kernel would already be panicking. BUG: unable to handle page fault for address: 000029ffffffffe8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP CPU: 22 PID: 1069 Comm: pre_fault_memor Not tainted 6.10.0-rc7-332d2c1d713e-next-vm #548 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:is_page_fault_stale+0x3e/0xe0 [kvm] RSP: 0018:ffffc9000114bd48 EFLAGS: 00010206 RAX: 00003fffffffffc0 RBX: ffff88810a07c080 RCX: ffffc9000114bd78 RDX: ffff88810a07c080 RSI: ffffea0000000000 RDI: ffff88810a07c080 RBP: ffffc9000114bd78 R08: 00007fa3c8c00000 R09: 8000000000000225 R10: ffffea00043d7d80 R11: 0000000000000000 R12: ffff88810a07c080 R13: 0000000100000000 R14: ffffc9000114be58 R15: 0000000000000000 FS: 00007fa3c9da0740(0000) GS:ffff888277d80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000029ffffffffe8 CR3: 000000011d698000 CR4: 0000000000352eb0 Call Trace: <TASK> kvm_tdp_page_fault+0xcc/0x160 [kvm] kvm_mmu_do_page_fault+0xfb/0x1f0 [kvm] kvm_arch_vcpu_pre_fault_memory+0xd0/0x1a0 [kvm] kvm_vcpu_ioctl+0x761/0x8c0 [kvm] __x64_sys_ioctl+0x82/0xb0 do_syscall_64+0x5b/0x160 entry_SYSCALL_64_after_hwframe+0x4b/0x53 </TASK> Modules linked in: kvm_intel kvm CR2: 000029ffffffffe8 ---[ end trace 0000000000000000 ]--- Fixes: 6e01b7601dfe ("KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory()") Reported-by: syzbot+23786faffb695f17edaa@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/0000000000002b84dc061dd73544@google.com Reviewed-by: Kai Huang <kai.huang@intel.com> Tested-by: xingwei lee <xrivendell7@gmail.com> Tested-by: yuxin wang <wang1315768607@163.com> Link: https://lore.kernel.org/r/20240723000211.3352304-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
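The fix boils down to checking the reload result before walking SPTEs; a simplified sketch of the pre-fault path:

    /* in kvm_arch_vcpu_pre_fault_memory(), before faulting in SPTEs: */
    r = kvm_mmu_reload(vcpu);
    if (r)
            return r;       /* don't fault with root.hpa pointing at garbage */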
2024-08-22 | KVM: x86/mmu: Fixup comments missed by the REMOVED_SPTE=>FROZEN_SPTE rename | Yan Zhao | 3 | -8/+8
Replace "removed" with "frozen" in comments as appropriate to complete the rename of REMOVED_SPTE to FROZEN_SPTE. Fixes: 964cea817196 ("KVM: x86/tdp_mmu: Rename REMOVED_SPTE to FROZEN_SPTE") Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Link: https://lore.kernel.org/r/20240712233438.518591-1-rick.p.edgecombe@intel.com [sean: write changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-08-22 | s390/boot: Fix KASLR base offset off by __START_KERNEL bytes | Alexander Gordeev | 6 | -31/+52
Symbol offsets to the KASLR base do not match symbol address in the vmlinux image. That is the result of setting the KASLR base to the beginning of .text section as result of an optimization. Revert that optimization and allocate virtual memory for the whole kernel image including __START_KERNEL bytes as per the linker script. That allows keeping the semantics of the KASLR base offset in sync with other architectures. Rename __START_KERNEL to TEXT_OFFSET, since it represents the offset of the .text section within the kernel image, rather than a virtual address. Still skip mapping TEXT_OFFSET bytes to save memory on pgtables and provoke exceptions in case an attempt to access this area is made, as no kernel symbol may reside there. In case CONFIG_KASAN is enabled the location counter might exceed the value of TEXT_OFFSET, while the decompressor linker script forcefully resets it to TEXT_OFFSET, which leads to a sections overlap link failure. Use MAX() expression to avoid that. Reported-by: Omar Sandoval <osandov@osandov.com> Closes: https://lore.kernel.org/linux-s390/ZnS8dycxhtXBZVky@telecaster.dhcp.thefacebook.com/ Fixes: 56b1069c40c7 ("s390/boot: Rework deployment of the kernel image") Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-22 | s390/boot: Avoid possible physmem_info segment corruption | Alexander Gordeev | 1 | -2/+2
When physical memory for the kernel image is allocated it does not consider extra memory required for offsetting the image start to match it with the lower 20 bits of KASLR virtual base address. That might lead to kernel access beyond its memory range. Suggested-by: Vasily Gorbik <gor@linux.ibm.com> Fixes: 693d41f7c938 ("s390/mm: Restore mapping of kernel image using large pages") Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-22 | powerpc/mm: Fix return type of pgd_val() | Christophe Leroy | 2 | -5/+11
Commit 6b0e82791bd0 ("powerpc/e500: switch to 64 bits PGD on 85xx (32 bits)") switched PGD entries to 64 bits, but pgd_val() returns an unsigned long which is 32 bits on PPC32. This is not a problem for regular PMD entries because the upper part is always NULL, but when PMD entries are leaf they contain 64 bits values, so pgd_val() must return an unsigned long long instead of an unsigned long. Also change the condition to CONFIG_PPC_85xx instead of CONFIG_PPC_E500 as the change was meant for 32 bits only. Although this should be harmless on PPC64, it generates a warning with the pgd_ERROR print. Fixes: 6b0e82791bd0 ("powerpc/e500: switch to 64 bits PGD on 85xx (32 bits)") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/45f8fdf298ec3df7573b66d21b03a5cda92e2cb1.1724313510.git.christophe.leroy@csgroup.eu
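A sketch of the idea (the exact typedef location and config guard are assumptions): with 64-bit PGD entries on 32-bit 85xx, the accessor must return the full value.

    #ifdef CONFIG_PPC_85xx
    typedef struct { unsigned long long pgd; } pgd_t;       /* 64-bit entries */

    static inline unsigned long long pgd_val(pgd_t x)
    {
            return x.pgd;   /* unsigned long would truncate leaf PMD values on PPC32 */
    }
    #endif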
2024-08-22 | powerpc/vdso: Don't discard rela sections | Christophe Leroy | 2 | -3/+5
After building the VDSO, there is a verification that it contains no dynamic relocation, see commit aff69273af61 ("vdso: Improve cmd_vdso_check to check all dynamic relocations"). This verification uses readelf -r and doesn't work if rela sections are discarded. Fixes: 8ad57add77d3 ("powerpc/build: vdso linker warning for orphan sections") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/45c3e6fc76cad05ad2cac0f5b5dfb4fae86dc9d6.1724153239.git.christophe.leroy@csgroup.eu
2024-08-22 | powerpc/64e: Define mmu_pte_psize static | Christophe Leroy | 1 | -1/+1
mmu_pte_psize is only used in tlb_64e.c, so define it static. Fixes: 25d21ad6e799 ("powerpc: Add TLB management code for 64-bit Book3E") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202408011256.1O99IB0s-lkp@intel.com/ Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/beb30d280eaa5d857c38a0834b147dffd6b28aa9.1724157750.git.christophe.leroy@csgroup.eu
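The change itself is a one-liner of this shape (the variable's exact type is assumed):

    /* in tlb_64e.c */
    static int mmu_pte_psize;       /* file-local; no users outside this file */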
2024-08-22 | KVM: arm64: Make ICC_*SGI*_EL1 undef in the absence of a vGICv3 | Marc Zyngier | 2 | -0/+13
On a system with a GICv3, if a guest hasn't been configured with GICv3 and the host is not capable of GICv2 emulation, a write to any of the ICC_*SGI*_EL1 registers is trapped to EL2. We therefore try to emulate the SGI access, only to hit a NULL pointer as no private interrupt is allocated (no GIC, remember?). The obvious fix is to give the guest what it deserves, in the shape of an UNDEF exception. Reported-by: Alexander Potapenko <glider@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240820100349.3544850-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
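In sysreg-trap terms the fix is small; a sketch of the SGI trap handler, where both the handler name and the "has a vGICv3" predicate are stand-ins for whatever the actual patch uses:

    static bool access_gic_sgi(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
                               const struct sys_reg_desc *r)
    {
            if (!vcpu_has_vgic_v3(vcpu->kvm)) {     /* stand-in predicate */
                    kvm_inject_undefined(vcpu);     /* no vGICv3: UNDEF instead of emulating */
                    return false;
            }
            /* ... normal SGI emulation ... */
            return true;
    }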
2024-08-22 | KVM: arm64: Ensure canonical IPA is hugepage-aligned when handling fault | Oliver Upton | 1 | -1/+8
Zenghui reports that VMs backed by hugetlb pages are no longer booting after commit fd276e71d1e7 ("KVM: arm64: nv: Handle shadow stage 2 page faults"). Support for shadow stage-2 MMUs introduced the concept of a fault IPA and canonical IPA to stage-2 fault handling. These are identical in the non-nested case, as the hardware stage-2 context is always that of the canonical IPA space. Both addresses need to be hugepage-aligned when preparing to install a hugepage mapping to ensure that KVM uses the correct GFN->PFN translation and installs that at the correct IPA for the current stage-2. And now I'm feeling thirsty after all this talk of IPAs... Fixes: fd276e71d1e7 ("KVM: arm64: nv: Handle shadow stage 2 page faults") Reported-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240822071710.2291690-1-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
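Conceptually the fix aligns both addresses before the GFN is computed; a sketch of user_mem_abort() with approximate variable names (not the literal diff):

    if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE) {
            fault_ipa &= ~(vma_pagesize - 1);       /* IPA for the current stage-2 */
            ipa &= ~(vma_pagesize - 1);             /* canonical IPA must be aligned too */
    }
    gfn = ipa >> PAGE_SHIFT;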
2024-08-21 | s390/mm: Pin identity mapping base to zero | Alexander Gordeev | 2 | -1/+15
SIE instruction performs faster when the virtual address of SIE block matches the physical one. Pin the identity mapping base to zero for the benefit of SIE and other instructions that have similar performance impact. Still, randomize the base when DEBUG_VM kernel configuration option is enabled. Suggested-by: Vasily Gorbik <gor@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-21 | s390/mm: Prevent lowcore vs identity mapping overlap | Alexander Gordeev | 1 | -1/+18
The identity mapping position in virtual memory is randomized together with the kernel mapping. That position must never overlap with the lowcore even when the lowcore is relocated. Prevent overlapping with the lowcore to allow independent positioning of the identity mapping. With the current value of the alternative lowcore address of 0x70000 the overlap could happen in case the identity mapping is placed at zero. This is a prerequisite for uncoupling of randomization base of kernel image and identity mapping in virtual memory. Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-20 | x86/kaslr: Expose and use the end of the physical memory address space | Thomas Gleixner | 4 | -6/+35
iounmap() on x86 occasionally fails to unmap because the provided valid ioremap address is not below high_memory. It turned out that this happens due to KASLR. KASLR uses the full address space between PAGE_OFFSET and vaddr_end to randomize the starting points of the direct map, vmalloc and vmemmap regions. It thereby limits the size of the direct map by using the installed memory size plus an extra configurable margin for hot-plug memory. This limitation is done to gain more randomization space because otherwise only the holes between the direct map, vmalloc, vmemmap and vaddr_end would be usable for randomizing. The limited direct map size is not exposed to the rest of the kernel, so the memory hot-plug and resource management related code paths still operate under the assumption that the available address space can be determined with MAX_PHYSMEM_BITS. request_free_mem_region() allocates from (1 << MAX_PHYSMEM_BITS) - 1 downwards. That means the first allocation happens past the end of the direct map and if unlucky this address is in the vmalloc space, which causes high_memory to become greater than VMALLOC_START and consequently causes iounmap() to fail for valid ioremap addresses. MAX_PHYSMEM_BITS cannot be changed for that because the randomization does not align with address bit boundaries and there are other places which actually need to know the maximum number of address bits. All remaining usage sites of MAX_PHYSMEM_BITS have been analyzed and found to be correct. Cure this by exposing the end of the direct map via PHYSMEM_END and use that for the memory hot-plug and resource management related places instead of relying on MAX_PHYSMEM_BITS. In the KASLR case PHYSMEM_END maps to a variable which is initialized by the KASLR initialization and otherwise it is based on MAX_PHYSMEM_BITS as before. To prevent future hiccups, add a check into add_pages() to catch callers trying to add memory above PHYSMEM_END. Fixes: 0483e1fa6e09 ("x86/mm: Implement ASLR for kernel memory regions") Reported-by: Max Ramanouski <max8rr8@gmail.com> Reported-by: Alistair Popple <apopple@nvidia.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-By: Max Ramanouski <max8rr8@gmail.com> Tested-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Kees Cook <kees@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/87ed6soy3z.ffs@tglx
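Sketched, the change has three parts (macro placement and the x86 variable name are assumptions, not the upstream diff): a generic fallback so non-KASLR configs keep the old limit, an x86 definition that tracks the KASLR-computed end of the direct map, and a sanity check in add_pages():

    /* generic fallback, e.g. in a common header: */
    #ifndef PHYSMEM_END
    # define PHYSMEM_END    ((1ULL << MAX_PHYSMEM_BITS) - 1)
    #endif

    /* x86 with KASLR: points at a variable set during KASLR init */
    extern unsigned long physmem_end;
    #define PHYSMEM_END     physmem_end

    /* arch/x86/mm/init_64.c, add_pages(): refuse hot-plug above the mapped end */
    if (WARN_ON_ONCE(end > PHYSMEM_END))
            return -ERANGE;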
2024-08-20 | MIPS: cevt-r4k: Don't call get_c0_compare_int