path: root/drivers/irqchip
Age | Commit message | Author | Files | Lines
2025-03-22 | irqchip/riscv: Ensure ordering of memory writes and IPI writes | Xu Lu | 2 | -2/+2
[ Upstream commit 825c78e6a60c309a59d18d5ac5968aa79cef0bd6 ] RISC-V distinguishes between memory accesses and device I/O and uses FENCE instruction to order them as viewed by other RISC-V harts and external devices or coprocessors. The FENCE instruction can order any combination of device input(I), device output(O), memory reads(R) and memory writes(W). For example, 'fence w, o' is used to ensure all memory writes from instructions preceding the FENCE instruction appear earlier in the global memory order than device output writes from instructions after the FENCE instruction. RISC-V issues IPIs by writing to the IMSIC/ACLINT MMIO registers, which is regarded as device output operation. However, the existing implementation of the IMSIC/ACLINT drivers issue the IPI via writel_relaxed(), which does not guarantee the order of device output operation and preceding memory writes. As a consequence the hart receiving the IPI might not observe the IPI related data. Fix this by replacing writel_relaxed() with writel() when issuing IPIs, which uses 'fence w, o' to ensure all previous writes made by the current hart are visible to other harts before they receive the IPI. Signed-off-by: Xu Lu <luxu.kernel@bytedance.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250127093846.98625-1-luxu.kernel@bytedance.com Signed-off-by: Sasha Levin <sashal@kernel.org>
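The ordering problem described in this commit can be approximated in userspace with C11 atomics. This is only an analogy, not the driver code: `ipi_payload`, `ipi_doorbell` and the three functions are hypothetical names, a relaxed atomic store stands in for writel_relaxed(), and a release store stands in for writel() with its 'fence w, o'.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins: ipi_payload models IPI related data in normal
 * memory, ipi_doorbell models the IMSIC/ACLINT MMIO register. */
uint32_t ipi_payload;
_Atomic uint32_t ipi_doorbell;

/* Analogue of writel_relaxed(): nothing orders the payload store
 * before the doorbell store as seen by another hart. */
void send_ipi_relaxed(uint32_t data)
{
	ipi_payload = data;
	atomic_store_explicit(&ipi_doorbell, 1, memory_order_relaxed);
}

/* Analogue of writel(): the release store makes the earlier payload
 * store visible before the doorbell store. */
void send_ipi_ordered(uint32_t data)
{
	ipi_payload = data;
	atomic_store_explicit(&ipi_doorbell, 1, memory_order_release);
}

/* Receiver side: returns the payload, or -1 if no IPI is pending. */
long long receive_ipi(void)
{
	if (!atomic_load_explicit(&ipi_doorbell, memory_order_acquire))
		return -1;
	atomic_store_explicit(&ipi_doorbell, 0, memory_order_relaxed);
	return ipi_payload;
}
```

With the relaxed variant, a receiver running on another thread may see the doorbell set and still read a stale payload, which is the userspace counterpart of the hart not observing the IPI related data.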
2025-02-27 | irqchip/gic-v3: Fix rk3399 workaround when secure interrupts are enabled | Marc Zyngier | 1 | -13/+40
commit 4cb77793842a351b39a030f77caebace3524840e upstream.

Christoph reports that their rk3399 system dies since commit 773c05f417fa1 ("irqchip/gic-v3: Work around insecure GIC integrations").

It appears that some rk3399 have secure payloads, and that the firmware sets SCR_EL3.FIQ==1. Obviously, disabling security in that configuration leads to even more problems.

Revisit the workaround by:

- making it rk3399 specific
- checking whether Group-0 is available, which is a good proxy for SCR_EL3.FIQ being 0
- either apply the workaround if Group-0 is available, or disable pseudo-NMIs if not

Note that this doesn't mean that the secure side is able to receive interrupts, as all interrupts are made non-secure anyway.

Clearly, nobody ever tested secure interrupts on this platform.

Fixes: 773c05f417fa1 ("irqchip/gic-v3: Work around insecure GIC integrations")
Reported-by: Christoph Fritz <chf.fritz@googlemail.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Christoph Fritz <chf.fritz@googlemail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250215185241.3768218-1-maz@kernel.org
Closes: https://lore.kernel.org/r/b1266652fb64857246e8babdf268d0df8f0c36d9.camel@googlemail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27 | irqchip/jcore-aic, clocksource/drivers/jcore: Fix jcore-pit interrupt request | Artur Rojek | 1 | -1/+1
[ Upstream commit d7e3fd658248f257006227285095d190e70ee73a ]

The jcore-aic irqchip does not have separate interrupt numbers reserved for cpu-local vs global interrupts. Therefore the device drivers need to request the given interrupt as per CPU interrupt.

69a9dcbd2d65 ("clocksource/drivers/jcore: Use request_percpu_irq()") converted the clocksource driver over to request_percpu_irq(), but failed to add all the required changes, resulting in a failure to register PIT interrupts.

Fix this by:

1) Explicitly mark the interrupt via irq_set_percpu_devid() in jcore_pit_init().
2) Enable and disable the per CPU interrupt in the CPU hotplug callbacks.
3) Pass the correct per-cpu cookie to the irq handler by using handle_percpu_devid_irq() instead of handle_percpu_irq() in handle_jcore_irq().

[ tglx: Massage change log ]

Fixes: 69a9dcbd2d65 ("clocksource/drivers/jcore: Use request_percpu_irq()")
Signed-off-by: Artur Rojek <contact@artur-rojek.eu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/all/20250216175545.35079-3-contact@artur-rojek.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-21 | genirq: Remove leading space from irq_chip::irq_print_chip() callbacks | Geert Uytterhoeven | 1 | -1/+1
[ Upstream commit 29a61a1f40637ae010b828745fb41f60301c3a3d ] The space separator was factored out from the multiple chip name prints, but several irq_chip::irq_print_chip() callbacks still print a leading space. Remove the superfluous double spaces. Fixes: 9d9f204bdf7243bf ("genirq/proc: Add missing space separator back") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/893f7e9646d8933cd6786d5a1ef3eb076d263768.1738764803.git.geert+renesas@glider.be Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-17 | irqchip/apple-aic: Only handle PMC interrupt as FIQ when configured so | Nick Chan | 1 | -1/+2
commit 698244bbb3bfd32ddf9a0b70a12b1c7d69056497 upstream.

The CPU PMU in Apple SoCs can be configured to fire its interrupt in one of several ways, and since Apple A11 one of the methods is FIQ, but the check of the configuration register fails to test explicitly for FIQ mode. It tests whether the IMODE bitfield is zero or not and the PMCR0_IACT bit is set. That results in false positives when the IMODE bitfield is not zero, but does not have the mode PMCR0_IMODE_FIQ.

Only handle the PMC interrupt as a FIQ when the CPU PMU has been configured to fire FIQs, i.e. the IMODE bitfield value is PMCR0_IMODE_FIQ and PMCR0_IACT is set.

Fixes: c7708816c944 ("irqchip/apple-aic: Wire PMU interrupts")
Signed-off-by: Nick Chan <towinchenmi@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250118163554.16733-1-towinchenmi@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
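The difference between the two checks described in this commit message can be sketched as plain bit tests. The field positions and the value 4 for PMCR0_IMODE_FIQ below are assumptions made for illustration, and `pmc_fiq_pending_old()` / `pmc_fiq_pending_new()` are hypothetical helpers that follow the wording of the message, not the driver source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed register layout, for illustration only. */
#define PMCR0_IMODE_SHIFT	8
#define PMCR0_IMODE_MASK	(UINT64_C(7) << PMCR0_IMODE_SHIFT)
#define PMCR0_IMODE_FIQ		UINT64_C(4)
#define PMCR0_IACT		(UINT64_C(1) << 11)

/* Check as described for the old code: any non-zero IMODE plus IACT. */
bool pmc_fiq_pending_old(uint64_t pmcr0)
{
	return (pmcr0 & PMCR0_IMODE_MASK) != 0 && (pmcr0 & PMCR0_IACT) != 0;
}

/* Check as described for the fix: IMODE must equal FIQ and IACT must be set. */
bool pmc_fiq_pending_new(uint64_t pmcr0)
{
	const uint64_t mask = PMCR0_IMODE_MASK | PMCR0_IACT;
	const uint64_t want = (PMCR0_IMODE_FIQ << PMCR0_IMODE_SHIFT) | PMCR0_IACT;

	return (pmcr0 & mask) == want;
}
```

A register value with IACT set and some other non-zero IMODE passes the first test but not the second, which is the false positive the commit removes.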
2025-02-17 | irqchip/irq-mvebu-icu: Fix access to msi_data from irq_domain::host_data | Stefan Eichenberger | 1 | -1/+2
commit 987f379b54091cc1b1db986bde71cee1081350b3 upstream. mvebu_icu_translate() incorrectly casts irq_domain::host_data directly to mvebu_icu_msi_data. However, host_data actually points to a structure of type msi_domain_info. This incorrect cast causes issues such as the thermal sensors of the CP110 platform malfunctioning. Specifically, the translation of the SEI interrupt to IRQ_TYPE_EDGE_RISING fails, preventing proper interrupt handling. The following error was observed: genirq: Setting trigger mode 4 for irq 85 failed (irq_chip_set_type_parent+0x0/0x34) armada_thermal f2400000.system-controller:thermal-sensor@70: Cannot request threaded IRQ 85 Resolve the issue by first casting host_data to msi_domain_info and then accessing mvebu_icu_msi_data through msi_domain_info::chip_data. Fixes: d929e4db22b6 ("irqchip/irq-mvebu-icu: Prepare for real per device MSI") Signed-off-by: Stefan Eichenberger <eichest@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20250124085140.44792-1-eichest@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
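The pointer chain described in this commit can be sketched with heavily simplified structures. The struct layouts, the `subset_id` field, the `example_*` objects and `icu_msi_data_from_domain()` are hypothetical stand-ins; only the member names host_data and chip_data come from the commit message.

```c
#include <stddef.h>

/* Simplified, hypothetical mirrors of the kernel structures. */
struct mvebu_icu_msi_data {
	int subset_id;
};

struct msi_domain_info {
	unsigned int flags;
	void *chip_data;	/* points to struct mvebu_icu_msi_data */
};

struct irq_domain {
	void *host_data;	/* points to struct msi_domain_info */
};

/* Example objects wired up the way the commit message describes. */
struct mvebu_icu_msi_data example_msi_data = { .subset_id = 7 };
struct msi_domain_info example_info = { .flags = 0, .chip_data = &example_msi_data };
struct irq_domain example_domain = { .host_data = &example_info };

/* Go through msi_domain_info first, then take chip_data.
 * Casting host_data straight to mvebu_icu_msi_data would reinterpret
 * the msi_domain_info bytes as the wrong type. */
const struct mvebu_icu_msi_data *icu_msi_data_from_domain(const struct irq_domain *d)
{
	const struct msi_domain_info *info = d->host_data;

	return info->chip_data;
}
```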
2025-02-17 | irqchip/lan966x-oic: Make CONFIG_LAN966X_OIC depend on CONFIG_MCHP_LAN966X_PCI | Geert Uytterhoeven | 1 | -0/+1
[ Upstream commit e06c9e3682f58fbeb632b7b866bb4fe66a4a4b42 ] The Microchip LAN966x outband interrupt controller is only present on Microchip LAN966x SoCs, and only used in PCI endpoint mode. Hence add a dependency on MCHP_LAN966X_PCI, to prevent asking the user about this driver when configuring a kernel without Microchip LAN966x PCIe support. Fixes: 3e3a7b35332924c8 ("irqchip: Add support for LAN966x OIC") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Herve Codina <herve.codina@bootlin.com> Link: https://lore.kernel.org/all/28e8a605e72ee45e27f0d06b2b71366159a9c782.1737383314.git.geert+renesas@glider.be Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-15 | irqchip: Plug a OF node reference leak in platform_irqchip_probe() | Joe Hattori | 1 | -3/+1
platform_irqchip_probe() leaks a OF node when irq_init_cb() fails. Fix it by declaring par_np with the __free(device_node) cleanup construct. This bug was found by an experimental static analysis tool that I am developing. Fixes: f8410e626569 ("irqchip: Add IRQCHIP_PLATFORM_DRIVER_BEGIN/END and IRQCHIP_MATCH helper macros") Signed-off-by: Joe Hattori <joe@pf.is.s.u-tokyo.ac.jp> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20241215033945.3414223-1-joe@pf.is.s.u-tokyo.ac.jp
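The scope-based cleanup that this fix relies on can be illustrated in userspace with the GCC/Clang `cleanup` variable attribute. This is a sketch under assumptions: `struct device_node` here is a toy refcounted object, and `node_get()`, `node_put()`, `FREE_NODE`, `example_node` and `probe_sketch()` are hypothetical names, not kernel API.

```c
#include <stddef.h>

/* Toy refcounted node standing in for an OF node. */
struct device_node {
	int refcount;
};

struct device_node example_node;

struct device_node *node_get(struct device_node *np)
{
	if (np)
		np->refcount++;
	return np;
}

void node_put(struct device_node *np)
{
	if (np)
		np->refcount--;
}

/* Called automatically when the annotated variable goes out of scope. */
void node_put_cleanup(struct device_node **npp)
{
	node_put(*npp);
}

#define FREE_NODE __attribute__((cleanup(node_put_cleanup)))

/* Every return path drops the reference, including the early error
 * return that a manual node_put() at the end would have missed. */
int probe_sketch(struct device_node *parent, int init_result)
{
	struct device_node *par_np FREE_NODE = node_get(parent);

	if (!par_np)
		return -1;
	if (init_result)
		return init_result;	/* reference still released here */
	return 0;
}
```

The attribute is a compiler extension, so this sketch assumes GCC or Clang.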
2025-01-15 | irqchip/sunxi-nmi: Add missing SKIP_WAKE flag | Philippe Simons | 1 | -1/+2
Some boards with Allwinner SoCs connect the PMIC's IRQ pin to the SoC's NMI pin instead of a normal GPIO. Since the power key is connected to the PMIC, and people expect to wake up a suspended system via this key, the NMI IRQ controller must stay alive when the system goes into suspend. Add the SKIP_WAKE flag to prevent the sunxi NMI controller from going to sleep, so that the power key can wake up those systems. [ tglx: Fixed up coding style ] Signed-off-by: Philippe Simons <simons.philippe@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250112123402.388520-1-simons.philippe@gmail.com
2025-01-15 | irqchip/gic-v3-its: Don't enable interrupts in its_irq_set_vcpu_affinity() | Tomas Krcka | 1 | -1/+1
The following call-chain leads to enabling interrupts in a nested interrupt disabled section:

  irq_set_vcpu_affinity()
    irq_get_desc_lock()
      raw_spin_lock_irqsave()     <--- Disable interrupts
    its_irq_set_vcpu_affinity()
      guard(raw_spinlock_irq)     <--- Enables interrupts when leaving the guard()
    irq_put_desc_unlock()         <--- Warns because interrupts are enabled

This was broken in commit b97e8a2f7130, which replaced the original raw_spin_[un]lock() pair with guard(raw_spinlock_irq).

Fix the issue by using guard(raw_spinlock).

[ tglx: Massaged change log ]

Fixes: b97e8a2f7130 ("irqchip/gic-v3-its: Fix potential race condition in its_vlpi_prop_update()")
Signed-off-by: Tomas Krcka <krckatom@amazon.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20241230150825.62894-1-krckatom@amazon.de
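The nesting problem in this commit can be modelled with a single flag that tracks whether interrupts are enabled. Everything here is a hypothetical simulation: no real locking happens, and the function names only mirror the roles of the kernel primitives named in the message.

```c
#include <stdbool.h>

/* Simulated CPU interrupt state. */
bool irqs_enabled = true;

/* Models raw_spin_lock_irqsave(): remember the old state, then disable. */
bool lock_irqsave_sketch(void)
{
	bool was_enabled = irqs_enabled;

	irqs_enabled = false;
	return was_enabled;
}

/* Models the matching irqrestore: put back whatever was saved. */
void unlock_irqrestore_sketch(bool was_enabled)
{
	irqs_enabled = was_enabled;
}

/* Models guard(raw_spinlock_irq): disables on entry and unconditionally
 * enables on exit, regardless of the state it was entered with. */
void inner_with_irq_guard(void)
{
	irqs_enabled = false;
	/* critical section */
	irqs_enabled = true;
}

/* Models guard(raw_spinlock): leaves the interrupt state untouched. */
void inner_with_plain_guard(void)
{
	/* critical section */
}

/* Returns true if interrupts were wrongly enabled when the outer
 * unlock was reached, which is the condition the kernel warns about. */
bool outer_sees_irqs_enabled(void (*inner)(void))
{
	bool saved = lock_irqsave_sketch();
	bool wrongly_enabled;

	inner();
	wrongly_enabled = irqs_enabled;
	unlock_irqrestore_sketch(saved);
	return wrongly_enabled;
}
```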
2025-01-15 | irqchip/gic-v3: Handle CPU_PM_ENTER_FAILED correctly | Yogesh Lal | 1 | -1/+1
When a CPU attempts to enter low power mode, it disables the redistributor and Group 1 interrupts and reinitializes the system registers upon wakeup. If the transition into low power mode fails, then the CPU_PM framework invokes the PM notifier callback with CPU_PM_ENTER_FAILED to allow the drivers to undo the state changes. The GIC V3 driver ignores CPU_PM_ENTER_FAILED, which leaves the GIC in disabled state. Handle CPU_PM_ENTER_FAILED in the same way as CPU_PM_EXIT to restore normal operation. [ tglx: Massage change log, add Fixes tag ] Fixes: 3708d52fc6bb ("irqchip: gic-v3: Implement CPU PM notifier") Signed-off-by: Yogesh Lal <quic_ylal@quicinc.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20241220093907.2747601-1-quic_ylal@quicinc.com
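A minimal sketch of the notifier logic described in this commit, assuming a single boolean stands in for the redistributor and Group 1 enable state. The enum, `struct gic_state`, `example_gic` and `gic_pm_notify_sketch()` are hypothetical and only mirror the event names in the message.

```c
#include <stdbool.h>

/* Hypothetical mirror of the CPU PM events named in the commit message. */
enum cpu_pm_event {
	CPU_PM_ENTER,
	CPU_PM_ENTER_FAILED,
	CPU_PM_EXIT,
};

/* One flag stands in for the redistributor and Group 1 enable state. */
struct gic_state {
	bool enabled;
};

struct gic_state example_gic = { .enabled = true };

void gic_pm_notify_sketch(struct gic_state *gic, enum cpu_pm_event cmd)
{
	switch (cmd) {
	case CPU_PM_ENTER:
		gic->enabled = false;	/* state torn down before low power */
		break;
	case CPU_PM_ENTER_FAILED:	/* ignoring this left the GIC disabled */
	case CPU_PM_EXIT:
		gic->enabled = true;	/* restore normal operation */
		break;
	}
}
```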
2024-12-13 | irqchip/gic-v3: Work around insecure GIC integrations | Marc Zyngier | 1 | -1/+16
It appears that the relatively popular RK3399 SoC has been put together using a large amount of illicit substances, as experiments reveal that its integration of GIC500 exposes the *secure* programming interface to non-secure. This has some pretty bad effects on the way priorities are handled, and results in a dead machine if booting with pseudo-NMI enabled (irqchip.gicv3_pseudo_nmi=1) if the kernel contains 18fdb6348c480 ("arm64: irqchip/gic-v3: Select priorities at boot time"), which relies on the priorities being programmed using the NS view. Let's restore some sanity by going one step further and disable security altogether in this case. This is not any worse, and puts us in a mode where priorities actually make some sense. Huge thanks to Mark Kettenis who initially identified this issue on OpenBSD, and to Chen-Yu Tsai who reported the problem in Linux. Fixes: 18fdb6348c480 ("arm64: irqchip/gic-v3: Select priorities at boot time") Reported-by: Mark Kettenis <mark.kettenis@xs4all.nl> Reported-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Chen-Yu Tsai <wens@csie.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20241213141037.3995049-1-maz@kernel.org
2024-12-13 | irqchip/gic: Correct declaration of *percpu_base pointer in union gic_base | Uros Bizjak | 1 | -1/+1
percpu_base is used in various percpu functions that expect variable in __percpu address space. Correct the declaration of percpu_base to void __iomem * __percpu *percpu_base; to declare the variable as __percpu pointer. The patch fixes several sparse warnings: irq-gic.c:1172:44: warning: incorrect type in assignment (different address spaces) irq-gic.c:1172:44: expected void [noderef] __percpu *[noderef] __iomem *percpu_base irq-gic.c:1172:44: got void [noderef] __iomem *[noderef] __percpu * ... irq-gic.c:1231:43: warning: incorrect type in argument 1 (different address spaces) irq-gic.c:1231:43: expected void [noderef] __percpu *__pdata irq-gic.c:1231:43: got void [noderef] __percpu *[noderef] __iomem *percpu_base There were no changes in the resulting object files. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/all/20241213145809.2918-2-ubizjak@gmail.com
2024-12-08 | Merge tag 'irq_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 3 | -2/+5
Pull irq fixes from Borislav Petkov:

- Fix a /proc/interrupts formatting regression
- Have the BCM2836 interrupt controller enter power management states properly
- Other fixlets

* tag 'irq_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/stm32mp-exti: CONFIG_STM32MP_EXTI should not default to y when compile-testing
  genirq/proc: Add missing space separator back
  irqchip/bcm2836: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND
  irqchip/gic-v3: Fix irq_complete_ack() comment
2024-12-03 | irqchip/stm32mp-exti: CONFIG_STM32MP_EXTI should not default to y when compile-testing | Geert Uytterhoeven | 1 | -1/+1
Merely enabling compile-testing should not enable additional functionality.

Fixes: 0be58e0553812fcb ("irqchip/stm32mp-exti: Allow building as module")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/ef5ec063b23522058f92087e072419ea233acfe9.1733243115.git.geert+renesas@glider.be
2024-12-03 | irqchip/bcm2836: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND | Stefan Wahren | 1 | -0/+3
The BCM2836 interrupt controller doesn't provide any facility to configure the wakeup sources. That's the reason why the driver lacks the irq_set_wake() callback for the interrupt chip. Enable the flags IRQCHIP_SKIP_SET_WAKE and IRQCHIP_MASK_ON_SUSPEND so the interrupt suspend logic can handle the chip correctly equivalently to the corresponding bcm2835 change (9a58480e5e53 ("irqchip/bcm2835: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND"). Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/all/20241202115437.33552-1-wahrenst@gmx.net
2024-12-03 | irqchip/gic-v3: Fix irq_complete_ack() comment | Lorenzo Pieralisi | 1 | -1/+1
When the GIC is in EOImode == 1 in irq_complete_ack() it executes a priority drop not a deactivation. Fix the function comment to clarify the behaviour. Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241202112518.51178-1-lpieralisi@kernel.org
2024-12-01 | Merge tag 'irq_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 14 | -24/+52
Pull irq fixes from Borislav Petkov:

- Move the ->select callback to the correct ops structure in irq-mvebu-sei to fix some Marvell Armada platforms
- Add a workaround for Hisilicon ITS erratum 162100801 which can cause some virtual interrupts to get lost
- More platform_driver::remove() conversion

* tag 'irq_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip: Switch back to struct platform_driver::remove()
  irqchip/gicv3-its: Add workaround for hip09 ITS erratum 162100801
  irqchip/irq-mvebu-sei: Move misplaced select() callback to SEI CP domain
2024-11-29 | Merge tag 'driver-core-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core | Linus Torvalds | 3 | -3/+3
Pull driver core updates from Greg KH:

"Here is a small set of driver core changes for 6.13-rc1.

Nothing major for this merge cycle, except for the two simple merge conflicts are here just to make life interesting.

Included in here are:

- sysfs core changes and preparations for more sysfs api cleanups that can come through all driver trees after -rc1 is out
- fw_devlink fixes based on many reports and debugging sessions
- list_for_each_reverse() removal, no one was using it!
- last-minute seq_printf() format string bug found and fixed in many drivers all at once.
- minor bugfixes and changes full details in the shortlog"

* tag 'driver-core-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (35 commits)
  Fix a potential abuse of seq_printf() format string in drivers
  cpu: Remove spurious NULL in attribute_group definition
  s390/con3215: Remove spurious NULL in attribute_group definition
  perf: arm-ni: Remove spurious NULL in attribute_group definition
  driver core: Constify bin_attribute definitions
  sysfs: attribute_group: allow registration of const bin_attribute
  firmware_loader: Fix possible resource leak in fw_log_firmware_info()
  drivers: core: fw_devlink: Fix excess parameter description in docstring
  driver core: class: Correct WARN() message in APIs class_(for_each|find)_device()
  cacheinfo: Use of_property_present() for non-boolean properties
  cdx: Fix cdx_mmap_resource() after constifying attr in ->mmap()
  drivers: core: fw_devlink: Make the error message a bit more useful
  phy: tegra: xusb: Set fwnode for xusb port devices
  drm: display: Set fwnode for aux bus devices
  driver core: fw_devlink: Stop trying to optimize cycle detection logic
  driver core: Constify attribute arguments of binary attributes
  sysfs: bin_attribute: add const read/write callback variants
  sysfs: implement all BIN_ATTR_* macros in terms of __BIN_ATTR()
  sysfs: treewide: constify attribute callback of bin_attribute::llseek()
  sysfs: treewide: constify attribute callback of bin_attribute::mmap()
  ...
2024-11-26 | irqchip: Switch back to struct platform_driver::remove() | Uwe Kleine-König | 12 | -12/+12
After commit 0edb555a65d1 ("platform: Make platform_driver::remove() return void") .remove() is (again) the right callback to implement for platform drivers. Convert all platform drivers below drivers/irqchip/ to use .remove(), with the eventual goal to drop struct platform_driver::remove_new(). As .remove() and .remove_new() have the same prototypes, conversion is done by just changing the structure member name in the driver initializer. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241109173828.291172-2-u.kleine-koenig@baylibre.com
2024-11-26 | irqchip/gicv3-its: Add workaround for hip09 ITS erratum 162100801 | Zhou Wang | 1 | -11/+39
When enabling GICv4.1 in hip09, VMAPP fails to clear some caches during the unmap operation, which can cause vSGIs to be lost.

To fix the issue, invalidate the related vPE cache through GICR_INVALLR after VMOVP.

Suggested-by: Marc Zyngier <maz@kernel.org>
Co-developed-by: Nianyao Tang <tangnianyao@huawei.com>
Signed-off-by: Nianyao Tang <tangnianyao@huawei.com>
Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
2024-11-26 | irqchip/irq-mvebu-sei: Move misplaced select() callback to SEI CP domain | Russell King (Oracle) | 1 | -1/+1
Commit fbdf14e90ce4 ("irqchip/irq-mvebu-sei: Switch to MSI parent") introduced in v6.11-rc1 broke Marvell Armada platforms (and possibly others) by incorrectly switching irq-mvebu-sei to MSI parent.

In the above commit, msi_parent_ops is set for the sei->cp_domain, but rather than adding a .select method to mvebu_sei_cp_domain_ops (which is associated with sei->cp_domain), it was added to mvebu_sei_domain_ops which is associated with sei->sei_domain, which doesn't have any msi_parent_ops. This makes the call to msi_lib_irq_domain_select() always fail.

This bug manifests itself with the following kernel messages on Armada 8040 based systems:

  platform f21e0000.interrupt-controller:interrupt-controller@50: deferred probe pending: (reason unknown)
  platform f41e0000.interrupt-controller:interrupt-controller@50: deferred probe pending: (reason unknown)

Move the select callback to mvebu_sei_cp_domain_ops to cure it.

Fixes: fbdf14e90ce4 ("irqchip/irq-mvebu-sei: Switch to MSI parent")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/E1tE6bh-004CmX-QU@rmk-PC.armlinux.org.uk
2024-11-23 | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm | Linus Torvalds | 1 | -21/+87
Pull kvm updates from Paolo Bonzini: "The biggest change here is eliminating the awful idea that KVM had of essentially guessing which pfns are refcounted pages. The reason to do so was that KVM needs to map both non-refcounted pages (for example BARs of VFIO devices) and VM_PFNMAP/VM_MIXMEDMAP VMAs that contain refcounted pages. However, the result was security issues in the past, and more recently the inability to map VM_IO and VM_PFNMAP memory that _is_ backed by struct page but is not refcounted. In particular this broke virtio-gpu blob resources (which directly map host graphics buffers into the guest as "vram" for the virtio-gpu device) with the amdgpu driver, because amdgpu allocates non-compound higher order pages and the tail pages could not be mapped into KVM. This requires adjusting all uses of struct page in the per-architecture code, to always work on the pfn whenever possible. The large series that did this, from David Stevens and Sean Christopherson, also cleaned up substantially the set of functions that provided arch code with the pfn for a host virtual addresses. The previous maze of twisty little passages, all different, is replaced by five functions (__gfn_to_page, __kvm_faultin_pfn, the non-__ versions of these two, and kvm_prefetch_pages) saving almost 200 lines of code. ARM: - Support for stage-1 permission indirection (FEAT_S1PIE) and permission overlays (FEAT_S1POE), including nested virt + the emulated page table walker - Introduce PSCI SYSTEM_OFF2 support to KVM + client driver. This call was introduced in PSCIv1.3 as a mechanism to request hibernation, similar to the S4 state in ACPI - Explicitly trap + hide FEAT_MPAM (QoS controls) from KVM guests. 
As part of it, introduce trivial initialization of the host's MPAM context so KVM can use the corresponding traps - PMU support under nested virtualization, honoring the guest hypervisor's trap configuration and event filtering when running a nested guest - Fixes to vgic ITS serialization where stale device/interrupt table entries are not zeroed when the mapping is invalidated by the VM - Avoid emulated MMIO completion if userspace has requested synchronous external abort injection - Various fixes and cleanups affecting pKVM, vCPU initialization, and selftests LoongArch: - Add iocsr and mmio bus simulation in kernel. - Add in-kernel interrupt controller emulation. - Add support for virtualization extensions to the eiointc irqchip. PPC: - Drop lingering and utterly obsolete references to PPC970 KVM, which was removed 10 years ago. - Fix incorrect documentation references to non-existing ioctls RISC-V: - Accelerate KVM RISC-V when running as a guest - Perf support to collect KVM guest statistics from host side s390: - New selftests: more ucontrol selftests and CPU model sanity checks - Support for the gen17 CPU model - List registers supported by KVM_GET/SET_ONE_REG in the documentation x86: - Cleanup KVM's handling of Accessed and Dirty bits to dedup code, improve documentation, harden against unexpected changes. Even if the hardware A/D tracking is disabled, it is possible to use the hardware-defined A/D bits to track if a PFN is Accessed and/or Dirty, and that removes a lot of special cases. - Elide TLB flushes when aging secondary PTEs, as has been done in x86's primary MMU for over 10 years. - Recover huge pages in-place in the TDP MMU when dirty page logging is toggled off, instead of zapping them and waiting until the page is re-accessed to create a huge mapping. This reduces vCPU jitter. - Batch TLB flushes when dirty page logging is toggled off. This reduces the time it takes to disable dirty logging by ~3x. 
- Remove the shrinker that was (poorly) attempting to reclaim shadow page tables in low-memory situations. - Clean up and optimize KVM's handling of writes to MSR_IA32_APICBASE. - Advertise CPUIDs for new instructions in Clearwater Forest - Quirk KVM's misguided behavior of initialized certain feature MSRs to their maximum supported feature set, which can result in KVM creating invalid vCPU state. E.g. initializing PERF_CAPABILITIES to a non-zero value results in the vCPU having invalid state if userspace hides PDCM from the guest, which in turn can lead to save/restore failures. - Fix KVM's handling of non-canonical checks for vCPUs that support LA57 to better follow the "architecture", in quotes because the actual behavior is poorly documented. E.g. most MSR writes and descriptor table loads ignore CR4.LA57 and operate purely on whether the CPU supports LA57. - Bypass the register cache when querying CPL from kvm_sched_out(), as filling the cache from IRQ context is generally unsafe; harden the cache accessors to try to prevent similar issues from occuring in the future. The issue that triggered this change was already fixed in 6.12, but was still kinda latent. - Advertise AMD_IBPB_RET to userspace, and fix a related bug where KVM over-advertises SPEC_CTRL when trying to support cross-vendor VMs. - Minor cleanups - Switch hugepage recovery thread to use vhost_task. These kthreads can consume significant amounts of CPU time on behalf of a VM or in response to how the VM behaves (for example how it accesses its memory); therefore KVM tried to place the thread in the VM's cgroups and charge the CPU time consumed by that work to the VM's container. However the kthreads did not process SIGSTOP/SIGCONT, and therefore cgroups which had KVM instances inside could not complete freezing. Fix this by replacing the kthread with a PF_USER_WORKER thread, via the vhost_task abstraction. 
Another 100+ lines removed, with generally better behavior too like having these threads properly parented in the process tree. - Revert a workaround for an old CPU erratum (Nehalem/Westmere) that didn't really work; there was really nothing to work around anyway: the broken patch was meant to fix nested virtualization, but the PERF_GLOBAL_CTRL MSR is virtualized and therefore unaffected by the erratum. - Fix 6.12 regression where CONFIG_KVM will be built as a module even if asked to be builtin, as long as neither KVM_INTEL nor KVM_AMD is 'y'. x86 selftests: - x86 selftests can now use AVX. Documentation: - Use rST internal links - Reorganize the introduction to the API document Generic: - Protect vcpu->pid accesses outside of vcpu->mutex with a rwlock instead of RCU, so that running a vCPU on a different task doesn't encounter long due to having to wait for all CPUs become quiescent. In general both reads and writes are rare, but userspace that supports confidential computing is introducing the use of "helper" vCPUs that may jump from one host processor to another. 
Those will be very happy to trigger a synchronize_rcu(), and the effect on performance is quite the disaster" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (298 commits) KVM: x86: Break CONFIG_KVM_X86's direct dependency on KVM_INTEL || KVM_AMD KVM: x86: add back X86_LOCAL_APIC dependency Revert "KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of setup_vmcs_config()" KVM: x86: switch hugepage recovery thread to vhost_task KVM: x86: expose MSR_PLATFORM_INFO as a feature MSR x86: KVM: Advertise CPUIDs for new instructions in Clearwater Forest Documentation: KVM: fix malformed table irqchip/loongson-eiointc: Add virt extension support LoongArch: KVM: Add irqfd support LoongArch: KVM: Add PCHPIC user mode read and write functions LoongArch: KVM: Add PCHPIC read and write functions LoongArch: KVM: Add PCHPIC device support LoongArch: KVM: Add EIOINTC user mode read and write functions LoongArch: KVM: Add EIOINTC read and write functions LoongArch: KVM: Add EIOINTC device support LoongArch: KVM: Add IPI user mode read and write function LoongArch: KVM: Add IPI read and write function LoongArch: KVM: Add IPI device support LoongArch: KVM: Add iocsr and mmio bus simulation in kernel KVM: arm64: Pass on SVE mapping failures ...
2024-11-22 | Fix a potential abuse of seq_printf() format string in drivers | David Wang | 3 | -3/+3
Using a device name as the format string of seq_printf() is prone to format string attacks and opens the possibility of exploitation. seq_puts() is safer and more efficient.

Signed-off-by: David Wang <00107082@163.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20241120053055.225195-1-00107082@163.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
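The same class of bug exists in userspace with printf-style functions, so the idea can be sketched without kernel headers. `struct seq_buf_sketch`, `example_seq` and `seq_puts_sketch()` are hypothetical stand-ins for the seq_file machinery; the point is only that the string is copied verbatim and never interpreted as a format.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of a seq_file output buffer. */
struct seq_buf_sketch {
	char data[64];
	size_t len;
};

struct seq_buf_sketch example_seq;

/* Analogue of seq_puts(): append the string as-is. Any '%' in it is
 * just a character. The unsafe pattern would be the equivalent of
 * snprintf(buf, size, name), where a name such as "dev-%s-%n" makes
 * the formatter read or write through arguments that were never passed. */
void seq_puts_sketch(struct seq_buf_sketch *s, const char *str)
{
	size_t room = sizeof(s->data) - 1 - s->len;
	size_t n = strlen(str);

	if (n > room)
		n = room;	/* truncate instead of overflowing */
	memcpy(s->data + s->len, str, n);
	s->len += n;
	s->data[s->len] = '\0';
}
```

Skipping format parsing is also why the commit message calls seq_puts() more efficient.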
2024-11-19 | Merge tag 'irq-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 12 | -76/+1212
Pull interrupt subsystem updates from Thomas Gleixner:

"Tree wide:

- Make nr_irqs static to the core code and provide accessor functions to remove existing and prevent future aliasing problems with local variables or function arguments of the same name.

Core code:

- Prevent freeing an interrupt in the devres code which is not managed by devres in the first place.
- Use seq_put_decimal_ull_width() for decimal values output in /proc/interrupts which increases performance significantly as it avoids parsing the format strings over and over.
- Optimize raising the timer and hrtimer soft interrupts by using the 'set bit only' variants instead of the combined version which checks whether ksoftirqd should be woken up. The latter is a pointless exercise as both soft interrupts are raised in the context of the timer interrupt and therefore never wake up ksoftirqd.
- Delegate timer/hrtimer soft interrupt processing to a dedicated thread on RT. Timer and hrtimer soft interrupts are always processed in ksoftirqd on RT enabled kernels. This can lead to high latencies when other soft interrupts are delegated to ksoftirqd as well. The separate thread allows to run them separately under a RT scheduling policy to reduce the latency overhead.

Drivers:

- New drivers or extensions of existing drivers to support Renesas RZ/V2H(P), Aspeed AST27XX, T-HEAD C900 and ATMEL sam9x7 interrupt chips
- Support for multi-cluster GICs on MIPS. MIPS CPUs can come with multiple CPU clusters, where each CPU cluster has its own GIC (Generic Interrupt Controller). This requires to access the GIC of a remote cluster through a redirect register block. This is encapsulated into a set of helper functions to keep the complexity out of the actual code paths which handle the GIC details.
- Support for encrypted guests in the ARM GICV3 ITS driver. The ITS page needs to be shared with the hypervisor and therefore must be decrypted.
- Small cleanups and fixes all over the place" * tag 'irq-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits) irqchip/riscv-aplic: Prevent crash when MSI domain is missing genirq/proc: Use seq_put_decimal_ull_width() for decimal values softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT. timers: Use __raise_softirq_irqoff() to raise the softirq. hrtimer: Use __raise_softirq_irqoff() to raise the softirq riscv: defconfig: Enable T-HEAD C900 ACLINT SSWI drivers irqchip: Add T-HEAD C900 ACLINT SSWI driver dt-bindings: interrupt-controller: Add T-HEAD C900 ACLINT SSWI device irqchip/stm32mp-exti: Use of_property_present() for non-boolean properties irqchip/mips-gic: Fix selection of GENERIC_IRQ_EFFECTIVE_AFF_MASK irqchip/mips-gic: Prevent indirect access to clusters without CPU cores irqchip/mips-gic: Multi-cluster support irqchip/mips-gic: Setup defaults in each cluster irqchip/mips-gic: Support multi-cluster in for_each_online_cpu_gic() irqchip/mips-gic: Replace open coded online CPU iterations genirq/irqdesc: Use str_enabled_disabled() helper in wakeup_show() genirq/devres: Don't free interrupt which is not managed by devres irqchip/gic-v3-its: Fix over allocation in itt_alloc_pool() irqchip/aspeed-intc: Add AST27XX INTC support dt-bindings: interrupt-controller: Add support for ASPEED AST27XX INTC ...
2024-11-16irqchip/riscv-aplic: Prevent crash when MSI domain is missingSamuel Holland2-1/+5
If the APLIC driver is probed before the IMSIC driver, the parent MSI domain will be missing, which causes a NULL pointer dereference in msi_create_device_irq_domain(). Avoid this by deferring probe until the parent MSI domain is available. Use dev_err_probe() to avoid printing an error message when returning -EPROBE_DEFER. Fixes: ca8df97fe679 ("irqchip/riscv-aplic: Add support for MSI-mode") Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241114200133.3069460-1-samuel.holland@sifive.com
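The deferral pattern described above can be modelled outside the kernel. The following is a minimal userspace sketch, not the driver's code: `struct domain`, `parent_msi_domain` and `child_probe()` are hypothetical names, and only the numeric value of `EPROBE_DEFER` is taken from the kernel.

```c
#include <stddef.h>

#define EPROBE_DEFER 517  /* same numeric value the kernel uses */

/* Hypothetical stand-in for an MSI irq domain. */
struct domain { int id; };

/* NULL until the parent (the "IMSIC" side in this model) has registered. */
static struct domain *parent_msi_domain;

/* Model of a child probe that must not run before its parent exists. */
static int child_probe(void)
{
	if (!parent_msi_domain)
		return -EPROBE_DEFER;  /* ask for a retry instead of dereferencing NULL */
	return 0;                      /* parent present: probing can proceed */
}
```

A caller modelling the driver core would simply call `child_probe()` again once another driver has registered, which is why no error message is wanted for the deferral case.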
2024-11-13irqchip/loongson-eiointc: Add virt extension supportBibo Mao1-21/+87
Interrupts can be routed to at most four virtual CPUs with the real hardware EIOINTC interrupt controller model, since interrupt routing is encoded with a combined CPU bitmap and EIOINTC node method. Add EIOINTC virt extension support so that interrupts can be routed to 256 vCPUs in virtual machine mode. The CPU bitmap is replaced with normal encoding and the EIOINTC node type is removed, so there are 8 bits for CPU selection and at most 256 vCPUs are supported for interrupt routing. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Co-developed-by: Song Gao <gaosong@loongson.cn> Signed-off-by: Song Gao <gaosong@loongson.cn> Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
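To see why a plain encoding reaches 256 targets where a bitmap reaches only a handful, consider an 8-bit routing field. The sketch below is illustrative only: the split into a 4-bit CPU bitmap and a 4-bit node selector is an assumption made for the example, and the function names are hypothetical, not taken from the driver.

```c
#include <stdint.h>

/*
 * Assumed layout for illustration only; the real EIOINTC register
 * layout may differ.
 */

/* Bitmap style: low 4 bits are a one-hot CPU bitmap, high 4 bits pick a node. */
static inline uint8_t route_bitmap(unsigned int node, unsigned int cpu_in_node)
{
	return (uint8_t)(((node & 0xfu) << 4) | (1u << (cpu_in_node & 0x3u)));
}

/* Plain encoding: all 8 bits hold a CPU number, so 0..255 are addressable. */
static inline uint8_t route_encoded(unsigned int cpu)
{
	return (uint8_t)(cpu & 0xffu);
}
```

With the one-hot bitmap only four CPUs per node can be named, whereas the plain encoding spends the same eight bits on a single CPU index.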
2024-11-07irqchip: Add T-HEAD C900 ACLINT SSWI driverInochi Amaoto3-0/+189
Add a driver for the T-HEAD C900 ACLINT SSWI device. This device allows systems with T-HEAD CPUs to send IPIs via a fast device interface. Signed-off-by: Inochi Amaoto <inochiama@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241031060859.722258-3-inochiama@gmail.com
2024-11-07irqchip/stm32mp-exti: Use of_property_present() for non-boolean propertiesRob Herring (Arm)1-2/+1
The use of of_property_read_bool() for non-boolean properties is deprecated in favor of of_property_present() when testing for property presence. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Antonio Borneo <antonio.borneo@foss.st.com> Link: https://lore.kernel.org/all/20241104190836.278117-1-robh@kernel.org
2024-11-07irqchip/gic-v3: Force propagation of the active state with a read-backMarc Zyngier1-0/+7
Christoffer reports that on some implementations, writing to GICR_ISACTIVER0 (and similar GICD registers) can race badly with a guest issuing a deactivation of that interrupt via the system register interface. There are multiple reasons for this: - this uses an early write-acknowledgement memory type (nGnRE), meaning that the write may only have made it as far as some interconnect by the time the store is considered "done" - the GIC itself is allowed to buffer the write until it decides to take it into account (as long as it is in finite time) The effects are that the activation may not have taken effect by the time the kernel enters the guest, forcing an immediate exit, or that a guest deactivation occurs before the interrupt is active, doing nothing. In order to guarantee that the write to the ISACTIVER register has taken effect, read back from it, forcing the interconnect to propagate the write, and the GIC to process the write before returning the read. Reported-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Christoffer Dall <christoffer.dall@arm.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20241106084418.3794612-1-maz@kernel.org
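The write-then-read-back idea can be illustrated with a toy model of a device that buffers writes. This is a userspace sketch under simplifying assumptions, not GIC code: `struct fake_dev` and its helpers are hypothetical, and the "buffer" holds a single posted write.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy device: a write may sit in a buffer before the device acts on it. */
struct fake_dev {
	uint32_t reg;         /* state the device actually acts on */
	uint32_t posted;      /* write that has been acknowledged but not applied */
	bool     has_posted;
};

/* Posted write: "done" from the CPU's point of view, not yet from the device's. */
static void dev_write(struct fake_dev *d, uint32_t val)
{
	d->posted = val;
	d->has_posted = true;
}

/* In this model a read drains the pending write before returning the value. */
static uint32_t dev_read(struct fake_dev *d)
{
	if (d->has_posted) {
		d->reg |= d->posted;  /* write-1-to-set semantics */
		d->has_posted = false;
	}
	return d->reg;
}

/* Write followed by a read-back, so the write has taken effect on return. */
static void set_active_sync(struct fake_dev *d, uint32_t mask)
{
	dev_write(d, mask);
	(void)dev_read(d);
}
```

After `dev_write()` alone the device state is still stale; after `set_active_sync()` it is guaranteed to reflect the write, which is the property the commit relies on.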
2024-11-01irqchip/mips-gic: Fix selection of GENERIC_IRQ_EFFECTIVE_AFF_MASKNathan Chancellor1-1/+1
Without SMP enabled (such as in allnoconfig), there is a Kconfig warning because CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK is unconditionally selected by CONFIG_MIPS_GIC: WARNING: unmet direct dependencies detected for GENERIC_IRQ_EFFECTIVE_AFF_MASK Depends on [n]: SMP [=n] Selected by [y]: - MIPS_GIC [=y] Add a dependency on SMP to the selection, which matches all other selections of CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK. Fixes: 322a90638768 ("irqchip/mips-gic: Multi-cluster support") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241101-mips-fix-generic_irq_effective_aff_mask-select-v1-1-d94db6e0de0d@kernel.org
2024-10-30irqchip/mips-gic: Prevent indirect access to clusters without CPU coresGregory CLEMENT1-4/+16
It is possible to have zero CPU cores in a cluster; in such cases, it is not possible to access the GIC, and any indirect access leads to an exception. Prevent access to such clusters by checking the number of cores in the cluster at all places which issue indirect cluster access. Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com> Signed-off-by: Aleksandar Rikalo <arikalo@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241028175935.51250-14-arikalo@gmail.com
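The guard described above amounts to checking the core count before touching a cluster. A minimal sketch follows, assuming a hypothetical `struct cluster` and `setup_clusters()` that are not the driver's own data structures.

```c
#include <stddef.h>

/* Hypothetical per-cluster description used only for this sketch. */
struct cluster {
	unsigned int num_cores;   /* 0 means the cluster's GIC cannot be reached */
	unsigned int configured;  /* set to 1 once the cluster has been set up */
};

/* Configure every cluster that has at least one core; return how many were touched. */
static size_t setup_clusters(struct cluster *cl, size_t n)
{
	size_t touched = 0;

	for (size_t i = 0; i < n; i++) {
		if (cl[i].num_cores == 0)
			continue;  /* skip: indirect access would fault */
		cl[i].configured = 1;
		touched++;
	}
	return touched;
}
```

The same check would have to appear at every place that issues an indirect cluster access, which is why the commit touches several call sites.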
2024-10-30irqchip/mips-gic: Multi-cluster supportPaul Burton2-19/+143
The MIPS I6500 CPU & CM (Coherence Manager) 3.5 introduce the concept of multiple clusters to the system. In these systems, each cluster contains its own GIC, so the GIC isn't truly global any longer. Access to registers in the GICs of remote clusters is possible using a redirect register block much like the redirect register blocks provided by the CM & CPC, and configured through the same GCR_REDIRECT register that the mips_cm_lock_other() abstraction builds upon. It is expected that external interrupts are connected identically on all clusters. That is, if there is a device providing an interrupt connected to GIC interrupt pin 0 then it should be connected to pin 0 of every GIC in the system. For the most part, the GIC can be treated as though it is still truly global, so long as interrupts in the cluster are configured properly. Introduce support for such multi-cluster systems in the MIPS GIC irqchip driver. A newly introduced gic_irq_lock_cluster() function either: 1) configures access to a GIC in a remote cluster via the redirect register block, using mips_cm_lock_other(), or 2) detects that the interrupt in question is affine to the local cluster and that plain old GIC register access to the GIC in the local cluster should be used. It is possible to access the local cluster's GIC registers via the redirect block, but keeping the special case for them is both good for performance (because we avoid the locking & indirection overhead of using the redirect block) and necessary to maintain compatibility with systems using CM revisions prior to 3.5 which don't support the redirect block. The gic_irq_lock_cluster() function relies upon an IRQ's effective affinity in order to discover which cluster the IRQ is affine to. In order to track this & allow it to be updated at an appropriate point during gic_set_affinity(), select the generic support for effective affinity using CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK. gic_set_affinity() is the one function which gains much complexity. It now deconfigures routing to any VP(E), i.e. CPU, on the old cluster when moving affinity to a new cluster. gic_shared_irq_domain_map() moves its update of the IRQ's effective affinity to before its use of gic_irq_lock_cluster(), to ensure that operation is on the cluster the IRQ is affine to. The remaining changes are straightforward use of the gic_irq_lock_cluster() function to select between local cluster & remote cluster code-paths when configuring interrupts. Signed-off-by: Paul Burton <paulburton@kernel.org> Signed-off-by: Chao-ying Fu <cfu@wavecomp.com> Signed-off-by: Dragan Mladjenovic <dragan.mladjenovic@syrmia.com> Signed-off-by: Aleksandar Rikalo <arikalo@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Serge Semin <fancer.lancer@gmail.com> Tested-by: Gregory CLEMENT <gregory.clement@bootlin.com> Link: https://lore.kernel.org/all/20241028175935.51250-5-arikalo@gmail.com
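The local-versus-remote decision described in this commit message can be sketched as a comparison of cluster numbers. This is a simplified model, not the driver's implementation: the fixed `CPUS_PER_CLUSTER` mapping and both function names are assumptions made for the example.

```c
#include <stdbool.h>

#define CPUS_PER_CLUSTER 4  /* assumption for the sketch; real topologies vary */

/* Hypothetical mapping from a CPU number to its cluster. */
static unsigned int cpu_cluster(unsigned int cpu)
{
	return cpu / CPUS_PER_CLUSTER;
}

/*
 * Returns true when the IRQ's effective CPU lives in another cluster, in
 * which case the caller would lock and program the redirect block. Returns
 * false when plain local register access is sufficient.
 */
static bool irq_needs_redirect(unsigned int effective_cpu, unsigned int current_cpu)
{
	return cpu_cluster(effective_cpu) != cpu_cluster(current_cpu);
}
```

Keeping the `false` path free of locking mirrors the performance and compatibility argument the message gives for special-casing the local cluster.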
2024-10-30irqchip/mips-gic: Setup defaults in each clusterChao-ying Fu1-6/+24
In multi-cluster MIPS I6500 systems, there is a GIC per cluster. The default shared interrupt setup configured in gic_of_init() applies only to the GIC in the cluster containing the boot CPU, leaving the GICs of other clusters unconfigured. Configure the other clusters as well. Signed-off-by: Chao-ying Fu <cfu@wavecomp.com> Signed-off-by: Dragan Mladjenovic <dragan.mladjenovic@syrmia.com> Signed-off-by: Aleksandar Rikalo <arikalo@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Serge Semin <fancer.lancer@gmail.com> Tested-by: Gregory CLEMENT <gregory.clement@bootlin.com> Link: https://lore.kernel.org/all/20241028175935.51250-4-arikalo@gmail.com
2024-10-30irqchip/mips-gic: Support multi-cluster in for_each_online_cpu_gic()Paul Burton1-0/+7
Use CM's GCR_CL_REDIRECT register to access registers in remote clusters, so that users of for_each_online_cpu_gic() gain support for multi-cluster without further changes. Signed-off-by: Paul Burton <paulburton@kernel.org> Signed-off-by: Chao-ying Fu <cfu@wavecomp.com> Signed-off-by: Dragan Mladjenovic <dragan.mladjenovic@syrmia.com> Signed-off-by: Aleksandar Rikalo <arikalo@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Serge Semin <fancer.lancer@gmail.com> Tested-by: Gregory CLEMENT <gregory.clement@bootlin.com> Link: https://lore.kernel.org/all/20241028175935.51250-3-arikalo@gmail.com
2024-10-30irqchip/mips-gic: Replace open coded online CPU iterationsPaul Burton1-18/+41
Several places in the MIPS GIC driver iterate over the online CPUs to operate on the CPU's GIC local register block, accessed via the GIC's other/redirect register block. Abstract the process of iterating over online CPUs & configuring the other/redirect region to access their registers through a new for_each_online_cpu_gic() macro and convert all usage sites over. Signed-off-by: Paul Burton <paulburton@kernel.org> Signed-off-by: Chao-ying Fu <cfu@wavecomp.com> Signed-off-by: Dragan Mladjenovic <dragan.mladjenovic@syrmia.com> Signed-off-by: Aleksandar Rikalo <arikalo@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Serge Semin <fancer.lancer@gmail.com> Tested-by: Gregory CLEMENT <gregory.clement@bootlin.com> Link: https://lore.kernel.org/all/20241028175935.51250-2-arikalo@gmail.com
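An iteration macro of this kind can be sketched in userspace as a `for` loop whose condition also selects the redirect target. The names below (`online_mask`, `redirect_cpu`, `for_each_online_cpu_sketch`) are hypothetical stand-ins and do not reproduce the kernel macro.

```c
#include <stdint.h>

#define NR_CPUS 8

static uint32_t online_mask;       /* bit n set => CPU n is online (stand-in) */
static unsigned int redirect_cpu;  /* stand-in for the other/redirect target */

/* Return the next online CPU after 'cpu', or NR_CPUS when there is none. */
static int next_online(int cpu)
{
	for (cpu++; cpu < NR_CPUS; cpu++)
		if (online_mask & (1u << cpu))
			return cpu;
	return NR_CPUS;
}

/* Point the redirect region at 'cpu'; always returns 1 so it can sit in a loop condition. */
static int select_redirect(int cpu)
{
	redirect_cpu = (unsigned int)cpu;
	return 1;
}

/* Visit each online CPU with the redirect region already aimed at it. */
#define for_each_online_cpu_sketch(cpu)                  \
	for ((cpu) = next_online(-1);                    \
	     (cpu) < NR_CPUS && select_redirect(cpu);    \
	     (cpu) = next_online(cpu))
```

Folding the redirect setup into the loop header means the loop body can use the redirected registers directly, which is the convenience the commit message describes.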
2024-10-27irqchip/gic-v4: Correctly deal with set_affinity on lazily-mapped VPEsMarc Zyngier1-2/+12
Zenghui points out that a recent change to the way set_affinity is handled for VPEs has the potential to return an error if the VPE hasn't been mapped yet (because the guest hasn't emitted a MAPTI command yet), affecting GICv4.0 implementations that rely on the ITSList feature. Fix this by making the set_affinity succeed in this case, and return early, without trying to touch the HW. Fixes: 1442ee0011983 ("irqchip/gic-v4: Don't allow a VMOVP on a dying VPE") Reported-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/all/20241027102220.1858558-1-maz@kernel.org Link: https://lore.kernel.org/r/aab45cd3-e5ca-58cf-e081-e32a17f5b4e7@huawei.com
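The early-return behaviour can be modelled as follows. This is a sketch with hypothetical names (`struct vpe_sketch`, `vpe_set_affinity_sketch()`); recording the requested CPU in software before returning is an assumption of the model, not something the commit message states.

```c
#include <stdbool.h>

/* Hypothetical software view of a VPE for this sketch. */
struct vpe_sketch {
	bool mapped;    /* has the VPE been mapped on the hardware yet? */
	int  col;       /* target CPU remembered in software (assumed) */
	int  hw_moves;  /* counts commands that would be sent to the hardware */
};

/* Returns 0 on success, including when the VPE is not mapped yet. */
static int vpe_set_affinity_sketch(struct vpe_sketch *vpe, int cpu)
{
	vpe->col = cpu;        /* assumed: remember the request for later mapping */
	if (!vpe->mapped)
		return 0;      /* succeed early, leave the hardware alone */
	vpe->hw_moves++;       /* stand-in for issuing the move command */
	return 0;
}
```

The point of the model is only that the unmapped case is treated as success rather than as an error, and that no hardware command is issued on that path.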
2024-10-21