linux.git, branch v6.9.4

Linux 6.9.4

2024-06-12T09:39:59+00:00

Link: https://lore.kernel.org/r/20240606131651.683718371@linuxfoundation.org
Tested-by: Ronald Warsow 
Tested-by: SeongJae Park 
Tested-by: Pavel Machek (CIP) 
Tested-by: Shuah Khan 
Tested-by: Bagas Sanjaya 
Tested-by: Jon Hunter 
Tested-by: Mark Brown 
Tested-by: Allen Pais 
Tested-by: Pascal Ernster 
Tested-by: Ron Economos 
Tested-by: Peter Schneider 
Link: https://lore.kernel.org/r/20240609113803.338372290@linuxfoundation.org
Tested-by: Ronald Warsow 
Tested-by: SeongJae Park 
Tested-by: Bagas Sanjaya 
Tested-by: Mark Brown 
Tested-by: Jon Hunter 
Tested-by: Linux Kernel Functional Testing 
Tested-by: Kelsey Steele 
Signed-off-by: Greg Kroah-Hartman

platform/x86/intel-uncore-freq: Don't present root domain on error

2024-06-12T09:39:58+00:00

commit db643cb7ebe524d17b4b13583dda03485d4a1bc0 upstream.

If none of the clusters are added because of some error, fail to load
driver without presenting root domain. In this case root domain will
present invalid data.

Signed-off-by: Srinivas Pandruvada 
Fixes: 01c10f88c9b7 ("platform/x86/intel-uncore-freq: tpmi: Provide cluster level control")
Cc:  # 6.5+
Link: https://lore.kernel.org/r/20240415215210.2824868-1-srinivas.pandruvada@linux.intel.com
Reviewed-by: Hans de Goede 
Signed-off-by: Hans de Goede 
Signed-off-by: Greg Kroah-Hartman

platform/x86/intel/tpmi: Handle error from tpmi_process_info()

2024-06-12T09:39:58+00:00

commit 2920141fc149f71bad22361946417bc43783ed7f upstream.

When tpmi_process_info() returns error, fail to load the driver.
This can happen if call to ioremap() returns error.

Signed-off-by: Srinivas Pandruvada 
Reviewed-by: Ilpo Järvinen 
Cc: stable@vger.kernel.org # v6.3+
Link: https://lore.kernel.org/r/20240423204619.3946901-2-srinivas.pandruvada@linux.intel.com
Reviewed-by: Hans de Goede 
Signed-off-by: Hans de Goede 
Signed-off-by: Greg Kroah-Hartman

genirq/cpuhotplug, x86/vector: Prevent vector leak during CPU offline

2024-06-12T09:39:58+00:00

commit a6c11c0a5235fb144a65e0cb2ffd360ddc1f6c32 upstream.

The absence of IRQD_MOVE_PCNTXT prevents immediate effectiveness of
interrupt affinity reconfiguration via procfs. Instead, the change is
deferred until the next instance of the interrupt being triggered on the
original CPU.

When the interrupt next triggers on the original CPU, the new affinity is
enforced within __irq_move_irq(). A vector is allocated from the new CPU,
but the old vector on the original CPU remains and is not immediately
reclaimed. Instead, apicd->move_in_progress is flagged, and the reclaiming
process is delayed until the next trigger of the interrupt on the new CPU.

Upon the subsequent triggering of the interrupt on the new CPU,
irq_complete_move() adds a task to the old CPU's vector_cleanup list if it
remains online. Subsequently, the timer on the old CPU iterates over its
vector_cleanup list, reclaiming old vectors.

However, a rare scenario arises if the old CPU is outgoing before the
interrupt triggers again on the new CPU.

In that case irq_force_complete_move() is not invoked on the outgoing CPU
to reclaim the old apicd->prev_vector because the interrupt isn't currently
affine to the outgoing CPU, and irq_needs_fixup() returns false. Even
though __vector_schedule_cleanup() is later called on the new CPU, it
doesn't reclaim apicd->prev_vector; instead, it simply resets both
apicd->move_in_progress and apicd->prev_vector to 0.

As a result, the vector remains unreclaimed in vector_matrix, leading to a
CPU vector leak.

To address this issue, move the invocation of irq_force_complete_move()
before the irq_needs_fixup() call to reclaim apicd->prev_vector, if the
interrupt is currently or used to be affine to the outgoing CPU.

Additionally, reclaim the vector in __vector_schedule_cleanup() as well,
following a warning message, although theoretically it should never see
apicd->move_in_progress with apicd->prev_cpu pointing to an offline CPU.

Fixes: f0383c24b485 ("genirq/cpuhotplug: Add support for cleaning up move in progress")
Signed-off-by: Dongli Zhang 
Signed-off-by: Thomas Gleixner 
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240522220218.162423-1-dongli.zhang@oracle.com
Signed-off-by: Greg Kroah-Hartman

x86/topology/intel: Unlock CPUID before evaluating anything

2024-06-12T09:39:58+00:00

commit 0c2f6d04619ec2b53ad4b0b591eafc9389786e86 upstream.

Intel CPUs have a MSR bit to limit CPUID enumeration to leaf two. If
this bit is set by the BIOS then CPUID evaluation including topology
enumeration does not work correctly as the evaluation code does not try
to analyze any leaf greater than two.

This went unnoticed before because the original topology code just
repeated evaluation several times and managed to overwrite the initial
limited information with the correct one later. The new evaluation code
does it once and therefore ends up with the limited and wrong
information.

Cure this by unlocking CPUID right before evaluating anything which
depends on the maximum CPUID leaf being greater than two instead of
rereading stuff after unlock.

Fixes: 22d63660c35e ("x86/cpu: Use common topology code for Intel")
Reported-by: Peter Schneider 
Signed-off-by: Thomas Gleixner 
Signed-off-by: Borislav Petkov (AMD) 
Tested-by: Peter Schneider 
Cc: 
Link: https://lore.kernel.org/r/fd3f73dc-a86f-4bcf-9c60-43556a21eb42@googlemail.com
Signed-off-by: Greg Kroah-Hartman

KVM: x86: Don't advertise guest.MAXPHYADDR as host.MAXPHYADDR in CPUID

2024-06-12T09:39:58+00:00

commit 6f5c9600621b4efb5c61b482d767432eb1ad3a9c upstream.

Drop KVM's propagation of GuestPhysBits (CPUID leaf 80000008, EAX[23:16])
to HostPhysBits (same leaf, EAX[7:0]) when advertising the address widths
to userspace via KVM_GET_SUPPORTED_CPUID.

Per AMD, GuestPhysBits is intended for software use, and physical CPUs do
not set that field.  I.e. GuestPhysBits will be non-zero if and only if
KVM is running as a nested hypervisor, and in that case, GuestPhysBits is
NOT guaranteed to capture the CPU's effective MAXPHYADDR when running with
TDP enabled.

E.g. KVM will soon use GuestPhysBits to communicate the CPU's maximum
*addressable* guest physical address, which would result in KVM under-
reporting PhysBits when running as an L1 on a CPU with MAXPHYADDR=52,
but without 5-level paging.

Signed-off-by: Gerd Hoffmann 
Cc: stable@vger.kernel.org
Reviewed-by: Xiaoyao Li 
Link: https://lore.kernel.org/r/20240313125844.912415-2-kraxel@redhat.com
[sean: rewrite changelog with --verbose, Cc stable@]
Signed-off-by: Sean Christopherson 
Signed-off-by: Greg Kroah-Hartman

x86/pci: Skip early E820 check for ECAM region

2024-06-12T09:39:58+00:00

commit 199f968f1484a14024d0d467211ffc2faf193eb4 upstream.

Arul, Mateusz, Imcarneiro91, and Aman reported a regression caused by
07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map").  On the
Lenovo Legion 9i laptop, that commit removes the ECAM area from E820, which
means the early E820 validation fails, which means we don't enable ECAM in
the "early MCFG" path.

The static MCFG table describes ECAM without depending on the ACPI
interpreter.  Many Legion 9i ACPI methods rely on that, so they fail when
PCI config access isn't available, resulting in the embedded controller,
PS/2, audio, trackpad, and battery devices not being detected.  The _OSC
method also fails, so Linux can't take control of the PCIe hotplug, PME,
and AER features:

  # pci_mmcfg_early_init()

  PCI: ECAM [mem 0xc0000000-0xce0fffff] (base 0xc0000000) for domain 0000 [bus 00-e0]
  PCI: not using ECAM ([mem 0xc0000000-0xce0fffff] not reserved)

  ACPI Error: AE_ERROR, Returned by Handler for [PCI_Config] (20230628/evregion-300)
  ACPI: Interpreter enabled
  ACPI: Ignoring error and continuing table load
  ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PC00.RP01._SB.PC00], AE_NOT_FOUND (20230628/dswload2-162)
  ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20230628/psobject-220)
  ACPI: Skipping parse of AML opcode: OpcodeName unavailable (0x0010)
  ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PC00.RP01._SB.PC00], AE_NOT_FOUND (20230628/dswload2-162)
  ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20230628/psobject-220)
  ...
  ACPI Error: Aborting method \_SB.PC00._OSC due to previous error (AE_NOT_FOUND) (20230628/psparse-529)
  acpi PNP0A08:00: _OSC: platform retains control of PCIe features (AE_NOT_FOUND)

  # pci_mmcfg_late_init()

  PCI: ECAM [mem 0xc0000000-0xce0fffff] (base 0xc0000000) for domain 0000 [bus 00-e0]
  PCI: [Firmware Info]: ECAM [mem 0xc0000000-0xce0fffff] not reserved in ACPI motherboard resources
  PCI: ECAM [mem 0xc0000000-0xce0fffff] is EfiMemoryMappedIO; assuming valid
  PCI: ECAM [mem 0xc0000000-0xce0fffff] reserved to work around lack of ACPI motherboard _CRS

Per PCI Firmware r3.3, sec 4.1.2, ECAM space must be reserved by a PNP0C02
resource, but there's no requirement to mention it in E820, so we shouldn't
look at E820 to validate the ECAM space described by MCFG.

In 2006, 946f2ee5c731 ("[PATCH] i386/x86-64: Check that MCFG points to an
e820 reserved area") added a sanity check of E820 to work around buggy MCFG
tables, but that over-aggressive validation causes failures like this one.

Keep the E820 validation check for machines older than 2016, an arbitrary
ten years after 946f2ee5c731, so machines that depend on it don't break.

Skip the early E820 check for 2016 and newer BIOSes since there's no
requirement to describe ECAM in E820.

Link: https://lore.kernel.org/r/20240417204012.215030-2-helgaas@kernel.org
Fixes: 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map")
Reported-by: Mateusz Kaduk 
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218444
Signed-off-by: Bjorn Helgaas 
Tested-by: Mateusz Kaduk 
Reviewed-by: Andy Shevchenko 
Reviewed-by: Hans de Goede 
Reviewed-by: Kuppuswamy Sathyanarayanan 
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman

x86/topology: Handle bogus ACPI tables correctly

2024-06-12T09:39:57+00:00

commit 9d22c96316ac59ed38e80920c698fed38717b91b upstream.

The ACPI specification clearly states how the processors should be
enumerated in the MADT:

 "To ensure that the boot processor is supported post initialization,
  two guidelines should be followed. The first is that OSPM should
  initialize processors in the order that they appear in the MADT. The
  second is that platform firmware should list the boot processor as the
  first processor entry in the MADT.
  ...
  Failure of OSPM implementations and platform firmware to abide by
  these guidelines can result in both unpredictable and non optimal
  platform operation."

The kernel relies on that ordering to detect the real BSP on crash kernels
which is important to avoid sending a INIT IPI to it as that would cause a
full machine reset.

On a Dell XPS 16 9640 the BIOS ignores this rule and enumerates the CPUs in
the wrong order. As a consequence the kernel falsely detects a crash kernel
and disables the corresponding CPU.

Prevent this by checking the IA32_APICBASE MSR for the BSP bit on the boot
CPU. If that bit is set, then the MADT based BSP detection can be safely
ignored. If the kernel detects a mismatch between the BSP bit and the first
enumerated MADT entry then emit a firmware bug message.

This obviously also has to be taken into account when the boot APIC ID and
the first enumerated APIC ID match. If the boot CPU does not have the BSP
bit set in the APICBASE MSR then there is no way for the boot CPU to
determine which of the CPUs is the real BSP. Sending an INIT to the real
BSP would reset the machine so the only sane way to deal with that is to
limit the number of CPUs to one and emit a corresponding warning message.

Fixes: 5c5682b9f87a ("x86/cpu: Detect real BSP on crash kernels")
Reported-by: Carsten Tolkmit 
Signed-off-by: Thomas Gleixner 
Tested-by: Carsten Tolkmit 
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87le48jycb.ffs@tglx
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218837
Signed-off-by: Greg Kroah-Hartman

efi: libstub: only free priv.runtime_map when allocated

2024-06-12T09:39:57+00:00

commit 4b2543f7e1e6b91cfc8dd1696e3cdf01c3ac8974 upstream.

priv.runtime_map is only allocated when efi_novamap is not set.
Otherwise, it is an uninitialized value.  In the error path, it is freed
unconditionally.  Avoid passing an uninitialized value to free_pool.
Free priv.runtime_map only when it was allocated.

This bug was discovered and resolved using Coverity Static Analysis
Security Testing (SAST) by Synopsys, Inc.

Fixes: f80d26043af9 ("efi: libstub: avoid efi_get_memory_map() for allocating the virt map")
Cc: 
Signed-off-by: Hagar Hemdan 
Signed-off-by: Ard Biesheuvel 
Signed-off-by: Greg Kroah-Hartman

x86/efistub: Omit physical KASLR when memory reservations exist

2024-06-12T09:39:57+00:00

commit 15aa8fb852f995dd234a57f12dfb989044968bb6 upstream.

The legacy decompressor has elaborate logic to ensure that the
randomized physical placement of the decompressed kernel image does not
conflict with any memory reservations, including ones specified on the
command line using mem=, memmap=, efi_fake_mem= or hugepages=, which are
taken into account by the kernel proper at a later stage.

When booting in EFI mode, it is the firmware's job to ensure that the
chosen range does not conflict with any memory reservations that it
knows about, and this is trivially achieved by using the firmware's
memory allocation APIs.

That leaves reservations specified on the command line, though, which
the firmware knows nothing about, as these regions have no other special
significance to the platform. Since commit

  a1b87d54f4e4 ("x86/efistub: Avoid legacy decompressor when doing EFI boot")

these reservations are not taken into account when randomizing the
physical placement, which may result in conflicts where the memory
cannot be reserved by the kernel proper because its own executable image
resides there.

To avoid having to duplicate or reuse the existing complicated logic,
disable physical KASLR entirely when such overrides are specified. These
are mostly diagnostic tools or niche features, and physical KASLR (as
opposed to virtual KASLR, which is much more important as it affects the
memory addresses observed by code executing in the kernel) is something
we can live without.

Closes: https://lkml.kernel.org/r/FA5F6719-8824-4B04-803E-82990E65E627%40akamai.com
Reported-by: Ben Chaney 
Fixes: a1b87d54f4e4 ("x86/efistub: Avoid legacy decompressor when doing EFI boot")
Cc:   # v6.1+
Reviewed-by: Kees Cook 
Signed-off-by: Ard Biesheuvel 
Signed-off-by: Greg Kroah-Hartman