linux.git/arch/arm64/include, branch v6.15.10

arm64/gcs: task_gcs_el0_enable() should use passed task

2025-08-15T10:16:34+00:00

[ Upstream commit cbbcfb94c55c02a8c4ce52b5da0770b5591a314c ]

Mark Rutland noticed that the task parameter is ignored and
'current' is being used instead. Since this is usually
what its passed, it hasn't yet been causing problems but likely
will as the code gets more testing.

But, once this is fixed, it creates a new bug in copy_thread_gcs()
since the gcs_el_mode isn't yet set for the task before its being
checked. Move gcs_alloc_thread_stack() after the new task's
gcs_el0_mode initialization to avoid this.

Fixes: fc84bc5378a8 ("arm64/gcs: Context switch GCS state for EL0")
Signed-off-by: Jeremy Linton 
Reviewed-by: Mark Brown 
Link: https://lore.kernel.org/r/20250719043740.4548-2-jeremy.linton@arm.com
Signed-off-by: Catalin Marinas 
Signed-off-by: Sasha Levin

arm64/entry: Mask DAIF in cpu_switch_to(), call_on_irq_stack()

2025-08-01T08:51:27+00:00

commit d42e6c20de6192f8e4ab4cf10be8c694ef27e8cb upstream.

`cpu_switch_to()` and `call_on_irq_stack()` manipulate SP to change
to different stacks along with the Shadow Call Stack if it is enabled.
Those two stack changes cannot be done atomically and both functions
can be interrupted by SErrors or Debug Exceptions which, though unlikely,
is very much broken : if interrupted, we can end up with mismatched stacks
and Shadow Call Stack leading to clobbered stacks.

In `cpu_switch_to()`, it can happen when SP_EL0 points to the new task,
but x18 stills points to the old task's SCS. When the interrupt handler
tries to save the task's SCS pointer, it will save the old task
SCS pointer (x18) into the new task struct (pointed to by SP_EL0),
clobbering it.

In `call_on_irq_stack()`, it can happen when switching from the task stack
to the IRQ stack and when switching back. In both cases, we can be
interrupted when the SCS pointer points to the IRQ SCS, but SP points to
the task stack. The nested interrupt handler pushes its return addresses
on the IRQ SCS. It then detects that SP points to the task stack,
calls `call_on_irq_stack()` and clobbers the task SCS pointer with
the IRQ SCS pointer, which it will also use !

This leads to tasks returning to addresses on the wrong SCS,
or even on the IRQ SCS, triggering kernel panics via CONFIG_VMAP_STACK
or FPAC if enabled.

This is possible on a default config, but unlikely.
However, when enabling CONFIG_ARM64_PSEUDO_NMI, DAIF is unmasked and
instead the GIC is responsible for filtering what interrupts the CPU
should receive based on priority.
Given the goal of emulating NMIs, pseudo-NMIs can be received by the CPU
even in `cpu_switch_to()` and `call_on_irq_stack()`, possibly *very*
frequently depending on the system configuration and workload, leading
to unpredictable kernel panics.

Completely mask DAIF in `cpu_switch_to()` and restore it when returning.
Do the same in `call_on_irq_stack()`, but restore and mask around
the branch.
Mask DAIF even if CONFIG_SHADOW_CALL_STACK is not enabled for consistency
of behaviour between all configurations.

Introduce and use an assembly macro for saving and masking DAIF,
as the existing one saves but only masks IF.

Cc: 
Signed-off-by: Ada Couprie Diaz 
Reported-by: Cristian Prundeanu 
Fixes: 59b37fe52f49 ("arm64: Stash shadow stack pointer in the task struct on interrupt")
Tested-by: Cristian Prundeanu 
Acked-by: Will Deacon 
Link: https://lore.kernel.org/r/20250718142814.133329-1-ada.coupriediaz@arm.com
Signed-off-by: Will Deacon 
Signed-off-by: Greg Kroah-Hartman

arm64/mm: Close theoretical race where stale TLB entry remains valid

2025-06-27T10:13:01+00:00

commit 4b634918384c0f84c33aeb4dd9fd4c38e7be5ccb upstream.

Commit 3ea277194daa ("mm, mprotect: flush TLB if potentially racing with
a parallel reclaim leaving stale TLB entries") describes a race that,
prior to the commit, could occur between reclaim and operations such as
mprotect() when using reclaim's tlbbatch mechanism. See that commit for
details but the summary is:

"""
Nadav Amit identified a theoritical race between page reclaim and
mprotect due to TLB flushes being batched outside of the PTL being held.

He described the race as follows:

	CPU0				CPU1
	----				----
					user accesses memory using RW PTE
					[PTE now cached in TLB]
	try_to_unmap_one()
	==> ptep_get_and_clear()
	==> set_tlb_ubc_flush_pending()
					mprotect(addr, PROT_READ)
					==> change_pte_range()
					==> [ PTE non-present - no flush ]

					user writes using cached RW PTE
	...

	try_to_unmap_flush()
"""

The solution was to insert flush_tlb_batched_pending() in mprotect() and
friends to explcitly drain any pending reclaim TLB flushes. In the
modern version of this solution, arch_flush_tlb_batched_pending() is
called to do that synchronisation.

arm64's tlbbatch implementation simply issues TLBIs at queue-time
(arch_tlbbatch_add_pending()), eliding the trailing dsb(ish). The
trailing dsb(ish) is finally issued in arch_tlbbatch_flush() at the end
of the batch to wait for all the issued TLBIs to complete.

Now, the Arm ARM states:

"""
The completion of the TLB maintenance instruction is guaranteed only by
the execution of a DSB by the observer that performed the TLB
maintenance instruction. The execution of a DSB by a different observer
does not have this effect, even if the DSB is known to be executed after
the TLB maintenance instruction is observed by that different observer.
"""

arch_tlbbatch_add_pending() and arch_tlbbatch_flush() conform to this
requirement because they are called from the same task (either kswapd or
caller of madvise(MADV_PAGEOUT)), so either they are on the same CPU or
if the task was migrated, __switch_to() contains an extra dsb(ish).

HOWEVER, arm64's arch_flush_tlb_batched_pending() is also implemented as
a dsb(ish). But this may be running on a CPU remote from the one that
issued the outstanding TLBIs. So there is no architectural gurantee of
synchonization. Therefore we are still vulnerable to the theoretical
race described in Commit 3ea277194daa ("mm, mprotect: flush TLB if
potentially racing with a parallel reclaim leaving stale TLB entries").

Fix this by flushing the entire mm in arch_flush_tlb_batched_pending().
This aligns with what the other arches that implement the tlbbatch
feature do.

Cc: 
Fixes: 43b3dfdd0455 ("arm64: support batched/deferred tlb shootdown during page reclamation/migration")
Signed-off-by: Ryan Roberts 
Link: https://lore.kernel.org/r/20250530152445.2430295-1-ryan.roberts@arm.com
Signed-off-by: Will Deacon 
Signed-off-by: Greg Kroah-Hartman

arm64/fpsimd: Avoid RES0 bits in the SME trap handler

2025-06-19T13:39:29+00:00

[ Upstream commit 95507570fb2f75544af69760cd5d8f48fc5c7f20 ]

The SME trap handler consumes RES0 bits from the ESR when determining
the reason for the trap, and depends upon those bits reading as zero.
This may break in future when those RES0 bits are allocated a meaning
and stop reading as zero.

For SME traps taken with ESR_ELx.EC == 0b011101, the specific reason for
the trap is indicated by ESR_ELx.ISS.SMTC ("SME Trap Code"). This field
occupies bits [2:0] of ESR_ELx.ISS, and as of ARM DDI 0487 L.a, bits
[24:3] of ESR_ELx.ISS are RES0. ESR_ELx.ISS itself occupies bits [24:0]
of ESR_ELx.

Extract the SMTC field specifically, matching the way we handle ESR_ELx
fields elsewhere, and ensuring that the handler is future-proof.

Fixes: 8bd7f91c03d8 ("arm64/sme: Implement traps and syscall handling for SME")
Signed-off-by: Mark Rutland 
Cc: Marc Zyngier 
Cc: Mark Brown 
Cc: Will Deacon 
Reviewed-by: Mark Brown 
Link: https://lore.kernel.org/r/20250409164010.3480271-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas 
Signed-off-by: Sasha Levin

Merge tag 'arm64_cbpf_mitigation_2025_05_08' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

2025-05-12T00:45:00+00:00

Pull arm64 cBPF BHB mitigation from James Morse:
 "This adds the BHB mitigation into the code JITted for cBPF programs as
  these can be loaded by unprivileged users via features like seccomp.

  The existing mechanisms to disable the BHB mitigation will also
  prevent the mitigation being JITted. In addition, cBPF programs loaded
  by processes with the SYS_ADMIN capability are not mitigated as these
  could equally load an eBPF program that does the same thing.

  For good measure, the list of 'k' values for CPU's local mitigations
  is updated from the version on arm's website"

* tag 'arm64_cbpf_mitigation_2025_05_08' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: proton-pack: Add new CPUs 'k' values for branch mitigation
  arm64: bpf: Only mitigate cBPF programs loaded by unprivileged users
  arm64: bpf: Add BHB mitigation to the epilogue for cBPF programs
  arm64: proton-pack: Expose whether the branchy loop k value
  arm64: proton-pack: Expose whether the platform is mitigated by firmware
  arm64: insn: Add support for encoding DSB

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

2025-05-11T18:30:13+00:00

Pull KVM fixes from Paolo Bonzini:
 "ARM:

   - Avoid use of uninitialized memcache pointer in user_mem_abort()

   - Always set HCR_EL2.xMO bits when running in VHE, allowing
     interrupts to be taken while TGE=0 and fixing an ugly bug on
     AmpereOne that occurs when taking an interrupt while clearing the
     xMO bits (AC03_CPU_36)

   - Prevent VMMs from hiding support for AArch64 at any EL virtualized
     by KVM

   - Save/restore the host value for HCRX_EL2 instead of restoring an
     incorrect fixed value

   - Make host_stage2_set_owner_locked() check that the entire requested
     range is memory rather than just the first page

  RISC-V:

   - Add missing reset of smstateen CSRs

  x86:

   - Forcibly leave SMM on SHUTDOWN interception on AMD CPUs to avoid
     causing problems due to KVM stuffing INIT on SHUTDOWN (KVM needs to
     sanitize the VMCB as its state is undefined after SHUTDOWN,
     emulating INIT is the least awful choice).

   - Track the valid sync/dirty fields in kvm_run as a u64 to ensure KVM
     KVM doesn't goof a sanity check in the future.

   - Free obsolete roots when (re)loading the MMU to fix a bug where
     pre-faulting memory can get stuck due to always encountering a
     stale root.

   - When dumping GHCB state, use KVM's snapshot instead of the raw GHCB
     page to print state, so that KVM doesn't print stale/wrong
     information.

   - When changing memory attributes (e.g. shared <=> private), add
     potential hugepage ranges to the mmu_invalidate_range_{start,end}
     set so that KVM doesn't create a shared/private hugepage when the
     the corresponding attributes will become mixed (the attributes are
     commited *after* KVM finishes the invalidation).

   - Rework the SRSO mitigation to enable BP_SPEC_REDUCE only when KVM
     has at least one active VM. Effectively BP_SPEC_REDUCE when KVM is
     loaded led to very measurable performance regressions for non-KVM
     workloads"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: SVM: Set/clear SRSO's BP_SPEC_REDUCE on 0 <=> 1 VM count transitions
  KVM: arm64: Fix memory check in host_stage2_set_owner_locked()
  KVM: arm64: Kill HCRX_HOST_FLAGS
  KVM: arm64: Properly save/restore HCRX_EL2
  KVM: arm64: selftest: Don't try to disable AArch64 support
  KVM: arm64: Prevent userspace from disabling AArch64 support at any virtualisable EL
  KVM: arm64: Force HCR_EL2.xMO to 1 at all times in VHE mode
  KVM: arm64: Fix uninitialized memcache pointer in user_mem_abort()
  KVM: x86/mmu: Prevent installing hugepages when mem attributes are changing
  KVM: SVM: Update dump_ghcb() to use the GHCB snapshot fields
  KVM: RISC-V: reset smstateen CSRs
  KVM: x86/mmu: Check and free obsolete roots in kvm_mmu_reload()
  KVM: x86: Check that the high 32bits are clear in kvm_arch_vcpu_ioctl_run()
  KVM: SVM: Forcibly leave SMM mode on SHUTDOWN interception

arm64: proton-pack: Add new CPUs 'k' values for branch mitigation

2025-05-08T14:29:28+00:00

Update the list of 'k' values for the branch mitigation from arm's
website.

Add the values for Cortex-X1C. The MIDR_EL1 value can be found here:
https://developer.arm.com/documentation/101968/0002/Register-descriptions/AArch>

Link: https://developer.arm.com/documentation/110280/2-0/?lang=en
Signed-off-by: James Morse 
Reviewed-by: Catalin Marinas

arm64: bpf: Add BHB mitigation to the epilogue for cBPF programs

2025-05-08T14:28:35+00:00

A malicious BPF program may manipulate the branch history to influence
what the hardware speculates will happen next.

On exit from a BPF program, emit the BHB mititgation sequence.

This is only applied for 'classic' cBPF programs that are loaded by
seccomp.

Signed-off-by: James Morse 
Reviewed-by: Catalin Marinas 
Acked-by: Daniel Borkmann

arm64: proton-pack: Expose whether the branchy loop k value

2025-05-08T14:28:35+00:00

Add a helper to expose the k value of the branchy loop. This is needed
by the BPF JIT to generate the mitigation sequence in BPF programs.

Signed-off-by: James Morse 
Reviewed-by: Catalin Marinas

arm64: proton-pack: Expose whether the platform is mitigated by firmware

2025-05-08T14:28:35+00:00

is_spectre_bhb_fw_affected() allows the caller to determine if the CPU
is known to need a firmware mitigation. CPUs are either on the list
of CPUs we know about, or firmware has been queried and reported that
the platform is affected - and mitigated by firmware.

This helper is not useful to determine if the platform is mitigated
by firmware. A CPU could be on the know list, but the firmware may
not be implemented. Its affected but not mitigated.

spectre_bhb_enable_mitigation() handles this distinction by checking
the firmware state before enabling the mitigation.

Add a helper to expose this state. This will be used by the BPF JIT
to determine if calling firmware for a mitigation is necessary and
supported.

Signed-off-by: James Morse 
Reviewed-by: Catalin Marinas