linux.git/drivers/virt, branch v6.1.168

minmax: add in_range() macro

2025-10-02T11:40:41+00:00

[ Upstream commit f9bff0e31881d03badf191d3b0005839391f5f2b ]

Patch series "New page table range API", v6.

This patchset changes the API used by the MM to set up page table entries.
The four APIs are:

    set_ptes(mm, addr, ptep, pte, nr)
    update_mmu_cache_range(vma, addr, ptep, nr)
    flush_dcache_folio(folio)
    flush_icache_pages(vma, page, nr)

flush_dcache_folio() isn't technically new, but no architecture
implemented it, so I've done that for them.  The old APIs remain around
but are mostly implemented by calling the new interfaces.

The new APIs are based around setting up N page table entries at once.
The N entries belong to the same PMD, the same folio and the same VMA, so
ptep++ is a legitimate operation, and locking is taken care of for you.
Some architectures can do a better job of it than just a loop, but I have
hesitated to make too deep a change to architectures I don't understand
well.

One thing I have changed in every architecture is that PG_arch_1 is now a
per-folio bit instead of a per-page bit when used for dcache clean/dirty
tracking.  This was something that would have to happen eventually, and it
makes sense to do it now rather than iterate over every page involved in a
cache flush and figure out if it needs to happen.

The point of all this is better performance, and Fengwei Yin has measured
improvement on x86.  I suspect you'll see improvement on your architecture
too.  Try the new will-it-scale test mentioned here:
https://lore.kernel.org/linux-mm/20230206140639.538867-5-fengwei.yin@intel.com/
You'll need to run it on an XFS filesystem and have
CONFIG_TRANSPARENT_HUGEPAGE set.

This patchset is the basis for much of the anonymous large folio work
being done by Ryan, so it's received quite a lot of testing over the last
few months.

This patch (of 38):

Determine if a value lies within a range more efficiently (subtraction +
comparison vs two comparisons and an AND).  It also has useful (under some
circumstances) behaviour if the range exceeds the maximum value of the
type.  Convert all the conflicting definitions of in_range() within the
kernel; some can use the generic definition while others need their own
definition.

Link: https://lkml.kernel.org/r/20230802151406.3735276-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20230802151406.3735276-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) 
Signed-off-by: Andrew Morton 
Signed-off-by: Eliav Farber 
Signed-off-by: Greg Kroah-Hartman

fix locking in efi_secret_unlink()

2025-08-28T14:25:51+00:00

[ Upstream commit 2c58d42de71f9c73e40afacc9d062892d2cc8862 ]

We used to need securityfs_remove() to undo simple_pin_fs() done when
the file had been created and to drop the second extra reference
taken at the same time.  Now that neither is needed (or done by
securityfs_remove()), we can simply call simple_unlink() and be done
with that - the broken games with locking had been there only for the
sake of securityfs_remove().

Signed-off-by: Al Viro 
Signed-off-by: Sasha Levin

drivers: virt: acrn: hsm: Use kzalloc to avoid info leak in pmcmd_ioctl

2025-03-13T11:53:23+00:00

commit 819cec1dc47cdeac8f5dd6ba81c1dbee2a68c3bb upstream.

In the "pmcmd_ioctl" function, three memory objects allocated by
kmalloc are initialized by "hcall_get_cpu_state", which are then
copied to user space. The initializer is indeed implemented in
"acrn_hypercall2" (arch/x86/include/asm/acrn.h). There is a risk of
information leakage due to uninitialized bytes.

Fixes: 3d679d5aec64 ("virt: acrn: Introduce interfaces to query C-states and P-states allowed by hypervisor")
Signed-off-by: Haoyu Li 
Cc: stable 
Acked-by: Fei Li 
Link: https://lore.kernel.org/r/20250130115811.92424-1-lihaoyu499@gmail.com
Signed-off-by: Greg Kroah-Hartman

drivers/virt/acrn: fix PFNMAP PTE checks in acrn_vm_ram_map()

2024-06-12T09:03:30+00:00

[ Upstream commit 3d6586008f7b638f91f3332602592caa8b00b559 ]

Patch series "mm: follow_pte() improvements and acrn follow_pte() fixes".

Patch #1 fixes a bunch of issues I spotted in the acrn driver.  It
compiles, that's all I know.  I'll appreciate some review and testing from
acrn folks.

Patch #2+#3 improve follow_pte(), passing a VMA instead of the MM, adding
more sanity checks, and improving the documentation.  Gave it a quick test
on x86-64 using VM_PAT that ends up using follow_pte().

This patch (of 3):

We currently miss handling various cases, resulting in a dangerous
follow_pte() (previously follow_pfn()) usage.

(1) We're not checking PTE write permissions.

Maybe we should simply always require pte_write() like we do for
pin_user_pages_fast(FOLL_WRITE)? Hard to tell, so let's check for
ACRN_MEM_ACCESS_WRITE for now.

(2) We're not rejecting refcounted pages.

As we are not using MMU notifiers, messing with refcounted pages is
dangerous and can result in use-after-free. Let's make sure to reject them.

(3) We are only looking at the first PTE of a bigger range.

We only lookup a single PTE, but memmap->len may span a larger area.
Let's loop over all involved PTEs and make sure the PFN range is
actually contiguous. Reject everything else: it couldn't have worked
either way, and rather made use access PFNs we shouldn't be accessing.

Link: https://lkml.kernel.org/r/20240410155527.474777-1-david@redhat.com
Link: https://lkml.kernel.org/r/20240410155527.474777-2-david@redhat.com
Fixes: 8a6e85f75a83 ("virt: acrn: obtain pa from VMA with PFNMAP flag")
Signed-off-by: David Hildenbrand 
Cc: Alex Williamson 
Cc: Christoph Hellwig 
Cc: Fei Li 
Cc: Gerald Schaefer 
Cc: Heiko Carstens 
Cc: Ingo Molnar 
Cc: Paolo Bonzini 
Cc: Yonghua Huang 
Cc: Sean Christopherson 
Signed-off-by: Andrew Morton 
Signed-off-by: Sasha Levin

virt: acrn: stop using follow_pfn

2024-06-12T09:03:30+00:00

[ Upstream commit 1b265da7ea1e1ae997fa119c2846bb389eb39c6b ]

Patch series "remove follow_pfn".

This series open codes follow_pfn in the only remaining caller, although
the code there remains questionable.  It then also moves follow_phys into
the only user and simplifies it a bit.

This patch (of 3):

Switch from follow_pfn to follow_pte so that we can get rid of follow_pfn.
Note that this doesn't fix any of the pre-existing raciness and lack of
permission checking in the code.

Link: https://lkml.kernel.org/r/20240324234542.2038726-1-hch@lst.de
Link: https://lkml.kernel.org/r/20240324234542.2038726-2-hch@lst.de
Signed-off-by: Christoph Hellwig 
Reviewed-by: David Hildenbrand 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: Fei Li 
Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Signed-off-by: Andrew Morton 
Stable-dep-of: 3d6586008f7b ("drivers/virt/acrn: fix PFNMAP PTE checks in acrn_vm_ram_map()")
Signed-off-by: Sasha Levin

virt: sevguest: Fix passing a stack buffer as a scatterlist target

2023-11-20T10:52:13+00:00

[ Upstream commit db10cb9b574675402bfd8fe1a31aafdd45b002df ]

CONFIG_DEBUG_SG highlights that get_{report,ext_report,derived_key)()}
are passing stack buffers as the @req_buf argument to
handle_guest_request(), generating a Call Trace of the following form:

    WARNING: CPU: 0 PID: 1175 at include/linux/scatterlist.h:187 enc_dec_message+0x518/0x5b0 [sev_guest]
    [..]
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
    RIP: 0010:enc_dec_message+0x518/0x5b0 [sev_guest]
    Call Trace:
     
    [..]
     handle_guest_request+0x135/0x520 [sev_guest]
     get_ext_report+0x1ec/0x3e0 [sev_guest]
     snp_guest_ioctl+0x157/0x200 [sev_guest]

Note that the above Call Trace was with the DEBUG_SG BUG_ON()s converted
to WARN_ON()s.

This is benign as long as there are no hardware crypto accelerators
loaded for the aead cipher, and no subsequent dma_map_sg() is performed
on the scatterlist. However, sev-guest can not assume the presence of
an aead accelerator nor can it assume that CONFIG_DEBUG_SG is disabled.

Resolve this bug by allocating virt_addr_valid() memory, similar to the
other buffers am @snp_dev instance carries, to marshal requests from
user buffers to kernel buffers.

Reported-by: Peter Gonda 
Closes: http://lore.kernel.org/r/CAMkAt6r2VPPMZ__SQfJse8qWsUyYW3AgYbOUVM0S_Vtk=KvkxQ@mail.gmail.com
Fixes: fce96cf04430 ("virt: Add SEV-SNP guest driver")
Cc: Borislav Petkov 
Cc: Tom Lendacky 
Cc: Dionna Glaze 
Cc: Jeremi Piotrowski 
Tested-by: Kuppuswamy Sathyanarayanan 
Reviewed-by: Tom Lendacky 
Signed-off-by: Dan Williams 
Signed-off-by: Sasha Levin

x86/sev: Change snp_guest_issue_request()'s fw_err argument

2023-11-20T10:52:13+00:00

[ Upstream commit 0144e3b85d7b42e8a4cda991c0e81f131897457a ]

The GHCB specification declares that the firmware error value for
a guest request will be stored in the lower 32 bits of EXIT_INFO_2. The
upper 32 bits are for the VMM's own error code. The fw_err argument to
snp_guest_issue_request() is thus a misnomer, and callers will need
access to all 64 bits.

The type of unsigned long also causes problems, since sw_exit_info2 is
u64 (unsigned long long) vs the argument's unsigned long*. Change this
type for issuing the guest request. Pass the ioctl command struct's error
field directly instead of in a local variable, since an incomplete guest
request may not set the error code, and uninitialized stack memory would
be written back to user space.

The firmware might not even be called, so bookend the call with the no
firmware call error and clear the error.

Since the "fw_err" field is really exitinfo2 split into the upper bits'
vmm error code and lower bits' firmware error code, convert the 64 bit
value to a union.

[ bp:
- Massage commit message
- adjust code
- Fix a build issue as
Reported-by: kernel test robot
Link: https://lore.kernel.org/oe-kbuild-all/202303070609.vX6wp2Af-lkp@intel.com
- print exitinfo2 in hex
Tom:
- Correct -EIO exit case. ]

Signed-off-by: Dionna Glaze
Signed-off-by: Tom Lendacky
Signed-off-by: Borislav Petkov (AMD)
Link: https://lore.kernel.org/r/20230214164638.1189804-5-dionnaglaze@google.com
Link: https://lore.kernel.org/r/20230307192449.24732-12-bp@alien8.de
Stable-dep-of: db10cb9b5746 ("virt: sevguest: Fix passing a stack buffer as a scatterlist target")
Signed-off-by: Sasha Levin

virt: sevguest: Add CONFIG_CRYPTO dependency

2023-07-19T14:20:55+00:00

[ Upstream commit 84b9b44b99780d35fe72ac63c4724f158771e898 ]

This driver fails to link when CRYPTO is disabled, or in a loadable
module:

  WARNING: unmet direct dependencies detected for CRYPTO_GCM
  WARNING: unmet direct dependencies detected for CRYPTO_AEAD2
    Depends on [m]: CRYPTO [=m]
    Selected by [y]:
    - SEV_GUEST [=y] && VIRT_DRIVERS [=y] && AMD_MEM_ENCRYPT [=y]

x86_64-linux-ld: crypto/aead.o: in function `crypto_register_aeads':

Fixes: fce96cf04430 ("virt: Add SEV-SNP guest driver")
Signed-off-by: Arnd Bergmann 
Signed-off-by: Borislav Petkov (AMD) 
Link: https://lore.kernel.org/r/20230117171416.2715125-1-arnd@kernel.org
Signed-off-by: Sasha Levin

virt/coco/sev-guest: Double-buffer messages

2023-05-11T14:03:10+00:00

[ Upstream commit 965006103a14703cc42043bbf9b5e0cdf7a468ad ]

The encryption algorithms read and write directly to shared unencrypted
memory, which may leak information as well as permit the host to tamper
with the message integrity. Instead, copy whole messages in or out as
needed before doing any computation on them.

Fixes: d5af44dde546 ("x86/sev: Provide support for SNP guest request NAEs")
Signed-off-by: Dionna Glaze 
Signed-off-by: Borislav Petkov (AMD) 
Link: https://lore.kernel.org/r/20230214164638.1189804-3-dionnaglaze@google.com
Signed-off-by: Sasha Levin

virt/coco/sev-guest: Add throttling awareness

2023-03-22T12:34:06+00:00

commit 72f7754dcf31c87c92c0c353dcf747814cc5ce10 upstream.

A potentially malicious SEV guest can constantly hammer the hypervisor
using this driver to send down requests and thus prevent or at least
considerably hinder other guests from issuing requests to the secure
processor which is a shared platform resource.

Therefore, the host is permitted and encouraged to throttle such guest
requests.

Add the capability to handle the case when the hypervisor throttles
excessive numbers of requests issued by the guest. Otherwise, the VM
platform communication key will be disabled, preventing the guest from
attesting itself.

Realistically speaking, a well-behaved guest should not even care about
throttling. During its lifetime, it would end up issuing a handful of
requests which the hardware can easily handle.

This is more to address the case of a malicious guest. Such guest should
get throttled and if its VMPCK gets disabled, then that's its own
wrongdoing and perhaps that guest even deserves it.

To the implementation: the hypervisor signals with SNP_GUEST_REQ_ERR_BUSY
that the guest requests should be throttled. That error code is returned
in the upper 32-bit half of exitinfo2 and this is part of the GHCB spec
v2.

So the guest is given a throttling period of 1 minute in which it
retries the request every 2 seconds. This is a good default but if it
turns out to not pan out in practice, it can be tweaked later.

For safety, since the encryption algorithm in GHCBv2 is AES_GCM, control
must remain in the kernel to complete the request with the current
sequence number. Returning without finishing the request allows the
guest to make another request but with different message contents. This
is IV reuse, and breaks cryptographic protections.

  [ bp:
    - Rewrite commit message and do a simplified version.
    - The stable tags are supposed to denote that a cleanup should go
      upfront before backporting this so that any future fixes to this
      can preserve the sanity of the backporter(s). ]

Fixes: d5af44dde546 ("x86/sev: Provide support for SNP guest request NAEs")
Signed-off-by: Dionna Glaze 
Co-developed-by: Borislav Petkov (AMD) 
Signed-off-by: Borislav Petkov (AMD) 
Reviewed-by: Tom Lendacky 
Cc:  # d6fd48eff750 ("virt/coco/sev-guest: Check SEV_SNP attribute at probe time")
Cc:  # 970ab823743f (" virt/coco/sev-guest: Simplify extended guest request handling")
Cc:  # c5a338274bdb ("virt/coco/sev-guest: Remove the disable_vmpck label in handle_guest_request()")
Cc:  # 0fdb6cc7c89c ("virt/coco/sev-guest: Carve out the request issuing logic into a helper")
Cc:  # d25bae7dc7b0 ("virt/coco/sev-guest: Do some code style cleanups")
Cc:  # fa4ae42cc60a ("virt/coco/sev-guest: Convert the sw_exit_info_2 checking to a switch-case")
Link: https://lore.kernel.org/r/20230214164638.1189804-2-dionnaglaze@google.com
Signed-off-by: Greg Kroah-Hartman