linux.git/arch/arm/kvm, branch v3.16.81

KVM: arm/arm64: Fix VMID alloc race by reverting to lock-less

2019-04-04T15:13:57+00:00

commit fb544d1ca65a89f7a3895f7531221ceeed74ada7 upstream.

We recently addressed a VMID generation race by introducing a read/write
lock around accesses and updates to the vmid generation values.

However, kvm_arch_vcpu_ioctl_run() also calls need_new_vmid_gen() but
does so without taking the read lock.

As far as I can tell, this can lead to the same kind of race:

  VM 0, VCPU 0			VM 0, VCPU 1
  ------------			------------
  update_vttbr (vmid 254)
  				update_vttbr (vmid 1) // roll over
				read_lock(kvm_vmid_lock);
				force_vm_exit()
  local_irq_disable
  need_new_vmid_gen == false //because vmid gen matches

  enter_guest (vmid 254)
  				kvm_arch.vttbr = :
				read_unlock(kvm_vmid_lock);

  				enter_guest (vmid 1)

This results in two VCPUs of the same VM running with different VMIDs
and, even worse, VCPUs from other VMs could now allocate the clashing
VMID 254 from the new generation for as long as VCPU 0 does not exit.

Attempt to solve this by making sure vttbr is updated before another CPU
can observe the updated VMID generation.
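A minimal userspace sketch of the ordering this fix relies on, assuming C11 atomics stand in for the kernel's barriers and ACCESS_ONCE(), and a spinning atomic_flag stands in for the VMID spinlock. struct vm_model, VMID_BITS and the globals are hypothetical; need_new_vmid_gen() and update_vttbr() reuse the kernel's names but are simplified models, not the kernel's code. The point is the order of the two stores: the new VTTBR is published before the new generation, so a CPU that observes the updated generation also observes the matching VTTBR.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define VMID_BITS        8u   /* assumption: 8-bit VMIDs */
#define VTTBR_VMID_SHIFT 48   /* VMID field of VTTBR, bits [55:48] */

/* Simplified stand-in for the relevant fields of struct kvm_arch */
struct vm_model {
	_Atomic uint64_t vmid_gen;  /* generation the VMID belongs to */
	_Atomic uint64_t vttbr;     /* value loaded into VTTBR on guest entry */
	uint32_t vmid;              /* only written with vmid_lock held */
};

static _Atomic uint64_t global_vmid_gen = 1;
static uint32_t next_vmid = 1;                    /* under vmid_lock */
static atomic_flag vmid_lock = ATOMIC_FLAG_INIT;  /* models a spinlock */

static void vmid_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&vmid_lock, memory_order_acquire))
		;
}

static void vmid_lock_release(void)
{
	atomic_flag_clear_explicit(&vmid_lock, memory_order_release);
}

/* Lock-less check; the acquire load pairs with the release store below */
bool need_new_vmid_gen(struct vm_model *vm)
{
	uint64_t cur = atomic_load_explicit(&global_vmid_gen, memory_order_acquire);
	return atomic_load_explicit(&vm->vmid_gen, memory_order_acquire) != cur;
}

void update_vttbr(struct vm_model *vm, uint64_t pgd_phys)
{
	if (!need_new_vmid_gen(vm))
		return;

	vmid_lock_acquire();
	if (need_new_vmid_gen(vm)) {          /* recheck with the lock held */
		if (next_vmid == 0) {         /* VMID space wrapped around */
			atomic_fetch_add(&global_vmid_gen, 1);
			next_vmid = 1;
			/* the kernel also forces guest exits and flushes TLBs */
		}
		vm->vmid = next_vmid;
		next_vmid = (next_vmid + 1) & ((1u << VMID_BITS) - 1);

		/* 1: publish the VTTBR carrying the new VMID ... */
		atomic_store_explicit(&vm->vttbr,
			pgd_phys | ((uint64_t)vm->vmid << VTTBR_VMID_SHIFT),
			memory_order_relaxed);
		/* 2: ... then the generation, with release ordering */
		atomic_store_explicit(&vm->vmid_gen, atomic_load(&global_vmid_gen),
			memory_order_release);
	}
	vmid_lock_release();
}
```

Any reader whose acquire load sees the new vmid_gen is guaranteed to also see the VTTBR written before the release store, which is the property the race diagram above was missing.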

Fixes: f0cf47d939d0 "KVM: arm/arm64: Close VMID generation race"
Reviewed-by: Julien Thierry 
Signed-off-by: Christoffer Dall 
Signed-off-by: Marc Zyngier 
[bwh: Backported to 3.16:
 - Use ACCESS_ONCE() instead of {READ,WRITE}_ONCE()
 - Adjust filename]
Signed-off-by: Ben Hutchings

KVM: Protect device ops->create and list_add with kvm->lock

2019-03-25T17:32:35+00:00

commit a28ebea2adc4a2bef5989a5a181ec238f59fbcad upstream.

KVM devices were manipulating list data structures without any form of
synchronization, and some implementations of the create operations also
suffered from a lack of synchronization.

Now that we've split the xics create operation into create and init, we
can hold the kvm->lock mutex while calling the create operation and while
manipulating the devices list.

The error path in the generic code gets slightly ugly because we have to
take the mutex again and delete the device from the list, but holding
the mutex during anon_inode_getfd or releasing/locking the mutex in the
common non-error path seemed wrong.
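A simplified userspace sketch of the locking pattern described above, assuming a pthread mutex in place of kvm->lock and a singly linked list in place of the kvm->devices list. struct vm, struct dev, create_device() and the fail_getfd flag (which simulates anon_inode_getfd() failing) are hypothetical. Creation and list insertion happen under the lock; the fd is created with the lock dropped, so the error path has to retake it to unlink the device.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct dev {
	struct dev *next;
	int type;
};

struct vm {
	pthread_mutex_t lock;  /* stands in for kvm->lock */
	struct dev *devices;   /* stands in for kvm->devices */
};

/* Unlink d from the device list; caller must hold vm->lock */
static void list_del_dev(struct vm *vm, struct dev *d)
{
	for (struct dev **pp = &vm->devices; *pp; pp = &(*pp)->next) {
		if (*pp == d) {
			*pp = d->next;
			return;
		}
	}
}

/* Returns 0 on success, -1 on failure */
int create_device(struct vm *vm, int type, int fail_getfd)
{
	struct dev *d = calloc(1, sizeof(*d));
	if (!d)
		return -1;
	d->type = type;

	/* the device-specific create step and list_add run under the lock */
	pthread_mutex_lock(&vm->lock);
	d->next = vm->devices;
	vm->devices = d;
	pthread_mutex_unlock(&vm->lock);

	/* the fd would be created here, without holding the lock ... */
	if (fail_getfd) {
		/* ... so the error path retakes it to remove the device */
		pthread_mutex_lock(&vm->lock);
		list_del_dev(vm, d);
		pthread_mutex_unlock(&vm->lock);
		free(d);
		return -1;
	}
	return 0;
}
```

Build with -pthread on systems where the pthread functions live in a separate library.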

Signed-off-by: Christoffer Dall 
Reviewed-by: Paolo Bonzini 
Acked-by: Christian Borntraeger 
Signed-off-by: Radim Krčmář 
[bwh: Backported to 3.16:
 - Drop change to a failure path that doesn't exist in kvm_vgic_create() 
 - Adjust filename, context]
Signed-off-by: Ben Hutchings

KVM: arm/arm64: Skip updating PTE entry if no change

2018-12-16T22:08:44+00:00

commit 976d34e2dab10ece5ea8fe7090b7692913f89084 upstream.

When there is contention on faulting in a particular page table entry
at stage 2, the break-before-make requirement of the architecture can
lead to additional refaulting due to TLB invalidation.

Avoid this by skipping a page table update if the new value of the PTE
matches the previous value.
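A minimal model of the check, assuming a 64-bit descriptor whose bit 0 is the valid bit and a counter standing in for the TLB invalidation performed during break-before-make. stage2_set_pte_model() and flush_tlb_entry() are hypothetical names, and the descriptor values in the test are arbitrary. When two vcpus fault on the same page and compute the same entry, the second update becomes a no-op instead of clearing the entry and invalidating the TLB again.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_VALID 1ULL   /* assumption: bit 0 marks a valid descriptor */

/* Counts invalidations; each one may make other vcpus refault */
static unsigned long tlb_flushes;

static void flush_tlb_entry(void)
{
	tlb_flushes++;
}

void stage2_set_pte_model(uint64_t *ptep, uint64_t new_pte)
{
	uint64_t old_pte = *ptep;

	/* Skip updating the page table if the entry is unchanged */
	if (old_pte == new_pte)
		return;

	if (old_pte & PTE_VALID) {
		/* break-before-make: clear, invalidate, then install */
		*ptep = 0;
		flush_tlb_entry();
	}
	*ptep = new_pte;
}
```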

Fixes: d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
Reviewed-by: Suzuki Poulose 
Acked-by: Christoffer Dall 
Signed-off-by: Punit Agrawal 
Signed-off-by: Marc Zyngier 
[bwh: Backported to 3.16: adjust filename]
Signed-off-by: Ben Hutchings

KVM: arm/arm64: Skip updating PMD entry if no change

2018-12-16T22:08:44+00:00

commit 86658b819cd0a9aa584cd84453ed268a6f013770 upstream.

Contention on updating a PMD entry by a large number of vcpus can lead
to duplicate work when handling stage 2 page faults. As the page table
update follows the break-before-make requirement of the architecture,
it can lead to repeated refaults due to clearing the entry and
flushing the tlbs.

This problem is more likely when -

* there are a large number of vcpus
* the mapping is a large block mapping

such as when using PMD hugepages (512MB) with 64k pages.
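The 512MB figure follows from the 64k granule: each table is one 64k page of 8-byte descriptors, and a PMD block entry covers everything one next-level table would map. A small check of that arithmetic, with pmd_block_size() as a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

/* Span of one PMD block entry for a given translation granule */
uint64_t pmd_block_size(uint64_t granule)
{
	uint64_t entries = granule / sizeof(uint64_t);  /* descriptors per table */
	return entries * granule;                       /* 8192 * 64k = 512MB */
}
```

For comparison, the same formula gives 2MB with 4k pages, so a single PMD entry with 64k pages maps 256 times more memory.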

Fix this by skipping the page table update if there is no change in
the entry being updated.

Fixes: ad361f093c1e ("KVM: ARM: Support hugetlbfs backed huge pages")
Reviewed-by: Suzuki Poulose 
Acked-by: Christoffer Dall 
Signed-off-by: Punit Agrawal 
Signed-off-by: Marc Zyngier 
[bwh: Backported to 3.16: adjust filename]
Signed-off-by: Ben Hutchings

KVM: arm/arm64: Close VMID generation race

2018-10-21T07:45:29+00:00

commit f0cf47d939d0b4b4f660c5aaa4276fa3488f3391 upstream.

Before entering the guest, we check whether our VMID is still
part of the current generation. In order to avoid taking a lock,
we start with checking that the generation is still current, and
only if not current do we take the lock, recheck, and update the
generation and VMID.

This leaves open a small race: a vcpu can bump up the global
generation number as well as the VM's, but not yet have updated
the VMID itself.

At that point another vcpu from the same VM comes in, checks
the generation (and finds it not needing anything), and jumps
into the guest. At this point, we end up with two vcpus belonging
to the same VM running with two different VMIDs. Eventually, the
VMID used by the second vcpu will get reassigned, and things will
really go wrong...

A simple solution would be to drop this initial check, and always take
the lock. This is likely to cause performance issues. A middle ground
is to convert the spinlock to a rwlock, and only take the read lock
on the fast path. If the check fails at that point, drop it and
acquire the write lock, rechecking the condition.

This ensures that the above scenario doesn't occur.

Reported-by: Mark Rutland 
Tested-by: Shannon Zhao 
Signed-off-by: Marc Zyngier 
[bwh: Backported to 3.16: adjust filename, context]
Signed-off-by: Ben Hutchings

arm: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls

2018-06-16T21:22:06+00:00

commit 20e8175d246e9f9deb377f2784b3e7dfb2ad3e86 upstream.

KVM doesn't follow the SMCCC when it comes to unimplemented calls,
and injects an UNDEF instead of returning an error. Since firmware
calls are now used for security mitigation, they are becoming more
common, and the UNDEF is counterproductive.

Instead, let's follow the SMCCC, which states that -1 must be returned
to the caller when getting an unknown function number.
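A minimal model of the new behaviour, assuming a vcpu represented by a plain register array; struct vcpu_model and handle_unknown_smc_hvc() are hypothetical, and SMCCC_RET_NOT_SUPPORTED is defined locally for the sketch. Instead of injecting an UNDEF, the handler writes -1 (all ones) into r0 and resumes the guest, returning 1 in the style of the kernel's exit handlers.

```c
#include <assert.h>
#include <stdint.h>

#define SMCCC_RET_NOT_SUPPORTED ((unsigned long)-1)

struct vcpu_model {
	unsigned long regs[16];   /* r0..r15 of a 32-bit guest */
	int undef_injected;       /* what the old code would have set */
};

/* Returns 1 to resume the guest */
int handle_unknown_smc_hvc(struct vcpu_model *vcpu)
{
	/* old behaviour: vcpu->undef_injected = 1; */
	vcpu->regs[0] = SMCCC_RET_NOT_SUPPORTED;
	return 1;
}
```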

Tested-by: Ard Biesheuvel 
Signed-off-by: Marc Zyngier 
Signed-off-by: Catalin Marinas 
[bwh: Backported to 3.16: Use vcpu_reg() instead of vcpu_set_reg()]
Signed-off-by: Ben Hutchings

KVM: arm/arm64: Fix HYP unmapping going off limits

2018-03-03T15:52:00+00:00

commit 7839c672e58bf62da8f2f0197fefb442c02ba1dd upstream.

When we unmap the HYP memory, we try to be clever and unmap one
PGD at a time. If we start with a non-PGD aligned address and try
to unmap a whole PGD, things go horribly wrong in unmap_hyp_range
(addr and end can never match, and it all goes really badly as we
keep incrementing pgd and parse random memory as page tables...).

The obvious fix is to let unmap_hyp_range do what it does best,
which is to iterate over a range.

The size of the linear mapping, which begins at PAGE_OFFSET, can be
easily calculated by subtracting PAGE_OFFSET from high_memory, because
high_memory is defined as the linear map address of the last byte of
DRAM, plus one.

The size of the vmalloc region is given trivially by VMALLOC_END -
VMALLOC_START.
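A sketch of why iterating over a range is safe, assuming a helper modelled on the kernel's pgd_addr_end() that clamps each step to the end of the range. PGDIR_SHIFT = 21 (2MB per top-level entry, as with ARM's classic two-level tables) and the PAGE_OFFSET/high_memory values in the test are illustrative assumptions; walk_range(), next_pgd_boundary() and linear_map_size() are hypothetical names. Because every step is clamped to end, an unaligned start still terminates exactly at end.

```c
#include <assert.h>
#include <stdint.h>

#define PGDIR_SHIFT 21                        /* assumption: 2MB per PGD entry */
#define PGDIR_SIZE  ((uintptr_t)1 << PGDIR_SHIFT)
#define PGDIR_MASK  (~(PGDIR_SIZE - 1))

/* high_memory is one past the last byte of DRAM in the linear map */
uintptr_t linear_map_size(uintptr_t page_offset, uintptr_t high_memory)
{
	return high_memory - page_offset;
}

/* Next PGD boundary after addr, clamped to end (cf. pgd_addr_end()) */
static uintptr_t next_pgd_boundary(uintptr_t addr, uintptr_t end)
{
	uintptr_t boundary = (addr + PGDIR_SIZE) & PGDIR_MASK;
	/* the "- 1" also handles a range ending at the top of memory (end == 0) */
	return (boundary - 1 < end - 1) ? boundary : end;
}

/* Visit [start, start + size) one PGD slice at a time; returns the count */
unsigned walk_range(uintptr_t start, uintptr_t size)
{
	uintptr_t addr = start, end = start + size;
	unsigned slices = 0;

	do {
		uintptr_t next = next_pgd_boundary(addr, end);
		slices++;               /* unmap the slice [addr, next) here */
		addr = next;
	} while (addr != end);
	return slices;
}
```

The vmalloc region would be walked the same way with start VMALLOC_START and size VMALLOC_END - VMALLOC_START.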

Reported-by: Andre Przywara 
Tested-by: Andre Przywara 
Reviewed-by: Christoffer Dall 
Signed-off-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
[bwh: Backported to 3.16: adjust filename, context]
Signed-off-by: Ben Hutchings

KVM: Fix stack-out-of-bounds read in write_mmio

2018-01-01T20:52:11+00:00

commit e39d200fa5bf5b94a0948db0dae44c1b73b84a56 upstream.

Reported by syzkaller:

  BUG: KASAN: stack-out-of-bounds in write_mmio+0x11e/0x270 [kvm]
  Read of size 8 at addr ffff8803259df7f8 by task syz-executor/32298

  CPU: 6 PID: 32298 Comm: syz-executor Tainted: G           OE    4.15.0-rc2+ #18
  Hardware name: LENOVO ThinkCentre M8500t-N000/SHARKBAY, BIOS FBKTC1AUS 02/16/2016
  Call Trace:
   dump_stack+0xab/0xe1
   print_address_description+0x6b/0x290
   kasan_report+0x28a/0x370
   write_mmio+0x11e/0x270 [kvm]
   emulator_read_write_onepage+0x311/0x600 [kvm]
   emulator_read_write+0xef/0x240 [kvm]
   emulator_fix_hypercall+0x105/0x150 [kvm]
   em_hypercall+0x2b/0x80 [kvm]
   x86_emulate_insn+0x2b1/0x1640 [kvm]
   x86_emulate_instruction+0x39a/0xb90 [kvm]
   handle_exception+0x1b4/0x4d0 [kvm_intel]
   vcpu_enter_guest+0x15a0/0x2640 [kvm]
   kvm_arch_vcpu_ioctl_run+0x549/0x7d0 [kvm]
   kvm_vcpu_ioctl+0x479/0x880 [kvm]
   do_vfs_ioctl+0x142/0x9a0
   SyS_ioctl+0x74/0x80
   entry_SYSCALL_64_fastpath+0x23/0x9a

The patched vmmcall path writes the 3-byte opcode 0F 01 C1 (vmcall)
to guest memory; however, the write_mmio tracepoint always prints 8 bytes
through *(u64 *)val, since kvm splits the mmio access into 8 bytes. This
leaks 5 bytes from the kernel stack (CVE-2017-17741). This patch fixes
it by accessing only the bytes which we operate on.
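A minimal sketch of the idea behind the fix, assuming a little-endian host; mmio_trace_val() is a hypothetical helper, not the tracepoint's actual code. Only the len bytes that were written are copied into a zeroed 64-bit value, so the upper bytes can no longer pick up stack contents. With the three vmcall bytes this yields the value shown in the "After patch" trace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Build the traced value from only the bytes actually written */
uint64_t mmio_trace_val(const void *val, size_t len)
{
	uint64_t data = 0;   /* unwritten upper bytes stay zero */

	memcpy(&data, val, len < sizeof(data) ? len : sizeof(data));
	return data;
}
```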

Before patch:

syz-executor-5567  [007] .... 51370.561696: kvm_mmio: mmio write len 3 gpa 0x10 val 0x1ffff10077c1010f

After patch:

syz-executor-13416 [002] .... 51302.299573: kvm_mmio: mmio write len 3 gpa 0x10 val 0xc1010f

Reported-by: Dmitry Vyukov 
Reviewed-by: Darren Kenny 
Reviewed-by: Marc Zyngier 
Tested-by: Marc Zyngier 
Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Marc Zyngier 
Cc: Christoffer Dall 
Signed-off-by: Wanpeng Li 
Signed-off-by: Paolo Bonzini 
[bwh: Backported to 3.16:
 - ARM implementation combines the KVM_TRACE_MMIO_WRITE and
   KVM_TRACE_MMIO_READ_UNSATISFIED cases
 - Adjust filename]
Signed-off-by: Ben Hutchings

arm/arm64: KVM: set right LR register value for 32 bit guest when inject abort

2018-01-01T20:51:57+00:00

commit fd6c8c206fc5d0717b0433b191de0715122f33bb upstream.

When an exception is trapped to EL2, hardware uses ELR_ELx to hold
the current fault instruction address. If KVM wants to inject an
abort into a 32-bit guest, it needs to set the guest's LR register
to emulate this abort having happened in the guest. Because the ARM32
architecture uses pipelined execution, the LR value has an offset from
the fault instruction address.

The offsets applied to the link value for each exception are shown
below; they should be added to the ARM32 link register (LR).

Table taken from ARMv8 ARM DDI0487B-B, table G1-10:
Exception                  Offset, for PE state of:
                           A32     T32
Undefined Instruction      +4      +2
Prefetch Abort             +4      +4
Data Abort                 +8      +8
IRQ or FIQ                 +4      +4
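The table can be encoded as a lookup indexed by exception type and instruction set state. This is a sketch in the spirit of the fix, not the kernel's code; enum exc_type, return_offsets[] and guest_lr() are hypothetical names, and the PC values in the test are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum exc_type { EXC_UNDEF, EXC_PABT, EXC_DABT, EXC_IRQ_FIQ };

/* [exception][0 = A32, 1 = T32], per table G1-10 */
static const uint8_t return_offsets[][2] = {
	[EXC_UNDEF]   = { 4, 2 },
	[EXC_PABT]    = { 4, 4 },
	[EXC_DABT]    = { 8, 8 },
	[EXC_IRQ_FIQ] = { 4, 4 },
};

/* LR value to present to a 32-bit guest: faulting PC plus offset */
uint32_t guest_lr(uint32_t fault_pc, enum exc_type type, bool thumb)
{
	return fault_pc + return_offsets[type][thumb];
}
```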

  [ Removed unused variables in inject_abt to avoid compile warnings.
    -- Christoffer ]

Signed-off-by: Dongjiu Geng 
Tested-by: Haibin Zhang 
Reviewed-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
[bwh: Backported to 3.16:
 - Don't delete cpsr variable in inject_abt() as it's still needed
 - Adjust context]
Signed-off-by: Ben Hutchings

arm: KVM: Allow unaligned accesses at HYP

2017-09-15T17:30:04+00:00

commit 33b5c38852b29736f3b472dd095c9a18ec22746f upstream.

We currently have the HSCTLR.A bit set, trapping unaligned accesses
at HYP, but we're not really prepared to deal with it.

Since the rest of the kernel is pretty happy about that, let's follow
its example and set HSCTLR.A to zero. Modern CPUs don't really care.
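The HYP init code that builds HSCTLR is assembly; the C model below only illustrates the bit manipulation. The bit positions are assumed from the ARMv7-A SCTLR/HSCTLR layout and should be checked against the ARM ARM, and hsctlr_from_sctlr() is a hypothetical name. Some cache and endianness settings are inherited from the kernel's SCTLR, the MMU is enabled, and after this change HSCTLR_A is no longer set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HSCTLR_M  (1u << 0)   /* MMU enable */
#define HSCTLR_A  (1u << 1)   /* alignment check enable */
#define HSCTLR_C  (1u << 2)   /* data/unified cache enable */
#define HSCTLR_I  (1u << 12)  /* instruction cache enable */
#define HSCTLR_FI (1u << 21)  /* fast interrupts configuration */
#define HSCTLR_EE (1u << 25)  /* exception endianness */
#define HSCTLR_TE (1u << 30)  /* take exceptions in Thumb state */

uint32_t hsctlr_from_sctlr(uint32_t sctlr, bool thumb_kernel)
{
	/* inherit cache, FI and endianness settings from the kernel */
	uint32_t v = sctlr & (HSCTLR_EE | HSCTLR_FI | HSCTLR_I | HSCTLR_C);

	v |= HSCTLR_M;            /* previously HSCTLR_M | HSCTLR_A */
	if (thumb_kernel)
		v |= HSCTLR_TE;
	return v;
}
```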

Signed-off-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
Signed-off-by: Ben Hutchings