summaryrefslogtreecommitdiff
path: root/arch/x86/kvm/vmx
AgeCommit message (Collapse)AuthorFilesLines
2022-02-23KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id()Like Xu1-4/+5
[ Upstream commit 7c174f305cbee6bdba5018aae02b84369e7ab995 ] The find_arch_event() returns a "unsigned int" value, which is used by the pmc_reprogram_counter() to program a PERF_TYPE_HARDWARE type perf_event. The returned value is actually the kernel defined generic perf_hw_id, let's rename it to pmc_perf_hw_id() with simpler incoming parameters for better self-explanation. Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20211130074221.93635-3-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-16KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadowSean Christopherson1-0/+25
[ Upstream commit b9bed78e2fa9571b7c983b20666efa0009030c71 ] Set vmcs.GUEST_PENDING_DBG_EXCEPTIONS.BS, a.k.a. the pending single-step breakpoint flag, when re-injecting a #DB with RFLAGS.TF=1, and STI or MOVSS blocking is active. Setting the flag is necessary to make VM-Entry consistency checks happy, as VMX has an invariant that if RFLAGS.TF is set and STI/MOVSS blocking is true, then the previous instruction must have been STI or MOV/POP, and therefore a single-step #DB must be pending since the RFLAGS.TF cannot have been set by the previous instruction, i.e. the one instruction delay after setting RFLAGS.TF must have already expired. Normally, the CPU sets vmcs.GUEST_PENDING_DBG_EXCEPTIONS.BS appropriately when recording guest state as part of a VM-Exit, but #DB VM-Exits intentionally do not treat the #DB as "guest state" as interception of the #DB effectively makes the #DB host-owned, thus KVM needs to manually set PENDING_DBG.BS when forwarding/re-injecting the #DB to the guest. Note, although this bug can be triggered by guest userspace, doing so requires IOPL=3, and guest userspace running with IOPL=3 has full access to all I/O ports (from the guest's perspective) and can crash/reboot the guest any number of ways. IOPL=3 is required because STI blocking kicks in if and only if RFLAGS.IF is toggled 0=>1, and if CPL>IOPL, STI either takes a #GP or modifies RFLAGS.VIF, not RFLAGS.IF. MOVSS blocking can be initiated by userspace, but can be coincident with a #DB if and only if DR7.GD=1 (General Detect enabled) and a MOV DR is executed in the MOVSS shadow. MOV DR #GPs at CPL>0, thus MOVSS blocking is problematic only for CPL0 (and only if the guest is crazy enough to access a DR in a MOVSS shadow). All other sources of #DBs are either suppressed by MOVSS blocking (single-step, code fetch, data, and I/O), are mutually exclusive with MOVSS blocking (T-bit task switch), or are already handled by KVM (ICEBP, a.k.a. INT1). This bug was originally found by running tests[1] created for XSA-308[2]. Note that Xen's userspace test emits ICEBP in the MOVSS shadow, which is presumably why the Xen bug was deemed to be an exploitable DOS from guest userspace. KVM already handles ICEBP by skipping the ICEBP instruction and thus clears MOVSS blocking as a side effect of its "emulation". [1] http://xenbits.xenproject.org/docs/xtf/xsa-308_2main_8c_source.html [2] https://xenbits.xen.org/xsa/advisory-308.html Reported-by: David Woodhouse <dwmw2@infradead.org> Reported-by: Alexander Graf <graf@amazon.de> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220120000624.655815-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-16KVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCSVitaly Kuznetsov1-0/+1
[ Upstream commit f80ae0ef089a09e8c18da43a382c3caac9a424a7 ] Similar to MSR_IA32_VMX_EXIT_CTLS/MSR_IA32_VMX_TRUE_EXIT_CTLS, MSR_IA32_VMX_ENTRY_CTLS/MSR_IA32_VMX_TRUE_ENTRY_CTLS pair, MSR_IA32_VMX_TRUE_PINBASED_CTLS needs to be filtered the same way MSR_IA32_VMX_PINBASED_CTLS is currently filtered as guests may solely rely on 'true' MSR data. Note, none of the currently existing Windows/Hyper-V versions are known to stumble upon the unfiltered MSR_IA32_VMX_TRUE_PINBASED_CTLS, the change is aimed at making the filtering future proof. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220112170134.1904308-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-16KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMERVitaly Kuznetsov1-1/+3
[ Upstream commit 7a601e2cf61558dfd534a9ecaad09f5853ad8204 ] Enlightened VMCS v1 doesn't have VMX_PREEMPTION_TIMER_VALUE field, PIN_BASED_VMX_PREEMPTION_TIMER is also filtered out already so it makes sense to filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER too. Note, none of the currently existing Windows/Hyper-V versions are known to enable 'save VMX-preemption timer value' when eVMCS is in use, the change is aimed at making the filtering future proof. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220112170134.1904308-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-05KVM: x86: Forcibly leave nested virt when SMM state is toggledSean Christopherson1-0/+1
commit f7e570780efc5cec9b2ed1e0472a7da14e864fdb upstream. Forcibly leave nested virtualization operation if userspace toggles SMM state via KVM_SET_VCPU_EVENTS or KVM_SYNC_X86_EVENTS. If userspace forces the vCPU out of SMM while it's post-VMXON and then injects an SMI, vmx_enter_smm() will overwrite vmx->nested.smm.vmxon and end up with both vmxon=false and smm.vmxon=false, but all other nVMX state allocated. Don't attempt to gracefully handle the transition as (a) most transitions are nonsencial, e.g. forcing SMM while L2 is running, (b) there isn't sufficient information to handle all transitions, e.g. SVM wants access to the SMRAM save state, and (c) KVM_SET_VCPU_EVENTS must precede KVM_SET_NESTED_STATE during state restore as the latter disallows putting the vCPU into L2 if SMM is active, and disallows tagging the vCPU as being post-VMXON in SMM if SMM is not active. Abuse of KVM_SET_VCPU_EVENTS manifests as a WARN and memory leak in nVMX due to failure to free vmcs01's shadow VMCS, but the bug goes far beyond just a memory leak, e.g. toggling SMM on while L2 is active puts the vCPU in an architecturally impossible state. WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline] WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656 Modules linked in: CPU: 1 PID: 3606 Comm: syz-executor725 Not tainted 5.17.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline] RIP: 0010:free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656 Code: <0f> 0b eb b3 e8 8f 4d 9f 00 e9 f7 fe ff ff 48 89 df e8 92 4d 9f 00 Call Trace: <TASK> kvm_arch_vcpu_destroy+0x72/0x2f0 arch/x86/kvm/x86.c:11123 kvm_vcpu_destroy arch/x86/kvm/../../../virt/kvm/kvm_main.c:441 [inline] kvm_destroy_vcpus+0x11f/0x290 arch/x86/kvm/../../../virt/kvm/kvm_main.c:460 kvm_free_vcpus arch/x86/kvm/x86.c:11564 [inline] kvm_arch_destroy_vm+0x2e8/0x470 arch/x86/kvm/x86.c:11676 kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1217 [inline] kvm_put_kvm+0x4fa/0xb00 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1250 kvm_vm_release+0x3f/0x50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1273 __fput+0x286/0x9f0 fs/file_table.c:311 task_work_run+0xdd/0x1a0 kernel/task_work.c:164 exit_task_work include/linux/task_work.h:32 [inline] do_exit+0xb29/0x2a30 kernel/exit.c:806 do_group_exit+0xd2/0x2f0 kernel/exit.c:935 get_signal+0x4b0/0x28c0 kernel/signal.c:2862 arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:868 handle_signal_work kernel/entry/common.c:148 [inline] exit_to_user_mode_loop kernel/entry/common.c:172 [inline] exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:207 __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline] syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300 do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> Cc: stable@vger.kernel.org Reported-by: syzbot+8112db3ab20e70d50c31@syzkaller.appspotmail.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220125220358.2091737-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Backported-by: Tadeusz Struk <tadeusz.struk@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-27KVM: VMX: switch blocked_vcpu_on_cpu_lock to raw spinlockMarcelo Tosatti1-8/+8
commit 5f02ef741a785678930f3ff0a8b6b2b0ef1bb402 upstream. blocked_vcpu_on_cpu_lock is taken from hard interrupt context (pi_wakeup_handler), therefore it cannot sleep. Switch it to a raw spinlock. Fixes: [41297.066254] BUG: scheduling while atomic: CPU 0/KVM/635218/0x00010001 [41297.066323] Preemption disabled at: [41297.066324] [<ffffffff902ee47f>] irq_enter_rcu+0xf/0x60 [41297.066339] Call Trace: [41297.066342] <IRQ> [41297.066346] dump_stack_lvl+0x34/0x44 [41297.066353] ? irq_enter_rcu+0xf/0x60 [41297.066356] __schedule_bug.cold+0x7d/0x8b [41297.066361] __schedule+0x439/0x5b0 [41297.066365] ? task_blocks_on_rt_mutex.constprop.0.isra.0+0x1b0/0x440 [41297.066369] schedule_rtlock+0x1e/0x40 [41297.066371] rtlock_slowlock_locked+0xf1/0x260 [41297.066374] rt_spin_lock+0x3b/0x60 [41297.066378] pi_wakeup_handler+0x31/0x90 [kvm_intel] [41297.066388] sysvec_kvm_posted_intr_wakeup_ipi+0x9d/0xd0 [41297.066392] </IRQ> [41297.066392] asm_sysvec_kvm_posted_intr_wakeup_ipi+0x12/0x20 ... Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-20KVM: x86: Register Processor Trace interrupt hook iff PT enabled in guestSean Christopherson1-0/+1
commit f4b027c5c8199abd4fb6f00d67d380548dbfdfa8 upstream. Override the Processor Trace (PT) interrupt handler for guest mode if and only if PT is configured for host+guest mode, i.e. is being used independently by both host and guest. If PT is configured for system mode, the host fully controls PT and must handle all events. Fixes: 8479e04e7d6b ("KVM: x86: Inject PMI for KVM guest") Reported-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Reported-by: Artem Kashkanov <artem.kashkanov@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211111020738.2512932-4-seanjc@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-29KVM: VMX: Wake vCPU when delivering posted IRQ even if vCPU == this vCPUSean Christopherson1-2/+1
commit fdba608f15e2427419997b0898750a49a735afcb upstream. Drop a check that guards triggering a posted interrupt on the currently running vCPU, and more importantly guards waking the target vCPU if triggering a posted interrupt fails because the vCPU isn't IN_GUEST_MODE. If a vIRQ is delivered from asynchronous context, the target vCPU can be the currently running vCPU and can also be blocking, in which case skipping kvm_vcpu_wake_up() is effectively dropping what is supposed to be a wake event for the vCPU. The "do nothing" logic when "vcpu == running_vcpu" mostly works only because the majority of calls to ->deliver_posted_interrupt(), especially when using posted interrupts, come from synchronous KVM context. But if a device is exposed to the guest using vfio-pci passthrough, the VFIO IRQ and vCPU are bound to the same pCPU, and the IRQ is _not_ configured to use posted interrupts, wake events from the device will be delivered to KVM from IRQ context, e.g. vfio_msihandler() | |-> eventfd_signal() | |-> ... | |-> irqfd_wakeup() | |->kvm_arch_set_irq_inatomic() | |-> kvm_irq_delivery_to_apic_fast() | |-> kvm_apic_set_irq() This also aligns the non-nested and nested usage of triggering posted interrupts, and will allow for additional cleanups. Fixes: 379a3c8ee444 ("KVM: VMX: Optimize posted-interrupt delivery for timer fastpath") Cc: stable@vger.kernel.org Reported-by: Longpeng (Mike) <longpeng2@huawei.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211208015236.1616697-18-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08KVM: VMX: Set failure code in prepare_vmcs02()Dan Carpenter1-1/+3
[ Upstream commit bfbb307c628676929c2d329da0daf9d22afa8ad2 ] The error paths in the prepare_vmcs02() function are supposed to set *entry_failure_code but this path does not. It leads to using an uninitialized variable in the caller. Fixes: 71f7347025bf ("KVM: nVMX: Load GUEST_IA32_PERF_GLOBAL_CTRL MSR on VM-Entry") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Message-Id: <20211130125337.GB24578@kili> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-08KVM: x86: Use a stable condition around all VT-d PI pathsPaolo Bonzini1-9/+11
commit 53b7ca1a359389276c76fbc9e1009d8626a17e40 upstream. Currently, checks for whether VT-d PI can be used refer to the current status of the feature in the current vCPU; or they more or less pick vCPU 0 in case a specific vCPU is not available. However, these checks do not attempt to synchronize with changes to the IRTE. In particular, there is no path that updates the IRTE when APICv is re-activated on vCPU 0; and there is no path to wakeup a CPU that has APICv disabled, if the wakeup occurs because of an IRTE that points to a posted interrupt. To fix this, always go through the VT-d PI path as long as there are assigned devices and APICv is available on both the host and the VM side. Since the relevant condition was copied over three times, take the hint and factor it into a separate function. Suggested-by: Sean Christopherson <seanjc@google.com> Cc: stable@vger.kernel.org Reviewed-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: David Matlack <dmatlack@google.com> Message-Id: <20211123004311.2954158-5-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08KVM: nVMX: Flush current VPID (L1 vs. L2) for KVM_REQ_TLB_FLUSH_GUESTSean Christopherson1-9/+14
commit 2b4a5a5d56881ece3c66b9a9a8943a6f41bd7349 upstream. Flush the current VPID when handling KVM_REQ_TLB_FLUSH_GUEST instead of always flushing vpid01. Any TLB flush that is triggered when L2 is active is scoped to L2's VPID (if it has one), e.g. if L2 toggles CR4.PGE and L1 doesn't intercept PGE writes, then KVM's emulation of the TLB flush needs to be applied to L2's VPID. Reported-by: Lai Jiangshan <jiangshanlai+lkml@gmail.com> Fixes: 07ffaf343e34 ("KVM: nVMX: Sync all PGDs on nested transition with shadow paging") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211125014944.536398-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-26KVM: nVMX: don't use vcpu->arch.efer when checking host state on nested ↵Maxim Levitsky1-5/+17
state load commit af957eebfcc17433ee83ab85b1195a933ab5049c upstream. When loading nested state, don't use check vcpu->arch.efer to get the L1 host's 64-bit vs. 32-bit state and don't check it for consistency with respect to VM_EXIT_HOST_ADDR_SPACE_SIZE, as register state in vCPU may be stale when KVM_SET_NESTED_STATE is called---and architecturally does not exist. When restoring L2 state in KVM, the CPU is placed in non-root where nested VMX code has no snapshot of L1 host state: VMX (conditionally) loads host state fields loaded on VM-exit, but they need not correspond to the state before entry. A simple case occurs in KVM itself, where the host RIP field points to vmx_vmexit rather than the instruction following vmlaunch/vmresume. However, for the particular case of L1 being in 32- or 64-bit mode on entry, the exit controls can be treated instead as the source of truth regarding the state of L1 on entry, and can be used to check that vmcs12.VM_EXIT_HOST_ADDR_SPACE_SIZE matches vmcs12.HOST_EFER if vmcs12.VM_EXIT_LOAD_IA32_EFER is set. The consistency check on CPU EFER vs. vmcs12.VM_EXIT_HOST_ADDR_SPACE_SIZE, instead, happens only on VM-Enter. That's because, again, there's conceptually no "current" L1 EFER to check on KVM_SET_NESTED_STATE. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211115131837.195527-2-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18KVM: nVMX: Query current VMCS when determining if MSR bitmaps are in useSean Christopherson1-4/+4
commit 7dfbc624eb5726367900c8d86deff50836240361 upstream. Check the current VMCS controls to determine if an MSR write will be intercepted due to MSR bitmaps being disabled. In the nested VMX case, KVM will disable MSR bitmaps in vmcs02 if they're disabled in vmcs12 or if KVM can't map L1's bitmaps for whatever reason. Note, the bad behavior is relatively benign in the current code base as KVM sets all bits in vmcs02's MSR bitmap by default, clears bits if and only if L0 KVM also disables interception of an MSR, and only uses the buggy helper for MSR_IA32_SPEC_CTRL. Because KVM explicitly tests WRMSR before disabling interception of MSR_IA32_SPEC_CTRL, the flawed check will only result in KVM reading MSR_IA32_SPEC_CTRL from hardware when it isn't strictly necessary. Tag the fix for stable in case a future fix wants to use msr_write_intercepted(), in which case a buggy implementation in older kernels could prove subtly problematic. Fixes: d28b387fb74d ("KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211109013047.2041518-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18KVM: VMX: Unregister posted interrupt wakeup handler on hardware unsetupSean Christopherson1-2/+5
commit ec5a4919fa7b7d8c7a2af1c7e799b1fe4be84343 upstream. Unregister KVM's posted interrupt wakeup handler during unsetup so that a spurious interrupt that arrives after kvm_intel.ko is unloaded doesn't call into freed memory. Fixes: bf9f6ac8d749 ("KVM: Update Posted-Interrupts Descriptor when vCPU is blocked") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211009001107.3936588-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-27KVM: nVMX: promptly process interrupts delivered while in guest modePaolo Bonzini1-11/+6
commit 3a25dfa67fe40f3a2690af2c562e0947a78bd6a0 upstream. Since commit c300ab9f08df ("KVM: x86: Replace late check_nested_events() hack with more precise fix") there is no longer the certainty that check_nested_events() tries to inject an external interrupt vmexit to L1 on every call to vcpu_enter_guest. Therefore, even in that case we need to set KVM_REQ_EVENT. This ensures that inject_pending_event() is called, and from there kvm_check_nested_events(). Fixes: c300ab9f08df ("KVM: x86: Replace late check_nested_events() hack with more precise fix") Cc: stable@vger.kernel.org Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-06KVM: nVMX: Filter out all unsupported controls when eVMCS was activatedVitaly Kuznetsov2-7/+14
commit 8d68bad6d869fae8f4d50ab6423538dec7da72d1 upstream. Windows Server 2022 with Hyper-V role enabled failed to boot on KVM when enlightened VMCS is advertised. Debugging revealed there are two exposed secondary controls it is not happy with: SECONDARY_EXEC_ENABLE_VMFUNC and SECONDARY_EXEC_SHADOW_VMCS. These controls are known to be unsupported, as there are no corresponding fields in eVMCSv1 (see the comment above EVMCS1_UNSUPPORTED_2NDEXEC definition). Previously, commit 31de3d2500e4 ("x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()") introduced the required filtering mechanism for VMX MSRs but for some reason put only known to be problematic (and not full EVMCS1_UNSUPPORTED_* lists) controls there. Note, Windows Server 2022 seems to have gained some sanity check for VMX MSRs: it doesn't even try to launch a guest when there's something it doesn't like, nested_evmcs_check_controls() mechanism can't catch the problem. Let's be bold this time and instead of playing whack-a-mole just filter out all unsupported controls from VMX MSRs. Fixes: 31de3d2500e4 ("x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210907163530.110066-1-vkuznets@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-15KVM: nVMX: Unconditionally clear nested.pi_pending on nested VM-EnterSean Christopherson1-4/+3
commit f7782bb8d818d8f47c26b22079db10599922787a upstream. Clear nested.pi_pending on nested VM-Enter even if L2 will run without posted interrupts enabled. If nested.pi_pending is left set from a previous L2, vmx_complete_nested_posted_interrupt() will pick up the stale flag and exit to userspace with an "internal emulation error" due the new L2 not having a valid nested.pi_desc. Arguably, vmx_complete_nested_posted_interrupt() should first check for posted interrupts being enabled, but it's also completely reasonable that KVM wouldn't screw up a fundamental flag. Not to mention that the mere existence of nested.pi_pending is a long-standing bug as KVM shouldn't move the posted interrupt out of the IRR until it's actually processed, e.g. KVM effectively drops an interrupt when it performs a nested VM-Exit with a "pending" posted interrupt. Fixing the mess is a future problem. Prior to vmx_complete_nested_posted_interrupt() interpreting a null PI descriptor as an error, this was a benign bug as the null PI descriptor effectively served as a check on PI not being enabled. Even then, the new flow did not become problematic until KVM started checking the result of kvm_check_nested_events(). Fixes: 705699a13994 ("KVM: nVMX: Enable nested posted interrupt processing") Fixes: 966eefb89657 ("KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't mappable") Fixes: 47d3530f86c0 ("KVM: x86: Exit to userspace when kvm_check_nested_events fails") Cc: stable@vger.kernel.org Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210810144526.2662272-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-15KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulationMaxim Levitsky1-0/+3
commit 81b4b56d4f8130bbb99cf4e2b48082e5b4cfccb9 upstream. If we are emulating an invalid guest state, we don't have a correct exit reason, and thus we shouldn't do anything in this function. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210826095750.1650467-2-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Fixes: 95b5a48c4f2b ("KVM: VMX: Handle NMIs, #MCs and async #PFs in common irqs-disabled fn", 2019-06-18) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-18KVM: nVMX: Use vmx_need_pf_intercept() when deciding if L0 wants a #PFSean Christopherson1-1/+2
commit 18712c13709d2de9516c5d3414f707c4f0a9c190 upstream. Use vmx_need_pf_intercept() when determining if L0 wants to handle a #PF in L2 or if the VM-Exit should be forwarded to L1. The current logic fails to account for the case where #PF is intercepted to handle guest.MAXPHYADDR < host.MAXPHYADDR and ends up reflecting all #PFs into L1. At best, L1 will complain and inject the #PF back into L2. At worst, L1 will eat the unexpected fault and cause L2 to hang on infinite page faults. Note, while the bug was technically introduced by the commit that added support for the MAXPHYADDR madness, the shame is all on commit a0c134347baf ("KVM: VMX: introduce vmx_need_pf_intercept"). Fixes: 1dbf5d68af6f ("KVM: VMX: Add guest physical address check in EPT violation and misconfig") Cc: stable@vger.kernel.org Cc: Peter Shier <pshier@google.com> Cc: Oliver Upton <oupton@google.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210812045615.3167686-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-18KVM: VMX: Use current VMCS to query WAITPKG support for MSR emulationSean Christopherson1-1/+1
commit 7b9cae027ba3aaac295ae23a62f47876ed97da73 upstream. Use the secondary_exec_controls_get() accessor in vmx_has_waitpkg() to effectively get the controls for the current VMCS, as opposed to using vmx->secondary_exec_controls, which is the cached value of KVM's desired controls for vmcs01 and truly not reflective of any particular VMCS. While the waitpkg control is not dynamic, i.e. vmcs01 will always hold the same waitpkg configuration as vmx->secondary_exec_controls, the same does not hold true for vmcs02 if the L1 VMM hides the feature from L2. If L1 hides the feature _and_ does not intercept MSR_IA32_UMWAIT_CONTROL, L2 could incorrectly read/write L1's virtual MSR instead of taking a #GP. Fixes: 6e3ba4abcea5 ("KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210810171952.2758100-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-14KVM: nVMX: Don't clobber nested MMU's A/D status on EPTP switchSean Christopherson1-7/+0
[ Upstream commit 272b0a998d084e7667284bdd2d0c675c6a2d11de ] Drop bogus logic that incorrectly clobbers the accessed/dirty enabling status of the nested MMU on an EPTP switch. When nested EPT is enabled, walk_mmu points at L2's _legacy_ page tables, not L1's EPT for L2. This is likely a benign bug, as mmu->ept_ad is never consumed (since the MMU is not a nested EPT MMU), and stuffing mmu_role.base.ad_disabled will never propagate into future shadow pages since the nested MMU isn't used to map anything, just to walk L2's page tables. Note, KVM also does a full MMU reload, i.e. the guest_mmu will be recreated using the new EPTP, and thus any change in A/D enabling will be properly recognized in the relevant MMU. Fixes: 41ab93727467 ("KVM: nVMX: Emulate EPTP switching for the L1 hypervisor") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14KVM: nVMX: Ensure 64-bit shift when checking VMFUNC bitmapSean Christopherson1-1/+1
[ Upstream commit 0e75225dfa4c5d5d51291f54a3d2d5895bad38da ] Use BIT_ULL() instead of an open-coded shift to check whether or not a function is enabled in L1's VMFUNC bitmap. This is a benign bug as KVM supports only bit 0, and will fail VM-Enter if any other bits are set, i.e. bits 63:32 are guaranteed to be zero. Note, "function" is bounded by hardware as VMFUNC will #UD before taking a VM-Exit if the function is greater than 63. Before: if ((vmcs12->vm_function_control & (1 << function)) == 0) 0x000000000001a916 <+118>: mov $0x1,%eax 0x000000000001a91b <+123>: shl %cl,%eax 0x000000000001a91d <+125>: cltq 0x000000000001a91f <+127>: and 0x128(%rbx),%rax After: if (!(vmcs12->vm_function_control & BIT_ULL(function & 63))) 0x000000000001a955 <+117>: mov 0x128(%rbx),%rdx 0x000000000001a95c <+124>: bt %rax,%rdx Fixes: 27c42a1bb867 ("KVM: nVMX: Enable VMFUNC for the L1 hypervisor") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14KVM: nVMX: Sync all PGDs on nested transition with shadow pagingSean Christopherson1-5/+12
[ Upstream commit 07ffaf343e34b555c9e7ea39a9c81c439a706f13 ] Trigger a full TLB flush on behalf of the guest on nested VM-Enter and VM-Exit when VPID is disabled for L2. kvm_mmu_new_pgd() syncs only the current PGD, which can theoretically leave stale, unsync'd entries in a previous guest PGD, which could be consumed if L2 is allowed to load CR3 with PCID_NOFLUSH=1. Rename KVM_REQ_HV_TLB_FLUSH to KVM_REQ_TLB_FLUSH_GUEST so that it can be utilized for its obvious purpose of emulating a guest TLB flush. Note, there is no change the actual TLB flush executed by KVM, even though the fast PGD switch uses KVM_REQ_TLB_FLUSH_CURRENT. When VPID is disabled for L2, vpid02 is guaranteed to be '0', and thus nested_get_vpid02() will return the VPID that is shared by L1 and L2. Generate the request outside of kvm_mmu_new_pgd(), as getting the common helper to correctly identify which requested is needed is quite painful. E.g. using KVM_REQ_TLB_FLUSH_GUEST when nested EPT is in play is wrong as a TLB flush from the L1 kernel's perspective does not invalidate EPT mappings. And, by using KVM_REQ_TLB_FLUSH_GUEST, nVMX can do future simplification by moving the logic into nested_vmx_transition_tlb_flush(). Fixes: 41fab65e7c44 ("KVM: nVMX: Skip MMU sync on nested VMX transition when possible") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14KVM: nVMX: Handle split-lock #AC exceptions that happen in L2Sean Christopherson4-2/+11
commit b33bb78a1fada6445c265c585ee0dd0fc6279102 upstream. Mark #ACs that won't be reinjected to the guest as wanted by L0 so that KVM handles split-lock #AC from L2 instead of forwarding the exception to L1. Split-lock #AC isn't yet virtualized, i.e. L1 will treat it like a regular #AC and do the wrong thing, e.g. reinject it into L2. Fixes: e6f8b6c12f03 ("KVM: VMX: Extend VMXs #AC interceptor to handle split lock #AC in guest") Cc: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622172244.3561540-1-seanjc@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-28KVM: x86: Defer vtime accounting 'til after IRQ handlingWanpeng Li1-3/+3
commit 160457140187c5fb127b844e5a85f87f00a01b14 upstream. Defer the call to account guest time until after servicing any IRQ(s) that happened in the guest or immediately after VM-Exit. Tick-based accounting of vCPU time relies on PF_VCPU being set when the tick IRQ handler runs, and IRQs are blocked throughout the main sequence of vcpu_enter_guest(), including the call into vendor code to actually enter and exit the guest. This fixes a bug where reported guest time remains '0', even when running an infinite loop in the guest: https://bugzilla.kernel.org/show_bug.cgi?id=209831 Fixes: 87fa7f3e98a131 ("x86/kvm: Move context tracking where it belongs") Suggested-by: Thomas Gleixner <tglx@linutronix.de> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210505002735.1684165-4-seanjc@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-19KVM: VMX: Disable preemption when probing user return MSRsSean Christopherson1-4/+1
commit 5104d7ffcf24749939bea7fdb5378d186473f890 upstream. Disable preemption when probing a user return MSR via RDSMR/WRMSR. If the MSR holds a different value per logical CPU, the WRMSR could corrupt the host's value if KVM is preempted between the RDMSR and WRMSR, and then rescheduled on a different CPU. Opportunistically land the helper in common x86, SVM will use the helper in a future commit. Fixes: 4be534102624 ("KVM: VMX: Initialize vmx->guest_msrs[] right after allocation") Cc: stable@vger.kernel.org Cc: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210504171734.1434054-6-seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-19KVM: VMX: Do not advertise RDPID if ENABLE_RDTSCP control is unsupportedSean Christopherson1-2/+4
commit 8aec21c04caa2000f91cf8822ae0811e4b0c3971 upstream. Clear KVM's RDPID capability if the ENABLE_RDTSCP secondary exec control is unsupported. Despite being enumerated in a separate CPUID flag, RDPID is bundled under the same VMCS control as RDTSCP and will #UD in VMX non-root if ENABLE_RDTSCP is not enabled. Fixes: 41cd02c6f7f6 ("kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210504171734.1434054-2-seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-19KVM: nVMX: Always make an attempt to map eVMCS after migrationVitaly Kuznetsov1-10/+19
commit f5c7e8425f18fdb9bdb7d13340651d7876890329 upstream. When enlightened VMCS is in use and nested state is migrated with vmx_get_nested_state()/vmx_set_nested_state() KVM can't map evmcs page right away: evmcs gpa is not 'struct kvm_vmx_nested_state_hdr' and we can't read it from VP assist page because userspace may decide to restore HV_X64_MSR_VP_ASSIST_PAGE after restoring nested state (and QEMU, for example, does exactly that). To make sure eVMCS is mapped /vmx_set_nested_state() raises KVM_REQ_GET_NESTED_STATE_PAGES request. Commit f2c7ef3ba955 ("KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit") added KVM_REQ_GET_NESTED_STATE_PAGES clearing to nested_vmx_vmexit() to make sure MSR permission bitmap is not switched when an immediate exit from L2 to L1 happens right after migration (caused by a pending event, for example). Unfortunately, in the exact same situation we still need to have eVMCS mapped so nested_sync_vmcs12_to_shadow() reflects changes in VMCS12 to eVMCS. As a band-aid, restore nested_get_evmcs_page() when clearing KVM_REQ_GET_NESTED_STATE_PAGES in nested_vmx_vmexit(). The 'fix' is far from being ideal as we can't easily propagate possible failures and even if we could, this is most likely already too late to do so. The whole 'KVM_REQ_GET_NESTED_STATE_PAGES' idea for mapping eVMCS after migration seems to be fragile as we diverge too much from the 'native' path when vmptr loading happens on vmx_set_nested_state(). Fixes: f2c7ef3ba955 ("KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210503150854.1144255-2-vkuznets@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-19KVM: x86: Move RDPID emulation intercept to its own enumSean Christopherson1-1/+2
commit 2183de4161b90bd3851ccd3910c87b2c9adfc6ed upstream. Add a dedicated intercept enum for RDPID instead of piggybacking RDTSCP. Unlike VMX's ENABLE_RDTSCP, RDPID is not bound to SVM's RDTSCP intercept. Fixes: fb6d4d340e05 ("KVM: x86: emulate RDPID") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210504171734.1434054-5-seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-19KVM/VMX: Invoke NMI non-IST entry instead of IST entryLai Jiangshan1-7/+9
commit a217a6593cec8b315d4c2f344bae33660b39b703 upstream. In VMX, the host NMI handler needs to be invoked after NMI VM-Exit. Before commit 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via indirect call instead of INTn"), this was done by INTn ("int $2"). But INTn microcode is relatively expensive, so the commit reworked NMI VM-Exit handling to invoke the kernel handler by function call. But this missed a detail. The NMI entry point for direct invocation is fetched from the IDT table and called on the kernel stack. But on 64-bit the NMI entry installed in the IDT expects to be invoked on the IST stack. It relies on the "NMI executing" variable on the IST stack to work correctly, which is at a fixed position in the IST stack. When the entry point is unexpectedly called on the kernel stack, the RSP-addressed "NMI executing" variable is obviously also on the kernel stack and is "uninitialized" and can cause the NMI entry code to run in the wrong way. Provide a non-ist entry point for VMX which shares the C-function with the regular NMI entry and invoke the new asm entry point instead. On 32-bit this just maps to the regular NMI entry point as 32-bit has no ISTs and is not affected. [ tglx: Made it independent for backporting, massaged changelog ] Fixes: 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via indirect call instead of INTn") Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Lai Jiangshan <laijs@linux.alibaba.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87r1imi8i1.ffs@nanos.tec.linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-14KVM: VMX: Intercept FS/GS_BASE MSR accesses for 32-bit KVMSean Christopherson2-0/+6
[ Upstream commit dbdd096a5a74b94f6b786a47baef2085859b0dce ] Disable pass-through of the FS and GS base MSRs for 32-bit KVM. Intel's SDM unequivocally states that the MSRs exist if and only if the CPU supports x86-64. FS_BASE and GS_BASE are mostly a non-issue; a clever guest could opportunistically use the MSRs without issue. KERNEL_GS_BASE is a bigger problem, as a clever guest would subtly be broken if it were migrated, as KVM disallows software access to the MSRs, and unlike the direct variants, KERNEL_GS_BASE needs to be explicitly migrated as it's not captured in the VMCS. Fixes: 25c5f225beda ("KVM: VMX: Enable MSR Bitmap feature") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422023831.3473491-1-seanjc@google.com> [*NOT* for stable kernels. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14KVM: x86: dump_vmcs should not assume GUEST_IA32_EFER is validDavid Edmondson1-6/+3
[ Upstream commit d9e46d344e62a0d56fd86a8289db5bed8a57c92e ] If the VM entry/exit controls for loading/saving MSR_EFER are either not available (an older processor or explicitly disabled) or not used (host and guest values are the same), reading GUEST_IA32_EFER from the VMCS returns an inaccurate value. Because of this, in dump_vmcs() don't use GUEST_IA32_EFER to decide whether to print the PDPTRs - always do so if the fields exist. Fixes: 4eb64dce8d0a ("KVM: x86: dump VMCS on invalid entry") Signed-off-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <20210318120841.133123-2-david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14KVM: nVMX: Truncate base/index GPR value on address calc in !64-bitSean Christopherson1-2/+2
commit 82277eeed65eed6c6ee5b8f97bd978763eab148f upstream. Drop bits 63:32 of the base and/or index GPRs when calculating the effective address of a VMX instruction memory operand. Outside of 64-bit mode, memory encodings are strictly limited to E*X and below. Fixes: 064aea774768 ("KVM: nVMX: Decoding memory operands of VMX instructions") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422022128.3464144-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-14KVM: nVMX: Truncate bits 63:32 of VMCS field on nested check in !64-bitSean Christopherson1-1/+1
commit ee050a577523dfd5fac95e6cc182ebe0293ead59 upstream. Drop bits 63:32 of the VMCS field encoding when checking for a nested VM-Exit on VMREAD/VMWRITE in !64-bit mode. VMREAD and VMWRITE always use 32-bit operands outside of 64-bit mode. The actual emulation of VMREAD/VMWRITE does the right thing, this bug is purely limited to incorrectly causing a nested VM-Exit if a GPR happens to have bits 63:32 set outside of 64-bit mode. Fixes: a7cde481b6e8 ("KVM: nVMX: Do not forward VMREAD/VMWRITE VMExits to L1 if required so by vmcs12 vmread/vmwrite bitmaps") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422022128.3464144-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-14KVM: nVMX: Defer the MMU reload to the normal path on an EPTP switchSean Christopherson1-7/+2
commit c805f5d5585ab5e0cdac6b1ccf7086eb120fb7db upstream. Defer reloading the MMU after a EPTP successful EPTP switch. The VMFUNC instruction itself is executed in the previous EPTP context, any side effects, e.g. updating RIP, should occur in the old context. Practically speaking, this bug is benign as VMX doesn't touch the MMU when skipping an emulated instruction, nor does queuing a single-step #DB. No other post-switch side effects exist. Fixes: 41ab93727467 ("KVM: nVMX: Emulate EPTP switching for the L1 hypervisor") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-14-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-21KVM: VMX: Don't use vcpu->run->internal.ndata as an array indexReiji Watanabe1-5/+5
[ Upstream commit 04c4f2ee3f68c9a4bf1653d15f1a9a435ae33f7a ] __vmx_handle_exit() uses vcpu->run->internal.ndata as an index for an array access. Since vcpu->run is (can be) mapped to a user address space with a writer permission, the 'ndata' could be updated by the user process at anytime (the user process can set it to outside the bounds of the array). So, it is not safe that __vmx_handle_exit() uses the 'ndata' that way. Fixes: 1aa561b1a4c0 ("kvm: x86: Add "last CPU" to some KVM_EXIT information") Signed-off-by: Reiji Watanabe <reijiw@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20210413154739.490299-1-reijiw@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-21KVM: VMX: Convert vcpu_vmx.exit_reason to a unionSean Christopherson3-49/+86
[ Upstream commit 8e53324021645f820a01bf8aa745711c802c8542 ] Convert vcpu_vmx.exit_reason from a u32 to a union (of size u32). The full VM_EXIT_REASON field is comprised of a 16-bit basic exit reason in bits 15:0, and single-bit modifiers in bits 31:16. Historically, KVM has only had to worry about handling the "failed VM-Entry" modifier, which could only be set in very specific flows and required dedicated handling. I.e. manually stripping the FAILED_VMENTRY bit was a somewhat viable approach. But even with only a single bit to worry about, KVM has had several bugs related to comparing a basic exit reason against the full exit reason store in vcpu_vmx. Upcoming Intel features, e.g. SGX, will add new modifier bits that can be set on more or less any VM-Exit, as opposed to the significantly more restricted FAILED_VMENTRY, i.e. correctly handling everything in one-off flows isn't scalable. Tracking exit reason in a union forces code to explicitly choose between consuming the full exit reason and the basic exit, and is a convenient way to document and access the modifiers. No functional change intended. Cc: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20201106090315.18606-2-chenyi.qiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-02-10