[RHEL7,COMMIT] KVM: x86: Update the exit_qualification access bits while walking an address

Submitted by Konstantin Khorenko on May 16, 2018, 9:50 a.m.

Details

Message ID 201805160950.w4G9olvi030111@finist_ce7.work
State New
Series "KVM: x86: Update the exit_qualification access bits while walking an address"

Commit Message

Konstantin Khorenko May 16, 2018, 9:50 a.m.
The commit is pushed to "branch-rh7-3.10.0-693.21.1.vz7.50.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-693.21.1.vz7.47.6
------>
commit 171e28319dd6785845b96fee73f608f72f46df4d
Author: KarimAllah Ahmed <karahmed@amazon.de>
Date:   Wed May 16 12:50:47 2018 +0300

    KVM: x86: Update the exit_qualification access bits while walking an address
    
    ... to avoid having a stale value when handling an EPT misconfig for MMIO
    regions.
    
    MMIO regions that are not passed-through to the guest are handled through
    EPT misconfigs. The first time a certain MMIO page is touched it causes an
    EPT violation, then KVM marks the EPT entry to cause an EPT misconfig
    instead. Any subsequent accesses to the entry will generate an EPT
    misconfig.
    
    Things get slightly more complicated with nested guest handling for MMIO
    regions that are not passed through from L0 (i.e. emulated by L0
    user-space).
    
    An EPT violation for one of these MMIO regions from L2 exits to the L0
    hypervisor. L0 then looks at the EPT12 mapping for the L1 hypervisor and
    realizes it is not present (or not sufficient to serve the request), so L0
    injects an EPT violation to L1, and L1 updates its EPT mappings. The
    EXIT_QUALIFICATION value for L1 comes from the exit_qualification variable
    in "struct vcpu". The problem is that this variable is only updated on EPT
    violation and not on EPT misconfig. So if an EPT violation caused by a
    read happens first and an EPT misconfig caused by a write happens
    afterwards, the L0 hypervisor still holds the exit_qualification value
    from the previous read instead of the write, and ends up injecting an EPT
    violation to the L1 hypervisor with an out of date EXIT_QUALIFICATION.
    
    The EPT violation that is injected from L0 to L1 needs to have the correct
    EXIT_QUALIFICATION, especially for the access bits, because the individual
    access bits for MMIO EPTs are updated only on an actual access of that
    specific type. So for the example above, the L1 hypervisor will keep
    updating only the read bit in the EPT and then resume the L2 guest. The L2
    guest then causes another exit, where L0 *again* injects another EPT
    violation to the L1 hypervisor with *again* an out of date
    exit_qualification that indicates a read and not a write. This
    ping-pong keeps happening without making any forward progress.
    
    The behavior of mapping MMIO regions changed in:
    
       commit a340b3e229b24 ("kvm: Map PFN-type memory regions as writable (if possible)")
    
    ... where an EPT violation for a read would also fix up the write bits to
    avoid another EPT violation, which by accident would fix the bug mentioned
    above.
    
    This commit fixes this situation and ensures that the access bits for the
    exit_qualification are up to date. That ensures that even an L1 hypervisor
    running a KVM version from before the commit mentioned above would still
    work.
    
    ( The description above assumes EPT to be available and used by L1
      hypervisor + the L1 hypervisor is passing through the MMIO region to the L2
      guest while this MMIO region is emulated by the L0 user-space ).
    
    Cc: Paolo Bonzini <pbonzini@redhat.com>
    Cc: Radim Krčmář <rkrcmar@redhat.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: x86@kernel.org
    Cc: kvm@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
    Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    
    (cherry picked from commit ddd6f0e94d3153951580d5b88b9d97c7e26a0e00)
    Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
    
    =====================
    Patchset description:
    
    EPT fixes and enhancements
    
    Backport of EPT fixes from upstream for
    https://jira.sw.ru/browse/PSBM-84046
    
    Bandan Das (3):
      kvm: mmu: don't set the present bit unconditionally
      kvm: mmu: track read permission explicitly for shadow EPT page tables
      kvm: vmx: advertise support for ept execute only
    
    Junaid Shahid (2):
      kvm: x86: mmu: Use symbolic constants for EPT Violation Exit
        Qualifications
      kvm: x86: mmu: Rename EPT_VIOLATION_READ/WRITE/INSTR constants
    
    KarimAllah Ahmed (2):
      kvm: Map PFN-type memory regions as writable (if possible)
      KVM: x86: Update the exit_qualification access bits while walking an
        address
    
    Paolo Bonzini (5):
      KVM: nVMX: we support 1GB EPT pages
      kvm: x86: MMU support for EPT accessed/dirty bits
      kvm: nVMX: support EPT accessed/dirty bits
      KVM: MMU: return page fault error code from permission_fault
      KVM: nVMX: fix EPT permissions as reported in exit qualification
---
 arch/x86/kvm/paging_tmpl.h | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Patch

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index f96f2a4d5bb9..acc359a7bbef 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -436,14 +436,21 @@  static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	 * done by is_rsvd_bits_set() above.
 	 *
 	 * We set up the value of exit_qualification to inject:
-	 * [2:0] - Derive from [2:0] of real exit_qualification at EPT violation
+	 * [2:0] - Derive from the access bits. The exit_qualification might be
+	 *         out of date if it is serving an EPT misconfiguration.
 	 * [5:3] - Calculated by the page walk of the guest EPT page tables
 	 * [7:8] - Derived from [7:8] of real exit_qualification
 	 *
 	 * The other bits are set to 0.
 	 */
 	if (!(errcode & PFERR_RSVD_MASK)) {
-		vcpu->arch.exit_qualification &= 0x187;
+		vcpu->arch.exit_qualification &= 0x180;
+		if (write_fault)
+			vcpu->arch.exit_qualification |= EPT_VIOLATION_ACC_WRITE;
+		if (user_fault)
+			vcpu->arch.exit_qualification |= EPT_VIOLATION_ACC_READ;
+		if (fetch_fault)
+			vcpu->arch.exit_qualification |= EPT_VIOLATION_ACC_INSTR;
 		vcpu->arch.exit_qualification |= (pte_access & 0x7) << 3;
 	}
 #endif
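
The effect of the hunk above can be modelled outside the kernel. The
following is a minimal user-space sketch, not kernel code: the function names
qual_before_patch() and qual_after_patch() are hypothetical, and the bit
values assume that EPT_VIOLATION_ACC_READ, EPT_VIOLATION_ACC_WRITE and
EPT_VIOLATION_ACC_INSTR occupy bits 0, 1 and 2 of the exit qualification.
The mapping from the fault flags to the access bits mirrors the patch as
posted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout: access type in bits [2:0] of the exit qualification. */
#define EPT_VIOLATION_ACC_READ  (1ULL << 0)
#define EPT_VIOLATION_ACC_WRITE (1ULL << 1)
#define EPT_VIOLATION_ACC_INSTR (1ULL << 2)

/* Hypothetical model of the old code: bits [2:0] and [8:7] are kept from the
 * saved value, so a stale access type survives an EPT misconfig. */
static inline uint64_t qual_before_patch(uint64_t saved, unsigned int pte_access)
{
	saved &= 0x187;
	saved |= (uint64_t)(pte_access & 0x7) << 3;
	return saved;
}

/* Hypothetical model of the new code: only bits [8:7] are kept, and bits
 * [2:0] are rebuilt from the fault being walked (flag mapping as in the
 * patch as posted). */
static inline uint64_t qual_after_patch(uint64_t saved, bool write_fault,
					bool user_fault, bool fetch_fault,
					unsigned int pte_access)
{
	saved &= 0x180;
	if (write_fault)
		saved |= EPT_VIOLATION_ACC_WRITE;
	if (user_fault)
		saved |= EPT_VIOLATION_ACC_READ;
	if (fetch_fault)
		saved |= EPT_VIOLATION_ACC_INSTR;
	saved |= (uint64_t)(pte_access & 0x7) << 3;
	return saved;
}
```

Under these assumptions, a saved value of 0x181 (left over from an earlier
read) combined with a write fault and pte_access of 0 stays 0x181 in the old
model, but becomes 0x182 in the new one, so the injected EPT violation
reports a write.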