[Devel] kvm/x86: skip async_pf when in guest mode

Submitted by Roman Kagan on Dec. 2, 2016, 1:27 p.m.

Details

Message ID 20161202132754.26814-1-rkagan@virtuozzo.com
State New
Series "kvm/x86: skip async_pf when in guest mode"

Commit Message

Roman Kagan Dec. 2, 2016, 1:27 p.m.
Async pagefault machinery assumes communication with L1 guests only: all
the state -- MSRs, apf area addresses, etc. -- is for L1.  However, it
currently doesn't check if the vCPU is running L1 or L2, and may inject
a #PF into whatever context is currently executing.

To reproduce the problem, use a host with swap enabled, run a VM on it,
run a nested VM on top, and set RSS limit for L1 on the host via
/sys/fs/cgroup/memory/machine.slice/machine-*.scope/memory.limit_in_bytes
to swap it out (you may need to tighten and release it once or twice, or
create some memory load inside L1).  Very quickly L2 guest starts
receiving pagefaults with bogus %cr2 (apf tokens from the host
actually), and L1 guest starts accumulating tasks stuck in D state in
kvm_async_pf_task_wait.

To avoid that, only do async_pf stuff when executing L1 guest.
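
As a rough illustration of that gating (a standalone toy model, not kernel
code): struct toy_vcpu and both toy_* helpers below are hypothetical names
invented for this sketch.  They only mirror the shape of the two conditions
the patch changes, assuming "guest mode" means the vCPU is executing L2.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for struct kvm_vcpu. */
struct toy_vcpu {
	bool guest_mode;	/* true while the vCPU is executing L2 */
	bool apf_enabled;	/* L1 has enabled async pagefaults */
};

/*
 * Shape of the patched check in try_async_pf(): go the async route only
 * when not prefaulting, not running L2, and async_pf is usable at all.
 */
static bool toy_may_use_async_pf(const struct toy_vcpu *vcpu, bool prefault)
{
	return !prefault && !vcpu->guest_mode && vcpu->apf_enabled;
}

/*
 * Shape of the patched check in __vcpu_run(): completions are only
 * processed while L1 is executing, since the apf state belongs to L1.
 */
static bool toy_should_check_completion(const struct toy_vcpu *vcpu)
{
	return !vcpu->guest_mode;
}
```

With apf_enabled set, the first helper returns true for a vCPU in L1 and
false for the same vCPU in L2, so in this model L2 never sees an apf token.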

Note: this patch only fixes x86; other async_pf-capable arches may also
need something similar.

Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
(cherry picked from commit 80e2a7bb8d7050d2ea6d8961c526a65d30d5eb08)
Fixes: PSBM-54491
Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
---
The patch has been merged into kvm/queue but not yet pull-requested to
Linus.

 arch/x86/kvm/mmu.c | 2 +-
 arch/x86/kvm/x86.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

Patch

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 17973ed..c82bf5f 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3481,7 +3481,7 @@  static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 	if (!async)
 		return false; /* *pfn has correct page already */
 
-	if (!prefault && can_do_async_pf(vcpu)) {
+	if (!prefault && !is_guest_mode(vcpu) && can_do_async_pf(vcpu)) {
 		trace_kvm_try_async_get_page(gva, gfn);
 		if (kvm_find_async_pf_gfn(vcpu, gfn)) {
 			trace_kvm_async_pf_doublefault(gva, gfn);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 78ea28c..4edeb8a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6780,7 +6780,8 @@  static int __vcpu_run(struct kvm_vcpu *vcpu)
 			++vcpu->stat.request_irq_exits;
 		}
 
-		kvm_check_async_pf_completion(vcpu);
+		if (!is_guest_mode(vcpu))
+			kvm_check_async_pf_completion(vcpu);
 
 		if (signal_pending(current)) {
 			r = -EINTR;