[RHEL7,COMMIT] ms/virtio_balloon: fix another race between migration and ballooning

Submitted by Konstantin Khorenko on Feb. 11, 2019, 2:40 p.m.

Details

Message ID 201902111440.x1BEeVcr010782@finist-ce7.sw.ru
State New
Series "virtio_balloon: fix another race between migration and ballooning"

Commit Message

Konstantin Khorenko Feb. 11, 2019, 2:40 p.m.
The commit is pushed to "branch-rh7-3.10.0-957.1.3.vz7.83.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-957.1.3.vz7.83.11
------>
commit 1bbb4f5de6b5103929328e82e96091be0cf8b37b
Author: Jiang Biao <jiang.biao2@zte.com.cn>
Date:   Mon Feb 11 17:40:31 2019 +0300

    ms/virtio_balloon: fix another race between migration and ballooning
    
    The kernel panics under high memory pressure; the call trace looks like:
    
    PID: 21439 TASK: ffff881be3afedd0 CPU: 16 COMMAND: "java"
     #0 [ffff881ec7ed7630] machine_kexec at ffffffff81059beb
     #1 [ffff881ec7ed7690] __crash_kexec at ffffffff81105942
     #2 [ffff881ec7ed7760] crash_kexec at ffffffff81105a30
     #3 [ffff881ec7ed7778] oops_end at ffffffff816902c8
     #4 [ffff881ec7ed77a0] no_context at ffffffff8167ff46
     #5 [ffff881ec7ed77f0] __bad_area_nosemaphore at ffffffff8167ffdc
     #6 [ffff881ec7ed7838] __node_set at ffffffff81680300
     #7 [ffff881ec7ed7860] __do_page_fault at ffffffff8169320f
     #8 [ffff881ec7ed78c0] do_page_fault at ffffffff816932b5
     #9 [ffff881ec7ed78f0] page_fault at ffffffff8168f4c8
        [exception RIP: _raw_spin_lock_irqsave+47]
        RIP: ffffffff8168edef RSP: ffff881ec7ed79a8 RFLAGS: 00010046
        RAX: 0000000000000246 RBX: ffffea0019740d00 RCX: ffff881ec7ed7fd8
        RDX: 0000000000020000 RSI: 0000000000000016 RDI: 0000000000000008
        RBP: ffff881ec7ed79a8 R8: 0000000000000246 R9: 000000000001a098
        R10: ffff88107ffda000 R11: 0000000000000000 R12: 0000000000000000
        R13: 0000000000000008 R14: ffff881ec7ed7a80 R15: ffff881be3afedd0
        ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
    
    It happens in the page fault path and results in a double page fault
    while compacting pages after a memory allocation failure.
    
    Analysis of the vmcore shows that the page that leads to the second
    page fault is corrupted, with _mapcount=-256 but private=0.
    
    It's caused by a race between migration and ballooning, due to a
    missing lock in virtballoon_migratepage() of the virtio_balloon driver.
    This patch fixes the bug by taking pages_lock around
    balloon_page_delete().
    
    Fixes: e22504296d4f64f ("virtio_balloon: introduce migration primitives to balloon pages")
    Cc: stable@vger.kernel.org
    Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn>
    Signed-off-by: Huang Chong <huang.chong@zte.com.cn>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    
    https://jira.sw.ru/browse/PSBM-91532
    (cherry picked from commit 89da619bc18d79bca5304724c11d4ba3b67ce2c6)
    Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
---
 drivers/virtio/virtio_balloon.c | 2 ++
 1 file changed, 2 insertions(+)
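
Note: the locking rule described in the commit message can be sketched in
userspace as below. This is only an illustration, not the driver code: a
pthread mutex stands in for the kernel's spin_lock_irqsave() on pages_lock,
and struct node, struct dev_info, node_add(), node_del() and run_demo() are
hypothetical names made up for the sketch. The point it shows is that
unlinking a list entry writes to both neighbours, so every path that touches
the shared list has to hold the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these names are kernel APIs. */
struct node {
	struct node *prev, *next;
};

struct dev_info {
	pthread_mutex_t pages_lock;	/* plays the role of pages_lock */
	struct node pages;		/* list head for the "balloon pages" */
};

static void node_add(struct node *n, struct node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlinking writes to both neighbours, so it must never run
 * concurrently with another update of the same list. */
static void node_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = NULL;
}

#define NPAGES 1000
#define ROUNDS 200

struct worker {
	struct dev_info *dev;
	struct node *nodes;	/* NPAGES nodes owned by this worker */
};

/* Models the ballooning path: repeatedly enqueue and dequeue pages. */
static void *balloon_thread(void *arg)
{
	struct worker *w = arg;

	for (int r = 0; r < ROUNDS; r++) {
		for (int i = 0; i < NPAGES; i++) {
			pthread_mutex_lock(&w->dev->pages_lock);
			node_add(&w->nodes[i], &w->dev->pages);
			pthread_mutex_unlock(&w->dev->pages_lock);
		}
		for (int i = 0; i < NPAGES; i++) {
			pthread_mutex_lock(&w->dev->pages_lock);
			node_del(&w->nodes[i]);
			pthread_mutex_unlock(&w->dev->pages_lock);
		}
	}
	return NULL;
}

/* Models the migration path: delete pages already on the list.
 * The lock/unlock pair around node_del() mirrors the two added lines. */
static void *migrate_thread(void *arg)
{
	struct worker *w = arg;

	for (int i = 0; i < NPAGES; i++) {
		pthread_mutex_lock(&w->dev->pages_lock);
		node_del(&w->nodes[i]);
		pthread_mutex_unlock(&w->dev->pages_lock);
	}
	return NULL;
}

/* Returns 1 if the list is empty and intact afterwards, 0 if not,
 * -1 if a thread could not be started. */
int run_demo(void)
{
	static struct node a[NPAGES], b[NPAGES];
	struct dev_info dev;
	struct worker wa = { &dev, a }, wb = { &dev, b };
	pthread_t ta, tb;

	pthread_mutex_init(&dev.pages_lock, NULL);
	dev.pages.next = dev.pages.prev = &dev.pages;

	/* Pre-populate the pages the migration thread will delete. */
	for (int i = 0; i < NPAGES; i++)
		node_add(&b[i], &dev.pages);

	if (pthread_create(&ta, NULL, balloon_thread, &wa) != 0)
		return -1;
	if (pthread_create(&tb, NULL, migrate_thread, &wb) != 0) {
		pthread_join(ta, NULL);
		return -1;
	}
	pthread_join(ta, NULL);
	pthread_join(tb, NULL);
	pthread_mutex_destroy(&dev.pages_lock);

	return dev.pages.next == &dev.pages && dev.pages.prev == &dev.pages;
}
```

With the lock held on both paths, run_demo() is expected to return 1.
Dropping the lock/unlock pair in migrate_thread() turns the sketch into a
data race on the neighbours' prev/next pointers, which is the userspace
analogue of the unlocked balloon_page_delete() call fixed here.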


diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index ab36461fee08..4e79a13f8dcb 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -497,7 +497,9 @@  static int virtballoon_migratepage(struct balloon_dev_info *vb_dev_info,
 	tell_host(vb, vb->inflate_vq);
 
 	/* balloon's page migration 2nd step -- deflate "page" */
+	spin_lock_irqsave(&vb_dev_info->pages_lock, flags);
 	balloon_page_delete(page);
+	spin_unlock_irqrestore(&vb_dev_info->pages_lock, flags);
 	vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
 	set_page_pfns(vb, vb->pfns, page);
 	tell_host(vb, vb->deflate_vq);