[rh7,2/2] kvm: move actual VM memory shrink out of kvm_lock

Submitted by Konstantin Khorenko on June 5, 2019, 3:30 p.m.

Details

Message ID 20190605153003.16790-2-khorenko@virtuozzo.com
State New
Series "Series without cover letter"

Commit Message

Konstantin Khorenko June 5, 2019, 3:30 p.m.
We have hit a situation in which a node with many CPU cores (88), a
lot of RAM (1 TB) and many VMs (300) has almost all of its CPU cores
busy in mmu_shrink_scan(): all but one are merely waiting for
kvm_lock, while the remaining one performs the actual memory shrink
for a single VM.

Let's allow parallel VM shrinking:
- increment the VM's usage count, so it is not destroyed under us
- drop kvm_lock, so other shrinkers are free to proceed
- shrink our VM without holding kvm_lock
- decrement the VM's usage count once shrinking is finished

As we shrink only a single VM, we don't need to protect vm_list for
the whole duration of the shrink.

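In outline, the resulting locking pattern looks like this (a condensed
sketch of the flow, not the literal diff below):

	spin_lock(&kvm_lock);
	list_for_each_entry(kvm, &vm_list, vm_list) {
		/* ... choose a VM worth shrinking ... */

		/* Pin the VM so kvm_destroy_vm() cannot free it under us. */
		kvm_get_kvm(kvm);
		/* From here on other shrinkers may walk vm_list in parallel. */
		spin_unlock(&kvm_lock);

		idx = srcu_read_lock(&kvm->srcu);
		spin_lock(&kvm->mmu_lock);
		/* ... zap shadow MMU pages of this single VM ... */
		spin_unlock(&kvm->mmu_lock);
		srcu_read_unlock(&kvm->srcu, idx);

		/* Drop our reference; this may destroy the VM. */
		kvm_put_kvm(kvm);
		break;
	}
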
https://jira.sw.ru/browse/PSBM-95077

Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
---
 arch/x86/kvm/mmu.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 1d576bf305e1..0c3ded90dd38 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -5359,8 +5359,10 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 		 * to shrink more than one VM and it is very unlikely to see
 		 * !n_used_mmu_pages so many times.
 		 */
-		if (!nr_to_scan--)
+		if (!nr_to_scan--) {
+			spin_unlock(&kvm_lock);
 			break;
+		}
 
 		/* Does not matter if we will shrink current VM or not, let's
 		 * move it to the tail, so next shrink won't hit it again soon.
@@ -5381,6 +5383,9 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 		      !kvm_has_zapped_obsolete_pages(kvm))
 			continue;
 
+		kvm_get_kvm(kvm);
+		spin_unlock(&kvm_lock);
+
 		idx = srcu_read_lock(&kvm->srcu);
 		spin_lock(&kvm->mmu_lock);
 
@@ -5398,10 +5403,11 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 		spin_unlock(&kvm->mmu_lock);
 		srcu_read_unlock(&kvm->srcu, idx);
 
+		kvm_put_kvm(kvm);
+
 		break;
 	}
 
-	spin_unlock(&kvm_lock);
 	return freed;
 
 }

Comments

Kirill Tkhai June 6, 2019, 3:18 p.m.
On 05.06.2019 18:30, Konstantin Khorenko wrote:
> We have hit a situation in which a node with many CPU cores (88), a
> lot of RAM (1 TB) and many VMs (300) has almost all of its CPU cores
> busy in mmu_shrink_scan(): all but one are merely waiting for
> kvm_lock, while the remaining one performs the actual memory shrink
> for a single VM.
> 
> Let's allow parallel VM shrinking:
> - increment the VM's usage count, so it is not destroyed under us
> - drop kvm_lock, so other shrinkers are free to proceed
> - shrink our VM without holding kvm_lock
> - decrement the VM's usage count once shrinking is finished
> 
> As we shrink only a single VM, we don't need to protect vm_list for
> the whole duration of the shrink.
> 
> https://jira.sw.ru/browse/PSBM-95077
> 
> Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
> ---
>  arch/x86/kvm/mmu.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 1d576bf305e1..0c3ded90dd38 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -5359,8 +5359,10 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>  		 * to shrink more than one VM and it is very unlikely to see
>  		 * !n_used_mmu_pages so many times.
>  		 */
> -		if (!nr_to_scan--)
> +		if (!nr_to_scan--) {
> +			spin_unlock(&kvm_lock);
>  			break;
> +		}
>  
>  		/* Does not matter if we will shrink current VM or not, let's
>  		 * move it to the tail, so next shrink won't hit it again soon.
> @@ -5381,6 +5383,9 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>  		      !kvm_has_zapped_obsolete_pages(kvm))
>  			continue;
>  
> +		kvm_get_kvm(kvm);

I assume you actually want to use kvm_try_get_kvm() here.
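
Otherwise this can race with VM destruction: kvm_put_kvm() drops
users_count to zero before kvm_destroy_vm() takes kvm_lock to unlink
the VM from vm_list, so the shrinker can still find a dying VM on the
list here, and an unconditional kvm_get_kvm() would resurrect the zero
count and lead to a double destruction via the later kvm_put_kvm().
Something like this (a sketch, assuming users_count is still a plain
atomic_t in this tree):

	/* under kvm_lock */
	if (!kvm_try_get_kvm(kvm))	/* i.e. atomic_inc_not_zero(&kvm->users_count) */
		continue;		/* the VM is already being destroyed, skip it */
	spin_unlock(&kvm_lock);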

> +		spin_unlock(&kvm_lock);
> +
>  		idx = srcu_read_lock(&kvm->srcu);
>  		spin_lock(&kvm->mmu_lock);
>  
> @@ -5398,10 +5403,11 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>  		spin_unlock(&kvm->mmu_lock);
>  		srcu_read_unlock(&kvm->srcu, idx);
>  
> +		kvm_put_kvm(kvm);
> +
>  		break;
>  	}
>  
> -	spin_unlock(&kvm_lock);
>  	return freed;
>  
>  }
>