[Devel,RHEL7,COMMIT] ve/sched: make load balancing more aggressive

Submitted by Konstantin Khorenko on Aug. 10, 2016, 3:18 p.m.

Details

Message ID 201608101518.u7AFIUx0026447@finist_cl7.x64_64.work.ct
State New
Series "sched: make load balancing more aggressive"

Commit Message

The commit is pushed to "branch-rh7-3.10.0-327.22.2.vz7.16.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-327.22.2.vz7.16.4
------>
commit c177bb75132a5e7ef5aae3ef466b40fe7f336da9
Author: Vladimir Davydov <vdavydov@virtuozzo.com>
Date:   Wed Aug 10 19:18:29 2016 +0400

    ve/sched: make load balancing more aggressive
    
    Currently, we only pull tasks if the destination cpu group load is below
    the average over the domain being rebalanced. This sounds reasonable,
    but only as long as there are no pinned tasks; otherwise we can get an
    unfair task distribution. For instance, suppose the host has 16 cores
    and there is a container pinned to two of the cores (either strictly by
    using cpumask or indirectly by setting cpulimit). If we start 16 tasks
    in the container, the average load will be 1, so even if 15 of the
    tasks end up running on the same cpu (out of the 2 available), no
    tasks will be pulled, which is wrong.
    
    To overcome this issue, let's port the following patches from PCS6:
    
      diff-sched-balance-even-if-load-is-greater-than-average
      diff-sched-always-try-to-equalize-load-between-this-and-busiest-cpus-when-balancing
    
    They make the balance procedure pull tasks even if the destination is
    above average, by setting the imbalance value to be
    
      (source_load - destination_load) / 2
    
    instead of
    
      (average_load - destination_load) / 2
    
    This decreases the convergence speed of the balancing procedure, but
    PCS6 has worked this way for quite a while, so it should be fine.
    
    Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
---
 kernel/sched/fair.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)


diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 96e581a..a419f39 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6637,7 +6637,7 @@  static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 	/* How much load to actually move to equalise the imbalance */
 	env->imbalance = min(
 		max_pull * busiest->group_power,
-		(sds->avg_load - this->avg_load) * this->group_power
+		(busiest->avg_load - this->avg_load) * this->group_power
 	) / SCHED_POWER_SCALE;
 
 	/*
@@ -6714,13 +6714,6 @@  static struct sched_group *find_busiest_group(struct lb_env *env)
 	if (this->avg_load >= busiest->avg_load)
 		goto out_balanced;
 
-	/*
-	 * Don't pull any tasks if this group is already above the domain
-	 * average load.
-	 */
-	if (this->avg_load >= sds.avg_load)
-		goto out_balanced;
-
 	if (env->idle == CPU_IDLE) {
 		/*
 		 * This cpu is idle. If the busiest group load doesn't