[Devel,rh7] sched: make load balancing more aggressive

Submitted by Vladimir Davydov on Aug. 3, 2016, 4:43 p.m.

Details

Message ID 1470242637-16327-1-git-send-email-vdavydov@virtuozzo.com
State New
Series "sched: make load balancing more aggressive"

Commit Message

Vladimir Davydov Aug. 3, 2016, 4:43 p.m.
Currently, we only pull tasks if the load of the destination cpu group
is below the average over the domain being rebalanced. This sounds
reasonable, but only as long as there are no pinned tasks; otherwise we
can get an unfair task distribution. For instance, suppose the host has
16 cores and there's a container pinned to two of the cores (either
strictly by using cpumask or indirectly by setting cpulimit). If we
start 16 tasks in the container, then the average load over the domain
will be 1 task per cpu, so even if 15 tasks turn out to run on the same
cpu (out of 2), no tasks will be pulled: the other cpu already runs one
task and is therefore not below the average. This is wrong.
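
The scenario above can be sketched with a toy model. This is only an
illustration, not kernel code: load is taken to be the plain number of
runnable tasks per cpu, and the helper names domain_avg_load(),
old_rule_allows_pull() and pinned_scenario_pulls() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model (assumption): load == number of runnable tasks on a cpu. */

/* Average load over all cpus of the domain being rebalanced. */
static unsigned long domain_avg_load(const unsigned long *nr_tasks,
				     size_t nr_cpus)
{
	unsigned long total = 0;

	for (size_t i = 0; i < nr_cpus; i++)
		total += nr_tasks[i];
	return total / nr_cpus;
}

/* Old rule: the destination pulls only while it is below the average. */
static bool old_rule_allows_pull(unsigned long dst_load,
				 unsigned long avg_load)
{
	return dst_load < avg_load;
}

/*
 * 16 cpus, container pinned to cpus 0 and 1, with 15 tasks on cpu 0
 * and 1 task on cpu 1. The remaining cpus are idle.
 */
static bool pinned_scenario_pulls(void)
{
	unsigned long nr_tasks[16] = { 15, 1 };
	unsigned long avg = domain_avg_load(nr_tasks, 16);

	/* cpu 1 is the destination: it is at the average, so no pull. */
	return old_rule_allows_pull(nr_tasks[1], avg);
}
```

In this model pinned_scenario_pulls() returns false: cpu 1 never pulls
from cpu 0 although cpu 0 runs 15 of the 16 tasks.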

To overcome this issue, let's port the following patches from PCS6:

  diff-sched-balance-even-if-load-is-greater-than-average
  diff-sched-always-try-to-equalize-load-between-this-and-busiest-cpus-when-balancing

They make the balance procedure pull tasks even if the destination is
above average, by setting the imbalance value to be

  (source_load - destination_load) / 2

instead of

  (average_load - destination_load) / 2

This slows down the convergence of the balancing procedure, but PCS6
has worked like that for quite a while, so it should be fine.
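
For illustration, the two formulas can be compared in a small sketch.
It models only the expressions quoted in this message, not the full
calculate_imbalance() logic (no max_pull cap, no group_power scaling),
and the function names imbalance_old() and imbalance_new() are
hypothetical.

```c
/*
 * Old behaviour as described above: imbalance is derived from the
 * domain average, so it is zero once the destination reaches it.
 */
static unsigned long imbalance_old(unsigned long avg_load,
				   unsigned long dst_load)
{
	return avg_load > dst_load ? (avg_load - dst_load) / 2 : 0;
}

/*
 * New behaviour as described above: imbalance is derived from the
 * busiest (source) load, so it stays non-zero while the source is
 * noticeably more loaded than the destination.
 */
static unsigned long imbalance_new(unsigned long src_load,
				   unsigned long dst_load)
{
	return src_load > dst_load ? (src_load - dst_load) / 2 : 0;
}
```

With load measured in tasks, a source load of 15, a destination load of
1 and a domain average of 1 give imbalance_old(1, 1) == 0 and
imbalance_new(15, 1) == 7, i.e. only the new formula asks the balancer
to move anything.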

Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
---
 kernel/sched/fair.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)


diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cedd178f963c..685517597a30 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6618,7 +6618,7 @@  static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 	/* How much load to actually move to equalise the imbalance */
 	env->imbalance = min(
 		max_pull * busiest->group_power,
-		(sds->avg_load - this->avg_load) * this->group_power
+		(busiest->avg_load - this->avg_load) * this->group_power
 	) / SCHED_POWER_SCALE;
 
 	/*
@@ -6695,13 +6695,6 @@  static struct sched_group *find_busiest_group(struct lb_env *env)
 	if (this->avg_load >= busiest->avg_load)
 		goto out_balanced;
 
-	/*
-	 * Don't pull any tasks if this group is already above the domain
-	 * average load.
-	 */
-	if (this->avg_load >= sds.avg_load)
-		goto out_balanced;
-
 	if (env->idle == CPU_IDLE) {
 		/*
 		 * This cpu is idle. If the busiest group load doesn't