[Devel] sched/fair: Synchronize throttle_count on boosted wakeups

Submitted by Kirill Tkhai on April 14, 2017, 10:32 a.m.


Message ID 149216591760.4648.86808080509188346.stgit@localhost.localdomain
State New
Series "sched/fair: Synchronize throttle_count on boosted wakeups"

Commit Message

Currently, we skip updating the throttle count during
boosted wakeups in enqueue_entity(). The reasoning is that
boosted tasks make their cfs_rq boosted too, so throttling
is impossible there. But this is wrong. Imagine the situation below.

A boosted task enqueued as the first task on a cfs_rq skips
updating throttle_count. Then a not-boosted task wakes up,
and the update is not called either, because there are now
two tasks on the cfs_rq. Later, the first task goes to sleep
in the kernel (e.g., calls sys_poll()), the second task runs
for a long time, and then throttling happens (because the
second task is not boosted and the runtime expires).
Now it is time for a third, not-boosted task to be moved to
this cpu. The update does not happen again, as there are
already two tasks on the cfs_rq, but check_preempt_wakeup()
runs anyway. It passes the throttled_hierarchy() check, since
throttle_count was never updated, and finds the matching se
between the woken task and rq->curr. Then, as the third
task's runtime is good enough to replace rq->curr,
set_next_buddy() is called. When rq->curr's se depth
is less than the cfs_rq's depth, the cfs_rq's (grand)parent se
is marked as next buddy, though it does not contain
queued children (the cfs_rq is throttled and dequeued).
So, a further pick_next_entity() gets a NULL pointer
on dereference of parent_cfs_rq->rb_leftmost, and the
kernel crashes.
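The bookkeeping bug above can be sketched as a toy model in plain C. Everything here is a simplified assumption: the struct, the ENQUEUE_BOOST value, and the sync_throttle_count() helper merely stand in for the real hierarchy walk in kernel/sched/fair.c. The point it illustrates is that the old path skips throttle_count synchronization entirely on a boosted first enqueue, while the patched shape always synchronizes and skips only the bandwidth check itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit, for illustration only. */
#define ENQUEUE_BOOST 0x1

/* Drastically simplified cfs_rq: just the fields this model needs. */
struct cfs_rq {
	int  nr_running;
	int  throttle_count;   /* cached "ancestors throttled" count */
	bool parent_throttled; /* ground truth about the hierarchy */
};

/* Stand-in for the throttle_count synchronization performed when the
 * first entity is enqueued on a previously empty cfs_rq. */
static void sync_throttle_count(struct cfs_rq *cfs_rq)
{
	cfs_rq->throttle_count = cfs_rq->parent_throttled ? 1 : 0;
}

/* Pre-patch shape: the whole check, synchronization included, is
 * skipped for boosted wakeups, so throttle_count can go stale. */
static void enqueue_old(struct cfs_rq *cfs_rq, int flags)
{
	if (++cfs_rq->nr_running == 1 && !(flags & ENQUEUE_BOOST))
		sync_throttle_count(cfs_rq);
}

/* Patched shape: synchronization always runs; only the throttling
 * decision honours ENQUEUE_BOOST (the early return in the patch). */
static void enqueue_new(struct cfs_rq *cfs_rq, int flags)
{
	if (++cfs_rq->nr_running != 1)
		return;
	sync_throttle_count(cfs_rq);
	if (flags & ENQUEUE_BOOST)
		return;        /* skip only the bandwidth check */
	/* ... quota check / possible throttle would go here ... */
}

/* Stand-in for throttled_hierarchy(): trusts the cached count. */
static bool throttled_hierarchy(struct cfs_rq *cfs_rq)
{
	return cfs_rq->throttle_count > 0;
}
```

In this model, a boosted first enqueue under a throttled parent leaves throttle_count stale at 0 on the old path, so a later throttled_hierarchy() test is wrongly passed; the patched path keeps the count in sync while still exempting boosted wakeups from throttling.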


Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
 kernel/sched/fair.c |   12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)


diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a8cf67c977b..6de2bc3e4b5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3079,7 +3079,7 @@  place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 	se->vruntime = max_vruntime(se->vruntime, vruntime);
-static void check_enqueue_throttle(struct cfs_rq *cfs_rq);
+static void check_enqueue_throttle(struct cfs_rq *cfs_rq, int flags);
 static inline void check_schedstat_required(void)
@@ -3139,8 +3139,7 @@  enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	if (cfs_rq->nr_running == 1) {
-		if (!(flags & ENQUEUE_BOOST))
-			check_enqueue_throttle(cfs_rq);
+		check_enqueue_throttle(cfs_rq, flags);
@@ -3975,7 +3974,7 @@  static void do_sched_cfs_slack_timer(struct cfs_bandwidth *cfs_b)
  * expired/exceeded, otherwise it may be allowed to steal additional ticks of
  * runtime as update_curr() throttling can not trigger until it's on-rq.
-static void check_enqueue_throttle(struct cfs_rq *cfs_rq)
+static void check_enqueue_throttle(struct cfs_rq *cfs_rq, int flags)
@@ -4002,6 +4001,9 @@  static void check_enqueue_throttle(struct cfs_rq *cfs_rq)
+	if (flags & ENQUEUE_BOOST)
+		return;
 	/* an active group must be handled by the update_curr()->put() path */
 	if (!cfs_rq->runtime_enabled || cfs_rq->curr)
@@ -4174,7 +4176,7 @@  static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq)
 static void account_cfs_rq_runtime(struct cfs_rq *cfs_rq, u64 delta_exec) {}
 static void check_cfs_rq_runtime(struct cfs_rq *cfs_rq) {}
-static void check_enqueue_throttle(struct cfs_rq *cfs_rq) {}
+static void check_enqueue_throttle(struct cfs_rq *cfs_rq, int flags) {}
 static inline void sync_throttle(struct task_group *tg, int cpu) {}
 static __always_inline void return_cfs_rq_runtime(struct cfs_rq *cfs_rq) {}