Subject: [PATCH] pthread: Fix bug that pthread_create may cause priority inversion

Submitted by zhaohang (F) on Sept. 9, 2019, 1:57 p.m.

Details

Message ID 59FB1E003EF3A943BD6BAD197ABD4D6A2B5D55@dggemi524-mbx.china.huawei.com
State New
Series "Subject: [PATCH] pthread: Fix bug that pthread_create may cause priority inversion"
Headers show

Commit Message

zhaohang (F) Sept. 9, 2019, 1:57 p.m.
Thanks for your reply. Maybe this patch may work by switching to having the new thread just wait for the parent to tell it whether priority setup succeeded.

Subject: [PATCH] pthread: Fix bug that pthread_create may cause priority
 inversion

--------

In pthread_create, caller wait for new thread to set its prio, and
then new thread wake caller up. It may cause priority inversion when
caller waits for new thread and the new thread waits for another thread
whose prio is lower than caller.

Just let caller set prio and affinity of new thread to fix it.

Signed-off-by: Zhao Hang <zhaohang14@huawei.com>

---
 src/thread/pthread_create.c | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

--
2.7.4

-----邮件原件-----
发件人: Rich Felker [mailto:dalias@aerifal.cx] 代表 Rich Felker
发送时间: 2019年9月5日 21:34
收件人: zhaohang (F) <zhaohang14@huawei.com>
抄送: musl@lists.openwall.com
主题: Re: [musl] src/thread/pthread_create: Why prio of child thread is set by himself

On Thu, Sep 05, 2019 at 02:14:36AM +0000, zhaohang (F) wrote:
> In the function pthread_create, father thread will wait child if 

> attr._a_sched is set, after SYS_clone is finished.Child thread will 

> set his prio in entry 'start', and then wake father thread to 

> continue.

> 

> But consider this kind of situation, there are three threads: A with 

> prio 51, B with prio 30, and C with prio 20 created by A, and there is 

> only simplest sched policy 'FIFO'.

> 

> When system starts, A is running because A is higher than B, then A 

> uses pthread_create to create C. After C is cloned, A wait for C to 

> set prio and wake him up, but after C set his prio to 20, B will be 

> sched. And if B won't exit, A and C will never get sched, even if A is 

> higher than B. Maybe this is a kind of priority inversion.

> 

> So why prio of child is set by himself rather than father? If prio of 

> child is set by father, something will go wrong? Or other 

> considerations?


I think you're correct in your analysis of this problem; I'm going to look at it more in a bit to make sure.

Originally, pthread_create (in the caller) was responsible for setting priority; this changed in b8742f32602add243ee2ce74d804015463726899 and 40bae2d32fd6f3ffea437fa745ad38a1fe77b27e as part of trying to trim down the pthread structure and get init-time-only junk out of it.
However, 04335d9260c076cf4d9264bd93dd3b06c237a639 largely undid that already, and moved the extra start args to a struct on the new thread's stack so that it doesn't contribute to size/clutter in struct pthread. It should be easy to switch back to having the new thread just wait for the parent to tell it whether priority setup succeeded.

One related issue this also turned up is that exiting in detached state is probably a bad idea. Depending on priorities, the thread that failed to start could linger for a long time after pthread_create returns, potentially causing spurious transient resource exhaustion with no way to wait for it to subside. At some point we should probably switch from forcing detached exit to forcing joinable (or equivalent; forcing linking of pthread_join code is somewhat
undesirable) exit so that when a failed pthread_create returns it's not consuming any kernel task resources.

Thanks for the report.

Rich

Patch hide | download patch | download mbox

diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
index 7d4dc2e..ae08c0f 100644
--- a/src/thread/pthread_create.c
+++ b/src/thread/pthread_create.c
@@ -181,15 +181,8 @@  static int start(void *p)
 {
        struct start_args *args = p;
        if (args->attr) {
-               pthread_t self = __pthread_self();
-               int ret = -__syscall(SYS_sched_setscheduler, self->tid,
-                       args->attr->_a_policy, &args->attr->_a_prio);
-               if (a_swap(args->perr, ret)==-2)
-                       __wake(args->perr, 1, 1);
-               if (ret) {
-                       self->detach_state = DT_DETACHED;
-                       __pthread_exit(0);
-               }
+               if (a_cas(args->perr, -1, -2) == -1)
+                       __wait(args->perr, 0, -2, 1);
        }
        __syscall(SYS_rt_sigprocmask, SIG_SETMASK, &args->sig_mask, 0, _NSIG/8);
        __pthread_exit(args->start_func(args->start_arg));
@@ -367,10 +360,14 @@  int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att
        }

        if (attr._a_sched) {
-               if (a_cas(&err, -1, -2)==-1)
-                       __wait(&err, 0, -2, 1);
-               ret = err;
-               if (ret) return ret;
+               ret = -__syscall(SYS_sched_setscheduler, new->tid, attr._a_policy, &attr._a_prio);
+               if (ret) {
+                       new->detach_state = DT_DETACHED;
+                       pthread_cancel(new);
+                       return ret;
+               }
+               if (a_swap(&err, 0) == -2)
+                       __wake(&err, 1, 1);
        }

        *res = new;

Comments

Szabolcs Nagy Sept. 9, 2019, 2:54 p.m.
* zhaohang (F) <zhaohang14@huawei.com> [2019-09-09 13:57:36 +0000]:
> diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
> index 7d4dc2e..ae08c0f 100644
> --- a/src/thread/pthread_create.c
> +++ b/src/thread/pthread_create.c
> @@ -181,15 +181,8 @@ static int start(void *p)
>  {
>         struct start_args *args = p;
>         if (args->attr) {
> -               pthread_t self = __pthread_self();
> -               int ret = -__syscall(SYS_sched_setscheduler, self->tid,
> -                       args->attr->_a_policy, &args->attr->_a_prio);
> -               if (a_swap(args->perr, ret)==-2)
> -                       __wake(args->perr, 1, 1);
> -               if (ret) {
> -                       self->detach_state = DT_DETACHED;
> -                       __pthread_exit(0);
> -               }
> +               if (a_cas(args->perr, -1, -2) == -1)
> +                       __wait(args->perr, 0, -2, 1);
>         }
>         __syscall(SYS_rt_sigprocmask, SIG_SETMASK, &args->sig_mask, 0, _NSIG/8);
>         __pthread_exit(args->start_func(args->start_arg));
> @@ -367,10 +360,14 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att
>         }
> 
>         if (attr._a_sched) {
> -               if (a_cas(&err, -1, -2)==-1)
> -                       __wait(&err, 0, -2, 1);
> -               ret = err;
> -               if (ret) return ret;
> +               ret = -__syscall(SYS_sched_setscheduler, new->tid, attr._a_policy, &attr._a_prio);
> +               if (ret) {
> +                       new->detach_state = DT_DETACHED;
> +                       pthread_cancel(new);
> +                       return ret;

the child has the cancel signal blocked so it will never act on the signal.

but even if that's fixed, the detached child may not get scheduled to
handle the signal for a long time and will take up stack/tid resources.

i think Rich already has a solution that will deal with these issues.

> +               }
> +               if (a_swap(&err, 0) == -2)
> +                       __wake(&err, 1, 1);
>         }
> 
>         *res = new;