Potential deadlock in pthread_kill()

Submitted by Rich Felker on July 6, 2020, 10 p.m.

Details

Message ID 20200706220024.GJ6430@brightrain.aerifal.cx
State New
Series "Potential deadlock in pthread_kill()"
Headers show

Commit Message

Rich Felker July 6, 2020, 10 p.m.
On Tue, Jun 30, 2020 at 02:00:13PM -0700, Hydro Flask wrote:
> On 2020-06-30 12:54, Rich Felker wrote:
> >Note that for fixing this issue, it won't suffice just to make
> >pthread_kill block signals. The other places that use the killlock
> >also need to block signals, to make the lock fully AS-safe.
> 
> Rich did you see the issue I was talking about in the example in my
> previous email? Does it make sense?
> 
> I do not mind submitting a patch for this issue if you don't
> currently have the spare cycles, just let me know if we are in
> agreement in terms of understanding the issue and I can submit a
> patch that blocks "app signals" in every place that uses the
> killlock.
> 
> Probably could also remove the pthread_self() special case that you
> referenced earlier in the thread but that would be a second patch.

Yes, I see it clearly now. Sorry it took a while. I have prepared the
attached patch which I'll push soon if there are no problems.

Rich
From 7cc9496a18c3fa665c286b8be41d790c795e0171 Mon Sep 17 00:00:00 2001
From: Rich Felker <dalias@aerifal.cx>
Date: Mon, 6 Jul 2020 17:56:19 -0400
Subject: [PATCH] make thread killlock async-signal-safe for pthread_kill

pthread_kill is required to be AS-safe. that requirement can't be met
if the target thread's killlock can be taken in contexts where
application-installed signal handlers can run.

block signals around use of this lock in all pthread_* functions which
target a tid, and reorder blocking/unblocking of signals in
pthread_exit so that they're blocked whenever the killlock is held.
---
 src/thread/pthread_create.c        | 11 ++++++-----
 src/thread/pthread_getschedparam.c |  3 +++
 src/thread/pthread_kill.c          |  3 +++
 src/thread/pthread_setschedparam.c |  3 +++
 src/thread/pthread_setschedprio.c  |  3 +++
 5 files changed, 18 insertions(+), 5 deletions(-)

Patch hide | download patch | download mbox

diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
index 6bdfb44f..10f1b7d8 100644
--- a/src/thread/pthread_create.c
+++ b/src/thread/pthread_create.c
@@ -72,12 +72,13 @@  _Noreturn void __pthread_exit(void *result)
 	/* Access to target the exiting thread with syscalls that use
 	 * its kernel tid is controlled by killlock. For detached threads,
 	 * any use past this point would have undefined behavior, but for
-	 * joinable threads it's a valid usage that must be handled. */
+	 * joinable threads it's a valid usage that must be handled.
+	 * Signals must be blocked since pthread_kill must be AS-safe. */
+	__block_app_sigs(&set);
 	LOCK(self->killlock);
 
-	/* The thread list lock must be AS-safe, and thus requires
-	 * application signals to be blocked before it can be taken. */
-	__block_app_sigs(&set);
+	/* The thread list lock must be AS-safe, and thus depends on
+	 * application signals being blocked above. */
 	__tl_lock();
 
 	/* If this is the only thread in the list, don't proceed with
@@ -85,8 +86,8 @@  _Noreturn void __pthread_exit(void *result)
 	 * signal state to prepare for exit to call atexit handlers. */
 	if (self->next == self) {
 		__tl_unlock();
-		__restore_sigs(&set);
 		UNLOCK(self->killlock);
+		__restore_sigs(&set);
 		exit(0);
 	}
 
diff --git a/src/thread/pthread_getschedparam.c b/src/thread/pthread_getschedparam.c
index 1cba073d..c098befb 100644
--- a/src/thread/pthread_getschedparam.c
+++ b/src/thread/pthread_getschedparam.c
@@ -4,6 +4,8 @@ 
 int pthread_getschedparam(pthread_t t, int *restrict policy, struct sched_param *restrict param)
 {
 	int r;
+	sigset_t set;
+	__block_app_sigs(&set);
 	LOCK(t->killlock);
 	if (!t->tid) {
 		r = ESRCH;
@@ -14,5 +16,6 @@  int pthread_getschedparam(pthread_t t, int *restrict policy, struct sched_param
 		}
 	}
 	UNLOCK(t->killlock);
+	__restore_sigs(&set);
 	return r;
 }
diff --git a/src/thread/pthread_kill.c b/src/thread/pthread_kill.c
index 3d9395cb..446254b6 100644
--- a/src/thread/pthread_kill.c
+++ b/src/thread/pthread_kill.c
@@ -4,9 +4,12 @@ 
 int pthread_kill(pthread_t t, int sig)
 {
 	int r;
+	sigset_t set;
+	__block_app_sigs(&set);
 	LOCK(t->killlock);
 	r = t->tid ? -__syscall(SYS_tkill, t->tid, sig)
 		: (sig+0U >= _NSIG ? EINVAL : 0);
 	UNLOCK(t->killlock);
+	__restore_sigs(&set);
 	return r;
 }
diff --git a/src/thread/pthread_setschedparam.c b/src/thread/pthread_setschedparam.c
index 038d13d8..76d4d45a 100644
--- a/src/thread/pthread_setschedparam.c
+++ b/src/thread/pthread_setschedparam.c
@@ -4,8 +4,11 @@ 
 int pthread_setschedparam(pthread_t t, int policy, const struct sched_param *param)
 {
 	int r;
+	sigset_t set;
+	__block_app_sigs(&set);
 	LOCK(t->killlock);
 	r = !t->tid ? ESRCH : -__syscall(SYS_sched_setscheduler, t->tid, policy, param);
 	UNLOCK(t->killlock);
+	__restore_sigs(&set);
 	return r;
 }
diff --git a/src/thread/pthread_setschedprio.c b/src/thread/pthread_setschedprio.c
index 5bf4a019..fc2e13dd 100644
--- a/src/thread/pthread_setschedprio.c
+++ b/src/thread/pthread_setschedprio.c
@@ -4,8 +4,11 @@ 
 int pthread_setschedprio(pthread_t t, int prio)
 {
 	int r;
+	sigset_t set;
+	__block_app_sigs(&set);
 	LOCK(t->killlock);
 	r = !t->tid ? ESRCH : -__syscall(SYS_sched_setparam, t->tid, &prio);
 	UNLOCK(t->killlock);
+	__restore_sigs(&set);
 	return r;
 }

Comments

Hydro Flask July 6, 2020, 10:14 p.m.
On 2020-07-06 15:00, Rich Felker wrote:
> Yes, I see it clearly now. Sorry it took a while. I have prepared the
> attached patch which I'll push soon if there are no problems.

Needs one more tiny tweak. I noticed that pthread_cancel() calls 
pthread_kill(). That means pthread_kill() must be async-cancel-safe. If 
an asynchronous cancellation happens after pthread_kill() grabs the 
killlock, then it will deadlock because the asynchronous 
pthread_exit(PTHREAD_CANCELED) call will then recursively try to grab 
killlock.

The solution as far as I can tell is to not just block app signals when 
grabbing killlock, but all signals.
Rich Felker July 6, 2020, 10:22 p.m.
On Mon, Jul 06, 2020 at 03:14:43PM -0700, Hydro Flask wrote:
> On 2020-07-06 15:00, Rich Felker wrote:
> >Yes, I see it clearly now. Sorry it took a while. I have prepared the
> >attached patch which I'll push soon if there are no problems.
> 
> Needs one more tiny tweak. I noticed that pthread_cancel() calls
> pthread_kill(). That means pthread_kill() must be async-cancel-safe.
> If an asynchronous cancellation happens after pthread_kill() grabs
> the killlock, then it will deadlock because the asynchronous
> pthread_exit(PTHREAD_CANCELED) call will then recursively try to
> grab killlock.
> 
> The solution as far as I can tell is to not just block app signals
> when grabbing killlock, but all signals.

Indeed. It'd also work to disable async cancellation for the duration
of the pthread_cancel call, but that's almost surely more work.

Are you in agreement that it suffices for only pthread_kill to block
all signals? (Still blocking just app signals everywhere else) The
scheduling functions could be changed too, but I'm hesitant to change
pthread_exit without thinking about it further since it has a lot more
subtleties. And I think only pthread_kill needs it since it's the only
one that needs to be AC-safe.

Rich
Hydro Flask July 6, 2020, 10:37 p.m.
On 2020-07-06 15:22, Rich Felker wrote:
> On Mon, Jul 06, 2020 at 03:14:43PM -0700, Hydro Flask wrote:
>> On 2020-07-06 15:00, Rich Felker wrote:
>> >Yes, I see it clearly now. Sorry it took a while. I have prepared the
>> >attached patch which I'll push soon if there are no problems.
>> 
>> Needs one more tiny tweak. I noticed that pthread_cancel() calls
>> pthread_kill(). That means pthread_kill() must be async-cancel-safe.
>> If an asynchronous cancellation happens after pthread_kill() grabs
>> the killlock, then it will deadlock because the asynchronous
>> pthread_exit(PTHREAD_CANCELED) call will then recursively try to
>> grab killlock.
>> 
>> The solution as far as I can tell is to not just block app signals
>> when grabbing killlock, but all signals.
> 
> Are you in agreement that it suffices for only pthread_kill to block
> all signals? (Still blocking just app signals everywhere else) The
> scheduling functions could be changed too, but I'm hesitant to change
> pthread_exit without thinking about it further since it has a lot more
> subtleties. And I think only pthread_kill needs it since it's the only
> one that needs to be AC-safe.

Yes I think that's right, since every other function that grabs the 
killlock can assume that async cancellation won't happen.