Bug with priority inheritance and condition variables

Submitted by Rich Felker on Oct. 26, 2020, 7:49 p.m.

Message ID 20201026194905.GM534@brightrain.aerifal.cx

On Thu, Sep 24, 2020 at 12:14:07PM -0400, Rich Felker wrote:
> On Thu, Sep 24, 2020 at 03:58:17PM +0100, Edward Scott wrote:
> > Hello,
> > 
> > There appears to be a bug when using priority inheritance in combination
> > with condition variables. I have some code that reproduces the bug:
> > 
> > https://github.com/edward-scott/musl-prio-inherit-cv-bug
> > 
> > Using git bisect I traced the origin of the bug to this commit:
> > 
> > https://git.musl-libc.org/cgit/musl/commit/?id=54ca677983d47529bab8752315ac1a2b49888870
> > 
> > which is the commit that is described as "implement priority inheritance
> > mutexes".
> > 
> > From my analysis it appears that _m_waiters is used by the
> > priority inheritance logic to maintain some state (as described in the
> > commit message) but that conflicts with some use of _m_waiters in the
> > condition variable implementation.
> 
> I think this is an entirely correct analysis. Thanks for catching this!
> 
> > The consequence is that pthread_mutex_lock erroneously returns EDEADLK.
> 
> OK, it took me a second to understand this part, because I thought it
> would be ENOTRECOVERABLE, but that's only for robust+PI mutexes.
> EDEADLK seems to be a consequence of succeeding but returning EBUSY,
> which is "wrong" but should only be able to happen with inconsistent
> state, as produced by pthread_cond_timedwait.
> 
> > I don't understand the code well enough to produce a fix.
> 
> I'll take a look. I'd like to just drop adjusting the waiters count
> here and instead set the bit-31 may-have-waiters flag, but I'm
> not sure that's right for all mutex types. It certainly can be made to
> do that just on PI mutexes if needed but having fewer special cases is
> preferable.
> 
> > The demo code (a cut version of some production code) will reproduce the
> > failure. Commenting out the pthread_mutexattr_setprotocol call in
> > the iot_mutex_init function at the end of the thread.c file will cause the
> > code to work as intended (without priority inheritance). The code works
> > fine either way with glibc (the GNU C library).
> > 
> > BTW, may I recommend that the "magic numbers" used to represent mutex
> > modes be replaced at some point with defined constants, as that would
> > make the code much easier to follow.
> 
> Yes, it's been something I kinda wanted to do, but that would have
> obfuscated and cluttered the actual changes in development when it was
> being done. It might be time to go back and add some now that this
> code is mature.
> 
> > This is my first post to this list so I hope this message is on the right
> > list and is helpful.
> 
> Yep, this is fine. Thanks again!
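
The reported scenario, reduced to a minimal sketch (standard POSIX calls
only; this is not the reporter's actual iot_mutex_init/threadpool code),
is simply a priority-inheritance mutex paired with a condition variable.
On an affected musl the relock after the wait could fail spuriously with
EDEADLK; on glibc, and on musl with the patch below, it runs to
completion:

```c
#define _GNU_SOURCE  /* feature-test macro so PTHREAD_PRIO_INHERIT is visible */
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t mtx;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int ready;

static void *worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&mtx);	/* blocks until main is waiting */
	ready = 1;
	pthread_cond_signal(&cond);
	pthread_mutex_unlock(&mtx);
	return 0;
}

/* Returns 0 on success, or the first pthread error encountered. */
int run_pi_condvar_demo(void)
{
	pthread_mutexattr_t a;
	pthread_t t;
	int r;

	pthread_mutexattr_init(&a);
	/* The call the reporter comments out to make the demo pass: */
	r = pthread_mutexattr_setprotocol(&a, PTHREAD_PRIO_INHERIT);
	if (r) return r;
	pthread_mutex_init(&mtx, &a);

	pthread_mutex_lock(&mtx);
	pthread_create(&t, 0, worker, 0);
	while (!ready) {
		/* With the bug, the internal relock here could report EDEADLK. */
		r = pthread_cond_wait(&cond, &mtx);
		if (r) return r;
	}
	r = pthread_mutex_unlock(&mtx);
	pthread_join(t, 0);
	return r;
}
```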

This took a while to get to, but here's my proposed patch. It drops all
modification of the waiters count in favor of setting the "may have
waiters" flag whenever there's another waiter to be woken. At the time this is
done, the calling thread holds the mutex (except on error re-locking
it, but then the mutex is non-recoverable or else UB occurred), and
setting the flag guarantees it will perform a wake when it eventually
unlocks it.
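
In other words, the waiter-count bookkeeping is replaced with something
like the following standalone model (an illustrative sketch using C11
atomics, not musl's internal a_cas API; `m_lock` stands in for the
mutex's `_m_lock` field, where a positive value means "held by that tid
with no waiter flag set"):

```c
#include <assert.h>
#include <stdatomic.h>

static _Atomic int m_lock;

/* Before handing the mutex to the next waiter, set bit 31 ("may have
 * waiters") on the lock word so that the owner's eventual unlock
 * performs a wake.  The CAS ensures we never clobber a concurrent
 * unlock: if the lock word changed under us, the store is dropped. */
static void set_may_have_waiters(void)
{
	int val = atomic_load(&m_lock);
	if (val > 0)
		atomic_compare_exchange_strong(&m_lock, &val,
			(int)(val | 0x80000000u));
}
```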

With the patch applied, your test program gets further along but still
hangs. I think the problem is the #if 0 block in threadpool.c; with
that changed to #if 1, it runs to completion.

Rich
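
As an aside, the named constants suggested above might look something
like this (illustrative names only, not musl's; the raw values 128 and
0x80000000 are the ones appearing in the diff below):

```c
/* Hypothetical names for two of the magic numbers: _m_type & 128 tests
 * for a priority-inheritance mutex, and bit 31 of _m_lock is the
 * "may have waiters" flag checked at unlock time. */
#define MTX_TYPE_PI            128
#define MTX_FLAG_MAYBE_WAITERS 0x80000000u
```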

Patch

diff --git a/src/thread/pthread_cond_timedwait.c b/src/thread/pthread_cond_timedwait.c
index d1501240..02858f7d 100644
--- a/src/thread/pthread_cond_timedwait.c
+++ b/src/thread/pthread_cond_timedwait.c
@@ -146,14 +146,13 @@  relock:
 
 	if (oldstate == WAITING) goto done;
 
-	if (!node.next) a_inc(&m->_m_waiters);
-
 	/* Unlock the barrier that's holding back the next waiter, and
 	 * either wake it or requeue it to the mutex. */
-	if (node.prev)
+	if (node.prev) {
+		int val = m->_m_lock;
+		if (val>0) a_cas(&m->_m_lock, val, val|0x80000000);
 		unlock_requeue(&node.prev->barrier, &m->_m_lock, m->_m_type & 128);
-	else
-		a_dec(&m->_m_waiters);
+	}
 
 	/* Since a signal was consumed, cancellation is not permitted. */
 	if (e == ECANCELED) e = 0;