sem_wait and EINTR

Submitted by Markus Wichmann on Dec. 6, 2018, 5:33 p.m.

Details

Message ID 20181206173337.GD32233@voyager
State New
Series "sem_wait and EINTR"
Headers show

Commit Message

Markus Wichmann Dec. 6, 2018, 5:33 p.m.
Hi all,

is the attached patch acceptable? A word about the bitfields: I
generally dislike them for most things, but I didn't want to destroy the
alignment struct __libc had going on, and these other flags really only
are 0 or 1.

Patch is untested for want of an old kernel.

Ciao,
Markus

Patch hide | download patch | download mbox

From 816abf54fbbb02923331b69b62333b8a0edb4181 Mon Sep 17 00:00:00 2001
From: Markus Wichmann <nullplan@gmx.net>
Date: Thu, 6 Dec 2018 18:30:26 +0100
Subject: [PATCH 3/3] Add workaround for old linux bug.

---
 src/internal/libc.h        | 4 +---
 src/signal/sigaction.c     | 2 ++
 src/thread/sem_timedwait.c | 5 ++++-
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/src/internal/libc.h b/src/internal/libc.h
index ac97dc7e..2e6737d9 100644
--- a/src/internal/libc.h
+++ b/src/internal/libc.h
@@ -18,9 +18,7 @@  struct tls_module {
 };
 
 struct __libc {
-	int can_do_threads;
-	int threaded;
-	int secure;
+	unsigned can_do_threads:1, threaded:1, secure:1, handling_sigs:1;
 	volatile int threads_minus_1;
 	size_t *auxv;
 	struct tls_module *tls_head;
diff --git a/src/signal/sigaction.c b/src/signal/sigaction.c
index af47195e..5f02b99b 100644
--- a/src/signal/sigaction.c
+++ b/src/signal/sigaction.c
@@ -43,6 +43,8 @@  int __libc_sigaction(int sig, const struct sigaction *restrict sa, struct sigact
 					SIGPT_SET, 0, _NSIG/8);
 				unmask_done = 1;
 			}
+			if (!(sa->sa_flags & SA_RESTART))
+				libc.handling_sigs = 1;
 		}
 		/* Changing the disposition of SIGABRT to anything but
 		 * SIG_DFL requires a lock, so that it cannot be changed
diff --git a/src/thread/sem_timedwait.c b/src/thread/sem_timedwait.c
index 8132eb1b..e76ae9de 100644
--- a/src/thread/sem_timedwait.c
+++ b/src/thread/sem_timedwait.c
@@ -22,7 +22,10 @@  int sem_timedwait(sem_t *restrict sem, const struct timespec *restrict at)
 		pthread_cleanup_push(cleanup, (void *)(sem->__val+1));
 		r = __timedwait_cp(sem->__val, -1, CLOCK_REALTIME, at, sem->__val[2]);
 		pthread_cleanup_pop(1);
-		if (r && r != EINTR) {
+                /* Linux pre-2.6.22 bug: Sometimes SYS_futex returns with EINTR when it should not.
+                 * Workaround: Retry on EINTR unless someone installed handlers before.
+                 */
+		if (r && (r != EINTR || libc.handling_sigs)) {
 			errno = r;
 			return -1;
 		}
-- 
2.19.1


Comments

Rich Felker Dec. 9, 2018, 2:51 a.m.
On Thu, Dec 06, 2018 at 06:33:37PM +0100, Markus Wichmann wrote:
> Hi all,
> 
> is the attached patch acceptable? A word about the bitfields: I
> generally dislike them for most things, but I didn't want to destroy the
> alignment struct __libc had going on, and these other flags really only
> are 0 or 1.
> 
> Patch is untested for want of an old kernel.
> 
> Ciao,
> Markus

> >From 816abf54fbbb02923331b69b62333b8a0edb4181 Mon Sep 17 00:00:00 2001
> From: Markus Wichmann <nullplan@gmx.net>
> Date: Thu, 6 Dec 2018 18:30:26 +0100
> Subject: [PATCH 3/3] Add workaround for old linux bug.
> 
> ---
>  src/internal/libc.h        | 4 +---
>  src/signal/sigaction.c     | 2 ++
>  src/thread/sem_timedwait.c | 5 ++++-
>  3 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/src/internal/libc.h b/src/internal/libc.h
> index ac97dc7e..2e6737d9 100644
> --- a/src/internal/libc.h
> +++ b/src/internal/libc.h
> @@ -18,9 +18,7 @@ struct tls_module {
>  };
>  
>  struct __libc {
> -	int can_do_threads;
> -	int threaded;
> -	int secure;
> +	unsigned can_do_threads:1, threaded:1, secure:1, handling_sigs:1;
>  	volatile int threads_minus_1;
>  	size_t *auxv;
>  	struct tls_module *tls_head;
> diff --git a/src/signal/sigaction.c b/src/signal/sigaction.c
> index af47195e..5f02b99b 100644
> --- a/src/signal/sigaction.c
> +++ b/src/signal/sigaction.c
> @@ -43,6 +43,8 @@ int __libc_sigaction(int sig, const struct sigaction *restrict sa, struct sigact
>  					SIGPT_SET, 0, _NSIG/8);
>  				unmask_done = 1;
>  			}
> +			if (!(sa->sa_flags & SA_RESTART))
> +				libc.handling_sigs = 1;
>  		}
>  		/* Changing the disposition of SIGABRT to anything but
>  		 * SIG_DFL requires a lock, so that it cannot be changed
> diff --git a/src/thread/sem_timedwait.c b/src/thread/sem_timedwait.c
> index 8132eb1b..e76ae9de 100644
> --- a/src/thread/sem_timedwait.c
> +++ b/src/thread/sem_timedwait.c
> @@ -22,7 +22,10 @@ int sem_timedwait(sem_t *restrict sem, const struct timespec *restrict at)
>  		pthread_cleanup_push(cleanup, (void *)(sem->__val+1));
>  		r = __timedwait_cp(sem->__val, -1, CLOCK_REALTIME, at, sem->__val[2]);
>  		pthread_cleanup_pop(1);
> -		if (r && r != EINTR) {
> +                /* Linux pre-2.6.22 bug: Sometimes SYS_futex returns with EINTR when it should not.
> +                 * Workaround: Retry on EINTR unless someone installed handlers before.
> +                 */
> +		if (r && (r != EINTR || libc.handling_sigs)) {
>  			errno = r;
>  			return -1;
>  		}
> -- 
> 2.19.1
> 

Conceptually the logic looks correct, but I'm trying to phase out the
__libc structure, and more importantly, introducing a bitfield here is
not safe unless access to all bitfield members is controlled by a
common mutex or other synchronization mechanism. There is a data race
if any access to the other fields is accessed when sigaction is
callable from another thread, and the impact of a data race on any of
these is critical.

Unfortunately, even without the bitfield, there is a data race between
access to the new flag from sem_timedwait and from sigaction. In
theory, this potentially results in a race window where sem_wait
retries on EINTR because it doesn't yet see the updated handling_sigs.

In reality, I don't think that happens -- at least not on kernels
without the bug -- because the store to handling_sigs is sequenced
before sigaction, which is sequenced before the EINTR-generating
signal delivery, which is sequenced before the load in sig_timedwait,
with memory barriers (necessarily from an implementation standpoint)
imposed by whatever kernel mechanism delivers the signal. However
there is still a race between concurrent/unsequenced sigaction calls.

The "right" fix would be to use an AS-safe lock to protect the
handling_sigs object, or make it atomic (a_store on the sigaction
side, a_barrier before load on the sem side). Alternatively we could
assume the above reasoning about sequencing of sigaction before EINTR
and just use a lock on the sigaction side, but the atomic is probably
simplest and "safer".

A couple other small nits: comments especially should not exceed 80
columns (preferably <75 or so) assuming tab width 8; code should avoid
it too but I see it slightly does in a few places there. Spaces should
not be used for indention. And the commit message should reflect the
change made; it's not adding a workaround, but reducing the scope of a
previous workaround (which should be cited by commit id) to fix a
conformance bug. The name "handling_sigs" is also a bit misleading;
what it's indicating is that interrupting signal handlers are or have
been present.

I'm happy to fix up these issues and commit if there aren't mistakes
in the above comments that still need to be addressed.

Rich
Markus Wichmann Dec. 9, 2018, 6:50 a.m.
On Sat, Dec 08, 2018 at 09:51:40PM -0500, Rich Felker wrote:
> Conceptually the logic looks correct, but I'm trying to phase out the
> __libc structure,

Oh? Well, I suppose it does offer no benefit the linker doesn't already
give you. I suppose we could make the flag a static int in sigaction(),
with a hidden getter function.

> and more importantly, introducing a bitfield here is
> not safe unless access to all bitfield members is controlled by a
> common mutex or other synchronization mechanism. There is a data race
> if any access to the other fields is accessed when sigaction is
> callable from another thread, and the impact of a data race on any of
> these is critical.
> 

The flags can_do_thrads, threaded, and secure are all set before the
program becomes multi-threaded and they keep their values throughout the
rest of the program. So the only accesses are read accesses.

handling_sigs on the other hand might be written by multiple threads at
the same time, but to the same value. Can that ever be an issue?

Wait... I think I figured it out. On some architectures you can have
incoherent caches. Threads need to synchronize to access shared
ressources, and those synchronization functions take care of that
problem, but I circumvented them. Now, if the caches are set to
write-back mode (which is common), then writing to handling_sigs only
writes to the cache line for the address of handling_sigs, but not to
main memory. So another thread on another core will not see the update.

Worse: If by a bad roll of the dice the writing thread's cache line is
evicted before the reading thread's cache line is (it might have written
somewhere else in that cache line), that latter eviction will overwrite
handling_sigs back to 0.

Am I close?

> Unfortunately, even without the bitfield, there is a data race between
> access to the new flag from sem_timedwait and from sigaction. In
> theory, this potentially results in a race window where sem_wait
> retries on EINTR because it doesn't yet see the updated handling_sigs.
> 

In that case, the sem_wait() and the sigaction() would be unserialized,
and any correct program cannot depend on the order of operations. In
particular, the caller if the sem_wait() can't depend on the sigaction()
having installed the signal handler yet.

> The "right" fix would be to use an AS-safe lock to protect the
> handling_sigs object, or make it atomic (a_store on the sigaction
> side, a_barrier before load on the sem side). Alternatively we could
> assume the above reasoning about sequencing of sigaction before EINTR
> and just use a lock on the sigaction side, but the atomic is probably
> simplest and "safer".
> 

Plus, a lock for a single int that can only ever go from 0 to 1 would be
a bit overkill.

> A couple other small nits: comments especially should not exceed 80
> columns (preferably <75 or so) assuming tab width 8; code should avoid
> it too but I see it slightly does in a few places there. Spaces should
> not be used for indention.

Ah crap. I forgot the comment the first time I was in the file, and when
I started vim again, I forgot to re-apply the musl settings.

> And the commit message should reflect the
> change made; it's not adding a workaround, but reducing the scope of a
> previous workaround (which should be cited by commit id) to fix a
> conformance bug. The name "handling_sigs" is also a bit misleading;
> what it's indicating is that interrupting signal handlers are or have
> been present.
> 
> I'm happy to fix up these issues and commit if there aren't mistakes
> in the above comments that still need to be addressed.
> 
> Rich

Well, it certainly was a learning experience for me. Just goes to show
that you never learn as much as when you're making a mistake and have to
figure it out.

Ciao,
Markus
Rich Felker Dec. 12, 2018, 12:32 a.m.
On Sun, Dec 09, 2018 at 07:50:33AM +0100, Markus Wichmann wrote:
> On Sat, Dec 08, 2018 at 09:51:40PM -0500, Rich Felker wrote:
> > Conceptually the logic looks correct, but I'm trying to phase out the
> > __libc structure,
> 
> Oh? Well, I suppose it does offer no benefit the linker doesn't already
> give you. I suppose we could make the flag a static int in sigaction(),
> with a hidden getter function.

For the record, the original purpose of the __libc structure was, in
the days of the first-gen dynamic linker, making it possible to access
the members in a pure PC-relative manner before relocations had taken
place, and without assuming the toolchain supported visibility. In the
fallback case, libc was #defined like errno, as (*__libc_location())
or similar, and the struct was static in the file where that function
was defined. It later got used for other globals we wanted to make
cheap to access, and, to some extent, to micro-optimize the placement
of atomics relative to cache lines in ways that didn't actually
matter.

Now that we just use hidden visibility for internal things, access is
just as efficient without putting them in the special hidden-vis libc
structure, and static linking is more efficient since the linker can
avoid pulling in members that don't actually get used.

> > and more importantly, introducing a bitfield here is
> > not safe unless access to all bitfield members is controlled by a
> > common mutex or other synchronization mechanism. There is a data race
> > if any access to the other fields is accessed when sigaction is
> > callable from another thread, and the impact of a data race on any of
> > these is critical.
> > 
> 
> The flags can_do_thrads, threaded, and secure are all set before the
> program becomes multi-threaded and they keep their values throughout the
> rest of the program. So the only accesses are read accesses.
> 
> handling_sigs on the other hand might be written by multiple threads at
> the same time, but to the same value. Can that ever be an issue?
> 
> Wait... I think I figured it out. On some architectures you can have
> incoherent caches. Threads need to synchronize to access shared
> ressources, and those synchronization functions take care of that
> problem, but I circumvented them. Now, if the caches are set to
> write-back mode (which is common), then writing to handling_sigs only
> writes to the cache line for the address of handling_sigs, but not to
> main memory. So another thread on another core will not see the update.
> 
> Worse: If by a bad roll of the dice the writing thread's cache line is
> evicted before the reading thread's cache line is (it might have written
> somewhere else in that cache line), that latter eviction will overwrite
> handling_sigs back to 0.
> 
> Am I close?

There might or might not be mechanisms on very weakly-ordered machines
by which this could blow up, but the reasoning here is more simple and
high-level: we just don't assume properties about the machine's memory
ordering properties, and instead treat data races as the undefined
behavior they are, with the assumption that "anything could happen".

> > Unfortunately, even without the bitfield, there is a data race between
> > access to the new flag from sem_timedwait and from sigaction. In
> > theory, this potentially results in a race window where sem_wait
> > retries on EINTR because it doesn't yet see the updated handling_sigs.
> 
> In that case, the sem_wait() and the sigaction() would be unserialized,
> and any correct program cannot depend on the order of operations. In
> particular, the caller if the sem_wait() can't depend on the sigaction()
> having installed the signal handler yet.

Consider the case where thread A is blocked in sem_wait, and thread B
installs an interrupting signal handler then calls pthread_kill to
send the signal to thread A. Despite any operations that synchronize
memory, since the pthread_kill is ordered after the sigaction in
thread B, it seems a requirement that the signal generate EINTR. There
necessarily is some underlying synchronization in kernelspace that
would ensure this even without any effort by userspace, but it's not
part of the abstract model I'm working from. We could just assume it's
there or add a_barrier() before the check and make it explicit.

> > The "right" fix would be to use an AS-safe lock to protect the
> > handling_sigs object, or make it atomic (a_store on the sigaction
> > side, a_barrier before load on the sem side). Alternatively we could
> > assume the above reasoning about sequencing of sigaction before EINTR
> > and just use a lock on the sigaction side, but the atomic is probably
> > simplest and "safer".
> > 
> 
> Plus, a lock for a single int that can only ever go from 0 to 1 would be
> a bit overkill.

It's not a big deal, but I think a_store would work just as well and
be simpler.

> > A couple other small nits: comments especially should not exceed 80
> > columns (preferably <75 or so) assuming tab width 8; code should avoid
> > it too but I see it slightly does in a few places there. Spaces should
> > not be used for indention.
> 
> Ah crap. I forgot the comment the first time I was in the file, and when
> I started vim again, I forgot to re-apply the musl settings.
> 
> > And the commit message should reflect the
> > change made; it's not adding a workaround, but reducing the scope of a
> > previous workaround (which should be cited by commit id) to fix a
> > conformance bug. The name "handling_sigs" is also a bit misleading;
> > what it's indicating is that interrupting signal handlers are or have
> > been present.
> > 
> > I'm happy to fix up these issues and commit if there aren't mistakes
> > in the above comments that still need to be addressed.
> 
> Well, it certainly was a learning experience for me. Just goes to show
> that you never learn as much as when you're making a mistake and have to
> figure it out.

One other thought: would it be preferable for the EINTR suppression in
the absence of interruptible signal handlers to be in __timedwait
rather than sem_timedwait? Then any code using __timedwait would
benefit from it. I'm not sure if there are other callers where it
would help but it wouldn't hurt either.

Rich
Markus Wichmann Dec. 12, 2018, 5:15 a.m.
On Tue, Dec 11, 2018 at 07:32:38PM -0500, Rich Felker wrote:
> One other thought: would it be preferable for the EINTR suppression in
> the absence of interruptible signal handlers to be in __timedwait
> rather than sem_timedwait? Then any code using __timedwait would
> benefit from it. I'm not sure if there are other callers where it
> would help but it wouldn't hurt either.
> 
> Rich

Nope, that would not help. I had a look at all users of SYS_futex that
might be impacted by the kernel bug you mentioned (and followed them
back to the public interfaces that use them). Only __timedwait does
anything with the return value at all, and of all the users of
__timedwait(), sem_timedwait() is the only function even specified to
return EINTR. All others are specified to *never* return EINTR. At least
according to the manpages I have here (manpages-posix).

So only sem_timedwait() needs this patch. For the other users it would
hurt conformance.

Ciao,
Markus
Rich Felker Dec. 14, 2018, 7:45 p.m.
On Wed, Dec 12, 2018 at 06:15:59AM +0100, Markus Wichmann wrote:
> On Tue, Dec 11, 2018 at 07:32:38PM -0500, Rich Felker wrote:
> > One other thought: would it be preferable for the EINTR suppression in
> > the absence of interruptible signal handlers to be in __timedwait
> > rather than sem_timedwait? Then any code using __timedwait would
> > benefit from it. I'm not sure if there are other callers where it
> > would help but it wouldn't hurt either.
> 
> Nope, that would not help. I had a look at all users of SYS_futex that

Perhaps help was the wrong word; I think you're right that there's
nowhere else it matters and that all other callers already ignore
EINTR unconditionally because they're supposed to. The only plausible
improvement is avoiding spurious dec/inc cycle on the waiter count in
some places. On the other hand it might be a nicer factorization (less
ugly and linux-bug-specific logic in high level code, i.e.
sem_timedwait) if the workaround were buried in low-level stuff
(__timedwait).

x> might be impacted by the kernel bug you mentioned (and followed them
> back to the public interfaces that use them). Only __timedwait does
> anything with the return value at all, and of all the users of
> __timedwait(), sem_timedwait() is the only function even specified to
> return EINTR. All others are specified to *never* return EINTR. At least
> according to the manpages I have here (manpages-posix).
> 
> So only sem_timedwait() needs this patch. For the other users it would
> hurt conformance.

I don't see how it could hurt conformance.

Rich
Markus Wichmann Dec. 15, 2018, 9:45 a.m.
On Fri, Dec 14, 2018 at 02:45:16PM -0500, Rich Felker wrote:
> Perhaps help was the wrong word; I think you're right that there's
> nowhere else it matters and that all other callers already ignore
> EINTR unconditionally because they're supposed to. The only plausible
> improvement is avoiding spurious dec/inc cycle on the waiter count in
> some places. On the other hand it might be a nicer factorization (less
> ugly and linux-bug-specific logic in high level code, i.e.
> sem_timedwait) if the workaround were buried in low-level stuff
> (__timedwait).
> 
> [...]
> 
> I don't see how it could hurt conformance.
> 
> Rich

I was misunderstanding you. I thought you were about to put the
maybe-retry on EINTR into __timedwait() and then remove the
corresponding check from all the users of __timedwait(). But apparently
you want leave the users relatively unmolested. Yeah, that sounds better
than what I pictured. Go right ahead, I say.

Ciao,
Markus