Fix the use of sigaltstack to return to the saved main stack.

Submitted by James Y Knight on July 10, 2019, 8:11 p.m.

Details

Message ID CAA2zVHrijyMnrUef5XUqRKX+H0QOwugRmk+gJOgPLvbxJ78=9g@mail.gmail.com
State New
Series "Fix the use of sigaltstack to return to the saved main stack."
Headers show

Commit Message

James Y Knight July 10, 2019, 8:11 p.m.
Updated patches attached.

W.r.t.:
> Here, "set to" is
> probably something the resolution of Austin Group issue 1187 failed to
> fix; it should probably be "includes" rather than "is set to". But I'm
> not sure it makes sense to have any flags set alongside SS_DISABLE
> anyway.

While the SS_AUTODISARM flag has no effect if specified alongside
SS_DISABLE, the kernel still accepts and stores it. So A subsequent call to
sigaltstack can return SS_DISABLE|SS_AUTODISARM in the "old" flags value.
To avoid the case where the old value returned from sigaltstack is not
accepted back as the input, I used the "includes" semantics here.


On Wed, Jul 10, 2019 at 2:39 PM Rich Felker <dalias@libc.org> wrote:

> On Wed, Jul 10, 2019 at 02:04:18PM -0400, James Y Knight wrote:
> > On Tue, Jul 9, 2019 at 3:30 PM Rich Felker <dalias@libc.org> wrote:
>
>

Patch hide | download patch | download mbox

From 313618c8e00df106528e24cabc38379b4783f51e Mon Sep 17 00:00:00 2001
From: James Y Knight <jyknight@google.com>
Date: Tue, 9 Jul 2019 14:31:24 -0400
Subject: [PATCH] Verify that returning to the original stack doesn't return an
 error (e.g. ENOMEM).

---
 src/regression/sigaltstack.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/regression/sigaltstack.c b/src/regression/sigaltstack.c
index bfdc44a..2bfe329 100644
--- a/src/regression/sigaltstack.c
+++ b/src/regression/sigaltstack.c
@@ -30,7 +30,7 @@  static void handler(int sig)
 
 int main(void)
 {
-	stack_t ss;
+	stack_t ss, oldss;
 	struct sigaction sa;
 
 	ss.ss_sp = stack;
@@ -39,7 +39,7 @@  int main(void)
 	sa.sa_handler = handler;
 	sa.sa_flags = SA_ONSTACK;
 
-	T(sigaltstack(&ss, 0));
+	T(sigaltstack(&ss, &oldss));
 	T(sigfillset(&sa.sa_mask));
 	T(sigaction(SIGUSR1, &sa, 0));
 	T(raise(SIGUSR1));
@@ -56,7 +56,7 @@  int main(void)
 		t_error("sigaltstack with bad ss_flags should have failed with EINVAL, "
 			"got %s\n", strerror(errno));
 	errno = 0;
-	T(sigaltstack(0, 0));
+	T(sigaltstack(&oldss, 0));
 
 	return t_status;
 }
-- 
2.22.0.410.gd8fdbe21b5-goog


Comments

Szabolcs Nagy July 10, 2019, 9:23 p.m.
* James Y Knight <jyknight@google.com> [2019-07-10 16:11:23 -0400]:
>  int sigaltstack(const stack_t *restrict ss, stack_t *restrict old)
>  {
> +	// We must check requirements which Linux fails to verify in the syscall
> +	// itself.
>  	if (ss) {
> -		if (ss->ss_size < MINSIGSTKSZ) {
> +		// The syscall does already check against MINSIGSTKSZ, however,
> +		// the kernel's value is smaller than musl's value on some
> +		// architectures. Thus, although this check may appear
> +		// redundant, it is not.

the comment does not make sense to me, the check is obviously
not redundant.

MINSIGSTKSZ is a libc api, has nothing to do with the kernel

the kernel also defines a MINSIGSZTKSZ but musl is an
abstraction layer higher, the linux limit should not be
observable to users, only the limit defined by musl,
which ensures not only that the kernel can deliver a
signal but also reserves space of any current or future
hackery the c runtime may need to do around signal handling,
so that trivial c language signal handler is guaranteed
to work.

this is the only reasonable way to make such limit useful.
if it were only a kernel limit, then application code would
have to guess the libc signal handling overhead and add that
to the MINSIGSZTKSZ when allocating signal stacks.
Rich Felker July 10, 2019, 9:48 p.m.
On Wed, Jul 10, 2019 at 11:23:19PM +0200, Szabolcs Nagy wrote:
> * James Y Knight <jyknight@google.com> [2019-07-10 16:11:23 -0400]:
> >  int sigaltstack(const stack_t *restrict ss, stack_t *restrict old)
> >  {
> > +	// We must check requirements which Linux fails to verify in the syscall
> > +	// itself.
> >  	if (ss) {
> > -		if (ss->ss_size < MINSIGSTKSZ) {
> > +		// The syscall does already check against MINSIGSTKSZ, however,
> > +		// the kernel's value is smaller than musl's value on some
> > +		// architectures. Thus, although this check may appear
> > +		// redundant, it is not.
> 
> the comment does not make sense to me, the check is obviously
> not redundant.

Yes. Also, in musl, we generally document motivations like this as
part of commit messages rather than comments. This ties them to the
timeline of changes, to the author, and prevents them from sticking
around when code changes and they no longer make sense.

James, could you submit this patch just as the minimal change to
correct the current bug? If additional documentation of why things are
the way they are is needed that can be done separately.

> MINSIGSTKSZ is a libc api, has nothing to do with the kernel
> 
> the kernel also defines a MINSIGSZTKSZ but musl is an
> abstraction layer higher, the linux limit should not be
> observable to users, only the limit defined by musl,
> which ensures not only that the kernel can deliver a
> signal but also reserves space of any current or future
> hackery the c runtime may need to do around signal handling,
> so that trivial c language signal handler is guaranteed
> to work.
> 
> this is the only reasonable way to make such limit useful.
> if it were only a kernel limit, then application code would
> have to guess the libc signal handling overhead and add that
> to the MINSIGSZTKSZ when allocating signal stacks.

In this case it's more that the kernel values are just wrong. libc
isn't imposing a stronger limit here because of libc code needing
stack, but because the kernel values don't account for signal frame
size. The kernel values presumably can't be changed because the
syscall interface is stable/locked, and it's risky to change for libc
too after it's set (see the issue with whether the x86 values are
right in the presence of AVX512 -- that's why on later archs we
imposed stronger limits).

Rich
Florian Weimer July 12, 2019, 9:18 a.m.
* Szabolcs Nagy:

> the comment does not make sense to me, the check is obviously
> not redundant.
>
> MINSIGSTKSZ is a libc api, has nothing to do with the kernel
>
> the kernel also defines a MINSIGSZTKSZ but musl is an
> abstraction layer higher, the linux limit should not be
> observable to users, only the limit defined by musl,
> which ensures not only that the kernel can deliver a
> signal but also reserves space of any current or future
> hackery the c runtime may need to do around signal handling,
> so that trivial c language signal handler is guaranteed
> to work.

Please keep in mind that the kernel stack requirements for delivering a
signal vary and tend to increase over time, with newer CPU generations
with larger register files.  It leads to bugs to pretend this value is
constant.

Thanks,
Florian
Rich Felker July 12, 2019, 4:06 p.m.
On Fri, Jul 12, 2019 at 11:18:49AM +0200, Florian Weimer wrote:
> * Szabolcs Nagy:
> 
> > the comment does not make sense to me, the check is obviously
> > not redundant.
> >
> > MINSIGSTKSZ is a libc api, has nothing to do with the kernel
> >
> > the kernel also defines a MINSIGSZTKSZ but musl is an
> > abstraction layer higher, the linux limit should not be
> > observable to users, only the limit defined by musl,
> > which ensures not only that the kernel can deliver a
> > signal but also reserves space of any current or future
> > hackery the c runtime may need to do around signal handling,
> > so that trivial c language signal handler is guaranteed
> > to work.
> 
> Please keep in mind that the kernel stack requirements for delivering a
> signal vary and tend to increase over time, with newer CPU generations
> with larger register files.  It leads to bugs to pretend this value is
> constant.

If that's really going to be the position, then we need to standardize
removal of the macro as mandatory, and replacement by a sysconf. I
think that's a mistake though. Unbounded increase in register file
size destroys the potential for tolerable task-switch performance,
even for tasks in the same process that should switch very fast. My
(probably unpopular) opinion is that kernel just should not support
use of ridiculous huge register file additions like AVX512, or should
do so only for processes tagged as compatible with it (which would be
built with an ABI having a larger value of the constant).

Note that some archs like aarch64 explicitly reserved an upper bound
for future expansion, which is IMO already way too large, but at least
it's a bound.

Rich