fix tls offsets when p_vaddr%p_align != 0 for TLS_ABOVE_TP

Submitted by Szabolcs Nagy on May 14, 2019, 2:01 a.m.

Details

Message ID 20190514020131.GC16415@port70.net
State New
Series "fix tls offsets when p_vaddr%p_align != 0 for TLS_ABOVE_TP"
Headers show

Commit Message

Szabolcs Nagy May 14, 2019, 2:01 a.m.
this came up because lld changed tls alignment on aarch64 as a
workaround for a bionic abi issue https://reviews.llvm.org/D53906
but lld does not handle p_vaddr%p_align!=0 right so it broke on glibc
https://reviews.llvm.org/D61824

the patch is untested (bfd linker cannot seem to create problematic
elf objects), but at least there are no regressions with libc-test.

Patch hide | download patch | download mbox

From 8c94fcbc9faeb8b07132506757c3d3973652420e Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <nsz@port70.net>
Date: Mon, 13 May 2019 18:47:11 +0000
Subject: [PATCH] fix tls offsets when p_vaddr%p_align != 0 for TLS_ABOVE_TP

currently the bfd linker does not seem to create tls segments where
p_vaddr%p_align != 0, but this is valid in ELF and then the runtime
computed tls offset must satisfy

  offset%p_align == (base+p_vaddr)%p_align

and in case of local exec tls (main executable) the smallest such
offset must be used (otherwise it is incompatible with the offset
computed by the static linker). the !TLS_ABOVE_TP case handled this
correctly (the offset is negative then in the formula).

the ldso code for TLS_ABOVE_TP is changed so the static tls offset
of each module satisfies the formula and tls_offset always points
to the end of the allocated static tls area (and not aligned up to
tls_align or MIN_TLS_ALIGN). the tls_offset computation was wrong
when multiple modules were loaded with static tls and in some the
tls segment p_memsz%p_align != 0.
---
 ldso/dynlink.c       | 13 ++++++-------
 src/env/__init_tls.c |  3 ++-
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/ldso/dynlink.c b/ldso/dynlink.c
index 42a5470d..6dc39483 100644
--- a/ldso/dynlink.c
+++ b/ldso/dynlink.c
@@ -1126,9 +1126,9 @@  static struct dso *load_library(const char *name, struct dso *needed_by)
 		p->tls_id = ++tls_cnt;
 		tls_align = MAXP2(tls_align, p->tls.align);
 #ifdef TLS_ABOVE_TP
-		p->tls.offset = tls_offset + ( (tls_align-1) &
-			-(tls_offset + (uintptr_t)p->tls.image) );
-		tls_offset += p->tls.size;
+		p->tls.offset = tls_offset + ( (p->tls.align-1) &
+			(-tls_offset + (uintptr_t)p->tls.image) );
+		tls_offset = p->tls.offset + p->tls.size;
 #else
 		tls_offset += p->tls.size + p->tls.align - 1;
 		tls_offset -= (tls_offset + (uintptr_t)p->tls.image)
@@ -1797,10 +1797,9 @@  _Noreturn void __dls3(size_t *sp)
 		app.tls_id = tls_cnt = 1;
 #ifdef TLS_ABOVE_TP
 		app.tls.offset = GAP_ABOVE_TP;
-		app.tls.offset += -GAP_ABOVE_TP & (app.tls.align-1);
-		tls_offset = app.tls.offset + app.tls.size
-			+ ( -((uintptr_t)app.tls.image + app.tls.size)
-			& (app.tls.align-1) );
+		app.tls.offset += (-GAP_ABOVE_TP + (uintptr_t)app.tls.image)
+			& (app.tls.align-1);
+		tls_offset = app.tls.offset + app.tls.size;
 #else
 		tls_offset = app.tls.offset = app.tls.size
 			+ ( -((uintptr_t)app.tls.image + app.tls.size)
diff --git a/src/env/__init_tls.c b/src/env/__init_tls.c
index 5f12500c..772baba3 100644
--- a/src/env/__init_tls.c
+++ b/src/env/__init_tls.c
@@ -115,7 +115,8 @@  static void static_init_tls(size_t *aux)
 		& (main_tls.align-1);
 #ifdef TLS_ABOVE_TP
 	main_tls.offset = GAP_ABOVE_TP;
-	main_tls.offset += -GAP_ABOVE_TP & (main_tls.align-1);
+	main_tls.offset += (-GAP_ABOVE_TP + (uintptr_t)main_tls.image)
+		& (main_tls.align-1);
 #else
 	main_tls.offset = main_tls.size;
 #endif
-- 
2.21.0


Comments

Rich Felker May 16, 2019, 12:20 a.m.
On Tue, May 14, 2019 at 04:01:31AM +0200, Szabolcs Nagy wrote:
> this came up because lld changed tls alignment on aarch64 as a
> workaround for a bionic abi issue https://reviews.llvm.org/D53906
> but lld does not handle p_vaddr%p_align!=0 right so it broke on glibc
> https://reviews.llvm.org/D61824
> 
> the patch is untested (bfd linker cannot seem to create problematic
> elf objects), but at least there are no regressions with libc-test.

> >From 8c94fcbc9faeb8b07132506757c3d3973652420e Mon Sep 17 00:00:00 2001
> From: Szabolcs Nagy <nsz@port70.net>
> Date: Mon, 13 May 2019 18:47:11 +0000
> Subject: [PATCH] fix tls offsets when p_vaddr%p_align != 0 for TLS_ABOVE_TP
> 
> currently the bfd linker does not seem to create tls segments where
> p_vaddr%p_align != 0, but this is valid in ELF and then the runtime
> computed tls offset must satisfy
> 
>   offset%p_align == (base+p_vaddr)%p_align
> 
> and in case of local exec tls (main executable) the smallest such
> offset must be used (otherwise it is incompatible with the offset
> computed by the static linker). the !TLS_ABOVE_TP case handled this
> correctly (the offset is negative then in the formula).
> 
> the ldso code for TLS_ABOVE_TP is changed so the static tls offset
> of each module satisfies the formula and tls_offset always points
> to the end of the allocated static tls area (and not aligned up to
> tls_align or MIN_TLS_ALIGN).

I guess this saves some wasted memory?

> the tls_offset computation was wrong
> when multiple modules were loaded with static tls and in some the
> tls segment p_memsz%p_align != 0.

I don't understand this part. Are you saying we're currently
misaligning TLS for some libraries now? 

> ---
>  ldso/dynlink.c       | 13 ++++++-------
>  src/env/__init_tls.c |  3 ++-
>  2 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/ldso/dynlink.c b/ldso/dynlink.c
> index 42a5470d..6dc39483 100644
> --- a/ldso/dynlink.c
> +++ b/ldso/dynlink.c
> @@ -1126,9 +1126,9 @@ static struct dso *load_library(const char *name, struct dso *needed_by)
>  		p->tls_id = ++tls_cnt;
>  		tls_align = MAXP2(tls_align, p->tls.align);
>  #ifdef TLS_ABOVE_TP
> -		p->tls.offset = tls_offset + ( (tls_align-1) &
> -			-(tls_offset + (uintptr_t)p->tls.image) );
> -		tls_offset += p->tls.size;
> +		p->tls.offset = tls_offset + ( (p->tls.align-1) &
> +			(-tls_offset + (uintptr_t)p->tls.image) );
> +		tls_offset = p->tls.offset + p->tls.size;

Is there a motivation for the seemingly independent change from use of
tls_align to use of p->tls.align here?

>  #else
>  		tls_offset += p->tls.size + p->tls.align - 1;
>  		tls_offset -= (tls_offset + (uintptr_t)p->tls.image)
> @@ -1797,10 +1797,9 @@ _Noreturn void __dls3(size_t *sp)
>  		app.tls_id = tls_cnt = 1;
>  #ifdef TLS_ABOVE_TP
>  		app.tls.offset = GAP_ABOVE_TP;
> -		app.tls.offset += -GAP_ABOVE_TP & (app.tls.align-1);
> -		tls_offset = app.tls.offset + app.tls.size
> -			+ ( -((uintptr_t)app.tls.image + app.tls.size)
> -			& (app.tls.align-1) );
> +		app.tls.offset += (-GAP_ABOVE_TP + (uintptr_t)app.tls.image)
> +			& (app.tls.align-1);
> +		tls_offset = app.tls.offset + app.tls.size;
>  #else
>  		tls_offset = app.tls.offset = app.tls.size
>  			+ ( -((uintptr_t)app.tls.image + app.tls.size)
> diff --git a/src/env/__init_tls.c b/src/env/__init_tls.c
> index 5f12500c..772baba3 100644
> --- a/src/env/__init_tls.c
> +++ b/src/env/__init_tls.c
> @@ -115,7 +115,8 @@ static void static_init_tls(size_t *aux)
>  		& (main_tls.align-1);
>  #ifdef TLS_ABOVE_TP
>  	main_tls.offset = GAP_ABOVE_TP;
> -	main_tls.offset += -GAP_ABOVE_TP & (main_tls.align-1);
> +	main_tls.offset += (-GAP_ABOVE_TP + (uintptr_t)main_tls.image)
> +		& (main_tls.align-1);
>  #else
>  	main_tls.offset = main_tls.size;
>  #endif
> -- 
> 2.21.0
> 

I think you're probably right about all these things and I want to
apply this, but I also want to understand it a bit better first.

Rich
Szabolcs Nagy May 16, 2019, 7:48 a.m.
* Rich Felker <dalias@libc.org> [2019-05-15 20:20:51 -0400]:
> On Tue, May 14, 2019 at 04:01:31AM +0200, Szabolcs Nagy wrote:
> > this came up because lld changed tls alignment on aarch64 as a
> > workaround for a bionic abi issue https://reviews.llvm.org/D53906
> > but lld does not handle p_vaddr%p_align!=0 right so it broke on glibc
> > https://reviews.llvm.org/D61824
> > 
> > the patch is untested (bfd linker cannot seem to create problematic
> > elf objects), but at least there are no regressions with libc-test.
> 
> > >From 8c94fcbc9faeb8b07132506757c3d3973652420e Mon Sep 17 00:00:00 2001
> > From: Szabolcs Nagy <nsz@port70.net>
> > Date: Mon, 13 May 2019 18:47:11 +0000
> > Subject: [PATCH] fix tls offsets when p_vaddr%p_align != 0 for TLS_ABOVE_TP
> > 
> > currently the bfd linker does not seem to create tls segments where
> > p_vaddr%p_align != 0, but this is valid in ELF and then the runtime
> > computed tls offset must satisfy
> > 
> >   offset%p_align == (base+p_vaddr)%p_align
> > 
> > and in case of local exec tls (main executable) the smallest such
> > offset must be used (otherwise it is incompatible with the offset
> > computed by the static linker). the !TLS_ABOVE_TP case handled this
> > correctly (the offset is negative then in the formula).
> > 
> > the ldso code for TLS_ABOVE_TP is changed so the static tls offset
> > of each module satisfies the formula and tls_offset always points
> > to the end of the allocated static tls area (and not aligned up to
> > tls_align or MIN_TLS_ALIGN).
> 
> I guess this saves some wasted memory?
> 
> > the tls_offset computation was wrong
> > when multiple modules were loaded with static tls and in some the
> > tls segment p_memsz%p_align != 0.
> 
> I don't understand this part. Are you saying we're currently
> misaligning TLS for some libraries now? 

any time there are 'alignment gaps' in the static tls area
for shared libs, since tls_offset += dso.tls.size is the
only update to tls_offset which can be misaligned, it is
not updated when dso.tls.offset is aligned up.

the only exception is tls in the application: tls_offset
is aligned up after the app tls. (which my introduce
bigger gap than necessary)

i can make the tls_offset fix a separate change from
the p_vaddr%p_align!=0 fix.

> > ---
> >  ldso/dynlink.c       | 13 ++++++-------
> >  src/env/__init_tls.c |  3 ++-
> >  2 files changed, 8 insertions(+), 8 deletions(-)
> > 
> > diff --git a/ldso/dynlink.c b/ldso/dynlink.c
> > index 42a5470d..6dc39483 100644
> > --- a/ldso/dynlink.c
> > +++ b/ldso/dynlink.c
> > @@ -1126,9 +1126,9 @@ static struct dso *load_library(const char *name, struct dso *needed_by)
> >  		p->tls_id = ++tls_cnt;
> >  		tls_align = MAXP2(tls_align, p->tls.align);
> >  #ifdef TLS_ABOVE_TP
> > -		p->tls.offset = tls_offset + ( (tls_align-1) &
> > -			-(tls_offset + (uintptr_t)p->tls.image) );
> > -		tls_offset += p->tls.size;
> > +		p->tls.offset = tls_offset + ( (p->tls.align-1) &
> > +			(-tls_offset + (uintptr_t)p->tls.image) );
> > +		tls_offset = p->tls.offset + p->tls.size;
> 
> Is there a motivation for the seemingly independent change from use of
> tls_align to use of p->tls.align here?

to 'satisfy the formula' which uses p_align, using the max
alignment seemed unnecessary overalignment.

if all modules follow the same formula and static linking
too then you only have to prove that one thing to work.

> >  #else
> >  		tls_offset += p->tls.size + p->tls.align - 1;
> >  		tls_offset -= (tls_offset + (uintptr_t)p->tls.image)
> > @@ -1797,10 +1797,9 @@ _Noreturn void __dls3(size_t *sp)
> >  		app.tls_id = tls_cnt = 1;
> >  #ifdef TLS_ABOVE_TP
> >  		app.tls.offset = GAP_ABOVE_TP;
> > -		app.tls.offset += -GAP_ABOVE_TP & (app.tls.align-1);
> > -		tls_offset = app.tls.offset + app.tls.size
> > -			+ ( -((uintptr_t)app.tls.image + app.tls.size)
> > -			& (app.tls.align-1) );
> > +		app.tls.offset += (-GAP_ABOVE_TP + (uintptr_t)app.tls.image)
> > +			& (app.tls.align-1);
> > +		tls_offset = app.tls.offset + app.tls.size;
> >  #else
> >  		tls_offset = app.tls.offset = app.tls.size
> >  			+ ( -((uintptr_t)app.tls.image + app.tls.size)
> > diff --git a/src/env/__init_tls.c b/src/env/__init_tls.c
> > index 5f12500c..772baba3 100644
> > --- a/src/env/__init_tls.c
> > +++ b/src/env/__init_tls.c
> > @@ -115,7 +115,8 @@ static void static_init_tls(size_t *aux)
> >  		& (main_tls.align-1);
> >  #ifdef TLS_ABOVE_TP
> >  	main_tls.offset = GAP_ABOVE_TP;
> > -	main_tls.offset += -GAP_ABOVE_TP & (main_tls.align-1);
> > +	main_tls.offset += (-GAP_ABOVE_TP + (uintptr_t)main_tls.image)
> > +		& (main_tls.align-1);
> >  #else
> >  	main_tls.offset = main_tls.size;
> >  #endif
> > -- 
> > 2.21.0
> > 
> 
> I think you're probably right about all these things and I want to
> apply this, but I also want to understand it a bit better first.
> 
> Rich
Rich Felker May 16, 2019, 1:22 p.m.
On Thu, May 16, 2019 at 09:48:50AM +0200, Szabolcs Nagy wrote:
> * Rich Felker <dalias@libc.org> [2019-05-15 20:20:51 -0400]:
> > On Tue, May 14, 2019 at 04:01:31AM +0200, Szabolcs Nagy wrote:
> > > this came up because lld changed tls alignment on aarch64 as a
> > > workaround for a bionic abi issue https://reviews.llvm.org/D53906
> > > but lld does not handle p_vaddr%p_align!=0 right so it broke on glibc
> > > https://reviews.llvm.org/D61824
> > > 
> > > the patch is untested (bfd linker cannot seem to create problematic
> > > elf objects), but at least there are no regressions with libc-test.
> > 
> > > >From 8c94fcbc9faeb8b07132506757c3d3973652420e Mon Sep 17 00:00:00 2001
> > > From: Szabolcs Nagy <nsz@port70.net>
> > > Date: Mon, 13 May 2019 18:47:11 +0000
> > > Subject: [PATCH] fix tls offsets when p_vaddr%p_align != 0 for TLS_ABOVE_TP
> > > 
> > > currently the bfd linker does not seem to create tls segments where
> > > p_vaddr%p_align != 0, but this is valid in ELF and then the runtime
> > > computed tls offset must satisfy
> > > 
> > >   offset%p_align == (base+p_vaddr)%p_align
> > > 
> > > and in case of local exec tls (main executable) the smallest such
> > > offset must be used (otherwise it is incompatible with the offset
> > > computed by the static linker). the !TLS_ABOVE_TP case handled this
> > > correctly (the offset is negative then in the formula).
> > > 
> > > the ldso code for TLS_ABOVE_TP is changed so the static tls offset
> > > of each module satisfies the formula and tls_offset always points
> > > to the end of the allocated static tls area (and not aligned up to
> > > tls_align or MIN_TLS_ALIGN).
> > 
> > I guess this saves some wasted memory?
> > 
> > > the tls_offset computation was wrong
> > > when multiple modules were loaded with static tls and in some the
> > > tls segment p_memsz%p_align != 0.
> > 
> > I don't understand this part. Are you saying we're currently
> > misaligning TLS for some libraries now? 
> 
> any time there are 'alignment gaps' in the static tls area
> for shared libs, since tls_offset += dso.tls.size is the
> only update to tls_offset which can be misaligned, it is
> not updated when dso.tls.offset is aligned up.
> 
> the only exception is tls in the application: tls_offset
> is aligned up after the app tls. (which my introduce
> bigger gap than necessary)
> 
> i can make the tls_offset fix a separate change from
> the p_vaddr%p_align!=0 fix.

I'd like that, both because it was non-obvious to me that this was
another major change, and because it fixes a separate, major
user-facing bug (this is what was breaking rust, and could break lots
of other things) rather than the new issue with lld's Bionic hack.

> 
> > > ---
> > >  ldso/dynlink.c       | 13 ++++++-------
> > >  src/env/__init_tls.c |  3 ++-
> > >  2 files changed, 8 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/ldso/dynlink.c b/ldso/dynlink.c
> > > index 42a5470d..6dc39483 100644
> > > --- a/ldso/dynlink.c
> > > +++ b/ldso/dynlink.c
> > > @@ -1126,9 +1126,9 @@ static struct dso *load_library(const char *name, struct dso *needed_by)
> > >  		p->tls_id = ++tls_cnt;
> > >  		tls_align = MAXP2(tls_align, p->tls.align);
> > >  #ifdef TLS_ABOVE_TP
> > > -		p->tls.offset = tls_offset + ( (tls_align-1) &
> > > -			-(tls_offset + (uintptr_t)p->tls.image) );
> > > -		tls_offset += p->tls.size;
> > > +		p->tls.offset = tls_offset + ( (p->tls.align-1) &
> > > +			(-tls_offset + (uintptr_t)p->tls.image) );
> > > +		tls_offset = p->tls.offset + p->tls.size;
> > 
> > Is there a motivation for the seemingly independent change from use of
> > tls_align to use of p->tls.align here?
> 
> to 'satisfy the formula' which uses p_align, using the max
> alignment seemed unnecessary overalignment.
> 
> if all modules follow the same formula and static linking
> too then you only have to prove that one thing to work.

OK, this sounds good. After reading it less tired it makes sense to me
too. Thanks for the explanation.

Rich