[v2] fix tls access on arm targets before armv6k

Submitted by Szabolcs Nagy on Aug. 24, 2018, 7:27 p.m.

Details

Message ID 20180824192749.GC4418@port70.net
State New
Series "fix tls access on arm targets before armv6k"
Headers show

Commit Message

Szabolcs Nagy Aug. 24, 2018, 7:27 p.m.
rewrote it in asm to fix a runtime-abi issue.

Patch hide | download patch | download mbox

From 90da0eb3d0e92203d822ef9d5f96ef0e4e276ca2 Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <nsz@port70.net>
Date: Thu, 23 Aug 2018 10:57:34 +0000
Subject: [PATCH] fix tls access on arm targets before armv6k

commit 610c5a8524c3d6cd3ac5a5f1231422e7648a3791 changed the thread
pointer setup so tp points at the end of the pthread struct on arm,
but failed to update __aeabi_read_tp so it was off by 8.

this broke tls access in code that is compiled with -mtp=soft, which
is the default when target arch is pre armv6k or thumb1.

__aeabi_read_tp used to call c code, but that was incorrect as the
arm runtime abi specifies special pcs for this function: it is only
allowed to clobber r0, ip, lr and cpsr.

if musl is compiled for a target with hardware tp then the code can
be further optimized so __aeabi_read_tp is an alias to __a_gettp_cp15,
but that requires preprocessed asm with conditionals, for now this
is a minimal bug fix.
---
 src/thread/arm/__aeabi_read_tp.s   | 10 ++++++----
 src/thread/arm/__aeabi_read_tp_c.c |  8 --------
 2 files changed, 6 insertions(+), 12 deletions(-)
 delete mode 100644 src/thread/arm/__aeabi_read_tp_c.c

diff --git a/src/thread/arm/__aeabi_read_tp.s b/src/thread/arm/__aeabi_read_tp.s
index 9d0cd311..2585620c 100644
--- a/src/thread/arm/__aeabi_read_tp.s
+++ b/src/thread/arm/__aeabi_read_tp.s
@@ -2,7 +2,9 @@ 
 .global __aeabi_read_tp
 .type __aeabi_read_tp,%function
 __aeabi_read_tp:
-	push {r1,r2,r3,lr}
-	bl __aeabi_read_tp_c
-	pop {r1,r2,r3,lr}
-	bx lr
+	ldr r0,1f
+	add r0,r0,pc
+	ldr r0,[r0]
+2:	bx r0
+	.align 2
+1:	.word __a_gettp_ptr - 2b
diff --git a/src/thread/arm/__aeabi_read_tp_c.c b/src/thread/arm/__aeabi_read_tp_c.c
deleted file mode 100644
index 654bdc57..00000000
--- a/src/thread/arm/__aeabi_read_tp_c.c
+++ /dev/null
@@ -1,8 +0,0 @@ 
-#include "pthread_impl.h"
-#include <stdint.h>
-
-__attribute__((__visibility__("hidden")))
-void *__aeabi_read_tp_c(void)
-{
-	return (void *)((uintptr_t)__pthread_self()-8+sizeof(struct pthread));
-}
-- 
2.18.0


Comments

Rich Felker Aug. 24, 2018, 8:17 p.m.
On Fri, Aug 24, 2018 at 09:27:50PM +0200, Szabolcs Nagy wrote:
> rewrote it in asm to fix a runtime-abi issue.

I've already queued the previous fix. I could replace it since it's
not pushed, but I'd kinda prefer to make this change independent from
the bugfix since it's already been done in 2 steps anyway; normally I
don't like to mix bugfixes with refactoring except when it would be
hard to do the bugfix independently first.

> >From 90da0eb3d0e92203d822ef9d5f96ef0e4e276ca2 Mon Sep 17 00:00:00 2001
> From: Szabolcs Nagy <nsz@port70.net>
> Date: Thu, 23 Aug 2018 10:57:34 +0000
> Subject: [PATCH] fix tls access on arm targets before armv6k
> 
> commit 610c5a8524c3d6cd3ac5a5f1231422e7648a3791 changed the thread
> pointer setup so tp points at the end of the pthread struct on arm,
> but failed to update __aeabi_read_tp so it was off by 8.
> 
> this broke tls access in code that is compiled with -mtp=soft, which
> is the default when target arch is pre armv6k or thumb1.
> 
> __aeabi_read_tp used to call c code, but that was incorrect as the
> arm runtime abi specifies special pcs for this function: it is only
> allowed to clobber r0, ip, lr and cpsr.

Maybe keep just this paragraph if factoring as 2 commits?

> if musl is compiled for a target with hardware tp then the code can
> be further optimized so __aeabi_read_tp is an alias to __a_gettp_cp15,
> but that requires preprocessed asm with conditionals, for now this
> is a minimal bug fix.

I think this optimization isn't so important since, if the user is
compiling for a higher ISA level, they won't even be generating calls
to __aeabi_read_tp. It would marginally help the case where you
compile an optimized-for-native-cpu libc.so but use low-baseline
application binaries, but I doubt that's a case many users care about.

> ---
>  src/thread/arm/__aeabi_read_tp.s   | 10 ++++++----
>  src/thread/arm/__aeabi_read_tp_c.c |  8 --------
>  2 files changed, 6 insertions(+), 12 deletions(-)
>  delete mode 100644 src/thread/arm/__aeabi_read_tp_c.c
> 
> diff --git a/src/thread/arm/__aeabi_read_tp.s b/src/thread/arm/__aeabi_read_tp.s
> index 9d0cd311..2585620c 100644
> --- a/src/thread/arm/__aeabi_read_tp.s
> +++ b/src/thread/arm/__aeabi_read_tp.s
> @@ -2,7 +2,9 @@
>  .global __aeabi_read_tp
>  .type __aeabi_read_tp,%function
>  __aeabi_read_tp:
> -	push {r1,r2,r3,lr}
> -	bl __aeabi_read_tp_c
> -	pop {r1,r2,r3,lr}
> -	bx lr
> +	ldr r0,1f
> +	add r0,r0,pc
> +	ldr r0,[r0]
> +2:	bx r0
> +	.align 2
> +1:	.word __a_gettp_ptr - 2b

Isn't there an idiomatic "adr" pseudo-mnemonic preferable to use
instead of the explicit pc arithmetic here? Or is that necessarily
less efficient?

> diff --git a/src/thread/arm/__aeabi_read_tp_c.c b/src/thread/arm/__aeabi_read_tp_c.c
> deleted file mode 100644
> index 654bdc57..00000000
> --- a/src/thread/arm/__aeabi_read_tp_c.c
> +++ /dev/null
> @@ -1,8 +0,0 @@
> -#include "pthread_impl.h"
> -#include <stdint.h>
> -
> -__attribute__((__visibility__("hidden")))
> -void *__aeabi_read_tp_c(void)
> -{
> -	return (void *)((uintptr_t)__pthread_self()-8+sizeof(struct pthread));
> -}
> -- 
> 2.18.0
>
Szabolcs Nagy Aug. 24, 2018, 9:56 p.m.
* Rich Felker <dalias@libc.org> [2018-08-24 16:17:36 -0400]:
> On Fri, Aug 24, 2018 at 09:27:50PM +0200, Szabolcs Nagy wrote:
> > rewrote it in asm to fix a runtime-abi issue.
> 
> I've already queued the previous fix. I could replace it since it's
> not pushed, but I'd kinda prefer to make this change independent from
> the bugfix since it's already been done in 2 steps anyway; normally I
> don't like to mix bugfixes with refactoring except when it would be
> hard to do the bugfix independently first.
> 

ok.

> >  __aeabi_read_tp:
> > -	push {r1,r2,r3,lr}
> > -	bl __aeabi_read_tp_c
> > -	pop {r1,r2,r3,lr}
> > -	bx lr
> > +	ldr r0,1f
> > +	add r0,r0,pc
> > +	ldr r0,[r0]
> > +2:	bx r0
> > +	.align 2
> > +1:	.word __a_gettp_ptr - 2b
> 
> Isn't there an idiomatic "adr" pseudo-mnemonic preferable to use
> instead of the explicit pc arithmetic here? Or is that necessarily
> less efficient?
> 

adr r,label sets r to the address of label within a small offset. i don't
think that helps, you can avoid using pc, but it will be more instructions:

	ldr r0,1f
	adr ip,1f // == add ip,pc,1f-2f
	add r0,ip
2:	ldr r0,[r0]
	bx r0
	.align 2
1:	.word __a_gettp_ptr - 1b

the main ugliness of the pc arithmetics is that pc evaluates to pc+8 in arm
mode but pc+4 in thumb, that's why i needed label 2, but i think it's fine.