Incorrect thread TID caching

Submitted by Rich Felker on Feb. 17, 2021, 9:07 p.m.

Details

Message ID 20210217210748.GL11590@brightrain.aerifal.cx
State New
Series "Incorrect thread TID caching"
Headers show

Commit Message

Rich Felker Feb. 17, 2021, 9:07 p.m.
On Wed, Feb 17, 2021 at 03:11:57PM -0500, Rich Felker wrote:
> On Wed, Feb 17, 2021 at 02:49:45PM -0500, Dominic Chen wrote:
> > On 2/15/2021 11:56 AM, Rich Felker wrote:
> > >Following up on this now, the code in _Fork is something I really
> > >don't want to duplicate for clone() for risk of forgetting there's a
> > >copy in the latter and letting it bitrot there. I'd rather refactor
> > >things so the same logic can be shared...
> > 
> > Thanks for the update. Can you use something like
> > __attribute__((always_inline)) to just write the logic once but
> > force it to be inlined into both library functions?
> 
> Whether it's inlined isn't really a big deal; this is not a hot path.
> It's more just a matter of how it needs to be split up at the source
> level, and it seems to be messy whichever way we choose.
> 
> Trying to avoid calling __clone doesn't seem like such a good idea,
> since the child has to run on a new stack -- if we did avoid it we'd
> need a new way to switch stacks. The generic __unmapself has a hack
> to do this already that we could reuse without needing new
> arch-specific glue though.
> 
> I'll keep trying things and see if I come up with something not too
> unreasonable.

Attached is a draft of how clone() *could* work without refactoring
the pre/post logic from _Fork to use __clone. I don't particularly
like it, and CRTJMP abuse like this (which __unmapself also does, as
noted above) is not valid for FDPIC archs (it actually expects a code
address not a function pointer). I'll try a version the other way and
see how it looks.

Rich

Patch hide | download patch | download mbox

diff --git a/src/linux/clone.c b/src/linux/clone.c
index 8c1af7d3..32d53a8f 100644
--- a/src/linux/clone.c
+++ b/src/linux/clone.c
@@ -4,6 +4,25 @@ 
 #include <sched.h>
 #include "pthread_impl.h"
 #include "syscall.h"
+#include "dynlink.h"
+
+extern int _Forklike(int (*)(void *), void *);
+
+struct clone_args {
+	int flags;
+	pid_t *ptid, *ctid;
+};
+
+static int do_clone(void *p)
+{
+	struct clone_args *args = p;
+	int mask = CLONE_PARENT_SETTID | CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID;
+	int r = __syscall(SYS_clone, args->flags & ~mask, 0);
+	if (r>0 && (args->flags & CLONE_PARENT_SETTID)) *args->ptid = r;
+	if (!r && (args->flags & CLONE_CHILD_SETTID)) *args->ctid = __syscall(SYS_gettid);
+	if (!r && (args->flags & CLONE_CHILD_CLEARTID)) __syscall(SYS_set_tid_address, args->ctid);
+	return r;
+}
 
 int clone(int (*func)(void *), void *stack, int flags, void *arg, ...)
 {
@@ -17,5 +36,13 @@  int clone(int (*func)(void *), void *stack, int flags, void *arg, ...)
 	ctid = va_arg(ap, pid_t *);
 	va_end(ap);
 
+	if (!(flags & CLONE_VM)) {
+		struct clone_args args = { .flags = flags, .ptid = ptid, .ctid = ctid };
+		if (flags & CLONE_THREAD) return __syscall_ret(-EINVAL);
+		int r = _Forklike(do_clone, &args);
+		if (r) return r;
+		CRTJMP(func, stack);
+	}
+
 	return __syscall_ret(__clone(func, stack, flags, arg, ptid, tls, ctid));
 }

Comments

Dominic Chen March 12, 2021, 9:14 p.m.
On 2/17/2021 4:07 PM, Rich Felker wrote:
>> Whether it's inlined isn't really a big deal; this is not a hot path.
>> It's more just a matter of how it needs to be split up at the source
>> level, and it seems to be messy whichever way we choose.
>>
>> Trying to avoid calling __clone doesn't seem like such a good idea,
>> since the child has to run on a new stack -- if we did avoid it we'd
>> need a new way to switch stacks. The generic __unmapself has a hack
>> to do this already that we could reuse without needing new
>> arch-specific glue though.
>>
>> I'll keep trying things and see if I come up with something not too
>> unreasonable.
>>
>> Attached is a draft of how clone() *could* work without refactoring
>> the pre/post logic from _Fork to use __clone. I don't particularly
>> like it, and CRTJMP abuse like this (which __unmapself also does, as
>> noted above) is not valid for FDPIC archs (it actually expects a code
>> address not a function pointer). I'll try a version the other way and
>> see how it looks.

Sorry for the delay. I did end up needing robust mutex functionality, 
and ended up using a variant of your patch to fix-up. There were two 
issues that I noticed:

 1. Prior to CRTJMP, the 'arg' argument needs to be moved into the first
    argument register for the local calling convention. I spent a little
    time trying to get an architecture-neutral version working using a
    wrapper function, but gave up and used some inline assembly to
    populate %rdi on x86_64.
 2. If a thread exits without unlocking a robust mutex, the program will
    subsequently SIGSEGV since the thread's memory region has been
    unmapped, but a pointer to the robust mutex remains on the internal
    linked list. Not sure where the fault lies here, but just thought
    I'd mention it.

Thanks,

Dominic