dlsym(handle) may search in unrelated libraries

Submitted by Markus Wichmann on Feb. 6, 2019, 8:25 p.m.

Details

Message ID 20190206202518.GC5469@voyager
State New
Series "dlsym(handle) may search in unrelated libraries"
Headers show

Commit Message

Markus Wichmann Feb. 6, 2019, 8:25 p.m.
On Wed, Feb 06, 2019 at 08:02:28PM +0300, Alexey Izbyshev wrote:
> Unfortunately, my test case was a simplified example of a general problem:
> dso->deps is assigned only for the main app and for libraries opened with
> dlopen(), but not for their dependencies. Consider the following:

Right you are. It took me a while to understand what the deps array was
even for (since musl's dlclose() doesn't do anything, tracking
dependencies is mostly pointless), but I found it is needed for lazy
relocation processing. So it is necessary for all libs opened by
dlopen() directly to contain a list of all their dependencies. All the
other libs can have an empty list.

So I propose the attached patch in addition to the previous one. This
will set dso->deps to the empty list in all libs not directly loaded
from dlopen(). The previous patch is still necessary, as nothing ever
calls load_deps() on the libc or the vdso, but all other modules get a
load_deps() treatment.

> 
> Alexey
> 
> 

Ciao,
Markus

Patch hide | download patch | download mbox

From 841dd4e075040e2aeb01adea8ef5e2f7c0fc006a Mon Sep 17 00:00:00 2001
From: Markus Wichmann <nullplan@gmx.net>
Date: Wed, 6 Feb 2019 21:13:05 +0100
Subject: [PATCH 7/7] Initialize deps on non-directly loaded libs.

As pointed out by Alexey Izbyshev, having the deps member be zero opens
dlopen() and dlsym() up to malfunctions, if a library was previously
loaded as dependency and is then dlopen()ed. Therefore, we now set the
deps member of the dso descriptor to the sentry value in all libs loaded
as dependencies.
---
 ldso/dynlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ldso/dynlink.c b/ldso/dynlink.c
index 6ffeca85..f8346c54 100644
--- a/ldso/dynlink.c
+++ b/ldso/dynlink.c
@@ -1158,8 +1158,8 @@  static void load_deps(struct dso *p)
 				*deps = tmp;
 			}
 		}
+		if (!p->deps) p->deps = (struct dso**)&nodeps_dummy;
 	}
-	if (!*deps) *deps = (struct dso **)&nodeps_dummy;
 }
 
 static void load_preload(char *s)
-- 
2.20.1


Comments

Alexey Izbyshev Feb. 6, 2019, 9:23 p.m.
On 2019-02-06 23:25, Markus Wichmann wrote:
> Right you are. It took me a while to understand what the deps array was
> even for (since musl's dlclose() doesn't do anything, tracking
> dependencies is mostly pointless), but I found it is needed for lazy
> relocation processing. So it is necessary for all libs opened by
> dlopen() directly to contain a list of all their dependencies. All the
> other libs can have an empty list.

Actually, dso->deps is used in dlsym(handle) because it must use the 
dependency order for symbol search, so it's incorrect to have deps empty 
for "all the other" libs. Consider the following modification of my 
previous example:

$ cat bazdep.c
int bazdep = 1;
extern int bazdepdep;
int *p = &bazdepdep;
$ cat bazdepdep.c
int bazdepdep = 2;
$ cat main.c
#include <dlfcn.h>
#include <stdio.h>

int main(void) {
   if (!dlopen("libbaz.so", RTLD_NOW|RTLD_LOCAL))
     return 1;
   if (!dlopen("libfoo.so", RTLD_NOW|RTLD_LOCAL))
     return 1;
   void *h = dlopen("libbazdep.so", RTLD_NOW|RTLD_LOCAL);
   printf("%p\n", dlsym(h, "bar"));
   printf("%p\n", dlsym(h, "bazdepdep"));
}

The correct output is zero in the first line and some non-zero address 
in the second. Vanilla musl 1.1.21 prints two non-zero addresses. But 
with your patch the output is two zeros because dlsym() can't search in 
dependencies of "libbazdep.so" anymore.

Alexey
Markus Wichmann Feb. 7, 2019, 5:33 a.m.
On Thu, Feb 07, 2019 at 12:23:06AM +0300, Alexey Izbyshev wrote:
> On 2019-02-06 23:25, Markus Wichmann wrote:
> > Right you are. It took me a while to understand what the deps array was
> > even for (since musl's dlclose() doesn't do anything, tracking
> > dependencies is mostly pointless), but I found it is needed for lazy
> > relocation processing. So it is necessary for all libs opened by
> > dlopen() directly to contain a list of all their dependencies. All the
> > other libs can have an empty list.
> 
> Actually, dso->deps is used in dlsym(handle) because it must use the
> dependency order for symbol search, so it's incorrect to have deps empty for
> "all the other" libs. Consider the following modification of my previous
> example:
> 
> $ cat bazdep.c
> int bazdep = 1;
> extern int bazdepdep;
> int *p = &bazdepdep;
> $ cat bazdepdep.c
> int bazdepdep = 2;
> $ cat main.c
> #include <dlfcn.h>
> #include <stdio.h>
> 
> int main(void) {
>   if (!dlopen("libbaz.so", RTLD_NOW|RTLD_LOCAL))
>     return 1;
>   if (!dlopen("libfoo.so", RTLD_NOW|RTLD_LOCAL))
>     return 1;
>   void *h = dlopen("libbazdep.so", RTLD_NOW|RTLD_LOCAL);
>   printf("%p\n", dlsym(h, "bar"));
>   printf("%p\n", dlsym(h, "bazdepdep"));
> }
> 
> The correct output is zero in the first line and some non-zero address in
> the second. Vanilla musl 1.1.21 prints two non-zero addresses. But with your
> patch the output is two zeros because dlsym() can't search in dependencies
> of "libbazdep.so" anymore.
> 
> Alexey

OK, so life just got more interesting. I gather the deps handling was
always incorrect.

Let's consider the original code. liba depends on libb, which depends on
libc. dlopen("liba") returns a handle with libb and libc in the deps,
but libb->deps == 0. If we now call dlopen("libb"), that does the right
thing, but only because libb happens to be the last lib in the chain. If
we'd have loaded libx, liby, and libz before trying libb, it would add
all the symbols of libs x, y, and z to the libb handle.

I guess the hope was that this situation never arrises. So how do we fix
this?

I think the easiest is probably going to be to patch up load_deps, but
avoiding recursion is going to be the fun part. My plan is to make
dso->deps contain all direct and indirect dependencies (which is what
the code seems to depend on, anyway). This is going to consume more
memory, but we are talking a few pointers, and we are dealing with
shared libs, anyway.

As you said, order is important. What is the correct order, depth-first
or breadth-first? I think it should be depth-first, but lack any
authoritative knowledge on this. It would make the most sense, anyway
(if, from the point of view of a user a library contains all the symbols
of its dependencies, then those dependencies must also contain all the
symbols of their dependencies). So with the following dependency tree:

liba->libb->libc
    `>libx->liby

the handle for liba would list libc before libx.

Easiest implementation is probably still going to be recursive. Let's
hope the dependency trees don't get too wild.

I'll look into it after work.

Ciao,
Markus
Alexey Izbyshev Feb. 7, 2019, 1:42 p.m.
On 2/7/19 8:33 AM, Markus Wichmann wrote:
> Let's consider the original code. liba depends on libb, which depends on
> libc. dlopen("liba") returns a handle with libb and libc in the deps,
> but libb->deps == 0. If we now call dlopen("libb"), that does the right
> thing, but only because libb happens to be the last lib in the chain. If
> we'd have loaded libx, liby, and libz before trying libb, it would add
> all the symbols of libs x, y, and z to the libb handle.

Your description almost captures the problem, but is imprecise in the 
last part: "it would add all the symbols of libs x, y, and z to the libb 
handle". load_deps() looks only at DT_NEEDED entries of libraries it 
iterates over, so, for example, if libx depends on both liby and libz, 
then liby and libz (but not libx) would be added to deps of libb.

Moreover, consider the following dependency hierarchy (loaded on 
dlopen("liba")):
liba
   libb
   libd
     libe

In this case, even dlopen("libb") wouldn't do the right thing because 
load_deps() would find libe in DT_NEEDED of libd and add it to deps of libb.
> 
> As you said, order is important. What is the correct order, depth-first
> or breadth-first? I think it should be depth-first, but lack any
> authoritative knowledge on this.
dlsym(handle) uses so-called "dependency order"[1], which is 
breadth-first[2]. This is what musl current does in cases when 
load_deps() is called on a real first load of a library (so that 
everything that's further in the dso list are implicitly loaded 
dependencies of this library).

So with the following dependency tree:
> 
> liba->libb->libc
>      `>libx->liby
> 
> the handle for liba would list libc before libx.
> 
The correct order is what load_deps() does currently: liba libb libx 
libc liby

 > Easiest implementation is probably still going to be recursive. Let's
hope the dependency trees don't get too wild.

I think the easiest way is simply to modify load_deps() to always 
traverse DT_NEEDED in breadth-first order without relying on the dso 
list in the outer loop. load_deps() already effectively maintains a 
queue (deps) that can be used for BFS, so no recursion is needed.
> 
> I'll look into it after work.
> 
Thanks for following this up, Markus!

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/dlsym.html
[2] http://pubs.opengroup.org/onlinepubs/9699919799/functions/dlopen.html

Alexey
Rich Felker Feb. 7, 2019, 4:54 p.m.
On Thu, Feb 07, 2019 at 06:33:27AM +0100, Markus Wichmann wrote:
> On Thu, Feb 07, 2019 at 12:23:06AM +0300, Alexey Izbyshev wrote:
> > On 2019-02-06 23:25, Markus Wichmann wrote:
> > > Right you are. It took me a while to understand what the deps array was
> > > even for (since musl's dlclose() doesn't do anything, tracking
> > > dependencies is mostly pointless), but I found it is needed for lazy
> > > relocation processing. So it is necessary for all libs opened by
> > > dlopen() directly to contain a list of all their dependencies. All the
> > > other libs can have an empty list.
> > 
> > Actually, dso->deps is used in dlsym(handle) because it must use the
> > dependency order for symbol search, so it's incorrect to have deps empty for
> > "all the other" libs. Consider the following modification of my previous
> > example:
> > 
> > $ cat bazdep.c
> > int bazdep = 1;
> > extern int bazdepdep;
> > int *p = &bazdepdep;
> > $ cat bazdepdep.c
> > int bazdepdep = 2;
> > $ cat main.c
> > #include <dlfcn.h>
> > #include <stdio.h>
> > 
> > int main(void) {
> >   if (!dlopen("libbaz.so", RTLD_NOW|RTLD_LOCAL))
> >     return 1;
> >   if (!dlopen("libfoo.so", RTLD_NOW|RTLD_LOCAL))
> >     return 1;
> >   void *h = dlopen("libbazdep.so", RTLD_NOW|RTLD_LOCAL);
> >   printf("%p\n", dlsym(h, "bar"));
> >   printf("%p\n", dlsym(h, "bazdepdep"));
> > }
> > 
> > The correct output is zero in the first line and some non-zero address in
> > the second. Vanilla musl 1.1.21 prints two non-zero addresses. But with your
> > patch the output is two zeros because dlsym() can't search in dependencies
> > of "libbazdep.so" anymore.
> > 
> > Alexey
> 
> OK, so life just got more interesting. I gather the deps handling was
> always incorrect.
> 
> Let's consider the original code. liba depends on libb, which depends on
> libc. dlopen("liba") returns a handle with libb and libc in the deps,
> but libb->deps == 0. If we now call dlopen("libb"), that does the right
> thing, but only because libb happens to be the last lib in the chain. If
> we'd have loaded libx, liby, and libz before trying libb, it would add
> all the symbols of libs x, y, and z to the libb handle.
> 
> I guess the hope was that this situation never arrises. So how do we fix
> this?
> 
> I think the easiest is probably going to be to patch up load_deps, but
> avoiding recursion is going to be the fun part. My plan is to make
> dso->deps contain all direct and indirect dependencies (which is what
> the code seems to depend on, anyway). This is going to consume more
> memory, but we are talking a few pointers, and we are dealing with
> shared libs, anyway.

Yes, it needs to contain all direct and indirect dependencies, in the
correct order. Right now I think it incorrectly contains a superset of
that. Is that your assessment of the situation too?

> As you said, order is important. What is the correct order, depth-first
> or breadth-first? I think it should be depth-first, but lack any
> authoritative knowledge on this.

It's not; it's breadth-first. The definition of "dependency order"
used for dlsym is here:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/dlopen.html

    "Dependency ordering uses a breadth-first order starting with a
    given executable object file, then all of its dependencies, then
    any dependents of those, iterating until all dependencies are
    satisfied. With the exception of the global symbol table handle
    obtained via a dlopen() operation with a null pointer as the file
    argument, dependency ordering is used by the dlsym() function."

However, a depth-first list of dependencies is also needed to solve
the longstanding ctor-order issue discussed in several past threads
(which I can look up if search doesn't turn them up for you). This is
not a requirement of POSIX (which doesn't even have ctors), but it's a
reasonable expectation we currently get wrong and I think it might be
specified in ELF or some related sysv "standards".

I'm not sure if it makes sense to build both of these lists at the
same time. We currently try to avoid allocating dependency lists for
libraries loaded at startup in order not to allocate and setup data
that might never be used, defering until if/when dlopen is called on
them. I've wanted to do the ctor order walk without allocating a
dependency list, but I don't know a good way to do so. Note that the
code that runs ctors cannot allocate because, at the time it runs,
dlopen has already "committed" the load and can't back it out. It has
to have all resources it needs for the walk precommitted.

Since the beginning of a list of breadth-first dependencies is just
the direct dependencies, if we recorded the number of direct
dependencies for each dso, the breadth-first lists could be used to
perform a depth-first walk to run ctors; this can be made
non-recursive, at the cost of quadratic time but with trivially-tiny
constant factor, by simply restarting at the root of the tree after
each node and finding the deepest not-yet-constructed dso. This
suggests always allocating the breadth-first dependency lists might be
a good idea. That "feels" unfortunate since in principle they can get
large, but it's only going to be large for ridiculous bloatware with
hundreds of libraries, in which case these dependency lists are very
small on a relative scale.

> It would make the most sense, anyway
> (if, from the point of view of a user a library contains all the symbols
> of its dependencies, then those dependencies must also contain all the
> symbols of their dependencies). So with the following dependency tree:
> 
> liba->libb->libc
>     `>libx->liby
> 
> the handle for liba would list libc before libx.
> 
> Easiest implementation is probably still going to be recursive. Let's
> hope the dependency trees don't get too wild.

I don't think recursive can be done safely since you don't have a
small bound on levels of recursion. It could be done with an explicit
stack in allocated storage though, since the operation has to allocate
memory anyway and has to fail if allocation fails.

Rich
Markus Wichmann Feb. 7, 2019, 6:36 p.m.
On Thu, Feb 07, 2019 at 11:54:47AM -0500, Rich Felker wrote:
> Yes, it needs to contain all direct and indirect dependencies, in the
> correct order. Right now I think it incorrectly contains a superset of
> that. Is that your assessment of the situation too?
> 

Yes, it is a superset, if the original argument to load_deps() is not
the tail of the chain of libraries.

> However, a depth-first list of dependencies is also needed to solve
> the longstanding ctor-order issue discussed in several past threads
> (which I can look up if search doesn't turn them up for you). This is
> not a requirement of POSIX (which doesn't even have ctors), but it's a
> reasonable expectation we currently get wrong and I think it might be
> specified in ELF or some related sysv "standards".
> 

Requirements first, nice-to-haves later, right? Once we have the dlsym()
conformance issue squared away, we can still deal with this issue.

You mean, people expect their dependencies to be initialized in their
own initializers? When it is well-known that there is no order to
initializers, and e.g. the C++ standard says there is no order to
constructors of static objects.

> I'm not sure if it makes sense to build both of these lists at the
> same time. We currently try to avoid allocating dependency lists for
> libraries loaded at startup in order not to allocate and setup data
> that might never be used, defering until if/when dlopen is called on
> them. I've wanted to do the ctor order walk without allocating a
> dependency list, but I don't know a good way to do so. Note that the
> code that runs ctors cannot allocate because, at the time it runs,
> dlopen has already "committed" the load and can't back it out. It has
> to have all resources it needs for the walk precommitted.
> 

Depth first means stack. Means recursion or explicit stack. Explicit is
going to be hard, considering there is no known limit to dependency
nesting depth. We could argue that the user better make their stack size
ulimit large enough for the main thread, but I hazard a guess that
nobody expects dlopen() to use overly much stack in other threads.

Alright, what's the algorithm here?

init(lib):
    if (!lib.inited):
        foreach d in lib.deps init(d)
        start_init(lib)
        lib.inited = 1

That about right? Because that means we need a stack as high as the
nesting depth of dependencies. We can get a pessimistic estimation from
the number of deps loaded, but that may be way too high (see below).

> Since the beginning of a list of breadth-first dependencies is just
> the direct dependencies, if we recorded the number of direct
> dependencies for each dso, the breadth-first lists could be used to
> perform a depth-first walk to run ctors; this can be made
> non-recursive, at the cost of quadratic time but with trivially-tiny
> constant factor, by simply restarting at the root of the tree after
> each node and finding the deepest not-yet-constructed dso. This
> suggests always allocating the breadth-first dependency lists might be
> a good idea. That "feels" unfortunate since in principle they can get
> large, but it's only going to be large for ridiculous bloatware with
> hundreds of libraries, in which case these dependency lists are very
> small on a relative scale.
> 

$ ldd =mpv | wc -l
292

And that, I will remind you, is the "suckless" alternative to mplayer.

Anyway, that is breadth, not depth. Experimentally, it is hard to find a
chain of libraries deeper than ten. They all eventually just lead back
to libc.

So what you could do, when it comes time to run the initializers, is to
walk the dyn vectors again. Those are already allocated, and all the
libs they reference are already loaded, so load_library() would only be
a search operation. By the time the initializers run, that must be the
case, both at load time and at runtime. Unless loading the library
failed the first time, in which case at runtime we have already bailed.
And at load time we... haven't. Are you sure continuing on load failure
of a dependency during load time is a good idea?

> I don't think recursive can be done safely since you don't have a
> small bound on levels of recursion. It could be done with an explicit
> stack in allocated storage though, since the operation has to allocate
> memory anyway and has to fail if allocation fails.
> 
> Rich

Nevermind, it was a misunderstanding of the requirements.

Ciao,
Markus
Rich Felker Feb. 7, 2019, 6:57 p.m.
On Thu, Feb 07, 2019 at 07:36:38PM +0100, Markus Wichmann wrote:
> On Thu, Feb 07, 2019 at 11:54:47AM -0500, Rich Felker wrote:
> > However, a depth-first list of dependencies is also needed to solve
> > the longstanding ctor-order issue discussed in several past threads
> > (which I can look up if search doesn't turn them up for you). This is
> > not a requirement of POSIX (which doesn't even have ctors), but it's a
> > reasonable expectation we currently get wrong and I think it might be
> > specified in ELF or some related sysv "standards".
> 
> Requirements first, nice-to-haves later, right? Once we have the dlsym()
> conformance issue squared away, we can still deal with this issue.
> 
> You mean, people expect their dependencies to be initialized in their
> own initializers? When it is well-known that there is no order to
> initializers, and e.g. the C++ standard says there is no order to
> constructors of static objects.

Yes. GCC has an extension for ctor priority within static linking
scope, but for dynamic linking scope that doesn't help. I don't like
any of this but glib depends on it to avoid just doing the right thing
with pthread_once/call_once, and refuses to fix it.

> > I'm not sure if it makes sense to build both of these lists at the
> > same time. We currently try to avoid allocating dependency lists for
> > libraries loaded at startup in order not to allocate and setup data
> > that might never be used, defering until if/when dlopen is called on
> > them. I've wanted to do the ctor order walk without allocating a
> > dependency list, but I don't know a good way to do so. Note that the
> > code that runs ctors cannot allocate because, at the time it runs,
> > dlopen has already "committed" the load and can't back it out. It has
> > to have all resources it needs for the walk precommitted.
> 
> Depth first means stack. Means recursion or explicit stack. Explicit is
> going to be hard, considering there is no known limit to dependency
> nesting depth. We could argue that the user better make their stack size
> ulimit large enough for the main thread, but I hazard a guess that
> nobody expects dlopen() to use overly much stack in other threads.
> 
> Alright, what's the algorithm here?
> 
> init(lib):
>     if (!lib.inited):
>         foreach d in lib.deps init(d)
>         start_init(lib)
>         lib.inited = 1
> 
> That about right? Because that means we need a stack as high as the
> nesting depth of dependencies. We can get a pessimistic estimation from
> the number of deps loaded, but that may be way too high (see below).

Yes, but you can also avoid recursion just by looping to the deepest
dependency with !inited, then going back to the root. For a one-time
operation at dlopen-time or program-start time, the quadratic search
for each !inited seems unlikely to be a problem:

> > Since the beginning of a list of breadth-first dependencies is just
> > the direct dependencies, if we recorded the number of direct
> > dependencies for each dso, the breadth-first lists could be used to
> > perform a depth-first walk to run ctors; this can be made
> > non-recursive, at the cost of quadratic time but with trivially-tiny
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > constant factor, by simply restarting at the root of the tree after
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > each node and finding the deepest not-yet-constructed dso. This
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> $ ldd =mpv | wc -l
> 292
> 
> And that, I will remind you, is the "suckless" alternative to mplayer.
> 
> Anyway, that is breadth, not depth. Experimentally, it is hard to find a
> chain of libraries deeper than ten. They all eventually just lead back
> to libc.
> 

> So what you could do, when it comes time to run the initializers, is to
> walk the dyn vectors again. Those are already allocated, and all the
> libs they reference are already loaded, so load_library() would only be
> a search operation.

Note that each load_library is a linear search, so you still have the
quadratic time, now with much higher coefficient. And if you want to
use my go-back-to-root approach to avoid having O(depth) working
space, it's cubic time.

Also, as-written, load_library is not "only a search operation";
perhaps this is a bug in itself. For absolute paths it will always
open the file and compare inode identity to see if it matches an
already-loaded file. This should probably be bypassed by first
checking long-name. Fortunately absolute paths in DT_NEEDED are
strongly discouraged/a-really-bad-idea, so it probably doesn't come up
much in practice.

> By the time the initializers run, that must be the
> case, both at load time and at runtime. Unless loading the library
> failed the first time, in which case at runtime we have already bailed.
> And at load time we... haven't. Are you sure continuing on load failure
> of a dependency during load time is a good idea?

I don't follow. The dlopen operation is not committed until load of
all dependencies completes successfully, and if any fail to load, the
whole operation is backed-out. But ctors don't/can't run until *after*
that, when we've already committed to success.

Rich
Markus Wichmann Feb. 7, 2019, 8:31 p.m.
On Thu, Feb 07, 2019 at 01:57:36PM -0500, Rich Felker wrote:
> Yes. GCC has an extension for ctor priority within static linking
> scope, but for dynamic linking scope that doesn't help. I don't like
> any of this but glib depends on it to avoid just doing the right thing
> with pthread_once/call_once, and refuses to fix it.
> 

Well, at least we are on the same page, here. And my opinion of glib is
validated once more. Unfortunately, at this point it is too big to fail,
in several ways.

> Yes, but you can also avoid recursion just by looping to the deepest
> dependency with !inited, then going back to the root. For a one-time
> operation at dlopen-time or program-start time, the quadratic search
> for each !inited seems unlikely to be a problem:
> 

Wait, I have an idea. If the only ordering is that the dependencies need
to be initialized before their dependents, then couldn't we just
initialize the libs in reverse BFS order? The elements further down the
tree are all necessarily further down the list, aren't they?

> I don't follow. The dlopen operation is not committed until load of
> all dependencies completes successfully, and if any fail to load, the
> whole operation is backed-out. But ctors don't/can't run until *after*
> that, when we've already committed to success.
> 

That is true for the runtime case, i.e. dlopen(). But load_deps() is
also called at load time. And initializers have to run at load time,
too. And in the correct order.

If at load time, any dependencies fail to load, an error message is
printed and then the loop continues. load_deps() has no way to signal
failure to the caller, and at load time it will not exit the function in
another way, i.e. longjump (which is good since that would be invalid at
that time). So by the time the initializers are called, all dependencies
are loaded except those which failed.

> Rich

Ciao,
Markus
Rich Felker Feb. 7, 2019, 9:33 p.m.
On Thu, Feb 07, 2019 at 09:31:38PM +0100, Markus Wichmann wrote:
> > Yes, but you can also avoid recursion just by looping to the deepest
> > dependency with !inited, then going back to the root. For a one-time
> > operation at dlopen-time or program-start time, the quadratic search
> > for each !inited seems unlikely to be a problem:
> > 
> 
> Wait, I have an idea. If the only ordering is that the dependencies need
> to be initialized before their dependents, then couldn't we just
> initialize the libs in reverse BFS order? The elements further down the
> tree are all necessarily further down the list, aren't they?

No. Suppose X depends on Y and Z, and Z also depends on Y. If you do
reverse-BFS order, you'll construct Z before Y, despite Z depending on
Y (and Z's ctors depending on Y's ctors already having run).

> > I don't follow. The dlopen operation is not committed until load of
> > all dependencies completes successfully, and if any fail to load, the
> > whole operation is backed-out. But ctors don't/can't run until *after*
> > that, when we've already committed to success.
> 
> That is true for the runtime case, i.e. dlopen(). But load_deps() is
> also called at load time. And initializers have to run at load time,
> too. And in the correct order.
> 
> If at load time, any dependencies fail to load, an error message is
> printed and then the loop continues. load_deps() has no way to signal
> failure to the caller, and at load time it will not exit the function in
> another way, i.e. longjump (which is good since that would be invalid at
> that time). So by the time the initializers are called, all dependencies
> are loaded except those which failed.

See the definition of error(). It sets ldso_fail so that execution
never proceeds to the program.

Rich
Rich Felker Feb. 7, 2019, 9:37 p.m.
On Thu, Feb 07, 2019 at 04:33:18PM -0500, Rich Felker wrote:
> On Thu, Feb 07, 2019 at 09:31:38PM +0100, Markus Wichmann wrote:
> > > Yes, but you can also avoid recursion just by looping to the deepest
> > > dependency with !inited, then going back to the root. For a one-time
> > > operation at dlopen-time or program-start time, the quadratic search
> > > for each !inited seems unlikely to be a problem:
> > > 
> > 
> > Wait, I have an idea. If the only ordering is that the dependencies need
> > to be initialized before their dependents, then couldn't we just
> > initialize the libs in reverse BFS order? The elements further down the
> > tree are all necessarily further down the list, aren't they?
> 
> No. Suppose X depends on Y and Z, and Z also depends on Y. If you do
> reverse-BFS order, you'll construct Z before Y, despite Z depending on
> Y (and Z's ctors depending on Y's ctors already having run).

Hmm, maybe if you add redundant entries like your code was doing, this
works -- the list gets "YZY" for my example -- but then you waste a
good bit of space and need a way to cut off circular dependencies
(which can't respect ctor dependency order) while not breaking the
order for non-circular ones.

Rich
A. Wilcox Feb. 8, 2019, 10:19 a.m.
On Feb 7, 2019, at 10:54 AM, Rich Felker <dalias@libc.org> wrote:
> 
> However, a depth-first list of dependencies is also needed to solve the longstanding ctor-order issue discussed in several past threads (which I can look up if search doesn't turn them up for you). This is not a requirement of POSIX (which doesn't even have ctors), but it's a reasonable expectation we currently get wrong and I think it might be specified in ELF or some related sysv "standards".


It is part of the ELF 1.2 standard and is not only required by GLib, but also Nouveau, gtk-doc, some Qt apps, and others.

More info, including the spec citation, example failures, and the musl ML thread, can be found on our BTS:

https://bts.adelielinux.org/show_bug.cgi?id=11

Best,
—arw 


--
A. Wilcox (Sent from my iPhone - not signed)
Project Lead, Adélie Linux
https://adelielinux.org
Szabolcs Nagy Feb. 8, 2019, noon
* A. Wilcox <awilfox@adelielinux.org> [2019-02-08 04:19:32 -0600]:
> On Feb 7, 2019, at 10:54 AM, Rich Felker <dalias@libc.org> wrote:
> > 
> > However, a depth-first list of dependencies is also needed to solve the longstanding ctor-order issue discussed in several past threads (which I can look up if search doesn't turn them up for you). This is not a requirement of POSIX (which doesn't even have ctors), but it's a reasonable expectation we currently get wrong and I think it might be specified in ELF or some related sysv "standards".
> 
> 
> It is part of the ELF 1.2 standard and is not only required by GLib, but also Nouveau, gtk-doc, some Qt apps, and others.

note, that the ctor order in c++ is undefined across translation
units so if a c++ application relies on it that's a bug: the c++
ctor concept does not necessarily map directly to the elf
initialization function concept on an elf platform so the elf
spec does not save them.

in the elf context, relying on DT_NEEDED for ctor ordering
across libraries means that static linking is broken since
then there is no DT_NEEDED.

so these packages should be fixed independently of musl
(assuming they want to be portable or at least static linkable).

that said, lot of libraries do rely on dependency based ctor
ordering even beyond the elf guarantees (glibc and solaris
dynamic linkers considers relocations on top of DT_NEEDED to
establish dependencies between dsos) and some applications
have dependency cycles which of course is impossible to get
right but glibc tries to keep the ctor order deterministic
in some way even in the presence of dlopen during init.

originally dynamic linkers did exactly what musl did, there
is a nice overview how the complexity evolved:
https://blogs.oracle.com/solaris/init-and-fini-processing-who-designed-this-v2

> More info, including the spec citation, example failures, and the musl ML thread, can be found on our BTS:
> 
> https://bts.adelielinux.org/show_bug.cgi?id=11
> 
> Best,
> —arw 
> 
> 
> --
> A. Wilcox (Sent from my iPhone - not signed)
> Project Lead, Adélie Linux
> https://adelielinux.org
Rich Felker Feb. 8, 2019, 4:09 p.m.
On Fri, Feb 08, 2019 at 01:00:16PM +0100, Szabolcs Nagy wrote:
> * A. Wilcox <awilfox@adelielinux.org> [2019-02-08 04:19:32 -0600]:
> > On Feb 7, 2019, at 10:54 AM, Rich Felker <dalias@libc.org> wrote:
> > > 
> > > However, a depth-first list of dependencies is also needed to solve the longstanding ctor-order issue discussed in several past threads (which I can look up if search doesn't turn them up for you). This is not a requirement of POSIX (which doesn't even have ctors), but it's a reasonable expectation we currently get wrong and I think it might be specified in ELF or some related sysv "standards".
> > 
> > 
> > It is part of the ELF 1.2 standard and is not only required by GLib, but also Nouveau, gtk-doc, some Qt apps, and others.
> 
> note, that the ctor order in c++ is undefined across translation
> units so if a c++ application relies on it that's a bug: the c++
> ctor concept does not necessarily map directly to the elf
> initialization function concept on an elf platform so the elf
> spec does not save them.

In this case they're not using C++ but the GNU C extension.

> in the elf context, relying on DT_NEEDED for ctor ordering
> across libraries means that static linking is broken since
> then there is no DT_NEEDED.

This is true, but they can make static linking also work (within
libraries that share a convention on the priority values to use) by
using the GNU C ctor priority attributes. This could fix the glib bug.
I forget if any distros are doing it out-of-tree or if they just
patched the one ctor to call the other init function explicitly.

> so these packages should be fixed independently of musl
> (assuming they want to be portable or at least static linkable).

This is probably mostly correct.

> that said, lot of libraries do rely on dependency based ctor
> ordering even beyond the elf guarantees (glibc and solaris
> dynamic linkers considers relocations on top of DT_NEEDED to
> establish dependencies between dsos) and some applications
> have dependency cycles which of course is impossible to get
> right but glibc tries to keep the ctor order deterministic
> in some way even in the presence of dlopen during init.
> 
> originally dynamic linkers did exactly what musl did, there
> is a nice overview how the complexity evolved:
> https://blogs.oracle.com/solaris/init-and-fini-processing-who-designed-this-v2

Good to know there's precedent.

Rich