Asterisk 16 segmentation fault

Submitted by Sebastian Kemper on Feb. 28, 2019, 8:27 p.m.

Details

Message ID 20190228202751.GB31965@darth.lan
State New
Series "Asterisk 16 segmentation fault"
Headers show

Commit Message

Sebastian Kemper Feb. 28, 2019, 8:27 p.m.
Hello list,

I'm trying to update Asterisk 15 to 16 on OpenWrt. It's apparently not
easy %-).

The first thing I ran into was that now the header astmm.h is included
by asterisk.h. In earlier releases this header was only included when
doing malloc debugging it seems.

astmm.h redefines some memory allocation functions as far as I
understand. musl doesn't seem to like this very much and the compile
fails. I worked around this with a small patch:


With this patch applied I can now compile Asterisk 16 against musl.

But when I tried to run the binary I ran into a segmentation fault. It
starts to initialize - and everything looks good - until it attempts to
load the modules/plugins.

I connected to the box with a remote gdb and extracted some information.
The segmentation fault seems to happen in main/loader.c
(https://github.com/asterisk/asterisk/blob/16.2/main/loader.c),
specifically in load_dlopen() I think.

I'm not a programmer so I have problems making sense of this. So I'm
hoping that maybe one of you can shine a light.

I set a break point in the mentioned file, line 952. Then I stepped
forward.

(gdb) 

Thread 1 "asterisk" hit Breakpoint 1, load_dlopen (resource_in=0x77d8f52b <parseHhMmSs+578> "D\350\240e", resource_in@entry=0x5fa910 "res_pjproject.so", so_ext=0x0, 
    so_ext@entry=0x53b91c "", filename=0x5fa910 "res_pjproject.so", filename@entry=0x7fff7a04 "/usr/lib/asterisk/modules/res_pjproject.so", flags=796226418, 
    flags@entry=258, suppress_logging=suppress_logging@entry=0) at loader.c:952
952		if (resource_being_loaded) {
(gdb) 
951		mod->lib = dlopen(filename, flags);
(gdb) 
952		if (resource_being_loaded) {
(gdb) 
955			const char *dlerror_msg = ast_strdupa(dlerror());
(gdb) 

Thread 1 "asterisk" received signal SIGSEGV, Segmentation fault.
strlen (s=0x0, s@entry=0x48d79d <load_dynamic_module+120> "\t\360\"\223\f\234\200\353\216#\005\032\240z\364e") at src/string/strlen.c:17
17		for (w = (const void *)s; !HASZERO(*w); w++);
(gdb) bt
#0  strlen (s=0x0, s@entry=0x48d79d <load_dynamic_module+120> "\t\360\"\223\f\234\200\353\216#\005\032\240z\364e") at src/string/strlen.c:17
#1  0x0048d5db in load_dlopen (resource_in=0x77d8f52b <parseHhMmSs+578> "D\350\240e", resource_in@entry=0x5fa910 "res_pjproject.so", so_ext=0x0, 
    so_ext@entry=0x53b91c "", filename=0x5fa910 "res_pjproject.so", filename@entry=0x7fff7a04 "/usr/lib/asterisk/modules/res_pjproject.so", flags=796226418, 
    flags@entry=258, suppress_logging=suppress_logging@entry=0) at loader.c:955
#2  0x0048d79d in load_dynamic_module (resource_in=resource_in@entry=0x5fa910 "res_pjproject.so", suppress_logging=suppress_logging@entry=1) at loader.c:1039
#3  0x0048eea3 in load_resource (resource_name=0x5fa910 "res_pjproject.so", suppress_logging=suppress_logging@entry=1, 
    module_priorities=module_priorities@entry=0x7fff8c24, required=0, preload=0) at loader.c:1635
#4  0x0048f5e1 in load_resource_list (mod_count=<synthetic pointer>, load_order=0x7fff8c1c) at loader.c:1984
#5  load_modules () at loader.c:2232
#6  0x0042c99d in asterisk_daemon (isroot=<optimized out>, rungroup=<optimized out>, runuser=<optimized out>) at asterisk.c:4146
#7  main (argc=<optimized out>, argv=<optimized out>) at asterisk.c:3918
(gdb)

Any help appreciated!

Kind regards,
Seb

Patch hide | download patch | download mbox

--- a/include/asterisk/astmm.h
+++ b/include/asterisk/astmm.h
@@ -113,8 +113,21 @@  int __ast_repl_vasprintf(char **strp, co
  */
 
 #if !defined(ASTMM_LIBC)
+#if defined(__UCLIBC__) || defined(__GLIBC__)
 /* BLOCK libc allocators by default. */
 #define ASTMM_LIBC ASTMM_BLOCK
+#else
+/* musl */
+#define ASTMM_LIBC ASTMM_IGNORE
+#endif
+#endif
+
+#if !defined(__UCLIBC__) && !defined(__GLIBC__)
+#if ASTMM_LIBC == ASTMM_REDIRECT
+/* musl */
+#undef ASTMM_LIBC
+#define ASTMM_LIBC ASTMM_IGNORE
+#endif
 #endif
 
 #if ASTMM_LIBC == ASTMM_IGNORE

Comments

Rich Felker Feb. 28, 2019, 9:15 p.m.
On Thu, Feb 28, 2019 at 09:27:51PM +0100, Sebastian Kemper wrote:
> Hello list,
> 
> I'm trying to update Asterisk 15 to 16 on OpenWrt. It's apparently not
> easy %-).
> 
> The first thing I ran into was that now the header astmm.h is included
> by asterisk.h. In earlier releases this header was only included when
> doing malloc debugging it seems.
> 
> astmm.h redefines some memory allocation functions as far as I
> understand. musl doesn't seem to like this very much and the compile
> fails. I worked around this with a small patch:
> 
> --- a/include/asterisk/astmm.h
> +++ b/include/asterisk/astmm.h
> @@ -113,8 +113,21 @@ int __ast_repl_vasprintf(char **strp, co
>   */
>  
>  #if !defined(ASTMM_LIBC)
> +#if defined(__UCLIBC__) || defined(__GLIBC__)
>  /* BLOCK libc allocators by default. */
>  #define ASTMM_LIBC ASTMM_BLOCK
> +#else
> +/* musl */
> +#define ASTMM_LIBC ASTMM_IGNORE
> +#endif
> +#endif
> +
> +#if !defined(__UCLIBC__) && !defined(__GLIBC__)
> +#if ASTMM_LIBC == ASTMM_REDIRECT
> +/* musl */
> +#undef ASTMM_LIBC
> +#define ASTMM_LIBC ASTMM_IGNORE
> +#endif
>  #endif
>  
>  #if ASTMM_LIBC == ASTMM_IGNORE
> 
> With this patch applied I can now compile Asterisk 16 against musl.
> 
> But when I tried to run the binary I ran into a segmentation fault. It
> starts to initialize - and everything looks good - until it attempts to
> load the modules/plugins.
> 
> I connected to the box with a remote gdb and extracted some information.
> The segmentation fault seems to happen in main/loader.c
> (https://github.com/asterisk/asterisk/blob/16.2/main/loader.c),
> specifically in load_dlopen() I think.
> 
> I'm not a programmer so I have problems making sense of this. So I'm
> hoping that maybe one of you can shine a light.
> 
> I set a break point in the mentioned file, line 952. Then I stepped
> forward.
> 
> (gdb) 
> 
> Thread 1 "asterisk" hit Breakpoint 1, load_dlopen (resource_in=0x77d8f52b <parseHhMmSs+578> "D\350\240e", resource_in@entry=0x5fa910 "res_pjproject.so", so_ext=0x0, 
>     so_ext@entry=0x53b91c "", filename=0x5fa910 "res_pjproject.so", filename@entry=0x7fff7a04 "/usr/lib/asterisk/modules/res_pjproject.so", flags=796226418, 
>     flags@entry=258, suppress_logging=suppress_logging@entry=0) at loader.c:952
> 952		if (resource_being_loaded) {
> (gdb) 
> 951		mod->lib = dlopen(filename, flags);
> (gdb) 
> 952		if (resource_being_loaded) {
> (gdb) 
> 955			const char *dlerror_msg = ast_strdupa(dlerror());
> (gdb) 
> 
> Thread 1 "asterisk" received signal SIGSEGV, Segmentation fault.
> strlen (s=0x0, s@entry=0x48d79d <load_dynamic_module+120> "\t\360\"\223\f\234\200\353\216#\005\032\240z\364e") at src/string/strlen.c:17
> 17		for (w = (const void *)s; !HASZERO(*w); w++);
> (gdb) bt
> #0  strlen (s=0x0, s@entry=0x48d79d <load_dynamic_module+120> "\t\360\"\223\f\234\200\353\216#\005\032\240z\364e") at src/string/strlen.c:17
> #1  0x0048d5db in load_dlopen (resource_in=0x77d8f52b <parseHhMmSs+578> "D\350\240e", resource_in@entry=0x5fa910 "res_pjproject.so", so_ext=0x0, 
>     so_ext@entry=0x53b91c "", filename=0x5fa910 "res_pjproject.so", filename@entry=0x7fff7a04 "/usr/lib/asterisk/modules/res_pjproject.so", flags=796226418, 
>     flags@entry=258, suppress_logging=suppress_logging@entry=0) at loader.c:955
> #2  0x0048d79d in load_dynamic_module (resource_in=resource_in@entry=0x5fa910 "res_pjproject.so", suppress_logging=suppress_logging@entry=1) at loader.c:1039
> #3  0x0048eea3 in load_resource (resource_name=0x5fa910 "res_pjproject.so", suppress_logging=suppress_logging@entry=1, 
>     module_priorities=module_priorities@entry=0x7fff8c24, required=0, preload=0) at loader.c:1635
> #4  0x0048f5e1 in load_resource_list (mod_count=<synthetic pointer>, load_order=0x7fff8c1c) at loader.c:1984
> #5  load_modules () at loader.c:2232
> #6  0x0042c99d in asterisk_daemon (isroot=<optimized out>, rungroup=<optimized out>, runuser=<optimized out>) at asterisk.c:4146
> #7  main (argc=<optimized out>, argv=<optimized out>) at asterisk.c:3918
> (gdb)
> 
> Any help appreciated!
> 
> Kind regards,
> Seb

The immediate cause of the crash seems to be passing a null pointer to
ast_strdupa, where dlerror() returns a null pointer because dlopen did
not fail (they either need to check the return value of dlopen or
check the return value of dlerror before using it; preferably the
former since dlerror could retain an outdated error from an earlier
call).

It looks like they expect resource_being_loaded to be 0 if the dlopen
succeeds; assuming the intent is that some constructor in the loaded
library sets it to 0, this is only valid for the first open. I'm
guessing they've introduced some new plugin architecture that's not
compatible with dlopen being permanent like it is in musl. The right
way to fix this is probably to get rid of the constructors and instead
have an explicit init function that their loader finds with dlsym and
calls after dlopen succeeds. It looks like they also have support for
building the modules builtin to the main program, rather than loading
them by dlopen; this may be able to work around the problem.

I think they have a lot of bugs and misunderstandings in this area.
For instance, this commit is completely bogus:

https://github.com/asterisk/asterisk/commit/23aa20bf209dbc02b01c757c5f5f1a787548ed96

I don't see where the behavior you're seeing was introduced, though.

Rich
Szabolcs Nagy Feb. 28, 2019, 9:19 p.m.
* Sebastian Kemper <sebastian_ml@gmx.net> [2019-02-28 21:27:51 +0100]:
> I'm not a programmer so I have problems making sense of this. So I'm
> hoping that maybe one of you can shine a light.

it means dlerror() returned 0 and ast_strdupa calls strlen on
this return value which segfaults as expected.

it is entirely valid for dlerror() to return 0 if there was no
error. did the dlopen("res_pjproject.so", flags) call succeed?
i.e. mod->lib != 0 ?.

e.g. the segfault can be avoided by

-			const char *dlerror_msg = ast_strdupa(dlerror());
+			const char *dlerror_msg = dlerror(); dlerror_msg = ast_strdupa(dlerror_msg ? dlerror_msg : "");

but we would need to know what this code is trying to do
(and how it worked before) for a proper fix.

> 
> I set a break point in the mentioned file, line 952. Then I stepped
> forward.
> 
> (gdb) 
> 
> Thread 1 "asterisk" hit Breakpoint 1, load_dlopen (resource_in=0x77d8f52b <parseHhMmSs+578> "D\350\240e", resource_in@entry=0x5fa910 "res_pjproject.so", so_ext=0x0, 
>     so_ext@entry=0x53b91c "", filename=0x5fa910 "res_pjproject.so", filename@entry=0x7fff7a04 "/usr/lib/asterisk/modules/res_pjproject.so", flags=796226418, 
>     flags@entry=258, suppress_logging=suppress_logging@entry=0) at loader.c:952
> 952		if (resource_being_loaded) {
> (gdb) 
> 951		mod->lib = dlopen(filename, flags);
> (gdb) 
> 952		if (resource_being_loaded) {
> (gdb) 
> 955			const char *dlerror_msg = ast_strdupa(dlerror());
> (gdb) 
> 
> Thread 1 "asterisk" received signal SIGSEGV, Segmentation fault.
> strlen (s=0x0, s@entry=0x48d79d <load_dynamic_module+120> "\t\360\"\223\f\234\200\353\216#\005\032\240z\364e") at src/string/strlen.c:17
> 17		for (w = (const void *)s; !HASZERO(*w); w++);
> (gdb) bt
> #0  strlen (s=0x0, s@entry=0x48d79d <load_dynamic_module+120> "\t\360\"\223\f\234\200\353\216#\005\032\240z\364e") at src/string/strlen.c:17
> #1  0x0048d5db in load_dlopen (resource_in=0x77d8f52b <parseHhMmSs+578> "D\350\240e", resource_in@entry=0x5fa910 "res_pjproject.so", so_ext=0x0, 
>     so_ext@entry=0x53b91c "", filename=0x5fa910 "res_pjproject.so", filename@entry=0x7fff7a04 "/usr/lib/asterisk/modules/res_pjproject.so", flags=796226418, 
>     flags@entry=258, suppress_logging=suppress_logging@entry=0) at loader.c:955
> #2  0x0048d79d in load_dynamic_module (resource_in=resource_in@entry=0x5fa910 "res_pjproject.so", suppress_logging=suppress_logging@entry=1) at loader.c:1039
> #3  0x0048eea3 in load_resource (resource_name=0x5fa910 "res_pjproject.so", suppress_logging=suppress_logging@entry=1, 
>     module_priorities=module_priorities@entry=0x7fff8c24, required=0, preload=0) at loader.c:1635
> #4  0x0048f5e1 in load_resource_list (mod_count=<synthetic pointer>, load_order=0x7fff8c1c) at loader.c:1984
> #5  load_modules () at loader.c:2232
> #6  0x0042c99d in asterisk_daemon (isroot=<optimized out>, rungroup=<optimized out>, runuser=<optimized out>) at asterisk.c:4146
> #7  main (argc=<optimized out>, argv=<optimized out>) at asterisk.c:3918
> (gdb)
> 
> Any help appreciated!
> 
> Kind regards,
> Seb
Jeffrey Walton Feb. 28, 2019, 9:32 p.m.
On Thu, Feb 28, 2019 at 4:15 PM Rich Felker <dalias@libc.org> wrote:
>
> On Thu, Feb 28, 2019 at 09:27:51PM +0100, Sebastian Kemper wrote:
> >
> > I'm trying to update Asterisk 15 to 16 on OpenWrt. It's apparently not
> > easy %-).
> >
> > ...
> It looks like they expect resource_being_loaded to be 0 if the dlopen
> succeeds; assuming the intent is that some constructor in the loaded
> library sets it to 0, this is only valid for the first open. I'm
> guessing they've introduced some new plugin architecture that's not
> compatible with dlopen being permanent like it is in musl. The right
> way to fix this is probably to get rid of the constructors and instead
> have an explicit init function that their loader finds with dlsym and
> calls after dlopen succeeds. It looks like they also have support for
> building the modules builtin to the main program, rather than loading
> them by dlopen; this may be able to work around the problem.
>
> I think they have a lot of bugs and misunderstandings in this area.
> For instance, this commit is completely bogus:
>
> https://github.com/asterisk/asterisk/commit/23aa20bf209d

I'd hazard a guess that they are experiencing Static Initialization
Order Fiasco problems with their C++ objects. No fun...

The last time I checked (several years ago), Asterisk also had a load
of UBSan findings. At the time they were not performing regular
testing with -fsanitize=undefined. It caused a lot of problems for me
during acceptance testing. That may have changed, though.

Jeff
Sebastian Kemper Feb. 28, 2019, 10:01 p.m.
On Thu, Feb 28, 2019 at 10:19:02PM +0100, Szabolcs Nagy wrote:
> * Sebastian Kemper <sebastian_ml@gmx.net> [2019-02-28 21:27:51 +0100]:
> > I'm not a programmer so I have problems making sense of this. So I'm
> > hoping that maybe one of you can shine a light.
> 
> it means dlerror() returned 0 and ast_strdupa calls strlen on
> this return value which segfaults as expected.
> 
> it is entirely valid for dlerror() to return 0 if there was no
> error. did the dlopen("res_pjproject.so", flags) call succeed?
> i.e. mod->lib != 0 ?.

Hello Szabolcs,

Thanks for your reply. And thanks to Rich & Jeffrey as well, of course.
Again I'm blown away by how fast one gets a reply on the musl list :-)

I stepped through it again and tried to get the value of mod->lib:

952		if (resource_being_loaded) {
(gdb) step
951		mod->lib = dlopen(filename, flags);
(gdb) step
952		if (resource_being_loaded) {
(gdb) p mod->lib
$10 = (void *) 0x5fc210
(gdb) p mod->lib
$11 = (void *) 0x5fc210
(gdb) p mod->lib
$12 = (void *) 0x5fc210
(gdb) p mod->lib
$13 = (void *) 0x5fc210
(gdb) p (*mod).lib
$14 = (void *) 0x5fc210
(gdb) p (*mod).li
There is no member named li.
(gdb) p (*mod).lib
$15 = (void *) 0x5fc210
(gdb) step
955			const char *dlerror_msg = ast_strdupa(dlerror());
(gdb) step

Thread 1 "asterisk" received signal SIGSEGV, Segmentation fault.
strlen (s=0x0, s@entry=0x48d79d <load_dynamic_module+120> "\t\360\"\223\f\234\200\353\216#\005\032\240z\364e") at src/string/strlen.c:17
17		for (w = (const void *)s; !HASZERO(*w); w++);
(gdb)

So if I'm not mistaken mod->lib is indeed != 0.

> e.g. the segfault can be avoided by
> 
> -                     const char *dlerror_msg = ast_strdupa(dlerror());
> +                     const char *dlerror_msg = dlerror(); dlerror_msg = ast_strdupa(dlerror_msg ? dlerror_msg : "");
> 
> but we would need to know what this code is trying to do
> (and how it worked before) for a proper fix.

astmm.h defines ast_strdupa() like this:

#if !defined(ast_strdupa) && defined(__GNUC__)
/*!
 * \brief duplicate a string in memory from the stack
 * \param s The string to duplicate
 *
 * This macro will duplicate the given string.  It returns a pointer to the stack
 * allocatted memory for the new string.
 */
#define ast_strdupa(s)                                                    \
        (__extension__                                                    \
        ({                                                                \
                const char *__old = (s);                                  \
                size_t __len = strlen(__old) + 1;                         \
                char *__new = __builtin_alloca(__len);                    \
                memcpy (__new, __old, __len);                             \
                __new;                                                    \
        }))
#endif

This define is the same in both Asterisk 15 and 16 (the define only has
moved from include/asterisk/utils.h to include/asterisk/astmm.h).

The 'static struct ast_module *load_dlopen()' in which it's used has
changed, though, between Asterisk 15 and 16:

@@ -569,13 +950,60 @@
        resource_being_loaded = mod;
        mod->lib = dlopen(filename, flags);
        if (resource_being_loaded) {
+               struct ast_str *list;
+               int c = 0;
+               const char *dlerror_msg = ast_strdupa(dlerror());
+
                resource_being_loaded = NULL;
                if (mod->lib) {
-                       ast_log(LOG_ERROR, "Module '%s' did not register itself during load\n", resource_in);
+                       module_load_error("Module '%s' did not register itself during load\n", resource_in);
                        logged_dlclose(resource_in, mod->lib);
-               } else if (!suppress_logging) {
-                       ast_log(LOG_WARNING, "Error loading module '%s': %s\n", resource_in, dlerror());
+
+                       goto error_return;
+               }
+
+               if (suppress_logging) {
+                       goto error_return;
                }
+
+               resource_being_loaded = mod;
+               mod->lib = dlopen(filename, RTLD_LAZY | RTLD_LOCAL);
+               if (resource_being_loaded) {
+                       resource_being_loaded = NULL;
+
+                       module_load_error("Error loading module '%s': %s\n", resource_in, dlerror_msg);
+                       logged_dlclose(resource_in, mod->lib);
+
+                       goto error_return;
+               }
+
+               list = ast_str_create(64);
+               if (list) {
+                       if (module_post_register(mod)) {
+                               goto loaded_error;
+                       }
+
+                       c = load_dlopen_missing(&list, &mod->requires);
+                       c += load_dlopen_missing(&list, &mod->enhances);
+#ifndef OPTIONAL_API
+                       c += load_dlopen_missing(&list, &mod->optional_modules);
+#endif
+               }
+
+               if (list && ast_str_strlen(list)) {
+                       module_load_error("Error loading module '%s', missing %s: %s\n",
+                               resource_in, c == 1 ? "dependency" : "dependencies", ast_str_buffer(list));
+               } else {
+                       module_load_error("Error loading module '%s': %s\n", resource_in, dlerror_msg);
+               }
+
+loaded_error:
+               ast_free(list);
+               unload_dynamic_module(mod);
+
+               return NULL;
+
+error_return:
                ast_free(mod);

                return NULL;
@@ -584,12 +1012,11 @@

So 'const char *dlerror_msg' seems to only be used for logging purposes.
Should I give the proposed change a spin?

I'm off to bed now. I would get back to you tomorrow evening.

Again thank you all, much appreciated!

With kind regards,
Seb
Sebastian Kemper March 2, 2019, 5:16 p.m.
On Thu, Feb 28, 2019 at 10:19:02PM +0100, Szabolcs Nagy wrote:
> e.g. the segfault can be avoided by
> 
> -			const char *dlerror_msg = ast_strdupa(dlerror());
> +			const char *dlerror_msg = dlerror(); dlerror_msg = ast_strdupa(dlerror_msg ? dlerror_msg : "");
> 
> but we would need to know what this code is trying to do
> (and how it worked before) for a proper fix.

Hello Szabolcs,

This seems to work. I was able to compile and run asterisk with this
(and make some phone calls).

I will raise a bug report upstream for this and point them here. I'm not
going to raise a pull request with your patch as upstream's handling of
pull requests includes signing some agreement etc.

I'll just point them to this thread and see what comes out of it.

Thanks again!

Kind regards,
Seb
Rich Felker March 2, 2019, 5:36 p.m.
On Sat, Mar 02, 2019 at 06:16:42PM +0100, Sebastian Kemper wrote:
> On Thu, Feb 28, 2019 at 10:19:02PM +0100, Szabolcs Nagy wrote:
> > e.g. the segfault can be avoided by
> > 
> > -			const char *dlerror_msg = ast_strdupa(dlerror());
> > +			const char *dlerror_msg = dlerror(); dlerror_msg = ast_strdupa(dlerror_msg ? dlerror_msg : "");
> > 
> > but we would need to know what this code is trying to do
> > (and how it worked before) for a proper fix.
> 
> Hello Szabolcs,
> 
> This seems to work. I was able to compile and run asterisk with this
> (and make some phone calls).
> 
> I will raise a bug report upstream for this and point them here. I'm not
> going to raise a pull request with your patch as upstream's handling of
> pull requests includes signing some agreement etc.
> 
> I'll just point them to this thread and see what comes out of it.
> 
> Thanks again!

Please be aware: This is just fixing the crash, but isn't fixing the
logic error in the Asterisk software that led to it. I'm pretty sure
module loading is not working correctly. Actually fixing this is going
to be a much bigger task.

Rich
Sebastian Kemper March 2, 2019, 5:59 p.m.
On Sat, Mar 02, 2019 at 12:36:09PM -0500, Rich Felker wrote:
> Please be aware: This is just fixing the crash, but isn't fixing the
> logic error in the Asterisk software that led to it. I'm pretty sure
> module loading is not working correctly. Actually fixing this is going
> to be a much bigger task.

Hi Rich,

Thanks, I mentioned that.

https://issues.asterisk.org/jira/browse/ASTERISK-28319

Kind regards,
Seb