public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Carlos O'Donell <carlos@redhat.com>
To: Florian Weimer <fweimer@redhat.com>, libc-alpha@sourceware.org
Subject: Re: [PATCH v2] elf: Make more functions available for binding during dlclose (bug 30425)
Date: Thu, 25 May 2023 14:57:28 -0400	[thread overview]
Message-ID: <b977747f-e5a2-1f42-e65e-ac944b51be43@redhat.com> (raw)
In-Reply-To: <87mt1w4n54.fsf@oldenburg3.str.redhat.com>

On 5/22/23 07:32, Florian Weimer via Libc-alpha wrote:
> Previously, after destructors for a DSO have been invoked, ld.so refused
> to bind against that DSO in all cases.  Relax this restriction somewhat
> if the referencing object is itself a DSO that is being unloaded.  This
> assumes that the symbol reference is not going to be stored anywhere.

The truth here is that the example has a circular reference due to the interposition
which makes it have an undefined load and unload order. You wrote a very specific test
that *avoids* needing anything from mod1 during mod2's initialization.

Having said all that we should *choose* an unload order that is the opposite of the
load order and make it consistent, and if we could load it we should not fail to
unload it because of a limitation of the dynamic loader. We might still fail due
to some logical dependency in user code though.

The DT_NEEDED in mod1 ensures we load: mod2 then mod1.
The closing of mod1 should unload: mod1 then mod2 (opposite order).

This consistent order should make it easier for users to debug other problems in
their library designs.

> The situation in the test case can arise fairly easily with C++ and
> objects that are built with different optimization levels and therefore
> define different functions with vague linkage.

Please post a v3.

Passes pre-commit CI - Thank you!
https://patchwork.sourceware.org/project/glibc/patch/87mt1w4n54.fsf@oldenburg3.str.redhat.com/
 
> ---
> v2: Expanded comment in elf/dl-lookup.c.
>  elf/Makefile                |  8 ++++++++
>  elf/dl-lookup.c             | 21 +++++++++++++++++--
>  elf/tst-dlclose-lazy-mod1.c | 35 ++++++++++++++++++++++++++++++++
>  elf/tst-dlclose-lazy-mod2.c | 49 +++++++++++++++++++++++++++++++++++++++++++++
>  elf/tst-dlclose-lazy.c      | 36 +++++++++++++++++++++++++++++++++
>  5 files changed, 147 insertions(+), 2 deletions(-)
> 
> diff --git a/elf/Makefile b/elf/Makefile
> index e262f3e6b1..be487b9cfb 100644
> --- a/elf/Makefile
> +++ b/elf/Makefile
> @@ -394,6 +394,7 @@ tests += \
>    tst-debug1 \
>    tst-deep1 \
>    tst-dl-is_dso \
> +  tst-dlclose-lazy \

OK. Add a test. No DT_NEEDED on mod1 or mod2.

>    tst-dlmodcount \
>    tst-dlmopen-dlerror \
>    tst-dlmopen-gethostbyname \
> @@ -813,6 +814,8 @@ modules-names += \
>    tst-dl_find_object-mod7 \
>    tst-dl_find_object-mod8 \
>    tst-dl_find_object-mod9 \
> +  tst-dlclose-lazy-mod1 \
> +  tst-dlclose-lazy-mod2 \

OK. Adds two DSOs.

>    tst-dlmopen-dlerror-mod \
>    tst-dlmopen-gethostbyname-mod \
>    tst-dlmopen-twice-mod1 \
> @@ -2997,3 +3000,8 @@ $(objpfx)tst-sprof-basic.out: tst-sprof-basic.sh $(objpfx)tst-sprof-basic
>  		 '$(run-program-env)' > $@; \
>  	$(evaluate-test)
>  generated += tst-sprof-mod.so.profile
> +
> +LDFLAGS-tst-dlclose-lazy-mod1.so = -Wl,-z,lazy,--no-as-needed

OK. No DF_BIND_NOW in DT_FLAGS.

> +$(objpfx)tst-dlclose-lazy-mod1.so: $(objpfx)tst-dlclose-lazy-mod2.so

OK. mod1 deps on mod2. mod1 will have explicit DT_NEEDED on mod2.

> +$(objpfx)tst-dlclose-lazy.out: \
> +  $(objpfx)tst-dlclose-lazy-mod1.so $(objpfx)tst-dlclose-lazy-mod2.so

OK. Output depends on mod1 and mod2 being built first, but not the binary itself.

> diff --git a/elf/dl-lookup.c b/elf/dl-lookup.c
> index 05f36a2507..a8f48fed12 100644
> --- a/elf/dl-lookup.c
> +++ b/elf/dl-lookup.c
> @@ -366,8 +366,25 @@ do_lookup_x (const char *undef_name, unsigned int new_hash,
>        if ((type_class & ELF_RTYPE_CLASS_COPY) && map->l_type == lt_executable)
>  	continue;
>  
> -      /* Do not look into objects which are going to be removed.  */
> -      if (map->l_removed)

OK. This line was added originally in 2005 as part of the removal of l_opencount.

diff --git a/elf/do-lookup.h b/elf/do-lookup.h
index c89638980e..62755ea013 100644
--- a/elf/do-lookup.h
+++ b/elf/do-lookup.h
@@ -52,6 +52,10 @@ do_lookup_x (const char *undef_name, unsigned long int hash,
       if ((type_class & ELF_RTYPE_CLASS_COPY) && map->l_type == lt_executable)
        continue;
 
+      /* Do not look into objects which are going to be removed.  */
+      if (map->l_removed)
+       continue;
+
       /* Print some debugging info if wanted.  */
       if (__builtin_expect (GLRO(dl_debug_mask) & DL_DEBUG_SYMBOLS, 0))
        _dl_debug_printf ("symbol=%s;  lookup in file=%s [%lu]\n",
---



> +      /* Do not look into objects which are going to be removed,
> +	 except when the referencing object itself is being removed.
> +
> +	 The second part covers the situation when an object lazily
> +	 binds to another object while running its destructor, but the
> +	 destructor of the other object has already run, so that
> +	 dlclose has set l_removed.  It may not always be obvious how
> +	 to avoid such a scenario to programmers creating DSOs,
> +	 particularly if C++ vague linkage is involved and triggers
> +	 symbol interposition.
> +
> +	 Accepting these to-be-removed objects makes the lazy and
> +	 BIND_NOW cases more similar.  (With BIND_NOW, the symbol is

OK. Agree 100%, making these both more similar is a worthy goal.

> +	 resolved early, before the destructor call, so the issue does
> +	 not arise.).  Behavior matches the constructor scenario: the
> +	 implementation allows binding to symbols of objects whose
> +	 constructors have not run.  In fact, not doing this would be
> +	 mostly incompatible with symbol interposition.  */

OK. Agreed on both the constructor and destructor reasoning.

> +      if (map->l_removed && !(undef_map != NULL && undef_map->l_removed))
>  	continue;
>  
>        /* Print some debugging info if wanted.  */
> diff --git a/elf/tst-dlclose-lazy-mod1.c b/elf/tst-dlclose-lazy-mod1.c
> new file mode 100644
> index 0000000000..5535a6b9e0
> --- /dev/null
> +++ b/elf/tst-dlclose-lazy-mod1.c
> @@ -0,0 +1,35 @@
> +/* Lazy binding during dlclose.  Directly loaded module.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/* This function is called from exported_function below.  It is only
> +   defined in this module.  */
> +void __attribute__ ((weak))

Why does this need to be weak?

Even without weak the call below goes through the PLT on x86_64 and so it can be
lazily bound. Are there some ABIs where it doesn't unless it's weak?

> +lazily_bound_exported_function (void)
> +{
> +}
> +
> +/* Called from tst-dlclose-lazy-mod2.so.  */
> +void
> +exported_function (int call_it)
> +{
> +  if (call_it)
> +    /* This used crash if called during dlcose because after invoking
> +       the destructor as part of dlclose, symbols were no longer
> +       available for binding (bug 30425).  */

Suggest:

/* Previous to the fix this would crash when called during dclose since
   symbols from the DSO were no longer available for binding (bug 30425)
   after the DSO started being closed by dlclose.  */

> +    lazily_bound_exported_function ();
> +}
> diff --git a/elf/tst-dlclose-lazy-mod2.c b/elf/tst-dlclose-lazy-mod2.c
> new file mode 100644
> index 0000000000..767f69ffdb
> --- /dev/null
> +++ b/elf/tst-dlclose-lazy-mod2.c
> @@ -0,0 +1,49 @@
> +/* Lazy binding during dlclose.  Indirectly loaded module.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +
> +void
> +exported_function (int ignored)
> +{
> +  /* This function is interposed from tst-dlclose-lazy-mod1.so and
> +     thus never called.  */
> +  abort ();
> +}
> +
> +static void __attribute__ ((constructor))
> +init (void)
> +{
> +  puts ("info: tst-dlclose-lazy-mod2.so constructor called");
> +
> +  /* Trigger lazy binding to the definition in
> +     tst-dlclose-lazy-mod1.so, but not for
> +     lazily_bound_exported_function in that module.  */
> +  exported_function (0);

OK. Yes, since mod1 -> mod2 the mod1:exported_function() interposes mod2's version.

I built this to see what it looked like and review the results:

   3606820:	symbol=dlclose;  lookup in file=/home/carlos/build/glibc-pristine/elf/tst-dlclose-lazy [0]
   3606820:	symbol=dlclose;  lookup in file=/home/carlos/build/glibc-pristine/libc.so.6 [0]
   3606820:	binding file /home/carlos/build/glibc-pristine/elf/tst-dlclose-lazy [0] to /home/carlos/build/glibc-pristine/libc.so.6 [0]: normal symbol `dlclose' [GLIBC_2.34]
   3606820:	
   3606820:	calling fini: /home/carlos/build/glibc-pristine/elf/tst-dlclose-lazy-mod1.so [0]

The dlclose starts finalizing mod1.

   3606820:	
   3606820:	symbol=__cxa_finalize;  lookup in file=/home/carlos/build/glibc-pristine/elf/tst-dlclose-lazy [0]
   3606820:	symbol=__cxa_finalize;  lookup in file=/home/carlos/build/glibc-pristine/libc.so.6 [0]
   3606820:	binding file /home/carlos/build/glibc-pristine/elf/tst-dlclose-lazy-mod1.so [0] to /home/carlos/build/glibc-pristine/libc.so.6 [0]: normal symbol `__cxa_finalize' [GLIBC_2.2.5]
   3606820:	

With mod1 fully finalized we move on to mod2.

   3606820:	calling fini: /home/carlos/build/glibc-pristine/elf/tst-dlclose-lazy-mod2.so [0]
   3606820:	
info: tst-dlclose-lazy-mod2.so destructor called
   3606820:	symbol=lazily_bound_exported_function;  lookup in file=/home/carlos/build/glibc-pristine/elf/tst-dlclose-lazy [0]
   3606820:	symbol=lazily_bound_exported_function;  lookup in file=/home/carlos/build/glibc-pristine/libc.so.6 [0]
   3606820:	symbol=lazily_bound_exported_function;  lookup in file=/home/carlos/build/glibc-pristine/elf/ld.so [0]
   3606820:	symbol=lazily_bound_exported_function;  lookup in file=/home/carlos/build/glibc-pristine/elf/tst-dlclose-lazy-mod1.so [0]
   3606820:	binding file /home/carlos/build/glibc-pristine/elf/tst-dlclose-lazy-mod1.so [0] to /home/carlos/build/glibc-pristine/elf/tst-dlclose-lazy-mod1.so [0]: normal symbol `lazily_bound_exported_function'

Here we find mod1 in the scope since dlclose is not done yet and we bind to it.

   3606820:	symbol=__cxa_finalize;  lookup in file=/home/carlos/build/glibc-pristine/elf/tst-dlclose-lazy [0]
   3606820:	symbol=__cxa_finalize;  lookup in file=/home/carlos/build/glibc-pristine/libc.so.6 [0]
   3606820:	binding file /home/carlos/build/glibc-pristine/elf/tst-dlclose-lazy-mod2.so [0] to /home/carlos/build/glibc-pristine/libc.so.6 [0]: normal symbol `__cxa_finalize' [GLIBC_2.2.5]
   3606820:	
   3606820:	file=/home/carlos/build/glibc-pristine/elf/tst-dlclose-lazy-mod1.so [0];  destroying link map
   3606820:	
   3606820:	file=/home/carlos/build/glibc-pristine/elf/tst-dlclose-lazy-mod2.so [0];  destroying link map
   3606820:	
   3606820:	calling fini:  [0]
   3606820:	
   3606820:	symbol=__cxa_finalize;  lookup in file=/home/carlos/build/glibc-pristine/elf/tst-dlclose-lazy [0]
   3606820:	symbol=__cxa_finalize;  lookup in file=/home/carlos/build/glibc-pristine/libc.so.6 [0]
   3606820:	binding file /home/carlos/build/glibc-pristine/elf/tst-dlclose-lazy [0] to /home/carlos/build/glibc-pristine/libc.so.6 [0]: normal symbol `__cxa_finalize' [GLIBC_2.2.5]
   3606820:	
   3606820:	calling fini: /home/carlos/build/glibc-pristine/libc.so.6 [0]
   3606820:	
   3606820:	
   3606820:	calling fini: /home/carlos/build/glibc-pristine/elf/ld.so [0]
   3606820:	


> +}
> +
> +static void __attribute__ ((destructor))
> +fini (void)
> +{
> +  puts ("info: tst-dlclose-lazy-mod2.so destructor called");
> +
> +  /* Trigger the lazily_bound_exported_function call in
> +     exported_function in tst-dlclose-lazy-mod1.so.  */
> +  exported_function (1);
> +}
> diff --git a/elf/tst-dlclose-lazy.c b/elf/tst-dlclose-lazy.c
> new file mode 100644
> index 0000000000..8207d134f3
> --- /dev/null
> +++ b/elf/tst-dlclose-lazy.c
> @@ -0,0 +1,36 @@
> +/* Test lazy binding during dlclose (bug 30425).
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <dlfcn.h>
> +#include <support/xdlfcn.h>
> +#include <support/check.h>

Suggest something like this to explain the intent of the test:

/* The intent of the test is to verify that a circular dependency
   between two shared objects; when that dependency is created
   by symbol interposition or C++ vague linkage, can be correctly
   loaded and unloaded.  While the load and unload order is technically
   undefined, the implementation chooses a specific order and it should
   be possible to load *and* unload safely.  In this specific test the
   mod1 DSO depends on mod2, but mod2 depends on mod1 due to the
   relocation dependency of the interposed mod1 symbols.  Loading mod1
   causes mod2 to be loaded first because of the direct DT_NEEDED
   dependency.  Unloading mod1 should unload mod1 first, and then mod2,
   but during mod2's unload the loader should not abort, crash, or fail
   if mod2 references symbols in mod1, including for lazy binding.
   The intent is to ensure that developers get consistent behaviour
   from the loader and that they can then continue to debug other
   problematic dependencies in the application design.  */

> +
> +int
> +main (void)
> +{
> +  /* Load tst-dlclose-lazy-mod1.so, indirectly loading
> +     tst-dlclose-lazy-mod2.so.  */
> +  void *handle = xdlopen ("tst-dlclose-lazy-mod1.so", RTLD_GLOBAL | RTLD_LAZY);
> +
> +  /* Invoke the destructor of tst-dlclose-lazy-mod2.so, which calls
> +     into tst-dlclose-lazy-mod1.so after its destructor has been
> +     called.  */
> +  xdlclose (handle);

OK. This attempts to close mod1, which first attempts to close the dependency
mod2, which triggers mod2 desctructors to run the mod2 destructor lazily binds
lazily_bound_exported_function as exported_function(1) is called.

> +
> +  return 0;
> +}
> 
> base-commit: e1b02c5ed4099a53db8f356303fc0ef88db8a131
> 

-- 
Cheers,
Carlos.


  reply	other threads:[~2023-05-25 18:57 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-22 11:32 Florian Weimer
2023-05-25 18:57 ` Carlos O'Donell [this message]
2023-05-25 19:06   ` Florian Weimer
2023-05-29 20:09     ` Carlos O'Donell
2023-05-30  8:59       ` Florian Weimer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b977747f-e5a2-1f42-e65e-ac944b51be43@redhat.com \
    --to=carlos@redhat.com \
    --cc=fweimer@redhat.com \
    --cc=libc-alpha@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).