public inbox for glibc-bugs@sourceware.org
* [Bug dynamic-link/15199] New: dlopening a load-time library from an earlier library's initializer corrupts TLS state
@ 2013-02-25 22:14 luto at mit dot edu
  2013-02-26 19:07 ` [Bug dynamic-link/15199] " luto at mit dot edu
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: luto at mit dot edu @ 2013-02-25 22:14 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=15199

             Bug #: 15199
           Summary: dlopening a load-time library from an earlier
                    library's initializer corrupts TLS state
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: dynamic-link
        AssignedTo: unassigned@sourceware.org
        ReportedBy: luto@mit.edu
    Classification: Unclassified


I don't have a simple testcase -- I'm triggering this by LD_PRELOADING
libtcmalloc, but I don't think this issue has anything to do with libtcmalloc. 
It's very sensitive to the exact order of libraries I'm using, though.

My libtcmalloc is linked in at load time, and it ends up with modid 1.  I have
only one thread (i.e. the main thread).  In _dl_allocate_tls_init, dtv[1] is
set to a freshly allocated block of zeros.  (libtcmalloc's TLS section is all
BSS.)  libtcmalloc's l_tls_offset is 32.

Now I do something strange: one of my libraries dlopens a (preloaded) library
that has a DT_NEEDED reference to libtcmalloc.  This results in:

#0  _dl_add_to_slotinfo (l=l@entry=0x7ffff7ffd728) at dl-tls.c:885
#1  0x00000034e3c137fc in dl_open_worker (a=a@entry=0x7fffffffdd90) at dl-open.c:504
#2  0x00000034e3c0edc6 in _dl_catch_error (objname=objname@entry=0x7fffffffdd80, errstring=errstring@entry=0x7fffffffdd88,
    mallocedp=mallocedp@entry=0x7fffffffdd70, operate=operate@entry=0x34e3c12fb0 <dl_open_worker>,
    args=args@entry=0x7fffffffdd90) at dl-error.c:177
#3  0x00000034e3c12c0c in _dl_open (file=0x7ffff5c06f80 "libamamalloc.so", mode=-2147483643, caller_dlopen=<optimized out>,
    nsid=-2, argc=5, argv=0x7fffffffe078, env=0x7fffffffe0a8) at dl-open.c:653
#4  0x00000034e4c01026 in dlopen_doit (a=a@entry=0x7fffffffdfa0) at dlopen.c:66
#5  0x00000034e3c0edc6 in _dl_catch_error (objname=0xa3de10, errstring=0xa3de18, mallocedp=0xa3de08,
    operate=0x34e4c00fc0 <dlopen_doit>, args=0x7fffffffdfa0) at dl-error.c:177
#6  0x00000034e4c0163c in _dlerror_run (operate=operate@entry=0x34e4c00fc0 <dlopen_doit>, args=args@entry=0x7fffffffdfa0)
    at dlerror.c:163
#7  0x00000034e4c010c1 in __dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:87

_dl_add_to_slotinfo is called on libtcmalloc and increments
dl_tls_dtv_slotinfo_list->slotinfo[1].gen to 2.  I suspect that the fact that
_dl_add_to_slotinfo was called at all here is the actual bug.

Later on, after initializing a bunch of things, I try to allocate memory.  The
code ends up here:

#0  _dl_update_slotinfo (req_modid=1) at dl-tls.c:672
#1  0x00000034e3c01264 in update_get_addr (ti=0x7ffff7fd4af0) at dl-tls.c:750
#2  0x00007ffff7dc0411 in GetStackTrace (result=0x94c118, max_depth=30, skip_count=3) at src/stacktrace_libunwind-inl.h:85
#3  0x00007ffff7db3869 in RecordGrowth (growth=1048576) at src/page_heap.cc:463
#4  tcmalloc::PageHeap::GrowHeap (this=this@entry=0x98c000, n=n@entry=8) at src/page_heap.cc:489
#5  0x00007ffff7db3b6b in tcmalloc::PageHeap::New (this=0x98c000, n=8) at src/page_heap.cc:120
#6  0x00007ffff7db25c9 in tcmalloc::CentralFreeList::Populate (this=0x7ffff7fee7a0 <tcmalloc::Static::central_cache_+74176>)
    at src/central_freelist.cc:318
#7  0x00007ffff7db27c8 in tcmalloc::CentralFreeList::FetchFromSpansSafe (
    this=this@entry=0x7ffff7fee7a0 <tcmalloc::Static::central_cache_+74176>) at src/central_freelist.cc:285
#8  0x00007ffff7db2858 in tcmalloc::CentralFreeList::RemoveRange (
    this=0x7ffff7fee7a0 <tcmalloc::Static::central_cache_+74176>, start=0x7ffffffae280, end=0x7ffffffae288, N=1)
    at src/central_freelist.cc:263
#9  0x00007ffff7db58e1 in tcmalloc::ThreadCache::FetchFromCentralCache (this=0xa170c0, cl=<optimized out>, byte_size=65536)
    at src/thread_cache.cc:160
#10 0x00007ffff7dc515b in Allocate (cl=<optimized out>, size=65536, this=0xa170c0) at src/thread_cache.h:364
#11 do_malloc_small (size=65536, heap=0xa170c0) at src/tcmalloc.cc:1088
#12 do_malloc_no_errno (size=65536) at src/tcmalloc.cc:1095
#13 cpp_alloc (nothrow=false, size=65536) at src/tcmalloc.cc:1423
#14 tc_new (size=65536) at src/tcmalloc.cc:1601
#15 0x00007ffff6527ca5 in allocate (this=0xa3e520, __n=<optimized out>)
    at /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../i

This is *not* a recursive allocation or glibc invocation; it's a call from the
STL to tc_new, which overrides the standard operator new.

_dl_update_slotinfo ends up here:

          /* If there is currently memory allocate for this
             dtv entry free it.  */
          /* XXX Ideally we will at some point create a memory
             pool.  */
          if (! dtv[modid].pointer.is_static
              && dtv[modid].pointer.val != TLS_DTV_UNALLOCATED)
            /* Note that free is called for NULL is well.  We
               deallocate even if it is this dtv entry we are
               supposed to load.  The reason is that we call
               memalign and not malloc.  */
            free (dtv[modid].pointer.val);

          /* This module is loaded dynamically- We defer memory
             allocation.  */
          dtv[modid].pointer.is_static = false;
          dtv[modid].pointer.val = TLS_DTV_UNALLOCATED;

I assume that this code is intended to free memory from a previously dlclose'd
module that has its modid reused, but this is *not* the case here.  libtcmalloc
is not dynamic -- glibc should not be freeing its TLS space.  (Maybe there
should be an assertion here that the slot is dynamic.)

At this point, dtv[0].counter is 1 and
dl_tls_dtv_slotinfo_list->slotinfo[1].gen is 2, which is (I think) how we got
here.
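
The generation mismatch described above can be illustrated with a small
standalone model.  This is only a sketch under stated assumptions: the
model_* types and MODEL_UNALLOCATED are hypothetical stand-ins, not glibc's
real definitions, and the real dl-tls.c logic has more cases.  It shows how a
thread whose counter (1) is older than a slot's generation (2) ends up with
that slot reset to "unallocated", even though the module was loaded at startup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for glibc internals; names are illustrative only.  */
#define MODEL_NSLOTS 4
#define MODEL_UNALLOCATED ((void *) (uintptr_t) -1)

struct model_dtv_entry {
  void *val;        /* TLS block address, or MODEL_UNALLOCATED */
  bool is_static;   /* true if the block lives in the static TLS area */
};

struct model_thread {
  size_t counter;                             /* role of dtv[0].counter */
  struct model_dtv_entry slot[MODEL_NSLOTS];  /* role of dtv[modid].pointer */
};

struct model_slotinfo {
  size_t gen;       /* generation recorded when the module was (re)added */
};

/* Returns true if the lookup took the update path and reset the slot,
   false if the thread's view was already current.  */
static bool model_lookup(struct model_thread *t,
                         const struct model_slotinfo *info,
                         size_t modid)
{
  assert(modid < MODEL_NSLOTS);

  if (t->counter >= info[modid].gen)
    return false;   /* nothing newer than what this thread has seen */

  /* Stale view: the entry is treated as belonging to a newly loaded
     module, so it is reset and allocation is deferred.  */
  t->slot[modid].is_static = false;
  t->slot[modid].val = MODEL_UNALLOCATED;
  t->counter = info[modid].gen;
  return true;
}
```

With counter = 1 and a slot generation of 2 for modid 1, the first lookup
resets the slot; an assertion on is_static before the reset would catch the
case reported here.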

The eventual failure mode is a straightforward infinite loop in
tls_get_addr_tail: l_tls_offset is not FORCED_DYNAMIC_TLS_OFFSET and
dtv[1].pointer.val == TLS_DTV_UNALLOCATED, so it spins forever waiting for some
nonexistent parallel dlopen call to do something.  (I don't understand this
code path at all, but the infinite loop is straightforward.)
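
The shape of that loop can be sketched in isolation.  The following is a
simplified model, not the glibc source: spin_model, SPIN_UNALLOCATED and
SPIN_FORCED_DYNAMIC are hypothetical names, and the retry cap exists only so
the sketch terminates.  It shows that with a static offset (such as 32) and an
unallocated slot, no retry ever makes progress unless something else changes
the slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for TLS_DTV_UNALLOCATED and
   FORCED_DYNAMIC_TLS_OFFSET; the real values may differ.  */
#define SPIN_UNALLOCATED    ((void *) (uintptr_t) -1)
#define SPIN_FORCED_DYNAMIC ((ptrdiff_t) -2)

/* Returns the number of retries before the address could be resolved,
   or -1 if max_retries passed without progress.  */
static long spin_model(ptrdiff_t l_tls_offset, void *const *slot_val,
                       long max_retries)
{
  assert(slot_val != NULL);

  for (long tries = 0; tries < max_retries; tries++) {
    if (l_tls_offset == SPIN_FORCED_DYNAMIC)
      return tries;   /* dynamic module: a block would be allocated here */
    if (*slot_val != SPIN_UNALLOCATED)
      return tries;   /* static block is set up: use it */
    /* Static offset but no block yet: retry, expecting a concurrent
       dlopen to fill in the slot.  With one thread, nobody will.  */
  }
  return -1;
}
```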

I understand if calling dlopen from a shared library initializer is verboten,
but, if so, it should abort instead of corrupting internal TLS data structures.


* [Bug dynamic-link/15199] dlopening a load-time library from an earlier library's initializer corrupts TLS state
From: luto at mit dot edu @ 2013-02-26 19:07 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=15199

--- Comment #1 from Andy Lutomirski <luto at mit dot edu> 2013-02-26 19:06:57 UTC ---
Here's a self-contained testcase, tested on Fedora 18.

--- begin a.c ---
#include <dlfcn.h>
#include <stdlib.h>   /* abort */
#include <unistd.h>   /* write */

void a(void) {}

/* Runs while liba.so is being initialized at program startup.  */
__attribute__((constructor)) static void init(void)
{
  write(1, "dlopen b\n", 9);
  /* This corrupts TLS state.  */
  if (!dlopen("libb.so", RTLD_LAZY | RTLD_NOLOAD))
    abort();
  write(1, "dlopen done\n", 12);
}
--- end a.c ---

--- begin b.c ---
#include <unistd.h>   /* write */

static __thread int tls;

void b(void)
{
  write(1, "Begin TLS access\n", 17);
  tls = 1;  /* This will infinite loop because TLS state is corrupt */
  write(1, "Done\n", 5);
}
--- end b.c ---

--- begin main.c ---
extern void a(void), b(void);

int main(void)
{
  a();  /* Just to DT_NEEDED it. */
  b();  /* This one will hang. */
  return 0;
}
--- end main.c ---

To trigger the bug, do this:

$ gcc -g -fPIC -shared -o liba.so a.c
$ gcc -g -fPIC -shared -o libb.so b.c
$ gcc -g -o main main.c libb.so liba.so -ldl
$ LD_LIBRARY_PATH=. ./main
dlopen b
dlopen done
Begin TLS access
   [this infinite loops]

Reversing the link order of libb.so and liba.so will cause this code to work.
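
Because the failure mode is a hang rather than a crash, an automated check
needs a timeout.  Below is a minimal sketch of one way to do that in C; the
helper name run_with_timeout and the 100 ms polling interval are assumptions
made for illustration and are not part of the testcase above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Runs argv[0] with the given arguments.  Returns 0 if the child exited
   on its own within timeout_sec seconds, 1 if it was still running and
   had to be killed, and -1 if fork or waitpid failed.  */
static int run_with_timeout(char *const argv[], int timeout_sec)
{
  assert(argv != NULL && argv[0] != NULL);

  pid_t pid = fork();
  if (pid < 0)
    return -1;
  if (pid == 0) {
    execvp(argv[0], argv);
    _exit(127);               /* exec failed */
  }

  /* Poll for child exit in 100 ms steps.  */
  for (int i = 0; i < timeout_sec * 10; i++) {
    int status;
    pid_t r = waitpid(pid, &status, WNOHANG);
    if (r == pid)
      return 0;
    if (r < 0)
      return -1;
    struct timespec ts = { 0, 100 * 1000 * 1000 };
    nanosleep(&ts, NULL);
  }

  /* Still running after the deadline: treat it as a hang.  */
  kill(pid, SIGKILL);
  waitpid(pid, NULL, 0);
  return 1;
}
```

Calling it with an argv of { "./main", NULL } while LD_LIBRARY_PATH=. is set
in the environment would be expected to return 1 on an affected system and 0
once the link order is reversed.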


* [Bug dynamic-link/15199] dlopening a load-time library from an earlier library's initializer corrupts TLS state
From: fweimer at redhat dot com @ 2014-06-13 18:47 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=15199

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fweimer at redhat dot com
              Flags|                            |security-


* [Bug dynamic-link/15199] dlopening a load-time library from an earlier library's initializer corrupts TLS state
From: sfilargi at cisco dot com @ 2015-01-22  2:18 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=15199

Stavros Filargyropoulos <sfilargi at cisco dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sfilargi at cisco dot com

