From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <glibc-bugs-return-20318-listarch-glibc-bugs=sources.redhat.com@sourceware.org>
Received: (qmail 15523 invoked by alias); 26 Nov 2013 23:02:56 -0000
Mailing-List: contact glibc-bugs-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <glibc-bugs.sourceware.org>
List-Subscribe: <mailto:glibc-bugs-subscribe@sourceware.org>
List-Post: <mailto:glibc-bugs@sourceware.org>
List-Help: <mailto:glibc-bugs-help@sourceware.org>, <http://sourceware.org/lists.html#faqs>
Sender: glibc-bugs-owner@sourceware.org
Received: (qmail 15454 invoked by uid 48); 26 Nov 2013 23:02:52 -0000
From: "carlos at redhat dot com" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug network/10652] getaddrinfo causes segfault if multithreaded and linked statically
Date: Tue, 26 Nov 2013 23:02:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: network
X-Bugzilla-Version: unspecified
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: carlos at redhat dot com
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at sourceware dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_status
Message-ID: <bug-10652-131-dAIo01VlaC@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-10652-131@http.sourceware.org/bugzilla/>
References: <bug-10652-131@http.sourceware.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2013-11/txt/msg00289.txt.bz2

https://sourceware.org/bugzilla/show_bug.cgi?id=10652

Carlos O'Donell <carlos at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
--- Comment #19 from Carlos O'Donell <carlos at redhat dot com> ---
In a test case where the application doesn't link against libpthread, but a
dlopen'd library does, parallel calls to getaddrinfo cause corruption in the IO
layers and eventually a crash.

Even though libpthread.so.0 has been loaded, the weak-ref-and-check idiom in
the NSS code isn't working. The GOT entry for __pthread_mutex_lock stays zero,
so the NSS code skips all locking and we get serious corruption via
get_contents->__GI_fgets_unlocked (unlocked file IO from multiple threads
causes data races and corruption).
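
For readers unfamiliar with the idiom, here is a minimal sketch of
weak-ref-and-check. It is not the glibc source: the NSS code references
__pthread_mutex_lock (per the relocation quoted in this comment), while this
sketch assumes the public pthread_mutex_lock/pthread_mutex_unlock names, and
locking_available and guarded_increment are hypothetical helpers made up for
illustration.

```c
#include <pthread.h>
#include <stddef.h>

/* Mark the references weak: if nothing in this object's lookup scope
   defines them when the relocations are processed, their addresses
   stay NULL instead of causing a link or load failure. */
#pragma weak pthread_mutex_lock
#pragma weak pthread_mutex_unlock

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int counter;

/* Hypothetical helper: 1 if both weak references resolved. */
int locking_available(void)
{
    return pthread_mutex_lock != NULL && pthread_mutex_unlock != NULL;
}

/* Hypothetical helper: take the lock only if the weak references
   resolved. When they did not, the update runs unlocked, which is
   the failure mode described above. */
int guarded_increment(void)
{
    int have_locks = locking_available();
    int value;

    if (have_locks)
        pthread_mutex_lock(&lock);
    value = ++counter;
    if (have_locks)
        pthread_mutex_unlock(&lock);
    return value;
}
```

The check happens against whatever address the dynamic loader stored, so a
zero GOT entry silently turns every lock into a no-op. Note that on glibc 2.34
and later the pthread functions live in libc.so.6 itself, so this sketch would
always see non-NULL addresses there.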

The skipped locks are in _nss_files_gethostbyname4_r (libnss_files.so). When
the application is compiled with -lpthread, the GOT entry has the non-zero
value 0x00007ffff77bc460, which points at

     0x7ffff77bc460 <__GI___pthread_mutex_lock>:   sub    $0x8,%rsp

and is therefore correct. That entry is GOT entry #40, with relocation:

     000000000020bfd8  0000001a00000006 R_X86_64_GLOB_DAT 0000000000000000 __pthread_mutex_lock + 0
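
As a worked example of reading that relocation line, the second column
(0000001a00000006) is the r_info field of an Elf64_Rela entry. The sketch
below splits it the same way the ELF64_R_SYM and ELF64_R_TYPE macros in
<elf.h> do; rela_sym and rela_type are hypothetical names used only here.

```c
#include <stdint.h>

/* Upper 32 bits of r_info: index into the dynamic symbol table. */
uint32_t rela_sym(uint64_t r_info)
{
    return (uint32_t)(r_info >> 32);
}

/* Lower 32 bits of r_info: relocation type (6 is R_X86_64_GLOB_DAT). */
uint32_t rela_type(uint64_t r_info)
{
    return (uint32_t)(r_info & 0xffffffffu);
}
```

For 0x0000001a00000006 this gives symbol index 0x1a (26) and type 6, which
matches the R_X86_64_GLOB_DAT shown for __pthread_mutex_lock.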

If libpthread is loaded *after* libnss_files.so is loaded, I don't see that
there is anything you can do to make the NSS code use locks, since the GOT
relocation has already been processed. However, in this case libpthread is
loaded *before* libnss_files.so, but it appears as if the resolution scope
prevents the symbols from libpthread being made available to libnss_files.so?
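
One way to probe the scope question from the main program is sketched below.
It relies on the documented dlopen(3) behaviour that a handle obtained with a
NULL filename searches the main program, the objects loaded at startup, and
objects opened with RTLD_GLOBAL. visible_in_global_scope is a hypothetical
helper, and the probe only reflects the main program's view, not the scope
the loader builds for libnss_files.so.2.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: 1 if NAME can be found through the main
   program's handle, 0 otherwise. A symbol whose value is genuinely
   NULL would be reported as missing; that is fine for functions. */
int visible_in_global_scope(const char *name)
{
    void *self = dlopen(NULL, RTLD_LAZY);
    int found;

    if (self == NULL)
        return 0;
    found = dlsym(self, name) != NULL;
    dlclose(self);
    return found;
}
```

Calling visible_in_global_scope("__pthread_mutex_lock") after the dlopen of
the threaded library would show whether libpthread's symbols reached that
scope. On older glibc this needs -ldl at link time.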

e.g.
     20987:     object=/home/carlos/build/glibc/nss/libnss_files.so.2 [0]
     20987:      scope 0: ./crash_main_no_pthread /home/carlos/build/glibc/dlfcn/libdl.so.2 /home/carlos/build/glibc/libc.so.6 /home/carlos/build/glibc/elf/ld.so
     20987:      scope 1: /home/carlos/build/glibc/nss/libnss_files.so.2 /home/carlos/build/glibc/libc.so.6 /home/carlos/build/glibc/elf/ld.so

Notice that libnss_files.so.2 is in its own scope without libpthread, as
opposed to crash_getaddrinfo.so's scope, which has libpthread in it.

e.g.
     20987:     object=/home/carlos/support/2013-11-22/crash_getaddrinfo.so [0]
     20987:      scope 0: ./crash_main_no_pthread /home/carlos/build/glibc/dlfcn/libdl.so.2 /home/carlos/build/glibc/libc.so.6 /home/carlos/build/glibc/elf/ld.so
     20987:      scope 1: /home/carlos/support/2013-11-22/crash_getaddrinfo.so /home/carlos/build/glibc/nptl/libpthread.so.0 /home/carlos/build/glibc/libc.so.6 /home/carlos/build/glibc/elf/ld.so

I don't know what the right answer is here. There are really only two
resolution scopes, global and local; the scopes listed above are internal
details of glibc's dynamic loader. Why libpthread's symbols wouldn't be used
for the relocation in libnss_files.so is what baffles me. One would have to
track down the exact relocation and determine why the libpthread symbol isn't
used.

I'm not working on this so I'm flipping this to NEW, but I thought I'd post
what I saw during my analysis of a similar internal Red Hat bug.

-- 
You are receiving this mail because:
You are on the CC list for the bug.