From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 15523 invoked by alias); 26 Nov 2013 23:02:56 -0000
Mailing-List: contact glibc-bugs-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help: ,
Sender: glibc-bugs-owner@sourceware.org
Received: (qmail 15454 invoked by uid 48); 26 Nov 2013 23:02:52 -0000
From: "carlos at redhat dot com"
To: glibc-bugs@sourceware.org
Subject: [Bug network/10652] getaddrinfo causes segfault if multithreaded and linked statically
Date: Tue, 26 Nov 2013 23:02:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: network
X-Bugzilla-Version: unspecified
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: carlos at redhat dot com
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at sourceware dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_status
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2013-11/txt/msg00289.txt.bz2

https://sourceware.org/bugzilla/show_bug.cgi?id=10652

Carlos O'Donell changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW

--- Comment #19 from Carlos O'Donell ---
In a test case where the application doesn't link against libpthread, but
a dlopen'd library does, parallel calls to getaddrinfo cause corruption in
the IO layers and eventually a crash.

Even though libpthread.so.0 has been loaded, the weak-ref-and-check idiom
in the NSS code isn't working.
The GOT entry stays zero, so the NSS code skips all locking and we get
serious corruption via get_contents->__GI_fgets_unlocked (doing unlocked
file IO with multiple threads causes data races and corruption). The
skipped locks are in _nss_files_gethostbyname4_r (libnss_files.so).

When the application is compiled with -lpthread, the GOT entry has a
non-zero value of 0x00007ffff77bc460, which is
"0x7ffff77bc460 <__GI___pthread_mutex_lock>: sub $0x8,%rsp" and therefore
correct. That entry is GOT entry #40, with relocation:

000000000020bfd8 0000001a00000006 R_X86_64_GLOB_DAT 0000000000000000 __pthread_mutex_lock + 0

If libpthread is loaded *after* libnss_files.so, I don't see anything you
can do to make the NSS code use locks, since the GOT relocation has
already been processed.

However, in this case libpthread is loaded *before* libnss_files.so, yet it
appears that the resolution scope prevents the symbols from libpthread
from being made available to libnss_files.so, e.g.

     20987: object=/home/carlos/build/glibc/nss/libnss_files.so.2 [0]
     20987:  scope 0: ./crash_main_no_pthread /home/carlos/build/glibc/dlfcn/libdl.so.2 /home/carlos/build/glibc/libc.so.6 /home/carlos/build/glibc/elf/ld.so
     20987:  scope 1: /home/carlos/build/glibc/nss/libnss_files.so.2 /home/carlos/build/glibc/libc.so.6 /home/carlos/build/glibc/elf/ld.so

Notice that libnss_files.so.2 is in its own scope without libpthread, as
opposed to crash_getaddrinfo.so's scope, which has libpthread in it, e.g.

     20987: object=/home/carlos/support/2013-11-22/crash_getaddrinfo.so [0]
     20987:  scope 0: ./crash_main_no_pthread /home/carlos/build/glibc/dlfcn/libdl.so.2 /home/carlos/build/glibc/libc.so.6 /home/carlos/build/glibc/elf/ld.so
     20987:  scope 1: /home/carlos/support/2013-11-22/crash_getaddrinfo.so /home/carlos/build/glibc/nptl/libpthread.so.0 /home/carlos/build/glibc/libc.so.6 /home/carlos/build/glibc/elf/ld.so

I don't know what's the right answer here.
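For readers unfamiliar with the weak-ref-and-check idiom mentioned above, here is a minimal sketch of the general pattern. It is not the actual NSS or glibc source. The symbol hypothetical_mutex_lock and the helpers locking_available and lock_if_available are names invented for this illustration. The symbol is declared weak and deliberately never defined, so its address resolves to NULL, which mimics the zero GOT entry seen when libpthread's symbol is not found in the object's scope. The sketch assumes GCC or Clang on an ELF/Linux target.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical symbol, invented for this sketch. It is declared weak and
   never defined anywhere, so the linker/loader resolves its address to
   NULL instead of failing. This stands in for a zero GOT entry. */
extern int hypothetical_mutex_lock(pthread_mutex_t *m) __attribute__((weak));

/* The "check" half of the idiom: is the weak reference bound to anything? */
int locking_available(void)
{
    return hypothetical_mutex_lock != NULL;
}

/* Take the lock only if the weak reference was resolved.
   Returns 1 if the lock function was called, 0 if locking was skipped. */
int lock_if_available(pthread_mutex_t *m)
{
    if (hypothetical_mutex_lock != NULL) {
        hypothetical_mutex_lock(m);
        return 1;
    }
    /* Unresolved weak symbol: the caller proceeds with no lock held.
       This is only safe if the process really is single-threaded. */
    return 0;
}
```

A caller that guards shared state with lock_if_available gets no protection at all when the reference is unresolved, even if threads exist by the time it runs, because the check only reflects how the relocation was resolved when the object was loaded.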
There are really only two resolution scopes, global and local; the scopes
listed above are internal details of glibc's dynamic loader. Why
libpthread's symbols wouldn't be used for the relocation in
libnss_files.so is what baffles me. One would have to track down the exact
relocation and determine why the libpthread symbol isn't used.

I'm not working on this so I'm flipping this to NEW, but I thought I'd
post what I saw during my analysis of a similar internal Red Hat bug.

-- 
You are receiving this mail because:
You are on the CC list for the bug.