public inbox for glibc-bugs@sourceware.org
* [Bug malloc/16159] New: malloc_printerr() deadlock, when calling malloc_printerr() again
@ 2013-11-13  3:30 darryl.miles at darrylmiles dot org
  2013-11-13  3:31 ` [Bug malloc/16159] " darryl.miles at darrylmiles dot org
                   ` (24 more replies)
  0 siblings, 25 replies; 29+ messages in thread
From: darryl.miles at darrylmiles dot org @ 2013-11-13  3:30 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=16159

            Bug ID: 16159
           Summary: malloc_printerr() deadlock, when calling
                    malloc_printerr() again
           Product: glibc
           Version: 2.12
            Status: NEW
          Severity: normal
          Priority: P2
         Component: malloc
          Assignee: unassigned at sourceware dot org
          Reporter: darryl.miles at darrylmiles dot org

malloc_printerr() on error detection "free(): invalid next size (fast)" ends up
calling into:

backtrace.c:init()
dl-libc.c:do_dlopen()
malloc.c:calloc()
malloc.c:malloc_printerr()

The malloc error reporting should report only the first error, not attempt to
recursively report every error (we already know the heap is corrupted at the
outermost point, so any further work inside malloc is also likely to find
corruption).

Full stack trace to follow.


The main problem is that the process does not abort() and die; it hangs in:

pthread_once.S:pthread_once()
backtrace.c:__backtrace()

I think this is due to recursive locking: the lock should be taken with
trylock() the second time, and the process should abort() immediately if that
fails.  As it stands, the process appears to deadlock itself.

-- 
You are receiving this mail because:
You are on the CC list for the bug.



* [Bug malloc/16159] malloc_printerr() deadlock, when calling malloc_printerr() again
  2013-11-13  3:30 [Bug malloc/16159] New: malloc_printerr() deadlock, when calling malloc_printerr() again darryl.miles at darrylmiles dot org
@ 2013-11-13  3:31 ` darryl.miles at darrylmiles dot org
  2013-11-13  3:37 ` darryl.miles at darrylmiles dot org
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: darryl.miles at darrylmiles dot org @ 2013-11-13  3:31 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=16159

--- Comment #1 from Darryl Miles <darryl.miles at darrylmiles dot org> ---
(gdb) bt
#0  pthread_once () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_once.S:95
#1  0x00007f8dfb540994 in __backtrace (array=<value optimized out>, size=64) at
../sysdeps/ia64/backtrace.c:85
#2  0x00007f8dfb4b280b in __libc_message (do_abort=2, fmt=0x7f8dfb599fc0 "***
glibc detected *** %s: %s: 0x%s ***\n") at
../sysdeps/unix/sysv/linux/libc_fatal.c:178
#3  0x00007f8dfb4b8126 in malloc_printerr (action=3, str=0x7f8dfb5980f9
"malloc(): memory corruption", ptr=<value optimized out>) at malloc.c:6311
#4  0x00007f8dfb4bbba4 in _int_malloc (av=0x7f8dfb7d0e80, bytes=<value
optimized out>) at malloc.c:4411
#5  0x00007f8dfb4bc5e6 in __libc_calloc (n=<value optimized out>,
elem_size=<value optimized out>) at malloc.c:4075
#6  0x00007f8dfda14d1f in _dl_new_object (realname=0x247de20
"/lib64/libgcc_s.so.1", libname=0x7f8dfb596e3e "libgcc_s.so.1", type=2,
loader=0x0, mode=-1879048191, nsid=0) at dl-object.c:77
#7  0x00007f8dfda111ae in _dl_map_object_from_fd (name=0x7f8dfb596e3e
"libgcc_s.so.1", fd=6, fbp=0x7fffd2c1ace0, realname=0x247de20
"/lib64/libgcc_s.so.1", loader=0x0, l_type=2, mode=-1879048191,
stack_endp=0x7fffd2c1b028, nsid=0)
    at dl-load.c:975
#8  0x00007f8dfda1236a in _dl_map_object (loader=0x0, name=0x7f8dfb596e3e
"libgcc_s.so.1", type=2, trace_mode=0, mode=<value optimized out>, nsid=<value
optimized out>) at dl-load.c:2274
#9  0x00007f8dfda1ca34 in dl_open_worker (a=0x7fffd2c1b250) at dl-open.c:227
#10 0x00007f8dfda181a6 in _dl_catch_error (objname=0x7fffd2c1b2a0,
errstring=0x7fffd2c1b298, mallocedp=0x7fffd2c1b2af, operate=0x7f8dfda1c910
<dl_open_worker>, args=0x7fffd2c1b250) at dl-error.c:178
#11 0x00007f8dfda1c4ea in _dl_open (file=0x7f8dfb596e3e "libgcc_s.so.1",
mode=-2147483647, caller_dlopen=0x0, nsid=-2, argc=8, argv=<value optimized
out>, env=0x7fffd2c30020) at dl-open.c:569
#12 0x00007f8dfb568340 in do_dlopen (ptr=<value optimized out>) at dl-libc.c:86
#13 0x00007f8dfda181a6 in _dl_catch_error (objname=0x7fffd2c1b460,
errstring=0x7fffd2c1b458, mallocedp=0x7fffd2c1b46f, operate=0x7f8dfb568300
<do_dlopen>, args=0x7fffd2c1b440) at dl-error.c:178
#14 0x00007f8dfb568497 in dlerror_run (name=<value optimized out>, mode=<value
optimized out>) at dl-libc.c:47
#15 __libc_dlopen_mode (name=<value optimized out>, mode=<value optimized out>)
at dl-libc.c:160
#16 0x00007f8dfb540895 in init () at ../sysdeps/ia64/backtrace.c:41
#17 0x00007f8dfb7e1b23 in pthread_once () at
../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_once.S:104
#18 0x00007f8dfb540994 in __backtrace (array=<value optimized out>, size=64) at
../sysdeps/ia64/backtrace.c:85
#19 0x00007f8dfb4b280b in __libc_message (do_abort=2, fmt=0x7f8dfb599fc0 "***
glibc detected *** %s: %s: 0x%s ***\n") at
../sysdeps/unix/sysv/linux/libc_fatal.c:178
#20 0x00007f8dfb4b8126 in malloc_printerr (action=3, str=0x7f8dfb59a2b8
"free(): invalid next size (fast)", ptr=<value optimized out>) at malloc.c:6311
#21 0x00007f8dfb4bac53 in _int_free (av=0x7f8dfb7d0e80, p=0x24d52c0,
have_lock=0) at malloc.c:4811


* [Bug malloc/16159] malloc_printerr() deadlock, when calling malloc_printerr() again
  2013-11-13  3:30 [Bug malloc/16159] New: malloc_printerr() deadlock, when calling malloc_printerr() again darryl.miles at darrylmiles dot org
  2013-11-13  3:31 ` [Bug malloc/16159] " darryl.miles at darrylmiles dot org
@ 2013-11-13  3:37 ` darryl.miles at darrylmiles dot org
  2013-11-13  3:44 ` carlos at redhat dot com
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: darryl.miles at darrylmiles dot org @ 2013-11-13  3:37 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=16159

--- Comment #2 from Darryl Miles <darryl.miles at darrylmiles dot org> ---
See also bug#956


* [Bug malloc/16159] malloc_printerr() deadlock, when calling malloc_printerr() again
  2013-11-13  3:30 [Bug malloc/16159] New: malloc_printerr() deadlock, when calling malloc_printerr() again darryl.miles at darrylmiles dot org
  2013-11-13  3:31 ` [Bug malloc/16159] " darryl.miles at darrylmiles dot org
  2013-11-13  3:37 ` darryl.miles at darrylmiles dot org
@ 2013-11-13  3:44 ` carlos at redhat dot com
  2013-11-13  3:57 ` carlos at redhat dot com
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: carlos at redhat dot com @ 2013-11-13  3:44 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=16159

Carlos O'Donell <carlos at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |matthias.andree at gmx dot de

--- Comment #3 from Carlos O'Donell <carlos at redhat dot com> ---
*** Bug 956 has been marked as a duplicate of this bug. ***


* [Bug malloc/16159] malloc_printerr() deadlock, when calling malloc_printerr() again
  2013-11-13  3:30 [Bug malloc/16159] New: malloc_printerr() deadlock, when calling malloc_printerr() again darryl.miles at darrylmiles dot org
                   ` (2 preceding siblings ...)
  2013-11-13  3:44 ` carlos at redhat dot com
@ 2013-11-13  3:57 ` carlos at redhat dot com
  2013-11-13  7:57   ` Ondřej Bílka
  2013-11-13  7:57 ` neleai at seznam dot cz
                   ` (20 subsequent siblings)
  24 siblings, 1 reply; 29+ messages in thread
From: carlos at redhat dot com @ 2013-11-13  3:57 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=16159

Carlos O'Donell <carlos at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |SUSPENDED
                 CC|                            |carlos at redhat dot com

--- Comment #4 from Carlos O'Donell <carlos at redhat dot com> ---
This is going to be difficult to fix and invasive. I will do my best to explain
why.

At the point of the failure we want to be able to print a backtrace. The only
way to get a reliable backtrace is to use the unwinder provided by gcc via
libgcc_s.so.1 (this may vary by machine). In order to get access to the
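unwinder we must dlopen that shared library. During the dlopen process we need
to calloc enough structures to hook up the new shared library into the
structures used by the dynamic linker. That calloc runs on the already
corrupted heap, which is how malloc_printerr ends up being entered a second
time.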

One resolution to this problem is to ensure that malloc has a fall-back
allocation scheme that is robust against failure and then during the
malloc_printerr we flip an internal bit and switch to the temporary reserve
allocations. We could also create a new internal API for using the temporary
allocations and then dlopen could use that in the event that we are crashing
and need to dlopen one last library (the unwinder on demand). That would
prevent other threads from consuming the reserve allocations after
malloc_printerr is entered by another thread.
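
To make the reserve idea concrete, here is a minimal sketch of such a
fall-back allocator: a fixed static arena handed out by a bump pointer, which
never touches the (possibly corrupted) main heap. It is not glibc code. The
names RESERVE_SIZE, emergency_mode, enter_emergency_mode() and reserve_alloc()
are hypothetical, and a real implementation would also need locking or
atomics.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; an illustration, not glibc code.  */
#define RESERVE_SIZE 4096

static _Alignas (max_align_t) unsigned char reserve[RESERVE_SIZE];
static size_t reserve_used;
static int emergency_mode;   /* the "internal bit" flipped on a fatal error */

/* Switch to the reserve; meant to be called once corruption is detected.  */
static void
enter_emergency_mode (void)
{
  emergency_mode = 1;
}

/* calloc-like allocation from the static reserve.  Returns zeroed memory,
   or NULL if emergency mode is off or the reserve is exhausted.
   Not thread-safe as written.  */
static void *
reserve_alloc (size_t n)
{
  const size_t align = _Alignof (max_align_t);
  size_t rounded;

  assert ((align & (align - 1)) == 0);   /* alignment is a power of two */
  if (!emergency_mode || n > RESERVE_SIZE)
    return NULL;
  rounded = (n + align - 1) & ~(align - 1);
  if (rounded > RESERVE_SIZE - reserve_used)
    return NULL;
  void *p = reserve + reserve_used;
  reserve_used += rounded;
  return memset (p, 0, rounded);
}
```

In this sketch nothing is ever freed, which is acceptable only because the
process is about to abort anyway.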

This is a considerable amount of work and we aren't going to get to this issue
until a core developer or someone with serious interest commits to fixing this.
Therefore I'm moving this to SUSPENDED until we find the resources to fix the
issue.

This issue should remain open, and new issues submitted about this bug should
be marked as duplicates of this issue.


* Re: [Bug malloc/16159] malloc_printerr() deadlock, when calling  malloc_printerr() again
  2013-11-13  3:57 ` carlos at redhat dot com
@ 2013-11-13  7:57   ` Ondřej Bílka
  0 siblings, 0 replies; 29+ messages in thread
From: Ondřej Bílka @ 2013-11-13  7:57 UTC (permalink / raw)
  To: carlos at redhat dot com; +Cc: glibc-bugs

On Wed, Nov 13, 2013 at 03:57:02AM +0000, carlos at redhat dot com wrote:
> One resolution to this problem is to ensure that malloc has a fall-back
> allocation scheme that is robust against failure and then during the
> malloc_printerr we flip an internal bit and switch to the temporary reserve
> allocations. We could also create a new internal API for using the temporary
> allocations and then dlopen could use that in the event that we are crashing
> and need to dlopen one last library (the unwinder on demand). That would
> prevent other threads from consuming the reserve allocations after
> malloc_printerr is entered by another thread.
> 
> This is a considerable amount of work and we aren't going to get to this issue
> until a core developer or someone with serious interest commits to fixing this.
> Therefore I'm moving this to SUSPENDED until we find the resources to fix the
> issue.
> 
Why not reuse a signal-safe malloc for dlopen?



* [Bug malloc/16159] malloc_printerr() deadlock, when calling malloc_printerr() again
  2013-11-13  3:30 [Bug malloc/16159] New: malloc_printerr() deadlock, when calling malloc_printerr() again darryl.miles at darrylmiles dot org
                   ` (3 preceding siblings ...)
  2013-11-13  3:57 ` carlos at redhat dot com
@ 2013-11-13  7:57 ` neleai at seznam dot cz
  2013-11-13 13:00 ` darryl.miles at darrylmiles dot org
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: neleai at seznam dot cz @ 2013-11-13  7:57 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=16159

--- Comment #5 from Ondrej Bilka <neleai at seznam dot cz> ---
On Wed, Nov 13, 2013 at 03:57:02AM +0000, carlos at redhat dot com wrote:
> One resolution to this problem is to ensure that malloc has a fall-back
> allocation scheme that is robust against failure and then during the
> malloc_printerr we flip an internal bit and switch to the temporary reserve
> allocations. We could also create a new internal API for using the temporary
> allocations and then dlopen could use that in the event that we are crashing
> and need to dlopen one last library (the unwinder on demand). That would
> prevent other threads from consuming the reserve allocations after
> malloc_printerr is entered by another thread.
> 
> This is a considerable amount of work and we aren't going to get to this issue
> until a core developer or someone with serious interest commits to fixing this.
> Therefore I'm moving this to SUSPENDED until we find the resources to fix the
> issue.
> 
Why not reuse a signal-safe malloc for dlopen?


* [Bug malloc/16159] malloc_printerr() deadlock, when calling malloc_printerr() again
  2013-11-13  3:30 [Bug malloc/16159] New: malloc_printerr() deadlock, when calling malloc_printerr() again darryl.miles at darrylmiles dot org
                   ` (4 preceding siblings ...)
  2013-11-13  7:57 ` neleai at seznam dot cz
@ 2013-11-13 13:00 ` darryl.miles at darrylmiles dot org
  2013-11-13 14:31   ` Ondřej Bílka
  2013-11-13 13:11 ` darryl.miles at darrylmiles dot org
                   ` (18 subsequent siblings)
  24 siblings, 1 reply; 29+ messages in thread
From: darryl.miles at darrylmiles dot org @ 2013-11-13 13:00 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=16159

--- Comment #6 from Darryl Miles <darryl.miles at darrylmiles dot org> ---
This fancy backtrace stuff is nice and all but... the process must die!


Can't the pthread_once use a non-blocking lock?

Can the lock be a recursive type?

Can pthread_mutex_trylock() be used in this non-critical path?  If the lock is
already held, and if it is possible to check that it is held by our own thread
ID, then we immediately abort the process (causing execution of the process to
die, like it should).  No backtrace is emitted, great!


How do I stop this fancy backtrace stuff from working?  I want to set up an
environment variable to turn it off as a workaround.

How do I make this fancy backtrace stuff work, by preloading the dlopen() stuff
it might need during initialization of malloc()?  I want to set up an
environment variable for that too.


There is no need to actually fix the bug; you are overthinking the issue.  But
this fancy stuff needs to be turned off or preloaded before the process gets
into an undefined state (due to a memory bug).


* [Bug malloc/16159] malloc_printerr() deadlock, when calling malloc_printerr() again
  2013-11-13  3:30 [Bug malloc/16159] New: malloc_printerr() deadlock, when calling malloc_printerr() again darryl.miles at darrylmiles dot org
                   ` (5 preceding siblings ...)
  2013-11-13 13:00 ` darryl.miles at darrylmiles dot org
@ 2013-11-13 13:11 ` darryl.miles at darrylmiles dot org
  2013-11-13 14:31 ` neleai at seznam dot cz
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: darryl.miles at darrylmiles dot org @ 2013-11-13 13:11 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=16159

--- Comment #7 from Darryl Miles <darryl.miles at darrylmiles dot org> ---
Another idea: do not backtrace() every malloc() error, only the first one (the
outermost one).


But right now the process deadlocks itself, on what looks to be a non-recursive
mutex, while trying to do a fancy backtrace on every malloc() problem found.

The process must die.
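
A minimal sketch of the "only the first error gets a backtrace" idea is a
per-thread re-entrancy flag. This is an illustration under stated assumptions,
not glibc's malloc_printerr: the names report_action, in_error_report and
error_report_enter() are hypothetical, and a thread-local flag is used in
place of the trylock discussed above.

```c
/* Hypothetical names; a sketch of the idea, not glibc code.  */
enum report_action { REPORT_WITH_BACKTRACE, ABORT_IMMEDIATELY };

/* Non-zero while the current thread is inside the error reporter.  */
static _Thread_local int in_error_report;

/* Called when heap corruption is detected.  The first entry on a thread
   gets the full report; a nested entry (for example from an allocation
   made while producing the backtrace) is told to abort at once.  */
static enum report_action
error_report_enter (void)
{
  if (in_error_report)
    return ABORT_IMMEDIATELY;
  in_error_report = 1;
  return REPORT_WITH_BACKTRACE;
}
```

On ABORT_IMMEDIATELY the caller would go straight to abort() without calling
backtrace() again, so the nested failure seen in frames #3 to #5 of the trace
in comment #1 could no longer block in pthread_once().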


* Re: [Bug malloc/16159] malloc_printerr() deadlock, when calling malloc_printerr() again
  2013-11-13 13:00 ` darryl.miles at darrylmiles dot org
@ 2013-11-13 14:31   ` Ondřej Bílka
  0 siblings, 0 replies; 29+ messages in thread
From: Ondřej Bílka @ 2013-11-13 14:31 UTC (permalink / raw)
  To: darryl.miles at darrylmiles dot org; +Cc: glibc-bugs

On Wed, Nov 13, 2013 at 01:00:06PM +0000, darryl.miles at darrylmiles dot org wrote:
> How do I stop this fancy backtrace stuff from working?  I want to set up an
> environment variable to turn it off as a workaround.
> 
> How do I make this fancy backtrace stuff work, by preloading the dlopen() stuff
> it might need during initialization of malloc()?  I want to set up an
> environment variable for that too.
>

As a quick workaround, you can add the following code to your application, or
build it into a shared object and preload it:

#include <execinfo.h>

/* Runs before main().  The first backtrace() call performs the one-time
   initialization that dlopens libgcc_s.so.1, while the heap is still
   intact.  */
static void __attribute__ ((constructor))
init_backtrace (void)
{
   void *bt[10];
   backtrace (bt, 10);
}



* [Bug malloc/16159] malloc_printerr() deadlock, when calling malloc_printerr() again
  2013-11-13  3:30 [Bug malloc/16159] New: malloc_printerr() deadlock, when calling malloc_printerr() again darryl.miles at darrylmiles dot org
                   ` (6 preceding siblings ...)
  2013-11-13 13:11 ` darryl.miles at darrylmiles dot org
@ 2013-11-13 14:31 ` neleai at seznam dot cz
  2013-11-13 15:50 ` bugdal at aerifal dot cx
                   ` (16 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: neleai at seznam dot cz @ 2013-11-13 14:31 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=16159

--- Comment #8 from Ondrej Bilka <neleai at seznam dot cz> ---
On Wed, Nov 13, 2013 at 01:00:06PM +0000, darryl.miles at darrylmiles dot org
wrote:
> How do I stop this fancy backtrace stuff from working?  I want to set up an
> environment variable to turn it off as a workaround.
> 
> How do I make this fancy backtrace stuff work, by preloading the dlopen() stuff
> it might need during initialization of malloc()?  I want to set up an
> environment variable for that too.
>

As a quick workaround, you can add the following code to your application, or
build it into a shared object and preload it:

#include <execinfo.h>

/* Runs before main().  The first backtrace() call performs the one-time
   initialization that dlopens libgcc_s.so.1, while the heap is still
   intact.  */
static void __attribute__ ((constructor))
init_backtrace (void)
{
   void *bt[10];
   backtrace (bt, 10);
}


* [Bug malloc/16159] malloc_printerr() deadlock, when calling malloc_printerr() again
  2013-11-13  3:30 [Bug malloc/16159] New: malloc_printerr() deadlock, when calling malloc_printerr() again darryl.miles at darrylmiles dot org
                   ` (7 preceding siblings ...)
  2013-11-13 14:31 ` neleai at seznam dot cz
@ 2013-11-13 15:50 ` bugdal at aerifal dot cx
  2013-11-13 16:03 ` carlos at redhat dot com
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: bugdal at aerifal dot cx @ 2013-11-13 15:50 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=16159

Rich Felker <bugdal at aerifal dot cx> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bugdal at aerifal dot cx

--- Comment #9 from Rich Felker <bugdal at aerifal dot cx> ---
Carlos, this is yet another reason why dlopen'ing libgcc_s is simply the wrong
thing to do, and libgcc_eh should be static-linked into libc. (The other big
reason is the possibility of pthread_cancel aborting the program.) At one time
in the distant past, it was necessary for there to only be one copy of this
code (and its data) in the whole program; otherwise, exception propagation (or
backtracing) across DSOs would not work reliably. But modern unwinding code
uses dl_iterate_phdr and works fine even if multiple copies of the code are
present in the program.

Fixing this error in the way I describe will greatly simplify glibc and improve
its reliability.


* [Bug malloc/16159] malloc_printerr() deadlock, when calling malloc_printerr() again
  2013-11-13  3:30 [Bug malloc/16159] New: malloc_printerr() deadlock, when calling malloc_printerr() again darryl.miles at darrylmiles dot org
                   ` (8 preceding siblings ...)
  2013-11-13 15:50 ` bugdal at aerifal dot cx
@ 2013-11-13 16:03 ` carlos at redhat dot com
  2013-11-13 16:12 ` joseph at codesourcery dot com
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: carlos at redhat dot com @ 2013-11-13 16:03 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=16159

--- Comment #10 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Rich Felker from comment #9)
> Carlos, this is yet another reason why dlopen'ing libgcc_s is simply the
> wrong thing to do, and libgcc_eh should be static-linked into libc. (The
> other big reason is the possibility of pthread_cancel aborting the program.)
> At one time in the distant past, it was necessary for there to only be one
> copy of this code (and its data) in the whole program; otherwise, exception
> propagation (or backtracing) across DSOs would not work reliably. But modern
> unwinding code uses dl_iterate_phdr and works fine even if multiple copies
> of the code are present in the program.
> 
> Fixing this error in the way I describe will greatly simplify glibc and
> improve its reliability.

That sounds like a good idea to me; dlopening libgcc_s.so.1 has always seemed
like a terrible idea. We just need the resources to do the rewrite and fix up
the linking to use libgcc_eh. I will leave this SUSPENDED until we find someone
to clean this up.


* [Bug malloc/16159] malloc_printerr() deadlock, when calling malloc_printerr() again
  2013-11-13  3:30 [Bug malloc/16159] New: malloc_printerr() deadlock, when calling malloc_printerr() again darryl.miles at darrylmiles dot org
                   ` (9 preceding siblings ...)
  2013-11-13 16:03 ` carlos at redhat dot com
@ 2013-11-13 16:12 ` joseph at codesourcery dot com
  2013-11-13 16:23   ` Ondřej Bílka
  2013-11-13 16:23 ` neleai at seznam dot cz
                   ` (13 subsequent siblings)
  24 siblings, 1 reply; 29+ messages in thread
From: joseph at codesourcery dot com @ 2013-11-13 16:12 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=16159

--- Comment #11 from joseph at codesourcery dot com <joseph at codesourcery dot com> ---
On Wed, 13 Nov 2013, bugdal at aerifal dot cx wrote:

> Carlos, this is yet another reason why dlopen'ing libgcc_s is simply the wrong
> thing to do, and libgcc_eh should be static-linked into libc. (The other big

Static-linking libgcc_eh into any glibc library is a bad idea because it 
complicates bootstrapping: it means glibc built with an initial bootstrap 
compiler (which was built without glibc headers available, implying full 
EH functionality is not present in libgcc) is not identical to glibc built 
with a compiler built using full shared glibc and headers.  (It's *also* a 
bad idea because new compilers can start using new DWARF unwind opcodes 
that an old copy of the unwind code won't understand, causing problems 
using new programs with old glibc.)

The answer for libpthread is for it to dlopen libgcc_s when loaded rather 
than at pthread_cancel time (or to be made to depend (DT_NEEDED) on 
libgcc_s in a way that doesn't require libgcc_s to be available when 
libpthread is built).  The answer for other cases is to disable the 
backtracing by default as discussed in bug 12189 (possibly with an 
environment variable, not available in setuid programs, that can reenable 
it - in which case glibc would dlopen libgcc_s at startup).
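
As a rough sketch of the opt-in described above: backtraces stay off unless an
environment variable asks for them, and the variable is ignored in setuid or
setgid processes. The variable name EXAMPLE_MALLOC_BACKTRACE and the functions
running_setid() and backtrace_enabled() are hypothetical. Comparing real and
effective IDs is only an approximation of the secure-mode check glibc performs
internally.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical variable name, chosen for this example only.  */
#define BACKTRACE_ENV "EXAMPLE_MALLOC_BACKTRACE"

/* Returns 1 if the real and effective user or group IDs differ, which is
   used here as a stand-in for "this is a setuid/setgid program".  */
static int
running_setid (void)
{
  return getuid () != geteuid () || getgid () != getegid ();
}

/* Backtraces are disabled by default.  They are enabled only when the
   variable is set to "1" and the process is not setuid/setgid.  */
static int
backtrace_enabled (void)
{
  const char *value;

  if (running_setid ())
    return 0;
  value = getenv (BACKTRACE_ENV);
  return value != NULL && strcmp (value, "1") == 0;
}
```

A startup hook would consult backtrace_enabled() once and load the unwinder
only when it returns 1, so the error path never has to call dlopen.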


* Re: [Bug malloc/16159] malloc_printerr() deadlock, when calling malloc_printerr() again
  2013-11-13 16:12 ` joseph at codesourcery dot com
@ 2013-11-13 16:23   ` Ondřej Bílka
  0 siblings, 0 replies; 29+ messages in thread
From: Ondřej Bílka @ 2013-11-13 16:23 UTC (permalink / raw)
  To: joseph at codesourcery dot com; +Cc: glibc-bugs

On Wed, Nov 13, 2013 at 04:12:53PM +0000, joseph at codesourcery dot com wrote:
> http://sourceware.org/bugzilla/show_bug.cgi?id=16159
> 
> --- Comment #11 from joseph at codesourcery dot com <joseph at codesourcery dot com> ---
> On Wed, 13 Nov 2013, bugdal at aerifal dot cx wrote:
> 
> > Carlos, this is yet another reason why dlopen'ing libgcc_s is simply the wrong
> > thing to do, and libgcc_eh should be static-linked into libc. (The other big
> 
> Static-linking libgcc_eh into any glibc library is a bad idea because it 
> complicates bootstrapping: it means glibc built with an initial bootstrap 
> compiler (which was built without glibc headers available, implying full 
> EH functionality is not present in libgcc) is not identical to glibc built 
> with a compiler built using full shared glibc and headers.  (It's *also* a 
> bad idea because new compilers can start using new DWARF unwind opcodes 
> that an old copy of the unwind code won't understand, causing problems 
> using new programs with old glibc.)
> 
Why did you jump from dlopening to static linking? Dynamic linking would
work, and if there is a concern that the user does not have the library, we
could provide a stub implementation and a function to test whether we are
dealing with the stub or the real one.



* [Bug malloc/16159] malloc_printerr() deadlock, when calling malloc_printerr() again
  2013-11-13  3:30 [Bug malloc/16159] New: malloc_printerr() deadlock, when calling malloc_printerr() again darryl.miles at darrylmiles dot org
                   ` (10 preceding siblings ...)
  2013-11-13 16:12 ` joseph at codesourcery dot com
@ 2013-11-13 16:23 ` neleai at seznam dot cz
  2013-11-13 16:28 ` bugdal at aerifal dot cx
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: neleai at seznam dot cz @ 2013-11-13 16:23 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=16159

--- Comment #12 from Ondrej Bilka <neleai at seznam dot cz> ---
On Wed, Nov 13, 2013 at 04:12:53PM +0000, joseph at codesourcery dot com wrote:
> http://sourceware.org/bugzilla/show_bug.cgi?id=16159
> 
> --- Comment #11 from joseph at codesourcery dot com <joseph at codesourcery dot com> ---
> On Wed, 13 Nov 2013, bugdal at aerifal dot cx wrote:
> 
> > Carlos, this is yet another reason why dlopen'ing libgcc_s is simply the wrong
> > thing to do, and libgcc_eh should be static-linked into libc. (The other big
> 
> Static-linking libgcc_eh into any glibc library is a bad idea because it 
> complicates bootstrapping: it means glibc built with an initial bootstrap 
> compiler (which was built without glibc headers available, implying full 
> EH functionality is not present in libgcc) is not identical to glibc built 
> with a compiler built using full shared glibc and headers.  (It's *also* a 
> bad idea because new compilers can start using new DWARF unwind opcodes 
> that an old copy of the unwind code won't understand, causing problems 
> using new programs with old glibc.)
> 
Why did you jump from dlopening to static linking? Dynamic linking would
work, and if there is a concern that the user does not have the library, we
could provide a stub implementation and a function to test whether we are
dealing with the stub or the real one.


* [Bug malloc/16159] malloc_printerr() deadlock, when calling malloc_printerr() again
  2013-11-13  3:30 [Bug malloc/16159] New: malloc_printerr() deadlock, when calling malloc_printerr() again darryl.miles at darrylmiles dot org
                   ` (11 preceding siblings ...)
  2013-11-13 16:23 ` neleai at seznam dot cz
@ 2013-11-13 16:28 ` bugdal at aerifal dot cx
  2013-11-13 16:30 ` bugdal at aerifal dot cx
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: bugdal at aerifal dot cx @ 2013-11-13 16:28 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=16159

--- Comment #13 from Rich Felker <bugdal at aerifal dot cx> ---
Joseph, the bootstrapping issue can presumably be fixed (and bootstrapping made
easier) simply by providing a way to install headers without building glibc.
This may even allow you to shave one or more steps off of the full bootstrap
process.

As for the issue of new DWARF opcodes, if they prevent older unwind code from
being able to interpret the unwind information at all (rather than just failing
to take advantage of the new features) that seems like a fundamental design bug
elsewhere that should be reported. I'm not clear whether or not that's really
the case.

With that said, I find your alternate fix proposal acceptable. For the
libpthread issue, I believe the DT_NEEDED could be generated at build time
using a fake libgcc_s.so.1 in the glibc source tree. As for disabling backtrace
by default, that's perfectly acceptable. Alternatively, glibc could always
attempt to load libgcc_s.so.1 at startup and disable backtrace if it's not
found.


* [Bug malloc/16159] malloc_printerr() deadlock, when calling malloc_printerr() again
  2013-11-13  3:30 [Bug malloc/16159] New: malloc_printerr() deadlock, when calling malloc_printerr() again darryl.miles at darrylmiles dot org
                   ` (12 preceding siblings ...)
  2013-11-13 16:28 ` bugdal at aerifal dot cx
@ 2013-11-13 16:30 ` bugdal at aerifal dot cx
  2013-11-13 16:47 ` joseph at codesourcery dot com
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: bugdal at aerifal dot cx @ 2013-11-13 16:30 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=16159

--- Comment #14 from Rich Felker <bugdal at aerifal dot cx> ---
Ondrej, is having glibc contain a DT_NEEDED entry for libgcc_s.so.1 really an
option that's on the table? I think this would also run into the bootstrapping
issues Joseph and others may be concerned about, as well as hurting load-time
performance.


* [Bug malloc/16159] malloc_printerr() deadlock, when calling malloc_printerr() again
  2013-11-13  3:30 [Bug malloc/16159] New: malloc_printerr() deadlock, when calling malloc_printerr() again darryl.miles at darrylmiles dot org
                   ` (13 preceding siblings ...)
  2013-11-13 16:30 ` bugdal at aerifal dot cx
@ 2013-11-13 16:47 ` joseph at codesourcery dot com
  2013-11-13 16:54 ` joseph at codesourcery dot com
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: joseph at codesourcery dot com @ 2013-11-13 16:47 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=16159

--- Comment #15 from joseph at codesourcery dot com <joseph at codesourcery dot com> ---
On Wed, 13 Nov 2013, neleai at seznam dot cz wrote:

> Why did you jump from dlopening to static linking? Dynamic linking would
> work and if there is concern that user does not have one we could
> provide a stub implementation and function to test if we deal with stub
> or real one.

I don't think default dlopening libgcc_s from libc at startup is desirable 
on performance grounds (most programs will never need it), whereas from 
libpthread it's likely to be less significant.


* [Bug malloc/16159] malloc_printerr() deadlock, when calling malloc_printerr() again
  2013-11-13  3:30 [Bug malloc/16159] New: malloc_printerr() deadlock, when calling malloc_printerr() again darryl.miles at darrylmiles dot org
                   ` (14 preceding siblings ...)
  2013-11-13 16:47 ` joseph at codesourcery dot com
@ 2013-11-13 16:54 ` joseph at codesourcery dot com
  2013-11-14 14:32 ` neleai at seznam dot cz
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: joseph at codesourcery dot com @ 2013-11-13 16:54 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=16159

--- Comment #16 from joseph at codesourcery dot com <joseph at codesourcery dot com> ---
On Wed, 13 Nov 2013, bugdal at aerifal dot cx wrote:

> Joseph, the bootstrapping issue can presumably be fixed (and bootstrapping made
> easier) simply by providing a way to install headers without building glibc.

There already is.  But to install the correct set of headers (some 
generated at build time) you first need an appropriately configured 
compiler to configure glibc.  That's the old three-compiler bootstrap 
process: first build a basic compiler, then install headers with it and 
crt*.o and build a dummy libc.so, then build a second compiler with shared 
libgcc, then build glibc, then build a third compiler.  I changed things 
in glibc and GCC so that a two-compiler process suffices: the initial 
compiler built without headers can build glibc and the result is identical 
to what you get if you repeatedly alternate GCC and glibc builds.  
(Ideally you'd have a one-compiler process, where the second compiler 
build only builds/rebuilds GCC's runtime libraries where they depend on 
system headers or shared glibc, not GCC itself.)


* [Bug malloc/16159] malloc_printerr() deadlock, when calling malloc_printerr() again
  2013-11-13  3:30 [Bug malloc/16159] New: malloc_printerr() deadlock, when calling malloc_printerr() again darryl.miles at darrylmiles dot org
                   ` (15 preceding siblings ...)
  2013-11-13 16:54 ` joseph at codesourcery dot com
@ 2013-11-14 14:32 ` neleai at seznam dot cz
  2013-11-14 15:54 ` bugdal at aerifal dot cx
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: neleai at seznam dot cz @ 2013-11-14 14:32 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=16159

Ondrej Bilka <neleai at seznam dot cz> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |neleai at seznam dot cz

--- Comment #17 from Ondrej Bilka <neleai at seznam dot cz> ---
Joseph, do you have a benchmark to measure libgcc overhead?

I tried the following:

cat > x.c <<'EOF'
int main()
{
  return 42;
}
EOF
gcc x.c -O3  -o nogcc
gcc x.c -O3 -lgcc -o withgcc
time for I in `seq 1 10000`; do ./nogcc; done
time for I in `seq 1 10000`; do ./withgcc; done

And I cannot distinguish these from noise. When I linked with -lpthread there
was a noticeable slowdown.


* [Bug malloc/16159] malloc_printerr() deadlock, when calling malloc_printerr() again
  2013-11-13  3:30 [Bug malloc/16159] New: malloc_printerr() deadlock, when calling malloc_printerr() again darryl.miles at darrylmiles dot org
                   ` (16 preceding siblings ...)
  2013-11-14 14:32 ` neleai at seznam dot cz
@ 2013-11-14 15:54 ` bugdal at aerifal dot cx
  2013-11-14 16:47 ` neleai at seznam dot cz
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: bugdal at aerifal dot cx @ 2013-11-14 15:54 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=16159

--- Comment #18 from Rich Felker <bugdal at aerifal dot cx> ---
Ondrej, did you even check your results with readelf or ldd? -lgcc is a static
library and is always linked, so of course it won't make any difference. You
need to test with -lgcc_s (and double-check to make sure the dependency really
got added).

BTW, I'm not sure how well your test will do measuring exec time versus other
overhead. If you'd like, I have a test I can post that execs itself and
measures the actual time from just before the execve syscall to the start of
main.
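
A self-exec timing test of the kind described above might look roughly like
the sketch below. It is not the test offered in this comment; the environment
variable name EXEC_T0_NS and all function names are made up for illustration,
and re-executing through /proc/self/exe assumes Linux. The first pass stores a
CLOCK_MONOTONIC timestamp in the environment and execs itself; the second pass
reads the clock as early as possible and reports the difference.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define STAMP_VAR "EXEC_T0_NS"  /* hypothetical variable name */

/* Current CLOCK_MONOTONIC time in nanoseconds. */
int64_t now_ns(void)
{
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (int64_t) ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Nanoseconds between the stamp left by the first pass and AT_MAIN,
   or -1 if no stamp is present (i.e. this is the first pass).  */
int64_t startup_latency_ns(int64_t at_main)
{
  const char *stamp = getenv(STAMP_VAR);
  if (stamp == NULL)
    return -1;
  return at_main - strtoll(stamp, NULL, 10);
}

/* Intended to be the first call in main().  Returns 0 after printing
   the latency on the second pass; returns 1 only if the exec fails.  */
int measure_exec_latency(char *const argv[])
{
  int64_t at_main = now_ns();  /* read the clock before anything else */
  int64_t elapsed;
  char buf[32];

  assert(argv != NULL);
  elapsed = startup_latency_ns(at_main);
  if (elapsed >= 0)
    {
      printf("exec to main: %" PRId64 " ns\n", elapsed);
      return 0;
    }

  /* First pass: stamp the time and re-exec.  The setenv() call sits
     between the stamp and the execve syscall, so its cost is included.  */
  snprintf(buf, sizeof buf, "%" PRId64, now_ns());
  setenv(STAMP_VAR, buf, 1);
  execv("/proc/self/exe", argv);
  perror("execv");
  return 1;
}
```

Building one binary with and one without -lgcc_s and calling
measure_exec_latency(argv) from main in each would compare dynamic-loading
cost without the shell's fork overhead in the measurement.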


* [Bug malloc/16159] malloc_printerr() deadlock, when calling malloc_printerr() again
  2013-11-13  3:30 [Bug malloc/16159] New: malloc_printerr() deadlock, when calling malloc_printerr() again darryl.miles at darrylmiles dot org
                   ` (17 preceding siblings ...)
  2013-11-14 15:54 ` bugdal at aerifal dot cx
@ 2013-11-14 16:47 ` neleai at seznam dot cz
  2013-11-14 17:08 ` bugdal at aerifal dot cx
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: neleai at seznam dot cz @ 2013-11-14 16:47 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=16159

--- Comment #19 from Ondrej Bilka <neleai at seznam dot cz> ---
On Thu, Nov 14, 2013 at 03:54:30PM +0000, bugdal at aerifal dot cx wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=16159
> 
> --- Comment #18 from Rich Felker <bugdal at aerifal dot cx> ---
> Ondrej, did you even check your results with readelf or ldd? -lgcc is a static
> library and is always linked, so of course it won't make any difference. You
> need to test with -lgcc_s (and double-check to make sure the dependency really
> got added).
> 
I asked for a benchmark because of that; with -lgcc_s there is a difference.

plain

real    0m3.039s
user    0m0.195s
sys    0m3.049s

with lgcc_s

real    0m3.141s
user    0m0.169s
sys    0m3.179s

with lpthread

real    0m3.282s
user    0m0.182s
sys    0m3.308s

> BTW, I'm not sure how well your test will do measuring exec time versus other
> overhead. If you'd like, I have a test I can post that execs itself and
> measures the actual time from just before the execve syscall to the start of
> main.
> 
These also count as I wanted to show a relative performance impact. If
this is taken into extreme we could improve performance by statically linking lm
and lpthread

Or using prelink.
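
To put the three "real" timings above on a per-invocation basis, the
arithmetic can be written out as below. This is only a sketch: the function
names are invented here, and it assumes each figure covers the same
10000-iteration loop as the earlier script.

```c
#include <assert.h>

/* Average extra wall-clock time per invocation, in microseconds. */
double per_exec_overhead_us(double baseline_s, double variant_s, int runs)
{
  assert(runs > 0);
  return (variant_s - baseline_s) / runs * 1e6;
}

/* Slowdown of the variant relative to the baseline, in percent. */
double relative_overhead_pct(double baseline_s, double variant_s)
{
  assert(baseline_s > 0.0);
  return (variant_s - baseline_s) / baseline_s * 100.0;
}
```

Under that assumption, 3.039s versus 3.141s works out to roughly 10
microseconds per exec for -lgcc_s (about 3.4%), and 3.039s versus 3.282s to
roughly 24 microseconds for -lpthread (about 8.0%). Both percentages are
relative to a total that includes the shell's own fork and wait overhead.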


* [Bug malloc/16159] malloc_printerr() deadlock, when calling malloc_printerr() again
  2013-11-13  3:30 [Bug malloc/16159] New: malloc_printerr() deadlock, when calling malloc_printerr() again darryl.miles at darrylmiles dot org
                   ` (18 preceding siblings ...)
  2013-11-14 16:47 ` neleai at seznam dot cz
@ 2013-11-14 17:08 ` bugdal at aerifal dot cx
  2013-11-28 13:52 ` eblake at redhat dot com
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: bugdal at aerifal dot cx @ 2013-11-14 17:08 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=16159

--- Comment #20 from Rich Felker <bugdal at aerifal dot cx> ---
On Thu, Nov 14, 2013 at 04:47:48PM +0000, neleai at seznam dot cz wrote:
> These also count as I wanted to show a relative performance impact. If

I agree this approach makes sense, but the relative performance impact
could change when the program (possibly linked with libgcc_s) is
invoked via posix_spawn or vfork+exec from a high-load server versus
as part of an inefficient shell script where the shell may have a lot
of additional syscall overhead on each command (this might also vary
between shells; dash or busybox ash might perform very differently
from bash). So while we may not care about the most extreme impact, I
think it's important to consider how large the relative overhead is
when the invocation conditions are a low-overhead, real-world
scenario.

> this is taken into extreme we could improve performance by statically linking lm
> and lpthread

Yes, of course -- actually, I would recommend merging all of the glibc
.so's into libc.so, but I understand that the current situation with
symbol versions greatly complicates this, and that there might be
other issues. It would certainly improve load-time performance and
memory overhead for small programs, though. But I think this is
outside the scope of this bug report. The interest in looking at
performance here is asking whether a proposed change would make
performance noticeably worse (a regression), not how we can best
optimize startup performance.


* [Bug malloc/16159] malloc_printerr() deadlock, when calling malloc_printerr() again
  2013-11-13  3:30 [Bug malloc/16159] New: malloc_printerr() deadlock, when calling malloc_printerr() again darryl.miles at darrylmiles dot org
                   ` (19 preceding siblings ...)
  2013-11-14 17:08 ` bugdal at aerifal dot cx
@ 2013-11-28 13:52 ` eblake at redhat dot com
  2014-02-23 23:34 ` adconrad at 0c3 dot net
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: eblake at redhat dot com @ 2013-11-28 13:52 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=16159

Eric Blake <eblake at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |eblake at redhat dot com

--- Comment #21 from Eric Blake <eblake at redhat dot com> ---
(In reply to Darryl Miles from comment #6)
> How do I stop this fancy backtrace stuff from working ?  I want to setup an
> environment variable to turn it off as a workaround ?

According to:
https://lists.gnu.org/archive/html/bug-gnulib/2013-11/msg00103.html
setting MALLOC_CHECK_=2 in the environment is sufficient to prevent the error
message attempts; but that sounds like something you set at program start
rather than something we can do via setenv() at the time of reporting the first
error (because setenv uses malloc).

> There is no need to actually fix the bug, you are over thinking the issue. 

Yes, there IS a need to fix something.  The link above points to a case of a
user that is unhappy that their ./configure failed because the conftest program
hung after tickling a malloc corruption bug in regex.  Configure should never
hang (thankfully, configure tests are one case where the MALLOC_CHECK_=2 trick
may be sufficient - someone probing for known glibc bugs doesn't care about a
backtrace, only about successful exit status).
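
Since the MALLOC_CHECK_=2 workaround has to be in the environment before the
affected program starts, one way to apply it from C is a small launcher that
sets the variable and then execs the real program. The sketch below is an
illustration only; the function names are hypothetical and nothing like it
is proposed in this bug.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Put MALLOC_CHECK_=2 into the environment that the next exec will
   inherit.  An existing value is left alone (overwrite == 0), so a
   caller who picked a different setting keeps it.  */
int prepare_malloc_check_env(void)
{
  return setenv("MALLOC_CHECK_", "2", 0);
}

/* Replace the current process with argv[0], run with the prepared
   environment.  Returns only if something failed.  */
int exec_with_malloc_check(char *const argv[])
{
  assert(argv != NULL && argv[0] != NULL);
  if (prepare_malloc_check_env() != 0)
    {
      perror("setenv");
      return 1;
    }
  execvp(argv[0], argv);
  perror("execvp");
  return 127;
}
```

The setenv() call here runs in a healthy process before any corruption
exists, which avoids the problem noted above of calling setenv() (and hence
malloc) at the moment the first error is being reported.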


* [Bug malloc/16159] malloc_printerr() deadlock, when calling malloc_printerr() again
  2013-11-13  3:30 [Bug malloc/16159] New: malloc_printerr() deadlock, when calling malloc_printerr() again darryl.miles at darrylmiles dot org
                   ` (20 preceding siblings ...)
  2013-11-28 13:52 ` eblake at redhat dot com
@ 2014-02-23 23:34 ` adconrad at 0c3 dot net
  2014-06-13 12:18 ` fweimer at redhat dot com
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: adconrad at 0c3 dot net @ 2014-02-23 23:34 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=16159

Adam Conrad <adconrad at 0c3 dot net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |adconrad at 0c3 dot net


* [Bug malloc/16159] malloc_printerr() deadlock, when calling malloc_printerr() again
  2013-11-13  3:30 [Bug malloc/16159] New: malloc_printerr() deadlock, when calling malloc_printerr() again darryl.miles at darrylmiles dot org
                   ` (21 preceding siblings ...)
  2014-02-23 23:34 ` adconrad at 0c3 dot net
@ 2014-06-13 12:18 ` fweimer at redhat dot com
  2015-05-19  1:15 ` cvs-commit at gcc dot gnu.org
  2015-05-19  1:16 ` siddhesh at redhat dot com
  24 siblings, 0 replies; 29+ messages in thread
From: fweimer at redhat dot com @ 2014-06-13 12:18 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=16159

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fweimer at redhat dot com
           See Also|                            |https://sourceware.org/bugz
                   |                            |illa/show_bug.cgi?id=12189
              Flags|                            |security-


* [Bug malloc/16159] malloc_printerr() deadlock, when calling malloc_printerr() again
  2013-11-13  3:30 [Bug malloc/16159] New: malloc_printerr() deadlock, when calling malloc_printerr() again darryl.miles at darrylmiles dot org
                   ` (22 preceding siblings ...)
  2014-06-13 12:18 ` fweimer at redhat dot com
@ 2015-05-19  1:15 ` cvs-commit at gcc dot gnu.org
  2015-05-19  1:16 ` siddhesh at redhat dot com
  24 siblings, 0 replies; 29+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2015-05-19  1:15 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=16159

--- Comment #23 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  fff94fa2245612191123a8015eac94eb04f001e2 (commit)
       via  99db95db37b4fd95986fadb263e4180b7381d10d (commit)
       via  920d70128baa41ce6ce3b1b4771fe912f8d1691a (commit)
      from  46f894d8c60afcc06056a376340df2f378694551 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=fff94fa2245612191123a8015eac94eb04f001e2

commit fff94fa2245612191123a8015eac94eb04f001e2
Author: Siddhesh Poyarekar <siddhesh@redhat.com>
Date:   Tue May 19 06:40:37 2015 +0530

    Avoid deadlock in malloc on backtrace (BZ #16159)

    When the malloc subsystem detects some kind of memory corruption,
    depending on the configuration it prints the error, a backtrace, a
    memory map and then aborts the process.  In this process, the
    backtrace() call may result in a call to malloc, resulting in
    various kinds of problematic behavior.

    In one case, the malloc it calls may detect a corruption and call
    backtrace again, and a stack overflow may result due to the infinite
    recursion.  In another case, the malloc it calls may deadlock on an
    arena lock with the malloc (or free, realloc, etc.) that detected the
    corruption.  In yet another case, if the program is linked with
    pthreads, backtrace may do a pthread_once initialization, which
    deadlocks on itself.

    In all these cases, the program exit is not as intended.  This is
    avoidable by marking the arena that malloc detected a corruption on,
    as unusable.  The following patch does that.  Features of this patch
    are as follows:

    - A flag is added to the mstate struct of the arena to indicate if the
      arena is corrupt.

    - The flag is checked whenever malloc functions try to get a lock on
      an arena.  If the arena is unusable, a NULL is returned, causing the
      malloc to use mmap or try the next arena.

    - malloc_printerr sets the corrupt flag on the arena when it detects a
      corruption

    - free does not concern itself with the flag at all.  It is not
      important since the backtrace workflow does not need free.  A free
      in a parallel thread may cause another corruption, but that's not
      new

    - The flag check and set are not atomic and may race.  This is fine
      since we don't care about contention during the flag check.  We want
      to make sure that the malloc call in the backtrace does not trip on
      itself and all that action happens in the same thread and not across
      threads.

    I verified that the test case does not show any regressions due to
    this patch.  I also ran the malloc benchmarks and found an
    insignificant difference in timings (< 2%).
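
The arena-marking scheme described in the commit message above can be
pictured with the simplified model below. This is a sketch, not glibc's
code: struct arena stands in for the real mstate, the bit value is assumed,
the accessors are plain functions where the ChangeLog lists macros, and
pick_usable_arena and report_corruption are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

#define ARENA_CORRUPTION_BIT 4u  /* value assumed for illustration */

struct arena                     /* hypothetical stand-in for mstate */
{
  unsigned flags;
  struct arena *next;            /* arenas form a circular list */
};

int arena_is_corrupt(const struct arena *a)
{
  return (a->flags & ARENA_CORRUPTION_BIT) != 0;
}

void set_arena_corrupt(struct arena *a)
{
  a->flags |= ARENA_CORRUPTION_BIT;
}

/* Walk the circular list once, starting at START, and return the first
   arena not marked corrupt.  NULL means no arena is usable and the
   caller would have to fall back to mmap.  */
struct arena *pick_usable_arena(struct arena *start)
{
  struct arena *a = start;
  do
    {
      if (!arena_is_corrupt(a))
        return a;
      a = a->next;
    }
  while (a != start);
  return NULL;
}

/* Model of the error path: mark the arena before doing any reporting,
   so allocations made while reporting steer away from it.  */
void report_corruption(struct arena *a)
{
  assert(a != NULL);
  set_arena_corrupt(a);
  /* Printing the message and backtrace would happen here.  */
}
```

As in the commit message, the flag is a plain non-atomic bit. The model only
needs the thread that detected the corruption to see its own write when it
allocates again during reporting.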

        * malloc/Makefile (tests): New test case tst-malloc-backtrace.
        * malloc/arena.c (arena_lock): Check if arena is corrupt.
        (reused_arena): Find a non-corrupt arena.
        (heap_trim): Pass arena to unlink.
        * malloc/hooks.c (malloc_check_get_size): Pass arena to
        malloc_printerr.
        (top_check): Likewise.
        (free_check): Likewise.
        (realloc_check): Likewise.
        * malloc/malloc.c (malloc_printerr): Add arena argument.
        (unlink): Likewise.
        (munmap_chunk): Adjust.
        (ARENA_CORRUPTION_BIT): New macro.
        (arena_is_corrupt): Likewise.
        (set_arena_corrupt): Likewise.
        (sysmalloc): Use mmap if there are no usable arenas.
        (_int_malloc): Likewise.
        (__libc_malloc): Don't fail if arena_get returns NULL.
        (_mid_memalign): Likewise.
        (__libc_calloc): Likewise.
        (__libc_realloc): Adjust for additional argument to
        malloc_printerr.
        (_int_free): Likewise.
        (malloc_consolidate): Likewise.
        (_int_realloc): Likewise.
        (_int_memalign): Don't touch corrupt arenas.
        * malloc/tst-malloc-backtrace.c: New test case.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=99db95db37b4fd95986fadb263e4180b7381d10d

commit 99db95db37b4fd95986fadb263e4180b7381d10d
Author: Siddhesh Poyarekar <siddhesh@redhat.com>
Date:   Tue May 19 06:36:29 2015 +0530

    Succeed if make check does not report any errors

    The conditional that evaluates if there are any FAILed test cases
    currently always fails, since we ensure it fails if we find any
    unexpected results in tests.sum and it would obviously fail if it does
    not find failed results in tests.sum.  This patch fixes this by simply
    inverting the result of the egrep, i.e. succeed if egrep fails (to
    find failed results) and fail if it succeeds.

    Tested with 'make subdirs=localedata check' and 'make subdirs=locale
    check' where all tests succeed and with 'make subdirs=elf check' where
    a couple of tests fail for me.

         * Makefile (summarize-tests): Fix return value on success.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=920d70128baa41ce6ce3b1b4771fe912f8d1691a

commit 920d70128baa41ce6ce3b1b4771fe912f8d1691a
Author: Siddhesh Poyarekar <siddhesh@redhat.com>
Date:   Tue May 19 06:35:37 2015 +0530

    Add envz_remove to the libc manual

    I was told that Ma Shimao submitted a patch to add envz_remove to the
    libc manual, but the patch could not be accepted since he does not
    have a copyright assignment in place.  I have been woefully behind on
    libc-alpha recently and have not seen the patch or the discussion
    thread.  I have also not read the man page for envz_remove, so
    Alexandre Oliva asked me if I could write this independently and post
    a patch.  The patch below is the result of the same - I have written
    it based on the implementation in string/envz.c and Alex told me via
    email that the function is AS, AC and MT-safe like envz_strip.

    I assume Alex and Carlos cannot review this since they have been
    tainted by the original patch (I haven't even tried to look for a link
    to it since I don't want to be tainted) so someone else will have to
    review this.  If there are no reviewers till the end of the week, I
    will commit this since I believe there is a chance that there are no
    other reviewers who haven't read that thread.

        * manual/string.texi (Envz Functions): Add envz_remove.

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                                          |   35 ++++
 Makefile                                           |    2 +-
 NEWS                                               |   20 +-
 malloc/Makefile                                    |    6 +-
 malloc/arena.c                                     |   22 ++-
 malloc/hooks.c                                     |   12 +-
 malloc/malloc.c                                    |  173 ++++++++++++--------
 .../tst-detach1.c => malloc/tst-malloc-backtrace.c |   49 +++---
 manual/string.texi                                 |    8 +
 9 files changed, 213 insertions(+), 114 deletions(-)
 copy nptl/tst-detach1.c => malloc/tst-malloc-backtrace.c (57%)


* [Bug malloc/16159] malloc_printerr() deadlock, when calling malloc_printerr() again
  2013-11-13  3:30 [Bug malloc/16159] New: malloc_printerr() deadlock, when calling malloc_printerr() again darryl.miles at darrylmiles dot org
                   ` (23 preceding siblings ...)
  2015-05-19  1:15 ` cvs-commit at gcc dot gnu.org
@ 2015-05-19  1:16 ` siddhesh at redhat dot com
  24 siblings, 0 replies; 29+ messages in thread
From: siddhesh at redhat dot com @ 2015-05-19  1:16 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=16159

Siddhesh Poyarekar <siddhesh at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #24 from Siddhesh Poyarekar <siddhesh at redhat dot com> ---
Fixed in master.

