public inbox for cygwin-developers@cygwin.com
 help / color / mirror / Atom feed
* malloc crash
@ 2021-10-24 21:46 Ken Brown
  2021-10-25  8:56 ` Takashi Yano
  2021-10-25  8:59 ` Corinna Vinschen
  0 siblings, 2 replies; 27+ messages in thread
From: Ken Brown @ 2021-10-24 21:46 UTC (permalink / raw)
  To: cygwin-devel

[-- Attachment #1: Type: text/plain, Size: 5086 bytes --]

I'm trying to debug the fifo problem reported here:

   https://cygwin.com/pipermail/cygwin/2021-October/249635.html

To keep my email self-contained, here are the reproduction instructions.  Run 
the attached script with argument 1000.  The output is supposed to look like this:

$ ./fifo_test.sh 1000
Creating 1000 fifo readers...
Created PID=6503  reading from /tmp/catfifo_0
FIFO 0
Created PID=6506  reading from /tmp/catfifo_1
FIFO 1
[...]
Created PID=9506  reading from /tmp/catfifo_998
FIFO 998
Created PID=9509  reading from /tmp/catfifo_999
FIFO 999

But invariably one of the exec'd cat processes will appear to hang.  (Actually 
it goes into an infinite loop.)  If you attach gdb to that process and catch it 
at the right time, you see something like this:

[...]
Reading symbols from /usr/bin/cat.exe...
Reading symbols from /usr/lib/debug//usr/bin/cat.exe.dbg...
(gdb) thr 1
[Switching to thread 1 (Thread 9692.0x8658)]
#0  0x00007ffe950ed674 in ntdll!ZwCreateEvent ()
    from /c/WINDOWS/SYSTEM32/ntdll.dll
(gdb) bt
#0  0x00007ffe950ed674 in ntdll!ZwCreateEvent ()
    from /c/WINDOWS/SYSTEM32/ntdll.dll
#1  0x00000001800e56c8 in CreateEventW (
     lpEventAttributes=0x18030ac90 <sec_none_nih>, bManualReset=0,
     bInitialState=0, lpName=0x0)
     at ../../../../temp/winsup/cygwin/kernel32.cc:46
#2  0x00000001800e57c1 in CreateEventA (
     lpEventAttributes=0x18030ac90 <sec_none_nih>, bManualReset=0,
     bInitialState=0, lpName=0x0)
     at ../../../../temp/winsup/cygwin/kernel32.cc:71
#3  0x00000001801493f1 in sig_send (p=0x180010000, si=..., tls=0xffffce00)
     at ../../../../temp/winsup/cygwin/sigproc.cc:698
#4  0x00000001800676c9 in exception::handle (e=0xffffc5b0, frame=0xffffcd80,
     in=0xffffc0c0, dispatch=0xffffbf40)
     at ../../../../temp/winsup/cygwin/exceptions.cc:834
#5  0x00007ffe950f20cf in ntdll!.chkstk () from /c/WINDOWS/SYSTEM32/ntdll.dll
#6  0x00007ffe950a1454 in ntdll!RtlRaiseException ()
    from /c/WINDOWS/SYSTEM32/ntdll.dll
#7  0x00007ffe950f0bfe in ntdll!KiUserExceptionDispatcher ()
    from /c/WINDOWS/SYSTEM32/ntdll.dll
#8  0x0000000180191a5c in init_top (m=0x18036f860 <_gm_>, p=0x800010000,
     psize=65456) at ../../../../temp/winsup/cygwin/malloc.cc:3903
#9  0x0000000180193249 in sys_alloc (m=0x18036f860 <_gm_>, nb=256)
     at ../../../../temp/winsup/cygwin/malloc.cc:4186
#10 0x0000000180196b96 in dlmalloc (bytes=248)
     at ../../../../temp/winsup/cygwin/malloc.cc:4669
#11 0x0000000180197f5d in dlcalloc (n_elements=1, elem_size=248)
     at ../../../../temp/winsup/cygwin/malloc.cc:4799
#12 0x00000001800e9030 in calloc (nmemb=1, size=248)
     at ../../../../temp/winsup/cygwin/malloc_wrapper.cc:101
#13 0x0000000180044a2a in operator new (s=248)
     at ../../../../temp/winsup/cygwin/cxx.cc:21
#14 0x000000018016a75d in pthread::init_mainthread ()
     at ../../../../temp/winsup/cygwin/thread.cc:371
#15 0x000000018004a310 in dll_crt0_1 ()
     at ../../../../temp/winsup/cygwin/dcrt0.cc:887
#16 0x000000018004771c in _cygtls::call2 (this=0xffffce00,
     func=0x18004a218 <dll_crt0_1(void*)>, arg=0x0, buf=0xffffcdb0)
     at ../../../../temp/winsup/cygwin/cygtls.cc:40
#17 0x00000001800476c1 in _cygtls::call (func=0x18004a218 <dll_crt0_1(void*)>,
     arg=0x0) at ../../../../temp/winsup/cygwin/cygtls.cc:27
#18 0x000000018004aac9 in _dll_crt0 ()
     at ../../../../temp/winsup/cygwin/dcrt0.cc:1099
#19 0x0000000000000000 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Typing 'finish' enough times until it won't return anymore shows that there is 
an infinite loop starting with an access violation here:

(gdb) f 8
#8  0x0000000180191a5c in init_top (m=0x18036f860 <_gm_>, p=0x800010000,
     psize=65456) at ../../../../temp/winsup/cygwin/malloc.cc:3903
3903      p->head = psize | PINUSE_BIT;

I guess there's an infinite loop rather than a crash because the exec'd cat 
process isn't fully initialized yet, and the exception handler just keeps 
continuing execution at the site of the access violation.

If I'm reading the backtrace correctly, the access violation occurs while Cygwin 
is trying to allocate storage for the main thread object of the exec'd process.

I'm not familiar enough with the relevant Cygwin internals to take the analysis 
any further, but my guess is that the problem is somehow triggered by the 
creation of a new thread at the end of fhandler_fifo::fixup_after_exec:

       new cygthread (fifo_reader_thread, this, "fifo_reader", thr_sync_evt);

Is this a bug in the fifo code?  Is there some reason I shouldn't be creating a 
new thread in fixup_after_exec?  If so, I'm not sure what to do.  The fifo 
reader code depends crucially on having that thread running.

By the way, every once in a while the hang seems to occur in the forked bash 
process, before it execs cat.  This could also be due to the creation of a new 
thread, this time in fixup_after_fork.

Ken

P.S. The gdb session was based on a build from current git HEAD, but the problem 
also occurs in Cygwin 3.2.0.  So I don't think it's related to the new pipe code.

[-- Attachment #2: fifo_test.sh --]
[-- Type: text/plain, Size: 782 bytes --]

#!/bin/bash

# take arg as number of iterations (default=100)
STEPS="${1-100}"

FIFO_PFX="/tmp/catfifo_"
FIFO_WAIT=0
STEP_WAIT=0

function mysleep() { if [ -n "$1" -a "$1" != "0" ]; then sleep "$1"; fi }

function cleanup(){
	rm -f "$FIFO_PFX"*
}
trap cleanup EXIT

printf "Creating $STEPS fifo readers...\n"
for ((i=0; i<STEPS; i++ )); do
	fifo="$FIFO_PFX$i"

	# create fifo
	mkfifo "$fifo"
	mysleep $FIFO_WAIT

	# fork a process reading from fifo and writing it to stdout
	cat < "$fifo" &
	pid=$!
	printf "Created PID=$pid  reading from $fifo\n"

	# redirect FD3 to the fifo and print a message to it
	exec 3>"$fifo"		
	printf "FIFO %d\n" "$i" >&3

	# close the file descriptor, wait for process to exit and clean up
	exec 3>&-
	wait $pid
	rm -f "$fifo"

	mysleep $STEP_WAIT
done

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: malloc crash
  2021-10-24 21:46 malloc crash Ken Brown
@ 2021-10-25  8:56 ` Takashi Yano
  2021-10-25 13:37   ` Ken Brown
  2021-10-25  8:59 ` Corinna Vinschen
  1 sibling, 1 reply; 27+ messages in thread
From: Takashi Yano @ 2021-10-25  8:56 UTC (permalink / raw)
  To: cygwin-developers

On Sun, 24 Oct 2021 17:46:40 -0400
Ken Brown wrote:
> I'm trying to debug the fifo problem reported here:
> 
>    https://cygwin.com/pipermail/cygwin/2021-October/249635.html
> 
> To keep my email self-contained, here are the reproduction instructions.  Run 
> the attached script with argument 1000.  The output is supposed to look like this:
> 
> $ ./fifo_test.sh 1000
> Creating 1000 fifo readers...
> Created PID=6503  reading from /tmp/catfifo_0
> FIFO 0
> Created PID=6506  reading from /tmp/catfifo_1
> FIFO 1
> [...]
> Created PID=9506  reading from /tmp/catfifo_998
> FIFO 998
> Created PID=9509  reading from /tmp/catfifo_999
> FIFO 999
> 
> But invariably one of the exec'd cat processes will appear to hang.  (Actually 
> it goes into an infinite loop.)  If you attach gdb to that process and catch it 
> at the right time, you see something like this:

I noticed that this does not occur with 32-bit cygwin.
This occurs only with 64-bit cygwin in my environment.

Does malloc behave differently between 32 and 64 bit cygwin?

-- 
Takashi Yano <takashi.yano@nifty.ne.jp>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: malloc crash
  2021-10-24 21:46 malloc crash Ken Brown
  2021-10-25  8:56 ` Takashi Yano
@ 2021-10-25  8:59 ` Corinna Vinschen
  2021-10-25 12:35   ` Ken Brown
  1 sibling, 1 reply; 27+ messages in thread
From: Corinna Vinschen @ 2021-10-25  8:59 UTC (permalink / raw)
  To: cygwin-developers

On Oct 24 17:46, Ken Brown wrote:
> I'm trying to debug the fifo problem reported here:
> 
>   https://cygwin.com/pipermail/cygwin/2021-October/249635.html
> 
> To keep my email self-contained, here are the reproduction instructions.
> Run the attached script with argument 1000.  The output is supposed to look
> like this:
> [...]
>     func=0x18004a218 <dll_crt0_1(void*)>, arg=0x0, buf=0xffffcdb0)
>     at ../../../../temp/winsup/cygwin/cygtls.cc:40
> #17 0x00000001800476c1 in _cygtls::call (func=0x18004a218 <dll_crt0_1(void*)>,
>     arg=0x0) at ../../../../temp/winsup/cygwin/cygtls.cc:27
> #18 0x000000018004aac9 in _dll_crt0 ()
>     at ../../../../temp/winsup/cygwin/dcrt0.cc:1099
> #19 0x0000000000000000 in ?? ()
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
> 
> Typing 'finish' enough times until it won't return anymore shows that there
> is an infinite loop starting with an access violation here:
> 
> (gdb) f 8
> #8  0x0000000180191a5c in init_top (m=0x18036f860 <_gm_>, p=0x800010000,
>     psize=65456) at ../../../../temp/winsup/cygwin/malloc.cc:3903
> 3903      p->head = psize | PINUSE_BIT;

The address p=0x800010000 indicates that this malloc tries to alloc heap
space, and the address 0x800010000 is right at the start.  Exec'd
process, so this SEGV is rather strange, becasue that would mean this
part of the VM isn't commited.  How's that supposed to happen?  Malloc
should have called sbrk before, which in turn would have committed this
part of the heap.  Puzzeling.

> If I'm reading the backtrace correctly, the access violation occurs while
> Cygwin is trying to allocate storage for the main thread object of the
> exec'd process.

Looks like it, yes.

> I'm not familiar enough with the relevant Cygwin internals to take the
> analysis any further, but my guess is that the problem is somehow triggered
> by the creation of a new thread at the end of
> fhandler_fifo::fixup_after_exec:
> 
>       new cygthread (fifo_reader_thread, this, "fifo_reader", thr_sync_evt);
> 
> Is this a bug in the fifo code?  Is there some reason I shouldn't be
> creating a new thread in fixup_after_exec?

I'm not aware of any.  Starting cygthreads is an integral part of
process startup, e. g., the wait_sig thread.

Has the thread already been started at this point?


Corinna

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: malloc crash
  2021-10-25  8:59 ` Corinna Vinschen
@ 2021-10-25 12:35   ` Ken Brown
  2021-10-25 15:39     ` Corinna Vinschen
  0 siblings, 1 reply; 27+ messages in thread
From: Ken Brown @ 2021-10-25 12:35 UTC (permalink / raw)
  To: cygwin-developers

On 10/25/2021 4:59 AM, Corinna Vinschen wrote:
> On Oct 24 17:46, Ken Brown wrote:
>> I'm trying to debug the fifo problem reported here:
>>
>>    https://cygwin.com/pipermail/cygwin/2021-October/249635.html
>>
>> To keep my email self-contained, here are the reproduction instructions.
>> Run the attached script with argument 1000.  The output is supposed to look
>> like this:
>> [...]
>>      func=0x18004a218 <dll_crt0_1(void*)>, arg=0x0, buf=0xffffcdb0)
>>      at ../../../../temp/winsup/cygwin/cygtls.cc:40
>> #17 0x00000001800476c1 in _cygtls::call (func=0x18004a218 <dll_crt0_1(void*)>,
>>      arg=0x0) at ../../../../temp/winsup/cygwin/cygtls.cc:27
>> #18 0x000000018004aac9 in _dll_crt0 ()
>>      at ../../../../temp/winsup/cygwin/dcrt0.cc:1099
>> #19 0x0000000000000000 in ?? ()
>> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
>>
>> Typing 'finish' enough times until it won't return anymore shows that there
>> is an infinite loop starting with an access violation here:
>>
>> (gdb) f 8
>> #8  0x0000000180191a5c in init_top (m=0x18036f860 <_gm_>, p=0x800010000,
>>      psize=65456) at ../../../../temp/winsup/cygwin/malloc.cc:3903
>> 3903      p->head = psize | PINUSE_BIT;
> 
> The address p=0x800010000 indicates that this malloc tries to alloc heap
> space, and the address 0x800010000 is right at the start.  Exec'd
> process, so this SEGV is rather strange, becasue that would mean this
> part of the VM isn't commited.  How's that supposed to happen?  Malloc
> should have called sbrk before, which in turn would have committed this
> part of the heap.  Puzzeling.
> 
>> If I'm reading the backtrace correctly, the access violation occurs while
>> Cygwin is trying to allocate storage for the main thread object of the
>> exec'd process.
> 
> Looks like it, yes.
> 
>> I'm not familiar enough with the relevant Cygwin internals to take the
>> analysis any further, but my guess is that the problem is somehow triggered
>> by the creation of a new thread at the end of
>> fhandler_fifo::fixup_after_exec:
>>
>>        new cygthread (fifo_reader_thread, this, "fifo_reader", thr_sync_evt);
>>
>> Is this a bug in the fifo code?  Is there some reason I shouldn't be
>> creating a new thread in fixup_after_exec?
> 
> I'm not aware of any.  Starting cygthreads is an integral part of
> process startup, e. g., the wait_sig thread.
> 
> Has the thread already been started at this point?

Yes, here's the backtrace of that thread:

Thread 5 (Thread 9692.0x7c4c):
#0  0x00000001801934f9 in sys_alloc (m=0x18036f860 <_gm_>, nb=1040) at 
../../../../temp/winsup/cygwin/malloc.cc:4232
#1  0x0000000180196b96 in dlmalloc (bytes=1024) at 
../../../../temp/winsup/cygwin/malloc.cc:4669
#2  0x00000001801993e1 in dlrealloc (oldmem=0x0, bytes=1024) at 
../../../../temp/winsup/cygwin/malloc.cc:5187
#3  0x00000001800e8eed in realloc (p=0x0, size=1024) at 
../../../../temp/winsup/cygwin/malloc_wrapper.cc:73
#4  0x000000018008b6c4 in fhandler_fifo::add_client_handler (this=0x1803a0d80, 
new_pipe_instance=false) at ../../../../temp/winsup/cygwin/fhandler_fifo.cc:330
#5  0x000000018008b9ee in fhandler_fifo::update_my_handlers (this=0x1803a0d80) 
at ../../../../temp/winsup/cygwin/fhandler_fifo.cc:407
#6  0x000000018008bfe6 in fhandler_fifo::fifo_reader_thread_func 
(this=0x1803a0d80) at ../../../../temp/winsup/cygwin/fhandler_fifo.cc:531
#7  0x000000018008bcda in fifo_reader_thread (param=0x1803a0d80) at 
../../../../temp/winsup/cygwin/fhandler_fifo.cc:453
#8  0x000000018004684f in cygthread::callfunc (this=0x180276620 <threads>, 
issimplestub=false) at ../../../../temp/winsup/cygwin/cygthread.cc:48
#9  0x0000000180046a25 in cygthread::stub (arg=0x180276620 <threads>) at 
../../../../temp/winsup/cygwin/cygthread.cc:91
#10 0x000000018004771c in _cygtls::call2 (this=0x114ce00, func=0x180046856 
<cygthread::stub(void*)>, arg=0x180276620 <threads>, buf=0x114cce0) at 
../../../../temp/winsup/cygwin/cygtls.cc:40
#11 0x00000001800476c1 in _cygtls::call (func=0x180046856 
<cygthread::stub(void*)>, arg=0x180276620 <threads>) at 
../../../../temp/winsup/cygwin/cygtls.cc:27
#12 0x00000001800e4e65 in threadfunc_fe (arg=0x180276620 <threads>) at 
../../../../temp/winsup/cygwin/init.cc:28
#13 0x00007ffe94c27034 in KERNEL32!BaseThreadInitThunk () from 
/c/WINDOWS/System32/KERNEL32.DLL
#14 0x00007ffe950a2651 in ntdll!RtlUserThreadStart () from 
/c/WINDOWS/SYSTEM32/ntdll.dll
#15 0x0000000000000000 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

It's also trying to allocate memory.  Is this a race between two threads 
allocating memory?

Ken

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: malloc crash
  2021-10-25  8:56 ` Takashi Yano
@ 2021-10-25 13:37   ` Ken Brown
  0 siblings, 0 replies; 27+ messages in thread
From: Ken Brown @ 2021-10-25 13:37 UTC (permalink / raw)
  To: cygwin-developers

On 10/25/2021 4:56 AM, Takashi Yano wrote:
> On Sun, 24 Oct 2021 17:46:40 -0400
> Ken Brown wrote:
>> I'm trying to debug the fifo problem reported here:
>>
>>     https://cygwin.com/pipermail/cygwin/2021-October/249635.html
>>
>> To keep my email self-contained, here are the reproduction instructions.  Run
>> the attached script with argument 1000.  The output is supposed to look like this:
>>
>> $ ./fifo_test.sh 1000
>> Creating 1000 fifo readers...
>> Created PID=6503  reading from /tmp/catfifo_0
>> FIFO 0
>> Created PID=6506  reading from /tmp/catfifo_1
>> FIFO 1
>> [...]
>> Created PID=9506  reading from /tmp/catfifo_998
>> FIFO 998
>> Created PID=9509  reading from /tmp/catfifo_999
>> FIFO 999
>>
>> But invariably one of the exec'd cat processes will appear to hang.  (Actually
>> it goes into an infinite loop.)  If you attach gdb to that process and catch it
>> at the right time, you see something like this:
> 
> I noticed that this does not occur with 32-bit cygwin.
> This occurs only with 64-bit cygwin in my environment.

It hadn't occurred to me to test 32-bit, but I just tried, and I can't reproduce 
the problem in that environment either.

> Does malloc behave differently between 32 and 64 bit cygwin?

Ken

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: malloc crash
  2021-10-25 12:35   ` Ken Brown
@ 2021-10-25 15:39     ` Corinna Vinschen
  2021-10-25 21:29       ` Mark Geisert
  0 siblings, 1 reply; 27+ messages in thread
From: Corinna Vinschen @ 2021-10-25 15:39 UTC (permalink / raw)
  To: cygwin-developers

On Oct 25 08:35, Ken Brown wrote:
> On 10/25/2021 4:59 AM, Corinna Vinschen wrote:
> > Has the thread already been started at this point?
> 
> Yes, here's the backtrace of that thread:
> 
> Thread 5 (Thread 9692.0x7c4c):
> #0  0x00000001801934f9 in sys_alloc (m=0x18036f860 <_gm_>, nb=1040) at
> ../../../../temp/winsup/cygwin/malloc.cc:4232
> #1  0x0000000180196b96 in dlmalloc (bytes=1024) at
> ../../../../temp/winsup/cygwin/malloc.cc:4669
> #2  0x00000001801993e1 in dlrealloc (oldmem=0x0, bytes=1024) at
> ../../../../temp/winsup/cygwin/malloc.cc:5187
> #3  0x00000001800e8eed in realloc (p=0x0, size=1024) at
> ../../../../temp/winsup/cygwin/malloc_wrapper.cc:73

Er... huh?  So both threads are in a malloc function?  This shouldn't
have happened, given the clunky muto guarding malloc calls.  This is
really strange.  Why's the muto not working here?


Corinna

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: malloc crash
  2021-10-25 15:39     ` Corinna Vinschen
@ 2021-10-25 21:29       ` Mark Geisert
  2021-10-25 22:02         ` Ken Brown
  0 siblings, 1 reply; 27+ messages in thread
From: Mark Geisert @ 2021-10-25 21:29 UTC (permalink / raw)
  To: cygwin-developers

Corinna Vinschen wrote:
> On Oct 25 08:35, Ken Brown wrote:
>> On 10/25/2021 4:59 AM, Corinna Vinschen wrote:
>>> Has the thread already been started at this point?
>>
>> Yes, here's the backtrace of that thread:
>>
>> Thread 5 (Thread 9692.0x7c4c):
>> #0  0x00000001801934f9 in sys_alloc (m=0x18036f860 <_gm_>, nb=1040) at
>> ../../../../temp/winsup/cygwin/malloc.cc:4232
>> #1  0x0000000180196b96 in dlmalloc (bytes=1024) at
>> ../../../../temp/winsup/cygwin/malloc.cc:4669
>> #2  0x00000001801993e1 in dlrealloc (oldmem=0x0, bytes=1024) at
>> ../../../../temp/winsup/cygwin/malloc.cc:5187
>> #3  0x00000001800e8eed in realloc (p=0x0, size=1024) at
>> ../../../../temp/winsup/cygwin/malloc_wrapper.cc:73
> 
> Er... huh?  So both threads are in a malloc function?  This shouldn't
> have happened, given the clunky muto guarding malloc calls.  This is
> really strange.  Why's the muto not working here?

Is it possible both threads have executed malloc_init()?
If so, the second one would reinit the muto.

There's no exclusion on malloc_init.. chicken v egg.

..mark

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: malloc crash
  2021-10-25 21:29       ` Mark Geisert
@ 2021-10-25 22:02         ` Ken Brown
  2021-10-25 23:36           ` Mark Geisert
  2021-10-26  9:24           ` Corinna Vinschen
  0 siblings, 2 replies; 27+ messages in thread
From: Ken Brown @ 2021-10-25 22:02 UTC (permalink / raw)
  To: cygwin-developers

On 10/25/2021 5:29 PM, Mark Geisert wrote:
> Corinna Vinschen wrote:
>> On Oct 25 08:35, Ken Brown wrote:
>>> On 10/25/2021 4:59 AM, Corinna Vinschen wrote:
>>>> Has the thread already been started at this point?
>>>
>>> Yes, here's the backtrace of that thread:
>>>
>>> Thread 5 (Thread 9692.0x7c4c):
>>> #0  0x00000001801934f9 in sys_alloc (m=0x18036f860 <_gm_>, nb=1040) at
>>> ../../../../temp/winsup/cygwin/malloc.cc:4232
>>> #1  0x0000000180196b96 in dlmalloc (bytes=1024) at
>>> ../../../../temp/winsup/cygwin/malloc.cc:4669
>>> #2  0x00000001801993e1 in dlrealloc (oldmem=0x0, bytes=1024) at
>>> ../../../../temp/winsup/cygwin/malloc.cc:5187
>>> #3  0x00000001800e8eed in realloc (p=0x0, size=1024) at
>>> ../../../../temp/winsup/cygwin/malloc_wrapper.cc:73
>>
>> Er... huh?  So both threads are in a malloc function?  This shouldn't
>> have happened, given the clunky muto guarding malloc calls.  This is
>> really strange.  Why's the muto not working here?
> 
> Is it possible both threads have executed malloc_init()?
> If so, the second one would reinit the muto.

Or does the fifo_reader thread call a malloc function before the main thread has 
called malloc_init()?  This would presumably cause __malloc_lock() to fail, but 
there's no error check.

Ken

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: malloc crash
  2021-10-25 22:02         ` Ken Brown
@ 2021-10-25 23:36           ` Mark Geisert
  2021-10-26  0:18             ` Takashi Yano
  2021-10-26  9:24           ` Corinna Vinschen
  1 sibling, 1 reply; 27+ messages in thread
From: Mark Geisert @ 2021-10-25 23:36 UTC (permalink / raw)
  To: cygwin-developers

Ken Brown wrote:
> On 10/25/2021 5:29 PM, Mark Geisert wrote:
>> Corinna Vinschen wrote:
>>> On Oct 25 08:35, Ken Brown wrote:
>>>> On 10/25/2021 4:59 AM, Corinna Vinschen wrote:
>>>>> Has the thread already been started at this point?
>>>>
>>>> Yes, here's the backtrace of that thread:
>>>>
>>>> Thread 5 (Thread 9692.0x7c4c):
>>>> #0  0x00000001801934f9 in sys_alloc (m=0x18036f860 <_gm_>, nb=1040) at
>>>> ../../../../temp/winsup/cygwin/malloc.cc:4232
>>>> #1  0x0000000180196b96 in dlmalloc (bytes=1024) at
>>>> ../../../../temp/winsup/cygwin/malloc.cc:4669
>>>> #2  0x00000001801993e1 in dlrealloc (oldmem=0x0, bytes=1024) at
>>>> ../../../../temp/winsup/cygwin/malloc.cc:5187
>>>> #3  0x00000001800e8eed in realloc (p=0x0, size=1024) at
>>>> ../../../../temp/winsup/cygwin/malloc_wrapper.cc:73
>>>
>>> Er... huh?  So both threads are in a malloc function?  This shouldn't
>>> have happened, given the clunky muto guarding malloc calls.  This is
>>> really strange.  Why's the muto not working here?
>>
>> Is it possible both threads have executed malloc_init()?
>> If so, the second one would reinit the muto.
> 
> Or does the fifo_reader thread call a malloc function before the main thread has 
> called malloc_init()?  This would presumably cause __malloc_lock() to fail, but 
> there's no error check.

If there's a global constructor involved, that is known to happen.  Constructors 
are run from dll_crt0_0(), before malloc_init() is called from dll_crt0_1().  See 
dcrt0.cc for the details.

..mark

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: malloc crash
  2021-10-25 23:36           ` Mark Geisert
@ 2021-10-26  0:18             ` Takashi Yano
  2021-10-26  0:54               ` Mark Geisert
  0 siblings, 1 reply; 27+ messages in thread
From: Takashi Yano @ 2021-10-26  0:18 UTC (permalink / raw)
  To: cygwin-developers

On Mon, 25 Oct 2021 16:36:50 -0700
Mark Geisert wrote:
> Ken Brown wrote:
> > On 10/25/2021 5:29 PM, Mark Geisert wrote:
> >> Corinna Vinschen wrote:
> >>> On Oct 25 08:35, Ken Brown wrote:
> >>>> On 10/25/2021 4:59 AM, Corinna Vinschen wrote:
> >>>>> Has the thread already been started at this point?
> >>>>
> >>>> Yes, here's the backtrace of that thread:
> >>>>
> >>>> Thread 5 (Thread 9692.0x7c4c):
> >>>> #0  0x00000001801934f9 in sys_alloc (m=0x18036f860 <_gm_>, nb=1040) at
> >>>> ../../../../temp/winsup/cygwin/malloc.cc:4232
> >>>> #1  0x0000000180196b96 in dlmalloc (bytes=1024) at
> >>>> ../../../../temp/winsup/cygwin/malloc.cc:4669
> >>>> #2  0x00000001801993e1 in dlrealloc (oldmem=0x0, bytes=1024) at
> >>>> ../../../../temp/winsup/cygwin/malloc.cc:5187
> >>>> #3  0x00000001800e8eed in realloc (p=0x0, size=1024) at
> >>>> ../../../../temp/winsup/cygwin/malloc_wrapper.cc:73
> >>>
> >>> Er... huh?  So both threads are in a malloc function?  This shouldn't
> >>> have happened, given the clunky muto guarding malloc calls.  This is
> >>> really strange.  Why's the muto not working here?
> >>
> >> Is it possible both threads have executed malloc_init()?
> >> If so, the second one would reinit the muto.
> > 
> > Or does the fifo_reader thread call a malloc function before the main thread has 
> > called malloc_init()?  This would presumably cause __malloc_lock() to fail, but 
> > there's no error check.
> 
> If there's a global constructor involved, that is known to happen.  Constructors 
> are run from dll_crt0_0(), before malloc_init() is called from dll_crt0_1().  See 
> dcrt0.cc for the details.

So how about moving malloc_init() call from dll_crt0_1() to dll_crl0_0()
so that malloc() can be called in fixup_after_fork/exec()?

-- 
Takashi Yano <takashi.yano@nifty.ne.jp>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: malloc crash
  2021-10-26  0:18             ` Takashi Yano
@ 2021-10-26  0:54               ` Mark Geisert
  2021-10-26  8:30                 ` Mark Geisert
  2021-10-26  9:27                 ` Corinna Vinschen
  0 siblings, 2 replies; 27+ messages in thread
From: Mark Geisert @ 2021-10-26  0:54 UTC (permalink / raw)
  To: cygwin-developers

Takashi Yano wrote:
> On Mon, 25 Oct 2021 16:36:50 -0700
> Mark Geisert wrote:
>> Ken Brown wrote:
>>> On 10/25/2021 5:29 PM, Mark Geisert wrote:
>>>> Corinna Vinschen wrote:
>>>>> On Oct 25 08:35, Ken Brown wrote:
>>>>>> On 10/25/2021 4:59 AM, Corinna Vinschen wrote:
>>>>>>> Has the thread already been started at this point?
>>>>>>
>>>>>> Yes, here's the backtrace of that thread:
>>>>>>
>>>>>> Thread 5 (Thread 9692.0x7c4c):
>>>>>> #0  0x00000001801934f9 in sys_alloc (m=0x18036f860 <_gm_>, nb=1040) at
>>>>>> ../../../../temp/winsup/cygwin/malloc.cc:4232
>>>>>> #1  0x0000000180196b96 in dlmalloc (bytes=1024) at
>>>>>> ../../../../temp/winsup/cygwin/malloc.cc:4669
>>>>>> #2  0x00000001801993e1 in dlrealloc (oldmem=0x0, bytes=1024) at
>>>>>> ../../../../temp/winsup/cygwin/malloc.cc:5187
>>>>>> #3  0x00000001800e8eed in realloc (p=0x0, size=1024) at
>>>>>> ../../../../temp/winsup/cygwin/malloc_wrapper.cc:73
>>>>>
>>>>> Er... huh?  So both threads are in a malloc function?  This shouldn't
>>>>> have happened, given the clunky muto guarding malloc calls.  This is
>>>>> really strange.  Why's the muto not working here?
>>>>
>>>> Is it possible both threads have executed malloc_init()?
>>>> If so, the second one would reinit the muto.
>>>
>>> Or does the fifo_reader thread call a malloc function before the main thread has
>>> called malloc_init()?  This would presumably cause __malloc_lock() to fail, but
>>> there's no error check.
>>
>> If there's a global constructor involved, that is known to happen.  Constructors
>> are run from dll_crt0_0(), before malloc_init() is called from dll_crt0_1().  See
>> dcrt0.cc for the details.
> 
> So how about moving malloc_init() call from dll_crt0_1() to dll_crl0_0()
> so that malloc() can be called in fixup_after_fork/exec()?

It appears simple, but this is a touchy area of code.  The _0 and _1 are two 
separate phases of process startup.  I'd want to hear Corinna's thoughts on this.

I'd also like to verify somehow that this is the scenario Ken is hitting.

When I was researching different mallocs for Cygwin I hit the constructor snag 
repeatedly.  I did try delaying the constructor-running until after malloc_init(). 
  More problems.  I did not try moving malloc_init() to before the constructor run.

..mark

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: malloc crash
  2021-10-26  0:54               ` Mark Geisert
@ 2021-10-26  8:30                 ` Mark Geisert
  2021-10-26  8:52                   ` Takashi Yano
  2021-10-26  9:27                 ` Corinna Vinschen
  1 sibling, 1 reply; 27+ messages in thread
From: Mark Geisert @ 2021-10-26  8:30 UTC (permalink / raw)
  To: cygwin-developers

Replying to myself to correct something I wrote...

Mark Geisert wrote:
> Takashi Yano wrote:
>> On Mon, 25 Oct 2021 16:36:50 -0700
>> Mark Geisert wrote:
>>> Ken Brown wrote:
>>>> On 10/25/2021 5:29 PM, Mark Geisert wrote:
>>>>> Corinna Vinschen wrote:
>>>>>> On Oct 25 08:35, Ken Brown wrote:
>>>>>>> On 10/25/2021 4:59 AM, Corinna Vinschen wrote:
>>>>>>>> Has the thread already been started at this point?
>>>>>>>
>>>>>>> Yes, here's the backtrace of that thread:
>>>>>>>
>>>>>>> Thread 5 (Thread 9692.0x7c4c):
>>>>>>> #0  0x00000001801934f9 in sys_alloc (m=0x18036f860 <_gm_>, nb=1040) at
>>>>>>> ../../../../temp/winsup/cygwin/malloc.cc:4232
>>>>>>> #1  0x0000000180196b96 in dlmalloc (bytes=1024) at
>>>>>>> ../../../../temp/winsup/cygwin/malloc.cc:4669
>>>>>>> #2  0x00000001801993e1 in dlrealloc (oldmem=0x0, bytes=1024) at
>>>>>>> ../../../../temp/winsup/cygwin/malloc.cc:5187
>>>>>>> #3  0x00000001800e8eed in realloc (p=0x0, size=1024) at
>>>>>>> ../../../../temp/winsup/cygwin/malloc_wrapper.cc:73
>>>>>>
>>>>>> Er... huh?  So both threads are in a malloc function?  This shouldn't
>>>>>> have happened, given the clunky muto guarding malloc calls.  This is
>>>>>> really strange.  Why's the muto not working here?
>>>>>
>>>>> Is it possible both threads have executed malloc_init()?
>>>>> If so, the second one would reinit the muto.
>>>>
>>>> Or does the fifo_reader thread call a malloc function before the main thread has
>>>> called malloc_init()?  This would presumably cause __malloc_lock() to fail, but
>>>> there's no error check.
>>>
>>> If there's a global constructor involved, that is known to happen.  Constructors
>>> are run from dll_crt0_0(), before malloc_init() is called from dll_crt0_1().  See
>>> dcrt0.cc for the details.
>>
>> So how about moving malloc_init() call from dll_crt0_1() to dll_crl0_0()
>> so that malloc() can be called in fixup_after_fork/exec()?
> 
> It appears simple, but this is a touchy area of code.  The _0 and _1 are two 
> separate phases of process startup.  I'd want to hear Corinna's thoughts on this.
> 
> I'd also like to verify somehow that this is the scenario Ken is hitting.
> 
> When I was researching different mallocs for Cygwin I hit the constructor snag 
> repeatedly.  I did try delaying the constructor-running until after malloc_init(). 
>   More problems.  I did not try moving malloc_init() to before the constructor run.

Apologies; this was many months ago.  What I did try was moving the malloc_init() 
to before running the constructor chain, as Takashi suggested.  That is what gave 
me more problems.  I don't recall what they were, but I reverted that attempt.

The "future malloc" build of Cygwin I'm running doesn't exhibit Ken's issue, as 
far as I can tell.  It has a specific fix to avoid the scenario I've been talking 
about here, but I don't want to take us down that path unless we're sure Ken's 
hitting that same scenario.

..mark

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: malloc crash
  2021-10-26  8:30                 ` Mark Geisert
@ 2021-10-26  8:52                   ` Takashi Yano
  2021-10-26  8:59                     ` Mark Geisert
  0 siblings, 1 reply; 27+ messages in thread
From: Takashi Yano @ 2021-10-26  8:52 UTC (permalink / raw)
  To: cygwin-developers

On Tue, 26 Oct 2021 01:30:13 -0700
Mark Geisert wrote:
> Replying to myself to correct something I wrote...
> 
> Mark Geisert wrote:
> > Takashi Yano wrote:
> >> On Mon, 25 Oct 2021 16:36:50 -0700
> >> Mark Geisert wrote:
> >>> Ken Brown wrote:
> >>>> On 10/25/2021 5:29 PM, Mark Geisert wrote:
> >>>>> Corinna Vinschen wrote:
> >>>>>> Er... huh?  So both threads are in a malloc function?  This shouldn't
> >>>>>> have happened, given the clunky muto guarding malloc calls.  This is
> >>>>>> really strange.  Why's the muto not working here?
> >>>>>
> >>>>> Is it possible both threads have executed malloc_init()?
> >>>>> If so, the second one would reinit the muto.
> >>>>
> >>>> Or does the fifo_reader thread call a malloc function before the main thread has
> >>>> called malloc_init()?  This would presumably cause __malloc_lock() to fail, but
> >>>> there's no error check.
> >>>
> >>> If there's a global constructor involved, that is known to happen.  Constructors
> >>> are run from dll_crt0_0(), before malloc_init() is called from dll_crt0_1().  See
> >>> dcrt0.cc for the details.
> >>
> >> So how about moving malloc_init() call from dll_crt0_1() to dll_crl0_0()
> >> so that malloc() can be called in fixup_after_fork/exec()?
> > 
> > It appears simple, but this is a touchy area of code.  The _0 and _1 are two 
> > separate phases of process startup.  I'd want to hear Corinna's thoughts on this.
> > 
> > I'd also like to verify somehow that this is the scenario Ken is hitting.
> > 
> > When I was researching different mallocs for Cygwin I hit the constructor snag 
> > repeatedly.  I did try delaying the constructor-running until after malloc_init(). 
> >   More problems.  I did not try moving malloc_init() to before the constructor run.
> 
> Apologies; this was many months ago.  What I did try was moving the malloc_init() 
> to before running the constructor chain, as Takashi suggested.  That is what gave 
> me more problems.  I don't recall what they were, but I reverted that attempt.
> 
> The "future malloc" build of Cygwin I'm running doesn't exhibit Ken's issue, as 
> far as I can tell.  It has a specific fix to avoid the scenario I've been talking 
> about here, but I don't want to take us down that path unless we're sure Ken's 
> hitting that same scenario.

I tried the following patch, and confirmed that the issue has
been disappeared. I do not notice any other problems so far
with this patch.

diff --git a/winsup/cygwin/dcrt0.cc b/winsup/cygwin/dcrt0.cc
index 6f4723bb0..0d541ec14 100644
--- a/winsup/cygwin/dcrt0.cc
+++ b/winsup/cygwin/dcrt0.cc
@@ -773,6 +773,10 @@ dll_crt0_0 ()
   do_global_ctors (&__CTOR_LIST__, 1);
   cygthread::init ();
 
+  /* malloc_init() has been moved from dll_crt0_1() to here so that
+     malloc() can be called in fixup_after_exec(). */
+  malloc_init ();
+
   if (!child_proc_info)
     {
       setup_cygheap ();
@@ -857,7 +861,7 @@ dll_crt0_1 (void *)
      on a functioning malloc and it's possible that the user's program may
      have overridden malloc.  We only know about that at this stage,
      unfortunately. */
-  malloc_init ();
+  /* malloc_init() has been moved to dll_crt0_0(). */
   user_shared->initialize ();
 
 #ifdef CYGHEAP_DEBUG


Where is the "constructor chain" you mentioned?

-- 
Takashi Yano <takashi.yano@nifty.ne.jp>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: malloc crash
  2021-10-26  8:52                   ` Takashi Yano
@ 2021-10-26  8:59                     ` Mark Geisert
  2021-10-26  9:26                       ` Takashi Yano
  2021-10-26  9:28                       ` Corinna Vinschen
  0 siblings, 2 replies; 27+ messages in thread
From: Mark Geisert @ 2021-10-26  8:59 UTC (permalink / raw)
  To: cygwin-developers

Takashi Yano wrote:
> On Tue, 26 Oct 2021 01:30:13 -0700
> Mark Geisert wrote:
>> Replying to myself to correct something I wrote...
>>
>> Mark Geisert wrote:
>>> Takashi Yano wrote:
>>>> On Mon, 25 Oct 2021 16:36:50 -0700
>>>> Mark Geisert wrote:
>>>>> Ken Brown wrote:
>>>>>> On 10/25/2021 5:29 PM, Mark Geisert wrote:
>>>>>>> Corinna Vinschen wrote:
>>>>>>>> Er... huh?  So both threads are in a malloc function?  This shouldn't
>>>>>>>> have happened, given the clunky muto guarding malloc calls.  This is
>>>>>>>> really strange.  Why's the muto not working here?
>>>>>>>
>>>>>>> Is it possible both threads have executed malloc_init()?
>>>>>>> If so, the second one would reinit the muto.
>>>>>>
>>>>>> Or does the fifo_reader thread call a malloc function before the main thread has
>>>>>> called malloc_init()?  This would presumably cause __malloc_lock() to fail, but
>>>>>> there's no error check.
>>>>>
>>>>> If there's a global constructor involved, that is known to happen.  Constructors
>>>>> are run from dll_crt0_0(), before malloc_init() is called from dll_crt0_1().  See
>>>>> dcrt0.cc for the details.
>>>>
>>>> So how about moving malloc_init() call from dll_crt0_1() to dll_crl0_0()
>>>> so that malloc() can be called in fixup_after_fork/exec()?
>>>
>>> It appears simple, but this is a touchy area of code.  The _0 and _1 are two
>>> separate phases of process startup.  I'd want to hear Corinna's thoughts on this.
>>>
>>> I'd also like to verify somehow that this is the scenario Ken is hitting.
>>>
>>> When I was researching different mallocs for Cygwin I hit the constructor snag
>>> repeatedly.  I did try delaying the constructor-running until after malloc_init().
>>>    More problems.  I did not try moving malloc_init() to before the constructor run.
>>
>> Apologies; this was many months ago.  What I did try was moving the malloc_init()
>> to before running the constructor chain, as Takashi suggested.  That is what gave
>> me more problems.  I don't recall what they were, but I reverted that attempt.
>>
>> The "future malloc" build of Cygwin I'm running doesn't exhibit Ken's issue, as
>> far as I can tell.  It has a specific fix to avoid the scenario I've been talking
>> about here, but I don't want to take us down that path unless we're sure Ken's
>> hitting that same scenario.
> 
> I tried the following patch, and confirmed that the issue has
> been disappeared. I do not notice any other problems so far
> with this patch.
> 
> diff --git a/winsup/cygwin/dcrt0.cc b/winsup/cygwin/dcrt0.cc
> index 6f4723bb0..0d541ec14 100644
> --- a/winsup/cygwin/dcrt0.cc
> +++ b/winsup/cygwin/dcrt0.cc
> @@ -773,6 +773,10 @@ dll_crt0_0 ()
>     do_global_ctors (&__CTOR_LIST__, 1);

       ^^^^^^^^^^^^^^^

>     cygthread::init ();
>   
> +  /* malloc_init() has been moved from dll_crt0_1() to here so that
> +     malloc() can be called in fixup_after_exec(). */
> +  malloc_init ();
> +
>     if (!child_proc_info)
>       {
>         setup_cygheap ();
> @@ -857,7 +861,7 @@ dll_crt0_1 (void *)
>        on a functioning malloc and it's possible that the user's program may
>        have overridden malloc.  We only know about that at this stage,
>        unfortunately. */
> -  malloc_init ();
> +  /* malloc_init() has been moved to dll_crt0_0(). */
>     user_shared->initialize ();
>   
>   #ifdef CYGHEAP_DEBUG
> 
> 
> Where is the "constructor chain" you mentioned?

See above.  Try moving your new lines above the call to do_global_ctors().  Also 
note the comment just above the original location of those lines.. you're now 
ignoring what the comment warns about.

..mark

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: malloc crash
  2021-10-25 22:02         ` Ken Brown
  2021-10-25 23:36           ` Mark Geisert
@ 2021-10-26  9:24           ` Corinna Vinschen
  2021-10-26 14:32             ` Ken Brown
  1 sibling, 1 reply; 27+ messages in thread
From: Corinna Vinschen @ 2021-10-26  9:24 UTC (permalink / raw)
  To: cygwin-developers

On Oct 25 18:02, Ken Brown wrote:
> On 10/25/2021 5:29 PM, Mark Geisert wrote:
> > Corinna Vinschen wrote:
> > > On Oct 25 08:35, Ken Brown wrote:
> > > > On 10/25/2021 4:59 AM, Corinna Vinschen wrote:
> > > > > Has the thread already been started at this point?
> > > > 
> > > > Yes, here's the backtrace of that thread:
> > > > 
> > > > Thread 5 (Thread 9692.0x7c4c):
> > > > #0  0x00000001801934f9 in sys_alloc (m=0x18036f860 <_gm_>, nb=1040) at
> > > > ../../../../temp/winsup/cygwin/malloc.cc:4232
> > > > #1  0x0000000180196b96 in dlmalloc (bytes=1024) at
> > > > ../../../../temp/winsup/cygwin/malloc.cc:4669
> > > > #2  0x00000001801993e1 in dlrealloc (oldmem=0x0, bytes=1024) at
> > > > ../../../../temp/winsup/cygwin/malloc.cc:5187
> > > > #3  0x00000001800e8eed in realloc (p=0x0, size=1024) at
> > > > ../../../../temp/winsup/cygwin/malloc_wrapper.cc:73
> > > 
> > > Er... huh?  So both threads are in a malloc function?  This shouldn't
> > > have happened, given the clunky muto guarding malloc calls.  This is
> > > really strange.  Why's the muto not working here?
> > 
> > Is it possible both threads have executed malloc_init()?
> > If so, the second one would reinit the muto.

Right, but malloc_init is only called from dll_crt0_1, so only the main
thread can actually call malloc_init, other threads never get there.

> Or does the fifo_reader thread call a malloc function before the main thread
> has called malloc_init()?  This would presumably cause __malloc_lock() to
> fail, but there's no error check.

That sounds more likely.  In theory this shouldn't have much influence,
though.  First of all, all fixup calls are running in a single thread,
so there's no serialization required(*), and the malloc_init call
doesn't set up the malloc arena, it just initializes the muto and checks
for user space provided malloc calls, which is not a problem in this
scenario.

(*) unless multiple threads are started during fixup and some of these
    threads mallocate memory again...

Ken, is there a chance to tweak the fifo code to stop creating
threads from inside fixup, and to defer the thread start to the first
call in the process actually relying on the thread being started?


Corinna

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: malloc crash
  2021-10-26  8:59                     ` Mark Geisert
@ 2021-10-26  9:26                       ` Takashi Yano
  2021-10-26  9:31                         ` Corinna Vinschen
  2021-10-26  9:28                       ` Corinna Vinschen
  1 sibling, 1 reply; 27+ messages in thread
From: Takashi Yano @ 2021-10-26  9:26 UTC (permalink / raw)
  To: cygwin-developers

On Tue, 26 Oct 2021 01:59:36 -0700
Mark Geisert wrote:
> Takashi Yano wrote:
> > On Tue, 26 Oct 2021 01:30:13 -0700
> > Mark Geisert wrote:
> >> Apologies; this was many months ago.  What I did try was moving the malloc_init()
> >> to before running the constructor chain, as Takashi suggested.  That is what gave
> >> me more problems.  I don't recall what they were, but I reverted that attempt.
> >>
> >> The "future malloc" build of Cygwin I'm running doesn't exhibit Ken's issue, as
> >> far as I can tell.  It has a specific fix to avoid the scenario I've been talking
> >> about here, but I don't want to take us down that path unless we're sure Ken's
> >> hitting that same scenario.
> > 
> > I tried the following patch, and confirmed that the issue has
> > been disappeared. I do not notice any other problems so far
> > with this patch.
> > 
> > diff --git a/winsup/cygwin/dcrt0.cc b/winsup/cygwin/dcrt0.cc
> > index 6f4723bb0..0d541ec14 100644
> > --- a/winsup/cygwin/dcrt0.cc
> > +++ b/winsup/cygwin/dcrt0.cc
> > @@ -773,6 +773,10 @@ dll_crt0_0 ()
> >     do_global_ctors (&__CTOR_LIST__, 1);
> 
>        ^^^^^^^^^^^^^^^
> 
> >     cygthread::init ();
> >   
> > +  /* malloc_init() has been moved from dll_crt0_1() to here so that
> > +     malloc() can be called in fixup_after_exec(). */
> > +  malloc_init ();
> > +
> >     if (!child_proc_info)
> >       {
> >         setup_cygheap ();
> > @@ -857,7 +861,7 @@ dll_crt0_1 (void *)
> >        on a functioning malloc and it's possible that the user's program may
> >        have overridden malloc.  We only know about that at this stage,
> >        unfortunately. */
> > -  malloc_init ();
> > +  /* malloc_init() has been moved to dll_crt0_0(). */
> >     user_shared->initialize ();
> >   
> >   #ifdef CYGHEAP_DEBUG
> > 
> > 
> > Where is the "constructor chain" you mentioned?
> 
> See above.  Try moving your new lines above the call to do_global_ctors().  Also 
> note the comment just above the original location of those lines.. you're now 
> ignoring what the comment warns about.

I have just tried moving malloc_init() before do_global_ctors(),
however, I do not encountered any problems.

I do not understand what "user's program may have overridden malloc"
means...

-- 
Takashi Yano <takashi.yano@nifty.ne.jp>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: malloc crash
  2021-10-26  0:54               ` Mark Geisert
  2021-10-26  8:30                 ` Mark Geisert
@ 2021-10-26  9:27                 ` Corinna Vinschen
  1 sibling, 0 replies; 27+ messages in thread
From: Corinna Vinschen @ 2021-10-26  9:27 UTC (permalink / raw)
  To: cygwin-developers

On Oct 25 17:54, Mark Geisert wrote:
> Takashi Yano wrote:
> > On Mon, 25 Oct 2021 16:36:50 -0700
> > Mark Geisert wrote:
> > > Ken Brown wrote:
> > > > On 10/25/2021 5:29 PM, Mark Geisert wrote:
> > > > > Corinna Vinschen wrote:
> > > > > > On Oct 25 08:35, Ken Brown wrote:
> > > > > > > On 10/25/2021 4:59 AM, Corinna Vinschen wrote:
> > > > > > > > Has the thread already been started at this point?
> > > > > > > 
> > > > > > > Yes, here's the backtrace of that thread:
> > > > > > > 
> > > > > > > Thread 5 (Thread 9692.0x7c4c):
> > > > > > > #0  0x00000001801934f9 in sys_alloc (m=0x18036f860 <_gm_>, nb=1040) at
> > > > > > > ../../../../temp/winsup/cygwin/malloc.cc:4232
> > > > > > > #1  0x0000000180196b96 in dlmalloc (bytes=1024) at
> > > > > > > ../../../../temp/winsup/cygwin/malloc.cc:4669
> > > > > > > #2  0x00000001801993e1 in dlrealloc (oldmem=0x0, bytes=1024) at
> > > > > > > ../../../../temp/winsup/cygwin/malloc.cc:5187
> > > > > > > #3  0x00000001800e8eed in realloc (p=0x0, size=1024) at
> > > > > > > ../../../../temp/winsup/cygwin/malloc_wrapper.cc:73
> > > > > > 
> > > > > > Er... huh?  So both threads are in a malloc function?  This shouldn't
> > > > > > have happened, given the clunky muto guarding malloc calls.  This is
> > > > > > really strange.  Why's the muto not working here?
> > > > > 
> > > > > Is it possible both threads have executed malloc_init()?
> > > > > If so, the second one would reinit the muto.
> > > > 
> > > > Or does the fifo_reader thread call a malloc function before the main thread has
> > > > called malloc_init()?  This would presumably cause __malloc_lock() to fail, but
> > > > there's no error check.
> > > 
> > > If there's a global constructor involved, that is known to happen.  Constructors
> > > are run from dll_crt0_0(), before malloc_init() is called from dll_crt0_1().  See
> > > dcrt0.cc for the details.
> > 
> > So how about moving malloc_init() call from dll_crt0_1() to dll_crl0_0()
> > so that malloc() can be called in fixup_after_fork/exec()?
> 
> It appears simple, but this is a touchy area of code.  The _0 and _1 are two
> separate phases of process startup.  I'd want to hear Corinna's thoughts on
> this.

This would have to be split into two calls then.  The muto initialization
can be moved to dll_crt0_0, but the check for user space provided malloc
can only occur after the processes' CRT code has run, so this can only
be done from dll_crt0_1.

> I'd also like to verify somehow that this is the scenario Ken is hitting.

Yeah, that's not a safe bet yet.


Corinna

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: malloc crash
  2021-10-26  8:59                     ` Mark Geisert
  2021-10-26  9:26                       ` Takashi Yano
@ 2021-10-26  9:28                       ` Corinna Vinschen
  1 sibling, 0 replies; 27+ messages in thread
From: Corinna Vinschen @ 2021-10-26  9:28 UTC (permalink / raw)
  To: cygwin-developers

On Oct 26 01:59, Mark Geisert wrote:
> Takashi Yano wrote:
> > I tried the following patch, and confirmed that the issue has
> > been disappeared. I do not notice any other problems so far
> > with this patch.
> > 
> > diff --git a/winsup/cygwin/dcrt0.cc b/winsup/cygwin/dcrt0.cc
> > index 6f4723bb0..0d541ec14 100644
> > --- a/winsup/cygwin/dcrt0.cc
> > +++ b/winsup/cygwin/dcrt0.cc
> > @@ -773,6 +773,10 @@ dll_crt0_0 ()
> >     do_global_ctors (&__CTOR_LIST__, 1);
> 
>       ^^^^^^^^^^^^^^^
> 
> >     cygthread::init ();
> > +  /* malloc_init() has been moved from dll_crt0_1() to here so that
> > +     malloc() can be called in fixup_after_exec(). */
> > +  malloc_init ();
> > +
> >     if (!child_proc_info)
> >       {
> >         setup_cygheap ();
> > @@ -857,7 +861,7 @@ dll_crt0_1 (void *)
> >        on a functioning malloc and it's possible that the user's program may
> >        have overridden malloc.  We only know about that at this stage,
> >        unfortunately. */
> > -  malloc_init ();
> > +  /* malloc_init() has been moved to dll_crt0_0(). */
> >     user_shared->initialize ();
> >   #ifdef CYGHEAP_DEBUG
> > 
> > 
> > Where is the "constructor chain" you mentioned?
> 
> See above.  Try moving your new lines above the call to do_global_ctors().
> Also note the comment just above the original location of those lines..
> you're now ignoring what the comment warns about.

So we need malloc_init_0 and malloc_init_1, right? :)


Corinna

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: malloc crash
  2021-10-26  9:26                       ` Takashi Yano
@ 2021-10-26  9:31                         ` Corinna Vinschen
  0 siblings, 0 replies; 27+ messages in thread
From: Corinna Vinschen @ 2021-10-26  9:31 UTC (permalink / raw)
  To: cygwin-developers

On Oct 26 18:26, Takashi Yano wrote:
> On Tue, 26 Oct 2021 01:59:36 -0700
> Mark Geisert wrote:
> > See above.  Try moving your new lines above the call to do_global_ctors().  Also 
> > note the comment just above the original location of those lines.. you're now 
> > ignoring what the comment warns about.
> 
> I have just tried moving malloc_init() before do_global_ctors(),
> however, I do not encountered any problems.
> 
> I do not understand what "user's program may have overridden malloc"
> means...

Check out the malloc_init code.  You have to assume that an application
comes with it's own malloc implementation.  One of the scenarios
discussed at the time was an app including dmalloc, a malloc lib for
debugging memory allocation.


Corinna

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: malloc crash
  2021-10-26  9:24           ` Corinna Vinschen
@ 2021-10-26 14:32             ` Ken Brown
  2021-10-26 16:03               ` Corinna Vinschen
  0 siblings, 1 reply; 27+ messages in thread
From: Ken Brown @ 2021-10-26 14:32 UTC (permalink / raw)
  To: cygwin-developers

On 10/26/2021 5:24 AM, Corinna Vinschen wrote:
> On Oct 25 18:02, Ken Brown wrote:
>> Or does the fifo_reader thread call a malloc function before the main thread
>> has called malloc_init()?  This would presumably cause __malloc_lock() to
>> fail, but there's no error check.
> 
> That sounds more likely.  In theory this shouldn't have much influence,
> though.  First of all, all fixup calls are running in a single thread,
> so there's no serialization required(*), and the malloc_init call
> doesn't set up the malloc arena, it just initializes the muto and checks
> for user space provided malloc calls, which is not a problem in this
> scenario.
> 
> (*) unless multiple threads are started during fixup and some of these
>      threads mallocate memory again...
> 
> Ken, is there a chance to tweak the fifo code to stop creating
> threads from inside fixup, and to defer the thread start to the first
> call in the process actually relying on the thread being started?

I can't think of any way to do that.  The thread is listening for various events 
that cause it to take action, so it has to always be running.  But I can 
probably tweak the code so that the thread doesn't have to call malloc early on.

It might take a while to get this right, and the bug has existed ever since I 
overhauled the fifo code.  So I don't think you have to hold up releasing 3.3.0 
while I work on this.

Ken

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: malloc crash
  2021-10-26 14:32             ` Ken Brown
@ 2021-10-26 16:03               ` Corinna Vinschen
  2021-10-26 16:36                 ` Ken Brown
  2021-10-26 16:44                 ` Takashi Yano
  0 siblings, 2 replies; 27+ messages in thread
From: Corinna Vinschen @ 2021-10-26 16:03 UTC (permalink / raw)
  To: cygwin-developers

On Oct 26 10:32, Ken Brown wrote:
> On 10/26/2021 5:24 AM, Corinna Vinschen wrote:
> > On Oct 25 18:02, Ken Brown wrote:
> > > Or does the fifo_reader thread call a malloc function before the main thread
> > > has called malloc_init()?  This would presumably cause __malloc_lock() to
> > > fail, but there's no error check.
> > 
> > That sounds more likely.  In theory this shouldn't have much influence,
> > though.  First of all, all fixup calls are running in a single thread,
> > so there's no serialization required(*), and the malloc_init call
> > doesn't set up the malloc arena, it just initializes the muto and checks
> > for user space provided malloc calls, which is not a problem in this
> > scenario.
> > 
> > (*) unless multiple threads are started during fixup and some of these
> >      threads mallocate memory again...
> > 
> > Ken, is there a chance to tweak the fifo code to stop creating
> > threads from inside fixup, and to defer the thread start to the first
> > call in the process actually relying on the thread being started?
> 
> I can't think of any way to do that.  The thread is listening for various
> events that cause it to take action, so it has to always be running.  But I
> can probably tweak the code so that the thread doesn't have to call malloc
> early on.
> 
> It might take a while to get this right, and the bug has existed ever since
> I overhauled the fifo code.  So I don't think you have to hold up releasing
> 3.3.0 while I work on this.

Try the below patch instead, per Takashi's testing and subsequent discussion.

From 9e53881e81bc6d2d072a0d625a9eac8ffc7cc698 Mon Sep 17 00:00:00 2001
From: Corinna Vinschen <corinna@vinschen.de>
Date: Tue, 26 Oct 2021 17:53:08 +0200
Subject: [PATCH] Cygwin: split malloc_init

Per https://cygwin.com/pipermail/cygwin-developers/2021-October/012429.html,
we may encounter a crash when starting multiple threads during process
startup (here: fhandler_fifo::fixup_after_{fork,exec}) which in turn
allocate memory via malloc.

The problem is concurrent usage of malloc before the malloc muto has
been initialized.

To fix this issue, split malloc_init into malloc_init_0, called from
dll_crt0_0, and malloc_init_1, called from dll_crt_0_1.  malloc_init_0
just initializes the muto, malloc_init_1 checks for user space provided
malloc.

Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
---
 winsup/cygwin/dcrt0.cc          | 4 +++-
 winsup/cygwin/heap.cc           | 1 -
 winsup/cygwin/heap.h            | 3 ++-
 winsup/cygwin/malloc_wrapper.cc | 6 +++++-
 4 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/winsup/cygwin/dcrt0.cc b/winsup/cygwin/dcrt0.cc
index 6f4723bb059d..7c460274fb86 100644
--- a/winsup/cygwin/dcrt0.cc
+++ b/winsup/cygwin/dcrt0.cc
@@ -769,6 +769,8 @@ dll_crt0_0 ()
   NtOpenProcessToken (NtCurrentProcess (), MAXIMUM_ALLOWED, &hProcToken);
   set_cygwin_privileges (hProcToken);
 
+  malloc_init_0 ();
+
   device::init ();
   do_global_ctors (&__CTOR_LIST__, 1);
   cygthread::init ();
@@ -857,7 +859,7 @@ dll_crt0_1 (void *)
      on a functioning malloc and it's possible that the user's program may
      have overridden malloc.  We only know about that at this stage,
      unfortunately. */
-  malloc_init ();
+  malloc_init_1 ();
   user_shared->initialize ();
 
 #ifdef CYGHEAP_DEBUG
diff --git a/winsup/cygwin/heap.cc b/winsup/cygwin/heap.cc
index b839c8cd48ee..f27f81bc4b59 100644
--- a/winsup/cygwin/heap.cc
+++ b/winsup/cygwin/heap.cc
@@ -230,7 +230,6 @@ user_heap_info::init ()
   debug_printf ("heap base %p, heap top %p, heap size %ly (%lu)",
 		base, top, chunk, chunk);
   page_const--;
-  // malloc_init ();
 }
 
 #define pround(n) (((size_t)(n) + page_const) & ~page_const)
diff --git a/winsup/cygwin/heap.h b/winsup/cygwin/heap.h
index 565758e4872c..42099051f619 100644
--- a/winsup/cygwin/heap.h
+++ b/winsup/cygwin/heap.h
@@ -10,7 +10,8 @@ details. */
 
 /* Heap management. */
 void heap_init ();
-void malloc_init ();
+void malloc_init_0 ();
+void malloc_init_1 ();
 
 #define inheap(s) \
   (cygheap->user_heap.ptr && s \
diff --git a/winsup/cygwin/malloc_wrapper.cc b/winsup/cygwin/malloc_wrapper.cc
index 3b245800abec..85c411a3e258 100644
--- a/winsup/cygwin/malloc_wrapper.cc
+++ b/winsup/cygwin/malloc_wrapper.cc
@@ -272,10 +272,14 @@ strdup (const char *s)
 muto NO_COPY mallock;
 
 void
-malloc_init ()
+malloc_init_0 ()
 {
   mallock.init ("mallock");
+}
 
+void
+malloc_init_1 ()
+{
   /* Check if malloc is provided by application. If so, redirect all
      calls to malloc/free/realloc to application provided. This may
      happen if some other dll calls cygwin's malloc, but main code provides
-- 
2.31.1



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: malloc crash
  2021-10-26 16:03               ` Corinna Vinschen
@ 2021-10-26 16:36                 ` Ken Brown
  2021-10-26 16:49                   ` Corinna Vinschen
  2021-10-26 16:44                 ` Takashi Yano
  1 sibling, 1 reply; 27+ messages in thread
From: Ken Brown @ 2021-10-26 16:36 UTC (permalink / raw)
  To: cygwin-developers

On 10/26/2021 12:03 PM, Corinna Vinschen wrote:
> On Oct 26 10:32, Ken Brown wrote:
>> It might take a while to get this right, and the bug has existed ever since
>> I overhauled the fifo code.  So I don't think you have to hold up releasing
>> 3.3.0 while I work on this.
> 
> Try the below patch instead, per Takashi's testing and subsequent discussion.
> [...]
I agree with Takashi that this fixes the problem.  Thanks to both of you!  This 
saved me a lot of testing and the risk of introducing new bugs.

Ken

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: malloc crash
  2021-10-26 16:03               ` Corinna Vinschen
  2021-10-26 16:36                 ` Ken Brown
@ 2021-10-26 16:44                 ` Takashi Yano
  1 sibling, 0 replies; 27+ messages in thread
From: Takashi Yano @ 2021-10-26 16:44 UTC (permalink / raw)
  To: cygwin-developers

On Tue, 26 Oct 2021 18:03:28 +0200
Corinna Vinschen wrote:
> On Oct 26 10:32, Ken Brown wrote:
> > On 10/26/2021 5:24 AM, Corinna Vinschen wrote:
> > > Ken, is there a chance to tweak the fifo code to stop creating
> > > threads from inside fixup, and to defer the thread start to the first
> > > call in the process actually relying on the thread being started?
> > 
> > I can't think of any way to do that.  The thread is listening for various
> > events that cause it to take action, so it has to always be running.  But I
> > can probably tweak the code so that the thread doesn't have to call malloc
> > early on.
> > 
> > It might take a while to get this right, and the bug has existed ever since
> > I overhauled the fifo code.  So I don't think you have to hold up releasing
> > 3.3.0 while I work on this.
> 
> Try the below patch instead, per Takashi's testing and subsequent discussion.

Your patch works fine for me.

-- 
Takashi Yano <takashi.yano@nifty.ne.jp>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: malloc crash
  2021-10-26 16:36                 ` Ken Brown
@ 2021-10-26 16:49                   ` Corinna Vinschen
  2021-10-26 17:10                     ` Ken Brown
  2021-10-27  0:44                     ` Takashi Yano
  0 siblings, 2 replies; 27+ messages in thread
From: Corinna Vinschen @ 2021-10-26 16:49 UTC (permalink / raw)
  To: cygwin-developers

On Oct 26 12:36, Ken Brown wrote:
> On 10/26/2021 12:03 PM, Corinna Vinschen wrote:
> > On Oct 26 10:32, Ken Brown wrote:
> > > It might take a while to get this right, and the bug has existed ever since
> > > I overhauled the fifo code.  So I don't think you have to hold up releasing
> > > 3.3.0 while I work on this.
> > 
> > Try the below patch instead, per Takashi's testing and subsequent discussion.
> > [...]
> I agree with Takashi that this fixes the problem.  Thanks to both of you!
> This saved me a lot of testing and the risk of introducing new bugs.

Below's another incarnation of this patch.  I wasn't quite happy with
the first one.  Statically initialized SRWLOCKs seem to be the natural
way to solve the problem...

Ken, Takashi, can you please test it both, just to get as much testing
as possible?


Thanks,
Corinna

From 44a79a6eca3d322020dda0919023d78dda129d4d Mon Sep 17 00:00:00 2001
From: Corinna Vinschen <corinna@vinschen.de>
Date: Tue, 26 Oct 2021 17:53:08 +0200
Subject: [PATCH] Cygwin: convert malloc lock to SRWLOCK

Per https://cygwin.com/pipermail/cygwin-developers/2021-October/012429.html,
we may encounter a crash when starting multiple threads during process
startup (here: fhandler_fifo::fixup_after_{fork,exec}) which in turn
allocate memory via malloc.

The problem is concurrent usage of malloc before the malloc muto has
been initialized.

To fix this issue, convert the muto to a SRWLOCK and make sure it is
statically initalized.  Thus, malloc can be called as early as necessary
and malloc_init is only required to check for user space provided malloc.

Note that this requires to implement a __malloc_trylock macro to be
called from fork.

Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
---
 winsup/cygwin/cygmalloc.h       | 8 +++++---
 winsup/cygwin/dcrt0.cc          | 1 +
 winsup/cygwin/fork.cc           | 2 +-
 winsup/cygwin/heap.cc           | 1 -
 winsup/cygwin/heap.h            | 1 -
 winsup/cygwin/malloc_wrapper.cc | 4 +---
 winsup/cygwin/release/3.3.0     | 4 ++++
 7 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/winsup/cygwin/cygmalloc.h b/winsup/cygwin/cygmalloc.h
index 84bad824c5ae..36f56294fb35 100644
--- a/winsup/cygwin/cygmalloc.h
+++ b/winsup/cygwin/cygmalloc.h
@@ -36,9 +36,11 @@ void *mmap64 (void *, size_t, int, int, int, off_t);
 
 #elif defined (__INSIDE_CYGWIN__)
 
-# define __malloc_lock() mallock.acquire ()
-# define __malloc_unlock() mallock.release ()
-extern muto mallock;
+# define __malloc_lock() AcquireSRWLockExclusive (&mallock)
+# define __malloc_trylock() TryAcquireSRWLockExclusive (&mallock)
+# define __malloc_unlock() ReleaseSRWLockExclusive (&mallock)
+extern SRWLOCK NO_COPY mallock;
+void malloc_init ();
 
 #endif
 
diff --git a/winsup/cygwin/dcrt0.cc b/winsup/cygwin/dcrt0.cc
index 6f4723bb059d..f3d09c1695f8 100644
--- a/winsup/cygwin/dcrt0.cc
+++ b/winsup/cygwin/dcrt0.cc
@@ -29,6 +29,7 @@ details. */
 #include "shared_info.h"
 #include "cygwin_version.h"
 #include "dll_init.h"
+#include "cygmalloc.h"
 #include "heap.h"
 #include "tls_pbuf.h"
 #include "exception.h"
diff --git a/winsup/cygwin/fork.cc b/winsup/cygwin/fork.cc
index 719217856ae3..912f1727441e 100644
--- a/winsup/cygwin/fork.cc
+++ b/winsup/cygwin/fork.cc
@@ -296,7 +296,7 @@ frok::parent (volatile char * volatile stack_here)
   si.lpReserved2 = (LPBYTE) &ch;
   si.cbReserved2 = sizeof (ch);
 
-  bool locked = __malloc_lock ();
+  bool locked = __malloc_trylock ();
 
   /* Remove impersonation */
   cygheap->user.deimpersonate ();
diff --git a/winsup/cygwin/heap.cc b/winsup/cygwin/heap.cc
index b839c8cd48ee..f27f81bc4b59 100644
--- a/winsup/cygwin/heap.cc
+++ b/winsup/cygwin/heap.cc
@@ -230,7 +230,6 @@ user_heap_info::init ()
   debug_printf ("heap base %p, heap top %p, heap size %ly (%lu)",
 		base, top, chunk, chunk);
   page_const--;
-  // malloc_init ();
 }
 
 #define pround(n) (((size_t)(n) + page_const) & ~page_const)
diff --git a/winsup/cygwin/heap.h b/winsup/cygwin/heap.h
index 565758e4872c..25200f286f8a 100644
--- a/winsup/cygwin/heap.h
+++ b/winsup/cygwin/heap.h
@@ -10,7 +10,6 @@ details. */
 
 /* Heap management. */
 void heap_init ();
-void malloc_init ();
 
 #define inheap(s) \
   (cygheap->user_heap.ptr && s \
diff --git a/winsup/cygwin/malloc_wrapper.cc b/winsup/cygwin/malloc_wrapper.cc
index 3b245800abec..2842e5398501 100644
--- a/winsup/cygwin/malloc_wrapper.cc
+++ b/winsup/cygwin/malloc_wrapper.cc
@@ -269,13 +269,11 @@ strdup (const char *s)
    newlib will call __malloc_lock and __malloc_unlock at appropriate
    times.  */
 
-muto NO_COPY mallock;
+SRWLOCK NO_COPY mallock = SRWLOCK_INIT;
 
 void
 malloc_init ()
 {
-  mallock.init ("mallock");
-
   /* Check if malloc is provided by application. If so, redirect all
      calls to malloc/free/realloc to application provided. This may
      happen if some other dll calls cygwin's malloc, but main code provides
diff --git a/winsup/cygwin/release/3.3.0 b/winsup/cygwin/release/3.3.0
index 895c273974cd..1595b17539ce 100644
--- a/winsup/cygwin/release/3.3.0
+++ b/winsup/cygwin/release/3.3.0
@@ -78,3 +78,7 @@ Bug Fixes
 - Fix access violation that can sometimes occur when copy/pasting between
   32-bit and 64-bit Cygwin environments.  Align clipboard descriptor layouts.
   Addresses: https://cygwin.com/pipermail/cygwin-patches/2021q4/011517.html
+
+- Fix a synchronization issue when running multiple threads from DLL
+  initialization which in turn call malloc.
+  Addresses: https://cygwin.com/pipermail/cygwin/2021-October/249635.html
-- 
2.31.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: malloc crash
  2021-10-26 16:49                   ` Corinna Vinschen
@ 2021-10-26 17:10                     ` Ken Brown
  2021-10-27  0:44                     ` Takashi Yano
  1 sibling, 0 replies; 27+ messages in thread
From: Ken Brown @ 2021-10-26 17:10 UTC (permalink / raw)
  To: cygwin-developers

On 10/26/2021 12:49 PM, Corinna Vinschen wrote:
> Below's another incarnation of this patch.  I wasn't quite happy with
> the first one.  Statically initialized SRWLOCKs seem to be the natural
> way to solve the problem...
> 
> Ken, Takashi, can you please test it both, just to get as much testing
> as possible?

Works for me.

Ken

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: malloc crash
  2021-10-26 16:49                   ` Corinna Vinschen
  2021-10-26 17:10                     ` Ken Brown
@ 2021-10-27  0:44                     ` Takashi Yano
  2021-10-27  9:01                       ` Corinna Vinschen
  1 sibling, 1 reply; 27+ messages in thread
From: Takashi Yano @ 2021-10-27  0:44 UTC (permalink / raw)
  To: cygwin-developers

On Tue, 26 Oct 2021 18:49:33 +0200
Corinna Vinschen <corinna-cygwin@cygwin.com> wrote:
> On Oct 26 12:36, Ken Brown wrote:
> > On 10/26/2021 12:03 PM, Corinna Vinschen wrote:
> > > On Oct 26 10:32, Ken Brown wrote:
> > > > It might take a while to get this right, and the bug has existed ever since
> > > > I overhauled the fifo code.  So I don't think you have to hold up releasing
> > > > 3.3.0 while I work on this.
> > > 
> > > Try the below patch instead, per Takashi's testing and subsequent discussion.
> > > [...]
> > I agree with Takashi that this fixes the problem.  Thanks to both of you!
> > This saved me a lot of testing and the risk of introducing new bugs.
> 
> Below's another incarnation of this patch.  I wasn't quite happy with
> the first one.  Statically initialized SRWLOCKs seem to be the natural
> way to solve the problem...
> 
> Ken, Takashi, can you please test it both, just to get as much testing
> as possible?

The new patch also works fine for me.

-- 
Takashi Yano <takashi.yano@nifty.ne.jp>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: malloc crash
  2021-10-27  0:44                     ` Takashi Yano
@ 2021-10-27  9:01                       ` Corinna Vinschen
  0 siblings, 0 replies; 27+ messages in thread
From: Corinna Vinschen @ 2021-10-27  9:01 UTC (permalink / raw)
  To: cygwin-developers

On Oct 27 09:44, Takashi Yano wrote:
> On Tue, 26 Oct 2021 18:49:33 +0200
> Corinna Vinschen <corinna-cygwin@cygwin.com> wrote:
> > On Oct 26 12:36, Ken Brown wrote:
> > > On 10/26/2021 12:03 PM, Corinna Vinschen wrote:
> > > > On Oct 26 10:32, Ken Brown wrote:
> > > > > It might take a while to get this right, and the bug has existed ever since
> > > > > I overhauled the fifo code.  So I don't think you have to hold up releasing
> > > > > 3.3.0 while I work on this.
> > > > 
> > > > Try the below patch instead, per Takashi's testing and subsequent discussion.
> > > > [...]
> > > I agree with Takashi that this fixes the problem.  Thanks to both of you!
> > > This saved me a lot of testing and the risk of introducing new bugs.
> > 
> > Below's another incarnation of this patch.  I wasn't quite happy with
> > the first one.  Statically initialized SRWLOCKs seem to be the natural
> > way to solve the problem...
> > 
> > Ken, Takashi, can you please test it both, just to get as much testing
> > as possible?
> 
> The new patch also works fine for me.

Thanks guys, I pushed it.


Corinna

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2021-10-27  9:01 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-24 21:46 malloc crash Ken Brown
2021-10-25  8:56 ` Takashi Yano
2021-10-25 13:37   ` Ken Brown
2021-10-25  8:59 ` Corinna Vinschen
2021-10-25 12:35   ` Ken Brown
2021-10-25 15:39     ` Corinna Vinschen
2021-10-25 21:29       ` Mark Geisert
2021-10-25 22:02         ` Ken Brown
2021-10-25 23:36           ` Mark Geisert
2021-10-26  0:18             ` Takashi Yano
2021-10-26  0:54               ` Mark Geisert
2021-10-26  8:30                 ` Mark Geisert
2021-10-26  8:52                   ` Takashi Yano
2021-10-26  8:59                     ` Mark Geisert
2021-10-26  9:26                       ` Takashi Yano
2021-10-26  9:31                         ` Corinna Vinschen
2021-10-26  9:28                       ` Corinna Vinschen
2021-10-26  9:27                 ` Corinna Vinschen
2021-10-26  9:24           ` Corinna Vinschen
2021-10-26 14:32             ` Ken Brown
2021-10-26 16:03               ` Corinna Vinschen
2021-10-26 16:36                 ` Ken Brown
2021-10-26 16:49                   ` Corinna Vinschen
2021-10-26 17:10                     ` Ken Brown
2021-10-27  0:44                     ` Takashi Yano
2021-10-27  9:01                       ` Corinna Vinschen
2021-10-26 16:44                 ` Takashi Yano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).