Subject: Re: malloc crash
From: Mark Geisert
To: cygwin-developers@cygwin.com
Date: Tue, 26 Oct 2021 01:30:13 -0700

Replying to myself to correct
something I wrote...

Mark Geisert wrote:
> Takashi Yano wrote:
>> On Mon, 25 Oct 2021 16:36:50 -0700
>> Mark Geisert wrote:
>>> Ken Brown wrote:
>>>> On 10/25/2021 5:29 PM, Mark Geisert wrote:
>>>>> Corinna Vinschen wrote:
>>>>>> On Oct 25 08:35, Ken Brown wrote:
>>>>>>> On 10/25/2021 4:59 AM, Corinna Vinschen wrote:
>>>>>>>> Has the thread already been started at this point?
>>>>>>>
>>>>>>> Yes, here's the backtrace of that thread:
>>>>>>>
>>>>>>> Thread 5 (Thread 9692.0x7c4c):
>>>>>>> #0  0x00000001801934f9 in sys_alloc (m=0x18036f860 <_gm_>, nb=1040) at
>>>>>>> ../../../../temp/winsup/cygwin/malloc.cc:4232
>>>>>>> #1  0x0000000180196b96 in dlmalloc (bytes=1024) at
>>>>>>> ../../../../temp/winsup/cygwin/malloc.cc:4669
>>>>>>> #2  0x00000001801993e1 in dlrealloc (oldmem=0x0, bytes=1024) at
>>>>>>> ../../../../temp/winsup/cygwin/malloc.cc:5187
>>>>>>> #3  0x00000001800e8eed in realloc (p=0x0, size=1024) at
>>>>>>> ../../../../temp/winsup/cygwin/malloc_wrapper.cc:73
>>>>>>
>>>>>> Er... huh?  So both threads are in a malloc function?  This shouldn't
>>>>>> have happened, given the clunky muto guarding malloc calls.  This is
>>>>>> really strange.  Why's the muto not working here?
>>>>>
>>>>> Is it possible both threads have executed malloc_init()?
>>>>> If so, the second one would reinit the muto.
>>>>
>>>> Or does the fifo_reader thread call a malloc function before the main thread has
>>>> called malloc_init()?  This would presumably cause __malloc_lock() to fail, but
>>>> there's no error check.
>>>
>>> If there's a global constructor involved, that is known to happen.  Constructors
>>> are run from dll_crt0_0(), before malloc_init() is called from dll_crt0_1().  See
>>> dcrt0.cc for the details.
>>
>> So how about moving malloc_init() call from dll_crt0_1() to dll_crt0_0()
>> so that malloc() can be called in fixup_after_fork/exec()?
>
> It appears simple, but this is a touchy area of code.  The _0 and _1 are two
> separate phases of process startup.  I'd want to hear Corinna's thoughts on this.
>
> I'd also like to verify somehow that this is the scenario Ken is hitting.
>
> When I was researching different mallocs for Cygwin I hit the constructor snag
> repeatedly.  I did try delaying the constructor-running until after malloc_init().
>  More problems.  I did not try moving malloc_init() to before the constructor run.

Apologies; this was many months ago.  What I actually tried was moving
malloc_init() to before running the constructor chain, as Takashi suggested.
That is what gave me more problems.  I don't recall what they were, but I
reverted that attempt.

The "future malloc" build of Cygwin I'm running doesn't exhibit Ken's issue, as
far as I can tell.  It has a specific fix to avoid the scenario I've been
talking about here, but I don't want to take us down that path unless we're
sure Ken's hitting that same scenario.

..mark