Subject: Re: malloc crash
To: cygwin-developers@cygwin.com
From: Mark Geisert
Date: Tue, 26 Oct 2021 01:59:36 -0700
In-Reply-To: <20211026175229.1eda36caab1b03314a8cf165@nifty.ne.jp>
List-Id: Cygwin core component developers mailing list

Takashi Yano wrote:
> On Tue, 26 Oct 2021 01:30:13 -0700
> Mark Geisert wrote:
>> Replying to myself to correct something I wrote...
>>
>> Mark Geisert wrote:
>>> Takashi Yano wrote:
>>>> On Mon, 25 Oct 2021 16:36:50 -0700
>>>> Mark Geisert wrote:
>>>>> Ken Brown wrote:
>>>>>> On 10/25/2021 5:29 PM, Mark Geisert wrote:
>>>>>>> Corinna Vinschen wrote:
>>>>>>>> Er... huh?  So both threads are in a malloc function?  This shouldn't
>>>>>>>> have happened, given the clunky muto guarding malloc calls.  This is
>>>>>>>> really strange.  Why's the muto not working here?
>>>>>>>
>>>>>>> Is it possible both threads have executed malloc_init()?
>>>>>>> If so, the second one would reinit the muto.
>>>>>>
>>>>>> Or does the fifo_reader thread call a malloc function before the main thread has
>>>>>> called malloc_init()?  This would presumably cause __malloc_lock() to fail, but
>>>>>> there's no error check.
>>>>>
>>>>> If there's a global constructor involved, that is known to happen.  Constructors
>>>>> are run from dll_crt0_0(), before malloc_init() is called from dll_crt0_1().  See
>>>>> dcrt0.cc for the details.
>>>>
>>>> So how about moving malloc_init() call from dll_crt0_1() to dll_crt0_0()
>>>> so that malloc() can be called in fixup_after_fork/exec()?
>>>
>>> It appears simple, but this is a touchy area of code.  The _0 and _1 are two
>>> separate phases of process startup.  I'd want to hear Corinna's thoughts on this.
>>>
>>> I'd also like to verify somehow that this is the scenario Ken is hitting.
>>>
>>> When I was researching different mallocs for Cygwin I hit the constructor snag
>>> repeatedly.  I did try delaying the constructor-running until after malloc_init().
>>> More problems.  I did not try moving malloc_init() to before the constructor run.
>>
>> Apologies; this was many months ago.  What I did try was moving the malloc_init()
>> to before running the constructor chain, as Takashi suggested.  That is what gave
>> me more problems.  I don't recall what they were, but I reverted that attempt.
>>
>> The "future malloc" build of Cygwin I'm running doesn't exhibit Ken's issue, as
>> far as I can tell.  It has a specific fix to avoid the scenario I've been talking
>> about here, but I don't want to take us down that path unless we're sure Ken's
>> hitting that same scenario.
>
> I tried the following patch, and confirmed that the issue has
> disappeared. I do not notice any other problems so far
> with this patch.
>
> diff --git a/winsup/cygwin/dcrt0.cc b/winsup/cygwin/dcrt0.cc
> index 6f4723bb0..0d541ec14 100644
> --- a/winsup/cygwin/dcrt0.cc
> +++ b/winsup/cygwin/dcrt0.cc
> @@ -773,6 +773,10 @@ dll_crt0_0 ()
>    do_global_ctors (&__CTOR_LIST__, 1);
     ^^^^^^^^^^^^^^^
>    cygthread::init ();
>
> +  /* malloc_init() has been moved from dll_crt0_1() to here so that
> +     malloc() can be called in fixup_after_exec().  */
> +  malloc_init ();
> +
>    if (!child_proc_info)
>      {
>        setup_cygheap ();
> @@ -857,7 +861,7 @@ dll_crt0_1 (void *)
>       on a functioning malloc and it's possible that the user's program may
>       have overridden malloc.  We only know about that at this stage,
>       unfortunately. */
> -  malloc_init ();
> +  /* malloc_init() has been moved to dll_crt0_0(). */
>    user_shared->initialize ();
>
> #ifdef CYGHEAP_DEBUG
>
>
> Where is the "constructor chain" you mentioned?

See the line I marked above: the call to do_global_ctors() runs the constructor
chain.  Try moving your new lines above that call.  Also note the comment just
above the original location of malloc_init() in dll_crt0_1(); your patch now
ignores what that comment warns about.

..mark
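
The ordering hazard discussed in this thread can be modelled outside Cygwin. What follows is only a rough sketch under stated assumptions: every name in it (AllocState, alloc_init, guarded_alloc, EarlyUser, run_demo) is hypothetical, it is not Cygwin's code, and it is not the "specific fix" in the "future malloc" build, which the thread does not describe. It shows a global constructor reaching the allocator before initialization, that early call being detected instead of going unchecked, and a repeated init call leaving the lock alone.

```cpp
// Hypothetical model only; none of these names exist in Cygwin.
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <mutex>

namespace sketch {

struct AllocState {
  std::mutex lock;                  // stand-in for the muto guarding malloc calls
  std::atomic<bool> ready{false};   // set once initialization has run
  std::atomic<int> init_runs{0};    // how many times initialization really ran
  std::atomic<int> early_calls{0};  // allocations attempted before initialization
};

// Function-local static: constructed on first use, so it is valid even when
// first reached from a global constructor.
AllocState &state() {
  static AllocState s;
  return s;
}

// Stand-in for malloc_init(). A second caller finds `ready` already set and
// does not touch the lock again.
void alloc_init() {
  AllocState &s = state();
  std::lock_guard<std::mutex> guard(s.lock);
  if (!s.ready.load()) {
    s.init_runs.fetch_add(1);
    s.ready.store(true);
  }
}

// Stand-in for a guarded malloc. The explicit check is the "error check"
// whose absence is noted in the thread.
void *guarded_alloc(std::size_t n) {
  AllocState &s = state();
  if (!s.ready.load()) {
    s.early_calls.fetch_add(1);
    return nullptr;  // report the early call instead of proceeding unguarded
  }
  std::lock_guard<std::mutex> guard(s.lock);
  return std::malloc(n);
}

// A global object whose constructor allocates. It runs during static
// initialization, before anything has called alloc_init().
struct EarlyUser {
  void *p;
  EarlyUser() : p(guarded_alloc(16)) {}
};
EarlyUser early_user;

// Returns true when the early call was caught, initialization ran exactly
// once despite two calls, and a later allocation succeeded.
bool run_demo() {
  AllocState &s = state();
  bool early_detected = early_user.p == nullptr && s.early_calls.load() == 1;
  alloc_init();
  alloc_init();  // repeated call must be a no-op
  bool single_init = s.init_runs.load() == 1;
  void *q = guarded_alloc(16);
  bool late_ok = q != nullptr;
  std::free(q);
  return early_detected && single_init && late_ok;
}

}  // namespace sketch
```

Calling sketch::run_demo() from main() exercises all three cases. The design choice worth noting is that the state lives in a function-local static, so it does not depend on the order in which global constructors run.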