Date: Tue, 26 Oct 2021 17:52:29 +0900
From: Takashi Yano
To: cygwin-developers@cygwin.com
Subject: Re: malloc crash
Message-Id: <20211026175229.1eda36caab1b03314a8cf165@nifty.ne.jp>

On Tue, 26 Oct 2021 01:30:13 -0700
Mark Geisert wrote:
> Replying to myself to correct something I wrote...
>
> Mark Geisert wrote:
> > Takashi Yano wrote:
> >> On Mon, 25 Oct 2021 16:36:50 -0700
> >> Mark Geisert wrote:
> >>> Ken Brown wrote:
> >>>> On 10/25/2021 5:29 PM, Mark Geisert wrote:
> >>>>> Corinna Vinschen wrote:
> >>>>>> Er... huh?  So both threads are in a malloc function?  This shouldn't
> >>>>>> have happened, given the clunky muto guarding malloc calls.  This is
> >>>>>> really strange.  Why's the muto not working here?
> >>>>>
> >>>>> Is it possible both threads have executed malloc_init()?
> >>>>> If so, the second one would reinit the muto.
> >>>>
> >>>> Or does the fifo_reader thread call a malloc function before the main thread has
> >>>> called malloc_init()?  This would presumably cause __malloc_lock() to fail, but
> >>>> there's no error check.
> >>>
> >>> If there's a global constructor involved, that is known to happen.  Constructors
> >>> are run from dll_crt0_0(), before malloc_init() is called from dll_crt0_1().  See
> >>> dcrt0.cc for the details.
> >>
> >> So how about moving malloc_init() call from dll_crt0_1() to dll_crt0_0()
> >> so that malloc() can be called in fixup_after_fork/exec()?
> >
> > It appears simple, but this is a touchy area of code.  The _0 and _1 are two
> > separate phases of process startup.  I'd want to hear Corinna's thoughts on this.
> >
> > I'd also like to verify somehow that this is the scenario Ken is hitting.
> >
> > When I was researching different mallocs for Cygwin I hit the constructor snag
> > repeatedly.  I did try delaying the constructor-running until after malloc_init().
> >  More problems.  I did not try moving malloc_init() to before the constructor run.
>
> Apologies; this was many months ago. What I did try was moving the malloc_init()
> to before running the constructor chain, as Takashi suggested. That is what gave
> me more problems. I don't recall what they were, but I reverted that attempt.
>
> The "future malloc" build of Cygwin I'm running doesn't exhibit Ken's issue, as
> far as I can tell. It has a specific fix to avoid the scenario I've been talking
> about here, but I don't want to take us down that path unless we're sure Ken's
> hitting that same scenario.

I tried the following patch and confirmed that the issue has disappeared.
I have not noticed any other problems with this patch so far.

diff --git a/winsup/cygwin/dcrt0.cc b/winsup/cygwin/dcrt0.cc
index 6f4723bb0..0d541ec14 100644
--- a/winsup/cygwin/dcrt0.cc
+++ b/winsup/cygwin/dcrt0.cc
@@ -773,6 +773,10 @@ dll_crt0_0 ()
     do_global_ctors (&__CTOR_LIST__, 1);
   cygthread::init ();
 
+  /* malloc_init() has been moved from dll_crt0_1() to here so that
+     malloc() can be called in fixup_after_exec(). */
+  malloc_init ();
+
   if (!child_proc_info)
     {
       setup_cygheap ();
@@ -857,7 +861,7 @@ dll_crt0_1 (void *)
      on a functioning malloc and it's possible that the user's program may
      have overridden malloc.  We only know about that at this stage,
      unfortunately. */
-  malloc_init ();
+  /* malloc_init() has been moved to dll_crt0_0(). */
   user_shared->initialize ();
 
 #ifdef CYGHEAP_DEBUG

Where is the "constructor chain" you mentioned?

-- 
Takashi Yano
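
The ordering argument in this thread can be illustrated with a small
stand-alone model. This is only a sketch under stated assumptions: Step,
Report and check_startup() are hypothetical names invented for the
illustration and are not part of Cygwin. Each startup step either
initializes the allocator lock or performs an allocation, and the model
counts the two hazards raised above: an allocation made before the lock
exists, and a second initialization of the lock.

```cpp
#include <vector>

// Hypothetical model, not Cygwin code: each startup step either
// initializes the allocator lock or performs an allocation.
enum class Step { InitLock, Allocate };

struct Report {
  int unguarded_allocations = 0;  // allocations made before the lock existed
  int reinitializations = 0;      // InitLock steps after the first one
};

// Walk the steps in order and count the two hazards discussed in the thread.
Report check_startup(const std::vector<Step>& steps) {
  Report r;
  bool lock_ready = false;
  for (Step s : steps) {
    if (s == Step::InitLock) {
      if (lock_ready) ++r.reinitializations;  // lock state would be reset
      lock_ready = true;
    } else if (!lock_ready) {
      ++r.unguarded_allocations;  // allocation with no usable lock
    }
  }
  return r;
}
```

In this model, the sequence {Allocate, InitLock, Allocate} reports one
unguarded allocation, while {InitLock, Allocate, Allocate} reports none.
That is the effect the patch aims for with allocations made after the new
malloc_init() call site. Anything that allocates before that call site
would still be counted as unguarded.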