From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by sourceware.org (Postfix) with ESMTPS id 4CA4E3858C00 for ; Thu, 4 Aug 2022 07:14:48 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 4CA4E3858C00 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id EF6023EF23; Thu, 4 Aug 2022 07:14:46 +0000 (UTC) Received: from wotan.suse.de (wotan.suse.de [10.160.0.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id ADFFD2C141; Thu, 4 Aug 2022 07:14:46 +0000 (UTC) Date: Thu, 4 Aug 2022 07:14:46 +0000 (UTC) From: Richard Biener To: Aldy Hernandez cc: gcc-patches , "MacLeod, Andrew" Subject: Re: [PATCH] Backwards threader greedy search TLC In-Reply-To: Message-ID: References: <45651.122080305533000620@us-mta-640.us.mimecast.lan> User-Agent: Alpine 2.22 (LSU 394 2020-01-19) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Status: No, score=-5.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Aug 2022 07:14:49 -0000 On Thu, 4 Aug 2022, Aldy Hernandez wrote: > On Wed, Aug 3, 2022 at 11:53 AM Richard Biener wrote: > > > > I've tried to understand how the greedy search works seeing the > > bitmap dances and the split into resolve_phi. I've summarized > > the intent of the algorithm as > > > > // For further greedy searching we want to remove interesting > > // names defined in BB but add ones on the PHI edges for the > > // respective edges. > > > > but the implementation differs in detail. In particular when > > there is more than one interesting PHI in BB it seems to only consider > > the first for translating defs across edges. It also only applies > > the loop crossing restriction when there is an interesting PHI. > > I've also noticed that while the set of interesting names is rolled > > back, m_imports just keeps growing - is that a bug? > > I've never quite liked how I was handling PHIs. I had some > improvements in this space, especially the problem with only > considering the first def across a PHI, but I ran out of time last > cycle, and we were already into stage3. > > The loop crossing restriction I inherited from the original > implementation, though I suppose I restricted it further by only > looking at interesting PHIs. In practice I don't think it mattered, > because we cap loop crossing violations in cancel_invalid_paths in the > registry. Does anything break if you keep the restriction across the > board? Probably not - I've also spotted all these "late" invalidates all over the place, some of them should be done earlier to limit the exponential search. I plan to go find them and divide them into things that can be checked locally per BB (good to do at greedy search time) and those that need the full path (we should simply not register such path). I'll keep this as is for now. > Good spotting on the imports growing. Instinctively that seems like a bug. > > [As a suggestion, check the total threadable paths as a sanity check > if you make any changes in this space. What I've been doing is > counting "Jumps threaded" in the *.stat dump file across the .ii files > in a bootstrap, and making sure you're not getting the same # of > threads because we missed one in the backward threader which then got > picked up by DOM. I divide up the threads by ethread, thread, > threadfull, DOM, etc, to make sure I'm not just shuffling threading > opportunities around.] I'll do this experiment and will roll this fix into the patch if it works out. > But yeah, we shouldn't be growing imports unnecessarily, if nothing > else because of the bitmap explosion you're noticing in other places. > In practice, I'm not so sure it slows the solver itself down: > > // They are hints for the solver. Adding more elements doesn't slow > // us down, because we don't solve anything that doesn't appear in the > // path. On the other hand, not having enough imports will limit what > // we can solve. > > But that comment may be old ;-). I hoped so, yes. > > > > The following preserves the loop crossing restriction to the case > > of interesting PHIs but merges resolve_phi back, changing interesting > > as outlined with the intent above. It should get more threading > > cases when there are multiple interesting PHI defs in a block. > > It might be a bit faster due to less bitmap operations but in the > > end the main intent was to make what happens more obvious. > > Sweet! > > > > > Bootstrap and regtest pending on x86_64-unknown-linux-gnu. > > > > Aldy - you wrote the existing implementation, is the following OK? > > I'm tickled pink you're cleaning and improving this. Looks good. OK, I'll re-test and push after doing the m_imports experiment. Thanks, Richard.