Date: Mon, 15 Aug 2022 09:39:08 +0000 (UTC)
From: Richard Biener
To: Aldy Hernandez
Cc: Jeff Law, gcc-patches, "MacLeod, Andrew"
Subject: Re: [PATCH] Support threading of just the exit edge

On Fri, 12 Aug 2022, Aldy Hernandez wrote:

> On Fri, Aug 12, 2022 at 2:01 PM Richard Biener wrote:
> >
> > This started with noticing we add ENTRY_BLOCK to our threads
> > just for the sake of simplifying the conditional at the end of
> > the first block in a function.  That's not really threading
> > anything but it ends up duplicating the entry block and
> > re-writing the result instead of statically folding the jump.
>
> Hmmm, but threading 2 blocks is not really threading at all??  Unless
> I'm misunderstanding something, this was even documented in the
> backwards threader:
>
> [snip]
>      That's not really a jump threading opportunity, but instead is
>      simple cprop & simplification.  We could handle it here if we
>      wanted by wiring up all the incoming edges.  If we run this
>      early in IPA, that might be worth doing.   For now we just
>      reject that case.  */
>   if (m_path.length () <= 1)
>     return false;
>
> Which you undoubtedly ran into because you're specifically eliding the check:
>
> > -  if (m_profit.profitable_path_p (m_path, m_name, taken_edge,
> > -                                  &irreducible)
> > +  if ((m_path.length () == 1
> > +       || m_profit.profitable_path_p (m_path, m_name, taken_edge,
> > +                                      &irreducible))

Correct.  But currently the threader just "cheats", picks up one more
block and then "threads" the case anyway, doing this simple
simplification in the most expensive way possible ...

> >
> > The following tries to handle those by recording simplifications
> > of the exit conditional as a thread of length one.  That requires
> > special-casing them in the backward copier since if we do not
> > have any block to copy but modify the jump in place and remove
> > not taken edges this confuses the hell out of remaining threads.
> >
> > So back_jt_path_registry::update_cfg now first marks all
> > edges we know are never taken and then prunes the threading
> > candidates when they include such an edge.
> > Then it makes sure to first perform unreachable edge removal
> > (so we avoid copying them when other thread paths contain the
> > prevailing edge) before continuing to apply the remaining threads.
>
> This is all beyond my pay grade.  I'm not very well versed in the
> threader per se.  So if y'all think it's a good idea, by all means.
> Perhaps Jeff can chime in, or remembers the above comment?
>
> >
> > In statistics you can see this avoids quite a bunch of useless
> > threads (I've investigated 3 random files from cc1files with
> > dropped stats in any of the thread passes).
> >
> > Still thinking about it, it would be nice to avoid the work of
> > discovering those candidates we have to throw away later,
> > which could eventually be done by having the backward threader
> > perform an RPO walk over the CFG, skipping edges that can be
> > statically determined as not being executed.  Below I'm
> > abusing the path range query to statically analyze the exit
> > branch, but I assume there's a simpler way of folding this stmt
> > which could then better integrate with such a walk.
>
> Unreachable paths can be queried with
> path_range_query::unreachable_path_p ().  Could you leverage this?
> The idea was that if we ever resolved any SSA name to UNDEFINED, the
> path itself was unreachable.

The situation is that we end up with paths where an intermediate
branch on the path can be simplified to false - but of course only
if we put all intermediate branch dependences into the list of
imports to consider.

I don't much like using the "threading" code to perform the
simplification, but I couldn't figure out a cheap way to perform the
simplification without invoking a full EVRP.  That said, the
backwards threader simply does

  basic_block bb;
  FOR_EACH_BB_FN (bb, m_fun)
    if (EDGE_COUNT (bb->succs) > 1)
      maybe_thread_block (bb);

  bool changed = m_registry.thread_through_all_blocks (true);

Instead of that we should only consider edges that may be executable,
by walking the CFG along such edges and simplifying BB exit
conditionals.  Unfortunately EVRP is now a C++ maze, so I couldn't
find how to actually do such simplification, not knowing how
interacting with ranger influences the path query use either.
If you or Andrew have any suggestions on how to essentially do

  if (edge e = find_taken_edge (bb))
    {
      ...
    }

where find_taken_edge should be at least as powerful as using the
path query for a single bb, then I'd be all ears.  As said, I tried
to find the code to cut&paste in EVRP but failed to ...

Thanks,
Richard.

>
> Aldy
>
> >
> > In any case it seems worth more consciously handling the
> > case of exit branches that simplify without path-sensitive
> > information.
> >
> > Then the patch also restricts path discovery when we'd produce
> > threads we'll reject later during copying - the backward threader
> > copying cannot handle paths where the blocks to duplicate are
> > not from exactly the same loop.  I'm probably going to split this
> > part out.
> >
> > Any thoughts?
> >
> >         * gimple-range-path.cc (path_range_query::set_path): Adjust
> >         assert to allow paths of size one.
> >         * tree-ssa-threadbackward.cc (back_threader::maybe_register_path):
> >         Paths of size one are always profitable.
> >         (back_threader::find_paths_to_names): Likewise.
> >         Do not walk further if we are leaving the current loop.
> >         (back_threader::find_taken_edge): Remove assert.  Do not
> >         walk to ENTRY_BLOCK.
> >         * tree-ssa-threadupdate.cc (back_jt_path_registry::update_cfg):
> >         Handle jump threads of just the exit edge by modifying the
> >         control statement in-place.
> > ---
> >  gcc/gimple-range-path.cc       |  2 +-
> >  gcc/tree-ssa-threadbackward.cc | 21 ++++++++-----
> >  gcc/tree-ssa-threadupdate.cc   | 54 ++++++++++++++++++++++++++++++++++
> >  3 files changed, 69 insertions(+), 8 deletions(-)
> >
> > diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
> > index 78146f5683e..a7d277c31b8 100644
> > --- a/gcc/gimple-range-path.cc
> > +++ b/gcc/gimple-range-path.cc
> > @@ -220,7 +220,7 @@ path_range_query::unreachable_path_p ()
> >  void
> >  path_range_query::set_path (const vec<basic_block> &path)
> >  {
> > -  gcc_checking_assert (path.length () > 1);
> > +  gcc_checking_assert (!path.is_empty ());
> >    m_path = path.copy ();
> >    m_pos = m_path.length () - 1;
> >    bitmap_clear (m_has_cache_entry);
> > diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
> > index b886027fccf..669098e4ec3 100644
> > --- a/gcc/tree-ssa-threadbackward.cc
> > +++ b/gcc/tree-ssa-threadbackward.cc
> > @@ -241,8 +241,9 @@ back_threader::maybe_register_path ()
> >    else
> >      {
> >        bool irreducible = false;
> > -      if (m_profit.profitable_path_p (m_path, m_name, taken_edge,
> > -                                      &irreducible)
> > +      if ((m_path.length () == 1
> > +           || m_profit.profitable_path_p (m_path, m_name, taken_edge,
> > +                                          &irreducible))
> >            && debug_counter ()
> >            && m_registry.register_path (m_path, taken_edge))
> >          {
> > @@ -267,7 +268,6 @@ back_threader::maybe_register_path ()
> >  edge
> >  back_threader::find_taken_edge (const vec<basic_block> &path)
> >  {
> > -  gcc_checking_assert (path.length () > 1);
> >    switch (gimple_code (m_last_stmt))
> >      {
> >      case GIMPLE_COND:
> > @@ -350,9 +350,15 @@ back_threader::find_paths_to_names (basic_block bb, bitmap interesting,
> >    m_path.safe_push (bb);
> >
> >    // Try to resolve the path without looking back.
> > -  if (m_path.length () > 1
> > -      && (!m_profit.profitable_path_p (m_path, m_name, NULL)
> > -          || maybe_register_path ()))
> > +  if ((m_path.length () > 1
> > +       && !m_profit.profitable_path_p (m_path, m_name, NULL))
> > +      || maybe_register_path ())
> > +    ;
> > +
> > +  // The backwards thread copier cannot copy blocks that do not belong
> > +  // to the same loop, so when the new source of the path entry no
> > +  // longer belongs to it we don't need to search further.
> > +  else if (m_path[0]->loop_father != bb->loop_father)
> >      ;
> >
> >    // Continue looking for ways to extend the path but limit the
> > @@ -445,7 +451,8 @@ back_threader::find_paths_to_names (basic_block bb, bitmap interesting,
> >        edge e;
> >        FOR_EACH_EDGE (e, iter, bb->preds)
> >          {
> > -          if (e->flags & EDGE_ABNORMAL
> > +          if ((e->flags & EDGE_ABNORMAL)
> > +              || e->src->index == ENTRY_BLOCK
> >                // This is like path_crosses_loops in profitable_path_p but
> >                // more restrictive to avoid peeling off loop iterations (see
> >                // tree-ssa/pr14341.c for an example).
> > diff --git a/gcc/tree-ssa-threadupdate.cc b/gcc/tree-ssa-threadupdate.cc
> > index 59c268a3567..d40fa7c4cff 100644
> > --- a/gcc/tree-ssa-threadupdate.cc
> > +++ b/gcc/tree-ssa-threadupdate.cc
> > @@ -2613,6 +2613,60 @@ back_jt_path_registry::update_cfg (bool /*peel_loop_headers*/)
> >    bool retval = false;
> >    hash_set<edge> visited_starting_edges;
> >
> > +  /* Mark never taken edges from paths that are just jump simplifications.  */
> > +  auto_edge_flag never_taken (cfun);
> > +  for (auto path : m_paths)
> > +    if (path->length () == 1)
> > +      {
> > +        edge_iterator ei;
> > +        edge e;
> > +        FOR_EACH_EDGE (e, ei, (*path)[0]->e->src->succs)
> > +          if (e != (*path)[0]->e)
> > +            e->flags |= never_taken;
> > +      }
> > +
> > +  /* And prune paths that contain such edge before we remove them.  */
> > +  for (unsigned i = 0; i < m_paths.length ();)
> > +    {
> > +      bool remove = false;
> > +      for (auto je : *m_paths[i])
> > +        {
> > +          if (je->e->flags & never_taken)
> > +            {
> > +              cancel_thread (m_paths[i],
> > +                             "Avoiding threading through unreachable edge");
> > +              remove = true;
> > +              break;
> > +            }
> > +        }
> > +      if (!remove)
> > +        ++i;
> > +      else
> > +        m_paths.unordered_remove (i);
> > +    }
> > +
> > +  /* Finally perform those threads first, this way we avoid copying the
> > +     dead outgoing edges when other theads contain the prevailing edge.  */
> > +  for (unsigned i = 0; i < m_paths.length ();)
> > +    {
> > +      vec<jump_thread_edge *> *path = m_paths[i];
> > +      if (path->length () != 1)
> > +        {
> > +          ++i;
> > +          continue;
> > +        }
> > +      edge exit = (*path)[0]->e;
> > +      remove_ctrl_stmt_and_useless_edges (exit->src, exit->dest);
> > +      exit->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE | EDGE_ABNORMAL);
> > +      exit->flags |= EDGE_FALLTHRU;
> > +      /* We do not update dominance info.  */
> > +      free_dominance_info (CDI_DOMINATORS);
> > +      retval = true;
> > +      m_num_threaded_edges++;
> > +      path->release ();
> > +      m_paths.unordered_remove (i);
> > +    }
> > +
> >    while (m_paths.length ())
> >      {
> >        vec<jump_thread_edge *> *path = m_paths[0];
> > --
> > 2.35.3
> >
>

-- 
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
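For illustration, here is a minimal, self-contained sketch (plain C++17,
not GCC internals) of the idea discussed above: fold each block's exit
conditional where it is statically known, mark the not-taken successors
as non-executable, and then walk the CFG in reverse post-order following
executable edges only.  The Block/Edge types and the fold_exit helper are
simplified stand-ins invented for this sketch; they are not the ranger,
path-query, or find_taken_edge APIs discussed in the thread.

  // toy_rpo_fold.cc -- standalone illustration, NOT GCC code.
  #include <cstdio>
  #include <optional>
  #include <vector>

  struct Block;

  struct Edge
  {
    Block *dest;
    bool executable;   // cleared when the exit condition folds it away
  };

  struct Block
  {
    int index;
    std::vector<Edge> succs;
    // Toy "exit conditional": if set, only succs[*taken] is executable.
    std::optional<unsigned> taken;
  };

  // Stand-in for a find_taken_edge (bb)-style helper: return the single
  // statically taken successor, or nullptr if unknown.
  static Edge *
  fold_exit (Block &bb)
  {
    if (bb.taken && *bb.taken < bb.succs.size ())
      return &bb.succs[*bb.taken];
    return nullptr;
  }

  // Post-order over executable edges only; reversing it yields an RPO in
  // which regions reached only via dead edges are never visited.
  static void
  post_order (Block *bb, std::vector<bool> &seen, std::vector<Block *> &out)
  {
    if (seen[bb->index])
      return;
    seen[bb->index] = true;
    for (Edge &e : bb->succs)
      if (e.executable)
        post_order (e.dest, seen, out);
    out.push_back (bb);
  }

  int
  main ()
  {
    // entry -> b1 -> {b2, b3}; b1's condition folds to the edge to b2,
    // so b3 is never visited by the walk.
    Block entry{0}, b1{1}, b2{2}, b3{3};
    entry.succs = {{&b1, true}};
    b1.succs = {{&b2, true}, {&b3, true}};
    b1.taken = 0;

    // First fold exit conditionals, marking not-taken edges dead.
    for (Block *bb : {&entry, &b1, &b2, &b3})
      if (Edge *e = fold_exit (*bb))
        for (Edge &other : bb->succs)
          if (&other != e)
            other.executable = false;

    // Then walk in RPO along executable edges only.
    std::vector<bool> seen (4, false);
    std::vector<Block *> po;
    post_order (&entry, seen, po);
    for (auto it = po.rbegin (); it != po.rend (); ++it)
      std::printf ("visit bb %d\n", (*it)->index);
    return 0;
  }

Compiled as C++17, this prints blocks 0, 1 and 2 and never reaches
block 3, mirroring how a walk restricted to executable edges would skip
work for statically dead regions instead of discovering and then
cancelling threads through them.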