From mboxrd@z Thu Jan 1 00:00:00 1970
From: Aldy Hernandez
Date: Wed, 10 Aug 2022 17:42:18 +0200
Subject: Re: [PATCH] tree-optimization/106514 - revisit m_import compute in backward threading
To: Richard Biener
Cc: Andrew MacLeod, gcc-patches

I'm still on vacation, but I figured I'd mention a few things to
either unblock you, or move the conversation along.

On Wed, Aug 10, 2022 at 8:45 AM Richard Biener wrote:
>
> On Tue, 9 Aug 2022, Andrew MacLeod wrote:
>
> > On 8/9/22 09:01, Richard Biener wrote:
> > > This revisits how we compute imports later used for the ranger path
> > > query during backwards threading.  The compute_imports function
> > > of the path solver ends up pulling the SSA def chain of regular
> > > stmts without limit and since it starts with just the gori imports
> > > of the path exit it misses some interesting names to translate
> > > during path discovery.  In fact with a still empty path this
> > > compute_imports function does not look like the correct tool.
> >
> > I don't really know how this works in practice.  Aldy's off this week,
> > so he can comment when he returns.
> >
> > The original premise was along the line of recognizing that only
> > changes to a GORI import name to a block can affect the branch at
> > the end of the block.  i.e., if the path doesn't change any import
> > to block A, then the branch at the end of block A will not change
> > either.  Likewise, if it does change an import, then we look at
> > whether the branch can be threaded.  Beyond that basic premise, I
> > don't know what all it does.
>
> Yep, I also think that's the idea.

Yes.

> > I presume the unbounded def chain is for local defs within a block
> > that in turn feed the import to another block.  I'm not sure why we
> > need to do much with those... again, it's only the import to the
> > def chain that can affect the outcome at the end of the chain...
> > and if it changes, then you need to recalculate the entire chain...
> > but that would be part of the normal path walk.  I suspect there is
> > also some pruning that can be done there, as GORI reflects "can
> > affect the range", not "will affect the range".
> >
> > Perhaps what's going on is that all those local elements are being
> > added up front to the list of interesting names?  That would
> > certainly blow up the bitmaps and loops and such.
>
> What it does is, if we have
>
>   bb:
>     _3 = _5 + 1;
>     _1 = _3 + _4;
>     if (_1 > _2)
>
> it puts _3 and _4 and _5 into the set of interesting names.  That's
> OK and desired I think?  The actual problem is that compute_imports
> will follow the defs of _5 and _4 into dominating blocks recursively,
> adding things to the imports even if the definition blocks are not
> on the path (the path is empty at the point we call compute_imports).

That sounds like a bug.  We shouldn't recurse unbounded outside of the
path.  For that matter, for anything outside the path, we should be
querying the ranger, which can cache things appropriately.  The intent
was that for anything queried outside of the path, we should be using
the ranger through range_on_path_entry.
> For the testcase at hand this pulls in some 1000s of names into the
> initial set of imports.  Now, the current path discovery code
> only adds to imports by means of PHI translating from a PHI
> def to a PHI use on the path edge - it doesn't add any further
> local names used to define such PHI edge use from blocks not
> dominating the path exit (but on the about-to-be-threaded path).
>
> I'll also note that compute_imports does
>
>   // Exported booleans along the path, may help conditionals.
>   if (m_resolve)
>     for (i = 0; i < m_path.length (); ++i)
>       {
>         basic_block bb = m_path[i];
>         tree name;
>         FOR_EACH_GORI_EXPORT_NAME (gori, bb, name)
>           if (TREE_CODE (TREE_TYPE (name)) == BOOLEAN_TYPE)
>             bitmap_set_bit (imports, SSA_NAME_VERSION (name));
>       }
>
> but at least for the backwards threader this isn't effective
> since the path is empty at the point we call this.  And no
> other code in the backwards threader does sth like this.

IIRC, this came about to help with statements leading up to the
condition.  See fur_source::register_outgoing_edges:

  // Now look for other relations in the exports.  This will find stmts
  // leading to the condition such as:
  // c_2 = a_4 < b_7
  // if (c_2)
  FOR_EACH_GORI_EXPORT_NAME (*(gori ()), bb, name)
    {
      if (TREE_CODE (TREE_TYPE (name)) != BOOLEAN_TYPE)
        continue;

At least back when I wrote it, it made a difference in the number of
threads we got.

> I would also say for the exit block it doesn't make
> much sense to look at gori exports.  Plus I don't understand
> what this tries to capture - it presumably catches stmts
> like _1 = _2 == _3 that have uses in downstream blocks but
> might not be directly part of the conditional op def chain.
>
> In the end all the threader does is, once the path is complete,
> compute ranges for the path and the imports and then fold the
> path exit stmt.
>
> I'm not sure which names we need in the import set for this,
> but as I guess the set of imports constrains the names we
> compute ranges for, so for the BB above, if we want to have
> a range of _1 we need _3 in the imports set even though it
> is not in the set of GORI imports?
>
> What I'm seeing with the threader is that we have quite some
> cases where we have an "unrelated" CFG diamond on the path but
> end up threading a random path through it (because neither the
> path query infrastructure nor the copier knows to copy the
> whole diamond).  When the def chain of the operands on the

Hmmm, this is unrelated, but we had talked about constraining the
backward threader to notice that we had gone back up to the beginning
of a diamond, and that there was no sense going up the other arm to
the top of the diamond.  So something like this:

      A
     / \
    L   R
     \ /
      B

If we had gone up B<-R<-A, we should have a way to notice that
anything before B<-L<-A would yield the same result, and only recurse
back on one side.  Alas, stage3 came upon us, and besides, I wasn't
convinced that calculating all that would be worth it.

> controlling conditions are not in the import set then I
> suppose the path range computation doesn't do anything to
> simplify things there (we do after all specify a condition
> result by means of arbitrarily choosing one of the paths).
> Such diamonds are the main source of exponential behavior
> of the path discovery and I'm thinking of ways to improve
> things here.  The original backwards threader, supposedly
> due to a bug, only ever considered one path for continuing
> beyond a diamond - but maybe that was on purpose because
> the intermediate condition doesn't meaningfully contribute
> to resolving the exit condition.
>
> Anyway, the important result of the change is that the imports
> set is vastly smaller since it is now constrained to the
> actual path.  Plus it now contains the local defs in blocks

That sounds good.
> of the path that do not dominate the exit block which means
> it might get more threading opportunities.  Which reminds me
> to re-do the cc1files experiment for this change.

Double-check threading results at each step/patch.  Things that are
seemingly obvious can have drastic changes in the thread count.

> > I'm sure Aldy will pitch in when he returns from vacation.
>
> Good, I'll wait for his comments.
>
> Richard.

Overall, I don't want you to be blocked until I return.  You seem to
have a good grasp of what's going on, and if you're careful to check
we're not regressing in the thread count (unless the threads are
unreachable or senseless), we could always do things as a follow-up.
It also helps to have more than one person knowledgeable in this area.

Thanks again.
Aldy

> > > The following instead implements what it does during the path
> > > discovery and since we add the exit block we seed the initial
> > > imports and interesting names from just the exit conditional.
> > > When we then process interesting names (aka imports we did not
> > > yet see the definition of) we prune local defs but add their
> > > uses in a similar way as compute_imports would have done.
> > >
> > > The patch also properly unwinds m_imports during the path
> > > discovery backtracking and from a debugging session I have
> > > verified the two sets evolve as expected now while previously
> > > behaving slightly erratically.
> > >
> > > Fortunately the m_imports set is now also shrunken significantly
> > > for the PR69592 testcase (aka PR106514) so that there's an
> > > overall speedup when increasing --param
> > > max-jump-thread-duplication-stmts as 15 -> 30 -> 60 -> 120:
> > > from 1s -> 2s -> 13s -> 27s to, with the patch,
> > > 1s -> 2s -> 4s -> 8s.
> > >
> > > This runs into a latent issue in X which doesn't seem to expect
> > > any PHI nodes with a constant argument on an edge inside the path.
> > > But we now have those as interesting, for example for the ICEing
> > > g++.dg/torture/pr100925.C which just has sth like
> > >
> > >   if (i)
> > >     x = 1;
> > >   else
> > >     x = 5;
> > >   if (x == 1)
> > >     ...
> > >
> > > where we now have the path from if (i) to if (x) and the PHI for
> > > x in the set of imports to consider for resolving x == 1, which
> > > IMHO looks exactly like what we want.  The
> > > path_range_query::ssa_range_in_phi change papers over the issue
> > > and drops the range to varying instead of crashing.  I didn't
> > > want to mess with this any further in this patch (but I couldn't
> > > resist replacing the loop over PHI args with
> > > PHI_ARG_DEF_FROM_EDGE, so mind the re-indenting).
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu w/o the
> > > path_range_query::ssa_range_in_phi fix, now re-running with it.
> > >
> > > OK?
> > >
> > > Thanks,
> > > Richard.
> > >
> > >     PR tree-optimization/106514
> > >     * tree-ssa-threadbackward.cc (back_threader::find_paths_to_names):
> > >     Compute and unwind both m_imports and interesting on the fly
> > >     during path discovery.
> > >     (back_threader::find_paths): Compute the original m_imports
> > >     from just the SSA uses of the exit conditional.  Drop
> > >     handling single_succ_to_potentially_threadable_block.
> > >     * gimple-range-path.cc (path_range_query::ssa_range_in_phi):
> > >     Handle constant PHI arguments without crashing.  Use
> > >     PHI_ARG_DEF_FROM_EDGE.
> > > ---
> > >  gcc/gimple-range-path.cc       |  52 ++++++++---------
> > >  gcc/tree-ssa-threadbackward.cc | 104 ++++++++++++++++++++++++++-------
> > >  2 files changed, 106 insertions(+), 50 deletions(-)
> > >
> > > diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
> > > index 43e7526b6fc..b4376011ea8 100644
> > > --- a/gcc/gimple-range-path.cc
> > > +++ b/gcc/gimple-range-path.cc
> > > @@ -276,8 +276,6 @@ void
> > >  path_range_query::ssa_range_in_phi (vrange &r, gphi *phi)
> > >  {
> > >    tree name = gimple_phi_result (phi);
> > > -  basic_block bb = gimple_bb (phi);
> > > -  unsigned nargs = gimple_phi_num_args (phi);
> > >
> > >    if (at_entry ())
> > >      {
> > > @@ -287,6 +285,7 @@ path_range_query::ssa_range_in_phi (vrange &r, gphi *phi)
> > >        // Try to fold the phi exclusively with global or cached values.
> > >        // This will get things like PHI <5(99), 6(88)>.  We do this by
> > >        // calling range_of_expr with no context.
> > > +      unsigned nargs = gimple_phi_num_args (phi);
> > >        Value_Range arg_range (TREE_TYPE (name));
> > >        r.set_undefined ();
> > >        for (size_t i = 0; i < nargs; ++i)
> > > @@ -303,36 +302,31 @@ path_range_query::ssa_range_in_phi (vrange &r, gphi *phi)
> > >        return;
> > >      }
> > >
> > > +  basic_block bb = gimple_bb (phi);
> > >    basic_block prev = prev_bb ();
> > >    edge e_in = find_edge (prev, bb);
> > > -
> > > -  for (size_t i = 0; i < nargs; ++i)
> > > -    if (e_in == gimple_phi_arg_edge (phi, i))
> > > -      {
> > > -	tree arg = gimple_phi_arg_def (phi, i);
> > > -	// Avoid using the cache for ARGs defined in this block, as
> > > -	// that could create an ordering problem.
> > > -	if (ssa_defined_in_bb (arg, bb) || !get_cache (r, arg))
> > > -	  {
> > > -	    if (m_resolve)
> > > -	      {
> > > -		Value_Range tmp (TREE_TYPE (name));
> > > -		// Using both the range on entry to the path, and the
> > > -		// range on this edge yields significantly better
> > > -		// results.
> > > -		if (defined_outside_path (arg))
> > > -		  range_on_path_entry (r, arg);
> > > -		else
> > > -		  r.set_varying (TREE_TYPE (name));
> > > -		m_ranger->range_on_edge (tmp, e_in, arg);
> > > -		r.intersect (tmp);
> > > -		return;
> > > -	      }
> > > +  tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e_in);
> > > +  // Avoid using the cache for ARGs defined in this block, as
> > > +  // that could create an ordering problem.
> > > +  if (ssa_defined_in_bb (arg, bb) || !get_cache (r, arg))
> > > +    {
> > > +      if (m_resolve)
> > > +	{
> > > +	  Value_Range tmp (TREE_TYPE (name));
> > > +	  // Using both the range on entry to the path, and the
> > > +	  // range on this edge yields significantly better
> > > +	  // results.
> > > +	  if (TREE_CODE (arg) == SSA_NAME
> > > +	      && defined_outside_path (arg))
> > > +	    range_on_path_entry (r, arg);
> > > +	  else
> > > +	    r.set_varying (TREE_TYPE (name));
> > > -	  }
> > > -	return;
> > > -      }
> > > -  gcc_unreachable ();
> > > +	  m_ranger->range_on_edge (tmp, e_in, arg);
> > > +	  r.intersect (tmp);
> > > +	  return;
> > > +	}
> > > +      r.set_varying (TREE_TYPE (name));
> > > +    }
> > >  }
> > >
> > >  // If NAME is defined in BB, set R to the range of NAME, and return
> > > diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
> > > index 4cd4d21c712..a544e97b2af 100644
> > > --- a/gcc/tree-ssa-threadbackward.cc
> > > +++ b/gcc/tree-ssa-threadbackward.cc
> > > @@ -482,32 +482,76 @@ back_threader::find_paths_to_names (basic_block bb, bitmap interesting,
> > >  {
> > >    // For further greedy searching we want to remove interesting
> > >    // names defined in BB but add ones on the PHI edges for the
> > > -  // respective edges.  We do this by starting with all names
> > > +  // respective edges and adding imports from those stmts.
> > > +  // We do this by starting with all names
> > >    // not defined in BB as interesting, collecting a list of
> > >    // interesting PHIs in BB on the fly.  Then we iterate over
> > >    // predecessor edges, adding interesting PHI edge defs to
> > >    // the set of interesting names to consider when processing it.
> > >    auto_bitmap new_interesting;
> > > +  auto_vec<int> new_imports;
> > >    auto_vec<gphi *> interesting_phis;
> > >    bitmap_iterator bi;
> > >    unsigned i;
> > > +  auto_vec<tree> worklist;
> > >    EXECUTE_IF_SET_IN_BITMAP (interesting, 0, i, bi)
> > >      {
> > >        tree name = ssa_name (i);
> > >        gimple *def_stmt = SSA_NAME_DEF_STMT (name);
> > > +      /* Imports remain interesting.  */
> > >        if (gimple_bb (def_stmt) != bb)
> > > -	bitmap_set_bit (new_interesting, i);
> > > -      else if (gphi *phi = dyn_cast <gphi *> (def_stmt))
> > >  	{
> > > -	  tree res = gimple_phi_result (phi);
> > > -	  if (!SSA_NAME_OCCURS_IN_ABNORMAL_PHI (res))
> > > -	    interesting_phis.safe_push (phi);
> > > +	  bitmap_set_bit (new_interesting, i);
> > > +	  continue;
> > > +	}
> > > +      worklist.quick_push (name);
> > > +      while (!worklist.is_empty ())
> > > +	{
> > > +	  tree name = worklist.pop ();
> > > +	  gimple *def_stmt = SSA_NAME_DEF_STMT (name);
> > > +	  /* Newly discovered imports are interesting.  */
> > > +	  if (gimple_bb (def_stmt) != bb)
> > > +	    {
> > > +	      bitmap_set_bit (new_interesting, SSA_NAME_VERSION (name));
> > > +	      continue;
> > > +	    }
> > > +	  /* Local PHIs participate in renaming below.  */
> > > +	  if (gphi *phi = dyn_cast <gphi *> (def_stmt))
> > > +	    {
> > > +	      tree res = gimple_phi_result (phi);
> > > +	      if (!SSA_NAME_OCCURS_IN_ABNORMAL_PHI (res))
> > > +		interesting_phis.safe_push (phi);
> > > +	    }
> > > +	  /* For other local defs process their uses, amending
> > > +	     imports on the way.  */
> > > +	  else if (is_gimple_assign (def_stmt))
> > > +	    {
> > > +	      for (unsigned j = 1; j < gimple_num_ops (def_stmt); ++j)
> > > +		{
> > > +		  tree rhs = gimple_op (def_stmt, j);
> > > +		  if (TREE_CODE (rhs) == SSA_NAME
> > > +		      && bitmap_set_bit (m_imports,
> > > +					 SSA_NAME_VERSION (rhs)))
> > > +		    {
> > > +		      /* ??? We probably want to track a 'visited'
> > > +			 flag separately and add to imports only
> > > +			 when the def is handled by ranger.  The
> > > +			 current way adds us defs that are neither
> > > +			 PHIs nor "interesting" assigns, like for
> > > +			 example loads.  */
> > > +		      new_imports.safe_push (SSA_NAME_VERSION (rhs));
> > > +		      worklist.safe_push (rhs);
> > > +		    }
> > > +		}
> > > +	    }
> > > +	}
> > >      }
> > > +  worklist.release ();
> > >    if (!bitmap_empty_p (new_interesting)
> > >        || !interesting_phis.is_empty ())
> > >      {
> > > -      auto_vec<tree> unwind (interesting_phis.length ());
> > > +      auto_vec<int> unwind (interesting_phis.length ());
> > > +      auto_vec<int> imports_unwind (interesting_phis.length ());
> > >        edge_iterator iter;
> > >        edge e;
> > >        FOR_EACH_EDGE (e, iter, bb->preds)
> > > @@ -525,22 +569,31 @@ back_threader::find_paths_to_names (basic_block bb, bitmap interesting,
> > >  	    {
> > >  	      tree def = PHI_ARG_DEF_FROM_EDGE (phi, e);
> > >  	      if (TREE_CODE (def) == SSA_NAME)
> > > -		if (bitmap_set_bit (new_interesting,
> > > -				    SSA_NAME_VERSION (def)))
> > > -		  {
> > > -		    bitmap_set_bit (m_imports, SSA_NAME_VERSION (def));
> > > -		    unwind.quick_push (def);
> > > -		  }
> > > +		{
> > > +		  int ver = SSA_NAME_VERSION (def);
> > > +		  if (bitmap_set_bit (new_interesting, ver))
> > > +		    {
> > > +		      if (bitmap_set_bit (m_imports, ver))
> > > +			imports_unwind.quick_push (ver);
> > > +		      unwind.quick_push (ver);
> > > +		    }
> > > +		}
> > >  	    }
> > >  	  find_paths_to_names (e->src, new_interesting, overall_paths);
> > > -	  // Restore new_interesting.  We leave m_imports alone since
> > > -	  // we do not prune defs in BB from it and separately keeping
> > > -	  // track of which bits to unwind isn't worth the trouble.
> > > -	  for (tree def : unwind)
> > > -	    bitmap_clear_bit (new_interesting, SSA_NAME_VERSION (def));
> > > +	  // Restore new_interesting.
> > > +	  for (int def : unwind)
> > > +	    bitmap_clear_bit (new_interesting, def);
> > >  	  unwind.truncate (0);
> > > +	  // Restore m_imports.
> > > +	  for (int def : imports_unwind)
> > > +	    bitmap_clear_bit (m_imports, def);
> > > +	  imports_unwind.truncate (0);
> > >  	}
> > >      }
> > > +  /* m_imports tracks all interesting names on the path, so when
> > > +     backtracking we have to restore it.  */
> > > +  for (int j : new_imports)
> > > +    bitmap_clear_bit (m_imports, j);
> > >  }
> > >  else if (dump_file && (dump_flags & TDF_DETAILS))
> > >    fprintf (dump_file, "  FAIL: Search space limit %d reached.\n",
> > > @@ -566,15 +619,24 @@ back_threader::find_paths (basic_block bb, tree name)
> > >        && gimple_code (stmt) != GIMPLE_SWITCH))
> > >      return;
> > > -  if (EDGE_COUNT (bb->succs) > 1
> > > -      || single_succ_to_potentially_threadable_block (bb))
> > > +  if (EDGE_COUNT (bb->succs) > 1)
> > >      {
> > >        m_last_stmt = stmt;
> > >        m_visited_bbs.empty ();
> > >        m_path.truncate (0);
> > >        m_name = name;
> > > -      m_solver->compute_imports (m_imports, bb);
> > > +      // We compute imports of the path during discovery starting
> > > +      // just with names used in the conditional.
> > > +      bitmap_clear (m_imports);
> > > +      ssa_op_iter iter;
> > > +      FOR_EACH_SSA_TREE_OPERAND (name, stmt, iter, SSA_OP_USE)
> > > +	bitmap_set_bit (m_imports, SSA_NAME_VERSION (name));
> > > +
> > > +      // Interesting is the set of imports we have not yet seen
> > > +      // the definition of.  So while imports only grow, the
> > > +      // set of interesting defs dwindles and once empty we can
> > > +      // stop searching.
> > >        auto_bitmap interesting;
> > >        bitmap_copy (interesting, m_imports);
> > >        find_paths_to_names (bb, interesting, 1);
>
> --
> Richard Biener
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
> Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
> HRB 36809 (AG Nuernberg)