From: Andrew MacLeod
Date: Wed, 17 Aug 2022 10:39:08 -0400
Subject: Re: [PATCH] Support threading of just the exit edge
To: Richard Biener
Cc: Aldy Hernandez, Jeff Law, gcc-patches
Message-ID: <7dbd0ee3-638a-1b84-9be9-9310eefdce11@redhat.com>
References: <20220812120145.91C6513AAE@imap2.suse-dmz.suse.de>
 <3b6265b1-a0ab-f0b7-2d1f-aaf6277eec14@redhat.com>
 <4c846dc9-0044-35d2-dd9a-87ec93dd8d75@redhat.com>
List-Id: Gcc-patches mailing list

On 8/17/22 03:42, Richard Biener wrote:
> On Tue, 16 Aug 2022, Andrew MacLeod wrote:
>
>> On 8/16/22 05:18, Richard Biener wrote:
>>> On Mon, 15 Aug 2022, Aldy Hernandez wrote:
>>>
>>>> On Mon, Aug 15, 2022 at 9:24 PM Andrew MacLeod wrote:
>>>>> heh. or just
>>>>>
>>>>> +      int_range<2> r;
>>>>> +      if (!fold_range (r, const_cast <gcond *> (cond_stmt))
>>>>> +	  || !r.singleton_p (&val))
>>>>>
>>>>> if you do not provide a range_query to any of the fold_using_range code,
>>>>> it defaults to:
>>>>>
>>>>> fur_source::fur_source (range_query *q)
>>>>> {
>>>>>   if (q)
>>>>>     m_query = q;
>>>>>   else if (cfun)
>>>>>     m_query = get_range_query (cfun);
>>>>>   else
>>>>>     m_query = get_global_range_query ();
>>>>>   m_gori = NULL;
>>>>> }
>>>>>
>>>> Sweet.  Even better!
>>> So when I do the following incremental change on top of the posted
>>> patch then I see that the path query is able to simplify more
>>> "single BB paths" than the global range folding.
>>>
>>> diff --git a/gcc/tree-ssa-threadbackward.cc
>>> b/gcc/tree-ssa-threadbackward.cc
>>> index 669098e4ec3..777e778037f 100644
>>> --- a/gcc/tree-ssa-threadbackward.cc
>>> +++ b/gcc/tree-ssa-threadbackward.cc
>>> @@ -314,6 +314,12 @@ back_threader::find_taken_edge_cond (const
>>> vec<basic_block> &path,
>>>   {
>>>     int_range_max r;
>>> +  int_range<2> rf;
>>> +  if (path.length () == 1)
>>> +    {
>>> +      fold_range (rf, cond);
>>> +    }
>>> +
>>>     m_solver->compute_ranges (path, m_imports);
>>>     m_solver->range_of_stmt (r, cond);
>>> @@ -325,6 +331,8 @@ back_threader::find_taken_edge_cond (const
>>> vec<basic_block> &path,
>>>
>>>     if (r == true_range || r == false_range)
>>>       {
>>> +      if (path.length () == 1)
>>> +        gcc_assert (r == rf);
>>>         edge e_true, e_false;
>>>         basic_block bb = gimple_bb (cond);
>>>         extract_true_false_edges_from_block (bb, &e_true, &e_false);
>>>
>>> Even doing the following (not sure what's the difference and in
>>> particular expense over the path range query) results in missed
>>> simplifications (checking my set of cc1 files).
>>>
>>> diff --git a/gcc/tree-ssa-threadbackward.cc
>>> b/gcc/tree-ssa-threadbackward.cc
>>> index 669098e4ec3..1d43a179d08 100644
>>> --- a/gcc/tree-ssa-threadbackward.cc
>>> +++ b/gcc/tree-ssa-threadbackward.cc
>>> @@ -99,6 +99,7 @@ private:
>>>
>>>     back_threader_registry m_registry;
>>>     back_threader_profitability m_profit;
>>> +  gimple_ranger *m_ranger;
>>>     path_range_query *m_solver;
>>>
>>>     // Current path being analyzed.
>>> @@ -146,12 +147,14 @@ back_threader::back_threader (function *fun,
>>> unsigned flags, bool first)
>>>     // The path solver needs EDGE_DFS_BACK in resolving mode.
>>>     if (flags & BT_RESOLVE)
>>>       mark_dfs_back_edges ();
>>> -  m_solver = new path_range_query (flags & BT_RESOLVE);
>>> +  m_ranger = new gimple_ranger;
>>> +  m_solver = new path_range_query (flags & BT_RESOLVE, m_ranger);
>>>   }
>>>
>>>   back_threader::~back_threader ()
>>>   {
>>>     delete m_solver;
>>> +  delete m_ranger;
>>>
>>>     loop_optimizer_finalize ();
>>>   }
>>> @@ -314,6 +317,12 @@ back_threader::find_taken_edge_cond (const
>>> vec<basic_block> &path,
>>>   {
>>>     int_range_max r;
>>> +  int_range<2> rf;
>>> +  if (path.length () == 1)
>>> +    {
>>> +      fold_range (rf, cond, m_ranger);
>>> +    }
>>> +
>>>     m_solver->compute_ranges (path, m_imports);
>>>     m_solver->range_of_stmt (r, cond);
>>> @@ -325,6 +334,8 @@ back_threader::find_taken_edge_cond (const
>>> vec<basic_block> &path,
>>>
>>>     if (r == true_range || r == false_range)
>>>       {
>>> +      if (path.length () == 1)
>>> +        gcc_assert (r == rf);
>>>         edge e_true, e_false;
>>>         basic_block bb = gimple_bb (cond);
>>>         extract_true_false_edges_from_block (bb, &e_true, &e_false);
>>>
>>> one example is
>>>
>>>    [local count: 14414059]:
>>>   _128 = node_177(D)->typed.type;
>>>   pretmp_413 = MEM[(const union tree_node *)_128].base.code;
>>>   _431 = pretmp_413 + 65519;
>>>   if (_128 == 0B)
>>>     goto ; [18.09%]
>>>   else
>>>     goto ; [81.91%]
>>>
>>> where m_imports for the path is just _128 and the range computed is
>>> false while the ranger query returns VARYING.  But
>>> path_range_query::range_defined_in_block does
>>>
>>>   if (bb && POINTER_TYPE_P (TREE_TYPE (name)))
>>>     m_ranger->m_cache.m_exit.maybe_adjust_range (r, name, bb);
>> This is the coarse-grained "side effect applies somewhere in the block"
>> mechanism.  There is no understanding of where in the block it happens.
>>> which adjusts the range to ~[0, 0], probably because of the
>>> dereference in the following stmt.
>>>
>>> Why does fold_range not do this when folding the exit test?  Is there
>>> a way to make it do so?
>>> It looks like the only routine using this
>>> in gimple-range.cc is range_on_edge and there it's used for e->src
>>> after calling range_on_exit for e->src (why's it not done in
>>> range_on_exit?).  A testcase for this is
>> Fold_range doesn't do this because it is simply another statement.  It
>> makes no attempt to understand the context in which you are folding
>> something; you could be folding that stmt from a different location
>> (i.e. recomputing).  If your context is that you are looking for the
>> range after the last statement has been executed, then one needs to
>> check to see if there are any side effects.
> Hmm, but I'm asking it to fold a specific statement - how can that ever
> be folded "from a different context"?  In fact it traces as
> " at stmt " with the stmt I pass it.  But yes, the issue is of course
> that to compute the range of q != 0 it needs to compute the range
> of 'q' but it simply does
>
>   // If no name, simply call the base routine.
>   if (!name)
>     {
>       res = fold_range_internal (r, s, NULL_TREE);
>       if (res && is_a <gcond *> (s))
>         {
>           // Update any exports in the cache if this is a gimple cond statement.
>           tree exp;
>           basic_block bb = gimple_bb (s);
>           FOR_EACH_GORI_EXPORT_NAME (m_cache.m_gori, bb, exp)
>             m_cache.propagate_updated_value (exp, bb);
>
> and fur_depend seems to get the context so it could adjust ranges
> in its get_operand which seems to just call range_of_expr with
> the stmt as third arg.  Unfortunately gimple_ranger::range_of_expr
> is undocumented, but again 'stmt' is dumped as " at stmt " thus
> seems to be the "context" here?

Oh, so you are looking at gimple_ranger::range_of_stmt (), not fold_range.

>   // If name is defined in this block, try to get an range from S.
>
> what is 'S'?  def_stmt or stmt?
>
>   if (def_stmt && gimple_bb (def_stmt) == bb)
>     {
>       // Declared in this block, if it has a global set, check for an
>       // override from a block walk, otherwise calculate it.
>       if (m_cache.get_global_range (r, expr))
>         m_cache.block_range (r, bb, expr, false);
>       else
>         range_of_stmt (r, def_stmt, expr);
>
> so, add
>
>   if (is_ctrl_stmt (stmt))
>     m_cache.m_exit.maybe_adjust_range (r, expr, bb);
>
> at the end (also covering the range_on_entry case).  I suppose
> range_on_entry does not evaluate all possible paths from definition
> to BB, using maybe_adjust_range on blocks in between and unioning
> ranges at path merges?  That would likely be prohibitively expensive.

In fact it does.  The cache takes care of filling in the blanks and
requesting ranges that have not been satisfied yet in a reasonably
efficient way.  maybe_adjust_range () will deal only with inferred
ranges that are marked as having occurred in this block.

The problem I ran into when trying to do it in range_on_exit () is that
if there is an abnormal edge out of the block, it is unsafe to assume
that anything is true at block exit: when processing the abnormal
destination, we ask for range_on_exit of anything it uses from the src
block.

That said, I suppose it should be safe for range_of_stmt to assume the
statement successfully executed (or there wouldn't be a definition
produced), and therefore, if it is the last stmt in the block, we should
in theory be able to do as you suggest there.

I'm trying to think whether it's also safe to do it in range_of_expr.
If is_ctrl_stmt () is true, can I assume that the stmt cannot throw an
exception under any circumstance?  In which case it's probably safe to
put that in range_of_expr.  I've attached a patch which does that if you
want to try it.
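Roughly, the caller side then only needs to hand the control statement
itself to fold_range with a real ranger as the context query.  This is an
untested sketch, not taken from the attached patch; m_ranger stands for
whatever gimple_ranger instance the caller already has:

  // Untested sketch: cond is the GIMPLE_COND ending the block, so
  // is_ctrl_stmt () is true for it and range_of_expr can apply any
  // inferred ranges recorded for the block when resolving its operands.
  int_range<2> r;
  tree val;
  if (fold_range (r, cond, m_ranger) && r.singleton_p (&val))
    {
      // val is the statically known outcome of the exit test.
    }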
The caveat is that it is only a partial solution... it will only work for
names on that stmt.  If you have anything more complex, e.g.

  if (a == 0 || b == 0)

we have a sequence feeding the ctrl stmt:

  c_1 = a == 0
  c_2 = b == 0
  c_3 = c_1 && c_2
  if (c_3 == 0)

Only the evaluation of c_3 will have the ctrl stmt as its context; the
others will be evaluated on their own statement, and thus neither a nor b
would pick up anything from the block, as they are evaluated and cached as
they are seen... unless of course we are doing a walk :-P

>
>> ranger uses it for range_on_edge (), because it knows all the statements
>> in the block have been executed, and it's safe to apply anything seen in
>> the block.  It does it right after range_on_exit () is called internally.
>>
>> Once upon a time, it was integrated with range_on_exit, but it turned out
>> there were certain times that it was causing problems.  There have been
>> some cleanups since then; it is probably safe now to return that call to
>> range_on_exit, but that doesn't buy us a whole lot by itself... except of
>> course I have now OKd using range_on_entry/exit generally :-)
>>
>> the cache also uses it when walking blocks to pick up inferred values
>> during an on-entry cache fill.
>>
>>
>>> int foo (int **p, int i)
>>> {
>>>   int *q = *p;
>>>   int res = *q + i;
>>>   if (q)
>>>     return res;
>>>   return -1;
>>> }
>>>
>>> which we "thread" with the path and with the above ICEs because
>>> fold_range doesn't get that if (q) is always true.  Without the
>> It's a known limitation that, unless you are doing a walk, on-demand
>> requests will "miss" some inferred ranges, because they are only
>> maintained at the basic block level.  (i.e. we will know that q is
>> non-null in BB2, but we don't know where, so we can make no assumptions
>> at the exit condition about q in this case.)  The path_query is invoked
>> in on-demand mode because it won't walk the entire IL, so the first time
>> you ask for the range of an ssa-name, it will quickly zip over the
>> immediate use list and "fill" the on-exit structure for any blocks in
>> which a non-null reference is seen.  This allows the threader to pick up
>> non-null from blocks outside the path that haven't been examined.
>>
>> VRP does a walk, and during the walk it adjusts ranges on the fly for
>> the current block via the gimple_ranger::register_inferred_ranges ()
>> routine, which is really just a wrapper around
>> ranger_cache::apply_inferred_ranges () (in gimple-range-cache.cc).
>>
>> This is called after every statement and is where we take care of the
>> bookkeeping for adjusting values, and adding them to the block's list.
>>
>> if the path query is walking those statements, it could also "adjust"
>> the range of q on the fly... but it has to have walked those statements.
>> From that routine, the relevant bits use the gimple-infer class to find
>> any inferred ranges from the statement, and would look something like:
>>
>>   gimple_infer_range infer (s);
>>   for (unsigned x = 0; x < infer.num (); x++)
>>     {
>>       tree name = infer.name (x);
>>       if (!interesting_p (name))
>>         continue;
>>       get_current_path_range (r, name);
>>       if (r.intersect (infer.range (x)))
>>         set_current_path_range (name, r);
>>     }
>>
>> That would adjust the value of q to be non-zero after "int res = *q + i;",
>> but you need to have walked the statement before you get to the condition.
>> As long as they are all in your list of interesting statements to look at,
>> then you'd be golden.
> Ah, so while the above might work it would go against the spirit of this
> being only fully useful if applied during a walk, adjusting the
> cached ranges?  It's still a bit unfortunate that one needs to walk all
> stmts for this - the use case wants to just walk the CFG and control stmts,
> and with the path query it would only cover the single BB of each
> control stmt (thus the above hack would probably work out).

I wouldn't say it's against the spirit, merely against trying to catch all
cases in a practical way :-).  If we add that, it should work for this
case.  For the threader, we will have visited all the uses of q to fill
the inferred block cache, so q will have a non-null reference in bb2, and
when calculating the final value, if we are indeed assuming anything in
the block can be applied to the final stmt, then it should be OK.  But
we'll still miss anything but the simplest cases.  Maybe that's OK...
better to catch a few cases than none.  Let me think about how, especially
for paths, we might be able to take care of calculating an entire complex
condition/expression in the context of the original call.

So, a longer-term goal was to perhaps have some sort of persistent ranger
across compatible passes: turn it into a pass property, and only dispose
of it when a pass makes enough IL changes to invalidate it.  In the
meantime, it should be possible to take a ranger that just completed a VRP
pass and use that as the root ranger for a threading pass immediately
after.  I think there may be some lingering issues with abnormal edges if
we "re-visit" blocks which we claim to have walked, due to the way I store
inferred ranges in those blocks (the expectation being we never go back up
into the block, so the on-entry cache works like the "current def" vector
in the original EVRP).  I'd have to think about that too.

>
> Meanwhile I'm leaning towards calling this a phase ordering issue
> of threading + VRP, but that also means we shouldn't deliberately
> try to preserve "threadings" of this kind - in fact we might want
> to explicitly reject them?

We are probably going to want to visit some pass ordering.

> Richard.

--- attachment: cont.diff ---

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index eb347eee45b..188acb2c308 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -128,6 +128,10 @@ gimple_ranger::range_of_expr (vrange &r, tree expr, gimple *stmt)
       // Otherwise OP comes from outside this block, use range on entry.
       else
 	range_on_entry (r, bb, expr);
+      // If the context is the final statement, it is safe to apply any
+      // known inferred ranges for the block.
+      if (is_ctrl_stmt (stmt))
+       m_cache.m_exit.maybe_adjust_range (r, expr, bb);
     }
   if (idx)
     tracer.trailer (idx, "range_of_expr", true, expr, r);
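For the int foo testcase quoted earlier in the thread, the effect of that
change is roughly the following (the SSA names here are invented for
illustration; the actual dump will differ):

  q_1 = *p_2(D);
  res_3 = *q_1 + i_4(D);   <- the dereference records an inferred ~[0, 0]
                              for q_1 in this block
  if (q_1 != 0)            <- the context stmt is a control stmt, so
                              range_of_expr applies that inferred range and
                              the condition now folds to [1, 1]

so fold_range can see that the exit edge is statically known without
consulting anything outside the block.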