From: Richard Sandiford
To: Aldy Hernandez
Cc: GCC patches
Subject: Re: [PATCH] Replace evrp use in loop versioning with ranger.
References: <20210724141937.2325339-1-aldyh@redhat.com>
Date: Fri, 30 Jul 2021 09:39:15 +0100
In-Reply-To: (Aldy Hernandez's message of "Tue, 27 Jul 2021 11:52:11 +0200")

Aldy Hernandez writes:
> On Mon, Jul 26, 2021 at 7:28 PM Richard Sandiford
> wrote:
>>
>> Aldy Hernandez writes:
>> > On Mon, Jul 26, 2021 at 4:18 PM Richard Sandiford
>> > wrote:
>> >>
>> >> Aldy Hernandez writes:
>> >> > This patch replaces the evrp_range_analyzer in the loop versioning code
>> >> > with an on-demand ranger.
>> >> >
>> >> > Everything was pretty straightforward, except that range_of_expr requires
>> >> > a gimple statement as context to provide context aware ranges. I didn't see
>> >> > a convenient place where the statement was saved, so I made a vector indexed
>> >> > by SSA names. As an alternative, I tried to use the loop's first statement,
>> >> > but that proved to be insufficient.
>> >>
>> >> The mapping is one-to-many though: there can be multiple statements
>> >> for each SSA name. Maybe that doesn't matter in this context and
>> >> any of the statements can act as a representative.
>> >>
>> >> I'm surprised that the loop's first statement didn't work though,
>> >> since the SSA name is supposedly known to be loop-invariant. What went
>> >> wrong when you tried that?
>> >
>> > I was looking at the first statement of loop_info->block_list and one
>> > of the dg.exp=loop-versioning* tests failed. Perhaps I should have
>> > used the loop itself, as in the attached patch. With this patch all
>> > of the loop-versioning tests pass.
>> >
>> >>
>> >> > I am not familiar with loop versioning, but if the DOM walk was only
>> >> > necessary for the calls to record_ranges_from_stmt, this too could be
>> >> > removed as the ranger will work without it.
>> >>
>> >> Yeah, that was the only reason. If the information is available at
>> >> version_for_unity (I guess it is) then we should just avoid recording
>> >> the versioning there if so.
>> >>
>> >> How expensive is the check? If the result is worth caching, perhaps
>> >> we should have two bitmaps: the existing one, and one that records
>> >> whether we've checked a particular SSA name.
>> >>
>> >> If the check is relatively cheap then that won't be worth it though.
>> >
>> > If you're asking about the range_of_expr check, that's all cached, so
>> > it should be pretty cheap. Besides, we're no longer calculating
>> > ranges for each statement in the IL, as we were doing in lv_dom_walker
>> > with evrp's record_ranges_from_stmt. Only statements of interest are
>> > queried.
>>
>> Sounds good. If the results are already cached then another level
>> of caching (via the second bitmap I mentioned above) would obviously
>> be a waste of time.
>
> My callgrind harness for performance testing wasn't able to pick up
> enough samples to measure the time spent in
> pass_loop_versioning::execute. I've seen this happen before with
> passes that run too fast. I'm afraid I don't have enough cycles to
> continue working on this.

Yeah, any testing of this was above and beyond IMO. Hearing that
the range query does its own caching was enough for me. :-)

>> > How about this patch, pending tests?
>>
>> OK, thanks, as a strict improvement over the status quo. But it'd be
>> even better without the dom walk :-)
>
> I've removed the DOM walk, and re-tested.
>
> OK to push?

Sorry for asking for another iteration, but…

> Aldy
>
> From 9b1cba95377e7b26b4f0495b1b5998d2f7f33a14 Mon Sep 17 00:00:00 2001
> From: Aldy Hernandez
> Date: Sat, 24 Jul 2021 12:29:28 +0200
> Subject: [PATCH] Replace evrp use in loop versioning with ranger.
>
> This patch replaces the evrp_range_analyzer in the loop versioning code
> with a ranger.
>
> Tested on x86-64 Linux.
>
> gcc/ChangeLog:
>
>         * gimple-loop-versioning.cc (lv_dom_walker::lv_dom_walker): Remove.
>         (loop_versioning::lv_dom_walker::before_dom_children): Remove.
>         (loop_versioning::lv_dom_walker::after_dom_children): Remove.
>         (loop_versioning::prune_loop_conditions): Replace vr_values use
>         with range_query interface.
>         (loop_versioning::prune_conditions): Replace dom walk with
>         straight iteration.
>         (pass_loop_versioning::execute): Use ranger.
> ---
>  gcc/gimple-loop-versioning.cc | 78 ++++++++---------------------------
>  1 file changed, 18 insertions(+), 60 deletions(-)
>
> diff --git a/gcc/gimple-loop-versioning.cc b/gcc/gimple-loop-versioning.cc
> index 4b70c5a4aab..52eb6429171 100644
> --- a/gcc/gimple-loop-versioning.cc
> +++ b/gcc/gimple-loop-versioning.cc
> @@ -30,19 +30,17 @@ along with GCC; see the file COPYING3. If not see
>  #include "tree-ssa-loop.h"
>  #include "ssa.h"
>  #include "tree-scalar-evolution.h"
> -#include "tree-chrec.h"
>  #include "tree-ssa-loop-ivopts.h"
>  #include "fold-const.h"
>  #include "tree-ssa-propagate.h"
>  #include "tree-inline.h"
>  #include "domwalk.h"
> -#include "alloc-pool.h"
> -#include "vr-values.h"
> -#include "gimple-ssa-evrp-analyze.h"
>  #include "tree-vectorizer.h"
>  #include "omp-general.h"
>  #include "predict.h"
>  #include "tree-into-ssa.h"
> +#include "gimple-range.h"
> +#include "tree-cfg.h"
>
>  namespace {
>
> @@ -253,24 +251,6 @@ public:
>    unsigned int run ();
>
>  private:
> -  /* Used to walk the dominator tree to find loop versioning conditions
> -     that are always false. */
> -  class lv_dom_walker : public dom_walker
> -  {
> -  public:
> -    lv_dom_walker (loop_versioning &);
> -
> -    edge before_dom_children (basic_block) FINAL OVERRIDE;
> -    void after_dom_children (basic_block) FINAL OVERRIDE;
> -
> -  private:
> -    /* The parent pass. */
> -    loop_versioning &m_lv;
> -
> -    /* Used to build context-dependent range information. */
> -    evrp_range_analyzer m_range_analyzer;
> -  };
> -
>    /* Used to simplify statements based on conditions that are established
>       by the version checks. */
>    class name_prop : public substitute_and_fold_engine
> @@ -308,7 +288,7 @@ private:
>    bool analyze_block (basic_block);
>    bool analyze_blocks ();
>
> -  void prune_loop_conditions (class loop *, vr_values *);
> +  void prune_loop_conditions (class loop *);
>    bool prune_conditions ();
>
>    void merge_loop_info (class loop *, class loop *);
> @@ -499,36 +479,6 @@ loop_info::worth_versioning_p () const
>            && (!bitmap_empty_p (&unity_names) || subloops_benefit_p));
>  }
>
> -loop_versioning::lv_dom_walker::lv_dom_walker (loop_versioning &lv)
> -  : dom_walker (CDI_DOMINATORS), m_lv (lv), m_range_analyzer (false)
> -{
> -}
> -
> -/* Process BB before processing the blocks it dominates. */
> -
> -edge
> -loop_versioning::lv_dom_walker::before_dom_children (basic_block bb)
> -{
> -  m_range_analyzer.enter (bb);
> -
> -  if (bb == bb->loop_father->header)
> -    m_lv.prune_loop_conditions (bb->loop_father, &m_range_analyzer);
> -
> -  for (gimple_stmt_iterator si = gsi_start_bb (bb); !gsi_end_p (si);
> -       gsi_next (&si))
> -    m_range_analyzer.record_ranges_from_stmt (gsi_stmt (si), false);
> -
> -  return NULL;
> -}
> -
> -/* Process BB after processing the blocks it dominates. */
> -
> -void
> -loop_versioning::lv_dom_walker::after_dom_children (basic_block bb)
> -{
> -  m_range_analyzer.leave (bb);
> -}
> -
>  /* Decide whether to replace VAL with a new value in a versioned loop.
>     Return the new value if so, otherwise return null. */
>
> @@ -1483,18 +1433,21 @@ loop_versioning::analyze_blocks ()
>     LOOP. */
>
>  void
> -loop_versioning::prune_loop_conditions (class loop *loop, vr_values *vrs)
> +loop_versioning::prune_loop_conditions (class loop *loop)
>  {
>    loop_info &li = get_loop_info (loop);
>
>    int to_remove = -1;
>    bitmap_iterator bi;
>    unsigned int i;
> +  int_range_max r;
>    EXECUTE_IF_SET_IN_BITMAP (&li.unity_names, 0, i, bi)
>      {
>        tree name = ssa_name (i);
> -      const value_range_equiv *vr = vrs->get_value_range (name);
> -      if (vr && !vr->may_contain_p (build_one_cst (TREE_TYPE (name))))
> +      gimple *stmt = first_stmt (loop->header);
> +
> +      if (get_range_query (cfun)->range_of_expr (r, name, stmt)
> +          && !r.contains_p (build_one_cst (TREE_TYPE (name))))
>          {
>            if (dump_enabled_p ())
>              dump_printf_loc (MSG_NOTE, find_loop_location (loop),
> @@ -1519,9 +1472,11 @@ loop_versioning::prune_conditions ()
>    AUTO_DUMP_SCOPE ("prune_loop_conditions",
>                     dump_user_location_t::from_function_decl (m_fn->decl));
>
> -  calculate_dominance_info (CDI_DOMINATORS);
> -  lv_dom_walker dom_walker (*this);
> -  dom_walker.walk (ENTRY_BLOCK_PTR_FOR_FN (m_fn));
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, m_fn)
> +    if (bb == bb->loop_father->header)
> +      prune_loop_conditions (bb->loop_father);

If we were going to keep pruning as a separate step, I think we should
iterate over loops rather than blocks.

However, what I meant by:

>> >> If the information is available at
>> >> version_for_unity (I guess it is) then we should just avoid recording
>> >> the versioning there if so.
is that we should instead put the get_range_query (cfun)->range_of_expr
and !r.contains_p test…

------------------------------------------------------------------------
void
loop_versioning::version_for_unity (gimple *stmt, tree name)
{
  class loop *loop = loop_containing_stmt (stmt);
  loop_info &li = get_loop_info (loop);

  …here

  if (bitmap_set_bit (&li.unity_names, SSA_NAME_VERSION (name)))
------------------------------------------------------------------------

and report that the value can't be 1 at that point. There would then
be no need for a separate pruning step.

Having this range information on tap makes the pass much simpler than
it used to be. :-)

FAOD, I think it would be good to keep using first_stmt (loop->header)
(as in your patch) rather than use the stmt argument to version_for_unity.

Thanks,
Richard

> +
>    return m_num_conditions != 0;
>  }
>
> @@ -1810,7 +1765,10 @@ pass_loop_versioning::execute (function *fn)
>    if (number_of_loops (fn) <= 1)
>      return 0;
>
> -  return loop_versioning (fn).run ();
> +  enable_ranger (fn);
> +  unsigned int ret = loop_versioning (fn).run ();
> +  disable_ranger (fn);
> +  return ret;
>  }
>
>  } // anon namespace
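
For illustration only: the suggestion above (test the range at the point
where a name would be recorded, instead of pruning in a separate step) can
be modelled with a toy, self-contained sketch. This is not GCC code.
toy_range, toy_loop_info and toy_version_for_unity are hypothetical
stand-ins for int_range_max, loop_info and
loop_versioning::version_for_unity, and the range is passed in directly
rather than obtained from a range query.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for a value range: the closed interval [lo, hi].
struct toy_range
{
  long lo, hi;

  bool contains_p (long v) const
  {
    assert (lo <= hi);
    return lo <= v && v <= hi;
  }
};

// Hypothetical stand-in for loop_info: the names we might version for 1.
struct toy_loop_info
{
  std::set<unsigned> unity_names;
};

// Record NAME as a versioning candidate unless R shows it can never be 1.
// Return true only if a new candidate was recorded.
bool
toy_version_for_unity (toy_loop_info &li, unsigned name, const toy_range &r)
{
  if (!r.contains_p (1))
    return false;  // The value can't be 1, so versioning would be pointless.
  return li.unity_names.insert (name).second;
}
```

In this model a name whose range is [2, 8] is never added to unity_names,
so nothing is left for a later pruning pass to remove.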