From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by sourceware.org (Postfix) with ESMTPS id 7F9AE3858C52 for ; Mon, 4 Apr 2022 06:28:15 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 7F9AE3858C52 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 569F4210DE; Mon, 4 Apr 2022 06:28:14 +0000 (UTC) Received: from murzim.suse.de (murzim.suse.de [10.160.4.192]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id 6D9C2A3B88; Mon, 4 Apr 2022 06:28:13 +0000 (UTC) Date: Mon, 4 Apr 2022 08:28:13 +0200 (CEST) From: Richard Biener To: Martin Jambor cc: GCC Patches , Jan Hubicka Subject: Re: [RFC] ipa-cp: Feed results of IPA-CP into SCCVN In-Reply-To: Message-ID: References: <4247o845-r8s1-n167-7n69-24roro37s2s@fhfr.qr> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Status: No, score=-10.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Apr 2022 06:28:17 -0000 On Fri, 1 Apr 2022, Martin Jambor wrote: > Hi, > > thanks for a very quick reply. > > On Fri, Apr 01 2022, Richard Biener wrote: > > On Fri, 1 Apr 2022, Martin Jambor wrote: > > > >> Hi, > >> > >> PRs 68930 and 92497 show that when IPA-CP figures out constants in > >> aggregate parameters or when passed by reference but the loads happen > >> in an inlined function the information is lost. This happens even > >> when the inlined function itself was known to have - or even cloned to > >> have - such constants in incoming parameters because the transform > >> phase of IPA passes is not run on them. See discussion in the bugs > >> for reasons why. > >> > >> Honza suggested that we can plug the results of IPA-CP analysis into > >> value numbering, so that FRE can figure out that some loads fetch > >> known constants. This is what this patch attempts to do. > >> > >> Although I spent quite some time reading tree-sccvn.c, it is complex > >> enough that I am sure I am not aware of various caveats and so I would > >> not be terribly surprised if there were some issues with my approach > >> that I am not aware of. Nevertheless, it seems to work well for simple > >> cases and even passes bootstrap and testing (and LTO bootstrap) on > >> x86_64-linux. > >> > >> I have experimented a little with using this approach instead of the > >> function walking parts of the IPA-CP transformation phase. This would > >> mean that the known-constants would not participate in the passes after > >> IPA but before FRE - which are not many but there is a ccp and fwprop > >> pass among others. For simple testcases like > >> gcc/testsuite/gcc.dg/ipa/ipcp-agg-*.c, it makes not assembly difference > >> at all. > >> > >> What do you think? > > > > Comments below > > > >> Martin > >> > >> > >> gcc/ChangeLog: > >> > >> 2022-03-30 Martin Jambor > >> > >> PR ipa/68930 > >> PR ipa/92497 > >> * ipa-prop.cc (ipcp_get_aggregate_const): New function. > >> (ipcp_transform_function): Do not deallocate transformation info. > >> * ipa-prop.h (ipcp_get_aggregate_const): Declare. > >> * tree-ssa-sccvn.cc: Include alloc-pool.h, symbol-summary.h and > >> ipa-prop.h. > >> (vn_reference_lookup_2): When hitting default-def vuse, query > >> IPA-CP transformation info for any known constants. > >> > >> gcc/testsuite/ChangeLog: > >> > >> 2022-03-30 Martin Jambor > >> > >> PR ipa/68930 > >> PR ipa/92497 > >> * gcc.dg/ipa/pr92497-1.c: New test. > >> * gcc.dg/ipa/pr92497-2.c: Likewise. > >> --- > >> gcc/ipa-prop.cc | 43 ++++++++++++++++++++++++---- > >> gcc/ipa-prop.h | 2 ++ > >> gcc/testsuite/gcc.dg/ipa/pr92497-1.c | 26 +++++++++++++++++ > >> gcc/testsuite/gcc.dg/ipa/pr92497-2.c | 26 +++++++++++++++++ > >> gcc/tree-ssa-sccvn.cc | 35 +++++++++++++++++++++- > >> 5 files changed, 126 insertions(+), 6 deletions(-) > >> create mode 100644 gcc/testsuite/gcc.dg/ipa/pr92497-1.c > >> create mode 100644 gcc/testsuite/gcc.dg/ipa/pr92497-2.c > >> > >> diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc > >> index e55fe2776f2..a73a5d9ec1d 100644 > >> --- a/gcc/ipa-prop.cc > >> +++ b/gcc/ipa-prop.cc > >> @@ -5748,6 +5748,44 @@ ipcp_modif_dom_walker::before_dom_children (basic_block bb) > >> return NULL; > >> } > >> > >> +/* If IPA-CP discovered a constant in parameter PARM at OFFSET of a given SIZE > >> + - whether passed by reference or not is given by BY_REF - return that > >> + constant. Otherwise return NULL_TREE. */ > >> + > >> +tree > >> +ipcp_get_aggregate_const (tree parm, bool by_ref, > >> + HOST_WIDE_INT offset, HOST_WIDE_INT size) > > > > I'd prefer to pass in the function decl or struct function or > > cgraph node. > > OK. > > > > >> +{ > >> + cgraph_node *cnode = cgraph_node::get (current_function_decl); > >> + > >> + ipa_agg_replacement_value *aggval = ipa_get_agg_replacements_for_node (cnode); > >> + if (!aggval) > >> + return NULL_TREE; > >> + > >> + int index = 0; > >> + for (tree p = DECL_ARGUMENTS (current_function_decl); > >> + p != parm; p = DECL_CHAIN (p)) > >> + { > >> + index++; > >> + if (!p) > >> + return NULL_TREE; > >> + } > >> + > >> + ipa_agg_replacement_value *v; > >> + for (v = aggval; v; v = v->next) > >> + if (v->index == index > >> + && v->offset == offset) > >> + break; > >> + if (!v > >> + || v->by_ref != by_ref > >> + || maybe_ne (tree_to_poly_int64 (TYPE_SIZE (TREE_TYPE (v->value))), > >> + size)) > >> + return NULL_TREE; > > > > two linear searches here - ugh. I wonder if we should instead > > pre-fill a hash-map from PARM_DECL to a ipa_agg_replacement_value * > > vector sorted by offset which we can binary search? That could be > > done once when starting value-numbering (not on regions). Is > > there any reason the data structure is as it is? > > Only that it is usually a very short list. It is bounded by > param_ipa_cp_value_list_size (8 by default) times the number of > arguments and of course only few usually have any constants in them. > > Having said that, changing the structure is something I am looking into > also for other reasons and I am very much opened to not being so lax > about the worst case. > > > It seems that > > even ipcp_modif_dom_walker::before_dom_children will do a linear > > walk and ipa_get_param_decl_index_1 also linearly walks parameters. > > > > That looks highly sub-optimal to me, it's also done for each > > mention of a parameter. > > > >> + return v->value; > >> +} > >> + > >> + > >> /* Return true if we have recorded VALUE and MASK about PARM. > >> Set VALUE and MASk accordingly. */ > >> > > [...] > > >> diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc > >> index 66b4fc21f5b..da558055054 100644 > >> --- a/gcc/tree-ssa-sccvn.cc > >> +++ b/gcc/tree-ssa-sccvn.cc > >> @@ -74,6 +74,9 @@ along with GCC; see the file COPYING3. If not see > >> #include "ipa-modref-tree.h" > >> #include "ipa-modref.h" > >> #include "tree-ssa-sccvn.h" > >> +#include "alloc-pool.h" > >> +#include "symbol-summary.h" > >> +#include "ipa-prop.h" > >> > >> /* This algorithm is based on the SCC algorithm presented by Keith > >> Cooper and L. Taylor Simpson in "SCC-Based Value numbering" > >> @@ -2273,7 +2276,7 @@ vn_walk_cb_data::push_partial_def (pd_data pd, > >> with the current VUSE and performs the expression lookup. */ > >> > >> static void * > >> -vn_reference_lookup_2 (ao_ref *op ATTRIBUTE_UNUSED, tree vuse, void *data_) > >> +vn_reference_lookup_2 (ao_ref *op, tree vuse, void *data_) > >> { > >> vn_walk_cb_data *data = (vn_walk_cb_data *)data_; > >> vn_reference_t vr = data->vr; > >> @@ -2307,6 +2310,36 @@ vn_reference_lookup_2 (ao_ref *op ATTRIBUTE_UNUSED, tree vuse, void *data_) > >> return *slot; > >> } > >> > >> + if (SSA_NAME_IS_DEFAULT_DEF (vuse)) > >> + { > >> + HOST_WIDE_INT offset, size; > >> + tree v = NULL_TREE; > >> + if (op->base && TREE_CODE (op->base) == PARM_DECL > >> + && op->offset.is_constant (&offset) > >> + && op->size.is_constant (&size)) > > > > you probably also want to check > > > > op->max_size_known_p () > > && known_eq (op->size, op->max_size) > > OK, thanks. > > > > > I see data->partial_defs.is_empty () is already checked in this > > function, but we won't currently ever call vn_reference_lookup_3 > > for default defs. > > > > That said, ideally we'd handle also partial defs, but we can > > start without. > > I'll need to look into that a bit more but yes, sure! > > > > >> + v = ipcp_get_aggregate_const (op->base, false, offset, size); > >> + else if (op->ref) > >> + { > >> + HOST_WIDE_INT offset, size; > >> + bool reverse; > >> + tree base = get_ref_base_and_extent_hwi (op->ref, &offset, > >> + &size, &reverse); > >> + if (base && TREE_CODE (base) == PARM_DECL) > > > > In which case do you arrive here but op->base is not set above? > > Maybe you need to use ao_ref_base (op->base) since op->base is > > filled lazily from op->ref. > > Right, this is something I wanted to ask in my email but forgot. I > believe the answer is never, I just was not sure. I'll delete this > condition. > > > > >> + v = ipcp_get_aggregate_const (base, false, offset, size); > >> + else if (base > >> + && TREE_CODE (base) == MEM_REF > >> + && integer_zerop (TREE_OPERAND (base, 1)) > >> + && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME > >> + && SSA_NAME_IS_DEFAULT_DEF (TREE_OPERAND (base, 0)) > >> + && (TREE_CODE (SSA_NAME_VAR (TREE_OPERAND (base, 0))) > >> + == PARM_DECL)) > > > > This code should also be in the op->base part. > > I'll remove that branch. > > > > > I'd prefer if you pass DECL_BY_REFERENCE rather than second-guessing it > > (or assert you guessed right). > > I am not sure I understand but by_ref in IPA-CP really means the > parameter is an explicit pointer (rather than DECL_BY_REFERENCE) and the > constants correspond to MEM_REFs through that pointer (as opposed to > direct loads from the PARM_DECL itself). That the by_ref corresponds to > what IPA-CP arrived at is checked in the linear look-up, so in the case > of some LTO-induced type mismatch, no constant will be found. Does that > make sense? I see, so are we then not possibly confused by void foo (struct X *p) { struct X *q = p; bar (&p, q); } that is, when 'p' has its address taken we will not have p as SSA name but on the caller side it's going to be a pointer. In foo above we're then seeing a load from 'p' but not from '*p'. How do you avoid taking the [0, pointer-size] access as an access to *p? Given that you simply "strip" the SSA name and go for the PARM_DECL? I thought you simply excluded this case by means of DECL_BY_REFERENCE which would mean you cannot take the address of the pointer in the callee since the pointer is not visible in the source. Richard. > Thanks again for the quick feedback. I'll address the concerns but am > very happy that the overall approach seems feasible. > > Martin > > > > > >> + v = ipcp_get_aggregate_const (SSA_NAME_VAR (TREE_OPERAND (base, 0)), > >> + true, offset, size); > >> + } > >> + if (v) > >> + return data->finish (vr->set, vr->base_set, v); > >> + } > >> + > >> return NULL; > >> } > >> > >> > > > > -- > > Richard Biener > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, > > Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg) > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)