From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out2.suse.de (smtp-out2.suse.de [IPv6:2001:67c:2178:6::1d]) by sourceware.org (Postfix) with ESMTPS id 613FC3858C39 for ; Fri, 13 Jan 2023 17:49:59 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 613FC3858C39 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=suse.cz Authentication-Results: sourceware.org; spf=fail smtp.mailfrom=suse.cz Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id C62A820110; Fri, 13 Jan 2023 17:49:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1673632196; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=DOdAigFQugWzPE96q9zUh5iMkdp3qhJ5icLiBo+x4ak=; b=PxeAM4i31swCRc4H3ltOeKr8i2a5GkvlBLKZP3ryOhJoGHggpkcGKyseh1H33FTCKvfhzX 4puERN9SEN6ROn1rPQHJbvPhA3OfNQq3NlVBb0YDFDJzewuesYYoGTJfZvEDPXvaLXZOrl tyPLsILOC8OSGEEIJMfP2jl6/vR53No= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1673632196; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=DOdAigFQugWzPE96q9zUh5iMkdp3qhJ5icLiBo+x4ak=; b=s3WoeSJH63xzpcgH86g4Yc0Z8ZjKYzPHaG6rlVCnJWEw2eRSTP7An/xKQhpvvt/rLtqkLp XBMrrEKlORMOrjCw== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id B96D213913; Fri, 13 Jan 2023 17:49:56 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id 0AU7LcSZwWMffAAAMHmgww (envelope-from ); Fri, 13 Jan 2023 17:49:56 +0000 From: Martin Jambor To: Manolis Tsamis , gcc-patches@gcc.gnu.org Cc: Richard Biener , Philipp Tomsich , Jan Hubicka , Christoph Muellner , Manolis Tsamis Subject: Re: [PATCH v2] ipa-cp: Speculatively call specialized functions In-Reply-To: <20221216161054.3663182-1-manolis.tsamis@vrull.eu> References: <20221216161054.3663182-1-manolis.tsamis@vrull.eu> User-Agent: Notmuch/0.37 (https://notmuchmail.org) Emacs/28.2 (x86_64-suse-linux-gnu) Date: Fri, 13 Jan 2023 18:49:56 +0100 Message-ID: MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-11.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_SOFTFAIL,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hello, sorry for getting to this quite late. I have only had a quick glance at ipa-cp.cc hunks so far. On Fri, Dec 16 2022, Manolis Tsamis wrote: > The IPA CP pass offers a wide range of optimizations, where most of them > lead to specialized functions that are called from a call site. > This can lead to multiple specialized function clones, if more than > one call-site allows such an optimization. > If not all call-sites can be optimized, the program might end > up with call-sites to the original function. > > This pass assumes that non-optimized call-sites (i.e. call-sites > that don't call specialized functions) are likely to be called > with arguments that would allow calling specialized clones. > Since we cannot guarantee this (for obvious reasons), we can't > replace the existing calls. However, we can introduce dynamic > guards that test the arguments for the collected constants > and calls the specialized function if there is a match. > > To demonstrate the effect, let's consider the following program part: > > func_1() > myfunc(1) > func_2() > myfunc(2) > func_i(i) > myfunc(i) > > In this case the transformation would do the following: > > func_1() > myfunc.constprop.1() // myfunc() with arg0 == 1 > func_2() > myfunc.constprop.2() // myfunc() with arg0 == 2 > func_i(i) > if (i == 1) > myfunc.constprop.1() // myfunc() with arg0 == 1 > else if (i == 2) > myfunc.constprop.2() // myfunc() with arg0 == 2 > else > myfunc(i) My understanding of the code, however, is that it rather creates func_i(i) if (i == 1) myfunc.constprop.1_1() // mostly equivalent but separate from myfunc.constprop.1 else if (i == 2) myfunc.constprop.2_1() // mostly equivalent but separate from myfunc.constprop.2 else myfunc(i) Which I find difficult to justify. From comments it looked like the reason is avoiding calling find_more_scalar_values, is that correct? I'd like to know more about the cases you are targeting and cases where adding the additional known scalar constants were an issue. I think it needs to be tackled differently. By the way, as IPA-CP works now (it would be nice but difficult to lift that limitation), all but up to one constant in known_csts are constants in all call contexts, so without calling find_more_scalar_values you should need just one run-time condition per speculative call. So tracking which constant is which might be better than avoiding find_more_scalar_values? Also growth limits in ipa-cp are not updated appropriately. Some more comments inline: > > The pass consists of two main parts: > * collecting all specialized functions and the argument/constant pair(s) > * insertion of the guards during materialization > > The patch integrates well into ipa-cp and related IPA functionality. > Given the nature of IPA, the changes are touching many IPA-related > files as well as call-graph data structures. > > The impact of the dynamic guard is expected to be less than the speedup > gained by enabled optimizations (e.g. inlining or constant propagation). > > gcc/Changelog: > > * cgraph.cc (cgraph_add_edge_to_call_site_hash): Add support for guarded specialized edges. > (cgraph_edge::set_call_stmt): Likewise. > (symbol_table::create_edge): Likewise. > (cgraph_edge::remove): Likewise. > (cgraph_edge::make_speculative): Likewise. > (cgraph_edge::make_specialized): Likewise. > (cgraph_edge::remove_specializations): Likewise. > (cgraph_edge::redirect_call_stmt_to_callee): Likewise. > (cgraph_edge::dump_edge_flags): Likewise. > (verify_speculative_call): Likewise. > (verify_specialized_call): Likewise. > (cgraph_node::verify_node): Likewise. > * cgraph.h (class GTY): Add new class that contains info of specialized edges. > * cgraphclones.cc (cgraph_edge::clone): Add support for guarded specialized edges. > (cgraph_node::set_call_stmt_including_clones): Likewise. > * ipa-cp.cc (want_remove_some_param_p): Likewise. > (create_specialized_node): Likewise. > (add_specialized_edges): Likewise. > (ipcp_driver): Likewise. > * ipa-fnsummary.cc (redirect_to_unreachable): Likewise. > (ipa_fn_summary_t::duplicate): Likewise. > (analyze_function_body): Likewise. > (estimate_edge_size_and_time): Likewise. > (remap_edge_summaries): Likewise. > * ipa-inline-transform.cc (inline_transform): Likewise. > * ipa-inline.cc (edge_badness): Likewise. > lto-cgraph.cc (lto_output_edge): Likewise. > (input_edge): Likewise. > * tree-inline.cc (copy_bb): Likewise. > * value-prof.cc (gimple_sc): Add function to create guarded specializations. > * value-prof.h (gimple_sc): Likewise. Please also include test-cases. > > Signed-off-by: Manolis Tsamis > > --- > [...] > diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc > index cc031ebed0f..31d01ada928 100644 > --- a/gcc/ipa-cp.cc > +++ b/gcc/ipa-cp.cc > @@ -119,6 +119,7 @@ along with GCC; see the file COPYING3. If not see > #include "symbol-summary.h" > #include "tree-vrp.h" > #include "ipa-prop.h" > +#include "gimple-pretty-print.h" > #include "tree-pretty-print.h" > #include "tree-inline.h" > #include "ipa-fnsummary.h" > @@ -5239,16 +5240,20 @@ want_remove_some_param_p (cgraph_node *node, vec known_csts) > return false; > } > > +static hash_map> *available_specializations; > + > /* Create a specialized version of NODE with known constants in KNOWN_CSTS, > known contexts in KNOWN_CONTEXTS and known aggregate values in AGGVALS and > - redirect all edges in CALLERS to it. */ > + redirect all edges in CALLERS to it. If IS_SPECULATIVE is true then this > + node is created to be part of a guarded specialization edge. */ > > static struct cgraph_node * > create_specialized_node (struct cgraph_node *node, > vec known_csts, > vec known_contexts, > vec *aggvals, > - vec &callers) > + vec &callers, > + bool is_speculative) > { > ipa_node_params *new_info, *info = ipa_node_params_sum->get (node); > vec *replace_trees = NULL; > @@ -5383,7 +5388,7 @@ create_specialized_node (struct cgraph_node *node, > for (const ipa_argagg_value &av : aggvals) > new_node->maybe_create_reference (av.value, NULL); > > - if (dump_file && (dump_flags & TDF_DETAILS)) > + if (dump_file && (dump_flags & TDF_DETAILS) && !is_speculative) > { > fprintf (dump_file, " the new node is %s.\n", new_node->dump_name ()); > if (known_contexts.exists ()) > @@ -5409,6 +5414,13 @@ create_specialized_node (struct cgraph_node *node, > new_info->known_csts = known_csts; > new_info->known_contexts = known_contexts; > > + if (is_speculative && !info->ipcp_orig_node) What is the reason for testing !info->ipcp_orig_node node here? > + { > + vec &spec_nodes > + = available_specializations->get_or_insert (node); > + spec_nodes.safe_push (new_node); > + } > + > ipcp_discover_new_direct_edges (new_node, known_csts, known_contexts, > aggvals); > > @@ -6104,6 +6116,21 @@ decide_about_value (struct cgraph_node *node, int index, HOST_WIDE_INT offset, > known_csts = avals->m_known_vals.copy (); > known_contexts = copy_useful_known_contexts (avals->m_known_contexts); > } > + > + /* If guarded specialization is enabled then we create an additional > + clone with KNOWN_CSTS and no known contexts or aggregates. > + We don't want find_more_scalar_values because adding more constants > + instreases the complexity of the guard and reduces the chance > + that it is used. */ > + if (flag_ipa_guarded_specialization && !val->self_recursion_generated_p ()) > + { > + vec no_callers = vNULL; > + cgraph_node *guarded_spec_node > + = create_specialized_node (node, known_csts.copy (), vNULL, > + NULL, no_callers, true); It looks like that if the value being considered is an aggregate value (offset is non-negative) or polymorphic_context (yeah, the whole function is a template), neither of which is recorded in known_csts, you'll end up creating a clone with no specialization at all (other than that for all direct calls). > + update_profiling_info (node, guarded_spec_node); I must say I don't know what is the best way to distribute profiling counts in the transformation you propose, but this is not going to do the right thing. update_profiling_info tries to divide the counts proportionally depending on sum of counts of calls to the original and the clone and since the clone has no callers at this point, it will become quite cold. > + } > + > find_more_scalar_values_for_callers_subset (node, known_csts, callers); > find_more_contexts_for_caller_subset (node, &known_contexts, callers); > vec *aggvals > @@ -6111,7 +6138,7 @@ decide_about_value (struct cgraph_node *node, int index, HOST_WIDE_INT offset, > gcc_checking_assert (ipcp_val_agg_replacement_ok_p (aggvals, index, > offset, val->value)); > val->spec_node = create_specialized_node (node, known_csts, known_contexts, > - aggvals, callers); > + aggvals, callers, false); > > if (val->self_recursion_generated_p ()) > self_gen_clones->safe_push (val->spec_node); > @@ -6270,7 +6297,7 @@ decide_whether_version_node (struct cgraph_node *node) > known_contexts = vNULL; > } > clone = create_specialized_node (node, known_csts, known_contexts, > - aggvals, callers); > + aggvals, callers, false); > info->do_clone_for_all_contexts = false; > ipa_node_params_sum->get (clone)->is_all_contexts_clone = true; > ret = true; > @@ -6546,6 +6573,135 @@ ipcp_store_vr_results (void) > } > } > > +/* Add new edges to the call graph to represent the available specializations > + of each specialized function. */ > +static void > +add_specialized_edges (void) > +{ > + cgraph_edge *e; > + cgraph_node *n, *spec_n; > + tree known_cst; > + unsigned i, j; > + > + FOR_EACH_DEFINED_FUNCTION (n) > + { > + if (dump_file && n->callees) > + fprintf (dump_file, > + "Procesing function %s for specialization of edges.\n", > + n->dump_name ()); > + > + if (n->ipcp_clone) > + continue; > + > + bool update = false; > + for (e = n->callees; e; e = e->next_callee) > + { > + if (!e->callee || e->recursive_p ()) > + continue; > + > + vec *specialization_nodes > + = available_specializations->get (e->callee); > + > + /* Even if the calle is a specialized node it is still valid to > + further create guarded specializations based on the original node. > + If the existing specialized node doesn't have any known constants > + then it is probably profitable to specialize further. */ So you are saying thar scalar constants specializations are always better than aggregate or polymorphic_context ones? IMHO this should be at least driven by some heuristics like the number that good_cloning_opportunity_p uses. > + if (e->callee->ipcp_clone && !specialization_nodes) > + { > + ipa_node_params *info > + = ipa_node_params_sum->get (e->callee); > + gcc_checking_assert (info->ipcp_orig_node); > + > + bool has_known_constant = false; > + FOR_EACH_VEC_ELT (info->known_csts, i, known_cst) > + if (known_cst != NULL_TREE) > + { > + has_known_constant = true; > + break; > + } > + > + if (!has_known_constant) > + specialization_nodes > + = available_specializations->get (info->ipcp_orig_node); > + } > + > + if (!specialization_nodes) > + continue; > + > + unsigned num_of_specializations = 0; > + unsigned max_num_of_specializations = opt_for_fn (n->decl, > + param_ipa_spec_max_per_edge); > + > + FOR_EACH_VEC_ELT (*specialization_nodes, i, spec_n) > + { > + if (dump_file) > + fprintf (dump_file, > + "Edge has available specialization %s.\n", > + spec_n->dump_name ()); > + > + ipa_node_params *spec_params = ipa_node_params_sum->get (spec_n); > + vec replaced_args = vNULL; > + bool failed = false; > + > + FOR_EACH_VEC_ELT (spec_params->known_csts, j, known_cst) As I wrote before, I think you are testing also constants which we know are there. The idea is interesting, thanks for exploring these options. As I said, knowing a bit more about what motivated you might help us to reason about it. Martin