* [PATCH 5/6] ipa-cp: Add dumping of overall_size after cloning
2020-09-29 18:12 [PATCH 0/6] IPA cleanups and IPA-CP improvements for 548.exchange2_r Martin Jambor
@ 2020-09-21 14:25 ` Martin Jambor
2020-09-29 18:39 ` Jan Hubicka
2020-09-21 14:25 ` [PATCH 4/6] ipa: Multiple predicates for loop properties, with frequencies Martin Jambor
` (4 subsequent siblings)
5 siblings, 1 reply; 18+ messages in thread
From: Martin Jambor @ 2020-09-21 14:25 UTC
To: GCC Patches; +Cc: Jan Hubicka
When experimenting with IPA-CP parameters, especially when looking
into exchange2_r, it has been very useful to know what the value of
overall_size is at different stages of the decision process. This
patch therefore adds it to the generated dumps.
gcc/ChangeLog:
2020-09-07 Martin Jambor <mjambor@suse.cz>
* ipa-cp.c (estimate_local_effects): Add overall_size to dumped
string.
(decide_about_value): Add dumping new overall_size.
---
gcc/ipa-cp.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index f6320c787de..12acf24c553 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -3517,7 +3517,8 @@ estimate_local_effects (struct cgraph_node *node)
if (dump_file)
fprintf (dump_file, " Decided to specialize for all "
- "known contexts, growth deemed beneficial.\n");
+ "known contexts, growth (to %li) deemed "
+ "beneficial.\n", overall_size);
}
else if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, " Not cloning for all contexts because "
@@ -5506,6 +5507,9 @@ decide_about_value (struct cgraph_node *node, int index, HOST_WIDE_INT offset,
val->spec_node = create_specialized_node (node, known_csts, known_contexts,
aggvals, callers);
overall_size += val->local_size_cost;
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, " overall size reached %li\n",
+ overall_size);
/* TODO: If for some lattice there is only one other known value
left, make a special node for it too. */
--
2.28.0
* [PATCH 6/6] ipa-cp: Separate and increase the large-unit parameter
2020-09-29 18:12 [PATCH 0/6] IPA cleanups and IPA-CP improvements for 548.exchange2_r Martin Jambor
2020-09-21 14:25 ` [PATCH 5/6] ipa-cp: Add dumping of overall_size after cloning Martin Jambor
2020-09-21 14:25 ` [PATCH 4/6] ipa: Multiple predicates for loop properties, with frequencies Martin Jambor
@ 2020-09-21 14:25 ` Martin Jambor
2020-09-29 19:30 ` Jan Hubicka
2020-10-26 11:00 ` Tamar Christina
2020-09-28 18:47 ` [PATCH 1/6] ipa: Bundle vectors describing argument values Martin Jambor
` (2 subsequent siblings)
5 siblings, 2 replies; 18+ messages in thread
From: Martin Jambor @ 2020-09-21 14:25 UTC
To: GCC Patches; +Cc: Jan Hubicka
A previous patch in the series has taught IPA-CP to identify the
important cloning opportunities in 548.exchange2_r as worthwhile on
their own, but the optimization is still prevented from taking place
because of the overall unit-growth limit. This patch raises that
limit so that it takes place and the benchmark runs 30% faster (on an AMD
Zen2 CPU at least).
Before this patch, IPA-CP uses the following formulae to arrive at the
overall_size limit:
  base = MAX(orig_size, param_large_unit_insns)
  unit_growth_limit = base + base * param_ipa_cp_unit_growth / 100
where param_ipa_cp_unit_growth has a default value of 10 and
param_large_unit_insns has a default value of 10000.
The problem with exchange2 (at least on zen2 but I have had a quick
look on aarch64 too) is that the original estimated unit size is 10513
and so param_large_unit_insns does not apply and the default limit is
therefore 11564, which is good enough for only one of the ideal 8
clonings; we need the limit to be at least 16291.
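For illustration, the arithmetic above can be reproduced with a small
standalone sketch. The names unit_growth_limit and min_growth_percent are
made up for this example and are not GCC functions; the sketch follows the
formula exactly as written in this message (integer arithmetic), and the
value 16000 is the Init value of the new parameter in the params.opt hunk
of this patch.

```cpp
#include <algorithm>
#include <cassert>

// Unit-growth limit following the formula quoted in the message:
//   base  = MAX (orig_size, large_unit_insns)
//   limit = base + base * unit_growth_percent / 100
// Hypothetical helper for illustration only, not the GCC implementation.
long unit_growth_limit (long orig_size, long large_unit_insns,
                        long unit_growth_percent)
{
  assert (orig_size >= 0 && large_unit_insns >= 0
          && unit_growth_percent >= 0);
  long base = std::max (orig_size, large_unit_insns);
  // Integer division, so fractions of an "instruction" are dropped.
  return base + base * unit_growth_percent / 100;
}

// Smallest growth percentage for which the limit reaches NEEDED.
// Hypothetical helper; requires a positive base so the loop terminates.
long min_growth_percent (long orig_size, long large_unit_insns, long needed)
{
  assert (std::max (orig_size, large_unit_insns) > 0);
  long percent = 0;
  while (unit_growth_limit (orig_size, large_unit_insns, percent) < needed)
    percent++;
  return percent;
}
```

With the numbers quoted above, unit_growth_limit (10513, 10000, 10) gives
11564, and min_growth_percent (10513, 10000, 16291) gives 55. With a
large-unit value of 16000 and the default 10% growth, the limit becomes
17600, which is above the required 16291.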
I would like to raise param_ipa_cp_unit_growth a little bit more soon
too, but most certainly not to 55. Therefore, the large_unit must be
increased. In this patch, I decided to decouple the inlining and
ipa-cp large-unit parameters. It also makes sense because IPA-CP uses
it only at -O3 while inlining also uses it at -O2 (IIUC). But if we agree
that we can try raising param_large_unit_insns to 13-14 thousand
"instructions," perhaps the decoupling is not necessary. But then again,
it may make sense to actually increase the IPA-CP limit further.
I plan to experiment with IPA-CP tuning on a larger set of programs.
Meanwhile, mainly to address the 548.exchange2_r regression, I'm
suggesting this simple change.
gcc/ChangeLog:
2020-09-07 Martin Jambor <mjambor@suse.cz>
* params.opt (ipa-cp-large-unit-insns): New parameter.
* ipa-cp.c (get_max_overall_size): Use the new parameter.
---
gcc/ipa-cp.c | 2 +-
gcc/params.opt | 4 ++++
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 12acf24c553..2152f9e5876 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -3448,7 +3448,7 @@ static long
get_max_overall_size (cgraph_node *node)
{
long max_new_size = orig_overall_size;
- long large_unit = opt_for_fn (node->decl, param_large_unit_insns);
+ long large_unit = opt_for_fn (node->decl, param_ipa_cp_large_unit_insns);
if (max_new_size < large_unit)
max_new_size = large_unit;
int unit_growth = opt_for_fn (node->decl, param_ipa_cp_unit_growth);
diff --git a/gcc/params.opt b/gcc/params.opt
index acb59f17e45..9d177ab50ad 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -218,6 +218,10 @@ Percentage penalty functions containing a single call to another function will r
Common Joined UInteger Var(param_ipa_cp_unit_growth) Init(10) Param Optimization
How much can given compilation unit grow because of the interprocedural constant propagation (in percent).
+-param=ipa-cp-large-unit-insns=
+Common Joined UInteger Var(param_ipa_cp_large_unit_insns) Optimization Init(16000) Param
+The size of translation unit that IPA-CP pass considers large.
+
-param=ipa-cp-value-list-size=
Common Joined UInteger Var(param_ipa_cp_value_list_size) Init(8) Param Optimization
Maximum size of a list of values associated with each parameter for interprocedural constant propagation.
--
2.28.0
* [PATCH 4/6] ipa: Multiple predicates for loop properties, with frequencies
2020-09-29 18:12 [PATCH 0/6] IPA cleanups and IPA-CP improvements for 548.exchange2_r Martin Jambor
2020-09-21 14:25 ` [PATCH 5/6] ipa-cp: Add dumping of overall_size after cloning Martin Jambor
@ 2020-09-21 14:25 ` Martin Jambor
2020-09-29 22:18 ` Jan Hubicka
2020-09-21 14:25 ` [PATCH 6/6] ipa-cp: Separate and increase the large-unit parameter Martin Jambor
` (3 subsequent siblings)
5 siblings, 1 reply; 18+ messages in thread
From: Martin Jambor @ 2020-09-21 14:25 UTC
To: GCC Patches; +Cc: Jan Hubicka
This patch enhances the ability of IPA to reason about the conditions
under which loops in a function have known iteration counts or strides.
It replaces the single predicates, which currently hold a conjunction of
predicates for all loops, with vectors capable of holding multiple
predicates, each with a cumulative frequency of the loops with that
property.
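The bookkeeping described above can be sketched roughly as follows. This
is a simplified standalone model, not the code from the patch: predicates
are represented by plain strings and frequencies by double instead of
sreal, and the names freq_pred and add_freq_pred are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One predicate together with the cumulative frequency of the loops it
// describes.  Simplified stand-in for illustration only.
struct freq_pred
{
  std::string predicate;
  double freq;
};

// If PRED is already in V, add ADD_FREQ to its frequency.  Otherwise
// append a new entry, unless V already holds MAX_PREDS entries, in which
// case nothing happens.  Trivially true or false predicates carry no
// information and are ignored.
void add_freq_pred (std::vector<freq_pred> &v, const std::string &pred,
                    double add_freq, std::size_t max_preds)
{
  assert (add_freq >= 0);
  if (pred == "true" || pred == "false")
    return;
  for (freq_pred &f : v)
    if (f.predicate == pred)
      {
        f.freq += add_freq;
        return;
      }
  if (v.size () >= max_preds)
    return;  // Too many different predicates to account for.
  v.push_back ({pred, add_freq});
}
```

Two loops guarded by the same predicate therefore end up as a single
entry whose frequency is the sum of both, while a cap on the number of
distinct predicates bounds the size of the summary.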
This frequency is then used by IPA-CP to boost much more aggressively
its heuristic score for cloning opportunities which make the iteration
counts or strides of frequent loops compile-time constants.
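How the cumulative frequencies feed into the score could look roughly
like this. It is a hedged sketch modelled on the hint_time_bonus hunk of
this patch, not a copy of it: the function name loop_hint_bonus and its
plain bool/double parameters are hypothetical, and truncating the product
to int is an assumption about how the sreal result is converted.

```cpp
#include <cassert>

// Bonus added to the cloning score for loop hints.  BONUS_FOR_ONE plays
// the role of param_ipa_cp_loop_hint_bonus; the two frequencies are the
// summed frequencies of loops whose iteration count or stride becomes
// known in the evaluated context.  Illustrative model only.
int loop_hint_bonus (bool iterations_hint, bool stride_hint,
                     double known_iterations_freq,
                     double known_strides_freq, int bonus_for_one)
{
  assert (known_iterations_freq >= 0 && known_strides_freq >= 0);
  int result = 0;
  // Flat bonus that exists independently of the frequencies.
  if (iterations_hint || stride_hint)
    result += bonus_for_one;
  // Frequency-weighted bonuses: frequent loops count for more.
  if (iterations_hint)
    result += static_cast<int> (known_iterations_freq * bonus_for_one);
  if (stride_hint)
    result += static_cast<int> (known_strides_freq * bonus_for_one);
  return result;
}
```

For example, with a per-loop bonus of 64 (a value picked for this
example), a context that makes the iteration count known for loops
entered about 100 times per call would receive 64 + 6400, whereas the
flat bonus alone would stay at 64 no matter how hot those loops are.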
gcc/ChangeLog:
2020-09-03 Martin Jambor <mjambor@suse.cz>
* ipa-fnsummary.h (ipa_freqcounting_predicate): New type.
(ipa_fn_summary): Change the type of loop_iterations and loop_strides
to vectors of ipa_freqcounting_predicate.
(ipa_fn_summary::ipa_fn_summary): Construct the new vectors.
(ipa_call_estimates): New fields loops_with_known_iterations and
loops_with_known_strides.
* ipa-cp.c (hint_time_bonus): Multiply param_ipa_cp_loop_hint_bonus
with the expected frequencies of loops with known iteration count or
stride.
* ipa-fnsummary.c (add_freqcounting_predicate): New function.
(ipa_fn_summary::~ipa_fn_summary): Release the new vectors instead of
just two predicates.
(remap_hint_predicate_after_duplication): Replace with function
remap_freqcounting_preds_after_dup.
(ipa_fn_summary_t::duplicate): Use it or duplicate new vectors.
(ipa_dump_fn_summary): Dump the new vectors.
(analyze_function_body): Compute the loop property vectors.
(ipa_call_context::estimate_size_and_time): Calculate also
loops_with_known_iterations and loops_with_known_strides. Adjusted
dumping accordingly.
(remap_hint_predicate): Replace with function
remap_freqcounting_predicate.
(ipa_merge_fn_summary_after_inlining): Use it.
(inline_read_section): Stream loopcounting vectors instead of two
simple predicates.
(ipa_fn_summary_write): Likewise.
* params.opt (ipa-max-loop-predicates): New parameter.
* doc/invoke.texi (ipa-max-loop-predicates): Document new param.
gcc/testsuite/ChangeLog:
2020-09-03 Martin Jambor <mjambor@suse.cz>
* gcc.dg/ipa/ipcp-loophint-1.c: New test.
---
gcc/doc/invoke.texi | 4 +
gcc/ipa-cp.c | 9 +
gcc/ipa-fnsummary.c | 318 ++++++++++++++-------
gcc/ipa-fnsummary.h | 38 ++-
gcc/params.opt | 4 +
gcc/testsuite/gcc.dg/ipa/ipcp-loophint-1.c | 29 ++
6 files changed, 288 insertions(+), 114 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/ipa/ipcp-loophint-1.c
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 226b0e1dc91..829598228ac 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13433,6 +13433,10 @@ of iterations of a loop known, it adds a bonus of
@option{ipa-cp-loop-hint-bonus} to the profitability score of
the candidate.
+@item ipa-max-loop-predicates
+The maximum number of different predicates IPA will use to describe when
+loops in a function have known properties.
+
@item ipa-max-aa-steps
During its analysis of function bodies, IPA-CP employs alias analysis
in order to track values pointed to by function parameters. In order
diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 77c84a6ed5d..f6320c787de 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -3205,6 +3205,15 @@ hint_time_bonus (cgraph_node *node, const ipa_call_estimates &estimates)
ipa_hints hints = estimates.hints;
if (hints & (INLINE_HINT_loop_iterations | INLINE_HINT_loop_stride))
result += opt_for_fn (node->decl, param_ipa_cp_loop_hint_bonus);
+
+ sreal bonus_for_one = opt_for_fn (node->decl, param_ipa_cp_loop_hint_bonus);
+
+ if (hints & INLINE_HINT_loop_iterations)
+ result += (estimates.loops_with_known_iterations * bonus_for_one).to_int ();
+
+ if (hints & INLINE_HINT_loop_stride)
+ result += (estimates.loops_with_known_strides * bonus_for_one).to_int ();
+
return result;
}
diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index 6082f34d63f..bbbb94aa930 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -310,6 +310,36 @@ set_hint_predicate (predicate **p, predicate new_predicate)
}
}
+/* Find if NEW_PREDICATE is already in V and if so, increment its freq.
+ Otherwise add a new item to the vector with this predicate and freq equal
+ to add_freq, unless the number of predicates would exceed MAX_NUM_PREDICATES
+ in which case the function does nothing. */
+
+static void
+add_freqcounting_predicate (vec<ipa_freqcounting_predicate, va_gc> **v,
+ const predicate &new_predicate, sreal add_freq,
+ unsigned max_num_predicates)
+{
+ if (new_predicate == false || new_predicate == true)
+ return;
+ ipa_freqcounting_predicate *f;
+ for (int i = 0; vec_safe_iterate (*v, i, &f); i++)
+ if (new_predicate == f->predicate)
+ {
+ f->freq += add_freq;
+ return;
+ }
+ if (vec_safe_length (*v) >= max_num_predicates)
+ /* Too many different predicates to account for. */
+ return;
+
+ ipa_freqcounting_predicate fcp;
+ fcp.predicate = NULL;
+ set_hint_predicate (&fcp.predicate, new_predicate);
+ fcp.freq = add_freq;
+ vec_safe_push (*v, fcp);
+ return;
+}
/* Compute what conditions may or may not hold given information about
parameters. RET_CLAUSE returns truths that may hold in a specialized copy,
@@ -710,13 +740,17 @@ ipa_call_summary::~ipa_call_summary ()
ipa_fn_summary::~ipa_fn_summary ()
{
- if (loop_iterations)
- edge_predicate_pool.remove (loop_iterations);
- if (loop_stride)
- edge_predicate_pool.remove (loop_stride);
+ unsigned len = vec_safe_length (loop_iterations);
+ for (unsigned i = 0; i < len; i++)
+ edge_predicate_pool.remove ((*loop_iterations)[i].predicate);
+ len = vec_safe_length (loop_strides);
+ for (unsigned i = 0; i < len; i++)
+ edge_predicate_pool.remove ((*loop_strides)[i].predicate);
vec_free (conds);
vec_free (size_time_table);
vec_free (call_size_time_table);
+ vec_free (loop_iterations);
+ vec_free (loop_strides);
}
void
@@ -729,24 +763,33 @@ ipa_fn_summary_t::remove_callees (cgraph_node *node)
ipa_call_summaries->remove (e);
}
-/* Same as remap_predicate_after_duplication but handle hint predicate *P.
- Additionally care about allocating new memory slot for updated predicate
- and set it to NULL when it becomes true or false (and thus uninteresting).
- */
+/* Duplicate predicates in loop hint vector, allocating memory for them and
+ remove and deallocate any uninteresting (true or false) ones. Return the
+ result. */
-static void
-remap_hint_predicate_after_duplication (predicate **p,
- clause_t possible_truths)
+static vec<ipa_freqcounting_predicate, va_gc> *
+remap_freqcounting_preds_after_dup (vec<ipa_freqcounting_predicate, va_gc> *v,
+ clause_t possible_truths)
{
- predicate new_predicate;
+ if (vec_safe_length (v) == 0)
+ return NULL;
- if (!*p)
- return;
+ vec<ipa_freqcounting_predicate, va_gc> *res = v->copy ();
+ int len = res->length();
+ for (int i = len - 1; i >= 0; i--)
+ {
+ predicate new_predicate
+ = (*res)[i].predicate->remap_after_duplication (possible_truths);
+ /* We do not want to free previous predicate; it is used by node
+ origin. */
+ (*res)[i].predicate = NULL;
+ set_hint_predicate (&(*res)[i].predicate, new_predicate);
- new_predicate = (*p)->remap_after_duplication (possible_truths);
- /* We do not want to free previous predicate; it is used by node origin. */
- *p = NULL;
- set_hint_predicate (p, new_predicate);
+ if (!(*res)[i].predicate)
+ res->unordered_remove (i);
+ }
+
+ return res;
}
@@ -859,9 +902,11 @@ ipa_fn_summary_t::duplicate (cgraph_node *src,
optimized_out_size += es->call_stmt_size * ipa_fn_summary::size_scale;
edge_set_predicate (edge, &new_predicate);
}
- remap_hint_predicate_after_duplication (&info->loop_iterations,
+ info->loop_iterations
+ = remap_freqcounting_preds_after_dup (info->loop_iterations,
possible_truths);
- remap_hint_predicate_after_duplication (&info->loop_stride,
+ info->loop_strides
+ = remap_freqcounting_preds_after_dup (info->loop_strides,
possible_truths);
/* If inliner or someone after inliner will ever start producing
@@ -873,17 +918,21 @@ ipa_fn_summary_t::duplicate (cgraph_node *src,
else
{
info->size_time_table = vec_safe_copy (info->size_time_table);
- if (info->loop_iterations)
+ info->loop_iterations = vec_safe_copy (info->loop_iterations);
+ info->loop_strides = vec_safe_copy (info->loop_strides);
+
+ ipa_freqcounting_predicate *f;
+ for (int i = 0; vec_safe_iterate (info->loop_iterations, i, &f); i++)
{
- predicate p = *info->loop_iterations;
- info->loop_iterations = NULL;
- set_hint_predicate (&info->loop_iterations, p);
+ predicate p = *f->predicate;
+ f->predicate = NULL;
+ set_hint_predicate (&f->predicate, p);
}
- if (info->loop_stride)
+ for (int i = 0; vec_safe_iterate (info->loop_strides, i, &f); i++)
{
- predicate p = *info->loop_stride;
- info->loop_stride = NULL;
- set_hint_predicate (&info->loop_stride, p);
+ predicate p = *f->predicate;
+ f->predicate = NULL;
+ set_hint_predicate (&f->predicate, p);
}
}
if (!dst->inlined_to)
@@ -1045,15 +1094,28 @@ ipa_dump_fn_summary (FILE *f, struct cgraph_node *node)
}
fprintf (f, "\n");
}
- if (s->loop_iterations)
+ ipa_freqcounting_predicate *fcp;
+ bool first_fcp = true;
+ for (int i = 0; vec_safe_iterate (s->loop_iterations, i, &fcp); i++)
{
- fprintf (f, " loop iterations:");
- s->loop_iterations->dump (f, s->conds);
+ if (first_fcp)
+ {
+ fprintf (f, " loop iterations:");
+ first_fcp = false;
+ }
+ fprintf (f, " %3.2f for ", fcp->freq.to_double ());
+ fcp->predicate->dump (f, s->conds);
}
- if (s->loop_stride)
+ first_fcp = true;
+ for (int i = 0; vec_safe_iterate (s->loop_strides, i, &fcp); i++)
{
- fprintf (f, " loop stride:");
- s->loop_stride->dump (f, s->conds);
+ if (first_fcp)
+ {
+ fprintf (f, " loop strides:");
+ first_fcp = false;
+ }
+ fprintf (f, " %3.2f for :", fcp->freq.to_double ());
+ fcp->predicate->dump (f, s->conds);
}
fprintf (f, " calls:\n");
dump_ipa_call_summary (f, 4, node, s);
@@ -2543,12 +2605,13 @@ analyze_function_body (struct cgraph_node *node, bool early)
if (fbi.info)
compute_bb_predicates (&fbi, node, info, params_summary);
+ const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
order = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
nblocks = pre_and_rev_post_order_compute (NULL, order, false);
for (n = 0; n < nblocks; n++)
{
bb = BASIC_BLOCK_FOR_FN (cfun, order[n]);
- freq = bb->count.to_sreal_scale (ENTRY_BLOCK_PTR_FOR_FN (cfun)->count);
+ freq = bb->count.to_sreal_scale (entry_count);
if (clobber_only_eh_bb_p (bb))
{
if (dump_file && (dump_flags & TDF_DETAILS))
@@ -2790,23 +2853,28 @@ analyze_function_body (struct cgraph_node *node, bool early)
if (nonconstant_names.exists () && !early)
{
+ ipa_fn_summary *s = ipa_fn_summaries->get (node);
class loop *loop;
- predicate loop_iterations = true;
- predicate loop_stride = true;
+ unsigned max_loop_predicates = opt_for_fn (node->decl,
+ param_ipa_max_loop_predicates);
if (dump_file && (dump_flags & TDF_DETAILS))
flow_loops_dump (dump_file, NULL, 0);
scev_initialize ();
FOR_EACH_LOOP (loop, 0)
{
+ predicate loop_iterations = true;
+ sreal header_freq;
edge ex;
unsigned int j;
class tree_niter_desc niter_desc;
- if (loop->header->aux)
- bb_predicate = *(predicate *) loop->header->aux;
- else
- bb_predicate = false;
+ if (!loop->header->aux)
+ continue;
+ profile_count phdr_count = loop_preheader_edge (loop)->count ();
+ sreal phdr_freq = phdr_count.to_sreal_scale (entry_count);
+
+ bb_predicate = *(predicate *) loop->header->aux;
auto_vec<edge> exits = get_loop_exit_edges (loop);
FOR_EACH_VEC_ELT (exits, j, ex)
if (number_of_iterations_exit (loop, ex, &niter_desc, false)
@@ -2821,10 +2889,10 @@ analyze_function_body (struct cgraph_node *node, bool early)
will_be_nonconstant = bb_predicate & will_be_nonconstant;
if (will_be_nonconstant != true
&& will_be_nonconstant != false)
- /* This is slightly inprecise. We may want to represent each
- loop with independent predicate. */
loop_iterations &= will_be_nonconstant;
}
+ add_freqcounting_predicate (&s->loop_iterations, loop_iterations,
+ phdr_freq, max_loop_predicates);
}
/* To avoid quadratic behavior we analyze stride predicates only
@@ -2833,14 +2901,17 @@ analyze_function_body (struct cgraph_node *node, bool early)
for (loop = loops_for_fn (cfun)->tree_root->inner;
loop != NULL; loop = loop->next)
{
+ predicate loop_stride = true;
basic_block *body = get_loop_body (loop);
+ profile_count phdr_count = loop_preheader_edge (loop)->count ();
+ sreal phdr_freq = phdr_count.to_sreal_scale (entry_count);
for (unsigned i = 0; i < loop->num_nodes; i++)
{
gimple_stmt_iterator gsi;
- if (body[i]->aux)
- bb_predicate = *(predicate *) body[i]->aux;
- else
- bb_predicate = false;
+ if (!body[i]->aux)
+ continue;
+
+ bb_predicate = *(predicate *) body[i]->aux;
for (gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi);
gsi_next (&gsi))
{
@@ -2869,16 +2940,13 @@ analyze_function_body (struct cgraph_node *node, bool early)
will_be_nonconstant = bb_predicate & will_be_nonconstant;
if (will_be_nonconstant != true
&& will_be_nonconstant != false)
- /* This is slightly inprecise. We may want to represent
- each loop with independent predicate. */
loop_stride = loop_stride & will_be_nonconstant;
}
}
+ add_freqcounting_predicate (&s->loop_strides, loop_stride,
+ phdr_freq, max_loop_predicates);
free (body);
}
- ipa_fn_summary *s = ipa_fn_summaries->get (node);
- set_hint_predicate (&s->loop_iterations, loop_iterations);
- set_hint_predicate (&s->loop_stride, loop_stride);
scev_finalize ();
}
FOR_ALL_BB_FN (bb, my_function)
@@ -3551,6 +3619,8 @@ ipa_call_context::estimate_size_and_time (ipa_call_estimates *estimates,
sreal time = 0;
int min_size = 0;
ipa_hints hints = 0;
+ sreal loops_with_known_iterations = 0;
+ sreal loops_with_known_strides = 0;
int i;
if (dump_file && (dump_flags & TDF_DETAILS))
@@ -3643,16 +3713,27 @@ ipa_call_context::estimate_size_and_time (ipa_call_estimates *estimates,
if (est_hints)
{
- if (info->loop_iterations
- && !info->loop_iterations->evaluate (m_possible_truths))
- hints |= INLINE_HINT_loop_iterations;
- if (info->loop_stride
- && !info->loop_stride->evaluate (m_possible_truths))
- hints |= INLINE_HINT_loop_stride;
if (info->scc_no)
hints |= INLINE_HINT_in_scc;
if (DECL_DECLARED_INLINE_P (m_node->decl))
hints |= INLINE_HINT_declared_inline;
+
+ ipa_freqcounting_predicate *fcp;
+ for (i = 0; vec_safe_iterate (info->loop_iterations, i, &fcp); i++)
+ if (!fcp->predicate->evaluate (m_possible_truths))
+ {
+ hints |= INLINE_HINT_loop_iterations;
+ loops_with_known_iterations += fcp->freq;
+ }
+ estimates->loops_with_known_iterations = loops_with_known_iterations;
+
+ for (i = 0; vec_safe_iterate (info->loop_strides, i, &fcp); i++)
+ if (!fcp->predicate->evaluate (m_possible_truths))
+ {
+ hints |= INLINE_HINT_loop_stride;
+ loops_with_known_strides += fcp->freq;
+ }
+ estimates->loops_with_known_strides = loops_with_known_strides;
}
size = RDIV (size, ipa_fn_summary::size_scale);
@@ -3660,12 +3741,15 @@ ipa_call_context::estimate_size_and_time (ipa_call_estimates *estimates,
if (dump_file && (dump_flags & TDF_DETAILS))
{
+ fprintf (dump_file, "\n size:%i", (int) size);
if (est_times)
- fprintf (dump_file, "\n size:%i time:%f nonspec time:%f\n",
- (int) size, time.to_double (),
- nonspecialized_time.to_double ());
- else
- fprintf (dump_file, "\n size:%i (time not estimated)\n", (int) size);
+ fprintf (dump_file, " time:%f nonspec time:%f",
+ time.to_double (), nonspecialized_time.to_double ());
+ if (est_hints)
+ fprintf (dump_file, " loops with known iterations:%f "
+ "known strides:%f", loops_with_known_iterations.to_double (),
+ loops_with_known_strides.to_double ());
+ fprintf (dump_file, "\n");
}
if (est_times)
{
@@ -3865,32 +3949,29 @@ remap_edge_summaries (struct cgraph_edge *inlined_edge,
}
}
-/* Same as remap_predicate, but set result into hint *HINT. */
+/* Run remap_after_inlining on each predicate in V. */
static void
-remap_hint_predicate (class ipa_fn_summary *info,
- class ipa_node_params *params_summary,
- class ipa_fn_summary *callee_info,
- predicate **hint,
- vec<int> operand_map,
- vec<int> offset_map,
- clause_t possible_truths,
- predicate *toplev_predicate)
-{
- predicate p;
+remap_freqcounting_predicate (class ipa_fn_summary *info,
+ class ipa_node_params *params_summary,
+ class ipa_fn_summary *callee_info,
+ vec<ipa_freqcounting_predicate, va_gc> *v,
+ vec<int> operand_map,
+ vec<int> offset_map,
+ clause_t possible_truths,
+ predicate *toplev_predicate)
- if (!*hint)
- return;
- p = (*hint)->remap_after_inlining
- (info, params_summary, callee_info,
- operand_map, offset_map,
- possible_truths, *toplev_predicate);
- if (p != false && p != true)
+{
+ ipa_freqcounting_predicate *fcp;
+ for (int i = 0; vec_safe_iterate (v, i, &fcp); i++)
{
- if (!*hint)
- set_hint_predicate (hint, p);
- else
- **hint &= p;
+ predicate p
+ = fcp->predicate->remap_after_inlining (info, params_summary,
+ callee_info, operand_map,
+ offset_map, possible_truths,
+ *toplev_predicate);
+ if (p != false && p != true)
+ *fcp->predicate &= p;
}
}
@@ -3998,12 +4079,12 @@ ipa_merge_fn_summary_after_inlining (struct cgraph_edge *edge)
remap_edge_summaries (edge, edge->callee, info, params_summary,
callee_info, operand_map,
offset_map, clause, &toplev_predicate);
- remap_hint_predicate (info, params_summary, callee_info,
- &callee_info->loop_iterations,
- operand_map, offset_map, clause, &toplev_predicate);
- remap_hint_predicate (info, params_summary, callee_info,
- &callee_info->loop_stride,
- operand_map, offset_map, clause, &toplev_predicate);
+ remap_freqcounting_predicate (info, params_summary, callee_info,
+ info->loop_iterations, operand_map,
+ offset_map, clause, &toplev_predicate);
+ remap_freqcounting_predicate (info, params_summary, callee_info,
+ info->loop_strides, operand_map,
+ offset_map, clause, &toplev_predicate);
HOST_WIDE_INT stack_frame_offset = ipa_get_stack_frame_offset (edge->callee);
HOST_WIDE_INT peak = stack_frame_offset + callee_info->estimated_stack_size;
@@ -4334,12 +4415,34 @@ inline_read_section (struct lto_file_decl_data *file_data, const char *data,
info->size_time_table->quick_push (e);
}
- p.stream_in (&ib);
- if (info)
- set_hint_predicate (&info->loop_iterations, p);
- p.stream_in (&ib);
- if (info)
- set_hint_predicate (&info->loop_stride, p);
+ count2 = streamer_read_uhwi (&ib);
+ for (j = 0; j < count2; j++)
+ {
+ p.stream_in (&ib);
+ sreal fcp_freq = sreal::stream_in (&ib);
+ if (info)
+ {
+ ipa_freqcounting_predicate fcp;
+ fcp.predicate = NULL;
+ set_hint_predicate (&fcp.predicate, p);
+ fcp.freq = fcp_freq;
+ vec_safe_push (info->loop_iterations, fcp);
+ }
+ }
+ count2 = streamer_read_uhwi (&ib);
+ for (j = 0; j < count2; j++)
+ {
+ p.stream_in (&ib);
+ sreal fcp_freq = sreal::stream_in (&ib);
+ if (info)
+ {
+ ipa_freqcounting_predicate fcp;
+ fcp.predicate = NULL;
+ set_hint_predicate (&fcp.predicate, p);
+ fcp.freq = fcp_freq;
+ vec_safe_push (info->loop_strides, fcp);
+ }
+ }
for (e = node->callees; e; e = e->next_callee)
read_ipa_call_summary (&ib, e, info != NULL);
for (e = node->indirect_calls; e; e = e->next_callee)
@@ -4502,14 +4605,19 @@ ipa_fn_summary_write (void)
e->exec_predicate.stream_out (ob);
e->nonconst_predicate.stream_out (ob);
}
- if (info->loop_iterations)
- info->loop_iterations->stream_out (ob);
- else
- streamer_write_uhwi (ob, 0);
- if (info->loop_stride)
- info->loop_stride->stream_out (ob);
- else
- streamer_write_uhwi (ob, 0);
+ ipa_freqcounting_predicate *fcp;
+ streamer_write_uhwi (ob, vec_safe_length (info->loop_iterations));
+ for (i = 0; vec_safe_iterate (info->loop_iterations, i, &fcp); i++)
+ {
+ fcp->predicate->stream_out (ob);
+ fcp->freq.stream_out (ob);
+ }
+ streamer_write_uhwi (ob, vec_safe_length (info->loop_strides));
+ for (i = 0; vec_safe_iterate (info->loop_strides, i, &fcp); i++)
+ {
+ fcp->predicate->stream_out (ob);
+ fcp->freq.stream_out (ob);
+ }
for (edge = cnode->callees; edge; edge = edge->next_callee)
write_ipa_call_summary (ob, edge);
for (edge = cnode->indirect_calls; edge; edge = edge->next_callee)
diff --git a/gcc/ipa-fnsummary.h b/gcc/ipa-fnsummary.h
index ccb6b432f0b..f4dd5b85ab9 100644
--- a/gcc/ipa-fnsummary.h
+++ b/gcc/ipa-fnsummary.h
@@ -101,6 +101,19 @@ public:
}
};
+/* Structure to capture how frequently some interesting events occur given a
+ particular predicate. The structure is used to estimate how often we
+ encounter loops with known iteration count or stride in various
+ contexts. */
+
+struct GTY(()) ipa_freqcounting_predicate
+{
+ /* The described event happens with this frequency... */
+ sreal freq;
+ /* ...when this predicate evaluates to false. */
+ class predicate * GTY((skip)) predicate;
+};
+
/* Function inlining information. */
class GTY(()) ipa_fn_summary
{
@@ -112,8 +125,9 @@ public:
inlinable (false), single_caller (false),
fp_expressions (false), estimated_stack_size (false),
time (0), conds (NULL),
- size_time_table (NULL), call_size_time_table (NULL), loop_iterations (NULL),
- loop_stride (NULL), growth (0), scc_no (0)
+ size_time_table (NULL), call_size_time_table (NULL),
+ loop_iterations (NULL), loop_strides (NULL),
+ growth (0), scc_no (0)
{
}
@@ -125,7 +139,7 @@ public:
estimated_stack_size (s.estimated_stack_size),
time (s.time), conds (s.conds), size_time_table (s.size_time_table),
call_size_time_table (NULL),
- loop_iterations (s.loop_iterations), loop_stride (s.loop_stride),
+ loop_iterations (s.loop_iterations), loop_strides (s.loop_strides),
growth (s.growth), scc_no (s.scc_no)
{}
@@ -164,12 +178,10 @@ public:
vec<size_time_entry, va_gc> *size_time_table;
vec<size_time_entry, va_gc> *call_size_time_table;
- /* Predicate on when some loop in the function becomes to have known
- bounds. */
- predicate * GTY((skip)) loop_iterations;
- /* Predicate on when some loop in the function becomes to have known
- stride. */
- predicate * GTY((skip)) loop_stride;
+ /* Predicates on when some loops in the function can have known bounds. */
+ vec<ipa_freqcounting_predicate, va_gc> *loop_iterations;
+ /* Predicates on when some loops in the function can have known strides. */
+ vec<ipa_freqcounting_predicate, va_gc> *loop_strides;
/* Estimated growth for inlining all copies of the function before start
of small functions inlining.
This value will get out of date as the callers are duplicated, but
@@ -308,6 +320,14 @@ struct ipa_call_estimates
/* Further discovered reasons why to inline or specialize the give calls. */
ipa_hints hints;
+
+ /* Frequency how often a loop with known number of iterations is encountered.
+ Calculated with hints. */
+ sreal loops_with_known_iterations;
+
+ /* Frequency how often a loop with known strides is encountered. Calculated
+ with hints. */
+ sreal loops_with_known_strides;
};
class ipa_cached_call_context;
diff --git a/gcc/params.opt b/gcc/params.opt
index 5bc7e1619c5..acb59f17e45 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -234,6 +234,10 @@ Maximum number of aggregate content items for a parameter in jump functions and
Common Joined UInteger Var(param_ipa_max_param_expr_ops) Init(10) Param Optimization
Maximum number of operations in a parameter expression that can be handled by IPA analysis.
+-param=ipa-max-loop-predicates=
+Common Joined UInteger Var(param_ipa_max_loop_predicates) Init(16) Param Optimization
+Maximum number of different predicates used to track properties of loops in IPA analysis.
+
-param=ipa-max-switch-predicate-bounds=
Common Joined UInteger Var(param_ipa_max_switch_predicate_bounds) Init(5) Param Optimization
Maximal number of boundary endpoints of case ranges of switch statement used during IPA function summary generation.
diff --git a/gcc/testsuite/gcc.dg/ipa/ipcp-loophint-1.c b/gcc/testsuite/gcc.dg/ipa/ipcp-loophint-1.c
new file mode 100644
index 00000000000..6d049af68af
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/ipcp-loophint-1.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-ipa-cp-details" } */
+
+extern int *o, *p, *q, *r;
+
+#define FUNCTIONS fa(), fb(), fc(), fd(), fe(), ff(), fg()
+
+extern void FUNCTIONS;
+
+void foo (int c)
+{
+ FUNCTIONS;
+ FUNCTIONS;
+ for (int i = 0; i < 100; i++)
+ {
+ for (int j = 0; j < c; j++)
+ o[i] = p[i] + q[i] * r[i];
+ }
+ FUNCTIONS;
+ FUNCTIONS;
+}
+
+void bar()
+{
+ foo (8);
+ p[4]++;
+}
+
+/* { dg-final { scan-ipa-dump {with known iterations:[1-9]} "cp" } } */
--
2.28.0
* [PATCH 1/6] ipa: Bundle vectors describing argument values
2020-09-29 18:12 [PATCH 0/6] IPA cleanups and IPA-CP improvements for 548.exchange2_r Martin Jambor
` (2 preceding siblings ...)
2020-09-21 14:25 ` [PATCH 6/6] ipa-cp: Separate and increase the large-unit parameter Martin Jambor
@ 2020-09-28 18:47 ` Martin Jambor
2020-10-02 11:54 ` Jan Hubicka
2020-09-28 18:47 ` [PATCH 3/6] ipa: Bundle estimates of ipa_call_context::estimate_size_and_time Martin Jambor
2020-09-28 18:47 ` [PATCH 2/6] ipa: Introduce ipa_cached_call_context Martin Jambor
5 siblings, 1 reply; 18+ messages in thread
From: Martin Jambor @ 2020-09-28 18:47 UTC
To: GCC Patches; +Cc: Jan Hubicka
Hi,
this large patch is a mostly mechanical change which aims to replace
uses of separate vectors describing known scalar values (usually called
known_vals or known_csts), known aggregate values (known_aggs), known
virtual call contexts (known_contexts) and known value
ranges (known_value_ranges) with uses of either the new type
ipa_call_arg_values or ipa_auto_call_arg_values, both of which simply
contain these vectors inside them.
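As a rough illustration of the bundling idea only (not the code in this
patch), the sketch below uses std::vector in place of GCC's vec and
hypothetical stand-in types value_t and context_t in place of tree and
ipa_polymorphic_call_context. The names arg_values_sketch,
count_known_old and count_known are made up for the example; only the
member names and the idea of a bounds-checked accessor such as
safe_sval_at are taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for GCC's tree and ipa_polymorphic_call_context.
using value_t = int;                        // 0 means "value not known"
struct context_t { bool useless = true; };  // true means "nothing known"

// Simplified bundle: member names mirror the patch, the types do not.
struct arg_values_sketch
{
  std::vector<value_t> m_known_vals;
  std::vector<context_t> m_known_contexts;

  // Bounds-checked read: an index outside the vector yields "not known".
  value_t safe_sval_at (int index) const
  {
    if (index >= 0 && static_cast<std::size_t> (index) < m_known_vals.size ())
      return m_known_vals[index];
    return 0;
  }
};

// Old style: every vector travels as its own parameter.
int count_known_old (const std::vector<value_t> &known_vals,
                     const std::vector<context_t> &known_contexts)
{
  int n = 0;
  for (value_t v : known_vals)
    if (v != 0)
      n++;
  for (const context_t &c : known_contexts)
    if (!c.useless)
      n++;
  return n;
}

// New style: callers pass a single pointer to the bundle.
int count_known (const arg_values_sketch *avals)
{
  return count_known_old (avals->m_known_vals, avals->m_known_contexts);
}
```

With a bundle like this, adding another kind of known information later
means adding a member rather than touching every signature along the
call chain.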
The need for two distinct types comes from the fact that when the vectors
are constructed from jump functions or lattices, we really should use
auto_vecs with embedded storage allocated on the stack. On the other hand,
the bundle in ipa_call_context can be allocated on the heap when in the cache,
once for each call graph node.
ipa_call_context is constructible from ipa_auto_call_arg_values but
then its vectors must not be resized, otherwise the vectors will stop
pointing to the stack ones. Unfortunately, I don't think the
structure embedded in ipa_call_context can be made constant because we
need to manipulate and deallocate it when in cache.
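The split between an owning type with embedded storage and a second
type that merely refers to it can be pictured with the following
minimal sketch. It is an analogy under stated assumptions, not GCC
code: auto_values_sketch, values_view_sketch and sum_values are
hypothetical names, and std::array stands in for the embedded storage
of an auto_vec.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Owner with embedded storage, typically a local variable on the stack.
struct auto_values_sketch
{
  std::array<int, 8> storage{};
  std::size_t length = 0;

  void push (int v)
  {
    assert (length < storage.size ());
    storage[length++] = v;
  }
};

// Non-owning view of an owner's storage.  It deliberately offers no way to
// grow: growing would require new storage, and the view would then no
// longer refer to the owner's elements.
struct values_view_sketch
{
  const int *data = nullptr;
  std::size_t length = 0;

  explicit values_view_sketch (const auto_values_sketch &owner)
    : data (owner.storage.data ()), length (owner.length)
  {}
};

// Consumers only read through the view.
int sum_values (const values_view_sketch &view)
{
  int sum = 0;
  for (std::size_t i = 0; i < view.length; i++)
    sum += view.data[i];
  return sum;
}
```

The view is only valid while the owner is alive and unchanged, which is
the same constraint the paragraph above places on an ipa_call_context
built from ipa_auto_call_arg_values.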
gcc/ChangeLog:
2020-09-01 Martin Jambor <mjambor@suse.cz>
* ipa-prop.h (ipa_auto_call_arg_values): New type.
(class ipa_call_arg_values): Likewise.
(ipa_get_indirect_edge_target): Replaced vector arguments with
ipa_call_arg_values in declaration. Added an overload for
ipa_auto_call_arg_values.
* ipa-fnsummary.h (ipa_call_context): Removed members m_known_vals,
m_known_contexts, m_known_aggs, duplicate_from, release and equal_to,
new members m_avals, store_to_cache and equivalent_to_p. Adjusted
constructor arguments.
(estimate_ipcp_clone_size_and_time): Replaced vector arguments
with ipa_auto_call_arg_values in declaration.
(evaluate_properties_for_edge): Likewise.
* ipa-cp.c (ipa_get_indirect_edge_target): Adjusted to work on
ipa_call_arg_values rather than on separate vectors. Added an
overload for ipa_auto_call_arg_values.
(devirtualization_time_bonus): Adjusted to work on
ipa_auto_call_arg_values rather than on separate vectors.
(gather_context_independent_values): Adjusted to work on
ipa_auto_call_arg_values rather than on separate vectors.
(perform_estimation_of_a_value): Likewise.
(estimate_local_effects): Likewise.
(modify_known_vectors_with_val): Adjusted both variants to work on
ipa_auto_call_arg_values and rename them to
copy_known_vectors_add_val.
(decide_about_value): Adjusted to work on ipa_call_arg_values rather
than on separate vectors.
(decide_whether_version_node): Likewise.
* ipa-fnsummary.c (evaluate_conditions_for_known_args): Likewise.
(evaluate_properties_for_edge): Likewise.
(ipa_fn_summary_t::duplicate): Likewise.
(estimate_edge_devirt_benefit): Adjusted to work on
ipa_call_arg_values rather than on separate vectors.
(estimate_edge_size_and_time): Likewise.
(estimate_calls_size_and_time_1): Likewise.
(summarize_calls_size_and_time): Adjusted calls to
estimate_edge_size_and_time.
(estimate_calls_size_and_time): Adjusted to work on
ipa_call_arg_values rather than on separate vectors.
(ipa_call_context::ipa_call_context): Construct from a pointer to
ipa_auto_call_arg_values instead of individual vectors.
(ipa_call_context::duplicate_from): Adjusted to access vectors within
m_avals.
(ipa_call_context::release): Likewise.
(ipa_call_context::equal_to): Likewise.
(ipa_call_context::estimate_size_and_time): Adjusted to work on
ipa_call_arg_values rather than on separate vectors.
(estimate_ipcp_clone_size_and_time): Adjusted to work with
ipa_auto_call_arg_values rather than on separate vectors.
(ipa_merge_fn_summary_after_inlining): Likewise. Adjusted call to
estimate_edge_size_and_time.
(ipa_update_overall_fn_summary): Adjusted call to
estimate_edge_size_and_time.
* ipa-inline-analysis.c (do_estimate_edge_time): Adjusted to work with
ipa_auto_call_arg_values rather than with separate vectors.
(do_estimate_edge_size): Likewise.
(do_estimate_edge_hints): Likewise.
* ipa-prop.c (ipa_auto_call_arg_values::~ipa_auto_call_arg_values):
New destructor.
---
gcc/ipa-cp.c | 245 ++++++++++-----------
gcc/ipa-fnsummary.c | 446 +++++++++++++++++---------------------
gcc/ipa-fnsummary.h | 27 +--
gcc/ipa-inline-analysis.c | 41 +---
gcc/ipa-prop.c | 10 +
gcc/ipa-prop.h | 112 +++++++++-
6 files changed, 452 insertions(+), 429 deletions(-)
diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index b3e7d41ea10..292dd7e5bdf 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -3117,30 +3117,40 @@ ipa_get_indirect_edge_target_1 (struct cgraph_edge *ie,
return target;
}
-
-/* If an indirect edge IE can be turned into a direct one based on KNOWN_CSTS,
- KNOWN_CONTEXTS (which can be vNULL) or KNOWN_AGGS (which also can be vNULL)
- return the destination. */
+/* If an indirect edge IE can be turned into a direct one based on data in
+ AVALS, return the destination. Store into *SPECULATIVE a boolean determining
+ whether the discovered target is only speculative guess. */
tree
ipa_get_indirect_edge_target (struct cgraph_edge *ie,
- vec<tree> known_csts,
- vec<ipa_polymorphic_call_context> known_contexts,
- vec<ipa_agg_value_set> known_aggs,
+ ipa_call_arg_values *avals,
bool *speculative)
{
- return ipa_get_indirect_edge_target_1 (ie, known_csts, known_contexts,
- known_aggs, NULL, speculative);
+ return ipa_get_indirect_edge_target_1 (ie, avals->m_known_vals,
+ avals->m_known_contexts,
+ avals->m_known_aggs,
+ NULL, speculative);
}
-/* Calculate devirtualization time bonus for NODE, assuming we know KNOWN_CSTS
- and KNOWN_CONTEXTS. */
+/* The same functionality as above overloaded for ipa_auto_call_arg_values. */
+
+tree
+ipa_get_indirect_edge_target (struct cgraph_edge *ie,
+ ipa_auto_call_arg_values *avals,
+ bool *speculative)
+{
+ return ipa_get_indirect_edge_target_1 (ie, avals->m_known_vals,
+ avals->m_known_contexts,
+ avals->m_known_aggs,
+ NULL, speculative);
+}
+
+/* Calculate devirtualization time bonus for NODE, assuming we know information
+ about arguments stored in AVALS. */
static int
devirtualization_time_bonus (struct cgraph_node *node,
- vec<tree> known_csts,
- vec<ipa_polymorphic_call_context> known_contexts,
- vec<ipa_agg_value_set> known_aggs)
+ ipa_auto_call_arg_values *avals)
{
struct cgraph_edge *ie;
int res = 0;
@@ -3153,8 +3163,7 @@ devirtualization_time_bonus (struct cgraph_node *node,
tree target;
bool speculative;
- target = ipa_get_indirect_edge_target (ie, known_csts, known_contexts,
- known_aggs, &speculative);
+ target = ipa_get_indirect_edge_target (ie, avals, &speculative);
if (!target)
continue;
@@ -3306,32 +3315,27 @@ context_independent_aggregate_values (class ipcp_param_lattices *plats)
return res;
}
-/* Allocate KNOWN_CSTS, KNOWN_CONTEXTS and, if non-NULL, KNOWN_AGGS and
- populate them with values of parameters that are known independent of the
- context. INFO describes the function. If REMOVABLE_PARAMS_COST is
- non-NULL, the movement cost of all removable parameters will be stored in
- it. */
+/* Grow vectors in AVALS and fill them with information about values of
+ parameters that are known to be independent of the context. Only calculate
+ m_known_aggs if CALCULATE_AGGS is true. INFO describes the function. If
+ REMOVABLE_PARAMS_COST is non-NULL, the movement cost of all removable
+ parameters will be stored in it.
+
+ TODO: Also grow context independent value range vectors. */
static bool
gather_context_independent_values (class ipa_node_params *info,
- vec<tree> *known_csts,
- vec<ipa_polymorphic_call_context>
- *known_contexts,
- vec<ipa_agg_value_set> *known_aggs,
+ ipa_auto_call_arg_values *avals,
+ bool calculate_aggs,
int *removable_params_cost)
{
int i, count = ipa_get_param_count (info);
bool ret = false;
- known_csts->create (0);
- known_contexts->create (0);
- known_csts->safe_grow_cleared (count, true);
- known_contexts->safe_grow_cleared (count, true);
- if (known_aggs)
- {
- known_aggs->create (0);
- known_aggs->safe_grow_cleared (count, true);
- }
+ avals->m_known_vals.safe_grow_cleared (count, true);
+ avals->m_known_contexts.safe_grow_cleared (count, true);
+ if (calculate_aggs)
+ avals->m_known_aggs.safe_grow_cleared (count, true);
if (removable_params_cost)
*removable_params_cost = 0;
@@ -3345,7 +3349,7 @@ gather_context_independent_values (class ipa_node_params *info,
{
ipcp_value<tree> *val = lat->values;
gcc_checking_assert (TREE_CODE (val->value) != TREE_BINFO);
- (*known_csts)[i] = val->value;
+ avals->m_known_vals[i] = val->value;
if (removable_params_cost)
*removable_params_cost
+= estimate_move_cost (TREE_TYPE (val->value), false);
@@ -3363,15 +3367,15 @@ gather_context_independent_values (class ipa_node_params *info,
/* Do not account known context as reason for cloning. We can see
if it permits devirtualization. */
if (ctxlat->is_single_const ())
- (*known_contexts)[i] = ctxlat->values->value;
+ avals->m_known_contexts[i] = ctxlat->values->value;
- if (known_aggs)
+ if (calculate_aggs)
{
vec<ipa_agg_value> agg_items;
struct ipa_agg_value_set *agg;
agg_items = context_independent_aggregate_values (plats);
- agg = &(*known_aggs)[i];
+ agg = &avals->m_known_aggs[i];
agg->items = agg_items;
agg->by_ref = plats->aggs_by_ref;
ret |= !agg_items.is_empty ();
@@ -3381,25 +3385,23 @@ gather_context_independent_values (class ipa_node_params *info,
return ret;
}
-/* Perform time and size measurement of NODE with the context given in
- KNOWN_CSTS, KNOWN_CONTEXTS and KNOWN_AGGS, calculate the benefit and cost
- given BASE_TIME of the node without specialization, REMOVABLE_PARAMS_COST of
- all context-independent removable parameters and EST_MOVE_COST of estimated
- movement of the considered parameter and store it into VAL. */
+/* Perform time and size measurement of NODE with the context given in AVALS,
+ calculate the benefit compared to the node without specialization and store
+ it into VAL. Take into account REMOVABLE_PARAMS_COST of all
+ context-independent or unused removable parameters and EST_MOVE_COST, the
+ estimated movement of the considered parameter. */
static void
-perform_estimation_of_a_value (cgraph_node *node, vec<tree> known_csts,
- vec<ipa_polymorphic_call_context> known_contexts,
- vec<ipa_agg_value_set> known_aggs,
- int removable_params_cost,
- int est_move_cost, ipcp_value_base *val)
+perform_estimation_of_a_value (cgraph_node *node,
+ ipa_auto_call_arg_values *avals,
+ int removable_params_cost, int est_move_cost,
+ ipcp_value_base *val)
{
int size, time_benefit;
sreal time, base_time;
ipa_hints hints;
- estimate_ipcp_clone_size_and_time (node, known_csts, known_contexts,
- known_aggs, &size, &time,
+ estimate_ipcp_clone_size_and_time (node, avals, &size, &time,
&base_time, &hints);
base_time -= time;
if (base_time > 65535)
@@ -3412,8 +3414,7 @@ perform_estimation_of_a_value (cgraph_node *node, vec<tree> known_csts,
time_benefit = 0;
else
time_benefit = base_time.to_int ()
- + devirtualization_time_bonus (node, known_csts, known_contexts,
- known_aggs)
+ + devirtualization_time_bonus (node, avals)
+ hint_time_bonus (node, hints)
+ removable_params_cost + est_move_cost;
@@ -3454,9 +3455,6 @@ estimate_local_effects (struct cgraph_node *node)
{
class ipa_node_params *info = IPA_NODE_REF (node);
int i, count = ipa_get_param_count (info);
- vec<tree> known_csts;
- vec<ipa_polymorphic_call_context> known_contexts;
- vec<ipa_agg_value_set> known_aggs;
bool always_const;
int removable_params_cost;
@@ -3466,11 +3464,10 @@ estimate_local_effects (struct cgraph_node *node)
if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "\nEstimating effects for %s.\n", node->dump_name ());
- always_const = gather_context_independent_values (info, &known_csts,
- &known_contexts, &known_aggs,
+ ipa_auto_call_arg_values avals;
+ always_const = gather_context_independent_values (info, &avals, true,
&removable_params_cost);
- int devirt_bonus = devirtualization_time_bonus (node, known_csts,
- known_contexts, known_aggs);
+ int devirt_bonus = devirtualization_time_bonus (node, &avals);
if (always_const || devirt_bonus
|| (removable_params_cost && node->can_change_signature))
{
@@ -3482,8 +3479,7 @@ estimate_local_effects (struct cgraph_node *node)
init_caller_stats (&stats);
node->call_for_symbol_thunks_and_aliases (gather_caller_stats, &stats,
false);
- estimate_ipcp_clone_size_and_time (node, known_csts, known_contexts,
- known_aggs, &size, &time,
+ estimate_ipcp_clone_size_and_time (node, &avals, &size, &time,
&base_time, &hints);
time -= devirt_bonus;
time -= hint_time_bonus (node, hints);
@@ -3536,18 +3532,17 @@ estimate_local_effects (struct cgraph_node *node)
if (lat->bottom
|| !lat->values
- || known_csts[i])
+ || avals.m_known_vals[i])
continue;
for (val = lat->values; val; val = val->next)
{
gcc_checking_assert (TREE_CODE (val->value) != TREE_BINFO);
- known_csts[i] = val->value;
+ avals.m_known_vals[i] = val->value;
int emc = estimate_move_cost (TREE_TYPE (val->value), true);
- perform_estimation_of_a_value (node, known_csts, known_contexts,
- known_aggs,
- removable_params_cost, emc, val);
+ perform_estimation_of_a_value (node, &avals, removable_params_cost,
+ emc, val);
if (dump_file && (dump_flags & TDF_DETAILS))
{
@@ -3559,7 +3554,7 @@ estimate_local_effects (struct cgraph_node *node)
val->local_time_benefit, val->local_size_cost);
}
}
- known_csts[i] = NULL_TREE;
+ avals.m_known_vals[i] = NULL_TREE;
}
for (i = 0; i < count; i++)
@@ -3574,15 +3569,14 @@ estimate_local_effects (struct cgraph_node *node)
if (ctxlat->bottom
|| !ctxlat->values
- || !known_contexts[i].useless_p ())
+ || !avals.m_known_contexts[i].useless_p ())
continue;
for (val = ctxlat->values; val; val = val->next)
{
- known_contexts[i] = val->value;
- perform_estimation_of_a_value (node, known_csts, known_contexts,
- known_aggs,
- removable_params_cost, 0, val);
+ avals.m_known_contexts[i] = val->value;
+ perform_estimation_of_a_value (node, &avals, removable_params_cost,
+ 0, val);
if (dump_file && (dump_flags & TDF_DETAILS))
{
@@ -3594,20 +3588,18 @@ estimate_local_effects (struct cgraph_node *node)
val->local_time_benefit, val->local_size_cost);
}
}
- known_contexts[i] = ipa_polymorphic_call_context ();
+ avals.m_known_contexts[i] = ipa_polymorphic_call_context ();
}
for (i = 0; i < count; i++)
{
class ipcp_param_lattices *plats = ipa_get_parm_lattices (info, i);
- struct ipa_agg_value_set *agg;
- struct ipcp_agg_lattice *aglat;
if (plats->aggs_bottom || !plats->aggs)
continue;
- agg = &known_aggs[i];
- for (aglat = plats->aggs; aglat; aglat = aglat->next)
+ ipa_agg_value_set *agg = &avals.m_known_aggs[i];
+ for (ipcp_agg_lattice *aglat = plats->aggs; aglat; aglat = aglat->next)
{
ipcp_value<tree> *val;
if (aglat->bottom || !aglat->values
@@ -3624,8 +3616,7 @@ estimate_local_effects (struct cgraph_node *node)
item.value = val->value;
agg->items.safe_push (item);
- perform_estimation_of_a_value (node, known_csts, known_contexts,
- known_aggs,
+ perform_estimation_of_a_value (node, &avals,
removable_params_cost, 0, val);
if (dump_file && (dump_flags & TDF_DETAILS))
@@ -3645,10 +3636,6 @@ estimate_local_effects (struct cgraph_node *node)
}
}
}
-
- known_csts.release ();
- known_contexts.release ();
- ipa_release_agg_values (known_aggs);
}
@@ -5372,31 +5359,34 @@ copy_useful_known_contexts (vec<ipa_polymorphic_call_context> known_contexts)
return vNULL;
}
-/* Copy KNOWN_CSTS and modify the copy according to VAL and INDEX. If
- non-empty, replace KNOWN_CONTEXTS with its copy too. */
+/* Copy known scalar values from AVALS into KNOWN_CSTS and modify the copy
+ according to VAL and INDEX. If non-empty, replace KNOWN_CONTEXTS with its
+ copy too. */
static void
-modify_known_vectors_with_val (vec<tree> *known_csts,
- vec<ipa_polymorphic_call_context> *known_contexts,
- ipcp_value<tree> *val,
- int index)
+copy_known_vectors_add_val (ipa_auto_call_arg_values *avals,
+ vec<tree> *known_csts,
+ vec<ipa_polymorphic_call_context> *known_contexts,
+ ipcp_value<tree> *val, int index)
{
- *known_csts = known_csts->copy ();
- *known_contexts = copy_useful_known_contexts (*known_contexts);
+ *known_csts = avals->m_known_vals.copy ();
+ *known_contexts = copy_useful_known_contexts (avals->m_known_contexts);
(*known_csts)[index] = val->value;
}
-/* Replace KNOWN_CSTS with its copy. Also copy KNOWN_CONTEXTS and modify the
- copy according to VAL and INDEX. */
+/* Copy known scalar values from AVALS into KNOWN_CSTS. Similarly, copy
+ contexts to KNOWN_CONTEXTS and modify the copy according to VAL and
+ INDEX. */
static void
-modify_known_vectors_with_val (vec<tree> *known_csts,
- vec<ipa_polymorphic_call_context> *known_contexts,
- ipcp_value<ipa_polymorphic_call_context> *val,
- int index)
+copy_known_vectors_add_val (ipa_auto_call_arg_values *avals,
+ vec<tree> *known_csts,
+ vec<ipa_polymorphic_call_context> *known_contexts,
+ ipcp_value<ipa_polymorphic_call_context> *val,
+ int index)
{
- *known_csts = known_csts->copy ();
- *known_contexts = known_contexts->copy ();
+ *known_csts = avals->m_known_vals.copy ();
+ *known_contexts = avals->m_known_contexts.copy ();
(*known_contexts)[index] = val->value;
}
@@ -5433,16 +5423,15 @@ ipcp_val_agg_replacement_ok_p (ipa_agg_replacement_value *,
return offset == -1;
}
-/* Decide whether to create a special version of NODE for value VAL of parameter
- at the given INDEX. If OFFSET is -1, the value is for the parameter itself,
- otherwise it is stored at the given OFFSET of the parameter. KNOWN_CSTS,
- KNOWN_CONTEXTS and KNOWN_AGGS describe the other already known values. */
+/* Decide whether to create a special version of NODE for value VAL of
+ parameter at the given INDEX. If OFFSET is -1, the value is for the
+ parameter itself, otherwise it is stored at the given OFFSET of the
+ parameter. AVALS describes the other already known values. */
template <typename valtype>
static bool
decide_about_value (struct cgraph_node *node, int index, HOST_WIDE_INT offset,
- ipcp_value<valtype> *val, vec<tree> known_csts,
- vec<ipa_polymorphic_call_context> known_contexts)
+ ipcp_value<valtype> *val, ipa_auto_call_arg_values *avals)
{
struct ipa_agg_replacement_value *aggvals;
int freq_sum, caller_count;
@@ -5492,13 +5481,16 @@ decide_about_value (struct cgraph_node *node, int index, HOST_WIDE_INT offset,
fprintf (dump_file, " Creating a specialized node of %s.\n",
node->dump_name ());
+ vec<tree> known_csts;
+ vec<ipa_polymorphic_call_context> known_contexts;
+
callers = gather_edges_for_value (val, node, caller_count);
if (offset == -1)
- modify_known_vectors_with_val (&known_csts, &known_contexts, val, index);
+ copy_known_vectors_add_val (avals, &known_csts, &known_contexts, val, index);
else
{
- known_csts = known_csts.copy ();
- known_contexts = copy_useful_known_contexts (known_contexts);
+ known_csts = avals->m_known_vals.copy ();
+ known_contexts = copy_useful_known_contexts (avals->m_known_contexts);
}
find_more_scalar_values_for_callers_subset (node, known_csts, callers);
find_more_contexts_for_caller_subset (node, &known_contexts, callers);
@@ -5522,8 +5514,6 @@ decide_whether_version_node (struct cgraph_node *node)
{
class ipa_node_params *info = IPA_NODE_REF (node);
int i, count = ipa_get_param_count (info);
- vec<tree> known_csts;
- vec<ipa_polymorphic_call_context> known_contexts;
bool ret = false;
if (count == 0)
@@ -5533,8 +5523,8 @@ decide_whether_version_node (struct cgraph_node *node)
fprintf (dump_file, "\nEvaluating opportunities for %s.\n",
node->dump_name ());
- gather_context_independent_values (info, &known_csts, &known_contexts,
- NULL, NULL);
+ ipa_auto_call_arg_values avals;
+ gather_context_independent_values (info, &avals, false, NULL);
for (i = 0; i < count;i++)
{
@@ -5543,12 +5533,11 @@ decide_whether_version_node (struct cgraph_node *node)
ipcp_lattice<ipa_polymorphic_call_context> *ctxlat = &plats->ctxlat;
if (!lat->bottom
- && !known_csts[i])
+ && !avals.m_known_vals[i])
{
ipcp_value<tree> *val;
for (val = lat->values; val; val = val->next)
- ret |= decide_about_value (node, i, -1, val, known_csts,
- known_contexts);
+ ret |= decide_about_value (node, i, -1, val, &avals);
}
if (!plats->aggs_bottom)
@@ -5557,22 +5546,20 @@ decide_whether_version_node (struct cgraph_node *node)
ipcp_value<tree> *val;
for (aglat = plats->aggs; aglat; aglat = aglat->next)
if (!aglat->bottom && aglat->values
- /* If the following is false, the one value is in
- known_aggs. */
+ /* If the following is false, the one value has been considered
+ for cloning for all contexts. */
&& (plats->aggs_contain_variable
|| !aglat->is_single_const ()))
for (val = aglat->values; val; val = val->next)
- ret |= decide_about_value (node, i, aglat->offset, val,
- known_csts, known_contexts);
+ ret |= decide_about_value (node, i, aglat->offset, val, &avals);
}
if (!ctxlat->bottom
- && known_contexts[i].useless_p ())
+ && avals.m_known_contexts[i].useless_p ())
{
ipcp_value<ipa_polymorphic_call_context> *val;
for (val = ctxlat->values; val; val = val->next)
- ret |= decide_about_value (node, i, -1, val, known_csts,
- known_contexts);
+ ret |= decide_about_value (node, i, -1, val, &avals);
}
info = IPA_NODE_REF (node);
@@ -5595,11 +5582,9 @@ decide_whether_version_node (struct cgraph_node *node)
if (!adjust_callers_for_value_intersection (callers, node))
{
/* If node is not called by anyone, or all its caller edges are
- self-recursive, the node is not really be in use, no need to
- do cloning. */
+ self-recursive, the node is not really in use, no need to do
+ cloning. */
callers.release ();
- known_csts.release ();
- known_contexts.release ();
info->do_clone_for_all_contexts = false;
return ret;
}
@@ -5608,6 +5593,9 @@ decide_whether_version_node (struct cgraph_node *node)
fprintf (dump_file, " - Creating a specialized node of %s "
"for all known contexts.\n", node->dump_name ());
+ vec<tree> known_csts = avals.m_known_vals.copy ();
+ vec<ipa_polymorphic_call_context> known_contexts
+ = copy_useful_known_contexts (avals.m_known_contexts);
find_more_scalar_values_for_callers_subset (node, known_csts, callers);
find_more_contexts_for_caller_subset (node, &known_contexts, callers);
ipa_agg_replacement_value *aggvals
@@ -5625,11 +5613,6 @@ decide_whether_version_node (struct cgraph_node *node)
IPA_NODE_REF (clone)->is_all_contexts_clone = true;
ret = true;
}
- else
- {
- known_csts.release ();
- known_contexts.release ();
- }
return ret;
}
diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index 4c1c1f91482..e8645aa0a1b 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -320,19 +320,18 @@ set_hint_predicate (predicate **p, predicate new_predicate)
is always false in the second and also builtin_constant_p tests cannot use
the fact that parameter is indeed a constant.
- KNOWN_VALS is partial mapping of parameters of NODE to constant values.
- KNOWN_AGGS is a vector of aggregate known offset/value set for each
- parameter. Return clause of possible truths. When INLINE_P is true, assume
- that we are inlining.
+ When INLINE_P is true, assume that we are inlining. AVAL contains known
+ information about argument values. The function does not modify its content
+ and so AVALs could also be of type ipa_call_arg_values but so far all
+ callers work with the auto version and so we avoid the conversion for
+ convenience.
- ERROR_MARK means compile time invariant. */
+ ERROR_MARK value of an argument means compile time invariant. */
static void
evaluate_conditions_for_known_args (struct cgraph_node *node,
bool inline_p,
- vec<tree> known_vals,
- vec<value_range> known_value_ranges,
- vec<ipa_agg_value_set> known_aggs,
+ ipa_auto_call_arg_values *avals,
clause_t *ret_clause,
clause_t *ret_nonspec_clause)
{
@@ -351,38 +350,33 @@ evaluate_conditions_for_known_args (struct cgraph_node *node,
/* We allow call stmt to have fewer arguments than the callee function
(especially for K&R style programs). So bound check here (we assume
- known_aggs vector, if non-NULL, has the same length as
- known_vals). */
- gcc_checking_assert (!known_aggs.length () || !known_vals.length ()
- || (known_vals.length () == known_aggs.length ()));
+ m_known_aggs vector is either empty or has the same length as
+ m_known_vals). */
+ gcc_checking_assert (!avals->m_known_aggs.length ()
+ || !avals->m_known_vals.length ()
+ || (avals->m_known_vals.length ()
+ == avals->m_known_aggs.length ()));
if (c->agg_contents)
{
- struct ipa_agg_value_set *agg;
-
if (c->code == predicate::changed
&& !c->by_ref
- && c->operand_num < (int)known_vals.length ()
- && (known_vals[c->operand_num] == error_mark_node))
+ && (avals->safe_sval_at(c->operand_num) == error_mark_node))
continue;
- if (c->operand_num < (int)known_aggs.length ())
+ if (ipa_agg_value_set *agg = avals->safe_aggval_at (c->operand_num))
{
- agg = &known_aggs[c->operand_num];
- val = ipa_find_agg_cst_for_param (agg,
- c->operand_num
- < (int) known_vals.length ()
- ? known_vals[c->operand_num]
- : NULL,
- c->offset, c->by_ref);
+ tree sval = avals->safe_sval_at (c->operand_num);
+ val = ipa_find_agg_cst_for_param (agg, sval, c->offset,
+ c->by_ref);
}
else
val = NULL_TREE;
}
- else if (c->operand_num < (int) known_vals.length ())
+ else
{
- val = known_vals[c->operand_num];
- if (val == error_mark_node && c->code != predicate::changed)
+ val = avals->safe_sval_at (c->operand_num);
+ if (val && val == error_mark_node && c->code != predicate::changed)
val = NULL_TREE;
}
@@ -446,53 +440,54 @@ evaluate_conditions_for_known_args (struct cgraph_node *node,
continue;
}
}
- if (c->operand_num < (int) known_value_ranges.length ()
+ if (c->operand_num < (int) avals->m_known_value_ranges.length ()
&& !c->agg_contents
- && !known_value_ranges[c->operand_num].undefined_p ()
- && !known_value_ranges[c->operand_num].varying_p ()
- && TYPE_SIZE (c->type)
- == TYPE_SIZE (known_value_ranges[c->operand_num].type ())
&& (!val || TREE_CODE (val) != INTEGER_CST))
{
- value_range vr = known_value_ranges[c->operand_num];
- if (!useless_type_conversion_p (c->type, vr.type ()))
+ value_range vr = avals->m_known_value_ranges[c->operand_num];
+ if (!vr.undefined_p ()
+ && !vr.varying_p ()
+ && (TYPE_SIZE (c->type) == TYPE_SIZE (vr.type ())))
{
- value_range res;
- range_fold_unary_expr (&res, NOP_EXPR,
- c->type, &vr, vr.type ());
- vr = res;
- }
- tree type = c->type;
-
- for (j = 0; vec_safe_iterate (c->param_ops, j, &op); j++)
- {
- if (vr.varying_p () || vr.undefined_p ())
- break;
-
- value_range res;
- if (!op->val[0])
- range_fold_unary_expr (&res, op->code, op->type, &vr, type);
- else if (!op->val[1])
+ if (!useless_type_conversion_p (c->type, vr.type ()))
{
- value_range op0 (op->val[0], op->val[0]);
- range_fold_binary_expr (&res, op->code, op->type,
- op->index ? &op0 : &vr,
- op->index ? &vr : &op0);
+ value_range res;
+ range_fold_unary_expr (&res, NOP_EXPR,
+ c->type, &vr, vr.type ());
+ vr = res;
+ }
+ tree type = c->type;
+
+ for (j = 0; vec_safe_iterate (c->param_ops, j, &op); j++)
+ {
+ if (vr.varying_p () || vr.undefined_p ())
+ break;
+
+ value_range res;
+ if (!op->val[0])
+ range_fold_unary_expr (&res, op->code, op->type, &vr, type);
+ else if (!op->val[1])
+ {
+ value_range op0 (op->val[0], op->val[0]);
+ range_fold_binary_expr (&res, op->code, op->type,
+ op->index ? &op0 : &vr,
+ op->index ? &vr : &op0);
+ }
+ else
+ gcc_unreachable ();
+ type = op->type;
+ vr = res;
+ }
+ if (!vr.varying_p () && !vr.undefined_p ())
+ {
+ value_range res;
+ value_range val_vr (c->val, c->val);
+ range_fold_binary_expr (&res, c->code, boolean_type_node,
+ &vr,
+ &val_vr);
+ if (res.zero_p ())
+ continue;
}
- else
- gcc_unreachable ();
- type = op->type;
- vr = res;
- }
- if (!vr.varying_p () && !vr.undefined_p ())
- {
- value_range res;
- value_range val_vr (c->val, c->val);
- range_fold_binary_expr (&res, c->code, boolean_type_node,
- &vr,
- &val_vr);
- if (res.zero_p ())
- continue;
}
}
@@ -538,24 +533,20 @@ fre_will_run_p (struct cgraph_node *node)
(if non-NULL) conditions evaluated for nonspecialized clone called
in a given context.
- KNOWN_VALS_PTR and KNOWN_AGGS_PTR must be non-NULL and will be filled by
- known constant and aggregate values of parameters.
-
- KNOWN_CONTEXT_PTR, if non-NULL, will be filled by polymorphic call contexts
- of parameter used by a polymorphic call. */
+ Vectors in AVALS will be populated with useful known information about
+ argument values - information not known to have any uses will be omitted -
+ except for m_known_contexts which will only be calculated if
+ COMPUTE_CONTEXTS is true. */
void
evaluate_properties_for_edge (struct cgraph_edge *e, bool inline_p,
clause_t *clause_ptr,
clause_t *nonspec_clause_ptr,
- vec<tree> *known_vals_ptr,
- vec<ipa_polymorphic_call_context>
- *known_contexts_ptr,
- vec<ipa_agg_value_set> *known_aggs_ptr)
+ ipa_auto_call_arg_values *avals,
+ bool compute_contexts)
{
struct cgraph_node *callee = e->callee->ultimate_alias_target ();
class ipa_fn_summary *info = ipa_fn_summaries->get (callee);
- auto_vec<value_range, 32> known_value_ranges;
class ipa_edge_args *args;
if (clause_ptr)
@@ -563,7 +554,7 @@ evaluate_properties_for_edge (struct cgraph_edge *e, bool inline_p,
if (ipa_node_params_sum
&& !e->call_stmt_cannot_inline_p
- && (info->conds || known_contexts_ptr)
+ && (info->conds || compute_contexts)
&& (args = IPA_EDGE_REF (e)) != NULL)
{
struct cgraph_node *caller;
@@ -608,15 +599,15 @@ evaluate_properties_for_edge (struct cgraph_edge *e, bool inline_p,
if (cst)
{
gcc_checking_assert (TREE_CODE (cst) != TREE_BINFO);
- if (!known_vals_ptr->length ())
- vec_safe_grow_cleared (known_vals_ptr, count, true);
- (*known_vals_ptr)[i] = cst;
+ if (!avals->m_known_vals.length ())
+ avals->m_known_vals.safe_grow_cleared (count, true);
+ avals->m_known_vals[i] = cst;
}
else if (inline_p && !es->param[i].change_prob)
{
- if (!known_vals_ptr->length ())
- vec_safe_grow_cleared (known_vals_ptr, count, true);
- (*known_vals_ptr)[i] = error_mark_node;
+ if (!avals->m_known_vals.length ())
+ avals->m_known_vals.safe_grow_cleared (count, true);
+ avals->m_known_vals[i] = error_mark_node;
}
/* If we failed to get simple constant, try value range. */
@@ -624,19 +615,20 @@ evaluate_properties_for_edge (struct cgraph_edge *e, bool inline_p,
&& vrp_will_run_p (caller)
&& ipa_is_param_used_by_ipa_predicates (callee_pi, i))
{
- value_range vr
+ value_range vr
= ipa_value_range_from_jfunc (caller_parms_info, e, jf,
ipa_get_type (callee_pi,
i));
if (!vr.undefined_p () && !vr.varying_p ())
{
- if (!known_value_ranges.length ())
+ if (!avals->m_known_value_ranges.length ())
{
- known_value_ranges.safe_grow (count, true);
+ avals->m_known_value_ranges.safe_grow (count, true);
for (int i = 0; i < count; ++i)
- new (&known_value_ranges[i]) value_range ();
+ new (&avals->m_known_value_ranges[i])
+ value_range ();
}
- known_value_ranges[i] = vr;
+ avals->m_known_value_ranges[i] = vr;
}
}
@@ -648,25 +640,25 @@ evaluate_properties_for_edge (struct cgraph_edge *e, bool inline_p,
caller, &jf->agg);
if (agg.items.length ())
{
- if (!known_aggs_ptr->length ())
- vec_safe_grow_cleared (known_aggs_ptr, count, true);
- (*known_aggs_ptr)[i] = agg;
+ if (!avals->m_known_aggs.length ())
+ avals->m_known_aggs.safe_grow_cleared (count, true);
+ avals->m_known_aggs[i] = agg;
}
}
}
/* For calls used in polymorphic calls we further determine
polymorphic call context. */
- if (known_contexts_ptr
+ if (compute_contexts
&& ipa_is_param_used_by_polymorphic_call (callee_pi, i))
{
ipa_polymorphic_call_context
ctx = ipa_context_from_jfunc (caller_parms_info, e, i, jf);
if (!ctx.useless_p ())
{
- if (!known_contexts_ptr->length ())
- known_contexts_ptr->safe_grow_cleared (count, true);
- (*known_contexts_ptr)[i]
+ if (!avals->m_known_contexts.length ())
+ avals->m_known_contexts.safe_grow_cleared (count, true);
+ avals->m_known_contexts[i]
= ipa_context_from_jfunc (caller_parms_info, e, i, jf);
}
}
@@ -685,18 +677,14 @@ evaluate_properties_for_edge (struct cgraph_edge *e, bool inline_p,
cst = NULL;
if (cst)
{
- if (!known_vals_ptr->length ())
- vec_safe_grow_cleared (known_vals_ptr, count, true);
- (*known_vals_ptr)[i] = cst;
+ if (!avals->m_known_vals.length ())
+ avals->m_known_vals.safe_grow_cleared (count, true);
+ avals->m_known_vals[i] = cst;
}
}
}
- evaluate_conditions_for_known_args (callee, inline_p,
- *known_vals_ptr,
- known_value_ranges,
- *known_aggs_ptr,
- clause_ptr,
+ evaluate_conditions_for_known_args (callee, inline_p, avals, clause_ptr,
nonspec_clause_ptr);
}
@@ -781,7 +769,7 @@ ipa_fn_summary_t::duplicate (cgraph_node *src,
vec<size_time_entry, va_gc> *entry = info->size_time_table;
/* Use SRC parm info since it may not be copied yet. */
class ipa_node_params *parms_info = IPA_NODE_REF (src);
- vec<tree> known_vals = vNULL;
+ ipa_auto_call_arg_values avals;
int count = ipa_get_param_count (parms_info);
int i, j;
clause_t possible_truths;
@@ -792,7 +780,7 @@ ipa_fn_summary_t::duplicate (cgraph_node *src,
struct cgraph_edge *edge, *next;
info->size_time_table = 0;
- known_vals.safe_grow_cleared (count, true);
+ avals.m_known_vals.safe_grow_cleared (count, true);
for (i = 0; i < count; i++)
{
struct ipa_replace_map *r;
@@ -801,20 +789,17 @@ ipa_fn_summary_t::duplicate (cgraph_node *src,
{
if (r->parm_num == i)
{
- known_vals[i] = r->new_tree;
+ avals.m_known_vals[i] = r->new_tree;
break;
}
}
}
evaluate_conditions_for_known_args (dst, false,
- known_vals,
- vNULL,
- vNULL,
+ &avals,
&possible_truths,
/* We are going to specialize,
so ignore nonspec truths. */
NULL);
- known_vals.release ();
info->account_size_time (0, 0, true_pred, true_pred);
@@ -3054,15 +3039,14 @@ compute_fn_summary_for_current (void)
return 0;
}
-/* Estimate benefit devirtualizing indirect edge IE, provided KNOWN_VALS,
- KNOWN_CONTEXTS and KNOWN_AGGS. */
+/* Estimate benefit devirtualizing indirect edge IE and return true if it can
+ be devirtualized and inlined, provided m_known_vals, m_known_contexts and
+ m_known_aggs in AVALS. Return false straight away if AVALS is NULL. */
static bool
estimate_edge_devirt_benefit (struct cgraph_edge *ie,
int *size, int *time,
- vec<tree> known_vals,
- vec<ipa_polymorphic_call_context> known_contexts,
- vec<ipa_agg_value_set> known_aggs)
+ ipa_call_arg_values *avals)
{
tree target;
struct cgraph_node *callee;
@@ -3070,13 +3054,13 @@ estimate_edge_devirt_benefit (struct cgraph_edge *ie,
enum availability avail;
bool speculative;
- if (!known_vals.length () && !known_contexts.length ())
+ if (!avals
+ || (!avals->m_known_vals.length() && !avals->m_known_contexts.length ()))
return false;
if (!opt_for_fn (ie->caller->decl, flag_indirect_inlining))
return false;
- target = ipa_get_indirect_edge_target (ie, known_vals, known_contexts,
- known_aggs, &speculative);
+ target = ipa_get_indirect_edge_target (ie, avals, &speculative);
if (!target || speculative)
return false;
@@ -3100,17 +3084,13 @@ estimate_edge_devirt_benefit (struct cgraph_edge *ie,
}
/* Increase SIZE, MIN_SIZE (if non-NULL) and TIME for size and time needed to
- handle edge E with probability PROB.
- Set HINTS if edge may be devirtualized.
- KNOWN_VALS, KNOWN_AGGS and KNOWN_CONTEXTS describe context of the call
- site. */
+ handle edge E with probability PROB. Set HINTS accordingly if edge may be
+ devirtualized. AVALS, if non-NULL, describes the context of the call site
+ as far as values of parameters are concerned. */
static inline void
estimate_edge_size_and_time (struct cgraph_edge *e, int *size, int *min_size,
- sreal *time,
- vec<tree> known_vals,
- vec<ipa_polymorphic_call_context> known_contexts,
- vec<ipa_agg_value_set> known_aggs,
+ sreal *time, ipa_call_arg_values *avals,
ipa_hints *hints)
{
class ipa_call_summary *es = ipa_call_summaries->get (e);
@@ -3119,8 +3099,7 @@ estimate_edge_size_and_time (struct cgraph_edge *e, int *size, int *min_size,
int cur_size;
if (!e->callee && hints && e->maybe_hot_p ()
- && estimate_edge_devirt_benefit (e, &call_size, &call_time,
- known_vals, known_contexts, known_aggs))
+ && estimate_edge_devirt_benefit (e, &call_size, &call_time, avals))
*hints |= INLINE_HINT_indirect_call;
cur_size = call_size * ipa_fn_summary::size_scale;
*size += cur_size;
@@ -3132,9 +3111,9 @@ estimate_edge_size_and_time (struct cgraph_edge *e, int *size, int *min_size,
/* Increase SIZE, MIN_SIZE and TIME for size and time needed to handle all
- calls in NODE. POSSIBLE_TRUTHS, KNOWN_VALS, KNOWN_AGGS and KNOWN_CONTEXTS
- describe context of the call site.
-
+ calls in NODE. POSSIBLE_TRUTHS and AVALS describe the context of the call
+ site.
+
Helper for estimate_calls_size_and_time which does the same but
(in most cases) faster. */
@@ -3143,9 +3122,7 @@ estimate_calls_size_and_time_1 (struct cgraph_node *node, int *size,
int *min_size, sreal *time,
ipa_hints *hints,
clause_t possible_truths,
- vec<tree> known_vals,
- vec<ipa_polymorphic_call_context> known_contexts,
- vec<ipa_agg_value_set> known_aggs)
+ ipa_call_arg_values *avals)
{
struct cgraph_edge *e;
for (e = node->callees; e; e = e->next_callee)
@@ -3154,10 +3131,8 @@ estimate_calls_size_and_time_1 (struct cgraph_node *node, int *size,
{
gcc_checking_assert (!ipa_call_summaries->get (e));
estimate_calls_size_and_time_1 (e->callee, size, min_size, time,
- hints,
- possible_truths,
- known_vals, known_contexts,
- known_aggs);
+ hints, possible_truths, avals);
+
continue;
}
class ipa_call_summary *es = ipa_call_summaries->get (e);
@@ -3175,9 +3150,7 @@ estimate_calls_size_and_time_1 (struct cgraph_node *node, int *size,
so we do not need to compute probabilities. */
estimate_edge_size_and_time (e, size,
es->predicate ? NULL : min_size,
- time,
- known_vals, known_contexts,
- known_aggs, hints);
+ time, avals, hints);
}
}
for (e = node->indirect_calls; e; e = e->next_callee)
@@ -3187,9 +3160,7 @@ estimate_calls_size_and_time_1 (struct cgraph_node *node, int *size,
|| es->predicate->evaluate (possible_truths))
estimate_edge_size_and_time (e, size,
es->predicate ? NULL : min_size,
- time,
- known_vals, known_contexts, known_aggs,
- hints);
+ time, avals, hints);
}
}
@@ -3211,8 +3182,7 @@ summarize_calls_size_and_time (struct cgraph_node *node,
int size = 0;
sreal time = 0;
- estimate_edge_size_and_time (e, &size, NULL, &time,
- vNULL, vNULL, vNULL, NULL);
+ estimate_edge_size_and_time (e, &size, NULL, &time, NULL, NULL);
struct predicate pred = true;
class ipa_call_summary *es = ipa_call_summaries->get (e);
@@ -3226,8 +3196,7 @@ summarize_calls_size_and_time (struct cgraph_node *node,
int size = 0;
sreal time = 0;
- estimate_edge_size_and_time (e, &size, NULL, &time,
- vNULL, vNULL, vNULL, NULL);
+ estimate_edge_size_and_time (e, &size, NULL, &time, NULL, NULL);
struct predicate pred = true;
class ipa_call_summary *es = ipa_call_summaries->get (e);
@@ -3238,17 +3207,15 @@ summarize_calls_size_and_time (struct cgraph_node *node,
}
/* Increase SIZE, MIN_SIZE and TIME for size and time needed to handle all
- calls in NODE. POSSIBLE_TRUTHS, KNOWN_VALS, KNOWN_AGGS and KNOWN_CONTEXTS
- describe context of the call site. */
+ calls in NODE. POSSIBLE_TRUTHS and AVALS (the latter if non-NULL) describe
+ context of the call site. */
static void
estimate_calls_size_and_time (struct cgraph_node *node, int *size,
int *min_size, sreal *time,
ipa_hints *hints,
clause_t possible_truths,
- vec<tree> known_vals,
- vec<ipa_polymorphic_call_context> known_contexts,
- vec<ipa_agg_value_set> known_aggs)
+ ipa_call_arg_values *avals)
{
class ipa_fn_summary *sum = ipa_fn_summaries->get (node);
bool use_table = true;
@@ -3267,9 +3234,10 @@ estimate_calls_size_and_time (struct cgraph_node *node, int *size,
use_table = false;
/* If there is an indirect edge that may be optimized, we need
to go the slow way. */
- else if ((known_vals.length ()
- || known_contexts.length ()
- || known_aggs.length ()) && hints)
+ else if (avals && hints
+ && (avals->m_known_vals.length ()
+ || avals->m_known_contexts.length ()
+ || avals->m_known_aggs.length ()))
{
class ipa_node_params *params_summary = IPA_NODE_REF (node);
unsigned int nargs = params_summary
@@ -3278,13 +3246,13 @@ estimate_calls_size_and_time (struct cgraph_node *node, int *size,
for (unsigned int i = 0; i < nargs && use_table; i++)
{
if (ipa_is_param_used_by_indirect_call (params_summary, i)
- && ((known_vals.length () > i && known_vals[i])
- || (known_aggs.length () > i
- && known_aggs[i].items.length ())))
+ && (avals->safe_sval_at (i)
+ || (avals->m_known_aggs.length () > i
+ && avals->m_known_aggs[i].items.length ())))
use_table = false;
else if (ipa_is_param_used_by_polymorphic_call (params_summary, i)
- && (known_contexts.length () > i
- && !known_contexts[i].useless_p ()))
+ && (avals->m_known_contexts.length () > i
+ && !avals->m_known_contexts[i].useless_p ()))
use_table = false;
}
}
@@ -3327,8 +3295,7 @@ estimate_calls_size_and_time (struct cgraph_node *node, int *size,
< ipa_fn_summary::max_size_time_table_size)
{
estimate_calls_size_and_time_1 (node, &old_size, NULL, &old_time, NULL,
- possible_truths, known_vals,
- known_contexts, known_aggs);
+ possible_truths, avals);
gcc_assert (*size == old_size);
if (time && (*time - old_time > 1 || *time - old_time < -1)
&& dump_file)
@@ -3340,31 +3307,22 @@ estimate_calls_size_and_time (struct cgraph_node *node, int *size,
/* Slow path by walking all edges. */
else
estimate_calls_size_and_time_1 (node, size, min_size, time, hints,
- possible_truths, known_vals, known_contexts,
- known_aggs);
+ possible_truths, avals);
}
-/* Default constructor for ipa call context.
- Memory allocation of known_vals, known_contexts
- and known_aggs vectors is owned by the caller, but can
- be release by ipa_call_context::release.
-
- inline_param_summary is owned by the caller. */
-ipa_call_context::ipa_call_context (cgraph_node *node,
- clause_t possible_truths,
+/* Main constructor for ipa call context. Memory allocation of ARG_VALUES
+ is owned by the caller. INLINE_PARAM_SUMMARY is also owned by the
+ caller. */
+
+ipa_call_context::ipa_call_context (cgraph_node *node, clause_t possible_truths,
clause_t nonspec_possible_truths,
- vec<tree> known_vals,
- vec<ipa_polymorphic_call_context>
- known_contexts,
- vec<ipa_agg_value_set> known_aggs,
vec<inline_param_summary>
- inline_param_summary)
+ inline_param_summary,
+ ipa_auto_call_arg_values *arg_values)
: m_node (node), m_possible_truths (possible_truths),
m_nonspec_possible_truths (nonspec_possible_truths),
m_inline_param_summary (inline_param_summary),
- m_known_vals (known_vals),
- m_known_contexts (known_contexts),
- m_known_aggs (known_aggs)
+ m_avals (arg_values)
{
}
@@ -3395,47 +3353,50 @@ ipa_call_context::duplicate_from (const ipa_call_context &ctx)
break;
}
}
- m_known_vals = vNULL;
- if (ctx.m_known_vals.exists ())
+ m_avals.m_known_vals = vNULL;
+ if (ctx.m_avals.m_known_vals.exists ())
{
- unsigned int n = MIN (ctx.m_known_vals.length (), nargs);
+ unsigned int n = MIN (ctx.m_avals.m_known_vals.length (), nargs);
for (unsigned int i = 0; i < n; i++)
if (ipa_is_param_used_by_indirect_call (params_summary, i)
- && ctx.m_known_vals[i])
+ && ctx.m_avals.m_known_vals[i])
{
- m_known_vals = ctx.m_known_vals.copy ();
+ m_avals.m_known_vals = ctx.m_avals.m_known_vals.copy ();
break;
}
}
- m_known_contexts = vNULL;
- if (ctx.m_known_contexts.exists ())
+ m_avals.m_known_contexts = vNULL;
+ if (ctx.m_avals.m_known_contexts.exists ())
{
- unsigned int n = MIN (ctx.m_known_contexts.length (), nargs);
+ unsigned int n = MIN (ctx.m_avals.m_known_contexts.length (), nargs);
for (unsigned int i = 0; i < n; i++)
if (ipa_is_param_used_by_polymorphic_call (params_summary, i)
- && !ctx.m_known_contexts[i].useless_p ())
+ && !ctx.m_avals.m_known_contexts[i].useless_p ())
{
- m_known_contexts = ctx.m_known_contexts.copy ();
+ m_avals.m_known_contexts = ctx.m_avals.m_known_contexts.copy ();
break;
}
}
- m_known_aggs = vNULL;
- if (ctx.m_known_aggs.exists ())
+ m_avals.m_known_aggs = vNULL;
+ if (ctx.m_avals.m_known_aggs.exists ())
{
- unsigned int n = MIN (ctx.m_known_aggs.length (), nargs);
+ unsigned int n = MIN (ctx.m_avals.m_known_aggs.length (), nargs);
for (unsigned int i = 0; i < n; i++)
if (ipa_is_param_used_by_indirect_call (params_summary, i)
- && !ctx.m_known_aggs[i].is_empty ())
+ && !ctx.m_avals.m_known_aggs[i].is_empty ())
{
- m_known_aggs = ipa_copy_agg_values (ctx.m_known_aggs);
+ m_avals.m_known_aggs
+ = ipa_copy_agg_values (ctx.m_avals.m_known_aggs);
break;
}
}
+
+ m_avals.m_known_value_ranges = vNULL;
}
/* Release memory used by known_vals/contexts/aggs vectors.
@@ -3449,11 +3410,11 @@ ipa_call_context::release (bool all)
/* See if context is initialized at first place. */
if (!m_node)
return;
- ipa_release_agg_values (m_known_aggs, all);
+ ipa_release_agg_values (m_avals.m_known_aggs, all);
if (all)
{
- m_known_vals.release ();
- m_known_contexts.release ();
+ m_avals.m_known_vals.release ();
+ m_avals.m_known_contexts.release ();
m_inline_param_summary.release ();
}
}
@@ -3499,77 +3460,81 @@ ipa_call_context::equal_to (const ipa_call_context &ctx)
return false;
}
}
- if (m_known_vals.exists () || ctx.m_known_vals.exists ())
+ if (m_avals.m_known_vals.exists () || ctx.m_avals.m_known_vals.exists ())
{
for (unsigned int i = 0; i < nargs; i++)
{
if (!ipa_is_param_used_by_indirect_call (params_summary, i))
continue;
- if (i >= m_known_vals.length () || !m_known_vals[i])
+ if (i >= m_avals.m_known_vals.length () || !m_avals.m_known_vals[i])
{
- if (i < ctx.m_known_vals.length () && ctx.m_known_vals[i])
+ if (i < ctx.m_avals.m_known_vals.length ()
+ && ctx.m_avals.m_known_vals[i])
return false;
continue;
}
- if (i >= ctx.m_known_vals.length () || !ctx.m_known_vals[i])
+ if (i >= ctx.m_avals.m_known_vals.length ()
+ || !ctx.m_avals.m_known_vals[i])
{
- if (i < m_known_vals.length () && m_known_vals[i])
+ if (i < m_avals.m_known_vals.length () && m_avals.m_known_vals[i])
return false;
continue;
}
- if (m_known_vals[i] != ctx.m_known_vals[i])
+ if (m_avals.m_known_vals[i] != ctx.m_avals.m_known_vals[i])
return false;
}
}
- if (m_known_contexts.exists () || ctx.m_known_contexts.exists ())
+ if (m_avals.m_known_contexts.exists ()
+ || ctx.m_avals.m_known_contexts.exists ())
{
for (unsigned int i = 0; i < nargs; i++)
{
if (!ipa_is_param_used_by_polymorphic_call (params_summary, i))
continue;
- if (i >= m_known_contexts.length ()
- || m_known_contexts[i].useless_p ())
+ if (i >= m_avals.m_known_contexts.length ()
+ || m_avals.m_known_contexts[i].useless_p ())
{
- if (i < ctx.m_known_contexts.length ()
- && !ctx.m_known_contexts[i].useless_p ())
+ if (i < ctx.m_avals.m_known_contexts.length ()
+ && !ctx.m_avals.m_known_contexts[i].useless_p ())
return false;
continue;
}
- if (i >= ctx.m_known_contexts.length ()
- || ctx.m_known_contexts[i].useless_p ())
+ if (i >= ctx.m_avals.m_known_contexts.length ()
+ || ctx.m_avals.m_known_contexts[i].useless_p ())
{
- if (i < m_known_contexts.length ()
- && !m_known_contexts[i].useless_p ())
+ if (i < m_avals.m_known_contexts.length ()
+ && !m_avals.m_known_contexts[i].useless_p ())
return false;
continue;
}
- if (!m_known_contexts[i].equal_to
- (ctx.m_known_contexts[i]))
+ if (!m_avals.m_known_contexts[i].equal_to
+ (ctx.m_avals.m_known_contexts[i]))
return false;
}
}
- if (m_known_aggs.exists () || ctx.m_known_aggs.exists ())
+ if (m_avals.m_known_aggs.exists () || ctx.m_avals.m_known_aggs.exists ())
{
for (unsigned int i = 0; i < nargs; i++)
{
if (!ipa_is_param_used_by_indirect_call (params_summary, i))
continue;
- if (i >= m_known_aggs.length () || m_known_aggs[i].is_empty ())
+ if (i >= m_avals.m_known_aggs.length ()
+ || m_avals.m_known_aggs[i].is_empty ())
{
- if (i < ctx.m_known_aggs.length ()
- && !ctx.m_known_aggs[i].is_empty ())
+ if (i < ctx.m_avals.m_known_aggs.length ()
+ && !ctx.m_avals.m_known_aggs[i].is_empty ())
return false;
continue;
}
- if (i >= ctx.m_known_aggs.length ()
- || ctx.m_known_aggs[i].is_empty ())
+ if (i >= ctx.m_avals.m_known_aggs.length ()
+ || ctx.m_avals.m_known_aggs[i].is_empty ())
{
- if (i < m_known_aggs.length ()
- && !m_known_aggs[i].is_empty ())
+ if (i < m_avals.m_known_aggs.length ()
+ && !m_avals.m_known_aggs[i].is_empty ())
return false;
continue;
}
- if (!m_known_aggs[i].equal_to (ctx.m_known_aggs[i]))
+ if (!m_avals.m_known_aggs[i].equal_to (ctx.m_avals.m_known_aggs[i]))
return false;
}
}
@@ -3619,7 +3584,7 @@ ipa_call_context::estimate_size_and_time (int *ret_size,
estimate_calls_size_and_time (m_node, &size, &min_size,
ret_time ? &time : NULL,
ret_hints ? &hints : NULL, m_possible_truths,
- m_known_vals, m_known_contexts, m_known_aggs);
+ &m_avals);
sreal nonspecialized_time = time;
@@ -3726,22 +3691,16 @@ ipa_call_context::estimate_size_and_time (int *ret_size,
void
estimate_ipcp_clone_size_and_time (struct cgraph_node *node,
- vec<tree> known_vals,
- vec<ipa_polymorphic_call_context>
- known_contexts,
- vec<ipa_agg_value_set> known_aggs,
+ ipa_auto_call_arg_values *avals,
int *ret_size, sreal *ret_time,
sreal *ret_nonspec_time,
ipa_hints *hints)
{
clause_t clause, nonspec_clause;
- /* TODO: Also pass known value ranges. */
- evaluate_conditions_for_known_args (node, false, known_vals, vNULL,
- known_aggs, &clause, &nonspec_clause);
- ipa_call_context ctx (node, clause, nonspec_clause,
- known_vals, known_contexts,
- known_aggs, vNULL);
+ evaluate_conditions_for_known_args (node, false, avals, &clause,
+ &nonspec_clause);
+ ipa_call_context ctx (node, clause, nonspec_clause, vNULL, avals);
ctx.estimate_size_and_time (ret_size, NULL, ret_time,
ret_nonspec_time, hints);
}
@@ -3970,10 +3929,8 @@ ipa_merge_fn_summary_after_inlining (struct cgraph_edge *edge)
if (callee_info->conds)
{
- auto_vec<tree, 32> known_vals;
- auto_vec<ipa_agg_value_set, 32> known_aggs;
- evaluate_properties_for_edge (edge, true, &clause, NULL,
- &known_vals, NULL, &known_aggs);
+ ipa_auto_call_arg_values avals;
+ evaluate_properties_for_edge (edge, true, &clause, NULL, &avals, false);
}
if (ipa_node_params_sum && callee_info->conds)
{
@@ -4067,8 +4024,7 @@ ipa_merge_fn_summary_after_inlining (struct cgraph_edge *edge)
int edge_size = 0;
sreal edge_time = 0;
- estimate_edge_size_and_time (edge, &edge_size, NULL, &edge_time, vNULL,
- vNULL, vNULL, 0);
+ estimate_edge_size_and_time (edge, &edge_size, NULL, &edge_time, NULL, 0);
/* Unaccount size and time of the optimized out call. */
info->account_size_time (-edge_size, -edge_time,
es->predicate ? *es->predicate : true,
@@ -4110,7 +4066,7 @@ ipa_update_overall_fn_summary (struct cgraph_node *node, bool reset)
estimate_calls_size_and_time (node, &size_info->size, &info->min_size,
&info->time, NULL,
~(clause_t) (1 << predicate::false_condition),
- vNULL, vNULL, vNULL);
+ NULL);
size_info->size = RDIV (size_info->size, ipa_fn_summary::size_scale);
info->min_size = RDIV (info->min_size, ipa_fn_summary::size_scale);
}
diff --git a/gcc/ipa-fnsummary.h b/gcc/ipa-fnsummary.h
index 4e1f841afad..6893858d18e 100644
--- a/gcc/ipa-fnsummary.h
+++ b/gcc/ipa-fnsummary.h
@@ -297,10 +297,8 @@ public:
ipa_call_context (cgraph_node *node,
clause_t possible_truths,
clause_t nonspec_possible_truths,
- vec<tree> known_vals,
- vec<ipa_polymorphic_call_context> known_contexts,
- vec<ipa_agg_value_set> known_aggs,
- vec<inline_param_summary> m_inline_param_summary);
+ vec<inline_param_summary> inline_param_summary,
+ ipa_auto_call_arg_values *arg_values);
ipa_call_context ()
: m_node(NULL)
{
@@ -328,14 +326,9 @@ private:
/* Inline summary maintains info about change probabilities. */
vec<inline_param_summary> m_inline_param_summary;
- /* The following is used only to resolve indirect calls. */
-
- /* Vector describing known values of parameters. */
- vec<tree> m_known_vals;
- /* Vector describing known polymorphic call contexts. */
- vec<ipa_polymorphic_call_context> m_known_contexts;
- /* Vector describing known aggregate values. */
- vec<ipa_agg_value_set> m_known_aggs;
+ /* Even after having calculated clauses, the information about argument
+ values is used to resolve indirect calls. */
+ ipa_call_arg_values m_avals;
};
extern fast_call_summary <ipa_call_summary *, va_heap> *ipa_call_summaries;
@@ -349,9 +342,7 @@ void ipa_free_fn_summary (void);
void ipa_free_size_summary (void);
void inline_analyze_function (struct cgraph_node *node);
void estimate_ipcp_clone_size_and_time (struct cgraph_node *,
- vec<tree>,
- vec<ipa_polymorphic_call_context>,
- vec<ipa_agg_value_set>,
+ ipa_auto_call_arg_values *,
int *, sreal *, sreal *,
ipa_hints *);
void ipa_merge_fn_summary_after_inlining (struct cgraph_edge *edge);
@@ -365,10 +356,8 @@ void evaluate_properties_for_edge (struct cgraph_edge *e,
bool inline_p,
clause_t *clause_ptr,
clause_t *nonspec_clause_ptr,
- vec<tree> *known_vals_ptr,
- vec<ipa_polymorphic_call_context>
- *known_contexts_ptr,
- vec<ipa_agg_value_set> *);
+ ipa_auto_call_arg_values *avals,
+ bool compute_contexts);
void ipa_fnsummary_c_finalize (void);
HOST_WIDE_INT ipa_get_stack_frame_offset (struct cgraph_node *node);
diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index 148efbc09ef..d2ae8196d09 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -184,20 +184,16 @@ do_estimate_edge_time (struct cgraph_edge *edge, sreal *ret_nonspec_time)
ipa_hints hints;
struct cgraph_node *callee;
clause_t clause, nonspec_clause;
- auto_vec<tree, 32> known_vals;
- auto_vec<ipa_polymorphic_call_context, 32> known_contexts;
- auto_vec<ipa_agg_value_set, 32> known_aggs;
+ ipa_auto_call_arg_values avals;
class ipa_call_summary *es = ipa_call_summaries->get (edge);
int min_size = -1;
callee = edge->callee->ultimate_alias_target ();
gcc_checking_assert (edge->inline_failed);
- evaluate_properties_for_edge (edge, true,
- &clause, &nonspec_clause, &known_vals,
- &known_contexts, &known_aggs);
- ipa_call_context ctx (callee, clause, nonspec_clause, known_vals,
- known_contexts, known_aggs, es->param);
+ evaluate_properties_for_edge (edge, true, &clause, &nonspec_clause,
+ &avals, true);
+ ipa_call_context ctx (callee, clause, nonspec_clause, es->param, &avals);
if (node_context_cache != NULL)
{
node_context_summary *e = node_context_cache->get_create (callee);
@@ -255,7 +251,6 @@ do_estimate_edge_time (struct cgraph_edge *edge, sreal *ret_nonspec_time)
: edge->caller->count.ipa ())))
hints |= INLINE_HINT_known_hot;
- ctx.release ();
gcc_checking_assert (size >= 0);
gcc_checking_assert (time >= 0);
@@ -307,9 +302,6 @@ do_estimate_edge_size (struct cgraph_edge *edge)
int size;
struct cgraph_node *callee;
clause_t clause, nonspec_clause;
- auto_vec<tree, 32> known_vals;
- auto_vec<ipa_polymorphic_call_context, 32> known_contexts;
- auto_vec<ipa_agg_value_set, 32> known_aggs;
/* When we do caching, use do_estimate_edge_time to populate the entry. */
@@ -325,14 +317,11 @@ do_estimate_edge_size (struct cgraph_edge *edge)
/* Early inliner runs without caching, go ahead and do the dirty work. */
gcc_checking_assert (edge->inline_failed);
- evaluate_properties_for_edge (edge, true,
- &clause, &nonspec_clause,
- &known_vals, &known_contexts,
- &known_aggs);
- ipa_call_context ctx (callee, clause, nonspec_clause, known_vals,
- known_contexts, known_aggs, vNULL);
+ ipa_auto_call_arg_values avals;
+ evaluate_properties_for_edge (edge, true, &clause, &nonspec_clause,
+ &avals, true);
+ ipa_call_context ctx (callee, clause, nonspec_clause, vNULL, &avals);
ctx.estimate_size_and_time (&size, NULL, NULL, NULL, NULL);
- ctx.release ();
return size;
}
@@ -346,9 +335,6 @@ do_estimate_edge_hints (struct cgraph_edge *edge)
ipa_hints hints;
struct cgraph_node *callee;
clause_t clause, nonspec_clause;
- auto_vec<tree, 32> known_vals;
- auto_vec<ipa_polymorphic_call_context, 32> known_contexts;
- auto_vec<ipa_agg_value_set, 32> known_aggs;
/* When we do caching, use do_estimate_edge_time to populate the entry. */
@@ -364,14 +350,11 @@ do_estimate_edge_hints (struct cgraph_edge *edge)
/* Early inliner runs without caching, go ahead and do the dirty work. */
gcc_checking_assert (edge->inline_failed);
- evaluate_properties_for_edge (edge, true,
- &clause, &nonspec_clause,
- &known_vals, &known_contexts,
- &known_aggs);
- ipa_call_context ctx (callee, clause, nonspec_clause, known_vals,
- known_contexts, known_aggs, vNULL);
+ ipa_auto_call_arg_values avals;
+ evaluate_properties_for_edge (edge, true, &clause, &nonspec_clause,
+ &avals, true);
+ ipa_call_context ctx (callee, clause, nonspec_clause, vNULL, &avals);
ctx.estimate_size_and_time (NULL, NULL, NULL, NULL, &hints);
- ctx.release ();
hints |= simple_edge_hints (edge);
return hints;
}
diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index b28c78eeab4..230625a89bb 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -5795,4 +5795,14 @@ ipa_agg_value::equal_to (const ipa_agg_value &other)
return offset == other.offset
&& operand_equal_p (value, other.value, 0);
}
+
+/* Destructor also removing individual aggregate values. */
+
+ipa_auto_call_arg_values::~ipa_auto_call_arg_values ()
+{
+ ipa_release_agg_values (m_known_aggs, false);
+}
+
+
+
#include "gt-ipa-prop.h"
diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
index 23fcf905ef3..8b2edf6300c 100644
--- a/gcc/ipa-prop.h
+++ b/gcc/ipa-prop.h
@@ -433,6 +433,107 @@ ipa_get_jf_ancestor_type_preserved (struct ipa_jump_func *jfunc)
return jfunc->value.ancestor.agg_preserved;
}
+/* Class for allocating a bundle of various potentially known properties about
+ actual arguments of a particular call on stack for the usual case and on
+ heap only if there are unusually many arguments. The data is deallocated
+ when the instance of this class goes out of scope or is otherwise
+ destructed. */
+
+class ipa_auto_call_arg_values
+{
+public:
+ ~ipa_auto_call_arg_values ();
+
+ /* If m_known_vals (vector of known "scalar" values) is sufficiently long,
+ return its element at INDEX, otherwise return NULL. */
+ tree safe_sval_at (int index)
+ {
+ /* TODO: Assert non-negative index here and test. */
+ if ((unsigned) index < m_known_vals.length ())
+ return m_known_vals[index];
+ return NULL;
+ }
+
+ /* If m_known_aggs is sufficiently long, return the pointer to its element
+ at INDEX, otherwise return NULL. */
+ ipa_agg_value_set *safe_aggval_at (int index)
+ {
+ /* TODO: Assert non-negative index here and test. */
+ if ((unsigned) index < m_known_aggs.length ())
+ return &m_known_aggs[index];
+ return NULL;
+ }
+
+ /* Vector describing known values of parameters. */
+ auto_vec<tree, 32> m_known_vals;
+
+ /* Vector describing known polymorphic call contexts. */
+ auto_vec<ipa_polymorphic_call_context, 32> m_known_contexts;
+
+ /* Vector describing known aggregate values. */
+ auto_vec<ipa_agg_value_set, 32> m_known_aggs;
+
+ /* Vector describing known value ranges of arguments. */
+ auto_vec<value_range, 32> m_known_value_ranges;
+};
+
+/* Class bundling the various potentially known properties about actual
+ arguments of a particular call. This variant does not deallocate the
+ bundled data in any way. */
+
+class ipa_call_arg_values
+{
+public:
+ /* Default constructor, setting the vectors to empty ones. */
+ ipa_call_arg_values ()
+ {}
+
+ /* Construct this general variant of the bundle from the variant which uses
+ auto_vecs to hold the vectors. This means that vectors of objects
+ constructed with this constructor should not be changed because if they
+ get reallocated, the member vectors and the underlying auto_vecs would get
+ out of sync. */
+ ipa_call_arg_values (ipa_auto_call_arg_values *aavals)
+ : m_known_vals (aavals->m_known_vals),
+ m_known_contexts (aavals->m_known_contexts),
+ m_known_aggs (aavals->m_known_aggs),
+ m_known_value_ranges (aavals->m_known_value_ranges)
+ {}
+
+ /* If m_known_vals (vector of known "scalar" values) is sufficiently long,
+ return its element at INDEX, otherwise return NULL. */
+ tree safe_sval_at (int index)
+ {
+ /* TODO: Assert non-negative index here and test. */
+ if ((unsigned) index < m_known_vals.length ())
+ return m_known_vals[index];
+ return NULL;
+ }
+
+ /* If m_known_aggs is sufficiently long, return the pointer to its element
+ at INDEX, otherwise return NULL. */
+ ipa_agg_value_set *safe_aggval_at (int index)
+ {
+ /* TODO: Assert non-negative index here and test. */
+ if ((unsigned) index < m_known_aggs.length ())
+ return &m_known_aggs[index];
+ return NULL;
+ }
+
+ /* Vector describing known values of parameters. */
+ vec<tree> m_known_vals = vNULL;
+
+ /* Vector describing known polymorphic call contexts. */
+ vec<ipa_polymorphic_call_context> m_known_contexts = vNULL;
+
+ /* Vector describing known aggregate values. */
+ vec<ipa_agg_value_set> m_known_aggs = vNULL;
+
+ /* Vector describing known value ranges of arguments. */
+ vec<value_range> m_known_value_ranges = vNULL;
+};
+
+
/* Summary describing a single formal parameter. */
struct GTY(()) ipa_param_descriptor
@@ -970,12 +1071,13 @@ void ipa_initialize_node_params (struct cgraph_node *node);
bool ipa_propagate_indirect_call_infos (struct cgraph_edge *cs,
vec<cgraph_edge *> *new_edges);
-/* Indirect edge and binfo processing. */
+/* Indirect edge processing and target discovery. */
tree ipa_get_indirect_edge_target (struct cgraph_edge *ie,
- vec<tree>,
- vec<ipa_polymorphic_call_context>,
- vec<ipa_agg_value_set>,
- bool *);
+ ipa_call_arg_values *avals,
+ bool *speculative);
+tree ipa_get_indirect_edge_target (struct cgraph_edge *ie,
+ ipa_auto_call_arg_values *avals,
+ bool *speculative);
struct cgraph_edge *ipa_make_edge_direct_to_target (struct cgraph_edge *, tree,
bool speculative = false);
tree ipa_impossible_devirt_target (struct cgraph_edge *, tree);
--
2.28.0
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 2/6] ipa: Introduce ipa_cached_call_context
2020-09-29 18:12 [PATCH 0/6] IPA cleanups and IPA-CP improvements for 548.exchange2_r Martin Jambor
` (4 preceding siblings ...)
2020-09-28 18:47 ` [PATCH 3/6] ipa: Bundle estimates of ipa_call_context::estimate_size_and_time Martin Jambor
@ 2020-09-28 18:47 ` Martin Jambor
2020-09-29 18:27 ` Jan Hubicka
5 siblings, 1 reply; 18+ messages in thread
From: Martin Jambor @ 2020-09-28 18:47 UTC (permalink / raw)
To: GCC Patches; +Cc: Jan Hubicka
Hi,
as we discussed with Honza on the mailing list last week, making the
cached call context structure distinct from the normal one may make it
clearer that the cached data need to be explicitly deallocated.
This patch does that division. It is not mandatory for the overall
main goals of the patch set and can be dropped if deemed superfluous.
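To illustrate the intended division, here is a minimal, self-contained
C++ sketch. It is not the GCC code: call_context, cached_call_context,
the int node identifier and the std::vector payload are all
hypothetical stand-ins chosen so that the example compiles on its own.
It only mirrors the shape of the change, namely a short-lived context
that borrows its data and a derived, befriended variant that owns a
copy and must be released explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class cached_call_context;

// Hypothetical stand-in for ipa_call_context: a short-lived object that
// only borrows the argument values and never frees them.
class call_context
{
public:
  call_context () = default;
  call_context (int node, const std::vector<int> *vals)
    : m_node (node), m_vals (vals)
  {}

  bool exists_p () const { return m_node != 0; }
  std::size_t known_count () const { return m_vals ? m_vals->size () : 0; }

private:
  int m_node = 0;                            // 0 means "not initialized"
  const std::vector<int> *m_vals = nullptr;  // borrowed here, owned below

  // The cached variant needs to read and overwrite the private members,
  // including those of another context passed to duplicate_from.
  friend class cached_call_context;
};

// Hypothetical stand-in for ipa_cached_call_context: lives in a cache for
// a long time, so it holds its own copy of the data and frees it on demand.
class cached_call_context : public call_context
{
public:
  // Deep-copy CTX so that the cache no longer depends on the caller's data.
  void duplicate_from (const call_context &ctx)
  {
    release ();
    m_node = ctx.m_node;
    m_vals = ctx.m_vals ? new std::vector<int> (*ctx.m_vals) : nullptr;
  }

  // Free the owned copy; a no-op if the context was never initialized.
  void release ()
  {
    if (!m_node)
      return;
    delete m_vals;
    m_vals = nullptr;
    m_node = 0;
  }
};

// Return how many values the cached copy still sees after the caller's
// original vector has gone out of scope.
std::size_t demo ()
{
  cached_call_context cached;
  {
    std::vector<int> vals = {1, 2, 3};
    call_context ctx (42, &vals);   // borrows vals, nothing to release
    cached.duplicate_from (ctx);    // independent copy for the cache
  }
  assert (cached.exists_p ());
  std::size_t n = cached.known_count ();
  cached.release ();                // explicit deallocation
  return n;
}
```

In this sketch only the cached type has a release method, so a
forgotten deallocation can only happen where data is actually owned,
which is the clarity argument made above.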
gcc/ChangeLog:
2020-09-02 Martin Jambor <mjambor@suse.cz>
* ipa-fnsummary.h (ipa_cached_call_context): New forward declaration
and class.
(class ipa_call_context): Make friend ipa_cached_call_context. Moved
methods duplicate_from and release to it too.
* ipa-fnsummary.c (ipa_call_context::duplicate_from): Moved to class
ipa_cached_call_context.
(ipa_call_context::release): Likewise, removed the parameter.
* ipa-inline-analysis.c (node_context_cache_entry): Change the type of
ctx to ipa_cached_call_context.
(do_estimate_edge_time): Remove parameter from the call to
ipa_cached_call_context::release.
---
gcc/ipa-fnsummary.c | 21 ++++++++-------------
gcc/ipa-fnsummary.h | 16 ++++++++++++++--
gcc/ipa-inline-analysis.c | 4 ++--
3 files changed, 24 insertions(+), 17 deletions(-)
diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index e8645aa0a1b..4ef7d2570e9 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -3329,7 +3329,7 @@ ipa_call_context::ipa_call_context (cgraph_node *node, clause_t possible_truths,
/* Set THIS to be a duplicate of CTX. Copy all relevant info. */
void
-ipa_call_context::duplicate_from (const ipa_call_context &ctx)
+ipa_cached_call_context::duplicate_from (const ipa_call_context &ctx)
{
m_node = ctx.m_node;
m_possible_truths = ctx.m_possible_truths;
@@ -3399,24 +3399,19 @@ ipa_call_context::duplicate_from (const ipa_call_context &ctx)
m_avals.m_known_value_ranges = vNULL;
}
-/* Release memory used by known_vals/contexts/aggs vectors.
- If ALL is true release also inline_param_summary.
- This happens when context was previously duplicated to be stored
- into cache. */
+/* Release memory used by known_vals/contexts/aggs vectors and
+ inline_param_summary. */
void
-ipa_call_context::release (bool all)
+ipa_cached_call_context::release ()
{
/* See if context is initialized at first place. */
if (!m_node)
return;
- ipa_release_agg_values (m_avals.m_known_aggs, all);
- if (all)
- {
- m_avals.m_known_vals.release ();
- m_avals.m_known_contexts.release ();
- m_inline_param_summary.release ();
- }
+ ipa_release_agg_values (m_avals.m_known_aggs, true);
+ m_avals.m_known_vals.release ();
+ m_avals.m_known_contexts.release ();
+ m_inline_param_summary.release ();
}
/* Return true if CTX describes the same call context as THIS. */
diff --git a/gcc/ipa-fnsummary.h b/gcc/ipa-fnsummary.h
index 6893858d18e..020a6f0425d 100644
--- a/gcc/ipa-fnsummary.h
+++ b/gcc/ipa-fnsummary.h
@@ -287,6 +287,8 @@ public:
ipa_call_summary *dst_data);
};
+class ipa_cached_call_context;
+
/* This object describe a context of call. That is a summary of known
information about its parameters. Main purpose of this context is
to give more realistic estimations of function runtime, size and
@@ -307,8 +309,6 @@ public:
sreal *ret_time,
sreal *ret_nonspecialized_time,
ipa_hints *ret_hints);
- void duplicate_from (const ipa_call_context &ctx);
- void release (bool all = false);
bool equal_to (const ipa_call_context &);
bool exists_p ()
{
@@ -329,6 +329,18 @@ private:
/* Even after having calculated clauses, the information about argument
values is used to resolve indirect calls. */
ipa_call_arg_values m_avals;
+
+ friend ipa_cached_call_context;
+};
+
+/* Variant of ipa_call_context that is stored in a cache over a longer period
+ of time. */
+
+class ipa_cached_call_context : public ipa_call_context
+{
+public:
+ void duplicate_from (const ipa_call_context &ctx);
+ void release ();
};
extern fast_call_summary <ipa_call_summary *, va_heap> *ipa_call_summaries;
diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index d2ae8196d09..b7af77f7b9b 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -57,7 +57,7 @@ fast_call_summary<edge_growth_cache_entry *, va_heap> *edge_growth_cache = NULL;
class node_context_cache_entry
{
public:
- ipa_call_context ctx;
+ ipa_cached_call_context ctx;
sreal time, nonspec_time;
int size;
ipa_hints hints;
@@ -226,7 +226,7 @@ do_estimate_edge_time (struct cgraph_edge *edge, sreal *ret_nonspec_time)
node_context_cache_miss++;
else
node_context_cache_clear++;
- e->entry.ctx.release (true);
+ e->entry.ctx.release ();
ctx.estimate_size_and_time (&size, &min_size,
&time, &nonspec_time, &hints);
e->entry.size = size;
--
2.28.0
* [PATCH 3/6] ipa: Bundle estimates of ipa_call_context::estimate_size_and_time
2020-09-29 18:12 [PATCH 0/6] IPA cleanups and IPA-CP improvements for 548.exchange2_r Martin Jambor
` (3 preceding siblings ...)
2020-09-28 18:47 ` [PATCH 1/6] ipa: Bundle vectors describing argument values Martin Jambor
@ 2020-09-28 18:47 ` Martin Jambor
2020-09-29 18:39 ` Jan Hubicka
2020-09-28 18:47 ` [PATCH 2/6] ipa: Introduce ipa_cached_call_context Martin Jambor
5 siblings, 1 reply; 18+ messages in thread
From: Martin Jambor @ 2020-09-28 18:47 UTC (permalink / raw)
To: GCC Patches; +Cc: Jan Hubicka
A subsequent patch adds another two estimates that the code in
ipa_call_context::estimate_size_and_time computes, and the fact that
the function has a special output parameter for each thing it computes
would make it have just too many. Therefore, this patch collapses all
those output parameters into one output structure.
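For illustration only (this is not part of the patch), the general pattern
can be sketched as below. All names here (call_estimates, the free function
estimate_size_and_time, size_only, time_benefit) and the constant values are
hypothetical stand-ins, not the GCC types or the real estimation logic; the
sketch only assumes that size and min_size are always filled in while times
and hints are optional, as the new function comment in the diff describes.

```cpp
#include <cassert>

// Hypothetical stand-in for the bundled results; not the GCC type.
struct call_estimates
{
  int size = 0;
  int min_size = 0;
  double time = 0.0;
  double nonspecialized_time = 0.0;
  unsigned hints = 0;
};

// Always fills in size and min_size.  Fills in the two times only if
// EST_TIMES is true and the hints only if EST_HINTS is true; fields that
// were not requested are left untouched.  The values are made up.
void
estimate_size_and_time (call_estimates *estimates,
                        bool est_times = true, bool est_hints = true)
{
  estimates->size = 12;
  estimates->min_size = 4;
  if (est_times)
    {
      estimates->time = 7.5;
      estimates->nonspecialized_time = 10.0;
    }
  if (est_hints)
    estimates->hints = 0x1;
}

// A caller interested only in size skips the optional work with two flags
// instead of passing a row of NULL pointers.
int
size_only ()
{
  call_estimates est;
  estimate_size_and_time (&est, false, false);
  return est.size;
}

// A caller that wants everything relies on the default arguments and reads
// whichever fields it needs.
double
time_benefit ()
{
  call_estimates est;
  estimate_size_and_time (&est);
  return est.nonspecialized_time - est.time;
}
```

With this shape, adding another estimate later means adding a field to the
structure rather than touching the signature at every call site.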
gcc/ChangeLog:
2020-09-02 Martin Jambor <mjambor@suse.cz>
* ipa-inline-analysis.c (do_estimate_edge_time): Adjusted to use
ipa_call_estimates.
(do_estimate_edge_size): Likewise.
(do_estimate_edge_hints): Likewise.
* ipa-fnsummary.h (struct ipa_call_estimates): New type.
(ipa_call_context::estimate_size_and_time): Adjusted declaration.
(estimate_ipcp_clone_size_and_time): Likewise.
* ipa-cp.c (hint_time_bonus): Changed the type of the second argument
to ipa_call_estimates.
(perform_estimation_of_a_value): Adjusted to use ipa_call_estimates.
(estimate_local_effects): Likewise.
* ipa-fnsummary.c (ipa_call_context::estimate_size_and_time): Adjusted
to return estimates in a single ipa_call_estimates parameter.
(estimate_ipcp_clone_size_and_time): Likewise.
---
gcc/ipa-cp.c | 45 ++++++++++++++---------------
gcc/ipa-fnsummary.c | 60 +++++++++++++++++++--------------------
gcc/ipa-fnsummary.h | 36 +++++++++++++++++------
gcc/ipa-inline-analysis.c | 47 +++++++++++++++++-------------
4 files changed, 105 insertions(+), 83 deletions(-)
diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 292dd7e5bdf..77c84a6ed5d 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -3196,12 +3196,13 @@ devirtualization_time_bonus (struct cgraph_node *node,
return res;
}
-/* Return time bonus incurred because of HINTS. */
+/* Return time bonus incurred because of hints stored in ESTIMATES. */
static int
-hint_time_bonus (cgraph_node *node, ipa_hints hints)
+hint_time_bonus (cgraph_node *node, const ipa_call_estimates &estimates)
{
int result = 0;
+ ipa_hints hints = estimates.hints;
if (hints & (INLINE_HINT_loop_iterations | INLINE_HINT_loop_stride))
result += opt_for_fn (node->decl, param_ipa_cp_loop_hint_bonus);
return result;
@@ -3397,15 +3398,13 @@ perform_estimation_of_a_value (cgraph_node *node,
int removable_params_cost, int est_move_cost,
ipcp_value_base *val)
{
- int size, time_benefit;
- sreal time, base_time;
- ipa_hints hints;
+ int time_benefit;
+ ipa_call_estimates estimates;
- estimate_ipcp_clone_size_and_time (node, avals, &size, &time,
- &base_time, &hints);
- base_time -= time;
- if (base_time > 65535)
- base_time = 65535;
+ estimate_ipcp_clone_size_and_time (node, avals, &estimates);
+ sreal time_delta = estimates.nonspecialized_time - estimates.time;
+ if (time_delta > 65535)
+ time_delta = 65535;
/* Extern inline functions have no cloning local time benefits because they
will be inlined anyway. The only reason to clone them is if it enables
@@ -3413,11 +3412,12 @@ perform_estimation_of_a_value (cgraph_node *node,
if (DECL_EXTERNAL (node->decl) && DECL_DECLARED_INLINE_P (node->decl))
time_benefit = 0;
else
- time_benefit = base_time.to_int ()
+ time_benefit = time_delta.to_int ()
+ devirtualization_time_bonus (node, avals)
- + hint_time_bonus (node, hints)
+ + hint_time_bonus (node, estimates)
+ removable_params_cost + est_move_cost;
+ int size = estimates.size;
gcc_checking_assert (size >=0);
/* The inliner-heuristics based estimates may think that in certain
contexts some functions do not have any size at all but we want
@@ -3472,23 +3472,21 @@ estimate_local_effects (struct cgraph_node *node)
|| (removable_params_cost && node->can_change_signature))
{
struct caller_statistics stats;
- ipa_hints hints;
- sreal time, base_time;
- int size;
+ ipa_call_estimates estimates;
init_caller_stats (&stats);
node->call_for_symbol_thunks_and_aliases (gather_caller_stats, &stats,
false);
- estimate_ipcp_clone_size_and_time (node, &avals, &size, &time,
- &base_time, &hints);
- time -= devirt_bonus;
- time -= hint_time_bonus (node, hints);
- time -= removable_params_cost;
- size -= stats.n_calls * removable_params_cost;
+ estimate_ipcp_clone_size_and_time (node, &avals, &estimates);
+ sreal time = estimates.nonspecialized_time - estimates.time;
+ time += devirt_bonus;
+ time += hint_time_bonus (node, estimates);
+ time += removable_params_cost;
+ int size = estimates.size - stats.n_calls * removable_params_cost;
if (dump_file)
fprintf (dump_file, " - context independent values, size: %i, "
- "time_benefit: %f\n", size, (base_time - time).to_double ());
+ "time_benefit: %f\n", size, (time).to_double ());
if (size <= 0 || node->local)
{
@@ -3499,8 +3497,7 @@ estimate_local_effects (struct cgraph_node *node)
"known contexts, code not going to grow.\n");
}
else if (good_cloning_opportunity_p (node,
- MIN ((base_time - time).to_int (),
- 65536),
+ MIN ((time).to_int (), 65536),
stats.freq_sum, stats.count_sum,
size))
{
diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index 4ef7d2570e9..6082f34d63f 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -3536,18 +3536,14 @@ ipa_call_context::equal_to (const ipa_call_context &ctx)
return true;
}
-/* Estimate size and time needed to execute call in the given context.
- Additionally determine hints determined by the context. Finally compute
- minimal size needed for the call that is independent on the call context and
- can be used for fast estimates. Return the values in RET_SIZE,
- RET_MIN_SIZE, RET_TIME and RET_HINTS. */
+/* Fill in the selected fields in ESTIMATES with value estimated for call in
+ this context. Always compute size and min_size. Only compute time and
+ nonspecialized_time if EST_TIMES is true. Only compute hints if EST_HINTS
+ is true. */
void
-ipa_call_context::estimate_size_and_time (int *ret_size,
- int *ret_min_size,
- sreal *ret_time,
- sreal *ret_nonspecialized_time,
- ipa_hints *ret_hints)
+ipa_call_context::estimate_size_and_time (ipa_call_estimates *estimates,
+ bool est_times, bool est_hints)
{
class ipa_fn_summary *info = ipa_fn_summaries->get (m_node);
size_time_entry *e;
@@ -3577,8 +3573,8 @@ ipa_call_context::estimate_size_and_time (int *ret_size,
if (m_node->callees || m_node->indirect_calls)
estimate_calls_size_and_time (m_node, &size, &min_size,
- ret_time ? &time : NULL,
- ret_hints ? &hints : NULL, m_possible_truths,
+ est_times ? &time : NULL,
+ est_hints ? &hints : NULL, m_possible_truths,
&m_avals);
sreal nonspecialized_time = time;
@@ -3605,7 +3601,7 @@ ipa_call_context::estimate_size_and_time (int *ret_size,
known to be constant in a specialized setting. */
if (nonconst)
size += e->size;
- if (!ret_time)
+ if (!est_times)
continue;
nonspecialized_time += e->time;
if (!nonconst)
@@ -3645,7 +3641,7 @@ ipa_call_context::estimate_size_and_time (int *ret_size,
if (time > nonspecialized_time)
time = nonspecialized_time;
- if (ret_hints)
+ if (est_hints)
{
if (info->loop_iterations
&& !info->loop_iterations->evaluate (m_possible_truths))
@@ -3663,18 +3659,23 @@ ipa_call_context::estimate_size_and_time (int *ret_size,
min_size = RDIV (min_size, ipa_fn_summary::size_scale);
if (dump_file && (dump_flags & TDF_DETAILS))
- fprintf (dump_file, "\n size:%i time:%f nonspec time:%f\n", (int) size,
- time.to_double (), nonspecialized_time.to_double ());
- if (ret_time)
- *ret_time = time;
- if (ret_nonspecialized_time)
- *ret_nonspecialized_time = nonspecialized_time;
- if (ret_size)
- *ret_size = size;
- if (ret_min_size)
- *ret_min_size = min_size;
- if (ret_hints)
- *ret_hints = hints;
+ {
+ if (est_times)
+ fprintf (dump_file, "\n size:%i time:%f nonspec time:%f\n",
+ (int) size, time.to_double (),
+ nonspecialized_time.to_double ());
+ else
+ fprintf (dump_file, "\n size:%i (time not estimated)\n", (int) size);
+ }
+ if (est_times)
+ {
+ estimates->time = time;
+ estimates->nonspecialized_time = nonspecialized_time;
+ }
+ estimates->size = size;
+ estimates->min_size = min_size;
+ if (est_hints)
+ estimates->hints = hints;
return;
}
@@ -3687,17 +3688,14 @@ ipa_call_context::estimate_size_and_time (int *ret_size,
void
estimate_ipcp_clone_size_and_time (struct cgraph_node *node,
ipa_auto_call_arg_values *avals,
- int *ret_size, sreal *ret_time,
- sreal *ret_nonspec_time,
- ipa_hints *hints)
+ ipa_call_estimates *estimates)
{
clause_t clause, nonspec_clause;
evaluate_conditions_for_known_args (node, false, avals, &clause,
&nonspec_clause);
ipa_call_context ctx (node, clause, nonspec_clause, vNULL, avals);
- ctx.estimate_size_and_time (ret_size, NULL, ret_time,
- ret_nonspec_time, hints);
+ ctx.estimate_size_and_time (estimates);
}
/* Return stack frame offset where frame of NODE is supposed to start inside
diff --git a/gcc/ipa-fnsummary.h b/gcc/ipa-fnsummary.h
index 020a6f0425d..ccb6b432f0b 100644
--- a/gcc/ipa-fnsummary.h
+++ b/gcc/ipa-fnsummary.h
@@ -287,6 +287,29 @@ public:
ipa_call_summary *dst_data);
};
+/* Estimated execution times, code sizes and other information about the
+ code executing a call described by ipa_call_context. */
+
+struct ipa_call_estimates
+{
+ /* Estimated size needed to execute call in the given context. */
+ int size;
+
+ /* Minimal size needed for the call that is + independent on the call context
+ and can be used for fast estimates. */
+ int min_size;
+
+ /* Estimated time needed to execute call in the given context. */
+ sreal time;
+
+ /* Estimated time needed to execute the function when not ignoring
+ computations known to be constant in this context. */
+ sreal nonspecialized_time;
+
+ /* Further discovered reasons why to inline or specialize the give calls. */
+ ipa_hints hints;
+};
+
class ipa_cached_call_context;
/* This object describe a context of call. That is a summary of known
@@ -305,10 +328,8 @@ public:
: m_node(NULL)
{
}
- void estimate_size_and_time (int *ret_size, int *ret_min_size,
- sreal *ret_time,
- sreal *ret_nonspecialized_time,
- ipa_hints *ret_hints);
+ void estimate_size_and_time (ipa_call_estimates *estimates,
+ bool est_times = true, bool est_hints = true);
bool equal_to (const ipa_call_context &);
bool exists_p ()
{
@@ -353,10 +374,9 @@ void ipa_dump_hints (FILE *f, ipa_hints);
void ipa_free_fn_summary (void);
void ipa_free_size_summary (void);
void inline_analyze_function (struct cgraph_node *node);
-void estimate_ipcp_clone_size_and_time (struct cgraph_node *,
- ipa_auto_call_arg_values *,
- int *, sreal *, sreal *,
- ipa_hints *);
+void estimate_ipcp_clone_size_and_time (struct cgraph_node *node,
+ ipa_auto_call_arg_values *avals,
+ ipa_call_estimates *estimates);
void ipa_merge_fn_summary_after_inlining (struct cgraph_edge *edge);
void ipa_update_overall_fn_summary (struct cgraph_node *node, bool reset = true);
void compute_fn_summary (struct cgraph_node *, bool);
diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index b7af77f7b9b..acbf82e84d9 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -208,16 +208,12 @@ do_estimate_edge_time (struct cgraph_edge *edge, sreal *ret_nonspec_time)
&& !opt_for_fn (callee->decl, flag_profile_partial_training)
&& !callee->count.ipa_p ())
{
- sreal chk_time, chk_nonspec_time;
- int chk_size, chk_min_size;
-
- ipa_hints chk_hints;
- ctx.estimate_size_and_time (&chk_size, &chk_min_size,
- &chk_time, &chk_nonspec_time,
- &chk_hints);
- gcc_assert (chk_size == size && chk_time == time
- && chk_nonspec_time == nonspec_time
- && chk_hints == hints);
+ ipa_call_estimates chk_estimates;
+ ctx.estimate_size_and_time (&chk_estimates);
+ gcc_assert (chk_estimates.size == size
+ && chk_estimates.time == time
+ && chk_estimates.nonspecialized_time == nonspec_time
+ && chk_estimates.hints == hints);
}
}
else
@@ -227,18 +223,28 @@ do_estimate_edge_time (struct cgraph_edge *edge, sreal *ret_nonspec_time)
else
node_context_cache_clear++;
e->entry.ctx.release ();
- ctx.estimate_size_and_time (&size, &min_size,
- &time, &nonspec_time, &hints);
+ ipa_call_estimates estimates;
+ ctx.estimate_size_and_time (&estimates);
+ size = estimates.size;
e->entry.size = size;
+ time = estimates.time;
e->entry.time = time;
+ nonspec_time = estimates.nonspecialized_time;
e->entry.nonspec_time = nonspec_time;
+ hints = estimates.hints;
e->entry.hints = hints;
e->entry.ctx.duplicate_from (ctx);
}
}
else
- ctx.estimate_size_and_time (&size, &min_size,
- &time, &nonspec_time, &hints);
+ {
+ ipa_call_estimates estimates;
+ ctx.estimate_size_and_time (&estimates);
+ size = estimates.size;
+ time = estimates.time;
+ nonspec_time = estimates.nonspecialized_time;
+ hints = estimates.hints;
+ }
/* When we have profile feedback, we can quite safely identify hot
edges and for those we disable size limits. Don't do that when
@@ -321,8 +327,9 @@ do_estimate_edge_size (struct cgraph_edge *edge)
evaluate_properties_for_edge (edge, true, &clause, &nonspec_clause,
&avals, true);
ipa_call_context ctx (callee, clause, nonspec_clause, vNULL, &avals);
- ctx.estimate_size_and_time (&size, NULL, NULL, NULL, NULL);
- return size;
+ ipa_call_estimates estimates;
+ ctx.estimate_size_and_time (&estimates, false, false);
+ return estimates.size;
}
@@ -332,7 +339,6 @@ do_estimate_edge_size (struct cgraph_edge *edge)
ipa_hints
do_estimate_edge_hints (struct cgraph_edge *edge)
{
- ipa_hints hints;
struct cgraph_node *callee;
clause_t clause, nonspec_clause;
@@ -341,7 +347,7 @@ do_estimate_edge_hints (struct cgraph_edge *edge)
if (edge_growth_cache != NULL)
{
do_estimate_edge_time (edge);
- hints = edge_growth_cache->get (edge)->hints;
+ ipa_hints hints = edge_growth_cache->get (edge)->hints;
gcc_checking_assert (hints);
return hints - 1;
}
@@ -354,8 +360,9 @@ do_estimate_edge_hints (struct cgraph_edge *edge)
evaluate_properties_for_edge (edge, true, &clause, &nonspec_clause,
&avals, true);
ipa_call_context ctx (callee, clause, nonspec_clause, vNULL, &avals);
- ctx.estimate_size_and_time (NULL, NULL, NULL, NULL, &hints);
- hints |= simple_edge_hints (edge);
+ ipa_call_estimates estimates;
+ ctx.estimate_size_and_time (&estimates, false, true);
+ ipa_hints hints = estimates.hints | simple_edge_hints (edge);
return hints;
}
--
2.28.0
* [PATCH 0/6] IPA cleanups and IPA-CP improvements for 548.exchange2_r
@ 2020-09-29 18:12 Martin Jambor
2020-09-21 14:25 ` [PATCH 5/6] ipa-cp: Add dumping of overall_size after cloning Martin Jambor
` (5 more replies)
0 siblings, 6 replies; 18+ messages in thread
From: Martin Jambor @ 2020-09-29 18:12 UTC (permalink / raw)
To: GCC Patches; +Cc: Jan Hubicka
Hi,
this patch set is a result of rebasing the one I sent here three weeks
ago on current trunk. Last week I also checked the WPA memory
requirements when building Firefox and they did not change from the
unpatched numbers.
Bootstrapped and tested and LTO bootstrapped on x86-64. OK for trunk?
Thanks,
Martin
Martin Jambor (6):
ipa: Bundle vectors describing argument values
ipa: Introduce ipa_cached_call_context
ipa: Bundle estimates of ipa_call_context::estimate_size_and_time
ipa: Multiple predicates for loop properties, with frequencies
ipa-cp: Add dumping of overall_size after cloning
ipa-cp: Separate and increase the large-unit parameter
gcc/doc/invoke.texi | 4 +
gcc/ipa-cp.c | 303 ++++----
gcc/ipa-fnsummary.c | 829 +++++++++++----------
gcc/ipa-fnsummary.h | 113 ++-
gcc/ipa-inline-analysis.c | 92 +--
gcc/ipa-prop.c | 10 +
gcc/ipa-prop.h | 112 ++-
gcc/params.opt | 8 +
gcc/testsuite/gcc.dg/ipa/ipcp-loophint-1.c | 29 +
9 files changed, 867 insertions(+), 633 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/ipa/ipcp-loophint-1.c
--
2.28.0
* Re: [PATCH 2/6] ipa: Introduce ipa_cached_call_context
2020-09-28 18:47 ` [PATCH 2/6] ipa: Introduce ipa_cached_call_context Martin Jambor
@ 2020-09-29 18:27 ` Jan Hubicka
0 siblings, 0 replies; 18+ messages in thread
From: Jan Hubicka @ 2020-09-29 18:27 UTC (permalink / raw)
To: Martin Jambor; +Cc: GCC Patches
> Hi,
>
> as we discussed with Honza on the mailing list last week, making
> cached call context structure distinct from the normal one may make it
> clearer that the cached data need to be explicitly deallocated.
>
> This patch does that division. It is not mandatory for the overall
> main goals of the patch set and can be dropped if deemed superfluous.
>
> gcc/ChangeLog:
>
> 2020-09-02 Martin Jambor <mjambor@suse.cz>
>
> * ipa-fnsummary.h (ipa_cached_call_context): New forward declaration
> and class.
> (class ipa_call_context): Make friend ipa_cached_call_context. Moved
> methods duplicate_from and release to it too.
> * ipa-fnsummary.c (ipa_call_context::duplicate_from): Moved to class
> ipa_cached_call_context.
> (ipa_call_context::release): Likewise, removed the parameter.
> * ipa-inline-analysis.c (node_context_cache_entry): Change the type of
> ctx to ipa_cached_call_context.
> (do_estimate_edge_time): Remove parameter from the call to
> ipa_cached_call_context::release.
OK,
thanks
Honza
* Re: [PATCH 3/6] ipa: Bundle estimates of ipa_call_context::estimate_size_and_time
2020-09-28 18:47 ` [PATCH 3/6] ipa: Bundle estimates of ipa_call_context::estimate_size_and_time Martin Jambor
@ 2020-09-29 18:39 ` Jan Hubicka
0 siblings, 0 replies; 18+ messages in thread
From: Jan Hubicka @ 2020-09-29 18:39 UTC (permalink / raw)
To: Martin Jambor; +Cc: GCC Patches
> A subsequent patch adds another two estimates that the code in
> ipa_call_context::estimate_size_and_time computes, and the fact that
> the function has a special output parameter for each thing it computes
> would make it have just too many. Therefore, this patch collapses all
> those output parameters into one output structure.
>
> gcc/ChangeLog:
>
> 2020-09-02 Martin Jambor <mjambor@suse.cz>
>
> * ipa-inline-analysis.c (do_estimate_edge_time): Adjusted to use
> ipa_call_estimates.
> (do_estimate_edge_size): Likewise.
> (do_estimate_edge_hints): Likewise.
> * ipa-fnsummary.h (struct ipa_call_estimates): New type.
> (ipa_call_context::estimate_size_and_time): Adjusted declaration.
> (estimate_ipcp_clone_size_and_time): Likewise.
> * ipa-cp.c (hint_time_bonus): Changed the type of the second argument
> to ipa_call_estimates.
> (perform_estimation_of_a_value): Adjusted to use ipa_call_estimates.
> (estimate_local_effects): Likewise.
> * ipa-fnsummary.c (ipa_call_context::estimate_size_and_time): Adjusted
> to return estimates in a single ipa_call_estimates parameter.
> (estimate_ipcp_clone_size_and_time): Likewise.
OK,
Honza
> ---
> gcc/ipa-cp.c | 45 ++++++++++++++---------------
> gcc/ipa-fnsummary.c | 60 +++++++++++++++++++--------------------
> gcc/ipa-fnsummary.h | 36 +++++++++++++++++------
> gcc/ipa-inline-analysis.c | 47 +++++++++++++++++-------------
> 4 files changed, 105 insertions(+), 83 deletions(-)
>
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 292dd7e5bdf..77c84a6ed5d 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -3196,12 +3196,13 @@ devirtualization_time_bonus (struct cgraph_node *node,
> return res;
> }
>
> -/* Return time bonus incurred because of HINTS. */
> +/* Return time bonus incurred because of hints stored in ESTIMATES. */
>
> static int
> -hint_time_bonus (cgraph_node *node, ipa_hints hints)
> +hint_time_bonus (cgraph_node *node, const ipa_call_estimates &estimates)
> {
> int result = 0;
> + ipa_hints hints = estimates.hints;
> if (hints & (INLINE_HINT_loop_iterations | INLINE_HINT_loop_stride))
> result += opt_for_fn (node->decl, param_ipa_cp_loop_hint_bonus);
> return result;
> @@ -3397,15 +3398,13 @@ perform_estimation_of_a_value (cgraph_node *node,
> int removable_params_cost, int est_move_cost,
> ipcp_value_base *val)
> {
> - int size, time_benefit;
> - sreal time, base_time;
> - ipa_hints hints;
> + int time_benefit;
> + ipa_call_estimates estimates;
>
> - estimate_ipcp_clone_size_and_time (node, avals, &size, &time,
> - &base_time, &hints);
> - base_time -= time;
> - if (base_time > 65535)
> - base_time = 65535;
> + estimate_ipcp_clone_size_and_time (node, avals, &estimates);
> + sreal time_delta = estimates.nonspecialized_time - estimates.time;
> + if (time_delta > 65535)
> + time_delta = 65535;
>
> /* Extern inline functions have no cloning local time benefits because they
> will be inlined anyway. The only reason to clone them is if it enables
> @@ -3413,11 +3412,12 @@ perform_estimation_of_a_value (cgraph_node *node,
> if (DECL_EXTERNAL (node->decl) && DECL_DECLARED_INLINE_P (node->decl))
> time_benefit = 0;
> else
> - time_benefit = base_time.to_int ()
> + time_benefit = time_delta.to_int ()
> + devirtualization_time_bonus (node, avals)
> - + hint_time_bonus (node, hints)
> + + hint_time_bonus (node, estimates)
> + removable_params_cost + est_move_cost;
>
> + int size = estimates.size;
> gcc_checking_assert (size >=0);
> /* The inliner-heuristics based estimates may think that in certain
> contexts some functions do not have any size at all but we want
> @@ -3472,23 +3472,21 @@ estimate_local_effects (struct cgraph_node *node)
> || (removable_params_cost && node->can_change_signature))
> {
> struct caller_statistics stats;
> - ipa_hints hints;
> - sreal time, base_time;
> - int size;
> + ipa_call_estimates estimates;
>
> init_caller_stats (&stats);
> node->call_for_symbol_thunks_and_aliases (gather_caller_stats, &stats,
> false);
> - estimate_ipcp_clone_size_and_time (node, &avals, &size, &time,
> - &base_time, &hints);
> - time -= devirt_bonus;
> - time -= hint_time_bonus (node, hints);
> - time -= removable_params_cost;
> - size -= stats.n_calls * removable_params_cost;
> + estimate_ipcp_clone_size_and_time (node, &avals, &estimates);
> + sreal time = estimates.nonspecialized_time - estimates.time;
> + time += devirt_bonus;
> + time += hint_time_bonus (node, estimates);
> + time += removable_params_cost;
> + int size = estimates.size - stats.n_calls * removable_params_cost;
>
> if (dump_file)
> fprintf (dump_file, " - context independent values, size: %i, "
> - "time_benefit: %f\n", size, (base_time - time).to_double ());
> + "time_benefit: %f\n", size, (time).to_double ());
>
> if (size <= 0 || node->local)
> {
> @@ -3499,8 +3497,7 @@ estimate_local_effects (struct cgraph_node *node)
> "known contexts, code not going to grow.\n");
> }
> else if (good_cloning_opportunity_p (node,
> - MIN ((base_time - time).to_int (),
> - 65536),
> + MIN ((time).to_int (), 65536),
> stats.freq_sum, stats.count_sum,
> size))
> {
> diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
> index 4ef7d2570e9..6082f34d63f 100644
> --- a/gcc/ipa-fnsummary.c
> +++ b/gcc/ipa-fnsummary.c
> @@ -3536,18 +3536,14 @@ ipa_call_context::equal_to (const ipa_call_context &ctx)
> return true;
> }
>
> -/* Estimate size and time needed to execute call in the given context.
> - Additionally determine hints determined by the context. Finally compute
> - minimal size needed for the call that is independent on the call context and
> - can be used for fast estimates. Return the values in RET_SIZE,
> - RET_MIN_SIZE, RET_TIME and RET_HINTS. */
> +/* Fill in the selected fields in ESTIMATES with value estimated for call in
> + this context. Always compute size and min_size. Only compute time and
> + nonspecialized_time if EST_TIMES is true. Only compute hints if EST_HINTS
> + is true. */
>
> void
> -ipa_call_context::estimate_size_and_time (int *ret_size,
> - int *ret_min_size,
> - sreal *ret_time,
> - sreal *ret_nonspecialized_time,
> - ipa_hints *ret_hints)
> +ipa_call_context::estimate_size_and_time (ipa_call_estimates *estimates,
> + bool est_times, bool est_hints)
> {
> class ipa_fn_summary *info = ipa_fn_summaries->get (m_node);
> size_time_entry *e;
> @@ -3577,8 +3573,8 @@ ipa_call_context::estimate_size_and_time (int *ret_size,
>
> if (m_node->callees || m_node->indirect_calls)
> estimate_calls_size_and_time (m_node, &size, &min_size,
> - ret_time ? &time : NULL,
> - ret_hints ? &hints : NULL, m_possible_truths,
> + est_times ? &time : NULL,
> + est_hints ? &hints : NULL, m_possible_truths,
> &m_avals);
>
> sreal nonspecialized_time = time;
> @@ -3605,7 +3601,7 @@ ipa_call_context::estimate_size_and_time (int *ret_size,
> known to be constant in a specialized setting. */
> if (nonconst)
> size += e->size;
> - if (!ret_time)
> + if (!est_times)
> continue;
> nonspecialized_time += e->time;
> if (!nonconst)
> @@ -3645,7 +3641,7 @@ ipa_call_context::estimate_size_and_time (int *ret_size,
> if (time > nonspecialized_time)
> time = nonspecialized_time;
>
> - if (ret_hints)
> + if (est_hints)
> {
> if (info->loop_iterations
> && !info->loop_iterations->evaluate (m_possible_truths))
> @@ -3663,18 +3659,23 @@ ipa_call_context::estimate_size_and_time (int *ret_size,
> min_size = RDIV (min_size, ipa_fn_summary::size_scale);
>
> if (dump_file && (dump_flags & TDF_DETAILS))
> - fprintf (dump_file, "\n size:%i time:%f nonspec time:%f\n", (int) size,
> - time.to_double (), nonspecialized_time.to_double ());
> - if (ret_time)
> - *ret_time = time;
> - if (ret_nonspecialized_time)
> - *ret_nonspecialized_time = nonspecialized_time;
> - if (ret_size)
> - *ret_size = size;
> - if (ret_min_size)
> - *ret_min_size = min_size;
> - if (ret_hints)
> - *ret_hints = hints;
> + {
> + if (est_times)
> + fprintf (dump_file, "\n size:%i time:%f nonspec time:%f\n",
> + (int) size, time.to_double (),
> + nonspecialized_time.to_double ());
> + else
> + fprintf (dump_file, "\n size:%i (time not estimated)\n", (int) size);
> + }
> + if (est_times)
> + {
> + estimates->time = time;
> + estimates->nonspecialized_time = nonspecialized_time;
> + }
> + estimates->size = size;
> + estimates->min_size = min_size;
> + if (est_hints)
> + estimates->hints = hints;
> return;
> }
>
> @@ -3687,17 +3688,14 @@ ipa_call_context::estimate_size_and_time (int *ret_size,
> void
> estimate_ipcp_clone_size_and_time (struct cgraph_node *node,
> ipa_auto_call_arg_values *avals,
> - int *ret_size, sreal *ret_time,
> - sreal *ret_nonspec_time,
> - ipa_hints *hints)
> + ipa_call_estimates *estimates)
> {
> clause_t clause, nonspec_clause;
>
> evaluate_conditions_for_known_args (node, false, avals, &clause,
> &nonspec_clause);
> ipa_call_context ctx (node, clause, nonspec_clause, vNULL, avals);
> - ctx.estimate_size_and_time (ret_size, NULL, ret_time,
> - ret_nonspec_time, hints);
> + ctx.estimate_size_and_time (estimates);
> }
>
> /* Return stack frame offset where frame of NODE is supposed to start inside
> diff --git a/gcc/ipa-fnsummary.h b/gcc/ipa-fnsummary.h
> index 020a6f0425d..ccb6b432f0b 100644
> --- a/gcc/ipa-fnsummary.h
> +++ b/gcc/ipa-fnsummary.h
> @@ -287,6 +287,29 @@ public:
> ipa_call_summary *dst_data);
> };
>
> +/* Estimated execution times, code sizes and other information about the
> + code executing a call described by ipa_call_context. */
> +
> +struct ipa_call_estimates
> +{
> + /* Estimated size needed to execute call in the given context. */
> + int size;
> +
> + /* Minimal size needed for the call that is + independent on the call context
> + and can be used for fast estimates. */
> + int min_size;
> +
> + /* Estimated time needed to execute call in the given context. */
> + sreal time;
> +
> + /* Estimated time needed to execute the function when not ignoring
> + computations known to be constant in this context. */
> + sreal nonspecialized_time;
> +
> + /* Further discovered reasons why to inline or specialize the give calls. */
> + ipa_hints hints;
> +};
> +
> class ipa_cached_call_context;
>
> /* This object describe a context of call. That is a summary of known
> @@ -305,10 +328,8 @@ public:
> : m_node(NULL)
> {
> }
> - void estimate_size_and_time (int *ret_size, int *ret_min_size,
> - sreal *ret_time,
> - sreal *ret_nonspecialized_time,
> - ipa_hints *ret_hints);
> + void estimate_size_and_time (ipa_call_estimates *estimates,
> + bool est_times = true, bool est_hints = true);
> bool equal_to (const ipa_call_context &);
> bool exists_p ()
> {
> @@ -353,10 +374,9 @@ void ipa_dump_hints (FILE *f, ipa_hints);
> void ipa_free_fn_summary (void);
> void ipa_free_size_summary (void);
> void inline_analyze_function (struct cgraph_node *node);
> -void estimate_ipcp_clone_size_and_time (struct cgraph_node *,
> - ipa_auto_call_arg_values *,
> - int *, sreal *, sreal *,
> - ipa_hints *);
> +void estimate_ipcp_clone_size_and_time (struct cgraph_node *node,
> + ipa_auto_call_arg_values *avals,
> + ipa_call_estimates *estimates);
> void ipa_merge_fn_summary_after_inlining (struct cgraph_edge *edge);
> void ipa_update_overall_fn_summary (struct cgraph_node *node, bool reset = true);
> void compute_fn_summary (struct cgraph_node *, bool);
> diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
> index b7af77f7b9b..acbf82e84d9 100644
> --- a/gcc/ipa-inline-analysis.c
> +++ b/gcc/ipa-inline-analysis.c
> @@ -208,16 +208,12 @@ do_estimate_edge_time (struct cgraph_edge *edge, sreal *ret_nonspec_time)
> && !opt_for_fn (callee->decl, flag_profile_partial_training)
> && !callee->count.ipa_p ())
> {
> - sreal chk_time, chk_nonspec_time;
> - int chk_size, chk_min_size;
> -
> - ipa_hints chk_hints;
> - ctx.estimate_size_and_time (&chk_size, &chk_min_size,
> - &chk_time, &chk_nonspec_time,
> - &chk_hints);
> - gcc_assert (chk_size == size && chk_time == time
> - && chk_nonspec_time == nonspec_time
> - && chk_hints == hints);
> + ipa_call_estimates chk_estimates;
> + ctx.estimate_size_and_time (&chk_estimates);
> + gcc_assert (chk_estimates.size == size
> + && chk_estimates.time == time
> + && chk_estimates.nonspecialized_time == nonspec_time
> + && chk_estimates.hints == hints);
> }
> }
> else
> @@ -227,18 +223,28 @@ do_estimate_edge_time (struct cgraph_edge *edge, sreal *ret_nonspec_time)
> else
> node_context_cache_clear++;
> e->entry.ctx.release ();
> - ctx.estimate_size_and_time (&size, &min_size,
> - &time, &nonspec_time, &hints);
> + ipa_call_estimates estimates;
> + ctx.estimate_size_and_time (&estimates);
> + size = estimates.size;
> e->entry.size = size;
> + time = estimates.time;
> e->entry.time = time;
> + nonspec_time = estimates.nonspecialized_time;
> e->entry.nonspec_time = nonspec_time;
> + hints = estimates.hints;
> e->entry.hints = hints;
> e->entry.ctx.duplicate_from (ctx);
> }
> }
> else
> - ctx.estimate_size_and_time (&size, &min_size,
> - &time, &nonspec_time, &hints);
> + {
> + ipa_call_estimates estimates;
> + ctx.estimate_size_and_time (&estimates);
> + size = estimates.size;
> + time = estimates.time;
> + nonspec_time = estimates.nonspecialized_time;
> + hints = estimates.hints;
> + }
>
> /* When we have profile feedback, we can quite safely identify hot
> edges and for those we disable size limits. Don't do that when
> @@ -321,8 +327,9 @@ do_estimate_edge_size (struct cgraph_edge *edge)
> evaluate_properties_for_edge (edge, true, &clause, &nonspec_clause,
> &avals, true);
> ipa_call_context ctx (callee, clause, nonspec_clause, vNULL, &avals);
> - ctx.estimate_size_and_time (&size, NULL, NULL, NULL, NULL);
> - return size;
> + ipa_call_estimates estimates;
> + ctx.estimate_size_and_time (&estimates, false, false);
> + return estimates.size;
> }
>
>
> @@ -332,7 +339,6 @@ do_estimate_edge_size (struct cgraph_edge *edge)
> ipa_hints
> do_estimate_edge_hints (struct cgraph_edge *edge)
> {
> - ipa_hints hints;
> struct cgraph_node *callee;
> clause_t clause, nonspec_clause;
>
> @@ -341,7 +347,7 @@ do_estimate_edge_hints (struct cgraph_edge *edge)
> if (edge_growth_cache != NULL)
> {
> do_estimate_edge_time (edge);
> - hints = edge_growth_cache->get (edge)->hints;
> + ipa_hints hints = edge_growth_cache->get (edge)->hints;
> gcc_checking_assert (hints);
> return hints - 1;
> }
> @@ -354,8 +360,9 @@ do_estimate_edge_hints (struct cgraph_edge *edge)
> evaluate_properties_for_edge (edge, true, &clause, &nonspec_clause,
> &avals, true);
> ipa_call_context ctx (callee, clause, nonspec_clause, vNULL, &avals);
> - ctx.estimate_size_and_time (NULL, NULL, NULL, NULL, &hints);
> - hints |= simple_edge_hints (edge);
> + ipa_call_estimates estimates;
> + ctx.estimate_size_and_time (&estimates, false, true);
> + ipa_hints hints = estimates.hints | simple_edge_hints (edge);
> return hints;
> }
>
> --
> 2.28.0
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 5/6] ipa-cp: Add dumping of overall_size after cloning
2020-09-21 14:25 ` [PATCH 5/6] ipa-cp: Add dumping of overall_size after cloning Martin Jambor
@ 2020-09-29 18:39 ` Jan Hubicka
0 siblings, 0 replies; 18+ messages in thread
From: Jan Hubicka @ 2020-09-29 18:39 UTC (permalink / raw)
To: Martin Jambor; +Cc: GCC Patches
> When experimenting with IPA-CP parameters, especially when looking
> into exchange2_r, it has been very useful to know what the value of
> overall_size is at different stages of the decision process. This
> patch therefore adds it to the generated dumps.
>
> gcc/ChangeLog:
>
> 2020-09-07 Martin Jambor <mjambor@suse.cz>
>
> * ipa-cp.c (estimate_local_effects): Add overall_size to dumped
> string.
> (decide_about_value): Add dumping new overall_size.
OK,
thanks
Honza
> ---
> gcc/ipa-cp.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index f6320c787de..12acf24c553 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -3517,7 +3517,8 @@ estimate_local_effects (struct cgraph_node *node)
>
> if (dump_file)
> fprintf (dump_file, " Decided to specialize for all "
> - "known contexts, growth deemed beneficial.\n");
> + "known contexts, growth (to %li) deemed "
> + "beneficial.\n", overall_size);
> }
> else if (dump_file && (dump_flags & TDF_DETAILS))
> fprintf (dump_file, " Not cloning for all contexts because "
> @@ -5506,6 +5507,9 @@ decide_about_value (struct cgraph_node *node, int index, HOST_WIDE_INT offset,
> val->spec_node = create_specialized_node (node, known_csts, known_contexts,
> aggvals, callers);
> overall_size += val->local_size_cost;
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file, " overall size reached %li\n",
> + overall_size);
>
> /* TODO: If for some lattice there is only one other known value
> left, make a special node for it too. */
> --
> 2.28.0
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 6/6] ipa-cp: Separate and increase the large-unit parameter
2020-09-21 14:25 ` [PATCH 6/6] ipa-cp: Separate and increase the large-unit parameter Martin Jambor
@ 2020-09-29 19:30 ` Jan Hubicka
2020-09-30 6:35 ` Richard Biener
2020-10-26 11:00 ` Tamar Christina
1 sibling, 1 reply; 18+ messages in thread
From: Jan Hubicka @ 2020-09-29 19:30 UTC (permalink / raw)
To: Martin Jambor; +Cc: GCC Patches
>
> gcc/ChangeLog:
>
> 2020-09-07 Martin Jambor <mjambor@suse.cz>
>
> * params.opt (ipa-cp-large-unit-insns): New parameter.
> * ipa-cp.c (get_max_overall_size): Use the new parameter.
OK,
thanks!
Honza
> ---
> gcc/ipa-cp.c | 2 +-
> gcc/params.opt | 4 ++++
> 2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 12acf24c553..2152f9e5876 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -3448,7 +3448,7 @@ static long
> get_max_overall_size (cgraph_node *node)
> {
> long max_new_size = orig_overall_size;
> - long large_unit = opt_for_fn (node->decl, param_large_unit_insns);
> + long large_unit = opt_for_fn (node->decl, param_ipa_cp_large_unit_insns);
> if (max_new_size < large_unit)
> max_new_size = large_unit;
> int unit_growth = opt_for_fn (node->decl, param_ipa_cp_unit_growth);
> diff --git a/gcc/params.opt b/gcc/params.opt
> index acb59f17e45..9d177ab50ad 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -218,6 +218,10 @@ Percentage penalty functions containing a single call to another function will r
> Common Joined UInteger Var(param_ipa_cp_unit_growth) Init(10) Param Optimization
> How much can given compilation unit grow because of the interprocedural constant propagation (in percent).
>
> +-param=ipa-cp-large-unit-insns=
> +Common Joined UInteger Var(param_ipa_cp_large_unit_insns) Optimization Init(16000) Param
> +The size of translation unit that IPA-CP pass considers large.
> +
> -param=ipa-cp-value-list-size=
> Common Joined UInteger Var(param_ipa_cp_value_list_size) Init(8) Param Optimization
> Maximum size of a list of values associated with each parameter for interprocedural constant propagation.
> --
> 2.28.0
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 4/6] ipa: Multiple predicates for loop properties, with frequencies
2020-09-21 14:25 ` [PATCH 4/6] ipa: Multiple predicates for loop properties, with frequencies Martin Jambor
@ 2020-09-29 22:18 ` Jan Hubicka
2020-10-02 12:31 ` Martin Jambor
0 siblings, 1 reply; 18+ messages in thread
From: Jan Hubicka @ 2020-09-29 22:18 UTC (permalink / raw)
To: Martin Jambor; +Cc: GCC Patches
> This patch enhances the ability of IPA to reason under what conditions
> loops in a function have known iteration counts or strides because it
> replaces single predicates which currently hold conjunction of
> predicates for all loops with vectors capable of holding multiple
> predicates, each with a cumulative frequency of loops with the
> property.
>
> This second property is then used by IPA-CP to much more aggressively
> boost its heuristic score for cloning opportunities which make
> iteration counts or strides of frequent loops compile time constant.
>
> gcc/ChangeLog:
>
> 2020-09-03 Martin Jambor <mjambor@suse.cz>
>
> * ipa-fnsummary.h (ipa_freqcounting_predicate): New type.
> (ipa_fn_summary): Change the type of loop_iterations and loop_strides
> to vectors of ipa_freqcounting_predicate.
> (ipa_fn_summary::ipa_fn_summary): Construct the new vectors.
> (ipa_call_estimates): New fields loops_with_known_iterations and
> loops_with_known_strides.
> * ipa-cp.c (hint_time_bonus): Multiply param_ipa_cp_loop_hint_bonus
> with the expected frequencies of loops with known iteration count or
> stride.
> * ipa-fnsummary.c (add_freqcounting_predicate): New function.
> (ipa_fn_summary::~ipa_fn_summary): Release the new vectors instead of
> just two predicates.
> (remap_hint_predicate_after_duplication): Replace with function
> remap_freqcounting_preds_after_dup.
> (ipa_fn_summary_t::duplicate): Use it or duplicate new vectors.
> (ipa_dump_fn_summary): Dump the new vectors.
> (analyze_function_body): Compute the loop property vectors.
> (ipa_call_context::estimate_size_and_time): Calculate also
> loops_with_known_iterations and loops_with_known_strides. Adjusted
> dumping accordingly.
> (remap_hint_predicate): Replace with function
> remap_freqcounting_predicate.
> (ipa_merge_fn_summary_after_inlining): Use it.
> (inline_read_section): Stream loopcounting vectors instead of two
> simple predicates.
> (ipa_fn_summary_write): Likewise.
> * params.opt (ipa-max-loop-predicates): New parameter.
> * doc/invoke.texi (ipa-max-loop-predicates): Document new param.
>
> diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
> index 6082f34d63f..bbbb94aa930 100644
> --- a/gcc/ipa-fnsummary.c
> +++ b/gcc/ipa-fnsummary.c
> @@ -310,6 +310,36 @@ set_hint_predicate (predicate **p, predicate new_predicate)
> }
> }
>
> +/* Find if NEW_PREDICATE is already in V and if so, increment its freq.
> + Otherwise add a new item to the vector with this predicate and freq equal
> + to add_freq, unless the number of predicates would exceed MAX_NUM_PREDICATES
> + in which case the function does nothing. */
> +
> +static void
> +add_freqcounting_predicate (vec<ipa_freqcounting_predicate, va_gc> **v,
> + const predicate &new_predicate, sreal add_freq,
> + unsigned max_num_predicates)
> +{
> + if (new_predicate == false || new_predicate == true)
> + return;
> + ipa_freqcounting_predicate *f;
> + for (int i = 0; vec_safe_iterate (*v, i, &f); i++)
> + if (new_predicate == f->predicate)
> + {
> + f->freq += add_freq;
> + return;
> + }
> + if (vec_safe_length (*v) >= max_num_predicates)
> + /* Too many different predicates to account for. */
> + return;
> +
> + ipa_freqcounting_predicate fcp;
> + fcp.predicate = NULL;
> + set_hint_predicate (&fcp.predicate, new_predicate);
> + fcp.freq = add_freq;
> + vec_safe_push (*v, fcp);
> + return;
> +}
>
> /* Compute what conditions may or may not hold given information about
> parameters. RET_CLAUSE returns truths that may hold in a specialized copy,
> @@ -710,13 +740,17 @@ ipa_call_summary::~ipa_call_summary ()
>
> ipa_fn_summary::~ipa_fn_summary ()
> {
> - if (loop_iterations)
> - edge_predicate_pool.remove (loop_iterations);
> - if (loop_stride)
> - edge_predicate_pool.remove (loop_stride);
> + unsigned len = vec_safe_length (loop_iterations);
> + for (unsigned i = 0; i < len; i++)
> + edge_predicate_pool.remove ((*loop_iterations)[i].predicate);
> + len = vec_safe_length (loop_strides);
> + for (unsigned i = 0; i < len; i++)
> + edge_predicate_pool.remove ((*loop_strides)[i].predicate);
For edges, predicates are pointers since most of them have no interesting
predicate and thus NULL is more compact. I guess here it would make
sense to make predicates inline. Is there a problem with vectors not
liking non-PODs?
> vec_free (conds);
> vec_free (size_time_table);
> vec_free (call_size_time_table);
> + vec_free (loop_iterations);
> + vec_free (loop_strides);
However auto_vecs should work in the brave new C++ world.
The patch looks reasonable to me. Did you check how much memory it
consumes when building bigger projects? Also, I am a bit worried about our
ability to use it reasonably in the heuristics, since it is quite a
complicated value...
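For readers following along, the bookkeeping done by add_freqcounting_predicate
in the quoted hunk can be modelled roughly in plain C++. This is only a sketch
under assumptions: a std::string stands in for the GCC predicate class, a
double stands in for sreal, and the names freq_pred and add_freq_pred are made
up for the illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for ipa_freqcounting_predicate: a string names the
// predicate and a double replaces sreal.
struct freq_pred
{
  std::string predicate;
  double freq;
};

// Add ADD_FREQ to the entry for NEW_PREDICATE if V already has one,
// otherwise append a new entry unless V already holds MAX_NUM_PREDICATES.
void
add_freq_pred (std::vector<freq_pred> &v, const std::string &new_predicate,
               double add_freq, std::size_t max_num_predicates)
{
  // Constant predicates say nothing about specialization, so skip them.
  if (new_predicate == "true" || new_predicate == "false")
    return;

  for (freq_pred &f : v)
    if (f.predicate == new_predicate)
      {
        f.freq += add_freq;
        return;
      }

  // Too many different predicates to account for.
  if (v.size () >= max_num_predicates)
    return;

  v.push_back ({new_predicate, add_freq});
}
```

The cap keeps the per-function summary bounded, at the price of silently
ignoring loops whose predicate shows up after the vector is full.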
Honza
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 6/6] ipa-cp: Separate and increase the large-unit parameter
2020-09-29 19:30 ` Jan Hubicka
@ 2020-09-30 6:35 ` Richard Biener
2020-09-30 16:39 ` Martin Jambor
0 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2020-09-30 6:35 UTC (permalink / raw)
To: Jan Hubicka; +Cc: Martin Jambor, GCC Patches
On Tue, Sep 29, 2020 at 9:31 PM Jan Hubicka <hubicka@ucw.cz> wrote:
>
> >
> > gcc/ChangeLog:
> >
> > 2020-09-07 Martin Jambor <mjambor@suse.cz>
> >
> > * params.opt (ipa-cp-large-unit-insns): New parameter.
> > * ipa-cp.c (get_max_overall_size): Use the new parameter.
> OK,
Maybe the IPA CP large-unit should be a factor of the large-unit
param? Thus, make the new param ipa-cp-large-unit-factor
instead so when people increase large-unit they also get "other"
large units increased accordingly?
> thanks!
> Honza
> > ---
> > gcc/ipa-cp.c | 2 +-
> > gcc/params.opt | 4 ++++
> > 2 files changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> > index 12acf24c553..2152f9e5876 100644
> > --- a/gcc/ipa-cp.c
> > +++ b/gcc/ipa-cp.c
> > @@ -3448,7 +3448,7 @@ static long
> > get_max_overall_size (cgraph_node *node)
> > {
> > long max_new_size = orig_overall_size;
> > - long large_unit = opt_for_fn (node->decl, param_large_unit_insns);
> > + long large_unit = opt_for_fn (node->decl, param_ipa_cp_large_unit_insns);
> > if (max_new_size < large_unit)
> > max_new_size = large_unit;
> > int unit_growth = opt_for_fn (node->decl, param_ipa_cp_unit_growth);
> > diff --git a/gcc/params.opt b/gcc/params.opt
> > index acb59f17e45..9d177ab50ad 100644
> > --- a/gcc/params.opt
> > +++ b/gcc/params.opt
> > @@ -218,6 +218,10 @@ Percentage penalty functions containing a single call to another function will r
> > Common Joined UInteger Var(param_ipa_cp_unit_growth) Init(10) Param Optimization
> > How much can given compilation unit grow because of the interprocedural constant propagation (in percent).
> >
> > +-param=ipa-cp-large-unit-insns=
> > +Common Joined UInteger Var(param_ipa_cp_large_unit_insns) Optimization Init(16000) Param
> > +The size of translation unit that IPA-CP pass considers large.
> > +
> > -param=ipa-cp-value-list-size=
> > Common Joined UInteger Var(param_ipa_cp_value_list_size) Init(8) Param Optimization
> > Maximum size of a list of values associated with each parameter for interprocedural constant propagation.
> > --
> > 2.28.0
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 6/6] ipa-cp: Separate and increase the large-unit parameter
2020-09-30 6:35 ` Richard Biener
@ 2020-09-30 16:39 ` Martin Jambor
0 siblings, 0 replies; 18+ messages in thread
From: Martin Jambor @ 2020-09-30 16:39 UTC (permalink / raw)
To: Richard Biener, Jan Hubicka; +Cc: GCC Patches
Hi,
On Wed, Sep 30 2020, Richard Biener wrote:
> On Tue, Sep 29, 2020 at 9:31 PM Jan Hubicka <hubicka@ucw.cz> wrote:
>>
>> >
>> > gcc/ChangeLog:
>> >
>> > 2020-09-07 Martin Jambor <mjambor@suse.cz>
>> >
>> > * params.opt (ipa-cp-large-unit-insns): New parameter.
>> > * ipa-cp.c (get_max_overall_size): Use the new parameter.
>> OK,
>
> Maybe the IPA CP large-unit should be a factor of the large-unit
> param? Thus, make the new param ipa-cp-large-unit-factor
> instead so when people increase large-unit they also get "other"
> large units increased accordingly?
I do not have a very strong opinion about this but I think that having
two separate parameters will make it easier for us to experiment with
the passes and is probably easier to document and thus also easier for
users who want to play with this to understand.
On the other hand, having a single param to tune sensitivity of all IPA
towards sizes - or just what big size means - does not seem like such a
big advantage to me.
But I guess I could be persuaded otherwise.
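To make the two options concrete, here is a minimal sketch of how the IPA-CP
"large unit" threshold would be obtained in each case. The name
ipa-cp-large-unit-factor comes from Richard's suggestion; treating it as a
percentage, and the function names below, are assumptions made only for this
illustration.

```cpp
// Option 1: an independent parameter, as in the posted patch.
long
ipa_cp_large_unit_separate (long ipa_cp_large_unit_insns)
{
  return ipa_cp_large_unit_insns;
}

// Option 2: derive the threshold from large-unit-insns through a
// hypothetical percentage factor, so raising one raises the other.
long
ipa_cp_large_unit_scaled (long large_unit_insns, int factor_percent)
{
  return large_unit_insns * factor_percent / 100;
}
```

With large-unit-insns at 10000, a factor of 160 would reproduce the proposed
16000; doubling large-unit-insns would then move the IPA-CP threshold to 32000,
whereas with separate parameters it would stay at 16000.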
Thanks,
Martin
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/6] ipa: Bundle vectors describing argument values
2020-09-28 18:47 ` [PATCH 1/6] ipa: Bundle vectors describing argument values Martin Jambor
@ 2020-10-02 11:54 ` Jan Hubicka
0 siblings, 0 replies; 18+ messages in thread
From: Jan Hubicka @ 2020-10-02 11:54 UTC (permalink / raw)
To: Martin Jambor; +Cc: GCC Patches
> 2020-09-01 Martin Jambor <mjambor@suse.cz>
>
> * ipa-prop.h (ipa_auto_call_arg_values): New type.
> (class ipa_call_arg_values): Likewise.
> (ipa_get_indirect_edge_target): Replaced vector arguments with
> ipa_call_arg_values in declaration. Added an overload for
> ipa_auto_call_arg_values.
> * ipa-fnsummary.h (ipa_call_context): Removed members m_known_vals,
> m_known_contexts, m_known_aggs, duplicate_from, release and equal_to,
> new members m_avals, store_to_cache and equivalent_to_p. Adjusted
> constructor arguments.
> (estimate_ipcp_clone_size_and_time): Replaced vector arguments
> with ipa_auto_call_arg_values in declaration.
> (evaluate_properties_for_edge): Likewise.
> * ipa-cp.c (ipa_get_indirect_edge_target): Adjusted to work on
> ipa_call_arg_values rather than on separate vectors. Added an
> overload for ipa_auto_call_arg_values.
> (devirtualization_time_bonus): Adjusted to work on
> ipa_auto_call_arg_values rather than on separate vectors.
> (gather_context_independent_values): Adjusted to work on
> ipa_auto_call_arg_values rather than on separate vectors.
> (perform_estimation_of_a_value): Likewise.
> (estimate_local_effects): Likewise.
> (modify_known_vectors_with_val): Adjusted both variants to work on
> ipa_auto_call_arg_values and rename them to
> copy_known_vectors_add_val.
> (decide_about_value): Adjusted to work on ipa_call_arg_values rather
> than on separate vectors.
> (decide_whether_version_node): Likewise.
> * ipa-fnsummary.c (evaluate_conditions_for_known_args): Likewise.
> (evaluate_properties_for_edge): Likewise.
> (ipa_fn_summary_t::duplicate): Likewise.
> (estimate_edge_devirt_benefit): Adjusted to work on
> ipa_call_arg_values rather than on separate vectors.
> (estimate_edge_size_and_time): Likewise.
> (estimate_calls_size_and_time_1): Likewise.
> (summarize_calls_size_and_time): Adjusted calls to
> estimate_edge_size_and_time.
> (estimate_calls_size_and_time): Adjusted to work on
> ipa_call_arg_values rather than on separate vectors.
> (ipa_call_context::ipa_call_context): Construct from a pointer to
> ipa_auto_call_arg_values instead of individual vectors.
> (ipa_call_context::duplicate_from): Adjusted to access vectors within
> m_avals.
> (ipa_call_context::release): Likewise.
> (ipa_call_context::equal_to): Likewise.
> (ipa_call_context::estimate_size_and_time): Adjusted to work on
> ipa_call_arg_values rather than on separate vectors.
> (estimate_ipcp_clone_size_and_time): Adjusted to work with
> ipa_auto_call_arg_values rather than on separate vectors.
> (ipa_merge_fn_summary_after_inlining): Likewise. Adjusted call to
> estimate_edge_size_and_time.
> (ipa_update_overall_fn_summary): Adjusted call to
> estimate_edge_size_and_time.
> * ipa-inline-analysis.c (do_estimate_edge_time): Adjusted to work with
> ipa_auto_call_arg_values rather than with separate vectors.
> (do_estimate_edge_size): Likewise.
> (do_estimate_edge_hints): Likewise.
> * ipa-prop.c (ipa_auto_call_arg_values::~ipa_auto_call_arg_values):
> New destructor.
> @@ -328,14 +326,9 @@ private:
> /* Inline summary maintains info about change probabilities. */
> vec<inline_param_summary> m_inline_param_summary;
>
> - /* The following is used only to resolve indirect calls. */
> -
> - /* Vector describing known values of parameters. */
> - vec<tree> m_known_vals;
> - /* Vector describing known polymorphic call contexts. */
> - vec<ipa_polymorphic_call_context> m_known_contexts;
> - /* Vector describing known aggregate values. */
> - vec<ipa_agg_value_set> m_known_aggs;
> + /* Even after having calculated clauses, the information about argument
> + values is used to resolve indirect calls. */
> + ipa_call_arg_values m_avals;
In the cached context we keep the vectors populated only if they are going
to be used by the predicates. Is this preserved?
Otherwise the patch looks OK. It would be nice to test it by building a
bigger project and see how memory allocation is affected by your change.
Honza
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 4/6] ipa: Multiple predicates for loop properties, with frequencies
2020-09-29 22:18 ` Jan Hubicka
@ 2020-10-02 12:31 ` Martin Jambor
0 siblings, 0 replies; 18+ messages in thread
From: Martin Jambor @ 2020-10-02 12:31 UTC (permalink / raw)
To: Jan Hubicka; +Cc: GCC Patches
Hi,
On Wed, Sep 30 2020, Jan Hubicka wrote:
>> This patch enhances the ability of IPA to reason under what conditions
>> loops in a function have known iteration counts or strides because it
>> replaces single predicates which currently hold conjunction of
>> predicates for all loops with vectors capable of holding multiple
>> predicates, each with a cumulative frequency of loops with the
>> property.
>>
>> This second property is then used by IPA-CP to much more aggressively
>> boost its heuristic score for cloning opportunities which make
>> iteration counts or strides of frequent loops compile time constant.
>>
>> gcc/ChangeLog:
>>
>> 2020-09-03 Martin Jambor <mjambor@suse.cz>
>>
>> * ipa-fnsummary.h (ipa_freqcounting_predicate): New type.
>> (ipa_fn_summary): Change the type of loop_iterations and loop_strides
>> to vectors of ipa_freqcounting_predicate.
>> (ipa_fn_summary::ipa_fn_summary): Construct the new vectors.
>> (ipa_call_estimates): New fields loops_with_known_iterations and
>> loops_with_known_strides.
>> * ipa-cp.c (hint_time_bonus): Multiply param_ipa_cp_loop_hint_bonus
>> with the expected frequencies of loops with known iteration count or
>> stride.
>> * ipa-fnsummary.c (add_freqcounting_predicate): New function.
>> (ipa_fn_summary::~ipa_fn_summary): Release the new vectors instead of
>> just two predicates.
>> (remap_hint_predicate_after_duplication): Replace with function
>> remap_freqcounting_preds_after_dup.
>> (ipa_fn_summary_t::duplicate): Use it or duplicate new vectors.
>> (ipa_dump_fn_summary): Dump the new vectors.
>> (analyze_function_body): Compute the loop property vectors.
>> (ipa_call_context::estimate_size_and_time): Calculate also
>> loops_with_known_iterations and loops_with_known_strides. Adjusted
>> dumping accordingly.
>> (remap_hint_predicate): Replace with function
>> remap_freqcounting_predicate.
>> (ipa_merge_fn_summary_after_inlining): Use it.
>> (inline_read_section): Stream loopcounting vectors instead of two
>> simple predicates.
>> (ipa_fn_summary_write): Likewise.
>> * params.opt (ipa-max-loop-predicates): New parameter.
>> * doc/invoke.texi (ipa-max-loop-predicates): Document new param.
>>
>> diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
>> index 6082f34d63f..bbbb94aa930 100644
>> --- a/gcc/ipa-fnsummary.c
>> +++ b/gcc/ipa-fnsummary.c
>> @@ -310,6 +310,36 @@ set_hint_predicate (predicate **p, predicate new_predicate)
>> }
>> }
>>
>> +/* Find if NEW_PREDICATE is already in V and if so, increment its freq.
>> + Otherwise add a new item to the vector with this predicate and freq equal
>> + to add_freq, unless the number of predicates would exceed MAX_NUM_PREDICATES
>> + in which case the function does nothing. */
>> +
>> +static void
>> +add_freqcounting_predicate (vec<ipa_freqcounting_predicate, va_gc> **v,
>> + const predicate &new_predicate, sreal add_freq,
>> + unsigned max_num_predicates)
>> +{
>> + if (new_predicate == false || new_predicate == true)
>> + return;
>> + ipa_freqcounting_predicate *f;
>> + for (int i = 0; vec_safe_iterate (*v, i, &f); i++)
>> + if (new_predicate == f->predicate)
>> + {
>> + f->freq += add_freq;
>> + return;
>> + }
>> + if (vec_safe_length (*v) >= max_num_predicates)
>> + /* Too many different predicates to account for. */
>> + return;
>> +
>> + ipa_freqcounting_predicate fcp;
>> + fcp.predicate = NULL;
>> + set_hint_predicate (&fcp.predicate, new_predicate);
>> + fcp.freq = add_freq;
>> + vec_safe_push (*v, fcp);
>> + return;
>> +}
>>
>> /* Compute what conditions may or may not hold given information about
>> parameters. RET_CLAUSE returns truths that may hold in a specialized copy,
>> @@ -710,13 +740,17 @@ ipa_call_summary::~ipa_call_summary ()
>>
>> ipa_fn_summary::~ipa_fn_summary ()
>> {
>> - if (loop_iterations)
>> - edge_predicate_pool.remove (loop_iterations);
>> - if (loop_stride)
>> - edge_predicate_pool.remove (loop_stride);
>> + unsigned len = vec_safe_length (loop_iterations);
>> + for (unsigned i = 0; i < len; i++)
>> + edge_predicate_pool.remove ((*loop_iterations)[i].predicate);
>> + len = vec_safe_length (loop_strides);
>> + for (unsigned i = 0; i < len; i++)
>> + edge_predicate_pool.remove ((*loop_strides)[i].predicate);
>
> For edges predicates are pointers since most of them have no interesting
> predicate and thus NULL is more compact. I guess here it would make
> sense to make predicates inline. Is there a problem with vectors not
> liking non-pods?
>> vec_free (conds);
>> vec_free (size_time_table);
>> vec_free (call_size_time_table);
>> + vec_free (loop_iterations);
>> + vec_free (loop_strides);
>
> However auto_vecs should work in the brave new C++ world.
Well, the summary lives in GC memory, so I don't think I can put
auto_vecs there.
I will add a note to look into storing the predicate directly in
ipa_freqcounting_predicate, instead of a pointer to it, as a follow-up patch.
>
> The patch looks reasonable to me. Did you check how much memory it
> consumes building bigger projects? Also I am bit worried about our
> ability to use it reasonably in the heuristics since it is quite
> complicated value...
Thanks!
Martin
^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [PATCH 6/6] ipa-cp: Separate and increase the large-unit parameter
2020-09-21 14:25 ` [PATCH 6/6] ipa-cp: Separate and increase the large-unit parameter Martin Jambor
2020-09-29 19:30 ` Jan Hubicka
@ 2020-10-26 11:00 ` Tamar Christina
1 sibling, 0 replies; 18+ messages in thread
From: Tamar Christina @ 2020-10-26 11:00 UTC (permalink / raw)
To: Martin Jambor, GCC Patches; +Cc: Jan Hubicka
Hi Martin,
I have been playing with --param ipa-cp-large-unit-insns but it doesn't seem to have any meaningful effect on
exchange2 and I still can't recover the 12% regression vs GCC 10.
Do I need to use another parameter here?
Thanks,
Tamar
> -----Original Message-----
> From: Gcc-patches <gcc-patches-bounces@gcc.gnu.org> On Behalf Of Martin
> Jambor
> Sent: Monday, September 21, 2020 3:25 PM
> To: GCC Patches <gcc-patches@gcc.gnu.org>
> Cc: Jan Hubicka <jh@suse.cz>
> Subject: [PATCH 6/6] ipa-cp: Separate and increase the large-unit parameter
>
> A previous patch in the series has taught IPA-CP to identify the important
> cloning opportunities in 548.exchange2_r as worthwhile on their own, but
> the optimization is still prevented from taking place because of the overall
> unit-growth limit. This patch raises that limit so that it takes place and the
> benchmark runs 30% faster (on AMD
> Zen2 CPU at least).
>
> Before this patch, IPA-CP uses the following formulae to arrive at the
> overall_size limit:
>
> base = MAX(orig_size, param_large_unit_insns)
> unit_growth_limit = base + base * param_ipa_cp_unit_growth / 100
>
> since param_ipa_cp_unit_growth has default 10, param_large_unit_insns
> has default value 10000.
>
> The problem with exchange2 (at least on zen2 but I have had a quick look on
> aarch64 too) is that the original estimated unit size is 10513 and so
> param_large_unit_insns does not apply and the default limit is therefore
> 11564 which is good enough only for one of the ideal 8 clonings, we need the
> limit to be at least 16291.
>
> I would like to raise param_ipa_cp_unit_growth a little bit more soon too,
> but most certainly not to 55. Therefore, the large_unit must be increased. In
> this patch, I decided to decouple the inlining and ipa-cp large-unit parameters.
> It also makes sense because IPA-CP uses it only at -O3 while inlining also at
> -O2 (IIUC). But if we agree we can try raising param_large_unit_insns to 13-14
> thousand "instructions," perhaps it is not necessary. But then again, it may
> make sense to actually increase the IPA-CP limit further.
>
> I plan to experiment with IPA-CP tuning on a larger set of programs.
> Meanwhile, mainly to address the 548.exchange2_r regression, I'm
> suggesting this simple change.
>
> gcc/ChangeLog:
>
> 2020-09-07 Martin Jambor <mjambor@suse.cz>
>
> * params.opt (ipa-cp-large-unit-insns): New parameter.
> * ipa-cp.c (get_max_overall_size): Use the new parameter.
> ---
> gcc/ipa-cp.c | 2 +-
> gcc/params.opt | 4 ++++
> 2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 12acf24c553..2152f9e5876 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -3448,7 +3448,7 @@ static long
> get_max_overall_size (cgraph_node *node)
> {
> long max_new_size = orig_overall_size;
> - long large_unit = opt_for_fn (node->decl, param_large_unit_insns);
> + long large_unit = opt_for_fn (node->decl, param_ipa_cp_large_unit_insns);
> if (max_new_size < large_unit)
> max_new_size = large_unit;
> int unit_growth = opt_for_fn (node->decl, param_ipa_cp_unit_growth);
> diff --git a/gcc/params.opt b/gcc/params.opt
> index acb59f17e45..9d177ab50ad 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -218,6 +218,10 @@ Percentage penalty functions containing a single call to another function will r
> Common Joined UInteger Var(param_ipa_cp_unit_growth) Init(10) Param Optimization
> How much can given compilation unit grow because of the interprocedural constant propagation (in percent).
>
> +-param=ipa-cp-large-unit-insns=
> +Common Joined UInteger Var(param_ipa_cp_large_unit_insns) Optimization Init(16000) Param
> +The size of translation unit that IPA-CP pass considers large.
> +
> -param=ipa-cp-value-list-size=
> Common Joined UInteger Var(param_ipa_cp_value_list_size) Init(8) Param Optimization
> Maximum size of a list of values associated with each parameter for interprocedural constant propagation.
> --
> 2.28.0
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 6/6] ipa-cp: Separate and increase the large-unit parameter
@ 2020-09-07 19:41 Martin Jambor
0 siblings, 0 replies; 18+ messages in thread
From: Martin Jambor @ 2020-09-07 19:41 UTC (permalink / raw)
To: GCC Patches; +Cc: Jan Hubicka
Hi,
a previous patch in the series has taught IPA-CP to identify the
important cloning opportunities in 548.exchange2_r as worthwhile on
their own, but the optimization is still prevented from taking place
because of the overall unit-growth limit. This patch raises that
limit so that it takes place and the benchmark runs 30% faster (on AMD
Zen2 CPU at least).
Before this patch, IPA-CP uses the following formulae to arrive at the
overall_size limit:
base = MAX(orig_size, param_large_unit_insns)
overall_size_limit = base + base * param_ipa_cp_unit_growth / 100
where param_ipa_cp_unit_growth defaults to 10 and param_large_unit_insns
defaults to 10000.
The problem with exchange2 (at least on zen2 but I have had a quick
look on aarch64 too) is that the original estimated unit size is 10513
and so param_large_unit_insns does not apply and the default limit is
therefore 11564, which is good enough for only one of the ideal 8
clonings; we need the limit to be at least 16291.
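The arithmetic can be checked with a small stand-alone model of the formula.
This is only a sketch: the functions below are hypothetical helpers written
for this illustration and follow the formula as stated in this message, not
necessarily the exact code of get_max_overall_size.

```cpp
#include <algorithm>

// Model of the limit formula quoted above, using integer arithmetic.
long
overall_size_limit (long orig_size, long large_unit, int unit_growth)
{
  long base = std::max (orig_size, large_unit);
  return base + base * unit_growth / 100;
}

// Smallest growth percentage for which the limit reaches NEEDED.
int
min_growth_to_reach (long orig_size, long large_unit, long needed)
{
  int growth = 0;
  while (overall_size_limit (orig_size, large_unit, growth) < needed)
    growth++;
  return growth;
}
```

With the numbers above, overall_size_limit (10513, 10000, 10) gives 11564, and
min_growth_to_reach (10513, 10000, 16291) gives 55, which is where the figure
of 55 in the next paragraph comes from. Using a large unit of 16000 instead,
overall_size_limit (10513, 16000, 10) gives 17600, which is above the required
16291.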
I would like to raise param_ipa_cp_unit_growth a little bit more soon
too, but most certainly not to 55. Therefore, the large_unit must be
increased. In this patch, I decided to decouple the inlining and
ipa-cp large-unit parameters. It also makes sense because IPA-CP uses
it only at -O3 while inlining also at -O2 (IIUC). But if we agree we
can try raising param_large_unit_insns to 13-14 thousand
"instructions," perhaps it is not necessary. But then again, it may
make sense to actually increase the IPA-CP limit further.
I plan to experiment with IPA-CP tuning on a larger set of programs.
Meanwhile, mainly to address the 548.exchange2_r regression, I'm
suggesting this simple change.
Bootstrapped and tested and LTO bootstrapped on x86_64 as a part of
the whole series.
OK for trunk?
Thanks,
Martin
gcc/ChangeLog:
2020-09-07 Martin Jambor <mjambor@suse.cz>
* params.opt (ipa-cp-large-unit-insns): New parameter.
* ipa-cp.c (get_max_overall_size): Use the new parameter.
---
gcc/ipa-cp.c | 2 +-
gcc/params.opt | 4 ++++
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 12acf24c553..2152f9e5876 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -3448,7 +3448,7 @@ static long
get_max_overall_size (cgraph_node *node)
{
long max_new_size = orig_overall_size;
- long large_unit = opt_for_fn (node->decl, param_large_unit_insns);
+ long large_unit = opt_for_fn (node->decl, param_ipa_cp_large_unit_insns);
if (max_new_size < large_unit)
max_new_size = large_unit;
int unit_growth = opt_for_fn (node->decl, param_ipa_cp_unit_growth);
diff --git a/gcc/params.opt b/gcc/params.opt
index 97509963d71..ef2c1f81dd7 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -214,6 +214,10 @@ Percentage penalty functions containing a single call to another function will r
Common Joined UInteger Var(param_ipa_cp_unit_growth) Init(10) Param Optimization
How much can given compilation unit grow because of the interprocedural constant propagation (in percent).
+-param=ipa-cp-large-unit-insns=
+Common Joined UInteger Var(param_ipa_cp_large_unit_insns) Optimization Init(16000) Param
+The size of translation unit that IPA-CP pass considers large.
+
-param=ipa-cp-value-list-size=
Common Joined UInteger Var(param_ipa_cp_value_list_size) Init(8) Param Optimization
Maximum size of a list of values associated with each parameter for interprocedural constant propagation.
--
2.28.0
Thread overview: 18+ messages
2020-09-29 18:12 [PATCH 0/6] IPA cleanups and IPA-CP improvements for 548.exchange2_r Martin Jambor
2020-09-21 14:25 ` [PATCH 5/6] ipa-cp: Add dumping of overall_size after cloning Martin Jambor
2020-09-29 18:39 ` Jan Hubicka
2020-09-21 14:25 ` [PATCH 4/6] ipa: Multiple predicates for loop properties, with frequencies Martin Jambor
2020-09-29 22:18 ` Jan Hubicka
2020-10-02 12:31 ` Martin Jambor
2020-09-21 14:25 ` [PATCH 6/6] ipa-cp: Separate and increase the large-unit parameter Martin Jambor
2020-09-29 19:30 ` Jan Hubicka
2020-09-30 6:35 ` Richard Biener
2020-09-30 16:39 ` Martin Jambor
2020-10-26 11:00 ` Tamar Christina
2020-09-28 18:47 ` [PATCH 1/6] ipa: Bundle vectors describing argument values Martin Jambor
2020-10-02 11:54 ` Jan Hubicka
2020-09-28 18:47 ` [PATCH 3/6] ipa: Bundle estimates of ipa_call_context::estimate_size_and_time Martin Jambor
2020-09-29 18:39 ` Jan Hubicka
2020-09-28 18:47 ` [PATCH 2/6] ipa: Introduce ipa_cached_call_context Martin Jambor
2020-09-29 18:27 ` Jan Hubicka
-- strict thread matches above, loose matches on Subject: below --
2020-09-07 19:41 [PATCH 6/6] ipa-cp: Separate and increase the large-unit parameter Martin Jambor