[PATCH] tree-optimization/104912 - ensure cost model is checked first

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] tree-optimization/104912 - ensure cost model is checked first
@ 2022-03-21 15:10 Richard Biener
  2022-03-31 12:57 ` Richard Sandiford
  0 siblings, 1 reply; 3+ messages in thread
From: Richard Biener @ 2022-03-21 15:10 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford, Jakub Jelinek

The following makes sure that when we build the versioning condition
for vectorization including the cost model check, we check for the
cost model and branch over other versioning checks.  That is what
the cost modeling assumes, since the cost model check is the only
one accounted for in the scalar outside cost.  Currently we emit
all checks as straight-line code combined with bitwise ops which
can result in surprising ordering of checks in the final assembly.

Since loop_version accepts only a single versioning condition
the splitting is done after the fact.

The result is a 1.5% speedup of 416.gamess on x86_64 when compiling
with -Ofast and tuning for generic or skylake.  That's not enough
to recover from the slowdown when vectorizing but it now cuts off
the expensive alias versioning test.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK for trunk?

For the rest of the regression my plan is to somehow factor in
the evolution of the number of iterations in the outer loop
(which is {1, +, 1}) to somehow bump the static profitability
estimate and together with the "cheap" cost model check never
execute the vectorized version (well, it is actually never executed,
but only because the alias check fails).

Thanks,
Richard.

2022-03-21  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/104912
	* tree-vect-loop-manip.cc (vect_loop_versioning): Split
	the cost model check to a separate BB to make sure it is
	checked first and not combined with other version checks.
---
 gcc/tree-vect-loop-manip.cc | 53 ++++++++++++++++++++++++++++++++++---
 1 file changed, 50 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index a7bbc916bbc..8ef333eb31b 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -3445,13 +3445,28 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
 	cond_expr = expr;
     }
 
+  tree cost_name = NULL_TREE;
+  if (cond_expr
+      && !integer_truep (cond_expr)
+      && (version_niter
+	  || version_align
+	  || version_alias
+	  || version_simd_if_cond))
+    cost_name = cond_expr = force_gimple_operand_1 (unshare_expr (cond_expr),
+						    &cond_expr_stmt_list,
+						    is_gimple_val, NULL_TREE);
+
   if (version_niter)
     vect_create_cond_for_niters_checks (loop_vinfo, &cond_expr);
 
   if (cond_expr)
-    cond_expr = force_gimple_operand_1 (unshare_expr (cond_expr),
-					&cond_expr_stmt_list,
-					is_gimple_condexpr, NULL_TREE);
+    {
+      gimple_seq tem = NULL;
+      cond_expr = force_gimple_operand_1 (unshare_expr (cond_expr),
+					  &tem,
+					  is_gimple_condexpr, NULL_TREE);
+      gimple_seq_add_seq (&cond_expr_stmt_list, tem);
+    }
 
   if (version_align)
     vect_create_cond_for_align_checks (loop_vinfo, &cond_expr,
@@ -3654,6 +3669,38 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
       update_ssa (TODO_update_ssa);
     }
 
+  /* Split the cost model check off to a separate BB.  Costing assumes
+     this is the only thing we perform when we enter the scalar loop.  */
+  if (cost_name)
+    {
+      gimple *def = SSA_NAME_DEF_STMT (cost_name);
+      /* All uses of the cost check are 'true' after the check we
+	 are going to insert.  */
+      replace_uses_by (cost_name, boolean_true_node);
+      /* And we're going to build the new single use of it.  */
+      gcond *cond = gimple_build_cond (NE_EXPR, cost_name, boolean_false_node,
+				       NULL_TREE, NULL_TREE);
+      edge e = split_block (gimple_bb (def), def);
+      gimple_stmt_iterator gsi = gsi_for_stmt (def);
+      gsi_insert_after (&gsi, cond, GSI_NEW_STMT);
+      edge true_e, false_e;
+      extract_true_false_edges_from_block (e->dest, &true_e, &false_e);
+      e->flags &= ~EDGE_FALLTHRU;
+      e->flags |= EDGE_TRUE_VALUE;
+      edge e2 = make_edge (e->src, false_e->dest, EDGE_FALSE_VALUE);
+      e->probability = prob;
+      e2->probability = prob.invert ();
+      set_immediate_dominator (CDI_DOMINATORS, false_e->dest, e->src);
+      auto_vec<basic_block, 3> adj;
+      for (basic_block son = first_dom_son (CDI_DOMINATORS, e->dest);
+	   son;
+	   son = next_dom_son (CDI_DOMINATORS, son))
+	if (EDGE_COUNT (son->preds) > 1)
+	  adj.safe_push (son);
+      for (auto son : adj)
+	set_immediate_dominator (CDI_DOMINATORS, son, e->src);
+    }
+
   if (version_niter)
     {
       /* The versioned loop could be infinite, we need to clear existing
-- 
2.34.1

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] tree-optimization/104912 - ensure cost model is checked first
  2022-03-21 15:10 [PATCH] tree-optimization/104912 - ensure cost model is checked first Richard Biener
@ 2022-03-31 12:57 ` Richard Sandiford
  2022-03-31 13:26   ` Richard Biener
  0 siblings, 1 reply; 3+ messages in thread
From: Richard Sandiford @ 2022-03-31 12:57 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Jakub Jelinek

Richard Biener <rguenther@suse.de> writes:
> The following makes sure that when we build the versioning condition
> for vectorization including the cost model check, we check for the
> cost model and branch over other versioning checks.  That is what
> the cost modeling assumes, since the cost model check is the only
> one accounted for in the scalar outside cost.  Currently we emit
> all checks as straight-line code combined with bitwise ops which
> can result in surprising ordering of checks in the final assembly.

Yeah, this had bugged me too, and meant that we made some bad
decisions in some of the local benchmarks we use.  Was just afraid
to poke at it, since it seemed like a deliberate decision. :-)

> Since loop_version accepts only a single versioning condition
> the splitting is done after the fact.
>
> The result is a 1.5% speedup of 416.gamess on x86_64 when compiling
> with -Ofast and tuning for generic or skylake.  That's not enough
> to recover from the slowdown when vectorizing but it now cuts off
> the expensive alias versioning test.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> OK for trunk?
>
> For the rest of the regression my plan is to somehow factor in
> the evolution of the number of iterations in the outer loop
> (which is {1, +, 1}) to somehow bump the static profitability
> estimate and together with the "cheap" cost model check never
> execute the vectorized version (well, it is actually never executed,
> but only because the alias check fails).
>
> Thanks,
> Richard.
>
> 2022-03-21  Richard Biener  <rguenther@suse.de>
>
> 	PR tree-optimization/104912
> 	* tree-vect-loop-manip.cc (vect_loop_versioning): Split
> 	the cost model check to a separate BB to make sure it is
> 	checked first and not combined with other version checks.
> ---
>  gcc/tree-vect-loop-manip.cc | 53 ++++++++++++++++++++++++++++++++++---
>  1 file changed, 50 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index a7bbc916bbc..8ef333eb31b 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -3445,13 +3445,28 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
>  	cond_expr = expr;
>      }
>  
> +  tree cost_name = NULL_TREE;
> +  if (cond_expr
> +      && !integer_truep (cond_expr)
> +      && (version_niter
> +	  || version_align
> +	  || version_alias
> +	  || version_simd_if_cond))
> +    cost_name = cond_expr = force_gimple_operand_1 (unshare_expr (cond_expr),
> +						    &cond_expr_stmt_list,
> +						    is_gimple_val, NULL_TREE);
> +
>    if (version_niter)
>      vect_create_cond_for_niters_checks (loop_vinfo, &cond_expr);
>  
>    if (cond_expr)
> -    cond_expr = force_gimple_operand_1 (unshare_expr (cond_expr),
> -					&cond_expr_stmt_list,
> -					is_gimple_condexpr, NULL_TREE);
> +    {
> +      gimple_seq tem = NULL;
> +      cond_expr = force_gimple_operand_1 (unshare_expr (cond_expr),
> +					  &tem,
> +					  is_gimple_condexpr, NULL_TREE);
> +      gimple_seq_add_seq (&cond_expr_stmt_list, tem);
> +    }
>  
>    if (version_align)
>      vect_create_cond_for_align_checks (loop_vinfo, &cond_expr,
> @@ -3654,6 +3669,38 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
>        update_ssa (TODO_update_ssa);
>      }
>  
> +  /* Split the cost model check off to a separate BB.  Costing assumes
> +     this is the only thing we perform when we enter the scalar loop.  */

Maybe “…from a failed cost decision” or something?  Might sounds out
of context like it applied more generally.

> +  if (cost_name)
> +    {
> +      gimple *def = SSA_NAME_DEF_STMT (cost_name);

I realise it should only happen very rarely if at all, but is it
absolutely guaranteed that the cost condition doesn't fold to a
constant?

> +      /* All uses of the cost check are 'true' after the check we
> +	 are going to insert.  */
> +      replace_uses_by (cost_name, boolean_true_node);
> +      /* And we're going to build the new single use of it.  */
> +      gcond *cond = gimple_build_cond (NE_EXPR, cost_name, boolean_false_node,
> +				       NULL_TREE, NULL_TREE);
> +      edge e = split_block (gimple_bb (def), def);
> +      gimple_stmt_iterator gsi = gsi_for_stmt (def);
> +      gsi_insert_after (&gsi, cond, GSI_NEW_STMT);
> +      edge true_e, false_e;
> +      extract_true_false_edges_from_block (e->dest, &true_e, &false_e);
> +      e->flags &= ~EDGE_FALLTHRU;
> +      e->flags |= EDGE_TRUE_VALUE;
> +      edge e2 = make_edge (e->src, false_e->dest, EDGE_FALSE_VALUE);
> +      e->probability = prob;
> +      e2->probability = prob.invert ();

Is reusing prob the right thing to do?  Wouldn't the path to the vector
loop end up with a probability of PROB^2?

> +      set_immediate_dominator (CDI_DOMINATORS, false_e->dest, e->src);
> +      auto_vec<basic_block, 3> adj;
> +      for (basic_block son = first_dom_son (CDI_DOMINATORS, e->dest);
> +	   son;
> +	   son = next_dom_son (CDI_DOMINATORS, son))
> +	if (EDGE_COUNT (son->preds) > 1)
> +	  adj.safe_push (son);
> +      for (auto son : adj)
> +	set_immediate_dominator (CDI_DOMINATORS, son, e->src);

Seems unfortunate that so much bespoke code is needed to do an operation
like this, but that's not a very constructive comment, sorry. :-)

Anyway, LGTM, but I don't think I'd really be able to spot problems.

Thanks,
Richard

> +    }
> +
>    if (version_niter)
>      {
>        /* The versioned loop could be infinite, we need to clear existing

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] tree-optimization/104912 - ensure cost model is checked first
  2022-03-31 12:57 ` Richard Sandiford
@ 2022-03-31 13:26   ` Richard Biener
  0 siblings, 0 replies; 3+ messages in thread
From: Richard Biener @ 2022-03-31 13:26 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, Jakub Jelinek, Jan Hubicka

On Thu, 31 Mar 2022, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > The following makes sure that when we build the versioning condition
> > for vectorization including the cost model check, we check for the
> > cost model and branch over other versioning checks.  That is what
> > the cost modeling assumes, since the cost model check is the only
> > one accounted for in the scalar outside cost.  Currently we emit
> > all checks as straight-line code combined with bitwise ops which
> > can result in surprising ordering of checks in the final assembly.
> 
> Yeah, this had bugged me too, and meant that we made some bad
> decisions in some of the local benchmarks we use.  Was just afraid
> to poke at it, since it seemed like a deliberate decision. :-)

I guess it was rather giving up because of the way our loop-versioning
infrastructure works ...

> > Since loop_version accepts only a single versioning condition
> > the splitting is done after the fact.
> >
> > The result is a 1.5% speedup of 416.gamess on x86_64 when compiling
> > with -Ofast and tuning for generic or skylake.  That's not enough
> > to recover from the slowdown when vectorizing but it now cuts off
> > the expensive alias versioning test.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > OK for trunk?
> >
> > For the rest of the regression my plan is to somehow factor in
> > the evolution of the number of iterations in the outer loop
> > (which is {1, +, 1}) to somehow bump the static profitability
> > estimate and together with the "cheap" cost model check never
> > execute the vectorized version (well, it is actually never executed,
> > but only because the alias check fails).
> >
> > Thanks,
> > Richard.
> >
> > 2022-03-21  Richard Biener  <rguenther@suse.de>
> >
> > 	PR tree-optimization/104912
> > 	* tree-vect-loop-manip.cc (vect_loop_versioning): Split
> > 	the cost model check to a separate BB to make sure it is
> > 	checked first and not combined with other version checks.
> > ---
> >  gcc/tree-vect-loop-manip.cc | 53 ++++++++++++++++++++++++++++++++++---
> >  1 file changed, 50 insertions(+), 3 deletions(-)
> >
> > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > index a7bbc916bbc..8ef333eb31b 100644
> > --- a/gcc/tree-vect-loop-manip.cc
> > +++ b/gcc/tree-vect-loop-manip.cc
> > @@ -3445,13 +3445,28 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
> >  	cond_expr = expr;
> >      }
> >  
> > +  tree cost_name = NULL_TREE;
> > +  if (cond_expr
> > +      && !integer_truep (cond_expr)
> > +      && (version_niter
> > +	  || version_align
> > +	  || version_alias
> > +	  || version_simd_if_cond))
> > +    cost_name = cond_expr = force_gimple_operand_1 (unshare_expr (cond_expr),
> > +						    &cond_expr_stmt_list,
> > +						    is_gimple_val, NULL_TREE);
> > +
> >    if (version_niter)
> >      vect_create_cond_for_niters_checks (loop_vinfo, &cond_expr);
> >  
> >    if (cond_expr)
> > -    cond_expr = force_gimple_operand_1 (unshare_expr (cond_expr),
> > -					&cond_expr_stmt_list,
> > -					is_gimple_condexpr, NULL_TREE);
> > +    {
> > +      gimple_seq tem = NULL;
> > +      cond_expr = force_gimple_operand_1 (unshare_expr (cond_expr),
> > +					  &tem,
> > +					  is_gimple_condexpr, NULL_TREE);
> > +      gimple_seq_add_seq (&cond_expr_stmt_list, tem);
> > +    }
> >  
> >    if (version_align)
> >      vect_create_cond_for_align_checks (loop_vinfo, &cond_expr,
> > @@ -3654,6 +3669,38 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
> >        update_ssa (TODO_update_ssa);
> >      }
> >  
> > +  /* Split the cost model check off to a separate BB.  Costing assumes
> > +     this is the only thing we perform when we enter the scalar loop.  */
> 
> Maybe “…from a failed cost decision” or something?  Might sounds out
> of context like it applied more generally.

Fixed.

> > +  if (cost_name)
> > +    {
> > +      gimple *def = SSA_NAME_DEF_STMT (cost_name);
> 
> I realise it should only happen very rarely if at all, but is it
> absolutely guaranteed that the cost condition doesn't fold to a
> constant?

OK, better be safe than sorry - I added a SSA_NAME check.

> > +      /* All uses of the cost check are 'true' after the check we
> > +	 are going to insert.  */
> > +      replace_uses_by (cost_name, boolean_true_node);
> > +      /* And we're going to build the new single use of it.  */
> > +      gcond *cond = gimple_build_cond (NE_EXPR, cost_name, boolean_false_node,
> > +				       NULL_TREE, NULL_TREE);
> > +      edge e = split_block (gimple_bb (def), def);
> > +      gimple_stmt_iterator gsi = gsi_for_stmt (def);
> > +      gsi_insert_after (&gsi, cond, GSI_NEW_STMT);
> > +      edge true_e, false_e;
> > +      extract_true_false_edges_from_block (e->dest, &true_e, &false_e);
> > +      e->flags &= ~EDGE_FALLTHRU;
> > +      e->flags |= EDGE_TRUE_VALUE;
> > +      edge e2 = make_edge (e->src, false_e->dest, EDGE_FALSE_VALUE);
> > +      e->probability = prob;
> > +      e2->probability = prob.invert ();
> 
> Is reusing prob the right thing to do?  Wouldn't the path to the vector
> loop end up with a probability of PROB^2?

We're statically using profile_probability::likely () for the
vector path.  But you are right, and we scale loop frequencies with
prop.  So when we replace if () { with-prob A } with
if () if () { with-prob A } then we probably want both true edges
to have a probability of sqrt (prob) and the false edges sqrt (~prob)?

It _looks_ like profile_probability::split (prob) would do the
trick.  Here we replace if (X) with if (X && Y) with overall
likely() probability.

So something like (incremental):

@@ -3446,15 +3446,21 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
     }
 
   tree cost_name = NULL_TREE;
+  profile_probability prob2 = profile_probability::uninitialized ();
   if (cond_expr
       && !integer_truep (cond_expr)
       && (version_niter
          || version_align
          || version_alias
          || version_simd_if_cond))
-    cost_name = cond_expr = force_gimple_operand_1 (unshare_expr 
(cond_expr),
-                                                   &cond_expr_stmt_list,
-                                                   is_gimple_val, 
NULL_TREE);
+    {
+      cost_name = cond_expr = force_gimple_operand_1 (unshare_expr 
(cond_expr),
+                                                     
&cond_expr_stmt_list,
+                                                     is_gimple_val, 
NULL_TREE);
+      /* Split prob () into two so that the overall probability of 
passing
+         both the cost-model and versioning checks is the orig prob.  */
+      prob2 = prob.split (prob);
+    }
 
   if (version_niter)
     vect_create_cond_for_niters_checks (loop_vinfo, &cond_expr);
@@ -3688,8 +3695,8 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
       e->flags &= ~EDGE_FALLTHRU;
       e->flags |= EDGE_TRUE_VALUE;
       edge e2 = make_edge (e->src, false_e->dest, EDGE_FALSE_VALUE);
-      e->probability = prob;
-      e2->probability = prob.invert ();
+      e->probability = prob2;
+      e2->probability = prob2.invert ();
       set_immediate_dominator (CDI_DOMINATORS, false_e->dest, e->src);
       auto_vec<basic_block, 3> adj;
       for (basic_block son = first_dom_son (CDI_DOMINATORS, e->dest);

might do the trick?  Honza?

> > +      set_immediate_dominator (CDI_DOMINATORS, false_e->dest, e->src);
> > +      auto_vec<basic_block, 3> adj;
> > +      for (basic_block son = first_dom_son (CDI_DOMINATORS, e->dest);
> > +	   son;
> > +	   son = next_dom_son (CDI_DOMINATORS, son))
> > +	if (EDGE_COUNT (son->preds) > 1)
> > +	  adj.safe_push (son);
> > +      for (auto son : adj)
> > +	set_immediate_dominator (CDI_DOMINATORS, son, e->src);
> 
> Seems unfortunate that so much bespoke code is needed to do an operation
> like this, but that's not a very constructive comment, sorry. :-)

Yeah :/

> Anyway, LGTM, but I don't think I'd really be able to spot problems.

Let's see what Honza thinks about the probability issue.

Thanks,
Richard.

> Thanks,
> Richard
> 
> > +    }
> > +
> >    if (version_niter)
> >      {
> >        /* The versioned loop could be infinite, we need to clear existing
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-03-31 13:26 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-21 15:10 [PATCH] tree-optimization/104912 - ensure cost model is checked first Richard Biener
2022-03-31 12:57 ` Richard Sandiford
2022-03-31 13:26   ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).