public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH][v2] Remove --param vect-inner-loop-cost-factor
@ 2021-08-23 14:33 Richard Biener
  2021-08-23 14:38 ` Jan Hubicka
  2021-08-24  2:22 ` Kewen.Lin
  0 siblings, 2 replies; 6+ messages in thread
From: Richard Biener @ 2021-08-23 14:33 UTC (permalink / raw)
  To: gcc-patches; +Cc: hubicka, richard.sandiford

This removes --param vect-inner-loop-cost-factor in favor of looking
at the estimated number of iterations of the inner loop
when available and otherwise just assumes a single inner
iteration which is conservative on the side of not vectorizing.

The alternative is to retain the --param for exactly that case,
not sure if the result is better or not.  The --param is new on
head, it was static '50' before.

Any strong opinions?

Richard.

2021-08-23  Richard Biener  <rguenther@suse.de>

	* doc/invoke.texi (vect-inner-loop-cost-factor): Remove
	documentation.
	* params.opt (--param vect-inner-loop-cost-factor): Remove.
	* tree-vect-loop.c (_loop_vec_info::_loop_vec_info):
	Initialize inner_loop_cost_factor to 1.
	(vect_analyze_loop_form): Initialize inner_loop_cost_factor
	from the estimated number of iterations of the inner loop.
---
 gcc/doc/invoke.texi  |  5 -----
 gcc/params.opt       |  4 ----
 gcc/tree-vect-loop.c | 12 +++++++++++-
 3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index c057cc1e4ae..054950132f6 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14385,11 +14385,6 @@ code to iterate.  2 allows partial vector loads and stores in all loops.
 The parameter only has an effect on targets that support partial
 vector loads and stores.
 
-@item vect-inner-loop-cost-factor
-The factor which the loop vectorizer applies to the cost of statements
-in an inner loop relative to the loop being vectorized.  The default
-value is 50.
-
 @item avoid-fma-max-bits
 Maximum number of bits for which we avoid creating FMAs.
 
diff --git a/gcc/params.opt b/gcc/params.opt
index f9264887b40..f7b19fa430d 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1113,8 +1113,4 @@ Bound on number of runtime checks inserted by the vectorizer's loop versioning f
 Common Joined UInteger Var(param_vect_partial_vector_usage) Init(2) IntegerRange(0, 2) Param Optimization
 Controls how loop vectorizer uses partial vectors.  0 means never, 1 means only for loops whose need to iterate can be removed, 2 means for all loops.  The default value is 2.
 
--param=vect-inner-loop-cost-factor=
-Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) IntegerRange(1, 999999) Param Optimization
-The factor which the loop vectorizer applies to the cost of statements in an inner loop relative to the loop being vectorized.
-
 ; This comment is to ensure we retain the blank line above.
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index c521b43a47c..cb48717f20e 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -841,7 +841,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
     single_scalar_iteration_cost (0),
     vec_outside_cost (0),
     vec_inside_cost (0),
-    inner_loop_cost_factor (param_vect_inner_loop_cost_factor),
+    inner_loop_cost_factor (1),
     vectorizable (false),
     can_use_partial_vectors_p (param_vect_partial_vector_usage != 0),
     using_partial_vectors_p (false),
@@ -1519,6 +1519,16 @@ vect_analyze_loop_form (class loop *loop, vec_info_shared *shared)
       stmt_vec_info inner_loop_cond_info
 	= loop_vinfo->lookup_stmt (inner_loop_cond);
       STMT_VINFO_TYPE (inner_loop_cond_info) = loop_exit_ctrl_vec_info_type;
+      /* If we have an estimate on the number of iterations of the inner
+	 loop use that as the scale for costing, otherwise conservatively
+	 assume a single inner iteration.  */
+      widest_int nit;
+      if (get_estimated_loop_iterations (loop->inner, &nit))
+	LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo)
+	  /* Since costing is done on unsigned int cap the scale on
+	     some large number consistent with what we'd see in
+	     CFG counts.  */
+	  = wi::smax (nit, REG_BR_PROB_BASE).to_uhwi ();
     }
 
   gcc_assert (!loop->aux);
-- 
2.31.1

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH][v2] Remove --param vect-inner-loop-cost-factor
  2021-08-23 14:33 [PATCH][v2] Remove --param vect-inner-loop-cost-factor Richard Biener
@ 2021-08-23 14:38 ` Jan Hubicka
  2021-08-24  2:22 ` Kewen.Lin
  1 sibling, 0 replies; 6+ messages in thread
From: Jan Hubicka @ 2021-08-23 14:38 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, richard.sandiford

> 
> Any strong opinions?
> 
> Richard.
> 
> 2021-08-23  Richard Biener  <rguenther@suse.de>
> 
> 	* doc/invoke.texi (vect-inner-loop-cost-factor): Remove
> 	documentation.
> 	* params.opt (--param vect-inner-loop-cost-factor): Remove.
> 	* tree-vect-loop.c (_loop_vec_info::_loop_vec_info):
> 	Initialize inner_loop_cost_factor to 1.
> 	(vect_analyze_loop_form): Initialize inner_loop_cost_factor
> 	from the estimated number of iterations of the inner loop.
> ---
>  gcc/doc/invoke.texi  |  5 -----
>  gcc/params.opt       |  4 ----
>  gcc/tree-vect-loop.c | 12 +++++++++++-
>  3 files changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index c057cc1e4ae..054950132f6 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -14385,11 +14385,6 @@ code to iterate.  2 allows partial vector loads and stores in all loops.
>  The parameter only has an effect on targets that support partial
>  vector loads and stores.
>  
> -@item vect-inner-loop-cost-factor
> -The factor which the loop vectorizer applies to the cost of statements
> -in an inner loop relative to the loop being vectorized.  The default
> -value is 50.
> -
>  @item avoid-fma-max-bits
>  Maximum number of bits for which we avoid creating FMAs.
>  
> diff --git a/gcc/params.opt b/gcc/params.opt
> index f9264887b40..f7b19fa430d 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -1113,8 +1113,4 @@ Bound on number of runtime checks inserted by the vectorizer's loop versioning f
>  Common Joined UInteger Var(param_vect_partial_vector_usage) Init(2) IntegerRange(0, 2) Param Optimization
>  Controls how loop vectorizer uses partial vectors.  0 means never, 1 means only for loops whose need to iterate can be removed, 2 means for all loops.  The default value is 2.
>  
> --param=vect-inner-loop-cost-factor=
> -Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) IntegerRange(1, 999999) Param Optimization
> -The factor which the loop vectorizer applies to the cost of statements in an inner loop relative to the loop being vectorized.
> -
>  ; This comment is to ensure we retain the blank line above.
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index c521b43a47c..cb48717f20e 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -841,7 +841,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
>      single_scalar_iteration_cost (0),
>      vec_outside_cost (0),
>      vec_inside_cost (0),
> -    inner_loop_cost_factor (param_vect_inner_loop_cost_factor),
> +    inner_loop_cost_factor (1),
>      vectorizable (false),
>      can_use_partial_vectors_p (param_vect_partial_vector_usage != 0),
>      using_partial_vectors_p (false),
> @@ -1519,6 +1519,16 @@ vect_analyze_loop_form (class loop *loop, vec_info_shared *shared)
>        stmt_vec_info inner_loop_cond_info
>  	= loop_vinfo->lookup_stmt (inner_loop_cond);
>        STMT_VINFO_TYPE (inner_loop_cond_info) = loop_exit_ctrl_vec_info_type;
> +      /* If we have an estimate on the number of iterations of the inner
> +	 loop use that as the scale for costing, otherwise conservatively
> +	 assume a single inner iteration.  */
> +      widest_int nit;
> +      if (get_estimated_loop_iterations (loop->inner, &nit))
> +	LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo)
> +	  /* Since costing is done on unsigned int cap the scale on
> +	     some large number consistent with what we'd see in
> +	     CFG counts.  */
> +	  = wi::smax (nit, REG_BR_PROB_BASE).to_uhwi ();
This looks sane to me, but for the case where profile info is missing,
you will get false from get_estimated_loop_iterations and in that case
it will think that inner loop iterates once which seems bit unrealistic
for vectorizable loop nests.

I assume you want to deterine here that reducing cost of stmt in
inner loop is more importnat than increasing cost of stmt outside..

So perhaps keeping parmeter and capping it with get_max_loop_iterations
would be sane?

Honza
>      }
>  
>    gcc_assert (!loop->aux);
> -- 
> 2.31.1

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH][v2] Remove --param vect-inner-loop-cost-factor
  2021-08-23 14:33 [PATCH][v2] Remove --param vect-inner-loop-cost-factor Richard Biener
  2021-08-23 14:38 ` Jan Hubicka
@ 2021-08-24  2:22 ` Kewen.Lin
  2021-08-24  6:58   ` Richard Biener
  1 sibling, 1 reply; 6+ messages in thread
From: Kewen.Lin @ 2021-08-24  2:22 UTC (permalink / raw)
  To: Richard Biener; +Cc: richard.sandiford, hubicka, gcc-patches

Hi Richi,

on 2021/8/23 下午10:33, Richard Biener via Gcc-patches wrote:
> This removes --param vect-inner-loop-cost-factor in favor of looking
> at the estimated number of iterations of the inner loop
> when available and otherwise just assumes a single inner
> iteration which is conservative on the side of not vectorizing.
> 

I may miss something, the factor seems to be an amplifier, a single
inner iteration on the side of not vectorizing only relies on that
vector_cost < scalar_cost, if scalar_cost < vector_cost, the direction
will be flipped? ({vector,scalar}_cost is only for inner loop part).

Since we don't calculate/compare costing for inner loop independently
and early return if scalar_cost < vector_cost for inner loop, I guess
it's possible to have "scalar_cost < vector_cost" case theoretically,
especially when targets can cost something more on vector side.

> The alternative is to retain the --param for exactly that case,
> not sure if the result is better or not.  The --param is new on
> head, it was static '50' before.
> 

I think the intention of --param is to offer ports a way to tweak
it (no ports do it for now though :)).  Not sure how target costing
is sensitive to this factor, but I also prefer to make its default
value as 50 as Honza suggested to avoid more possible tweakings.

If targets want more, maybe we can extend it to:

default_hook:
  return estimated or likely_max if either is valid;
  return default value;
  
target hook:
  val = default_hook; // or from scratch
  tweak the val as it wishes;  

I guess there is no this need for now.

> Any strong opinions?
> 
> Richard.
> 
> 2021-08-23  Richard Biener  <rguenther@suse.de>
> 
> 	* doc/invoke.texi (vect-inner-loop-cost-factor): Remove
> 	documentation.
> 	* params.opt (--param vect-inner-loop-cost-factor): Remove.
> 	* tree-vect-loop.c (_loop_vec_info::_loop_vec_info):
> 	Initialize inner_loop_cost_factor to 1.
> 	(vect_analyze_loop_form): Initialize inner_loop_cost_factor
> 	from the estimated number of iterations of the inner loop.
> ---
>  gcc/doc/invoke.texi  |  5 -----
>  gcc/params.opt       |  4 ----
>  gcc/tree-vect-loop.c | 12 +++++++++++-
>  3 files changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index c057cc1e4ae..054950132f6 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -14385,11 +14385,6 @@ code to iterate.  2 allows partial vector loads and stores in all loops.
>  The parameter only has an effect on targets that support partial
>  vector loads and stores.
>  
> -@item vect-inner-loop-cost-factor
> -The factor which the loop vectorizer applies to the cost of statements
> -in an inner loop relative to the loop being vectorized.  The default
> -value is 50.
> -
>  @item avoid-fma-max-bits
>  Maximum number of bits for which we avoid creating FMAs.
>  
> diff --git a/gcc/params.opt b/gcc/params.opt
> index f9264887b40..f7b19fa430d 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -1113,8 +1113,4 @@ Bound on number of runtime checks inserted by the vectorizer's loop versioning f
>  Common Joined UInteger Var(param_vect_partial_vector_usage) Init(2) IntegerRange(0, 2) Param Optimization
>  Controls how loop vectorizer uses partial vectors.  0 means never, 1 means only for loops whose need to iterate can be removed, 2 means for all loops.  The default value is 2.
>  
> --param=vect-inner-loop-cost-factor=
> -Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) IntegerRange(1, 999999) Param Optimization
> -The factor which the loop vectorizer applies to the cost of statements in an inner loop relative to the loop being vectorized.
> -
>  ; This comment is to ensure we retain the blank line above.
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index c521b43a47c..cb48717f20e 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -841,7 +841,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
>      single_scalar_iteration_cost (0),
>      vec_outside_cost (0),
>      vec_inside_cost (0),
> -    inner_loop_cost_factor (param_vect_inner_loop_cost_factor),
> +    inner_loop_cost_factor (1),
>      vectorizable (false),
>      can_use_partial_vectors_p (param_vect_partial_vector_usage != 0),
>      using_partial_vectors_p (false),
> @@ -1519,6 +1519,16 @@ vect_analyze_loop_form (class loop *loop, vec_info_shared *shared)
>        stmt_vec_info inner_loop_cond_info
>  	= loop_vinfo->lookup_stmt (inner_loop_cond);
>        STMT_VINFO_TYPE (inner_loop_cond_info) = loop_exit_ctrl_vec_info_type;
> +      /* If we have an estimate on the number of iterations of the inner
> +	 loop use that as the scale for costing, otherwise conservatively
> +	 assume a single inner iteration.  */
> +      widest_int nit;
> +      if (get_estimated_loop_iterations (loop->inner, &nit))
> +	LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo)
> +	  /* Since costing is done on unsigned int cap the scale on
> +	     some large number consistent with what we'd see in
> +	     CFG counts.  */
> +	  = wi::smax (nit, REG_BR_PROB_BASE).to_uhwi ();

I noticed loop-doloop.c use _int version and likely_max, maybe you want that here?
 
  est_niter = get_estimated_loop_iterations_int (loop);
  if (est_niter == -1)
    est_niter = get_likely_max_loop_iterations_int (loop)


BR,
Kewen

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH][v2] Remove --param vect-inner-loop-cost-factor
  2021-08-24  2:22 ` Kewen.Lin
@ 2021-08-24  6:58   ` Richard Biener
  2021-08-24  8:42     ` Richard Biener
  2021-08-24 10:10     ` Jan Hubicka
  0 siblings, 2 replies; 6+ messages in thread
From: Richard Biener @ 2021-08-24  6:58 UTC (permalink / raw)
  To: Kewen.Lin; +Cc: richard.sandiford, hubicka, gcc-patches

On Tue, 24 Aug 2021, Kewen.Lin wrote:

> Hi Richi,
> 
> on 2021/8/23 ??10:33, Richard Biener via Gcc-patches wrote:
> > This removes --param vect-inner-loop-cost-factor in favor of looking
> > at the estimated number of iterations of the inner loop
> > when available and otherwise just assumes a single inner
> > iteration which is conservative on the side of not vectorizing.
> > 
> 
> I may miss something, the factor seems to be an amplifier, a single
> inner iteration on the side of not vectorizing only relies on that
> vector_cost < scalar_cost, if scalar_cost < vector_cost, the direction
> will be flipped? ({vector,scalar}_cost is only for inner loop part).
> 
> Since we don't calculate/compare costing for inner loop independently
> and early return if scalar_cost < vector_cost for inner loop, I guess
> it's possible to have "scalar_cost < vector_cost" case theoretically,
> especially when targets can cost something more on vector side.

True.

> > The alternative is to retain the --param for exactly that case,
> > not sure if the result is better or not.  The --param is new on
> > head, it was static '50' before.
> > 
> 
> I think the intention of --param is to offer ports a way to tweak
> it (no ports do it for now though :)).  Not sure how target costing
> is sensitive to this factor, but I also prefer to make its default
> value as 50 as Honza suggested to avoid more possible tweakings.
> 
> If targets want more, maybe we can extend it to:
> 
> default_hook:
>   return estimated or likely_max if either is valid;
>   return default value;
>   
> target hook:
>   val = default_hook; // or from scratch
>   tweak the val as it wishes;  
> 
> I guess there is no this need for now.
>
> > Any strong opinions?
> > 
> > Richard.
> > 
> > 2021-08-23  Richard Biener  <rguenther@suse.de>
> > 
> > 	* doc/invoke.texi (vect-inner-loop-cost-factor): Remove
> > 	documentation.
> > 	* params.opt (--param vect-inner-loop-cost-factor): Remove.
> > 	* tree-vect-loop.c (_loop_vec_info::_loop_vec_info):
> > 	Initialize inner_loop_cost_factor to 1.
> > 	(vect_analyze_loop_form): Initialize inner_loop_cost_factor
> > 	from the estimated number of iterations of the inner loop.
> > ---
> >  gcc/doc/invoke.texi  |  5 -----
> >  gcc/params.opt       |  4 ----
> >  gcc/tree-vect-loop.c | 12 +++++++++++-
> >  3 files changed, 11 insertions(+), 10 deletions(-)
> > 
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index c057cc1e4ae..054950132f6 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -14385,11 +14385,6 @@ code to iterate.  2 allows partial vector loads and stores in all loops.
> >  The parameter only has an effect on targets that support partial
> >  vector loads and stores.
> >  
> > -@item vect-inner-loop-cost-factor
> > -The factor which the loop vectorizer applies to the cost of statements
> > -in an inner loop relative to the loop being vectorized.  The default
> > -value is 50.
> > -
> >  @item avoid-fma-max-bits
> >  Maximum number of bits for which we avoid creating FMAs.
> >  
> > diff --git a/gcc/params.opt b/gcc/params.opt
> > index f9264887b40..f7b19fa430d 100644
> > --- a/gcc/params.opt
> > +++ b/gcc/params.opt
> > @@ -1113,8 +1113,4 @@ Bound on number of runtime checks inserted by the vectorizer's loop versioning f
> >  Common Joined UInteger Var(param_vect_partial_vector_usage) Init(2) IntegerRange(0, 2) Param Optimization
> >  Controls how loop vectorizer uses partial vectors.  0 means never, 1 means only for loops whose need to iterate can be removed, 2 means for all loops.  The default value is 2.
> >  
> > --param=vect-inner-loop-cost-factor=
> > -Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) IntegerRange(1, 999999) Param Optimization
> > -The factor which the loop vectorizer applies to the cost of statements in an inner loop relative to the loop being vectorized.
> > -
> >  ; This comment is to ensure we retain the blank line above.
> > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> > index c521b43a47c..cb48717f20e 100644
> > --- a/gcc/tree-vect-loop.c
> > +++ b/gcc/tree-vect-loop.c
> > @@ -841,7 +841,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
> >      single_scalar_iteration_cost (0),
> >      vec_outside_cost (0),
> >      vec_inside_cost (0),
> > -    inner_loop_cost_factor (param_vect_inner_loop_cost_factor),
> > +    inner_loop_cost_factor (1),
> >      vectorizable (false),
> >      can_use_partial_vectors_p (param_vect_partial_vector_usage != 0),
> >      using_partial_vectors_p (false),
> > @@ -1519,6 +1519,16 @@ vect_analyze_loop_form (class loop *loop, vec_info_shared *shared)
> >        stmt_vec_info inner_loop_cond_info
> >  	= loop_vinfo->lookup_stmt (inner_loop_cond);
> >        STMT_VINFO_TYPE (inner_loop_cond_info) = loop_exit_ctrl_vec_info_type;
> > +      /* If we have an estimate on the number of iterations of the inner
> > +	 loop use that as the scale for costing, otherwise conservatively
> > +	 assume a single inner iteration.  */
> > +      widest_int nit;
> > +      if (get_estimated_loop_iterations (loop->inner, &nit))
> > +	LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo)
> > +	  /* Since costing is done on unsigned int cap the scale on
> > +	     some large number consistent with what we'd see in
> > +	     CFG counts.  */
> > +	  = wi::smax (nit, REG_BR_PROB_BASE).to_uhwi ();
> 
> I noticed loop-doloop.c use _int version and likely_max, maybe you want that here?
>  
>   est_niter = get_estimated_loop_iterations_int (loop);
>   if (est_niter == -1)
>     est_niter = get_likely_max_loop_iterations_int (loop)

I think that are two different things - get_estimated_loop_iterations_int
are the average number of iterations while 
get_likely_max_loop_iterations_int is an upper bound.  I'm not sure we
want to use an upper bound for costing.

Based on feedback from Honza I'm currently testing the variant below
which keeps the --param and uses it to cap the estimated number of
iterations.  That makes the scaling more precise for inner loops that
don't iterate much but keeps the --param to avoid overflow and to
keep the present behavior when there's no reliable profile info
available.

Richard.

From d140a56b6ef95555bbfe6d228a25b1d021c7d796 Mon Sep 17 00:00:00 2001
From: Richard Biener <rguenther@suse.de>
Date: Mon, 23 Aug 2021 14:15:14 +0200
Subject: [PATCH] Adjust inner loop cost scaling
To: gcc-patches@gcc.gnu.org

This makes use of the estimated number of iterations of the inner loop
to limit --param vect-inner-loop-cost-factor scaling.  It also reduces
the maximum value of vect-inner-loop-cost-factor to 10000 making it
less likely to cause overflow of costs.

2021-08-23  Richard Biener  <rguenther@suse.de>

	* doc/invoke.texi (vect-inner-loop-cost-factor): Adjust.
	* params.opt (--param vect-inner-loop-cost-factor): Adjust
	maximum value.
	* tree-vect-loop.c (vect_analyze_loop_form): Initialize
	inner_loop_cost_factor to the minimum of the estimated number
	of iterations of the inner loop and vect-inner-loop-cost-factor.
---
 gcc/doc/invoke.texi  | 7 ++++---
 gcc/params.opt       | 4 ++--
 gcc/tree-vect-loop.c | 7 +++++++
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index c057cc1e4ae..a9d56fecf4e 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14386,9 +14386,10 @@ The parameter only has an effect on targets that support partial
 vector loads and stores.
 
 @item vect-inner-loop-cost-factor
-The factor which the loop vectorizer applies to the cost of statements
-in an inner loop relative to the loop being vectorized.  The default
-value is 50.
+The maximum factor which the loop vectorizer applies to the cost of statements
+in an inner loop relative to the loop being vectorized.  The factor applied
+is the maximum of the estimated number of iterations of the inner loop and
+this parameter.  The default value of this parameter is 50.
 
 @item avoid-fma-max-bits
 Maximum number of bits for which we avoid creating FMAs.
diff --git a/gcc/params.opt b/gcc/params.opt
index f9264887b40..f414dc1a61c 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1114,7 +1114,7 @@ Common Joined UInteger Var(param_vect_partial_vector_usage) Init(2) IntegerRange
 Controls how loop vectorizer uses partial vectors.  0 means never, 1 means only for loops whose need to iterate can be removed, 2 means for all loops.  The default value is 2.
 
 -param=vect-inner-loop-cost-factor=
-Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) IntegerRange(1, 999999) Param Optimization
-The factor which the loop vectorizer applies to the cost of statements in an inner loop relative to the loop being vectorized.
+Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) IntegerRange(1, 10000) Param Optimization
+The maximum factor which the loop vectorizer applies to the cost of statements in an inner loop relative to the loop being vectorized.
 
 ; This comment is to ensure we retain the blank line above.
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index c521b43a47c..cbdd5b407da 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -1519,6 +1519,13 @@ vect_analyze_loop_form (class loop *loop, vec_info_shared *shared)
       stmt_vec_info inner_loop_cond_info
 	= loop_vinfo->lookup_stmt (inner_loop_cond);
       STMT_VINFO_TYPE (inner_loop_cond_info) = loop_exit_ctrl_vec_info_type;
+      /* If we have an estimate on the number of iterations of the inner
+	 loop use that to limit the scale for costing, otherwise use
+	 --param vect-inner-loop-cost-factor literally.  */
+      widest_int nit;
+      if (get_estimated_loop_iterations (loop->inner, &nit))
+	LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo)
+	  = wi::smin (nit, param_vect_inner_loop_cost_factor).to_uhwi ();
     }
 
   gcc_assert (!loop->aux);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH][v2] Remove --param vect-inner-loop-cost-factor
  2021-08-24  6:58   ` Richard Biener
@ 2021-08-24  8:42     ` Richard Biener
  2021-08-24 10:10     ` Jan Hubicka
  1 sibling, 0 replies; 6+ messages in thread
From: Richard Biener @ 2021-08-24  8:42 UTC (permalink / raw)
  To: Kewen.Lin; +Cc: richard.sandiford, hubicka, gcc-patches

On Tue, 24 Aug 2021, Richard Biener wrote:

> On Tue, 24 Aug 2021, Kewen.Lin wrote:
> 
> > Hi Richi,
> > 
> > on 2021/8/23 ??10:33, Richard Biener via Gcc-patches wrote:
> > > This removes --param vect-inner-loop-cost-factor in favor of looking
> > > at the estimated number of iterations of the inner loop
> > > when available and otherwise just assumes a single inner
> > > iteration which is conservative on the side of not vectorizing.
> > > 
> > 
> > I may miss something, the factor seems to be an amplifier, a single
> > inner iteration on the side of not vectorizing only relies on that
> > vector_cost < scalar_cost, if scalar_cost < vector_cost, the direction
> > will be flipped? ({vector,scalar}_cost is only for inner loop part).
> > 
> > Since we don't calculate/compare costing for inner loop independently
> > and early return if scalar_cost < vector_cost for inner loop, I guess
> > it's possible to have "scalar_cost < vector_cost" case theoretically,
> > especially when targets can cost something more on vector side.
> 
> True.
> 
> > > The alternative is to retain the --param for exactly that case,
> > > not sure if the result is better or not.  The --param is new on
> > > head, it was static '50' before.
> > > 
> > 
> > I think the intention of --param is to offer ports a way to tweak
> > it (no ports do it for now though :)).  Not sure how target costing
> > is sensitive to this factor, but I also prefer to make its default
> > value as 50 as Honza suggested to avoid more possible tweakings.
> > 
> > If targets want more, maybe we can extend it to:
> > 
> > default_hook:
> >   return estimated or likely_max if either is valid;
> >   return default value;
> >   
> > target hook:
> >   val = default_hook; // or from scratch
> >   tweak the val as it wishes;  
> > 
> > I guess there is no this need for now.
> >
> > > Any strong opinions?
> > > 
> > > Richard.
> > > 
> > > 2021-08-23  Richard Biener  <rguenther@suse.de>
> > > 
> > > 	* doc/invoke.texi (vect-inner-loop-cost-factor): Remove
> > > 	documentation.
> > > 	* params.opt (--param vect-inner-loop-cost-factor): Remove.
> > > 	* tree-vect-loop.c (_loop_vec_info::_loop_vec_info):
> > > 	Initialize inner_loop_cost_factor to 1.
> > > 	(vect_analyze_loop_form): Initialize inner_loop_cost_factor
> > > 	from the estimated number of iterations of the inner loop.
> > > ---
> > >  gcc/doc/invoke.texi  |  5 -----
> > >  gcc/params.opt       |  4 ----
> > >  gcc/tree-vect-loop.c | 12 +++++++++++-
> > >  3 files changed, 11 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > index c057cc1e4ae..054950132f6 100644
> > > --- a/gcc/doc/invoke.texi
> > > +++ b/gcc/doc/invoke.texi
> > > @@ -14385,11 +14385,6 @@ code to iterate.  2 allows partial vector loads and stores in all loops.
> > >  The parameter only has an effect on targets that support partial
> > >  vector loads and stores.
> > >  
> > > -@item vect-inner-loop-cost-factor
> > > -The factor which the loop vectorizer applies to the cost of statements
> > > -in an inner loop relative to the loop being vectorized.  The default
> > > -value is 50.
> > > -
> > >  @item avoid-fma-max-bits
> > >  Maximum number of bits for which we avoid creating FMAs.
> > >  
> > > diff --git a/gcc/params.opt b/gcc/params.opt
> > > index f9264887b40..f7b19fa430d 100644
> > > --- a/gcc/params.opt
> > > +++ b/gcc/params.opt
> > > @@ -1113,8 +1113,4 @@ Bound on number of runtime checks inserted by the vectorizer's loop versioning f
> > >  Common Joined UInteger Var(param_vect_partial_vector_usage) Init(2) IntegerRange(0, 2) Param Optimization
> > >  Controls how loop vectorizer uses partial vectors.  0 means never, 1 means only for loops whose need to iterate can be removed, 2 means for all loops.  The default value is 2.
> > >  
> > > --param=vect-inner-loop-cost-factor=
> > > -Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) IntegerRange(1, 999999) Param Optimization
> > > -The factor which the loop vectorizer applies to the cost of statements in an inner loop relative to the loop being vectorized.
> > > -
> > >  ; This comment is to ensure we retain the blank line above.
> > > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> > > index c521b43a47c..cb48717f20e 100644
> > > --- a/gcc/tree-vect-loop.c
> > > +++ b/gcc/tree-vect-loop.c
> > > @@ -841,7 +841,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
> > >      single_scalar_iteration_cost (0),
> > >      vec_outside_cost (0),
> > >      vec_inside_cost (0),
> > > -    inner_loop_cost_factor (param_vect_inner_loop_cost_factor),
> > > +    inner_loop_cost_factor (1),
> > >      vectorizable (false),
> > >      can_use_partial_vectors_p (param_vect_partial_vector_usage != 0),
> > >      using_partial_vectors_p (false),
> > > @@ -1519,6 +1519,16 @@ vect_analyze_loop_form (class loop *loop, vec_info_shared *shared)
> > >        stmt_vec_info inner_loop_cond_info
> > >  	= loop_vinfo->lookup_stmt (inner_loop_cond);
> > >        STMT_VINFO_TYPE (inner_loop_cond_info) = loop_exit_ctrl_vec_info_type;
> > > +      /* If we have an estimate on the number of iterations of the inner
> > > +	 loop use that as the scale for costing, otherwise conservatively
> > > +	 assume a single inner iteration.  */
> > > +      widest_int nit;
> > > +      if (get_estimated_loop_iterations (loop->inner, &nit))
> > > +	LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo)
> > > +	  /* Since costing is done on unsigned int cap the scale on
> > > +	     some large number consistent with what we'd see in
> > > +	     CFG counts.  */
> > > +	  = wi::smax (nit, REG_BR_PROB_BASE).to_uhwi ();
> > 
> > I noticed loop-doloop.c use _int version and likely_max, maybe you want that here?
> >  
> >   est_niter = get_estimated_loop_iterations_int (loop);
> >   if (est_niter == -1)
> >     est_niter = get_likely_max_loop_iterations_int (loop)
> 
> I think that are two different things - get_estimated_loop_iterations_int
> are the average number of iterations while 
> get_likely_max_loop_iterations_int is an upper bound.  I'm not sure we
> want to use an upper bound for costing.
> 
> Based on feedback from Honza I'm currently testing the variant below
> which keeps the --param and uses it to cap the estimated number of
> iterations.  That makes the scaling more precise for inner loops that
> don't iterate much but keeps the --param to avoid overflow and to
> keep the present behavior when there's no reliable profile info
> available.

And Kewen noticed the off-by-one by using get_estimated_loop_iterations
so I pushed the correct variant using estimated_stmt_executions.

Bootstrapped / tested on x86_64-unknown-linux-gnu.

Richard.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH][v2] Remove --param vect-inner-loop-cost-factor
  2021-08-24  6:58   ` Richard Biener
  2021-08-24  8:42     ` Richard Biener
@ 2021-08-24 10:10     ` Jan Hubicka
  1 sibling, 0 replies; 6+ messages in thread
From: Jan Hubicka @ 2021-08-24 10:10 UTC (permalink / raw)
  To: Richard Biener; +Cc: Kewen.Lin, richard.sandiford, gcc-patches

> > 
> > I noticed loop-doloop.c use _int version and likely_max, maybe you want that here?
> >  
> >   est_niter = get_estimated_loop_iterations_int (loop);
> >   if (est_niter == -1)
> >     est_niter = get_likely_max_loop_iterations_int (loop)
> 
> I think that are two different things - get_estimated_loop_iterations_int
> are the average number of iterations while 
> get_likely_max_loop_iterations_int is an upper bound.  I'm not sure we
> want to use an upper bound for costing.
> 
> Based on feedback from Honza I'm currently testing the variant below
> which keeps the --param and uses it to cap the estimated number of
> iterations.  That makes the scaling more precise for inner loops that
> don't iterate much but keeps the --param to avoid overflow and to
> keep the present behavior when there's no reliable profile info
> available.

indeed, get_likely_max_loop_iterations_int may be very large.  In some
cases it however will give useful value - for example when loop travels
small array.

So what one can use it for is to cap the --param value.
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index c521b43a47c..cbdd5b407da 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -1519,6 +1519,13 @@ vect_analyze_loop_form (class loop *loop, vec_info_shared *shared)
>        stmt_vec_info inner_loop_cond_info
>  	= loop_vinfo->lookup_stmt (inner_loop_cond);
>        STMT_VINFO_TYPE (inner_loop_cond_info) = loop_exit_ctrl_vec_info_type;
> +      /* If we have an estimate on the number of iterations of the inner
> +	 loop use that to limit the scale for costing, otherwise use
> +	 --param vect-inner-loop-cost-factor literally.  */
> +      widest_int nit;
> +      if (get_estimated_loop_iterations (loop->inner, &nit))
> +	LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo)
> +	  = wi::smin (nit, param_vect_inner_loop_cost_factor).to_uhwi ();

      if (get_estimated_loop_iterations (loop->inner, &nit))
	LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo)
	  = wi::smin (nit, REG_BR_PROB_BASE /*or other random big cap  */).to_uhwi ();
      else if (get_likely_max_loop_iterations (loop->inner, &nit))
	LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo)
	  = wi::smin (nit, param_vect_inner_loop_cost_factor).to_uhwi ();
      else
	LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo)
	  = param_vect_inner_loop_cost_factor;

I.e. if we really know the number of iterations, we probably want to
weight by it but we want to cap to avoid overflows.  I assume if we kno
that tripcount is 10000 or more we basically do not care about damage
done to outer loop as long as iner loop improves?

If we know max number of iterations and it is smaller then the param,
we want to use it as cap.

The situation where get_estimated_loop_iteraitons returns wrong value
should be rare - basically when the loop was duplicated by inliner
(or other transform) and it behaves a lot differently then the average
execution of the loop in the train run.  In this case we could also
argue that the loop is not statistically important :)

Honza
>      }
>  
>    gcc_assert (!loop->aux);
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-08-24 10:10 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-23 14:33 [PATCH][v2] Remove --param vect-inner-loop-cost-factor Richard Biener
2021-08-23 14:38 ` Jan Hubicka
2021-08-24  2:22 ` Kewen.Lin
2021-08-24  6:58   ` Richard Biener
2021-08-24  8:42     ` Richard Biener
2021-08-24 10:10     ` Jan Hubicka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).