* [PATCH][v2] Remove --param vect-inner-loop-cost-factor @ 2021-08-23 14:33 Richard Biener 2021-08-23 14:38 ` Jan Hubicka 2021-08-24 2:22 ` Kewen.Lin 0 siblings, 2 replies; 6+ messages in thread From: Richard Biener @ 2021-08-23 14:33 UTC (permalink / raw) To: gcc-patches; +Cc: hubicka, richard.sandiford This removes --param vect-inner-loop-cost-factor in favor of looking at the estimated number of iterations of the inner loop when available and otherwise just assumes a single inner iteration which is conservative on the side of not vectorizing. The alternative is to retain the --param for exactly that case, not sure if the result is better or not. The --param is new on head, it was static '50' before. Any strong opinions? Richard. 2021-08-23 Richard Biener <rguenther@suse.de> * doc/invoke.texi (vect-inner-loop-cost-factor): Remove documentation. * params.opt (--param vect-inner-loop-cost-factor): Remove. * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize inner_loop_cost_factor to 1. (vect_analyze_loop_form): Initialize inner_loop_cost_factor from the estimated number of iterations of the inner loop. --- gcc/doc/invoke.texi | 5 ----- gcc/params.opt | 4 ---- gcc/tree-vect-loop.c | 12 +++++++++++- 3 files changed, 11 insertions(+), 10 deletions(-) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index c057cc1e4ae..054950132f6 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -14385,11 +14385,6 @@ code to iterate. 2 allows partial vector loads and stores in all loops. The parameter only has an effect on targets that support partial vector loads and stores. -@item vect-inner-loop-cost-factor -The factor which the loop vectorizer applies to the cost of statements -in an inner loop relative to the loop being vectorized. The default -value is 50. - @item avoid-fma-max-bits Maximum number of bits for which we avoid creating FMAs. diff --git a/gcc/params.opt b/gcc/params.opt index f9264887b40..f7b19fa430d 100644 --- a/gcc/params.opt +++ b/gcc/params.opt @@ -1113,8 +1113,4 @@ Bound on number of runtime checks inserted by the vectorizer's loop versioning f Common Joined UInteger Var(param_vect_partial_vector_usage) Init(2) IntegerRange(0, 2) Param Optimization Controls how loop vectorizer uses partial vectors. 0 means never, 1 means only for loops whose need to iterate can be removed, 2 means for all loops. The default value is 2. --param=vect-inner-loop-cost-factor= -Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) IntegerRange(1, 999999) Param Optimization -The factor which the loop vectorizer applies to the cost of statements in an inner loop relative to the loop being vectorized. - ; This comment is to ensure we retain the blank line above. diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index c521b43a47c..cb48717f20e 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -841,7 +841,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared) single_scalar_iteration_cost (0), vec_outside_cost (0), vec_inside_cost (0), - inner_loop_cost_factor (param_vect_inner_loop_cost_factor), + inner_loop_cost_factor (1), vectorizable (false), can_use_partial_vectors_p (param_vect_partial_vector_usage != 0), using_partial_vectors_p (false), @@ -1519,6 +1519,16 @@ vect_analyze_loop_form (class loop *loop, vec_info_shared *shared) stmt_vec_info inner_loop_cond_info = loop_vinfo->lookup_stmt (inner_loop_cond); STMT_VINFO_TYPE (inner_loop_cond_info) = loop_exit_ctrl_vec_info_type; + /* If we have an estimate on the number of iterations of the inner + loop use that as the scale for costing, otherwise conservatively + assume a single inner iteration. */ + widest_int nit; + if (get_estimated_loop_iterations (loop->inner, &nit)) + LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo) + /* Since costing is done on unsigned int cap the scale on + some large number consistent with what we'd see in + CFG counts. */ + = wi::smax (nit, REG_BR_PROB_BASE).to_uhwi (); } gcc_assert (!loop->aux); -- 2.31.1 ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH][v2] Remove --param vect-inner-loop-cost-factor 2021-08-23 14:33 [PATCH][v2] Remove --param vect-inner-loop-cost-factor Richard Biener @ 2021-08-23 14:38 ` Jan Hubicka 2021-08-24 2:22 ` Kewen.Lin 1 sibling, 0 replies; 6+ messages in thread From: Jan Hubicka @ 2021-08-23 14:38 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches, richard.sandiford > > Any strong opinions? > > Richard. > > 2021-08-23 Richard Biener <rguenther@suse.de> > > * doc/invoke.texi (vect-inner-loop-cost-factor): Remove > documentation. > * params.opt (--param vect-inner-loop-cost-factor): Remove. > * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): > Initialize inner_loop_cost_factor to 1. > (vect_analyze_loop_form): Initialize inner_loop_cost_factor > from the estimated number of iterations of the inner loop. > --- > gcc/doc/invoke.texi | 5 ----- > gcc/params.opt | 4 ---- > gcc/tree-vect-loop.c | 12 +++++++++++- > 3 files changed, 11 insertions(+), 10 deletions(-) > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index c057cc1e4ae..054950132f6 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -14385,11 +14385,6 @@ code to iterate. 2 allows partial vector loads and stores in all loops. > The parameter only has an effect on targets that support partial > vector loads and stores. > > -@item vect-inner-loop-cost-factor > -The factor which the loop vectorizer applies to the cost of statements > -in an inner loop relative to the loop being vectorized. The default > -value is 50. > - > @item avoid-fma-max-bits > Maximum number of bits for which we avoid creating FMAs. > > diff --git a/gcc/params.opt b/gcc/params.opt > index f9264887b40..f7b19fa430d 100644 > --- a/gcc/params.opt > +++ b/gcc/params.opt > @@ -1113,8 +1113,4 @@ Bound on number of runtime checks inserted by the vectorizer's loop versioning f > Common Joined UInteger Var(param_vect_partial_vector_usage) Init(2) IntegerRange(0, 2) Param Optimization > Controls how loop vectorizer uses partial vectors. 0 means never, 1 means only for loops whose need to iterate can be removed, 2 means for all loops. The default value is 2. > > --param=vect-inner-loop-cost-factor= > -Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) IntegerRange(1, 999999) Param Optimization > -The factor which the loop vectorizer applies to the cost of statements in an inner loop relative to the loop being vectorized. > - > ; This comment is to ensure we retain the blank line above. > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > index c521b43a47c..cb48717f20e 100644 > --- a/gcc/tree-vect-loop.c > +++ b/gcc/tree-vect-loop.c > @@ -841,7 +841,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared) > single_scalar_iteration_cost (0), > vec_outside_cost (0), > vec_inside_cost (0), > - inner_loop_cost_factor (param_vect_inner_loop_cost_factor), > + inner_loop_cost_factor (1), > vectorizable (false), > can_use_partial_vectors_p (param_vect_partial_vector_usage != 0), > using_partial_vectors_p (false), > @@ -1519,6 +1519,16 @@ vect_analyze_loop_form (class loop *loop, vec_info_shared *shared) > stmt_vec_info inner_loop_cond_info > = loop_vinfo->lookup_stmt (inner_loop_cond); > STMT_VINFO_TYPE (inner_loop_cond_info) = loop_exit_ctrl_vec_info_type; > + /* If we have an estimate on the number of iterations of the inner > + loop use that as the scale for costing, otherwise conservatively > + assume a single inner iteration. */ > + widest_int nit; > + if (get_estimated_loop_iterations (loop->inner, &nit)) > + LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo) > + /* Since costing is done on unsigned int cap the scale on > + some large number consistent with what we'd see in > + CFG counts. */ > + = wi::smax (nit, REG_BR_PROB_BASE).to_uhwi (); This looks sane to me, but for the case where profile info is missing, you will get false from get_estimated_loop_iterations and in that case it will think that inner loop iterates once which seems bit unrealistic for vectorizable loop nests. I assume you want to deterine here that reducing cost of stmt in inner loop is more importnat than increasing cost of stmt outside.. So perhaps keeping parmeter and capping it with get_max_loop_iterations would be sane? Honza > } > > gcc_assert (!loop->aux); > -- > 2.31.1 ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH][v2] Remove --param vect-inner-loop-cost-factor 2021-08-23 14:33 [PATCH][v2] Remove --param vect-inner-loop-cost-factor Richard Biener 2021-08-23 14:38 ` Jan Hubicka @ 2021-08-24 2:22 ` Kewen.Lin 2021-08-24 6:58 ` Richard Biener 1 sibling, 1 reply; 6+ messages in thread From: Kewen.Lin @ 2021-08-24 2:22 UTC (permalink / raw) To: Richard Biener; +Cc: richard.sandiford, hubicka, gcc-patches Hi Richi, on 2021/8/23 下午10:33, Richard Biener via Gcc-patches wrote: > This removes --param vect-inner-loop-cost-factor in favor of looking > at the estimated number of iterations of the inner loop > when available and otherwise just assumes a single inner > iteration which is conservative on the side of not vectorizing. > I may miss something, the factor seems to be an amplifier, a single inner iteration on the side of not vectorizing only relies on that vector_cost < scalar_cost, if scalar_cost < vector_cost, the direction will be flipped? ({vector,scalar}_cost is only for inner loop part). Since we don't calculate/compare costing for inner loop independently and early return if scalar_cost < vector_cost for inner loop, I guess it's possible to have "scalar_cost < vector_cost" case theoretically, especially when targets can cost something more on vector side. > The alternative is to retain the --param for exactly that case, > not sure if the result is better or not. The --param is new on > head, it was static '50' before. > I think the intention of --param is to offer ports a way to tweak it (no ports do it for now though :)). Not sure how target costing is sensitive to this factor, but I also prefer to make its default value as 50 as Honza suggested to avoid more possible tweakings. If targets want more, maybe we can extend it to: default_hook: return estimated or likely_max if either is valid; return default value; target hook: val = default_hook; // or from scratch tweak the val as it wishes; I guess there is no this need for now. > Any strong opinions? > > Richard. > > 2021-08-23 Richard Biener <rguenther@suse.de> > > * doc/invoke.texi (vect-inner-loop-cost-factor): Remove > documentation. > * params.opt (--param vect-inner-loop-cost-factor): Remove. > * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): > Initialize inner_loop_cost_factor to 1. > (vect_analyze_loop_form): Initialize inner_loop_cost_factor > from the estimated number of iterations of the inner loop. > --- > gcc/doc/invoke.texi | 5 ----- > gcc/params.opt | 4 ---- > gcc/tree-vect-loop.c | 12 +++++++++++- > 3 files changed, 11 insertions(+), 10 deletions(-) > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index c057cc1e4ae..054950132f6 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -14385,11 +14385,6 @@ code to iterate. 2 allows partial vector loads and stores in all loops. > The parameter only has an effect on targets that support partial > vector loads and stores. > > -@item vect-inner-loop-cost-factor > -The factor which the loop vectorizer applies to the cost of statements > -in an inner loop relative to the loop being vectorized. The default > -value is 50. > - > @item avoid-fma-max-bits > Maximum number of bits for which we avoid creating FMAs. > > diff --git a/gcc/params.opt b/gcc/params.opt > index f9264887b40..f7b19fa430d 100644 > --- a/gcc/params.opt > +++ b/gcc/params.opt > @@ -1113,8 +1113,4 @@ Bound on number of runtime checks inserted by the vectorizer's loop versioning f > Common Joined UInteger Var(param_vect_partial_vector_usage) Init(2) IntegerRange(0, 2) Param Optimization > Controls how loop vectorizer uses partial vectors. 0 means never, 1 means only for loops whose need to iterate can be removed, 2 means for all loops. The default value is 2. > > --param=vect-inner-loop-cost-factor= > -Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) IntegerRange(1, 999999) Param Optimization > -The factor which the loop vectorizer applies to the cost of statements in an inner loop relative to the loop being vectorized. > - > ; This comment is to ensure we retain the blank line above. > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > index c521b43a47c..cb48717f20e 100644 > --- a/gcc/tree-vect-loop.c > +++ b/gcc/tree-vect-loop.c > @@ -841,7 +841,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared) > single_scalar_iteration_cost (0), > vec_outside_cost (0), > vec_inside_cost (0), > - inner_loop_cost_factor (param_vect_inner_loop_cost_factor), > + inner_loop_cost_factor (1), > vectorizable (false), > can_use_partial_vectors_p (param_vect_partial_vector_usage != 0), > using_partial_vectors_p (false), > @@ -1519,6 +1519,16 @@ vect_analyze_loop_form (class loop *loop, vec_info_shared *shared) > stmt_vec_info inner_loop_cond_info > = loop_vinfo->lookup_stmt (inner_loop_cond); > STMT_VINFO_TYPE (inner_loop_cond_info) = loop_exit_ctrl_vec_info_type; > + /* If we have an estimate on the number of iterations of the inner > + loop use that as the scale for costing, otherwise conservatively > + assume a single inner iteration. */ > + widest_int nit; > + if (get_estimated_loop_iterations (loop->inner, &nit)) > + LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo) > + /* Since costing is done on unsigned int cap the scale on > + some large number consistent with what we'd see in > + CFG counts. */ > + = wi::smax (nit, REG_BR_PROB_BASE).to_uhwi (); I noticed loop-doloop.c use _int version and likely_max, maybe you want that here? est_niter = get_estimated_loop_iterations_int (loop); if (est_niter == -1) est_niter = get_likely_max_loop_iterations_int (loop) BR, Kewen ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH][v2] Remove --param vect-inner-loop-cost-factor 2021-08-24 2:22 ` Kewen.Lin @ 2021-08-24 6:58 ` Richard Biener 2021-08-24 8:42 ` Richard Biener 2021-08-24 10:10 ` Jan Hubicka 0 siblings, 2 replies; 6+ messages in thread From: Richard Biener @ 2021-08-24 6:58 UTC (permalink / raw) To: Kewen.Lin; +Cc: richard.sandiford, hubicka, gcc-patches On Tue, 24 Aug 2021, Kewen.Lin wrote: > Hi Richi, > > on 2021/8/23 ??10:33, Richard Biener via Gcc-patches wrote: > > This removes --param vect-inner-loop-cost-factor in favor of looking > > at the estimated number of iterations of the inner loop > > when available and otherwise just assumes a single inner > > iteration which is conservative on the side of not vectorizing. > > > > I may miss something, the factor seems to be an amplifier, a single > inner iteration on the side of not vectorizing only relies on that > vector_cost < scalar_cost, if scalar_cost < vector_cost, the direction > will be flipped? ({vector,scalar}_cost is only for inner loop part). > > Since we don't calculate/compare costing for inner loop independently > and early return if scalar_cost < vector_cost for inner loop, I guess > it's possible to have "scalar_cost < vector_cost" case theoretically, > especially when targets can cost something more on vector side. True. > > The alternative is to retain the --param for exactly that case, > > not sure if the result is better or not. The --param is new on > > head, it was static '50' before. > > > > I think the intention of --param is to offer ports a way to tweak > it (no ports do it for now though :)). Not sure how target costing > is sensitive to this factor, but I also prefer to make its default > value as 50 as Honza suggested to avoid more possible tweakings. > > If targets want more, maybe we can extend it to: > > default_hook: > return estimated or likely_max if either is valid; > return default value; > > target hook: > val = default_hook; // or from scratch > tweak the val as it wishes; > > I guess there is no this need for now. > > > Any strong opinions? > > > > Richard. > > > > 2021-08-23 Richard Biener <rguenther@suse.de> > > > > * doc/invoke.texi (vect-inner-loop-cost-factor): Remove > > documentation. > > * params.opt (--param vect-inner-loop-cost-factor): Remove. > > * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): > > Initialize inner_loop_cost_factor to 1. > > (vect_analyze_loop_form): Initialize inner_loop_cost_factor > > from the estimated number of iterations of the inner loop. > > --- > > gcc/doc/invoke.texi | 5 ----- > > gcc/params.opt | 4 ---- > > gcc/tree-vect-loop.c | 12 +++++++++++- > > 3 files changed, 11 insertions(+), 10 deletions(-) > > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > > index c057cc1e4ae..054950132f6 100644 > > --- a/gcc/doc/invoke.texi > > +++ b/gcc/doc/invoke.texi > > @@ -14385,11 +14385,6 @@ code to iterate. 2 allows partial vector loads and stores in all loops. > > The parameter only has an effect on targets that support partial > > vector loads and stores. > > > > -@item vect-inner-loop-cost-factor > > -The factor which the loop vectorizer applies to the cost of statements > > -in an inner loop relative to the loop being vectorized. The default > > -value is 50. > > - > > @item avoid-fma-max-bits > > Maximum number of bits for which we avoid creating FMAs. > > > > diff --git a/gcc/params.opt b/gcc/params.opt > > index f9264887b40..f7b19fa430d 100644 > > --- a/gcc/params.opt > > +++ b/gcc/params.opt > > @@ -1113,8 +1113,4 @@ Bound on number of runtime checks inserted by the vectorizer's loop versioning f > > Common Joined UInteger Var(param_vect_partial_vector_usage) Init(2) IntegerRange(0, 2) Param Optimization > > Controls how loop vectorizer uses partial vectors. 0 means never, 1 means only for loops whose need to iterate can be removed, 2 means for all loops. The default value is 2. > > > > --param=vect-inner-loop-cost-factor= > > -Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) IntegerRange(1, 999999) Param Optimization > > -The factor which the loop vectorizer applies to the cost of statements in an inner loop relative to the loop being vectorized. > > - > > ; This comment is to ensure we retain the blank line above. > > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > > index c521b43a47c..cb48717f20e 100644 > > --- a/gcc/tree-vect-loop.c > > +++ b/gcc/tree-vect-loop.c > > @@ -841,7 +841,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared) > > single_scalar_iteration_cost (0), > > vec_outside_cost (0), > > vec_inside_cost (0), > > - inner_loop_cost_factor (param_vect_inner_loop_cost_factor), > > + inner_loop_cost_factor (1), > > vectorizable (false), > > can_use_partial_vectors_p (param_vect_partial_vector_usage != 0), > > using_partial_vectors_p (false), > > @@ -1519,6 +1519,16 @@ vect_analyze_loop_form (class loop *loop, vec_info_shared *shared) > > stmt_vec_info inner_loop_cond_info > > = loop_vinfo->lookup_stmt (inner_loop_cond); > > STMT_VINFO_TYPE (inner_loop_cond_info) = loop_exit_ctrl_vec_info_type; > > + /* If we have an estimate on the number of iterations of the inner > > + loop use that as the scale for costing, otherwise conservatively > > + assume a single inner iteration. */ > > + widest_int nit; > > + if (get_estimated_loop_iterations (loop->inner, &nit)) > > + LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo) > > + /* Since costing is done on unsigned int cap the scale on > > + some large number consistent with what we'd see in > > + CFG counts. */ > > + = wi::smax (nit, REG_BR_PROB_BASE).to_uhwi (); > > I noticed loop-doloop.c use _int version and likely_max, maybe you want that here? > > est_niter = get_estimated_loop_iterations_int (loop); > if (est_niter == -1) > est_niter = get_likely_max_loop_iterations_int (loop) I think that are two different things - get_estimated_loop_iterations_int are the average number of iterations while get_likely_max_loop_iterations_int is an upper bound. I'm not sure we want to use an upper bound for costing. Based on feedback from Honza I'm currently testing the variant below which keeps the --param and uses it to cap the estimated number of iterations. That makes the scaling more precise for inner loops that don't iterate much but keeps the --param to avoid overflow and to keep the present behavior when there's no reliable profile info available. Richard. From d140a56b6ef95555bbfe6d228a25b1d021c7d796 Mon Sep 17 00:00:00 2001 From: Richard Biener <rguenther@suse.de> Date: Mon, 23 Aug 2021 14:15:14 +0200 Subject: [PATCH] Adjust inner loop cost scaling To: gcc-patches@gcc.gnu.org This makes use of the estimated number of iterations of the inner loop to limit --param vect-inner-loop-cost-factor scaling. It also reduces the maximum value of vect-inner-loop-cost-factor to 10000 making it less likely to cause overflow of costs. 2021-08-23 Richard Biener <rguenther@suse.de> * doc/invoke.texi (vect-inner-loop-cost-factor): Adjust. * params.opt (--param vect-inner-loop-cost-factor): Adjust maximum value. * tree-vect-loop.c (vect_analyze_loop_form): Initialize inner_loop_cost_factor to the minimum of the estimated number of iterations of the inner loop and vect-inner-loop-cost-factor. --- gcc/doc/invoke.texi | 7 ++++--- gcc/params.opt | 4 ++-- gcc/tree-vect-loop.c | 7 +++++++ 3 files changed, 13 insertions(+), 5 deletions(-) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index c057cc1e4ae..a9d56fecf4e 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -14386,9 +14386,10 @@ The parameter only has an effect on targets that support partial vector loads and stores. @item vect-inner-loop-cost-factor -The factor which the loop vectorizer applies to the cost of statements -in an inner loop relative to the loop being vectorized. The default -value is 50. +The maximum factor which the loop vectorizer applies to the cost of statements +in an inner loop relative to the loop being vectorized. The factor applied +is the maximum of the estimated number of iterations of the inner loop and +this parameter. The default value of this parameter is 50. @item avoid-fma-max-bits Maximum number of bits for which we avoid creating FMAs. diff --git a/gcc/params.opt b/gcc/params.opt index f9264887b40..f414dc1a61c 100644 --- a/gcc/params.opt +++ b/gcc/params.opt @@ -1114,7 +1114,7 @@ Common Joined UInteger Var(param_vect_partial_vector_usage) Init(2) IntegerRange Controls how loop vectorizer uses partial vectors. 0 means never, 1 means only for loops whose need to iterate can be removed, 2 means for all loops. The default value is 2. -param=vect-inner-loop-cost-factor= -Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) IntegerRange(1, 999999) Param Optimization -The factor which the loop vectorizer applies to the cost of statements in an inner loop relative to the loop being vectorized. +Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) IntegerRange(1, 10000) Param Optimization +The maximum factor which the loop vectorizer applies to the cost of statements in an inner loop relative to the loop being vectorized. ; This comment is to ensure we retain the blank line above. diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index c521b43a47c..cbdd5b407da 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -1519,6 +1519,13 @@ vect_analyze_loop_form (class loop *loop, vec_info_shared *shared) stmt_vec_info inner_loop_cond_info = loop_vinfo->lookup_stmt (inner_loop_cond); STMT_VINFO_TYPE (inner_loop_cond_info) = loop_exit_ctrl_vec_info_type; + /* If we have an estimate on the number of iterations of the inner + loop use that to limit the scale for costing, otherwise use + --param vect-inner-loop-cost-factor literally. */ + widest_int nit; + if (get_estimated_loop_iterations (loop->inner, &nit)) + LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo) + = wi::smin (nit, param_vect_inner_loop_cost_factor).to_uhwi (); } gcc_assert (!loop->aux); -- 2.31.1 ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH][v2] Remove --param vect-inner-loop-cost-factor 2021-08-24 6:58 ` Richard Biener @ 2021-08-24 8:42 ` Richard Biener 2021-08-24 10:10 ` Jan Hubicka 1 sibling, 0 replies; 6+ messages in thread From: Richard Biener @ 2021-08-24 8:42 UTC (permalink / raw) To: Kewen.Lin; +Cc: richard.sandiford, hubicka, gcc-patches On Tue, 24 Aug 2021, Richard Biener wrote: > On Tue, 24 Aug 2021, Kewen.Lin wrote: > > > Hi Richi, > > > > on 2021/8/23 ??10:33, Richard Biener via Gcc-patches wrote: > > > This removes --param vect-inner-loop-cost-factor in favor of looking > > > at the estimated number of iterations of the inner loop > > > when available and otherwise just assumes a single inner > > > iteration which is conservative on the side of not vectorizing. > > > > > > > I may miss something, the factor seems to be an amplifier, a single > > inner iteration on the side of not vectorizing only relies on that > > vector_cost < scalar_cost, if scalar_cost < vector_cost, the direction > > will be flipped? ({vector,scalar}_cost is only for inner loop part). > > > > Since we don't calculate/compare costing for inner loop independently > > and early return if scalar_cost < vector_cost for inner loop, I guess > > it's possible to have "scalar_cost < vector_cost" case theoretically, > > especially when targets can cost something more on vector side. > > True. > > > > The alternative is to retain the --param for exactly that case, > > > not sure if the result is better or not. The --param is new on > > > head, it was static '50' before. > > > > > > > I think the intention of --param is to offer ports a way to tweak > > it (no ports do it for now though :)). Not sure how target costing > > is sensitive to this factor, but I also prefer to make its default > > value as 50 as Honza suggested to avoid more possible tweakings. > > > > If targets want more, maybe we can extend it to: > > > > default_hook: > > return estimated or likely_max if either is valid; > > return default value; > > > > target hook: > > val = default_hook; // or from scratch > > tweak the val as it wishes; > > > > I guess there is no this need for now. > > > > > Any strong opinions? > > > > > > Richard. > > > > > > 2021-08-23 Richard Biener <rguenther@suse.de> > > > > > > * doc/invoke.texi (vect-inner-loop-cost-factor): Remove > > > documentation. > > > * params.opt (--param vect-inner-loop-cost-factor): Remove. > > > * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): > > > Initialize inner_loop_cost_factor to 1. > > > (vect_analyze_loop_form): Initialize inner_loop_cost_factor > > > from the estimated number of iterations of the inner loop. > > > --- > > > gcc/doc/invoke.texi | 5 ----- > > > gcc/params.opt | 4 ---- > > > gcc/tree-vect-loop.c | 12 +++++++++++- > > > 3 files changed, 11 insertions(+), 10 deletions(-) > > > > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > > > index c057cc1e4ae..054950132f6 100644 > > > --- a/gcc/doc/invoke.texi > > > +++ b/gcc/doc/invoke.texi > > > @@ -14385,11 +14385,6 @@ code to iterate. 2 allows partial vector loads and stores in all loops. > > > The parameter only has an effect on targets that support partial > > > vector loads and stores. > > > > > > -@item vect-inner-loop-cost-factor > > > -The factor which the loop vectorizer applies to the cost of statements > > > -in an inner loop relative to the loop being vectorized. The default > > > -value is 50. > > > - > > > @item avoid-fma-max-bits > > > Maximum number of bits for which we avoid creating FMAs. > > > > > > diff --git a/gcc/params.opt b/gcc/params.opt > > > index f9264887b40..f7b19fa430d 100644 > > > --- a/gcc/params.opt > > > +++ b/gcc/params.opt > > > @@ -1113,8 +1113,4 @@ Bound on number of runtime checks inserted by the vectorizer's loop versioning f > > > Common Joined UInteger Var(param_vect_partial_vector_usage) Init(2) IntegerRange(0, 2) Param Optimization > > > Controls how loop vectorizer uses partial vectors. 0 means never, 1 means only for loops whose need to iterate can be removed, 2 means for all loops. The default value is 2. > > > > > > --param=vect-inner-loop-cost-factor= > > > -Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) IntegerRange(1, 999999) Param Optimization > > > -The factor which the loop vectorizer applies to the cost of statements in an inner loop relative to the loop being vectorized. > > > - > > > ; This comment is to ensure we retain the blank line above. > > > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > > > index c521b43a47c..cb48717f20e 100644 > > > --- a/gcc/tree-vect-loop.c > > > +++ b/gcc/tree-vect-loop.c > > > @@ -841,7 +841,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared) > > > single_scalar_iteration_cost (0), > > > vec_outside_cost (0), > > > vec_inside_cost (0), > > > - inner_loop_cost_factor (param_vect_inner_loop_cost_factor), > > > + inner_loop_cost_factor (1), > > > vectorizable (false), > > > can_use_partial_vectors_p (param_vect_partial_vector_usage != 0), > > > using_partial_vectors_p (false), > > > @@ -1519,6 +1519,16 @@ vect_analyze_loop_form (class loop *loop, vec_info_shared *shared) > > > stmt_vec_info inner_loop_cond_info > > > = loop_vinfo->lookup_stmt (inner_loop_cond); > > > STMT_VINFO_TYPE (inner_loop_cond_info) = loop_exit_ctrl_vec_info_type; > > > + /* If we have an estimate on the number of iterations of the inner > > > + loop use that as the scale for costing, otherwise conservatively > > > + assume a single inner iteration. */ > > > + widest_int nit; > > > + if (get_estimated_loop_iterations (loop->inner, &nit)) > > > + LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo) > > > + /* Since costing is done on unsigned int cap the scale on > > > + some large number consistent with what we'd see in > > > + CFG counts. */ > > > + = wi::smax (nit, REG_BR_PROB_BASE).to_uhwi (); > > > > I noticed loop-doloop.c use _int version and likely_max, maybe you want that here? > > > > est_niter = get_estimated_loop_iterations_int (loop); > > if (est_niter == -1) > > est_niter = get_likely_max_loop_iterations_int (loop) > > I think that are two different things - get_estimated_loop_iterations_int > are the average number of iterations while > get_likely_max_loop_iterations_int is an upper bound. I'm not sure we > want to use an upper bound for costing. > > Based on feedback from Honza I'm currently testing the variant below > which keeps the --param and uses it to cap the estimated number of > iterations. That makes the scaling more precise for inner loops that > don't iterate much but keeps the --param to avoid overflow and to > keep the present behavior when there's no reliable profile info > available. And Kewen noticed the off-by-one by using get_estimated_loop_iterations so I pushed the correct variant using estimated_stmt_executions. Bootstrapped / tested on x86_64-unknown-linux-gnu. Richard. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH][v2] Remove --param vect-inner-loop-cost-factor 2021-08-24 6:58 ` Richard Biener 2021-08-24 8:42 ` Richard Biener @ 2021-08-24 10:10 ` Jan Hubicka 1 sibling, 0 replies; 6+ messages in thread From: Jan Hubicka @ 2021-08-24 10:10 UTC (permalink / raw) To: Richard Biener; +Cc: Kewen.Lin, richard.sandiford, gcc-patches > > > > I noticed loop-doloop.c use _int version and likely_max, maybe you want that here? > > > > est_niter = get_estimated_loop_iterations_int (loop); > > if (est_niter == -1) > > est_niter = get_likely_max_loop_iterations_int (loop) > > I think that are two different things - get_estimated_loop_iterations_int > are the average number of iterations while > get_likely_max_loop_iterations_int is an upper bound. I'm not sure we > want to use an upper bound for costing. > > Based on feedback from Honza I'm currently testing the variant below > which keeps the --param and uses it to cap the estimated number of > iterations. That makes the scaling more precise for inner loops that > don't iterate much but keeps the --param to avoid overflow and to > keep the present behavior when there's no reliable profile info > available. indeed, get_likely_max_loop_iterations_int may be very large. In some cases it however will give useful value - for example when loop travels small array. So what one can use it for is to cap the --param value. > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > index c521b43a47c..cbdd5b407da 100644 > --- a/gcc/tree-vect-loop.c > +++ b/gcc/tree-vect-loop.c > @@ -1519,6 +1519,13 @@ vect_analyze_loop_form (class loop *loop, vec_info_shared *shared) > stmt_vec_info inner_loop_cond_info > = loop_vinfo->lookup_stmt (inner_loop_cond); > STMT_VINFO_TYPE (inner_loop_cond_info) = loop_exit_ctrl_vec_info_type; > + /* If we have an estimate on the number of iterations of the inner > + loop use that to limit the scale for costing, otherwise use > + --param vect-inner-loop-cost-factor literally. */ > + widest_int nit; > + if (get_estimated_loop_iterations (loop->inner, &nit)) > + LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo) > + = wi::smin (nit, param_vect_inner_loop_cost_factor).to_uhwi (); if (get_estimated_loop_iterations (loop->inner, &nit)) LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo) = wi::smin (nit, REG_BR_PROB_BASE /*or other random big cap */).to_uhwi (); else if (get_likely_max_loop_iterations (loop->inner, &nit)) LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo) = wi::smin (nit, param_vect_inner_loop_cost_factor).to_uhwi (); else LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo) = param_vect_inner_loop_cost_factor; I.e. if we really know the number of iterations, we probably want to weight by it but we want to cap to avoid overflows. I assume if we kno that tripcount is 10000 or more we basically do not care about damage done to outer loop as long as iner loop improves? If we know max number of iterations and it is smaller then the param, we want to use it as cap. The situation where get_estimated_loop_iteraitons returns wrong value should be rare - basically when the loop was duplicated by inliner (or other transform) and it behaves a lot differently then the average execution of the loop in the train run. In this case we could also argue that the loop is not statistically important :) Honza > } > > gcc_assert (!loop->aux); > -- > 2.31.1 > ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2021-08-24 10:10 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-08-23 14:33 [PATCH][v2] Remove --param vect-inner-loop-cost-factor Richard Biener 2021-08-23 14:38 ` Jan Hubicka 2021-08-24 2:22 ` Kewen.Lin 2021-08-24 6:58 ` Richard Biener 2021-08-24 8:42 ` Richard Biener 2021-08-24 10:10 ` Jan Hubicka
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).