Re: [PATCH v4] vect/rs6000: Support vector with length cost modeling

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Richard Sandiford <richard.sandiford@arm.com>
To: "Kewen.Lin" <linkw@linux.ibm.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>,
	Bill Schmidt <wschmidt@linux.ibm.com>,
	Segher Boessenkool <segher@kernel.crashing.org>,
	Richard Biener <richard.guenther@gmail.com>
Subject: Re: [PATCH v4] vect/rs6000: Support vector with length cost modeling
Date: Mon, 27 Jul 2020 14:40:53 +0100	[thread overview]
Message-ID: <mptime9f53u.fsf@arm.com> (raw)
In-Reply-To: <f9ca7491-d8f3-c8e3-5a55-1fbbe71a0d51@linux.ibm.com> (Kewen Lin's message of "Mon, 27 Jul 2020 11:58:52 +0800")

"Kewen.Lin" <linkw@linux.ibm.com> writes:
> Hi Richard,
>
> Thanks for the review again!
>
> on 2020/7/25 上午12:21, Richard Sandiford wrote:
>> "Kewen.Lin" <linkw@linux.ibm.com> writes:
>> 
>> Thanks, the rearrangement of the existing code looks good.  Could you
>> split that and the new LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo) stuff
>> out into separate patches?
>> 
>
> Splitted to https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550691.html.
>
> errr... that subject should be with prefix "[PATCH] vect:".
>
> [snip ...] 
> (Some comments in the snipped content will be done in v4)
>
>>> +	 here.  */
>>> +
>>> +      /* For now we only operate length-based partial vectors on Power,
>>> +	 which has constant VF all the time, we need some tweakings below
>>> +	 if it doesn't hold in future.  */
>>> +      gcc_assert (LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ());
>> 
>> Where do you rely on this?  There didn't seem to be any obvious
>> to_constant uses.  Since this is “only” a cost calculation, we should
>> be using assumed_vf.
>
> Sorry for the confusion.  This was intended for the poly things like
> VF or nitems_per_ctrl which isn't constant during compilation time,
> then get people's attention on the possible runtime cost on things like
> scaling up for nitems_step etc.  But I just realized that the computations
> like the multiply with another constant can operate on the coefficient,
> it looks there is no runtime cost then?  If so, I think I thought too
> much before.  ;-)
>
>>> -      prologue_cost_vec.release ();
>>> -      epilogue_cost_vec.release ();
>>> +      (void) add_stmt_cost (loop_vinfo, target_cost_data, prol_cnt, scalar_stmt,
>>> +			    NULL, NULL_TREE, 0, vect_prologue);
>>> +      (void) add_stmt_cost (loop_vinfo, target_cost_data, body_cnt, scalar_stmt,
>>> +			    NULL, NULL_TREE, 0, vect_body);
>> 
>> IMO this seems to be reproducing too much of the functions that you
>> referred to.  And the danger with that is that they could easily
>> get out of sync later.
>
> Good point!  The original intention was to model as possible as we can,
> to avoid some bad decision due to some unmodeled pieces, like the case
> the loop body is small and some computation become nonnegligible.
> The unsync risks seems also applied for other codes.  How about adding
> some "note" comments in those functions?
>
> The updated v4 is attached by addressing your comments as well as Segher's
> comments.
>
> BR,
> Kewen
> -----
>
> gcc/ChangeLog:
>
> 	* config/rs6000/rs6000.c (rs6000_adjust_vect_cost_per_loop): New
> 	function.
> 	(rs6000_finish_cost): Call rs6000_adjust_vect_cost_per_loop.
> 	* tree-vect-loop.c (vect_estimate_min_profitable_iters): Add cost
> 	modeling for vector with length.
> 	* tree-vect-loop-manip.c (vect_set_loop_controls_directly): Update
> 	function comment.
> 	* tree-vect-stmts.c (vect_gen_len): Likewise.
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 009afc5f894..86ef584e09b 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -5177,6 +5177,34 @@ rs6000_add_stmt_cost (class vec_info *vinfo, void *data, int count,
>    return retval;
>  }
>  
> +/* For some target specific vectorization cost which can't be handled per stmt,
> +   we check the requisite conditions and adjust the vectorization cost
> +   accordingly if satisfied.  One typical example is to model shift cost for
> +   vector with length by counting number of required lengths under condition
> +   LOOP_VINFO_FULLY_WITH_LENGTH_P.  */
> +
> +static void
> +rs6000_adjust_vect_cost_per_loop (rs6000_cost_data *data)
> +{
> +  struct loop *loop = data->loop_info;
> +  gcc_assert (loop);
> +  loop_vec_info loop_vinfo = loop_vec_info_for_loop (loop);
> +
> +  if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> +    {
> +      rgroup_controls *rgc;
> +      unsigned int num_vectors_m1;
> +      unsigned int shift_cnt = 0;
> +      FOR_EACH_VEC_ELT (LOOP_VINFO_LENS (loop_vinfo), num_vectors_m1, rgc)
> +	if (rgc->type)
> +	  /* Each length needs one shift to fill into bits 0-7.  */
> +	  shift_cnt += num_vectors_m1 + 1;
> +
> +      rs6000_add_stmt_cost (loop_vinfo, (void *) data, shift_cnt, scalar_stmt,
> +			    NULL, NULL_TREE, 0, vect_body);
> +    }
> +}
> +
>  /* Implement targetm.vectorize.finish_cost.  */
>  
>  static void
> @@ -5186,7 +5214,10 @@ rs6000_finish_cost (void *data, unsigned *prologue_cost,
>    rs6000_cost_data *cost_data = (rs6000_cost_data*) data;
>  
>    if (cost_data->loop_info)
> -    rs6000_density_test (cost_data);
> +    {
> +      rs6000_adjust_vect_cost_per_loop (cost_data);
> +      rs6000_density_test (cost_data);
> +    }
>  
>    /* Don't vectorize minimum-vectorization-factor, simple copy loops
>       that require versioning for any reason.  The vectorization is at
> diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
> index 490e7befea3..9d0e3fc525e 100644
> --- a/gcc/tree-vect-loop-manip.c
> +++ b/gcc/tree-vect-loop-manip.c
> @@ -412,7 +412,10 @@ vect_maybe_permute_loop_masks (gimple_seq *seq, rgroup_controls *dest_rgm,
>  
>     This means that we cannot guarantee that such an induction variable
>     would ever hit a value that produces a set of all-false masks or zero
> -   lengths for RGC.  */
> +   lengths for RGC.
> +
> +   Note that please check cost modeling whether needed to be updated in
> +   function vect_estimate_min_profitable_iters if any changes.  */
>  
>  static tree
>  vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 06cde4b1da3..a00160a7f2d 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -3798,6 +3798,70 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
>        (void) add_stmt_cost (loop_vinfo, target_cost_data, num_masks - 1,
>  			    vector_stmt, NULL, NULL_TREE, 0, vect_body);
>      }
> +  else if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> +    {
> +      /* Referring to the functions vect_set_loop_condition_partial_vectors
> +	 and vect_set_loop_controls_directly, we need to generate each
> +	 length in the prologue and in the loop body if required. Although
> +	 there are some possible optimizations, we consider the worst case
> +	 here.  */
> +
> +      /* For wrap around checking.  */
> +      tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
> +      unsigned int compare_precision = TYPE_PRECISION (compare_type);
> +      widest_int iv_limit = vect_iv_limit_for_partial_vectors (loop_vinfo);
> +
> +      bool niters_known_p = LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo);
> +      bool need_iterate_p
> +	= (!LOOP_VINFO_EPILOGUE_P (loop_vinfo)
> +	   && !vect_known_niters_smaller_than_vf (loop_vinfo));
> +
> +      /* Init min/max, shift and minus cost relative to single
> +	 scalar_stmt. For now we only use length-based partial vectors on
> +	 Power, target specific cost tweaking may be needed for other
> +	 ports in future.  */
> +      unsigned int min_max_cost = 2;
> +      unsigned int shift_cost = 1, minus_cost = 1;

Please instead add a scalar_min_max to vect_cost_for_stmt, and use
scalar_stmt for shift and minus.  There shouldn't be any Power things
hard-coded into target-independent code.

> +      /* Init cost relative to single scalar_stmt.  */
> +      unsigned int prologue_cnt = 0;
> +      unsigned int body_cnt = 0;

Maybe _stmts instead of _cnt, to make it clearer what we're counting.
At first it looked like this was actually the raw cost.  I guess with
the above change we'd also want separate counters for min_max.

> +      rgroup_controls *rgc;
> +      unsigned int num_vectors_m1;
> +      FOR_EACH_VEC_ELT (LOOP_VINFO_LENS (loop_vinfo), num_vectors_m1, rgc)
> +	if (rgc->type)
> +	  {
> +	    unsigned nitems = rgc->max_nscalars_per_iter * rgc->factor;
> +
> +	    /* May need one shift for nitems_total computation.  */
> +	    if (nitems != 1 && !niters_known_p)
> +	      prologue_cnt += shift_cost;
> +
> +	    /* Need to handle wrap around.  */
> +	    if (iv_limit == -1
> +		|| (wi::min_precision (iv_limit * nitems, UNSIGNED)
> +		    > compare_precision))
> +	      prologue_cnt += (min_max_cost + minus_cost);

I think we should instead split this condition out into a subroutine
that both this function and vect_set_loop_condition_partial_vectors
can use.  It would take the loop_vec_info and rgroup_controls as
arguments.

That means that we might recompute things a few more times, but this
shouldn't be performance-critical.

Thanks,
Richard

next prev parent reply	other threads:[~2020-07-27 13:40 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-21  5:51 [PATCH] vect: " Kewen.Lin
2020-07-21  7:57 ` Richard Biener
2020-07-22  1:26   ` [PATCH v2] vect/rs6000: " Kewen.Lin
2020-07-22  6:38     ` Richard Biener
2020-07-22  7:08       ` Kewen.Lin
2020-07-22  9:11     ` Richard Sandiford
2020-07-22 15:48       ` [PATCH v3] " Kewen.Lin
2020-07-22 16:25         ` Kewen.Lin
2020-07-24 16:21           ` Richard Sandiford
2020-07-27  3:58             ` [PATCH v4] " Kewen.Lin
2020-07-27 13:40               ` Richard Sandiford [this message]
2020-07-28  8:36                 ` Kewen.Lin
2020-07-31 11:03                   ` Richard Sandiford
2020-07-31 11:20                     ` Richard Biener
2020-07-31 12:37                       ` Kewen.Lin
2020-07-31 13:01                         ` Richard Biener
2020-07-31 13:21                           ` Kewen.Lin
2020-07-31 14:51                 ` [PATCH v5] " Kewen.Lin
2020-08-05  7:27                   ` Richard Sandiford
2020-08-05 14:06                     ` Segher Boessenkool
2020-08-06  6:47                       ` Kewen.Lin
2020-07-22 17:49     ` [PATCH v2] " Segher Boessenkool
2020-07-27  3:44       ` Kewen.Lin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=mptime9f53u.fsf@arm.com \
    --to=richard.sandiford@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=linkw@linux.ibm.com \
    --cc=richard.guenther@gmail.com \
    --cc=segher@kernel.crashing.org \
    --cc=wschmidt@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).