Re: [PATCH 1/2][vect]PR 88915: Vectorize epilogues when versioning loops

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com>
To: Richard Biener <rguenther@suse.de>,
	gcc-patches <gcc-patches@gcc.gnu.org>,
	richard.sandiford@arm.com
Subject: Re: [PATCH 1/2][vect]PR 88915: Vectorize epilogues when versioning loops
Date: Fri, 25 Oct 2019 16:20:00 -0000	[thread overview]
Message-ID: <e55b5bcd-ccb2-8cf3-6dc9-3a58d58fc8b2@arm.com> (raw)
In-Reply-To: <mpto8y8fz0v.fsf@arm.com>



On 22/10/2019 18:52, Richard Sandiford wrote:
> Thanks for doing this.  Hope this message doesn't cover too much old
> ground or duplicate too much...
> 
> "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
>> @@ -2466,15 +2476,65 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>>     else
>>       niters_prolog = build_int_cst (type, 0);
>>   
>> +  loop_vec_info epilogue_vinfo = NULL;
>> +  if (vect_epilogues)
>> +    {
>> +      /* Take the next epilogue_vinfo to vectorize for.  */
>> +      epilogue_vinfo = loop_vinfo->epilogue_vinfos[0];
>> +      loop_vinfo->epilogue_vinfos.ordered_remove (0);
>> +
>> +      /* Don't vectorize epilogues if this is not the most inner loop or if
>> +	 the epilogue may need peeling for alignment as the vectorizer doesn't
>> +	 know how to handle these situations properly yet.  */
>> +      if (loop->inner != NULL
>> +	  || LOOP_VINFO_PEELING_FOR_ALIGNMENT (epilogue_vinfo))
>> +	vect_epilogues = false;
>> +
>> +    }
> 
> Nit: excess blank line before "}".  Sorry if this was discussed before,
> but what's the reason for delaying the check for "loop->inner" to
> this point, rather than doing it in vect_analyze_loop?

Done.
> 
>> +
>> +  tree niters_vector_mult_vf;
>> +  unsigned int lowest_vf = constant_lower_bound (vf);
>> +  /* Note LOOP_VINFO_NITERS_KNOWN_P and LOOP_VINFO_INT_NITERS work
>> +     on niters already ajusted for the iterations of the prologue.  */
> 
> Pre-existing typo: adjusted.  But...
> 
>> +  if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>> +      && known_eq (vf, lowest_vf))
>> +    {
>> +      loop_vec_info orig_loop_vinfo;
>> +      if (LOOP_VINFO_EPILOGUE_P (loop_vinfo))
>> +	orig_loop_vinfo = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
>> +      else
>> +	orig_loop_vinfo = loop_vinfo;
>> +      vector_sizes vector_sizes = LOOP_VINFO_EPILOGUE_SIZES (orig_loop_vinfo);
>> +      unsigned next_size = 0;
>> +      unsigned HOST_WIDE_INT eiters
>> +	= (LOOP_VINFO_INT_NITERS (loop_vinfo)
>> +	   - LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo));
>> +
>> +      if (prolog_peeling > 0)
>> +	eiters -= prolog_peeling;
> 
> ...is that comment still true?  We're now subtracting the peeling
> amount here.

It is not, "adjusted" the comment ;)

> Might be worth asserting prolog_peeling >= 0, just to emphasise
> that we can't get here for variable peeling amounts, and then subtract
> prolog_peeling unconditionally (assuming that's the right thing to do).
> 
Can't assert as LOOP_VINFO_NITERS_KNOWN_P can be true even with 
prolog_peeling < 0, since we still know the constant number of scalar 
iterations, we just don't know how many vector iterations will be 
performed due to the runtime peeling. I will however, not reject 
vectorizing the epilogue, when we don't know how much we are peeling.
>> +      eiters
>> +	= eiters % lowest_vf + LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo);
>> +
>> +      unsigned int ratio;
>> +      while (next_size < vector_sizes.length ()
>> +	     && !(constant_multiple_p (current_vector_size,
>> +				       vector_sizes[next_size], &ratio)
>> +		  && eiters >= lowest_vf / ratio))
>> +	next_size += 1;
>> +
>> +      if (next_size == vector_sizes.length ())
>> +	vect_epilogues = false;
>> +    }
>> +
>>     /* Prolog loop may be skipped.  */
>>     bool skip_prolog = (prolog_peeling != 0);
>>     /* Skip to epilog if scalar loop may be preferred.  It's only needed
>> -     when we peel for epilog loop and when it hasn't been checked with
>> -     loop versioning.  */
>> +     when we peel for epilog loop or when we loop version.  */
>>     bool skip_vector = (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>>   		      ? maybe_lt (LOOP_VINFO_INT_NITERS (loop_vinfo),
>>   				  bound_prolog + bound_epilog)
>> -		      : !LOOP_REQUIRES_VERSIONING (loop_vinfo));
>> +		      : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
>> +			 || vect_epilogues));
> 
> The comment update looks wrong here: without epilogues, we don't need
> the skip when loop versioning, because loop versioning ensures that we
> have at least one vector iteration.
> 
> (I think "it" was supposed to mean "skipping to the epilogue" rather
> than the epilogue loop itself, in case that's the confusion.)
> 
> It'd be good to mention the epilogue condition in the comment too.
> 

Rewrote comment, hopefully this now better reflects reality.

>> +
>> +  if (vect_epilogues)
>> +    {
>> +      epilog->aux = epilogue_vinfo;
>> +      LOOP_VINFO_LOOP (epilogue_vinfo) = epilog;
>> +
>> +      loop_constraint_clear (epilog, LOOP_C_INFINITE);
>> +
>> +      /* We now must calculate the number of iterations for our epilogue.  */
>> +      tree cond_niters, niters;
>> +
>> +      /* Depending on whether we peel for gaps we take niters or niters - 1,
>> +	 we will refer to this as N - G, where N and G are the NITERS and
>> +	 GAP for the original loop.  */
>> +      niters = LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
>> +	? LOOP_VINFO_NITERSM1 (loop_vinfo)
>> +	: LOOP_VINFO_NITERS (loop_vinfo);
>> +
>> +      /* Here we build a vector factorization mask:
>> +	 vf_mask = ~(VF - 1), where VF is the Vectorization Factor.  */
>> +      tree vf_mask = build_int_cst (TREE_TYPE (niters),
>> +				    LOOP_VINFO_VECT_FACTOR (loop_vinfo));
>> +      vf_mask = fold_build2 (MINUS_EXPR, TREE_TYPE (vf_mask),
>> +			     vf_mask,
>> +			     build_one_cst (TREE_TYPE (vf_mask)));
>> +      vf_mask = fold_build1 (BIT_NOT_EXPR, TREE_TYPE (niters), vf_mask);
>> +
>> +      /* Here we calculate:
>> +	 niters = N - ((N-G) & ~(VF -1)) */
>> +      niters = fold_build2 (MINUS_EXPR, TREE_TYPE (niters),
>> +			    LOOP_VINFO_NITERS (loop_vinfo),
>> +			    fold_build2 (BIT_AND_EXPR, TREE_TYPE (niters),
>> +					 niters,
>> +					 vf_mask));
> 
> Might be a daft question, sorry, but why does this need to be so
> complicated?  Couldn't we just use the final value of the main loop's
> IV to calculate how many iterations are left?
> 
> The current code wouldn't for example work for non-power-of-2 SVE vectors.
> vect_set_loop_condition_unmasked is structured to cope with that case
> (in length-agnostic mode only), even when an epilogue is needed.

Good call, as we discussed I changed my approach here. Rather than using 
a conditional expression to guard against skipping the main loop, I now 
use a phi-node to carry the IV.  This actually already exists, so I am 
duplicating here, but I didn't know what the best way was to "grab" this 
existing IV.


>> +     skipped when dealing with epilogues as we assume we already checked them
>> +     for the main loop.  So instead we always check the 'orig_loop_vinfo'.  */
>> +  if (LOOP_REQUIRES_VERSIONING (orig_loop_vinfo))
>>       {
>>         poly_uint64 niters_th = 0;
>>         unsigned int th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
>> @@ -2307,14 +2342,8 @@ again:
>>      be vectorized.  */
>>   opt_loop_vec_info
>>   vect_analyze_loop (class loop *loop, loop_vec_info orig_loop_vinfo,
>> -		   vec_info_shared *shared)
>> +		   vec_info_shared *shared, vector_sizes vector_sizes)
>>   {
>> -  auto_vector_sizes vector_sizes;
>> -
>> -  /* Autodetect first vector size we try.  */
>> -  current_vector_size = 0;
>> -  targetm.vectorize.autovectorize_vector_sizes (&vector_sizes,
>> -						loop->simdlen != 0);
>>     unsigned int next_size = 0;
>>   
>>     DUMP_VECT_SCOPE ("analyze_loop_nest");
>> @@ -2335,6 +2364,9 @@ vect_analyze_loop (class loop *loop, loop_vec_info orig_loop_vinfo,
>>     poly_uint64 autodetected_vector_size = 0;
>>     opt_loop_vec_info first_loop_vinfo = opt_loop_vec_info::success (NULL);
>>     poly_uint64 first_vector_size = 0;
>> +  poly_uint64 lowest_th = 0;
>> +  unsigned vectorized_loops = 0;
>> +  bool vect_epilogues = !loop->simdlen && PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK);
>>     while (1)
>>       {
>>         /* Check the CFG characteristics of the loop (nesting, entry/exit).  */
>> @@ -2353,24 +2385,52 @@ vect_analyze_loop (class loop *loop, loop_vec_info orig_loop_vinfo,
>>   
>>         if (orig_loop_vinfo)
>>   	LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = orig_loop_vinfo;
>> +      else if (vect_epilogues && first_loop_vinfo)
>> +	LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = first_loop_vinfo;
>>   
>>         opt_result res = vect_analyze_loop_2 (loop_vinfo, fatal, &n_stmts);
>>         if (res)
>>   	{
>>   	  LOOP_VINFO_VECTORIZABLE_P (loop_vinfo) = 1;
>> +	  vectorized_loops++;
>>   
>> -	  if (loop->simdlen
>> -	      && maybe_ne (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
>> -			   (unsigned HOST_WIDE_INT) loop->simdlen))
>> +	  if ((loop->simdlen
>> +	       && maybe_ne (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
>> +			    (unsigned HOST_WIDE_INT) loop->simdlen))
>> +	      || vect_epilogues)
>>   	    {
>>   	      if (first_loop_vinfo == NULL)
>>   		{
>>   		  first_loop_vinfo = loop_vinfo;
>> +		  lowest_th
>> +		    = LOOP_VINFO_VERSIONING_THRESHOLD (first_loop_vinfo);
>>   		  first_vector_size = current_vector_size;
>>   		  loop->aux = NULL;
>>   		}
>>   	      else
>> -		delete loop_vinfo;
>> +		{
>> +		  /* Keep track of vector sizes that we know we can vectorize
>> +		     the epilogue with.  */
>> +		  if (vect_epilogues)
>> +		    {
>> +		      loop->aux = NULL;
>> +		      first_loop_vinfo->epilogue_vsizes.reserve (1);
>> +		      first_loop_vinfo->epilogue_vsizes.quick_push (current_vector_size);
>> +		      first_loop_vinfo->epilogue_vinfos.reserve (1);
>> +		      first_loop_vinfo->epilogue_vinfos.quick_push (loop_vinfo);
> 
> I've messed you around, sorry, but the patches I committed this weekend
> mean we now store the vector size in the loop_vinfo.  It'd be good to
> avoid a separate epilogue_vsizes array if possible.

Rebased. Actually quite happy with that, makes for a cleaner patch on my 
end :)
> 
>> +
>> +	  stmt_vinfo
>> +	    = epilogue_vinfo->lookup_stmt (orig_stmt);
> 
> Nit: fits one line.
> 
>> +
>> +	  STMT_VINFO_STMT (stmt_vinfo) = new_stmt;
>> +	  gimple_set_uid (new_stmt, gimple_uid (orig_stmt));
>> +
>> +	  mapping.put (gimple_phi_result (orig_stmt),
>> +			gimple_phi_result (new_stmt));
> 
> Nit: indented too far.
> 
>> +
>> +	  if (STMT_VINFO_PATTERN_DEF_SEQ (stmt_vinfo))
>> +	    pattern_worklist.safe_push (stmt_vinfo);
>> +
>> +	  related_vinfo = STMT_VINFO_RELATED_STMT (stmt_vinfo);
>> +	  while (related_vinfo && related_vinfo != stmt_vinfo)
>> +	    {
>> +	      related_worklist.safe_push (related_vinfo);
>> +	      /* Set BB such that the assert in
>> +		'get_initial_def_for_reduction' is able to determine that
>> +		the BB of the related stmt is inside this loop.  */
>> +	      gimple_set_bb (STMT_VINFO_STMT (related_vinfo),
>> +			     gimple_bb (new_stmt));
>> +	      related_vinfo = STMT_VINFO_RELATED_STMT (related_vinfo);
>> +	    }
>> +	}
>> +
>> +      for (epilogue_gsi = gsi_start_bb (epilogue_bbs[i]);
>> +	   !gsi_end_p (epilogue_gsi); gsi_next (&epilogue_gsi))
>> +	{
>> +	  orig_stmt = LOOP_VINFO_UP_STMTS (epilogue_vinfo)[0];
>> +	  LOOP_VINFO_UP_STMTS (epilogue_vinfo).ordered_remove (0);
>> +	  new_stmt = gsi_stmt (epilogue_gsi);
>> +
>> +	  stmt_vinfo
>> +	    = epilogue_vinfo->lookup_stmt (orig_stmt);
> 
> Fits on one line.
> 
>> +
>> +	  STMT_VINFO_STMT (stmt_vinfo) = new_stmt;
>> +	  gimple_set_uid (new_stmt, gimple_uid (orig_stmt));
>> +
>> +	  if (is_gimple_assign (orig_stmt))
>> +	    {
>> +	      gcc_assert (is_gimple_assign (new_stmt));
>> +	      mapping.put (gimple_assign_lhs (orig_stmt),
>> +			  gimple_assign_lhs (new_stmt));
>> +	    }
> 
> Why just assigns?  Don't we need to handle calls too?
> 
> Maybe just use gimple_get_lhs here.

Changed.
>> @@ -882,10 +886,35 @@ try_vectorize_loop_1 (hash_table<simduid_to_vf> *&simduid_to_vf_htab,
>>   		 LOCATION_FILE (vect_location.get_location_t ()),
>>   		 LOCATION_LINE (vect_location.get_location_t ()));
>>   
>> +  /* If this is an epilogue, we already know what vector sizes we will use for
>> +     vectorization as the analyzis was part of the main vectorized loop.  Use
>> +     these instead of going through all vector sizes again.  */
>> +  if (orig_loop_vinfo
>> +      && !LOOP_VINFO_EPILOGUE_SIZES (orig_loop_vinfo).is_empty ())
>> +    {
>> +      vector_sizes = LOOP_VINFO_EPILOGUE_SIZES (orig_loop_vinfo);
>> +      assert_versioning = LOOP_REQUIRES_VERSIONING (orig_loop_vinfo);
>> +      current_vector_size = vector_sizes[0];
>> +    }
>> +  else
>> +    {
>> +      /* Autodetect first vector size we try.  */
>> +      current_vector_size = 0;
>> +
>> +      targetm.vectorize.autovectorize_vector_sizes (&auto_vector_sizes,
>> +						    loop->simdlen != 0);
>> +      vector_sizes = auto_vector_sizes;
>> +    }
>> +
>>     /* Try to analyze the loop, retaining an opt_problem if dump_enabled_p.  */
>> -  opt_loop_vec_info loop_vinfo
>> -    = vect_analyze_loop (loop, orig_loop_vinfo, &shared);
>> -  loop->aux = loop_vinfo;
>> +  opt_loop_vec_info loop_vinfo = opt_loop_vec_info::success (NULL);
>> +  if (loop_vec_info_for_loop (loop))
>> +    loop_vinfo = opt_loop_vec_info::success (loop_vec_info_for_loop (loop));
>> +  else
>> +    {
>> +      loop_vinfo = vect_analyze_loop (loop, orig_loop_vinfo, &shared, vector_sizes);
>> +      loop->aux = loop_vinfo;
>> +    }
> 
> I don't really understand what this is doing for the epilogue case.
> Do we call vect_analyze_loop again?  Are vector_sizes[1:] significant
> for epilogues?

The vector sizes code here is no longer needed after your patch. The 
loop_vec_info is just checking whether loop already has one set (which 
is the case for epilogues) and use that, or if not then analyse it 
(which is the case for the first vectorization).  I'll add some comments.
> 
> Thanks,
> Richard
>

next prev parent reply	other threads:[~2019-10-25 16:18 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-23 17:17 Andre Vieira (lists)
2019-08-26 13:17 ` Richard Biener
2019-09-18 11:11   ` Andre Vieira (lists)
2019-10-08 13:16   ` Andre Vieira (lists)
2019-10-09  9:06     ` Richard Biener
2019-10-10 14:01       ` Andre Vieira (lists)
2019-10-11 13:01         ` Richard Biener
2019-10-22 12:53           ` Andre Vieira (lists)
2019-10-22 14:05             ` Richard Biener
2019-10-25 16:18               ` Andre Vieira (lists)
2019-10-28 13:16                 ` Richard Biener
2019-10-22 18:07             ` Richard Sandiford
2019-10-25 16:20               ` Andre Vieira (lists) [this message]
2019-10-25 16:27                 ` Andre Vieira (lists)
2019-10-28 14:18                   ` Richard Biener
2019-10-28 18:37                     ` Andre Vieira (lists)
2019-10-29 12:12                       ` Richard Biener
2019-10-29 13:17                         ` [committed][vect]PR 88915: Vectorize epilogues when versioning loops (was Re: [PATCH 1/2][vect]PR 88915: Vectorize epilogues when versioning loops) Andre Vieira (lists)
2019-10-29 13:45                           ` Richard Biener
2019-09-04 15:35 ` [PATCH 1/2][vect]PR 88915: Vectorize epilogues when versioning loops Jeff Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e55b5bcd-ccb2-8cf3-6dc9-3a58d58fc8b2@arm.com \
    --to=andre.simoesdiasvieira@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=rguenther@suse.de \
    --cc=richard.sandiford@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).