public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] vect: Fix missing alias checks for 128-bit SVE [PR98371]
@ 2020-12-18 12:46 Richard Sandiford
  2021-01-05  9:04 ` Richard Biener
  0 siblings, 1 reply; 2+ messages in thread
From: Richard Sandiford @ 2020-12-18 12:46 UTC (permalink / raw)
  To: gcc-patches; +Cc: rguenther

On AArch64, the vectoriser tries various ways of vectorising with both
SVE and Advanced SIMD and picks the best one.  All other things being
equal, it prefers earlier attempts over later attempts.

The way this works currently is that, once it has a successful
vectorisation attempt A, it analyses all other attempts as epilogue
loops of A:

      /* When pick_lowest_cost_p is true, we should in principle iterate
	 over all the loop_vec_infos that LOOP_VINFO could replace and
	 try to vectorize LOOP_VINFO under the same conditions.
	 E.g. when trying to replace an epilogue loop, we should vectorize
	 LOOP_VINFO as an epilogue loop with the same VF limit.  When trying
	 to replace the main loop, we should vectorize LOOP_VINFO as a main
	 loop too.

	 However, autovectorize_vector_modes is usually sorted as follows:

	 - Modes that naturally produce lower VFs usually follow modes that
	   naturally produce higher VFs.

	 - When modes naturally produce the same VF, maskable modes
	   usually follow unmaskable ones, so that the maskable mode
	   can be used to vectorize the epilogue of the unmaskable mode.

	 This order is preferred because it leads to the maximum
	 epilogue vectorization opportunities.  Targets should only use
	 a different order if they want to make wide modes available while
	 disparaging them relative to earlier, smaller modes.  The assumption
	 in that case is that the wider modes are more expensive in some
	 way that isn't reflected directly in the costs.

	 There should therefore be few interesting cases in which
	 LOOP_VINFO fails when treated as an epilogue loop, succeeds when
	 treated as a standalone loop, and ends up being genuinely cheaper
	 than FIRST_LOOP_VINFO.  */

However, the vectoriser can normally elide alias checks for epilogue
loops, on the basis that the main loop should do them instead.
Converting an epilogue loop to a main loop can therefore cause the alias
checks to be skipped.  (It probably also unfairly penalises the original
loop in the cost comparison, given that one loop will have alias checks
and the other won't.)

As the comment says, we should in principle analyse each vector mode
twice: once as a main loop and once as an epilogue.  However, doing
that up-front would be quite expensive.  This patch instead goes for a
compromise: if an epilogue loop for mode M2 seems better than a main
loop for mode M1, re-analyse with M2 as the main loop.

The patch fixes dg.torture.exp=pr69719.c when testing with
-msve-vector-bits=128.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
	PR tree-optimization/98371
	* tree-vect-loop.c (vect_reanalyze_as_main_loop): New function.
	(vect_analyze_loop): If an epilogue loop appears to be cheaper
	than the main loop, re-analyze it as a main loop before adopting
	it as a main loop.
---
 gcc/tree-vect-loop.c | 61 +++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 58 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 688538a4521..221541f4a38 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2850,6 +2850,45 @@ vect_joust_loop_vinfos (loop_vec_info new_loop_vinfo,
   return true;
 }
 
+/* If LOOP_VINFO is already a main loop, return it unmodified.  Otherwise
+   try to reanalyze it as a main loop.  Return the loop_vinfo on success
+   and null on failure.  */
+
+static loop_vec_info
+vect_reanalyze_as_main_loop (loop_vec_info loop_vinfo, unsigned int *n_stmts)
+{
+  if (!LOOP_VINFO_EPILOGUE_P (loop_vinfo))
+    return loop_vinfo;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "***** Reanalyzing as a main loop with vector mode %s\n",
+		     GET_MODE_NAME (loop_vinfo->vector_mode));
+
+  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  vec_info_shared *shared = loop_vinfo->shared;
+  opt_loop_vec_info main_loop_vinfo = vect_analyze_loop_form (loop, shared);
+  gcc_assert (main_loop_vinfo);
+
+  main_loop_vinfo->vector_mode = loop_vinfo->vector_mode;
+
+  bool fatal = false;
+  bool res = vect_analyze_loop_2 (main_loop_vinfo, fatal, n_stmts);
+  loop->aux = NULL;
+  if (!res)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "***** Failed to analyze main loop with vector"
+			 " mode %s\n",
+			 GET_MODE_NAME (loop_vinfo->vector_mode));
+      delete main_loop_vinfo;
+      return NULL;
+    }
+  LOOP_VINFO_VECTORIZABLE_P (main_loop_vinfo) = 1;
+  return main_loop_vinfo;
+}
+
 /* Function vect_analyze_loop.
 
    Apply a set of analyses on LOOP, and create a loop_vec_info struct
@@ -2997,9 +3036,25 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
 	      if (vinfos.is_empty ()
 		  && vect_joust_loop_vinfos (loop_vinfo, first_loop_vinfo))
 		{
-		  delete first_loop_vinfo;
-		  first_loop_vinfo = opt_loop_vec_info::success (NULL);
-		  LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = NULL;
+		  loop_vec_info main_loop_vinfo
+		    = vect_reanalyze_as_main_loop (loop_vinfo, &n_stmts);
+		  if (main_loop_vinfo == loop_vinfo)
+		    {
+		      delete first_loop_vinfo;
+		      first_loop_vinfo = opt_loop_vec_info::success (NULL);
+		    }
+		  else if (main_loop_vinfo
+			   && vect_joust_loop_vinfos (main_loop_vinfo,
+						      first_loop_vinfo))
+		    {
+		      delete first_loop_vinfo;
+		      first_loop_vinfo = opt_loop_vec_info::success (NULL);
+		      delete loop_vinfo;
+		      loop_vinfo
+			= opt_loop_vec_info::success (main_loop_vinfo);
+		    }
+		  else
+		    delete main_loop_vinfo;
 		}
 	    }
 

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [PATCH] vect: Fix missing alias checks for 128-bit SVE [PR98371]
  2020-12-18 12:46 [PATCH] vect: Fix missing alias checks for 128-bit SVE [PR98371] Richard Sandiford
@ 2021-01-05  9:04 ` Richard Biener
  0 siblings, 0 replies; 2+ messages in thread
From: Richard Biener @ 2021-01-05  9:04 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches

On Fri, 18 Dec 2020, Richard Sandiford wrote:

> On AArch64, the vectoriser tries various ways of vectorising with both
> SVE and Advanced SIMD and picks the best one.  All other things being
> equal, it prefers earlier attempts over later attempts.
> 
> The way this works currently is that, once it has a successful
> vectorisation attempt A, it analyses all other attempts as epilogue
> loops of A:
> 
>       /* When pick_lowest_cost_p is true, we should in principle iterate
> 	 over all the loop_vec_infos that LOOP_VINFO could replace and
> 	 try to vectorize LOOP_VINFO under the same conditions.
> 	 E.g. when trying to replace an epilogue loop, we should vectorize
> 	 LOOP_VINFO as an epilogue loop with the same VF limit.  When trying
> 	 to replace the main loop, we should vectorize LOOP_VINFO as a main
> 	 loop too.
> 
> 	 However, autovectorize_vector_modes is usually sorted as follows:
> 
> 	 - Modes that naturally produce lower VFs usually follow modes that
> 	   naturally produce higher VFs.
> 
> 	 - When modes naturally produce the same VF, maskable modes
> 	   usually follow unmaskable ones, so that the maskable mode
> 	   can be used to vectorize the epilogue of the unmaskable mode.
> 
> 	 This order is preferred because it leads to the maximum
> 	 epilogue vectorization opportunities.  Targets should only use
> 	 a different order if they want to make wide modes available while
> 	 disparaging them relative to earlier, smaller modes.  The assumption
> 	 in that case is that the wider modes are more expensive in some
> 	 way that isn't reflected directly in the costs.
> 
> 	 There should therefore be few interesting cases in which
> 	 LOOP_VINFO fails when treated as an epilogue loop, succeeds when
> 	 treated as a standalone loop, and ends up being genuinely cheaper
> 	 than FIRST_LOOP_VINFO.  */
> 
> However, the vectoriser can normally elide alias checks for epilogue
> loops, on the basis that the main loop should do them instead.
> Converting an epilogue loop to a main loop can therefore cause the alias
> checks to be skipped.  (It probably also unfairly penalises the original
> loop in the cost comparison, given that one loop will have alias checks
> and the other won't.)
> 
> As the comment says, we should in principle analyse each vector mode
> twice: once as a main loop and once as an epilogue.  However, doing
> that up-front would be quite expensive.  This patch instead goes for a
> compromise: if an epilogue loop for mode M2 seems better than a main
> loop for mode M1, re-analyse with M2 as the main loop.
> 
> The patch fixes dg.torture.exp=pr69719.c when testing with
> -msve-vector-bits=128.
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.

Thanks,
Richard.

> Richard
> 
> 
> gcc/
> 	PR tree-optimization/98371
> 	* tree-vect-loop.c (vect_reanalyze_as_main_loop): New function.
> 	(vect_analyze_loop): If an epilogue loop appears to be cheaper
> 	than the main loop, re-analyze it as a main loop before adopting
> 	it as a main loop.
> ---
>  gcc/tree-vect-loop.c | 61 +++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 58 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 688538a4521..221541f4a38 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -2850,6 +2850,45 @@ vect_joust_loop_vinfos (loop_vec_info new_loop_vinfo,
>    return true;
>  }
>  
> +/* If LOOP_VINFO is already a main loop, return it unmodified.  Otherwise
> +   try to reanalyze it as a main loop.  Return the loop_vinfo on success
> +   and null on failure.  */
> +
> +static loop_vec_info
> +vect_reanalyze_as_main_loop (loop_vec_info loop_vinfo, unsigned int *n_stmts)
> +{
> +  if (!LOOP_VINFO_EPILOGUE_P (loop_vinfo))
> +    return loop_vinfo;
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location,
> +		     "***** Reanalyzing as a main loop with vector mode %s\n",
> +		     GET_MODE_NAME (loop_vinfo->vector_mode));
> +
> +  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> +  vec_info_shared *shared = loop_vinfo->shared;
> +  opt_loop_vec_info main_loop_vinfo = vect_analyze_loop_form (loop, shared);
> +  gcc_assert (main_loop_vinfo);
> +
> +  main_loop_vinfo->vector_mode = loop_vinfo->vector_mode;
> +
> +  bool fatal = false;
> +  bool res = vect_analyze_loop_2 (main_loop_vinfo, fatal, n_stmts);
> +  loop->aux = NULL;
> +  if (!res)
> +    {
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_NOTE, vect_location,
> +			 "***** Failed to analyze main loop with vector"
> +			 " mode %s\n",
> +			 GET_MODE_NAME (loop_vinfo->vector_mode));
> +      delete main_loop_vinfo;
> +      return NULL;
> +    }
> +  LOOP_VINFO_VECTORIZABLE_P (main_loop_vinfo) = 1;
> +  return main_loop_vinfo;
> +}
> +
>  /* Function vect_analyze_loop.
>  
>     Apply a set of analyses on LOOP, and create a loop_vec_info struct
> @@ -2997,9 +3036,25 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
>  	      if (vinfos.is_empty ()
>  		  && vect_joust_loop_vinfos (loop_vinfo, first_loop_vinfo))
>  		{
> -		  delete first_loop_vinfo;
> -		  first_loop_vinfo = opt_loop_vec_info::success (NULL);
> -		  LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = NULL;
> +		  loop_vec_info main_loop_vinfo
> +		    = vect_reanalyze_as_main_loop (loop_vinfo, &n_stmts);
> +		  if (main_loop_vinfo == loop_vinfo)
> +		    {
> +		      delete first_loop_vinfo;
> +		      first_loop_vinfo = opt_loop_vec_info::success (NULL);
> +		    }
> +		  else if (main_loop_vinfo
> +			   && vect_joust_loop_vinfos (main_loop_vinfo,
> +						      first_loop_vinfo))
> +		    {
> +		      delete first_loop_vinfo;
> +		      first_loop_vinfo = opt_loop_vec_info::success (NULL);
> +		      delete loop_vinfo;
> +		      loop_vinfo
> +			= opt_loop_vec_info::success (main_loop_vinfo);
> +		    }
> +		  else
> +		    delete main_loop_vinfo;
>  		}
>  	    }
>  
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2021-01-05  9:04 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-18 12:46 [PATCH] vect: Fix missing alias checks for 128-bit SVE [PR98371] Richard Sandiford
2021-01-05  9:04 ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).