public inbox for gcc-patches@gcc.gnu.org
* [PATCH][vect] Use main loop's thresholds and vectorization factor to narrow upper_bound of epilogue
@ 2021-05-24  6:17 Andre Vieira (lists)
  2021-05-24  7:21 ` Kewen.Lin
  2021-05-24 10:30 ` Richard Sandiford
  0 siblings, 2 replies; 5+ messages in thread
From: Andre Vieira (lists) @ 2021-05-24  6:17 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Sandiford, Richard Biener

Hi,

When vectorizing with --param vect-partial-vector-usage=1 the vectorizer 
uses an unpredicated (all-true predicate for SVE) main loop and a 
predicated tail loop. The way this was implemented seems to mean it 
re-uses the same vector-mode for both loops, which means the tail loop 
isn't an actual loop but only executes one iteration.

This patch uses the knowledge of the conditions to enter an epilogue 
loop to help come up with a potentially more restrictive upper bound.
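
As a rough illustration of the arithmetic only (not the code in the 
patch): the sketch below uses plain integers instead of poly_uint64, and 
every name in it (narrow_epilogue_bound, max_u64, div_ceil_u64) is 
hypothetical. It assumes the epilogue is only entered with fewer scalar 
iterations left than the largest of the main loop's VF and thresholds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the larger of two unsigned values.  */
static uint64_t
max_u64 (uint64_t a, uint64_t b)
{
  return a > b ? a : b;
}

/* Hypothetical helper: N / D rounded up (away from zero).  */
static uint64_t
div_ceil_u64 (uint64_t n, uint64_t d)
{
  assert (d != 0);
  return n / d + (n % d != 0);
}

/* Sketch, assuming constant (non-poly) values: return the narrowed
   latch-count upper bound for an epilogue loop.  CURRENT_BOUND is the
   latch bound computed before narrowing.  */
static uint64_t
narrow_epilogue_bound (uint64_t main_vf, uint64_t cost_threshold,
		       uint64_t versioning_threshold,
		       uint64_t epilogue_vf, uint64_t current_bound)
{
  assert (main_vf != 0);

  /* Largest iteration count that can still reach the epilogue.  */
  uint64_t main_iters = max_u64 (max_u64 (main_vf, cost_threshold),
				 versioning_threshold);

  /* Epilogue iterations needed to cover that many scalar iterations.  */
  uint64_t bound = div_ceil_u64 (main_iters, epilogue_vf);

  /* The "- 1" converts an iteration count to a latch count.  */
  uint64_t narrowed = bound - 1;
  return narrowed < current_bound ? narrowed : current_bound;
}
```

With main_vf = 8, both thresholds 0 and epilogue_vf = 8, this gives a 
latch bound of 0, i.e. the epilogue body runs once.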

Regression tested on aarch64-linux-gnu and also ran the testsuite using 
'--param vect-partial-vector-usage=1' detecting no ICEs and no execution 
failures.

Would be good to have this tested for PPC too as I believe they are the 
main users of the --param vect-partial-vector-usage=1 option. Can 
someone help me test (and maybe even benchmark?) this on a PPC target?

Kind regards,
Andre

gcc/ChangeLog:

         * tree-vect-loop.c (vect_transform_loop): Use main loop's various
         thresholds to narrow the upper bound on epilogue iterations.

gcc/testsuite/ChangeLog:

         * gcc.target/aarch64/sve/part_vect_single_iter_epilog.c: New test.


[-- Attachment #2: epilogue_bound.patch --]
[-- Type: text/plain, Size: 2397 bytes --]

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/part_vect_single_iter_epilog.c b/gcc/testsuite/gcc.target/aarch64/sve/part_vect_single_iter_epilog.c
new file mode 100644
index 0000000000000000000000000000000000000000..a03229eb55585f637ebd5288fb4c00f8f921d44c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/part_vect_single_iter_epilog.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 --param vect-partial-vector-usage=1" } */
+
+void
+foo (short * __restrict__ a, short * __restrict__ b, short * __restrict__ c, int n)
+{
+  for (int i = 0; i < n; ++i)
+    c[i] = a[i] + b[i];
+}
+
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-9]+.h, wzr, [xw][0-9]+} 1 } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 3e973e774af8f9205be893e01ad9263281116885..81e9c5cc42415a0a92b765bc46640105670c4e6b 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -9723,12 +9723,31 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
   /* In these calculations the "- 1" converts loop iteration counts
      back to latch counts.  */
   if (loop->any_upper_bound)
-    loop->nb_iterations_upper_bound
-      = (final_iter_may_be_partial
-	 ? wi::udiv_ceil (loop->nb_iterations_upper_bound + bias_for_lowest,
-			  lowest_vf) - 1
-	 : wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
-			   lowest_vf) - 1);
+    {
+      loop_vec_info main_vinfo = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
+      loop->nb_iterations_upper_bound
+	= (final_iter_may_be_partial
+	   ? wi::udiv_ceil (loop->nb_iterations_upper_bound + bias_for_lowest,
+			    lowest_vf) - 1
+	   : wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
+			     lowest_vf) - 1);
+      if (main_vinfo)
+	{
+	  unsigned int bound;
+	  poly_uint64 main_iters
+	    = upper_bound (LOOP_VINFO_VECT_FACTOR (main_vinfo),
+			   LOOP_VINFO_COST_MODEL_THRESHOLD (main_vinfo));
+	  main_iters
+	    = upper_bound (main_iters,
+			   LOOP_VINFO_VERSIONING_THRESHOLD (main_vinfo));
+	  if (can_div_away_from_zero_p (main_iters,
+					LOOP_VINFO_VECT_FACTOR (loop_vinfo),
+					&bound))
+	    loop->nb_iterations_upper_bound
+	      = wi::umin ((widest_int) (bound - 1),
+			  loop->nb_iterations_upper_bound);
+	}
+    }
   if (loop->any_likely_upper_bound)
     loop->nb_iterations_likely_upper_bound
       = (final_iter_may_be_partial


* Re: [PATCH][vect] Use main loop's thresholds and vectorization factor to narrow upper_bound of epilogue
  2021-05-24  6:17 [PATCH][vect] Use main loop's thresholds and vectorization factor to narrow upper_bound of epilogue Andre Vieira (lists)
@ 2021-05-24  7:21 ` Kewen.Lin
  2021-05-25  8:42   ` Kewen.Lin
  2021-05-24 10:30 ` Richard Sandiford
  1 sibling, 1 reply; 5+ messages in thread
From: Kewen.Lin @ 2021-05-24  7:21 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: Richard Sandiford, Richard Biener, gcc-patches

Hi Andre,

on 2021/5/24 2:17 PM, Andre Vieira (lists) via Gcc-patches wrote:
> Hi,
> 
> When vectorizing with --param vect-partial-vector-usage=1 the vectorizer uses an unpredicated (all-true predicate for SVE) main loop and a predicated tail loop. The way this was implemented seems to mean it re-uses the same vector-mode for both loops, which means the tail loop isn't an actual loop but only executes one iteration.
> 
> This patch uses the knowledge of the conditions to enter an epilogue loop to help come up with a potentially more restrictive upper bound.
> 
> Regression tested on aarch64-linux-gnu and also ran the testsuite using '--param vect-partial-vector-usage=1' detecting no ICEs and no execution failures.
> 
> Would be good to have this tested for PPC too as I believe they are the main users of the --param vect-partial-vector-usage=1 option. Can someone help me test (and maybe even benchmark?) this on a PPC target?
> 


Thanks for doing this!  I can test it on Power10, which enables this parameter
by default, and also evaluate its impact on SPEC2017 Ofast/unroll.

Do you have any preference for the baseline commit?  I'll use r12-0 if it's fine.

BR,
Kewen


* Re: [PATCH][vect] Use main loop's thresholds and vectorization factor to narrow upper_bound of epilogue
  2021-05-24  6:17 [PATCH][vect] Use main loop's thresholds and vectorization factor to narrow upper_bound of epilogue Andre Vieira (lists)
  2021-05-24  7:21 ` Kewen.Lin
@ 2021-05-24 10:30 ` Richard Sandiford
  1 sibling, 0 replies; 5+ messages in thread
From: Richard Sandiford @ 2021-05-24 10:30 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc-patches, Richard Biener

"Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
> Hi,
>
> When vectorizing with --param vect-partial-vector-usage=1 the vectorizer 
> uses an unpredicated (all-true predicate for SVE) main loop and a 
> predicated tail loop. The way this was implemented seems to mean it 
> re-uses the same vector-mode for both loops, which means the tail loop 
> isn't an actual loop but only executes one iteration.
>
> This patch uses the knowledge of the conditions to enter an epilogue 
> loop to help come up with a potentially more restrictive upper bound.
>
> Regression tested on aarch64-linux-gnu and also ran the testsuite using 
> '--param vect-partial-vector-usage=1' detecting no ICEs and no execution 
> failures.
>
> Would be good to have this tested for PPC too as I believe they are the 
> main users of the --param vect-partial-vector-usage=1 option. Can 
> someone help me test (and maybe even benchmark?) this on a PPC target?
>
> Kind regards,
> Andre

LGTM.  OK if no objections and if the Power testing comes back clean.

Thanks,
Richard

> gcc/ChangeLog:
>
>          * tree-vect-loop.c (vect_transform_loop): Use main loop's various
>          thresholds to narrow the upper bound on epilogue iterations.
>
> gcc/testsuite/ChangeLog:
>
>          * gcc.target/aarch64/sve/part_vect_single_iter_epilog.c: New test.
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/part_vect_single_iter_epilog.c b/gcc/testsuite/gcc.target/aarch64/sve/part_vect_single_iter_epilog.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..a03229eb55585f637ebd5288fb4c00f8f921d44c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/part_vect_single_iter_epilog.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 --param vect-partial-vector-usage=1" } */
> +
> +void
> +foo (short * __restrict__ a, short * __restrict__ b, short * __restrict__ c, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    c[i] = a[i] + b[i];
> +}
> +
> +/* { dg-final { scan-assembler-times {\twhilelo\tp[0-9]+.h, wzr, [xw][0-9]+} 1 } } */
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 3e973e774af8f9205be893e01ad9263281116885..81e9c5cc42415a0a92b765bc46640105670c4e6b 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -9723,12 +9723,31 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
>    /* In these calculations the "- 1" converts loop iteration counts
>       back to latch counts.  */
>    if (loop->any_upper_bound)
> -    loop->nb_iterations_upper_bound
> -      = (final_iter_may_be_partial
> -	 ? wi::udiv_ceil (loop->nb_iterations_upper_bound + bias_for_lowest,
> -			  lowest_vf) - 1
> -	 : wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
> -			   lowest_vf) - 1);
> +    {
> +      loop_vec_info main_vinfo = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
> +      loop->nb_iterations_upper_bound
> +	= (final_iter_may_be_partial
> +	   ? wi::udiv_ceil (loop->nb_iterations_upper_bound + bias_for_lowest,
> +			    lowest_vf) - 1
> +	   : wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
> +			     lowest_vf) - 1);
> +      if (main_vinfo)
> +	{
> +	  unsigned int bound;
> +	  poly_uint64 main_iters
> +	    = upper_bound (LOOP_VINFO_VECT_FACTOR (main_vinfo),
> +			   LOOP_VINFO_COST_MODEL_THRESHOLD (main_vinfo));
> +	  main_iters
> +	    = upper_bound (main_iters,
> +			   LOOP_VINFO_VERSIONING_THRESHOLD (main_vinfo));
> +	  if (can_div_away_from_zero_p (main_iters,
> +					LOOP_VINFO_VECT_FACTOR (loop_vinfo),
> +					&bound))
> +	    loop->nb_iterations_upper_bound
> +	      = wi::umin ((widest_int) (bound - 1),
> +			  loop->nb_iterations_upper_bound);
> +	}
> +    }
>    if (loop->any_likely_upper_bound)
>      loop->nb_iterations_likely_upper_bound
>        = (final_iter_may_be_partial


* Re: [PATCH][vect] Use main loop's thresholds and vectorization factor to narrow upper_bound of epilogue
  2021-05-24  7:21 ` Kewen.Lin
@ 2021-05-25  8:42   ` Kewen.Lin
  2021-06-03 12:42     ` Andre Vieira (lists)
  0 siblings, 1 reply; 5+ messages in thread
From: Kewen.Lin @ 2021-05-25  8:42 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: Richard Sandiford, Richard Biener, gcc-patches

on 2021/5/24 3:21 PM, Kewen.Lin via Gcc-patches wrote:
> Hi Andre,
> 
> on 2021/5/24 2:17 PM, Andre Vieira (lists) via Gcc-patches wrote:
>> Hi,
>>
>> When vectorizing with --param vect-partial-vector-usage=1 the vectorizer uses an unpredicated (all-true predicate for SVE) main loop and a predicated tail loop. The way this was implemented seems to mean it re-uses the same vector-mode for both loops, which means the tail loop isn't an actual loop but only executes one iteration.
>>
>> This patch uses the knowledge of the conditions to enter an epilogue loop to help come up with a potentially more restrictive upper bound.
>>
>> Regression tested on aarch64-linux-gnu and also ran the testsuite using '--param vect-partial-vector-usage=1' detecting no ICEs and no execution failures.
>>
>> Would be good to have this tested for PPC too as I believe they are the main users of the --param vect-partial-vector-usage=1 option. Can someone help me test (and maybe even benchmark?) this on a PPC target?
>>
> 
> 
> Thanks for doing this!  I can test it on Power10, which enables this parameter
> by default, and also evaluate its impact on SPEC2017 Ofast/unroll.
> 

Bootstrapped/regtested on powerpc64le-linux-gnu Power10.
SPEC2017 run didn't show any remarkable improvement/degradation.

BR,
Kewen


* Re: [PATCH][vect] Use main loop's thresholds and vectorization factor to narrow upper_bound of epilogue
  2021-05-25  8:42   ` Kewen.Lin
@ 2021-06-03 12:42     ` Andre Vieira (lists)
  0 siblings, 0 replies; 5+ messages in thread
From: Andre Vieira (lists) @ 2021-06-03 12:42 UTC (permalink / raw)
  To: Kewen.Lin; +Cc: Richard Sandiford, Richard Biener, gcc-patches

Thank you Kewen!!

I will apply this now.

BR,
Andre

On 25/05/2021 09:42, Kewen.Lin wrote:
> on 2021/5/24 3:21 PM, Kewen.Lin via Gcc-patches wrote:
>> Hi Andre,
>>
>> on 2021/5/24 2:17 PM, Andre Vieira (lists) via Gcc-patches wrote:
>>> Hi,
>>>
>>> When vectorizing with --param vect-partial-vector-usage=1 the vectorizer uses an unpredicated (all-true predicate for SVE) main loop and a predicated tail loop. The way this was implemented seems to mean it re-uses the same vector-mode for both loops, which means the tail loop isn't an actual loop but only executes one iteration.
>>>
>>> This patch uses the knowledge of the conditions to enter an epilogue loop to help come up with a potentially more restrictive upper bound.
>>>
>>> Regression tested on aarch64-linux-gnu and also ran the testsuite using '--param vect-partial-vector-usage=1' detecting no ICEs and no execution failures.
>>>
>>> Would be good to have this tested for PPC too as I believe they are the main users of the --param vect-partial-vector-usage=1 option. Can someone help me test (and maybe even benchmark?) this on a PPC target?
>>>
>>
>> Thanks for doing this!  I can test it on Power10, which enables this parameter
>> by default, and also evaluate its impact on SPEC2017 Ofast/unroll.
>>
> Bootstrapped/regtested on powerpc64le-linux-gnu Power10.
> SPEC2017 run didn't show any remarkable improvement/degradation.
>
> BR,
> Kewen
