public inbox for gcc-patches@gcc.gnu.org
* [PATCH] vect: Don't vectorize a single scalar iteration loop [PR110740]
@ 2023-07-21  6:03 Kewen.Lin
From: Kewen.Lin @ 2023-07-21  6:03 UTC (permalink / raw)
  To: GCC Patches
  Cc: Richard Biener, Richard Sandiford, Peter Bergner, Segher Boessenkool

Hi,

The function vect_update_epilogue_niters, which was removed
by r14-2281, contained code ensuring that if only one scalar
iteration is left for the epilogue, we do not try to
vectorize it any more.

Although costing should eventually be able to take care of
this, I think we still want this special case when costing is
not enabled.  So this patch adds the check back in function
vect_analyze_loop_costing and, as Richi suggested, makes it
more general so that it covers both main and epilogue loops.
It fixes the following failures exposed on Power10:

 - gcc.target/powerpc/p9-vec-length-epil-{1,8}.c
 - gcc.dg/vect/slp-perm-{1,5,6,7}.c

Bootstrapped and regtested on x86_64-redhat-linux,
aarch64-linux-gnu, powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9/P10.

Is it ok for trunk?

BR,
Kewen
-----
	PR tree-optimization/110740

gcc/ChangeLog:

	* tree-vect-loop.cc (vect_analyze_loop_costing): Do not vectorize a
	loop with a single scalar iteration.
---
 gcc/tree-vect-loop.cc | 55 ++++++++++++++++++++++++++-----------------
 1 file changed, 34 insertions(+), 21 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index b44fb9c7712..92d2abde094 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -2158,8 +2158,7 @@ vect_analyze_loop_costing (loop_vec_info loop_vinfo,
      epilogue we can also decide whether the main loop leaves us
      with enough iterations, prefering a smaller vector epilog then
      also possibly used for the case we skip the vector loop.  */
-  if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
-      && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
+  if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
     {
       widest_int scalar_niters
 	= wi::to_widest (LOOP_VINFO_NITERSM1 (loop_vinfo)) + 1;
@@ -2182,32 +2181,46 @@ vect_analyze_loop_costing (loop_vec_info loop_vinfo,
 			       % lowest_vf + gap);
 	    }
 	}
-
-      /* Check that the loop processes at least one full vector.  */
-      poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
-      if (known_lt (scalar_niters, vf))
+      /* Reject vectorizing for a single scalar iteration, even if
+	 we could in principle implement that using partial vectors.  */
+      unsigned peeling_gap = LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo);
+      if (scalar_niters <= peeling_gap + 1)
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			     "loop does not have enough iterations "
-			     "to support vectorization.\n");
+			     "not vectorized: loop only has a single "
+			     "scalar iteration.\n");
 	  return 0;
 	}

-      /* If we need to peel an extra epilogue iteration to handle data
-	 accesses with gaps, check that there are enough scalar iterations
-	 available.
-
-	 The check above is redundant with this one when peeling for gaps,
-	 but the distinction is useful for diagnostics.  */
-      if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
-	  && known_le (scalar_niters, vf))
+      if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
 	{
-	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			     "loop does not have enough iterations "
-			     "to support peeling for gaps.\n");
-	  return 0;
+	  /* Check that the loop processes at least one full vector.  */
+	  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+	  if (known_lt (scalar_niters, vf))
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "loop does not have enough iterations "
+				 "to support vectorization.\n");
+	      return 0;
+	    }
+
+	  /* If we need to peel an extra epilogue iteration to handle data
+	     accesses with gaps, check that there are enough scalar iterations
+	     available.
+
+	     The check above is redundant with this one when peeling for gaps,
+	     but the distinction is useful for diagnostics.  */
+	  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
+	      && known_le (scalar_niters, vf))
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "loop does not have enough iterations "
+				 "to support peeling for gaps.\n");
+	      return 0;
+	    }
 	}
     }

--
2.39.3


* Re: [PATCH] vect: Don't vectorize a single scalar iteration loop [PR110740]
@ 2023-07-21 11:49 Richard Biener
From: Richard Biener @ 2023-07-21 11:49 UTC (permalink / raw)
  To: Kewen.Lin
  Cc: GCC Patches, Richard Sandiford, Peter Bergner, Segher Boessenkool

On Fri, Jul 21, 2023 at 8:08 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
> Is it ok for trunk?

OK.

Thanks,
Richard.

> BR,
> Kewen
> -----
>         PR tree-optimization/110740
>
> gcc/ChangeLog:
>
>         * tree-vect-loop.cc (vect_analyze_loop_costing): Do not vectorize a
>         loop with a single scalar iteration.
> ---
>  gcc/tree-vect-loop.cc | 55 ++++++++++++++++++++++++++-----------------
>  1 file changed, 34 insertions(+), 21 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index b44fb9c7712..92d2abde094 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -2158,8 +2158,7 @@ vect_analyze_loop_costing (loop_vec_info loop_vinfo,
>       epilogue we can also decide whether the main loop leaves us
>       with enough iterations, prefering a smaller vector epilog then
>       also possibly used for the case we skip the vector loop.  */
> -  if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
> -      && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
> +  if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>      {
>        widest_int scalar_niters
>         = wi::to_widest (LOOP_VINFO_NITERSM1 (loop_vinfo)) + 1;
> @@ -2182,32 +2181,46 @@ vect_analyze_loop_costing (loop_vec_info loop_vinfo,
>                                % lowest_vf + gap);
>             }
>         }
> -
> -      /* Check that the loop processes at least one full vector.  */
> -      poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> -      if (known_lt (scalar_niters, vf))
> +      /* Reject vectorizing for a single scalar iteration, even if
> +        we could in principle implement that using partial vectors.  */
> +      unsigned peeling_gap = LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo);
> +      if (scalar_niters <= peeling_gap + 1)
>         {
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                            "loop does not have enough iterations "
> -                            "to support vectorization.\n");
> +                            "not vectorized: loop only has a single "
> +                            "scalar iteration.\n");
>           return 0;
>         }
>
> -      /* If we need to peel an extra epilogue iteration to handle data
> -        accesses with gaps, check that there are enough scalar iterations
> -        available.
> -
> -        The check above is redundant with this one when peeling for gaps,
> -        but the distinction is useful for diagnostics.  */
> -      if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> -         && known_le (scalar_niters, vf))
> +      if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
>         {
> -         if (dump_enabled_p ())
> -           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                            "loop does not have enough iterations "
> -                            "to support peeling for gaps.\n");
> -         return 0;
> +         /* Check that the loop processes at least one full vector.  */
> +         poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> +         if (known_lt (scalar_niters, vf))
> +           {
> +             if (dump_enabled_p ())
> +               dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                                "loop does not have enough iterations "
> +                                "to support vectorization.\n");
> +             return 0;
> +           }
> +
> +         /* If we need to peel an extra epilogue iteration to handle data
> +            accesses with gaps, check that there are enough scalar iterations
> +            available.
> +
> +            The check above is redundant with this one when peeling for gaps,
> +            but the distinction is useful for diagnostics.  */
> +         if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> +             && known_le (scalar_niters, vf))
> +           {
> +             if (dump_enabled_p ())
> +               dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                                "loop does not have enough iterations "
> +                                "to support peeling for gaps.\n");
> +             return 0;
> +           }
>         }
>      }
>
> --
> 2.39.3


* Re: [PATCH] vect: Don't vectorize a single scalar iteration loop [PR110740]
@ 2023-07-24  6:28 Kewen.Lin
From: Kewen.Lin @ 2023-07-24  6:28 UTC (permalink / raw)
  To: Richard Biener
  Cc: GCC Patches, Richard Sandiford, Peter Bergner, Segher Boessenkool

on 2023/7/21 19:49, Richard Biener wrote:
> On Fri, Jul 21, 2023 at 8:08 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>> Is it ok for trunk?
> 
> OK.
> 

Thanks Richi, pushed as r14-2736.

BR,
Kewen

