public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] tree-optimization/110979 - fold-left reduction and partial vectors
@ 2023-08-11 11:06 Richard Biener
  2023-08-11 11:24 ` Alexander Monakov
  2023-08-11 11:45 ` Richard Sandiford
  0 siblings, 2 replies; 5+ messages in thread
From: Richard Biener @ 2023-08-11 11:06 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford, amonakov

When we vectorize fold-left reductions with partial vectors but
no target operation available we use a vector conditional to force
excess elements to zero.  But that doesn't correctly preserve
the sign of zero.  The following patch disables partial vector
support in that case.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Does this look OK?  With -frounding-math -fno-signed-zeros we are
happily using the masking again, but that's OK, right?  An additional
+ 0.0 shouldn't do anything here.

Thanks,
Richard.

	PR tree-optimization/110979
	* tree-vect-loop.cc (vectorizable_reduction): For
	FOLD_LEFT_REDUCTION without target support make sure
	we don't need to honor signed zeros.

	* gcc.dg/torture/pr110979.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr110979.c | 25 +++++++++++++++++++++++++
 gcc/tree-vect-loop.cc                   | 11 +++++++++++
 2 files changed, 36 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr110979.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr110979.c b/gcc/testsuite/gcc.dg/torture/pr110979.c
new file mode 100644
index 00000000000..c25ad7a8a31
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr110979.c
@@ -0,0 +1,25 @@
+/* { dg-do run } */
+/* { dg-additional-options "--param vect-partial-vector-usage=2" } */
+
+#define FLT double
+#define N 20
+
+__attribute__((noipa))
+FLT
+foo3 (FLT *a)
+{
+  FLT sum = -0.0;
+  for (int i = 0; i != N; i++)
+    sum += a[i];
+  return sum;
+}
+
+int main()
+{
+  FLT a[N];
+  for (int i = 0; i != N; i++)
+    a[i] = -0.0;
+  if (!__builtin_signbit(foo3(a)))
+    __builtin_abort();
+  return 0;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index bf8d677b584..741b5c20389 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -8037,6 +8037,17 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
 			     " no conditional operation is available.\n");
 	  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
 	}
+      else if (reduction_type == FOLD_LEFT_REDUCTION
+	       && reduc_fn == IFN_LAST
+	       && FLOAT_TYPE_P (vectype_in)
+	       && HONOR_SIGNED_ZEROS (vectype_in))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "can't operate on partial vectors because"
+			     " signed zeros need to be handled.\n");
+	  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
+	}
       else
 	{
 	  internal_fn mask_reduc_fn
-- 
2.35.3

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] tree-optimization/110979 - fold-left reduction and partial vectors
  2023-08-11 11:06 [PATCH] tree-optimization/110979 - fold-left reduction and partial vectors Richard Biener
@ 2023-08-11 11:24 ` Alexander Monakov
  2023-08-11 12:11   ` Richard Biener
  2023-08-11 11:45 ` Richard Sandiford
  1 sibling, 1 reply; 5+ messages in thread
From: Alexander Monakov @ 2023-08-11 11:24 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 1112 bytes --]


On Fri, 11 Aug 2023, Richard Biener wrote:

> When we vectorize fold-left reductions with partial vectors but
> no target operation available we use a vector conditional to force
> excess elements to zero.  But that doesn't correctly preserve
> the sign of zero.  The following patch disables partial vector
> support in that case.
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> 
> Does this look OK?  With -frounding-math -fno-signed-zeros we are
> happily using the masking again, but that's OK, right?  An additional
> + 0.0 shouldn't do anything here.

I think it converts SNan to QNan (when the partial vector has just one
element which is SNan), so is a test for -fsignaling-nans missing?

In the defaut -fno-rounding-math -fno-signaling-nans mode I think we
can do the reduction by substituting negative zero for masked-off
elements — maybe it's worth diagnosing that case separately (i.e.
as "not yet implemented", not an incorrect transform)?

(note that in avx512 it's possible to materialize negative zeroes
by mask with a single vpternlog instruction, which is cheap)

Alexander

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] tree-optimization/110979 - fold-left reduction and partial vectors
  2023-08-11 11:06 [PATCH] tree-optimization/110979 - fold-left reduction and partial vectors Richard Biener
  2023-08-11 11:24 ` Alexander Monakov
@ 2023-08-11 11:45 ` Richard Sandiford
  1 sibling, 0 replies; 5+ messages in thread
From: Richard Sandiford @ 2023-08-11 11:45 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, amonakov

Richard Biener <rguenther@suse.de> writes:
> When we vectorize fold-left reductions with partial vectors but
> no target operation available we use a vector conditional to force
> excess elements to zero.  But that doesn't correctly preserve
> the sign of zero.  The following patch disables partial vector
> support in that case.
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>
> Does this look OK?

LGTM.

> With -frounding-math -fno-signed-zeros we are
> happily using the masking again, but that's OK, right?  An additional
> + 0.0 shouldn't do anything here.

Yeah, I would hope so.

Thanks,
Richard

>
> Thanks,
> Richard.
>
> 	PR tree-optimization/110979
> 	* tree-vect-loop.cc (vectorizable_reduction): For
> 	FOLD_LEFT_REDUCTION without target support make sure
> 	we don't need to honor signed zeros.
>
> 	* gcc.dg/torture/pr110979.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/torture/pr110979.c | 25 +++++++++++++++++++++++++
>  gcc/tree-vect-loop.cc                   | 11 +++++++++++
>  2 files changed, 36 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr110979.c
>
> diff --git a/gcc/testsuite/gcc.dg/torture/pr110979.c b/gcc/testsuite/gcc.dg/torture/pr110979.c
> new file mode 100644
> index 00000000000..c25ad7a8a31
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr110979.c
> @@ -0,0 +1,25 @@
> +/* { dg-do run } */
> +/* { dg-additional-options "--param vect-partial-vector-usage=2" } */
> +
> +#define FLT double
> +#define N 20
> +
> +__attribute__((noipa))
> +FLT
> +foo3 (FLT *a)
> +{
> +  FLT sum = -0.0;
> +  for (int i = 0; i != N; i++)
> +    sum += a[i];
> +  return sum;
> +}
> +
> +int main()
> +{
> +  FLT a[N];
> +  for (int i = 0; i != N; i++)
> +    a[i] = -0.0;
> +  if (!__builtin_signbit(foo3(a)))
> +    __builtin_abort();
> +  return 0;
> +}
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index bf8d677b584..741b5c20389 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -8037,6 +8037,17 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>  			     " no conditional operation is available.\n");
>  	  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
>  	}
> +      else if (reduction_type == FOLD_LEFT_REDUCTION
> +	       && reduc_fn == IFN_LAST
> +	       && FLOAT_TYPE_P (vectype_in)
> +	       && HONOR_SIGNED_ZEROS (vectype_in))
> +	{
> +	  if (dump_enabled_p ())
> +	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			     "can't operate on partial vectors because"
> +			     " signed zeros need to be handled.\n");
> +	  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
> +	}
>        else
>  	{
>  	  internal_fn mask_reduc_fn

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] tree-optimization/110979 - fold-left reduction and partial vectors
  2023-08-11 11:24 ` Alexander Monakov
@ 2023-08-11 12:11   ` Richard Biener
  2023-08-11 13:34     ` Alexander Monakov
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Biener @ 2023-08-11 12:11 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: gcc-patches, richard.sandiford

On Fri, 11 Aug 2023, Alexander Monakov wrote:

> 
> On Fri, 11 Aug 2023, Richard Biener wrote:
> 
> > When we vectorize fold-left reductions with partial vectors but
> > no target operation available we use a vector conditional to force
> > excess elements to zero.  But that doesn't correctly preserve
> > the sign of zero.  The following patch disables partial vector
> > support in that case.
> > 
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> > 
> > Does this look OK?  With -frounding-math -fno-signed-zeros we are
> > happily using the masking again, but that's OK, right?  An additional
> > + 0.0 shouldn't do anything here.
> 
> I think it converts SNan to QNan (when the partial vector has just one
> element which is SNan), so is a test for -fsignaling-nans missing?

Hm, I guess that's a corner case that could happen when there's no
runtime profitability check on more than one element and when the
element accumulated is directly loaded from memory.  OTOH the
loop vectorizer always expects an initial value for the reduction
and thus we perform either no add (when the loop isn't entered)
or at least a single add (when it is).  So I think this particular
situation cannot occur?

> In the defaut -fno-rounding-math -fno-signaling-nans mode I think we
> can do the reduction by substituting negative zero for masked-off
> elements ? maybe it's worth diagnosing that case separately (i.e.
> as "not yet implemented", not an incorrect transform)?

Ah, that's interesting.  So the only case we can't handle is
-frounding-math -fsigned-zeros then.  I'll see to adjust the patch
accordingly, like the following incremental patch:

> (note that in avx512 it's possible to materialize negative zeroes
> by mask with a single vpternlog instruction, which is cheap)

It ends up loading the { -0.0, ... } constant from memory, the
{ 0.0, ... } mask is handled by using a zero-masked load, so
indeed cheaper.

Richard.

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 741b5c20389..bc3063c3615 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6905,7 +6905,17 @@ vectorize_fold_left_reduction (loop_vec_info 
loop_vinfo,
 
   tree vector_identity = NULL_TREE;
   if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
-    vector_identity = build_zero_cst (vectype_out);
+    {
+      vector_identity = build_zero_cst (vectype_out);
+      if (!HONOR_SIGNED_ZEROS (vectype_out))
+       ;
+      else
+       {
+         gcc_assert (!HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out));
+         vector_identity = const_unop (NEGATE_EXPR, vectype_out,
+                                       vector_identity);
+       }
+    }
 
   tree scalar_dest_var = vect_create_destination_var (scalar_dest, NULL);
   int i;
@@ -8040,12 +8050,13 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
       else if (reduction_type == FOLD_LEFT_REDUCTION
               && reduc_fn == IFN_LAST
               && FLOAT_TYPE_P (vectype_in)
-              && HONOR_SIGNED_ZEROS (vectype_in))
+              && HONOR_SIGNED_ZEROS (vectype_in)
+              && HONOR_SIGN_DEPENDENT_ROUNDING (vectype_in))
        {
          if (dump_enabled_p ())
            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                             "can't operate on partial vectors because"
-                            " signed zeros need to be handled.\n");
+                            " signed zeros cannot be preserved.\n");
          LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
        }
       else


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] tree-optimization/110979 - fold-left reduction and partial vectors
  2023-08-11 12:11   ` Richard Biener
@ 2023-08-11 13:34     ` Alexander Monakov
  0 siblings, 0 replies; 5+ messages in thread
From: Alexander Monakov @ 2023-08-11 13:34 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, richard.sandiford


On Fri, 11 Aug 2023, Richard Biener wrote:

> > I think it converts SNan to QNan (when the partial vector has just one
> > element which is SNan), so is a test for -fsignaling-nans missing?
> 
> Hm, I guess that's a corner case that could happen when there's no
> runtime profitability check on more than one element and when the
> element accumulated is directly loaded from memory.  OTOH the
> loop vectorizer always expects an initial value for the reduction
> and thus we perform either no add (when the loop isn't entered)
> or at least a single add (when it is).  So I think this particular
> situation cannot occur?

Yes, that makes sense, thanks for the elaboration.
(it's a bit subtle so maybe worth a comment? not sure)

> > In the defaut -fno-rounding-math -fno-signaling-nans mode I think we
> > can do the reduction by substituting negative zero for masked-off
> > elements ? maybe it's worth diagnosing that case separately (i.e.
> > as "not yet implemented", not an incorrect transform)?
> 
> Ah, that's interesting.  So the only case we can't handle is
> -frounding-math -fsigned-zeros then.  I'll see to adjust the patch
> accordingly, like the following incremental patch:

Yeah, nice!

> > (note that in avx512 it's possible to materialize negative zeroes
> > by mask with a single vpternlog instruction, which is cheap)
> 
> It ends up loading the { -0.0, ... } constant from memory, the
> { 0.0, ... } mask is handled by using a zero-masked load, so
> indeed cheaper.

I was thinking it could be easily done without a memory load,
but got confused, sorry.

Alexander

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-08-11 13:34 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-11 11:06 [PATCH] tree-optimization/110979 - fold-left reduction and partial vectors Richard Biener
2023-08-11 11:24 ` Alexander Monakov
2023-08-11 12:11   ` Richard Biener
2023-08-11 13:34     ` Alexander Monakov
2023-08-11 11:45 ` Richard Sandiford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).