public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] tree-optimization/111048 - avoid flawed logic in fold_vec_perm
@ 2023-08-18  7:46 Richard Biener
  2023-08-18  8:39 ` Richard Sandiford
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Biener @ 2023-08-18  7:46 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford

The following avoids running into somehow flawed logic in fold_vec_perm
for non-VLA vectors.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

	PR tree-optimization/111048
	* fold-const.cc (fold_vec_perm_cst): Check for non-VLA
	vectors first.

	* gcc.dg/torture/pr111048.c: New testcase.
---
 gcc/fold-const.cc                       | 12 ++++++------
 gcc/testsuite/gcc.dg/torture/pr111048.c | 24 ++++++++++++++++++++++++
 2 files changed, 30 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr111048.c

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 5c51c9d91be..144fd7481b3 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -10625,6 +10625,11 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
   unsigned res_npatterns, res_nelts_per_pattern;
   unsigned HOST_WIDE_INT res_nelts;
 
+  if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
+    {
+      res_npatterns = res_nelts;
+      res_nelts_per_pattern = 1;
+    }
   /* (1) If SEL is a suitable mask as determined by
      valid_mask_for_fold_vec_perm_cst_p, then:
      res_npatterns = max of npatterns between ARG0, ARG1, and SEL
@@ -10634,7 +10639,7 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
      res_npatterns = nelts in result vector.
      res_nelts_per_pattern = 1.
      This exception is made so that VLS ARG0, ARG1 and SEL work as before.  */
-  if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
+  else if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
     {
       res_npatterns
 	= std::max (VECTOR_CST_NPATTERNS (arg0),
@@ -10648,11 +10653,6 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
 
       res_nelts = res_npatterns * res_nelts_per_pattern;
     }
-  else if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
-    {
-      res_npatterns = res_nelts;
-      res_nelts_per_pattern = 1;
-    }
   else
     return NULL_TREE;
 
diff --git a/gcc/testsuite/gcc.dg/torture/pr111048.c b/gcc/testsuite/gcc.dg/torture/pr111048.c
new file mode 100644
index 00000000000..475978aae2b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr111048.c
@@ -0,0 +1,24 @@
+/* { dg-do run } */
+/* { dg-additional-options "-mavx2" { target avx2_runtime } } */
+
+typedef unsigned char u8;
+
+__attribute__((noipa))
+static void check(const u8 * v) {
+    if (*v != 15) __builtin_trap();
+}
+
+__attribute__((noipa))
+static void bug(void) {
+    u8 in_lanes[32];
+    for (unsigned i = 0; i < 32; i += 2) {
+      in_lanes[i + 0] = 0;
+      in_lanes[i + 1] = ((u8)0xff) >> (i & 7);
+    }
+
+    check(&in_lanes[13]);
+  }
+
+int main() {
+    bug();
+}
-- 
2.35.3

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] tree-optimization/111048 - avoid flawed logic in fold_vec_perm
  2023-08-18  7:46 [PATCH] tree-optimization/111048 - avoid flawed logic in fold_vec_perm Richard Biener
@ 2023-08-18  8:39 ` Richard Sandiford
  2023-08-18  9:22   ` Richard Biener
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Sandiford @ 2023-08-18  8:39 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, prathamesh.kulkarni

Richard Biener <rguenther@suse.de> writes:
> The following avoids running into somehow flawed logic in fold_vec_perm
> for non-VLA vectors.
>
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>
> Richard.
>
> 	PR tree-optimization/111048
> 	* fold-const.cc (fold_vec_perm_cst): Check for non-VLA
> 	vectors first.
>
> 	* gcc.dg/torture/pr111048.c: New testcase.

Please don't do this as a permanent thing.  It was a deliberate choice
to have the is_constant be the fallback, so that the "generic" (VLA+VLS)
logic gets more coverage.  Like you say, if something is wrong for VLS
then the chances are that it's also wrong for VLA.

Thanks,
Richard


> ---
>  gcc/fold-const.cc                       | 12 ++++++------
>  gcc/testsuite/gcc.dg/torture/pr111048.c | 24 ++++++++++++++++++++++++
>  2 files changed, 30 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr111048.c
>
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 5c51c9d91be..144fd7481b3 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -10625,6 +10625,11 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
>    unsigned res_npatterns, res_nelts_per_pattern;
>    unsigned HOST_WIDE_INT res_nelts;
>  
> +  if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
> +    {
> +      res_npatterns = res_nelts;
> +      res_nelts_per_pattern = 1;
> +    }
>    /* (1) If SEL is a suitable mask as determined by
>       valid_mask_for_fold_vec_perm_cst_p, then:
>       res_npatterns = max of npatterns between ARG0, ARG1, and SEL
> @@ -10634,7 +10639,7 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
>       res_npatterns = nelts in result vector.
>       res_nelts_per_pattern = 1.
>       This exception is made so that VLS ARG0, ARG1 and SEL work as before.  */
> -  if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> +  else if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
>      {
>        res_npatterns
>  	= std::max (VECTOR_CST_NPATTERNS (arg0),
> @@ -10648,11 +10653,6 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
>  
>        res_nelts = res_npatterns * res_nelts_per_pattern;
>      }
> -  else if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
> -    {
> -      res_npatterns = res_nelts;
> -      res_nelts_per_pattern = 1;
> -    }
>    else
>      return NULL_TREE;
>  
> diff --git a/gcc/testsuite/gcc.dg/torture/pr111048.c b/gcc/testsuite/gcc.dg/torture/pr111048.c
> new file mode 100644
> index 00000000000..475978aae2b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr111048.c
> @@ -0,0 +1,24 @@
> +/* { dg-do run } */
> +/* { dg-additional-options "-mavx2" { target avx2_runtime } } */
> +
> +typedef unsigned char u8;
> +
> +__attribute__((noipa))
> +static void check(const u8 * v) {
> +    if (*v != 15) __builtin_trap();
> +}
> +
> +__attribute__((noipa))
> +static void bug(void) {
> +    u8 in_lanes[32];
> +    for (unsigned i = 0; i < 32; i += 2) {
> +      in_lanes[i + 0] = 0;
> +      in_lanes[i + 1] = ((u8)0xff) >> (i & 7);
> +    }
> +
> +    check(&in_lanes[13]);
> +  }
> +
> +int main() {
> +    bug();
> +}

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] tree-optimization/111048 - avoid flawed logic in fold_vec_perm
  2023-08-18  8:39 ` Richard Sandiford
@ 2023-08-18  9:22   ` Richard Biener
  2023-08-19 10:35     ` Prathamesh Kulkarni
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Biener @ 2023-08-18  9:22 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, prathamesh.kulkarni

On Fri, 18 Aug 2023, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > The following avoids running into somehow flawed logic in fold_vec_perm
> > for non-VLA vectors.
> >
> > Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> >
> > Richard.
> >
> > 	PR tree-optimization/111048
> > 	* fold-const.cc (fold_vec_perm_cst): Check for non-VLA
> > 	vectors first.
> >
> > 	* gcc.dg/torture/pr111048.c: New testcase.
> 
> Please don't do this as a permanent thing.  It was a deliberate choice
> to have the is_constant be the fallback, so that the "generic" (VLA+VLS)
> logic gets more coverage.  Like you say, if something is wrong for VLS
> then the chances are that it's also wrong for VLA.

Sure, feel free to undo this change together with the fix for the
VLA case.

Richard.

> Thanks,
> Richard
> 
> 
> > ---
> >  gcc/fold-const.cc                       | 12 ++++++------
> >  gcc/testsuite/gcc.dg/torture/pr111048.c | 24 ++++++++++++++++++++++++
> >  2 files changed, 30 insertions(+), 6 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/torture/pr111048.c
> >
> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > index 5c51c9d91be..144fd7481b3 100644
> > --- a/gcc/fold-const.cc
> > +++ b/gcc/fold-const.cc
> > @@ -10625,6 +10625,11 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
> >    unsigned res_npatterns, res_nelts_per_pattern;
> >    unsigned HOST_WIDE_INT res_nelts;
> >  
> > +  if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
> > +    {
> > +      res_npatterns = res_nelts;
> > +      res_nelts_per_pattern = 1;
> > +    }
> >    /* (1) If SEL is a suitable mask as determined by
> >       valid_mask_for_fold_vec_perm_cst_p, then:
> >       res_npatterns = max of npatterns between ARG0, ARG1, and SEL
> > @@ -10634,7 +10639,7 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
> >       res_npatterns = nelts in result vector.
> >       res_nelts_per_pattern = 1.
> >       This exception is made so that VLS ARG0, ARG1 and SEL work as before.  */
> > -  if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> > +  else if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> >      {
> >        res_npatterns
> >  	= std::max (VECTOR_CST_NPATTERNS (arg0),
> > @@ -10648,11 +10653,6 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
> >  
> >        res_nelts = res_npatterns * res_nelts_per_pattern;
> >      }
> > -  else if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
> > -    {
> > -      res_npatterns = res_nelts;
> > -      res_nelts_per_pattern = 1;
> > -    }
> >    else
> >      return NULL_TREE;
> >  
> > diff --git a/gcc/testsuite/gcc.dg/torture/pr111048.c b/gcc/testsuite/gcc.dg/torture/pr111048.c
> > new file mode 100644
> > index 00000000000..475978aae2b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/torture/pr111048.c
> > @@ -0,0 +1,24 @@
> > +/* { dg-do run } */
> > +/* { dg-additional-options "-mavx2" { target avx2_runtime } } */
> > +
> > +typedef unsigned char u8;
> > +
> > +__attribute__((noipa))
> > +static void check(const u8 * v) {
> > +    if (*v != 15) __builtin_trap();
> > +}
> > +
> > +__attribute__((noipa))
> > +static void bug(void) {
> > +    u8 in_lanes[32];
> > +    for (unsigned i = 0; i < 32; i += 2) {
> > +      in_lanes[i + 0] = 0;
> > +      in_lanes[i + 1] = ((u8)0xff) >> (i & 7);
> > +    }
> > +
> > +    check(&in_lanes[13]);
> > +  }
> > +
> > +int main() {
> > +    bug();
> > +}
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] tree-optimization/111048 - avoid flawed logic in fold_vec_perm
  2023-08-18  9:22   ` Richard Biener
@ 2023-08-19 10:35     ` Prathamesh Kulkarni
  2023-08-21  6:56       ` Richard Biener
  0 siblings, 1 reply; 7+ messages in thread
From: Prathamesh Kulkarni @ 2023-08-19 10:35 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Sandiford, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4235 bytes --]

On Fri, 18 Aug 2023 at 14:52, Richard Biener <rguenther@suse.de> wrote:
>
> On Fri, 18 Aug 2023, Richard Sandiford wrote:
>
> > Richard Biener <rguenther@suse.de> writes:
> > > The following avoids running into somehow flawed logic in fold_vec_perm
> > > for non-VLA vectors.
> > >
> > > Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> > >
> > > Richard.
> > >
> > >     PR tree-optimization/111048
> > >     * fold-const.cc (fold_vec_perm_cst): Check for non-VLA
> > >     vectors first.
> > >
> > >     * gcc.dg/torture/pr111048.c: New testcase.
> >
> > Please don't do this as a permanent thing.  It was a deliberate choice
> > to have the is_constant be the fallback, so that the "generic" (VLA+VLS)
> > logic gets more coverage.  Like you say, if something is wrong for VLS
> > then the chances are that it's also wrong for VLA.
>
> Sure, feel free to undo this change together with the fix for the
> VLA case.
Hi,
The attached patch reverts the workaround, and fixes the issue.
Bootstrapped+tested on aarch64-linux-gnu with and without SVE, and
x64_64-linux-gnu.
OK to commit ?

Thanks,
Prathamesh
>
> Richard.
>
> > Thanks,
> > Richard
> >
> >
> > > ---
> > >  gcc/fold-const.cc                       | 12 ++++++------
> > >  gcc/testsuite/gcc.dg/torture/pr111048.c | 24 ++++++++++++++++++++++++
> > >  2 files changed, 30 insertions(+), 6 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.dg/torture/pr111048.c
> > >
> > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > > index 5c51c9d91be..144fd7481b3 100644
> > > --- a/gcc/fold-const.cc
> > > +++ b/gcc/fold-const.cc
> > > @@ -10625,6 +10625,11 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
> > >    unsigned res_npatterns, res_nelts_per_pattern;
> > >    unsigned HOST_WIDE_INT res_nelts;
> > >
> > > +  if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
> > > +    {
> > > +      res_npatterns = res_nelts;
> > > +      res_nelts_per_pattern = 1;
> > > +    }
> > >    /* (1) If SEL is a suitable mask as determined by
> > >       valid_mask_for_fold_vec_perm_cst_p, then:
> > >       res_npatterns = max of npatterns between ARG0, ARG1, and SEL
> > > @@ -10634,7 +10639,7 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
> > >       res_npatterns = nelts in result vector.
> > >       res_nelts_per_pattern = 1.
> > >       This exception is made so that VLS ARG0, ARG1 and SEL work as before.  */
> > > -  if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> > > +  else if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> > >      {
> > >        res_npatterns
> > >     = std::max (VECTOR_CST_NPATTERNS (arg0),
> > > @@ -10648,11 +10653,6 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
> > >
> > >        res_nelts = res_npatterns * res_nelts_per_pattern;
> > >      }
> > > -  else if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
> > > -    {
> > > -      res_npatterns = res_nelts;
> > > -      res_nelts_per_pattern = 1;
> > > -    }
> > >    else
> > >      return NULL_TREE;
> > >
> > > diff --git a/gcc/testsuite/gcc.dg/torture/pr111048.c b/gcc/testsuite/gcc.dg/torture/pr111048.c
> > > new file mode 100644
> > > index 00000000000..475978aae2b
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/torture/pr111048.c
> > > @@ -0,0 +1,24 @@
> > > +/* { dg-do run } */
> > > +/* { dg-additional-options "-mavx2" { target avx2_runtime } } */
> > > +
> > > +typedef unsigned char u8;
> > > +
> > > +__attribute__((noipa))
> > > +static void check(const u8 * v) {
> > > +    if (*v != 15) __builtin_trap();
> > > +}
> > > +
> > > +__attribute__((noipa))
> > > +static void bug(void) {
> > > +    u8 in_lanes[32];
> > > +    for (unsigned i = 0; i < 32; i += 2) {
> > > +      in_lanes[i + 0] = 0;
> > > +      in_lanes[i + 1] = ((u8)0xff) >> (i & 7);
> > > +    }
> > > +
> > > +    check(&in_lanes[13]);
> > > +  }
> > > +
> > > +int main() {
> > > +    bug();
> > > +}
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

[-- Attachment #2: gnu-887-1.txt --]
[-- Type: text/plain, Size: 4021 bytes --]

PR111048: Set arg_npatterns correctly.

In valid_mask_for_fold_vec_perm_cst we set arg_npatterns always
to VECTOR_CST_NPATTERNS (arg0) because of (q1 & 0) == 0 in
following condition:

     /* Ensure that the stepped sequence always selects from the same
         input pattern.  */
      unsigned arg_npatterns
        = ((q1 & 0) == 0) ? VECTOR_CST_NPATTERNS (arg0)
                          : VECTOR_CST_NPATTERNS (arg1);

resulting in wrong code-gen issues.
The patch fixes this by changing the condition to (q1 & 1) == 0.

gcc/ChangeLog:
	PR tree-optimization/111048
	* fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): Set arg_npatterns
	correctly.
	(fold_vec_perm_cst): Remove workaround and again call
	valid_mask_fold_vec_perm_cst_p for both VLS and VLA vectors.
	(test_fold_vec_perm_cst::test_nunits_min_4): Add test-case.

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 08530b6ae80..fa1d5834bc3 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -10599,7 +10599,7 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
       /* Ensure that the stepped sequence always selects from the same
 	 input pattern.  */
       unsigned arg_npatterns
-	= ((q1 & 0) == 0) ? VECTOR_CST_NPATTERNS (arg0)
+	= ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0)
 			  : VECTOR_CST_NPATTERNS (arg1);
 
       if (!multiple_p (step, arg_npatterns))
@@ -10625,11 +10625,6 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
   unsigned res_npatterns, res_nelts_per_pattern;
   unsigned HOST_WIDE_INT res_nelts;
 
-  if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
-    {
-      res_npatterns = res_nelts;
-      res_nelts_per_pattern = 1;
-    }
   /* (1) If SEL is a suitable mask as determined by
      valid_mask_for_fold_vec_perm_cst_p, then:
      res_npatterns = max of npatterns between ARG0, ARG1, and SEL
@@ -10639,7 +10634,7 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
      res_npatterns = nelts in result vector.
      res_nelts_per_pattern = 1.
      This exception is made so that VLS ARG0, ARG1 and SEL work as before.  */
-  else if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
+  if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
     {
       res_npatterns
 	= std::max (VECTOR_CST_NPATTERNS (arg0),
@@ -10653,6 +10648,11 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
 
       res_nelts = res_npatterns * res_nelts_per_pattern;
     }
+  else if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
+    {
+      res_npatterns = res_nelts;
+      res_nelts_per_pattern = 1;
+    }
   else
     return NULL_TREE;
 
@@ -17518,6 +17518,36 @@ test_nunits_min_4 (machine_mode vmode)
 				ARG0(1), ARG0(0), ARG0(2) };
 	validate_res (2, 3, res, expected_res);
       }
+
+      /* Case 7: PR111048: Check that we set arg_npatterns correctly,
+	 when arg0, arg1 and sel have different number of patterns.
+	 arg0 is of shape (1, 1)
+	 arg1 is of shape (4, 1)
+	 sel is of shape (2, 3) = {1, len, 2, len+1, 3, len+2, ...}
+
+	 In this case the pattern: {len, len+1, len+2, ...} chooses arg1.
+	 However,
+	 step = (len+2) - (len+1) = 1
+	 arg_npatterns = VECTOR_CST_NPATTERNS (arg1) = 4
+	 Since step is not a multiple of arg_npatterns,
+	 valid_mask_for_fold_vec_perm_cst should return false,
+	 and thus fold_vec_perm_cst should return NULL_TREE.  */
+      {
+	tree arg0 = build_vec_cst_rand (vmode, 1, 1);
+	tree arg1 = build_vec_cst_rand (vmode, 4, 1);
+	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
+
+	vec_perm_builder builder (len, 2, 3);
+	poly_uint64 mask_elems[] = { 0, len, 1, len + 1, 2, len + 2 };
+	builder_push_elems (builder, mask_elems);
+
+	vec_perm_indices sel (builder, 2, len);
+	const char *reason;
+	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason);
+
+	ASSERT_TRUE (res == NULL_TREE);
+	ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns"));
+      }
     }
 }
 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] tree-optimization/111048 - avoid flawed logic in fold_vec_perm
  2023-08-19 10:35     ` Prathamesh Kulkarni
@ 2023-08-21  6:56       ` Richard Biener
  2023-08-21  9:59         ` Prathamesh Kulkarni
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Biener @ 2023-08-21  6:56 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: Richard Sandiford, gcc-patches

On Sat, 19 Aug 2023, Prathamesh Kulkarni wrote:

> On Fri, 18 Aug 2023 at 14:52, Richard Biener <rguenther@suse.de> wrote:
> >
> > On Fri, 18 Aug 2023, Richard Sandiford wrote:
> >
> > > Richard Biener <rguenther@suse.de> writes:
> > > > The following avoids running into somehow flawed logic in fold_vec_perm
> > > > for non-VLA vectors.
> > > >
> > > > Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> > > >
> > > > Richard.
> > > >
> > > >     PR tree-optimization/111048
> > > >     * fold-const.cc (fold_vec_perm_cst): Check for non-VLA
> > > >     vectors first.
> > > >
> > > >     * gcc.dg/torture/pr111048.c: New testcase.
> > >
> > > Please don't do this as a permanent thing.  It was a deliberate choice
> > > to have the is_constant be the fallback, so that the "generic" (VLA+VLS)
> > > logic gets more coverage.  Like you say, if something is wrong for VLS
> > > then the chances are that it's also wrong for VLA.
> >
> > Sure, feel free to undo this change together with the fix for the
> > VLA case.
> Hi,
> The attached patch reverts the workaround, and fixes the issue.
> Bootstrapped+tested on aarch64-linux-gnu with and without SVE, and
> x64_64-linux-gnu.
> OK to commit ?

OK.

> Thanks,
> Prathamesh
> >
> > Richard.
> >
> > > Thanks,
> > > Richard
> > >
> > >
> > > > ---
> > > >  gcc/fold-const.cc                       | 12 ++++++------
> > > >  gcc/testsuite/gcc.dg/torture/pr111048.c | 24 ++++++++++++++++++++++++
> > > >  2 files changed, 30 insertions(+), 6 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.dg/torture/pr111048.c
> > > >
> > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > > > index 5c51c9d91be..144fd7481b3 100644
> > > > --- a/gcc/fold-const.cc
> > > > +++ b/gcc/fold-const.cc
> > > > @@ -10625,6 +10625,11 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
> > > >    unsigned res_npatterns, res_nelts_per_pattern;
> > > >    unsigned HOST_WIDE_INT res_nelts;
> > > >
> > > > +  if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
> > > > +    {
> > > > +      res_npatterns = res_nelts;
> > > > +      res_nelts_per_pattern = 1;
> > > > +    }
> > > >    /* (1) If SEL is a suitable mask as determined by
> > > >       valid_mask_for_fold_vec_perm_cst_p, then:
> > > >       res_npatterns = max of npatterns between ARG0, ARG1, and SEL
> > > > @@ -10634,7 +10639,7 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
> > > >       res_npatterns = nelts in result vector.
> > > >       res_nelts_per_pattern = 1.
> > > >       This exception is made so that VLS ARG0, ARG1 and SEL work as before.  */
> > > > -  if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> > > > +  else if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> > > >      {
> > > >        res_npatterns
> > > >     = std::max (VECTOR_CST_NPATTERNS (arg0),
> > > > @@ -10648,11 +10653,6 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
> > > >
> > > >        res_nelts = res_npatterns * res_nelts_per_pattern;
> > > >      }
> > > > -  else if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
> > > > -    {
> > > > -      res_npatterns = res_nelts;
> > > > -      res_nelts_per_pattern = 1;
> > > > -    }
> > > >    else
> > > >      return NULL_TREE;
> > > >
> > > > diff --git a/gcc/testsuite/gcc.dg/torture/pr111048.c b/gcc/testsuite/gcc.dg/torture/pr111048.c
> > > > new file mode 100644
> > > > index 00000000000..475978aae2b
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/torture/pr111048.c
> > > > @@ -0,0 +1,24 @@
> > > > +/* { dg-do run } */
> > > > +/* { dg-additional-options "-mavx2" { target avx2_runtime } } */
> > > > +
> > > > +typedef unsigned char u8;
> > > > +
> > > > +__attribute__((noipa))
> > > > +static void check(const u8 * v) {
> > > > +    if (*v != 15) __builtin_trap();
> > > > +}
> > > > +
> > > > +__attribute__((noipa))
> > > > +static void bug(void) {
> > > > +    u8 in_lanes[32];
> > > > +    for (unsigned i = 0; i < 32; i += 2) {
> > > > +      in_lanes[i + 0] = 0;
> > > > +      in_lanes[i + 1] = ((u8)0xff) >> (i & 7);
> > > > +    }
> > > > +
> > > > +    check(&in_lanes[13]);
> > > > +  }
> > > > +
> > > > +int main() {
> > > > +    bug();
> > > > +}
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] tree-optimization/111048 - avoid flawed logic in fold_vec_perm
  2023-08-21  6:56       ` Richard Biener
@ 2023-08-21  9:59         ` Prathamesh Kulkarni
  2023-08-21 10:45           ` Richard Sandiford
  0 siblings, 1 reply; 7+ messages in thread
From: Prathamesh Kulkarni @ 2023-08-21  9:59 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Sandiford, gcc-patches

On Mon, 21 Aug 2023 at 12:26, Richard Biener <rguenther@suse.de> wrote:
>
> On Sat, 19 Aug 2023, Prathamesh Kulkarni wrote:
>
> > On Fri, 18 Aug 2023 at 14:52, Richard Biener <rguenther@suse.de> wrote:
> > >
> > > On Fri, 18 Aug 2023, Richard Sandiford wrote:
> > >
> > > > Richard Biener <rguenther@suse.de> writes:
> > > > > The following avoids running into somehow flawed logic in fold_vec_perm
> > > > > for non-VLA vectors.
> > > > >
> > > > > Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> > > > >
> > > > > Richard.
> > > > >
> > > > >     PR tree-optimization/111048
> > > > >     * fold-const.cc (fold_vec_perm_cst): Check for non-VLA
> > > > >     vectors first.
> > > > >
> > > > >     * gcc.dg/torture/pr111048.c: New testcase.
> > > >
> > > > Please don't do this as a permanent thing.  It was a deliberate choice
> > > > to have the is_constant be the fallback, so that the "generic" (VLA+VLS)
> > > > logic gets more coverage.  Like you say, if something is wrong for VLS
> > > > then the chances are that it's also wrong for VLA.
> > >
> > > Sure, feel free to undo this change together with the fix for the
> > > VLA case.
> > Hi,
> > The attached patch reverts the workaround, and fixes the issue.
> > Bootstrapped+tested on aarch64-linux-gnu with and without SVE, and
> > x64_64-linux-gnu.
> > OK to commit ?
>
> OK.
Thanks, committed to trunk in 649388462e9a3c2de0b90ce525de8044704cc521

Thanks,
Prathamesh
>
> > Thanks,
> > Prathamesh
> > >
> > > Richard.
> > >
> > > > Thanks,
> > > > Richard
> > > >
> > > >
> > > > > ---
> > > > >  gcc/fold-const.cc                       | 12 ++++++------
> > > > >  gcc/testsuite/gcc.dg/torture/pr111048.c | 24 ++++++++++++++++++++++++
> > > > >  2 files changed, 30 insertions(+), 6 deletions(-)
> > > > >  create mode 100644 gcc/testsuite/gcc.dg/torture/pr111048.c
> > > > >
> > > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > > > > index 5c51c9d91be..144fd7481b3 100644
> > > > > --- a/gcc/fold-const.cc
> > > > > +++ b/gcc/fold-const.cc
> > > > > @@ -10625,6 +10625,11 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
> > > > >    unsigned res_npatterns, res_nelts_per_pattern;
> > > > >    unsigned HOST_WIDE_INT res_nelts;
> > > > >
> > > > > +  if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
> > > > > +    {
> > > > > +      res_npatterns = res_nelts;
> > > > > +      res_nelts_per_pattern = 1;
> > > > > +    }
> > > > >    /* (1) If SEL is a suitable mask as determined by
> > > > >       valid_mask_for_fold_vec_perm_cst_p, then:
> > > > >       res_npatterns = max of npatterns between ARG0, ARG1, and SEL
> > > > > @@ -10634,7 +10639,7 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
> > > > >       res_npatterns = nelts in result vector.
> > > > >       res_nelts_per_pattern = 1.
> > > > >       This exception is made so that VLS ARG0, ARG1 and SEL work as before.  */
> > > > > -  if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> > > > > +  else if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> > > > >      {
> > > > >        res_npatterns
> > > > >     = std::max (VECTOR_CST_NPATTERNS (arg0),
> > > > > @@ -10648,11 +10653,6 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
> > > > >
> > > > >        res_nelts = res_npatterns * res_nelts_per_pattern;
> > > > >      }
> > > > > -  else if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
> > > > > -    {
> > > > > -      res_npatterns = res_nelts;
> > > > > -      res_nelts_per_pattern = 1;
> > > > > -    }
> > > > >    else
> > > > >      return NULL_TREE;
> > > > >
> > > > > diff --git a/gcc/testsuite/gcc.dg/torture/pr111048.c b/gcc/testsuite/gcc.dg/torture/pr111048.c
> > > > > new file mode 100644
> > > > > index 00000000000..475978aae2b
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/torture/pr111048.c
> > > > > @@ -0,0 +1,24 @@
> > > > > +/* { dg-do run } */
> > > > > +/* { dg-additional-options "-mavx2" { target avx2_runtime } } */
> > > > > +
> > > > > +typedef unsigned char u8;
> > > > > +
> > > > > +__attribute__((noipa))
> > > > > +static void check(const u8 * v) {
> > > > > +    if (*v != 15) __builtin_trap();
> > > > > +}
> > > > > +
> > > > > +__attribute__((noipa))
> > > > > +static void bug(void) {
> > > > > +    u8 in_lanes[32];
> > > > > +    for (unsigned i = 0; i < 32; i += 2) {
> > > > > +      in_lanes[i + 0] = 0;
> > > > > +      in_lanes[i + 1] = ((u8)0xff) >> (i & 7);
> > > > > +    }
> > > > > +
> > > > > +    check(&in_lanes[13]);
> > > > > +  }
> > > > > +
> > > > > +int main() {
> > > > > +    bug();
> > > > > +}
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH,
> > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] tree-optimization/111048 - avoid flawed logic in fold_vec_perm
  2023-08-21  9:59         ` Prathamesh Kulkarni
@ 2023-08-21 10:45           ` Richard Sandiford
  0 siblings, 0 replies; 7+ messages in thread
From: Richard Sandiford @ 2023-08-21 10:45 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: Richard Biener, gcc-patches

Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> On Mon, 21 Aug 2023 at 12:26, Richard Biener <rguenther@suse.de> wrote:
>>
>> On Sat, 19 Aug 2023, Prathamesh Kulkarni wrote:
>>
>> > On Fri, 18 Aug 2023 at 14:52, Richard Biener <rguenther@suse.de> wrote:
>> > >
>> > > On Fri, 18 Aug 2023, Richard Sandiford wrote:
>> > >
>> > > > Richard Biener <rguenther@suse.de> writes:
>> > > > > The following avoids running into somehow flawed logic in fold_vec_perm
>> > > > > for non-VLA vectors.
>> > > > >
>> > > > > Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>> > > > >
>> > > > > Richard.
>> > > > >
>> > > > >     PR tree-optimization/111048
>> > > > >     * fold-const.cc (fold_vec_perm_cst): Check for non-VLA
>> > > > >     vectors first.
>> > > > >
>> > > > >     * gcc.dg/torture/pr111048.c: New testcase.
>> > > >
>> > > > Please don't do this as a permanent thing.  It was a deliberate choice
>> > > > to have the is_constant be the fallback, so that the "generic" (VLA+VLS)
>> > > > logic gets more coverage.  Like you say, if something is wrong for VLS
>> > > > then the chances are that it's also wrong for VLA.
>> > >
>> > > Sure, feel free to undo this change together with the fix for the
>> > > VLA case.
>> > Hi,
>> > The attached patch reverts the workaround, and fixes the issue.
>> > Bootstrapped+tested on aarch64-linux-gnu with and without SVE, and
>> > x64_64-linux-gnu.
>> > OK to commit ?
>>
>> OK.
> Thanks, committed to trunk in 649388462e9a3c2de0b90ce525de8044704cc521

Thanks for the patch.  Please remember to close the PR too.

Richard

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-08-21 10:45 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-18  7:46 [PATCH] tree-optimization/111048 - avoid flawed logic in fold_vec_perm Richard Biener
2023-08-18  8:39 ` Richard Sandiford
2023-08-18  9:22   ` Richard Biener
2023-08-19 10:35     ` Prathamesh Kulkarni
2023-08-21  6:56       ` Richard Biener
2023-08-21  9:59         ` Prathamesh Kulkarni
2023-08-21 10:45           ` Richard Sandiford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).