public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Sandiford <richard.sandiford@arm.com>
To: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
Cc: gcc Patches <gcc-patches@gcc.gnu.org>
Subject: Re: PR111648: Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding
Date: Mon, 16 Oct 2023 22:10:10 +0100	[thread overview]
Message-ID: <mptlec26zpp.fsf@arm.com> (raw)
In-Reply-To: <CAAgBjMkyory+amhdRUyYc5x=+-w1go6xqjsMeWyjXqJPhuZZxA@mail.gmail.com> (Prathamesh Kulkarni's message of "Thu, 12 Oct 2023 15:24:49 +0530")

Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> On Wed, 11 Oct 2023 at 16:57, Prathamesh Kulkarni
> <prathamesh.kulkarni@linaro.org> wrote:
>>
>> On Wed, 11 Oct 2023 at 16:42, Prathamesh Kulkarni
>> <prathamesh.kulkarni@linaro.org> wrote:
>> >
>> > On Mon, 9 Oct 2023 at 17:05, Richard Sandiford
>> > <richard.sandiford@arm.com> wrote:
>> > >
>> > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
>> > > > Hi,
>> > > > The attached patch attempts to fix PR111648.
>> > > > As mentioned in PR, the issue is when a1 is a multiple of vector
>> > > > length, we end up creating following encoding in result: { base_elem,
>> > > > arg[0], arg[1], ... } (assuming S = 1),
>> > > > where arg is chosen input vector, which is incorrect, since the
>> > > > encoding originally in arg would be: { arg[0], arg[1], arg[2], ... }
>> > > >
>> > > > For the test-case mentioned in PR, vectorizer pass creates
>> > > > VEC_PERM_EXPR<arg0, arg, sel> where:
>> > > > arg0: { -16, -9, -10, -11 }
>> > > > arg1: { -12, -5, -6, -7 }
>> > > > sel = { 3, 4, 5, 6 }
>> > > >
>> > > > arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3.
>> > > > Since a1 = 4 and arg_len = 4, it ended up creating the result with
>> > > > following encoding:
>> > > > res = { arg0[3], arg1[0], arg1[1] } // npatterns = 1, nelts_per_pattern = 3
>> > > >       = { -11, -12, -5 }
>> > > >
>> > > > So for res[3], it used S = (-5) - (-12) = 7
>> > > > And hence computed it as -5 + 7 = 2.
>> > > > instead of selecting arg1[2], ie, -6.
>> > > >
>> > > > The patch tweaks valid_mask_for_fold_vec_perm_cst_p to punt if a1 is a multiple
>> > > > of vector length, so a1 ... ae select elements only from stepped part
>> > > > of the pattern
>> > > > from input vector and return false for this case.
>> > > >
>> > > > Since the vectors are VLS, fold_vec_perm_cst then sets:
>> > > > res_npatterns = res_nelts
>> > > > res_nelts_per_pattern  = 1
>> > > > which seems to fix the issue by encoding all the elements.
>> > > >
>> > > > The patch resulted in Case 4 and Case 5 failing from test_nunits_min_2 because
>> > > > they used sel = { 0, 0, 1, ... } and {len, 0, 1, ... } respectively,
>> > > > which used a1 = 0, and thus selected arg1[0].
>> > > >
>> > > > I removed Case 4 because it was already covered in test_nunits_min_4,
>> > > > and moved Case 5 to test_nunits_min_4, with sel = { len, 1, 2, ... }
>> > > > and added a new Case 9 to test for this issue.
>> > > >
>> > > > Passes bootstrap+test on aarch64-linux-gnu with and without SVE,
>> > > > and on x86_64-linux-gnu.
>> > > > Does the patch look OK ?
>> > > >
>> > > > Thanks,
>> > > > Prathamesh
>> > > >
>> > > > [PR111648] Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding.
>> > > >
>> > > > gcc/ChangeLog:
>> > > >       PR tree-optimization/111648
>> > > >       * fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): Punt if a1
>> > > >       is a multiple of vector length.
>> > > >       (test_nunits_min_2): Remove Case 4 and move Case 5 to ...
>> > > >       (test_nunits_min_4): ... here and rename case numbers. Also add
>> > > >       Case 9.
>> > > >
>> > > > gcc/testsuite/ChangeLog:
>> > > >       PR tree-optimization/111648
>> > > >       * gcc.dg/vect/pr111648.c: New test.
>> > > >
>> > > >
>> > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
>> > > > index 4f8561509ff..c5f421d6b76 100644
>> > > > --- a/gcc/fold-const.cc
>> > > > +++ b/gcc/fold-const.cc
>> > > > @@ -10682,8 +10682,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
>> > > >         return false;
>> > > >       }
>> > > >
>> > > > -      /* Ensure that the stepped sequence always selects from the same
>> > > > -      input pattern.  */
>> > > > +      /* Ensure that the stepped sequence always selects from the stepped
>> > > > +      part of same input pattern.  */
>> > > >        unsigned arg_npatterns
>> > > >       = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0)
>> > > >                         : VECTOR_CST_NPATTERNS (arg1);
>> > > > @@ -10694,6 +10694,20 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
>> > > >           *reason = "step is not multiple of npatterns";
>> > > >         return false;
>> > > >       }
>> > > > +
>> > > > +      /* If a1 is a multiple of len, it will select base element of input
>> > > > +      vector resulting in following encoding:
>> > > > +      { base_elem, arg[0], arg[1], ... } where arg is the chosen input
>> > > > +      vector. This encoding is not originally present in arg, since it's
>> > > > +      defined as:
>> > > > +      { arg[0], arg[1], arg[2], ... }.  */
>> > > > +
>> > > > +      if (multiple_p (a1, arg_len))
>> > > > +     {
>> > > > +       if (reason)
>> > > > +         *reason = "selecting base element of input vector";
>> > > > +       return false;
>> > > > +     }
>> > >
>> > > That wouldn't catch (for example) cases where a1 == arg_len + 1 and the
>> > > second argument has 2 stepped patterns.
>> > Ah right, thanks for pointing out. In the attached patch I extended the check
>> > so that r1 < arg_npatterns which should check if we are choosing base
>> > elements from any of the patterns in arg (and not just first).
>> > Does that look OK ?
>> > >
>> > > The equivalent condition that handles multiple patterns would
>> > > probably be to reject q1 < arg_npatterns.  But that's only necessary if:
>> > Sorry, I don't understand -- we use q1 only for determining which
>> > vector to choose from,
>> > and r1 will give the index for first element ?
>> > >
>> > > (1) the argument has three elements per pattern (i.e. has a stepped
>> > >     sequence) and
>> > >
>> > > (2) element 2 - element 1 != element 1 - element 0
>> > >
>> > > I think we should check those to avoid pessimising VLA cases.
>> > Thanks for the suggestions. In attached POC patch (stage-1 tested), I
>> > added the above checks,
>> > does it look in the right direction ? Also, should this patch be the
>> > right fix for PR111754 ?
>> Oops sorry, this patch causes build errors on aarch64. Please ignore it.
>> Sorry for the noise.
> Hi Richard,
> The attached patch passes bootstrap+test on aarch64-linux-gnu with and
> without SVE,
> and on x86_64-linux-gnu.
> Does it look OK ?
>
> Thanks,
> Prathamesh
>>
>> Thanks,
>> Prathamesh
>> >
>> > Thanks,
>> > Prathamesh
>> > >
>> > > Thanks,
>> > > Richard
>> > >
>> > > >      }
>> > > >
>> > > >    return true;
>> > > > @@ -17425,47 +17439,6 @@ test_nunits_min_2 (machine_mode vmode)
>> > > >       tree expected_res[] = { ARG0(0), ARG1(0), ARG0(1), ARG1(1) };
>> > > >       validate_res (2, 2, res, expected_res);
>> > > >        }
>> > > > -
>> > > > -      /* Case 4: mask = {0, 0, 1, ...} // (1, 3)
>> > > > -      Test that the stepped sequence of the pattern selects from
>> > > > -      same input pattern. Since input vectors have npatterns = 2,
>> > > > -      and step (a2 - a1) = 1, step is not a multiple of npatterns
>> > > > -      in input vector. So return NULL_TREE.  */
>> > > > -      {
>> > > > -     tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
>> > > > -     tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
>> > > > -     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
>> > > > -
>> > > > -     vec_perm_builder builder (len, 1, 3);
>> > > > -     poly_uint64 mask_elems[] = { 0, 0, 1 };
>> > > > -     builder_push_elems (builder, mask_elems);
>> > > > -
>> > > > -     vec_perm_indices sel (builder, 2, len);
>> > > > -     const char *reason;
>> > > > -     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel,
>> > > > -                                   &reason);
>> > > > -     ASSERT_TRUE (res == NULL_TREE);
>> > > > -     ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns"));
>> > > > -      }
>> > > > -
>> > > > -      /* Case 5: mask = {len, 0, 1, ...} // (1, 3)
>> > > > -      Test that stepped sequence of the pattern selects from arg0.
>> > > > -      res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3)  */
>> > > > -      {
>> > > > -     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
>> > > > -     tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
>> > > > -     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
>> > > > -
>> > > > -     vec_perm_builder builder (len, 1, 3);
>> > > > -     poly_uint64 mask_elems[] = { len, 0, 1 };
>> > > > -     builder_push_elems (builder, mask_elems);
>> > > > -
>> > > > -     vec_perm_indices sel (builder, 2, len);
>> > > > -     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
>> > > > -
>> > > > -     tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) };
>> > > > -     validate_res (1, 3, res, expected_res);
>> > > > -      }
>> > > >      }
>> > > >  }
>> > > >
>> > > > @@ -17528,7 +17501,26 @@ test_nunits_min_4 (machine_mode vmode)
>> > > >       validate_res (1, 3, res, expected_res);
>> > > >        }
>> > > >
>> > > > -      /* Case 4:
>> > > > +      /* Case 4: mask = {len, 1, 2, ...} // (1, 3)
>> > > > +      Test that stepped sequence of the pattern selects from arg0.
>> > > > +      res = { arg1[0], arg0[1], arg0[2], ... } // (1, 3)  */
>> > > > +      {
>> > > > +     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
>> > > > +     tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
>> > > > +     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
>> > > > +
>> > > > +     vec_perm_builder builder (len, 1, 3);
>> > > > +     poly_uint64 mask_elems[] = { len, 1, 2 };
>> > > > +     builder_push_elems (builder, mask_elems);
>> > > > +
>> > > > +     vec_perm_indices sel (builder, 2, len);
>> > > > +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
>> > > > +
>> > > > +     tree expected_res[] = { ARG1(0), ARG0(1), ARG0(2) };
>> > > > +     validate_res (1, 3, res, expected_res);
>> > > > +      }
>> > > > +
>> > > > +      /* Case 5:
>> > > >       sel = { len, 0, 2, ... } // (1, 3)
>> > > >       This should return NULL because we cross the input vectors.
>> > > >       Because,
>> > > > @@ -17561,7 +17553,7 @@ test_nunits_min_4 (machine_mode vmode)
>> > > >       ASSERT_TRUE (!strcmp (reason, "crossed input vectors"));
>> > > >        }
>> > > >
>> > > > -      /* Case 5: npatterns(arg0) = 4 > npatterns(sel) = 2
>> > > > +      /* Case 6: npatterns(arg0) = 4 > npatterns(sel) = 2
>> > > >        mask = { 0, len, 1, len + 1, ...} // (2, 2)
>> > > >        res = { arg0[0], arg1[0], arg0[1], arg1[1], ... } // (2, 2)
>> > > >
>> > > > @@ -17583,7 +17575,7 @@ test_nunits_min_4 (machine_mode vmode)
>> > > >       validate_res (2, 2, res, expected_res);
>> > > >        }
>> > > >
>> > > > -      /* Case 6: Test combination in sel, where one pattern is dup and other
>> > > > +      /* Case 7: Test combination in sel, where one pattern is dup and other
>> > > >        is stepped sequence.
>> > > >        sel = { 0, 0, 0, 1, 0, 2, ... } // (2, 3)
>> > > >        res = { arg0[0], arg0[0], arg0[0],
>> > > > @@ -17605,7 +17597,7 @@ test_nunits_min_4 (machine_mode vmode)
>> > > >       validate_res (2, 3, res, expected_res);
>> > > >        }
>> > > >
>> > > > -      /* Case 7: PR111048: Check that we set arg_npatterns correctly,
>> > > > +      /* Case 8: PR111048: Check that we set arg_npatterns correctly,
>> > > >        when arg0, arg1 and sel have different number of patterns.
>> > > >        arg0 is of shape (1, 1)
>> > > >        arg1 is of shape (4, 1)
>> > > > @@ -17634,6 +17626,51 @@ test_nunits_min_4 (machine_mode vmode)
>> > > >       ASSERT_TRUE (res == NULL_TREE);
>> > > >       ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns"));
>> > > >        }
>> > > > +
>> > > > +      /* Case 9: PR111648 - a1 is multiple of vector length,
>> > > > +      which results in incorrect encoding. Verify that we return
>> > > > +      NULL for this case.
>> > > > +      sel = { base_elem, len, len+1, ... } // (1, 3)
>> > > > +      In this case, the single pattern is: { base_elem len, len+1, ...}
>> > > > +      Let's assume that base_elem is used for indexing into arg0,
>> > > > +      and a1 ... ae chooses elements from arg1.
>> > > > +      So res = { arg0[base_elem], arg1[0], arg1[1], ... } // (1, 3)
>> > > > +      Which creates an incorrect encoding with S = arg1[1] - arg1[0]
>> > > > +      while the original encoding in arg1 is
>> > > > +      arg1: { arg1[0], arg1[1], arg1[2], ... }
>> > > > +      with S = arg1[2] - arg1[1].
>> > > > +
>> > > > +      As a concrete example, for above PR:
>> > > > +      arg0: { -16, -9, -10, -11 }
>> > > > +      arg1: { -12, -5, -6, -7 }
>> > > > +      sel = { 3, 4, 5, 6 }
>> > > > +
>> > > > +      arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3.
>> > > > +      Since a1 = 4 and arg_len = 4, it ended up creating the result with
>> > > > +      following encoding:
>> > > > +      res = { arg0[3], arg1[0], arg1[1] } // (1, 3)
>> > > > +          = { -11, -12, -5 }
>> > > > +
>> > > > +      So for res[3], it used S = (-5) - (-12) = 7
>> > > > +      And hence computed it as -5 + 7 = 2.
>> > > > +      instead of arg1[2], ie, -6, which is the correct value.
>> > > > +      Ensure that valid_mask_for_fold_vec_perm_cst returns false for this case.  */
>> > > > +      {
>> > > > +     tree arg0 = build_vec_cst_rand (vmode, 1, 3);
>> > > > +     tree arg1 = build_vec_cst_rand (vmode, 1, 3);
>> > > > +     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
>> > > > +
>> > > > +     vec_perm_builder builder (len, 1, 3);
>> > > > +     poly_uint64 mask_elems[] = { 0, len, len+1 };
>> > > > +     builder_push_elems (builder, mask_elems);
>> > > > +
>> > > > +     vec_perm_indices sel (builder, 2, len);
>> > > > +     const char *reason;
>> > > > +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason);
>> > > > +     ASSERT_TRUE (res == NULL_TREE);
>> > > > +     ASSERT_TRUE (!strcmp (reason,
>> > > > +                           "selecting base element of input vector"));
>> > > > +      }
>> > > >      }
>> > > >  }
>> > > >
>> > > > diff --git a/gcc/testsuite/gcc.dg/vect/pr111648.c b/gcc/testsuite/gcc.dg/vect/pr111648.c
>> > > > new file mode 100644
>> > > > index 00000000000..093e2b02654
>> > > > --- /dev/null
>> > > > +++ b/gcc/testsuite/gcc.dg/vect/pr111648.c
>> > > > @@ -0,0 +1,23 @@
>> > > > +/* { dg-do compile } */
>> > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */
>> > > > +
>> > > > +int a;
>> > > > +int *b = &a;
>> > > > +static int **c = &b;
>> > > > +static int d;
>> > > > +short e;
>> > > > +short f;
>> > > > +
>> > > > +_Bool foo ()
>> > > > +{
>> > > > +  f = -21;
>> > > > +  for (; f < -5; f++) {
>> > > > +    e = f ^ 3;
>> > > > +    d = *b;
>> > > > +    **c = e;
>> > > > +  }
>> > > > +
>> > > > +  return d == -6;
>> > > > +}
>> > > > +
>> > > > +/* { dg-final { scan-tree-dump "return 1" "optimized" } } */
>
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 4f8561509ff..55a6a68c16c 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -10684,9 +10684,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
>  
>        /* Ensure that the stepped sequence always selects from the same
>  	 input pattern.  */
> -      unsigned arg_npatterns
> -	= ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0)
> -			  : VECTOR_CST_NPATTERNS (arg1);
> +      tree arg = ((q1 & 1) == 0) ? arg0 : arg1;
> +      unsigned arg_npatterns = VECTOR_CST_NPATTERNS (arg);
>  
>        if (!multiple_p (step, arg_npatterns))
>  	{
> @@ -10694,6 +10693,29 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
>  	    *reason = "step is not multiple of npatterns";
>  	  return false;
>  	}
> +
> +      /* If a1 chooses base element from arg, ensure that it's a natural
> +	 stepped sequence, ie, (arg[2] - arg[1]) == (arg[1] - arg[0])
> +	 to preserve arg's encoding.  */
> +
> +      unsigned HOST_WIDE_INT index;
> +      if (!r1.is_constant (&index))
> +	return false;
> +      if (index < arg_npatterns)
> +	{

I don't know whether it matters in practice, but I think the two conditions
above are more natural as:

    if (maybe_lt (r1, arg_npatterns))
      {
        unsigned HOST_WIDE_INT index;
        if (!r1.is_constant (&index))
          return false;

        ...[code below]...
      }        

> +	  tree arg_elem0 = vector_cst_elt (arg, index);
> +	  tree arg_elem1 = vector_cst_elt (arg, index + arg_npatterns);
> +	  tree arg_elem2 = vector_cst_elt (arg, index + arg_npatterns * 2);
> +
> +	  if (!operand_equal_p (const_binop (MINUS_EXPR, arg_elem2, arg_elem1),
> +				const_binop (MINUS_EXPR, arg_elem1, arg_elem0),
> +				0))

This needs to check whether const_binop returns null.  Maybe:

   tree step1, step2;
   if (!(step1 = const_binop (MINUS_EXPR, arg_elem1, arg_elem0))
       || !(step2 = const_binop (MINUS_EXPR, arg_elem2, arg_elem1))
       || !operand_equal_p (step1, step2, 0))

OK with those changes, thanks.

Richard

> +	    {
> +	      if (reason)
> +		*reason = "not a natural stepped sequence";
> +	      return false;
> +	    }
> +	}
>      }
>  
>    return true;
> @@ -17161,7 +17183,8 @@ namespace test_fold_vec_perm_cst {
>  static tree
>  build_vec_cst_rand (machine_mode vmode, unsigned npatterns,
>  		    unsigned nelts_per_pattern,
> -		    int step = 0, int threshold = 100)
> +		    int step = 0, bool natural_stepped = false,
> +		    int threshold = 100)
>  {
>    tree inner_type = lang_hooks.types.type_for_mode (GET_MODE_INNER (vmode), 1);
>    tree vectype = build_vector_type_for_mode (inner_type, vmode);
> @@ -17176,17 +17199,28 @@ build_vec_cst_rand (machine_mode vmode, unsigned npatterns,
>  
>    // Fill a1 for each pattern
>    for (unsigned i = 0; i < npatterns; i++)
> -    builder.quick_push (build_int_cst (inner_type, rand () % threshold));
> -
> +    {
> +      tree a1;
> +      if (natural_stepped)
> +	{
> +	  tree a0 = builder[i];
> +	  wide_int a0_val = wi::to_wide (a0);
> +	  wide_int a1_val = a0_val + step;
> +	  a1 = wide_int_to_tree (inner_type, a1_val);
> +	}
> +      else
> +	a1 = build_int_cst (inner_type, rand () % threshold);
> +      builder.quick_push (a1);
> +    }
>    if (nelts_per_pattern == 2)
>      return builder.build ();
>  
>    for (unsigned i = npatterns * 2; i < npatterns * nelts_per_pattern; i++)
>      {
>        tree prev_elem = builder[i - npatterns];
> -      int prev_elem_val = TREE_INT_CST_LOW (prev_elem);
> -      int val = prev_elem_val + step;
> -      builder.quick_push (build_int_cst (inner_type, val));
> +      wide_int prev_elem_val = wi::to_wide (prev_elem);
> +      wide_int val = prev_elem_val + step;
> +      builder.quick_push (wide_int_to_tree (inner_type, val));
>      }
>  
>    return builder.build ();
> @@ -17432,7 +17466,7 @@ test_nunits_min_2 (machine_mode vmode)
>  	 and step (a2 - a1) = 1, step is not a multiple of npatterns
>  	 in input vector. So return NULL_TREE.  */
>        {
> -	tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
> +	tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1, true);
>  	tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
>  	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
>  
> @@ -17452,7 +17486,7 @@ test_nunits_min_2 (machine_mode vmode)
>  	 Test that stepped sequence of the pattern selects from arg0.
>  	 res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3)  */
>        {
> -	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1, true);
>  	tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
>  	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
>  
> @@ -17466,6 +17500,62 @@ test_nunits_min_2 (machine_mode vmode)
>  	tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) };
>  	validate_res (1, 3, res, expected_res);
>        }
> +
> +      /* Case 6: PR111648 - a1 chooses base element from input vector arg.
> +	 In this case ensure that arg has a natural stepped sequence
> +	 to preserve arg's encoding.
> +
> +	 As a concrete example, consider:
> +	 arg0: { -16, -9, -10, ... } // (1, 3)
> +	 arg1: { -12, -5, -6, ... }  // (1, 3)
> +	 sel = { 0, len, len + 1, ... } // (1, 3)
> +
> +	 This will create res with following encoding:
> +	 res = { arg0[0], arg1[0], arg1[1], ... } // (1, 3)
> +	     = { -16, -12, -5, ... }
> +
> +	 The step in above encoding would be: (-5) - (-12) = 7
> +	 And hence res[3] would be computed as -5 + 7 = 2.
> +	 instead of arg1[2], ie, -6.
> +	 Ensure that valid_mask_for_fold_vec_perm_cst returns false
> +	 for this case.  */
> +      {
> +	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +	tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> +	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +	vec_perm_builder builder (len, 1, 3);
> +	poly_uint64 mask_elems[] = { 0, len, len+1 };
> +	builder_push_elems (builder, mask_elems);
> +
> +	vec_perm_indices sel (builder, 2, len);
> +	const char *reason;
> +	/* FIXME: It may happen that build_vec_cst_rand may build a natural
> +	   stepped pattern, even if we didn't explicitly tell it to. So folding
> +	   may not always fail, but if it does, ensure that's because arg1 does
> +	   not have a natural stepped sequence (and not due to other reason)  */
> +	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason);
> +	if (res == NULL_TREE)
> +	  ASSERT_TRUE (!strcmp (reason, "not a natural stepped sequence"));
> +      }
> +
> +      /* Case 7: Same as Case 6, except that arg1 contains natural stepped
> +	 sequence and thus folding should be valid for this case.  */
> +      {
> +	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +	tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1, true);
> +	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +	vec_perm_builder builder (len, 1, 3);
> +	poly_uint64 mask_elems[] = { 0, len, len+1 };
> +	builder_push_elems (builder, mask_elems);
> +
> +	vec_perm_indices sel (builder, 2, len);
> +	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> +	tree expected_res[] = { ARG0(0), ARG1(0), ARG1(1) };
> +	validate_res (1, 3, res, expected_res);
> +      }
>      }
>  }
>  
> diff --git a/gcc/testsuite/gcc.dg/vect/pr111648.c b/gcc/testsuite/gcc.dg/vect/pr111648.c
> new file mode 100644
> index 00000000000..093e2b02654
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr111648.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +int a;
> +int *b = &a;
> +static int **c = &b;
> +static int d;
> +short e;
> +short f;
> +
> +_Bool foo ()
> +{
> +  f = -21;
> +  for (; f < -5; f++) {
> +    e = f ^ 3;
> +    d = *b;
> +    **c = e;
> +  }
> +
> +  return d == -6;
> +}
> +
> +/* { dg-final { scan-tree-dump "return 1" "optimized" } } */

  reply	other threads:[~2023-10-16 21:10 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-04 10:16 Prathamesh Kulkarni
2023-10-09 11:35 ` Richard Sandiford
2023-10-11 11:12   ` Prathamesh Kulkarni
2023-10-11 11:27     ` Prathamesh Kulkarni
2023-10-12  9:54       ` Prathamesh Kulkarni
2023-10-16 21:10         ` Richard Sandiford [this message]
2023-10-17 18:53           ` Prathamesh Kulkarni
2023-10-18 17:52             ` Richard Sandiford
2023-10-18 19:04               ` Prathamesh Kulkarni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=mptlec26zpp.fsf@arm.com \
    --to=richard.sandiford@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=prathamesh.kulkarni@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).