[PATCH 0/4] aarch64: Improve codegen for dups and constructors

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH 0/4] aarch64: Improve codegen for dups and constructors
@ 2022-08-05 12:50 Andre Vieira (lists)
  2022-08-05 12:53 ` [PATCH 1/4] aarch64: encourage use of GPR input for SIMD inserts Andre Vieira (lists)
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Andre Vieira (lists) @ 2022-08-05 12:50 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Sandiford, Prathamesh Kulkarni

Hi,

This patch series is a work in progress towards getting the compiler to 
generate better code for constructors and dups in both NEON and SVE 
targets.  It first changes the backend to use rtx_vector_builder for 
vector_init's. Then it is followed by some prepraration passes to better 
handle VLA VEC_PERM_EXPRs followed by the addition of a new TARGET_HOOK 
VLA_CONSTRUCTOR that is used to expand VLA VEC_PERM_EXPRs, all based on 
Prathamesh's initial work in this area. As I said before, this is still 
work in progress, though I suspect the first two patches could go in but 
I was trying to get the series ready to post to make sure the first 
patches were in the right shape.

I have put this work on hold right now, but I heard Prathamesh might 
want to pick this up, feel free to use any of this, or discard as you 
see fit.

Andre Vieira (4)
aarch64: Encourage use of GPR input for SIMD inserts
aarch64: Change aarch64_expand_vector_init to use rtx_vector_builder
match.pd: Teach forwprop to handle VLA VEC_PERM_EXPRs with VLS 
CONSTRUCTORs as arguments
[RFC]: VLA Constructors

Kind regards,
Andre

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/4] aarch64: encourage use of GPR input for SIMD inserts
  2022-08-05 12:50 [PATCH 0/4] aarch64: Improve codegen for dups and constructors Andre Vieira (lists)
@ 2022-08-05 12:53 ` Andre Vieira (lists)
  2022-08-05 12:55 ` [PATCH 2/4]aarch64: Change aarch64_expand_vector_init to use rtx_vector_builder Andre Vieira (lists)
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Andre Vieira (lists) @ 2022-08-05 12:53 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Sandiford, Prathamesh Kulkarni

[-- Attachment #1: Type: text/plain, Size: 588 bytes --]

Hi,

This enables and makes it more likely the compiler is able to use GPR 
input for SIMD inserts. I believe this is some outdated hack we used to 
prevent costly GPR<->SIMD register file swaps. This patch is required 
for better codegen in situations like the test case 'int8_3' in the next 
patch in this series.

Bootstrapped and regression tested together with the next patch on 
aarch64-none-linux-gnu.

gcc/ChangeLog:

2022-08-05  Andre Vieira  <andre.simoesdiasvieira@arm.com>

         * config/aarch64/aarch64-simd.md (aarch64_simd_vec_set<mode>): 
Remove '?' modifier.

[-- Attachment #2: sve_const_dup_1.patch --]
[-- Type: text/plain, Size: 685 bytes --]

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 587a45d77721e1b39accbad7dbeca4d741eccb10..51eab5a872ade7b70268676346e8be7c9c6c8e3a 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1038,7 +1038,7 @@
   [(set (match_operand:VALL_F16 0 "register_operand" "=w,w,w")
 	(vec_merge:VALL_F16
 	    (vec_duplicate:VALL_F16
-		(match_operand:<VEL> 1 "aarch64_simd_nonimmediate_operand" "w,?r,Utv"))
+		(match_operand:<VEL> 1 "aarch64_simd_nonimmediate_operand" "w,r,Utv"))
 	    (match_operand:VALL_F16 3 "register_operand" "0,0,0")
 	    (match_operand:SI 2 "immediate_operand" "i,i,i")))]
   "TARGET_SIMD"

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 2/4]aarch64: Change aarch64_expand_vector_init to use rtx_vector_builder
  2022-08-05 12:50 [PATCH 0/4] aarch64: Improve codegen for dups and constructors Andre Vieira (lists)
  2022-08-05 12:53 ` [PATCH 1/4] aarch64: encourage use of GPR input for SIMD inserts Andre Vieira (lists)
@ 2022-08-05 12:55 ` Andre Vieira (lists)
  2022-08-05 12:56 ` [PATCH 3/4] match.pd: Teach forwprop to handle VLA VEC_PERM_EXPRs with VLS CONSTRUCTORs as arguments Andre Vieira (lists)
  2022-08-05 12:58 ` [PATCH 4/4][RFC] VLA Constructor Andre Vieira (lists)
  3 siblings, 0 replies; 7+ messages in thread
From: Andre Vieira (lists) @ 2022-08-05 12:55 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Sandiford, Prathamesh Kulkarni

[-- Attachment #1: Type: text/plain, Size: 809 bytes --]

Hi,

This patch changes aarch64_expand_vector_init to use rtx_vector_builder,
exploiting it's internal pattern detection to find 'dup' patterns.

Bootstrapped and regression tested on aarch64-none-linux-gnu.

Is this OK for trunk or should we wait for the rest of the series?

gcc/ChangeLog:
2022-08-05  Andre Vieira  <andre.simoesdiasvieira@arm.com>

         * config/aarch64/aarch64.cc (aarch64_vec_duplicate): New.
          (aarch64_expand_vector_init): Make the existing variant construct
          a rtx_vector_builder from the list of elements and use this to 
detect
          duplicate patterns.

gcc/testesuite/ChangeLog:
2022-08-05  Andre Vieira  <andre.simoesdiasvieira@arm.com>

         * gcc.target/aarch64/ldp_stp_16.c: Modify to reflect code change.

[-- Attachment #2: sve_const_dup_2.patch --]
[-- Type: text/plain, Size: 12316 bytes --]

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 4b486aeea90ea2afb9cdd96a4dbe15c5bb2abd7a..a08043e18d609e258ebfe033875201163d129aba 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -305,6 +305,7 @@ static machine_mode aarch64_simd_container_mode (scalar_mode, poly_int64);
 static bool aarch64_print_address_internal (FILE*, machine_mode, rtx,
 					    aarch64_addr_query_type);
 static HOST_WIDE_INT aarch64_clamp_to_uimm12_shift (HOST_WIDE_INT val);
+static void aarch64_expand_vector_init (rtx, rtx_vector_builder&);
 
 /* The processor for which instructions should be scheduled.  */
 enum aarch64_processor aarch64_tune = cortexa53;
@@ -21804,55 +21805,96 @@ aarch64_simd_make_constant (rtx vals)
     return NULL_RTX;
 }
 
+static void
+aarch64_vec_duplicate (rtx target, machine_mode mode, machine_mode element_mode,
+		       int narrow_n_elts)
+{
+  poly_uint64 size = narrow_n_elts * GET_MODE_BITSIZE (element_mode);
+  scalar_mode i_mode = int_mode_for_size (size, 0).require ();
+  machine_mode o_mode;
+  if (aarch64_sve_mode_p (mode))
+    o_mode = aarch64_full_sve_mode (i_mode).require ();
+  else
+    o_mode
+      = aarch64_simd_container_mode (i_mode,
+				     GET_MODE_BITSIZE (mode));
+  rtx input = simplify_gen_subreg (i_mode, target, mode, 0);
+  rtx output = simplify_gen_subreg (o_mode, target, mode, 0);
+  aarch64_emit_move (output, gen_vec_duplicate (o_mode, input));
+}
+
+
 /* Expand a vector initialisation sequence, such that TARGET is
    initialised to contain VALS.  */
 
 void
 aarch64_expand_vector_init (rtx target, rtx vals)
 {
-  machine_mode mode = GET_MODE (target);
-  scalar_mode inner_mode = GET_MODE_INNER (mode);
   /* The number of vector elements.  */
   int n_elts = XVECLEN (vals, 0);
-  /* The number of vector elements which are not constant.  */
-  int n_var = 0;
-  rtx any_const = NULL_RTX;
+  machine_mode mode = GET_MODE (target);
+  scalar_mode inner_mode = GET_MODE_INNER (mode);
   /* The first element of vals.  */
   rtx v0 = XVECEXP (vals, 0, 0);
-  bool all_same = true;
 
   /* This is a special vec_init<M><N> where N is not an element mode but a
      vector mode with half the elements of M.  We expect to find two entries
      of mode N in VALS and we must put their concatentation into TARGET.  */
-  if (XVECLEN (vals, 0) == 2 && VECTOR_MODE_P (GET_MODE (XVECEXP (vals, 0, 0))))
+  if (n_elts == 2
+      && VECTOR_MODE_P (GET_MODE (v0)))
     {
-      machine_mode narrow_mode = GET_MODE (XVECEXP (vals, 0, 0));
+      machine_mode narrow_mode = GET_MODE (v0);
       gcc_assert (GET_MODE_INNER (narrow_mode) == inner_mode
 		  && known_eq (GET_MODE_SIZE (mode),
 			       2 * GET_MODE_SIZE (narrow_mode)));
-      emit_insn (gen_aarch64_vec_concat (narrow_mode, target,
-					 XVECEXP (vals, 0, 0),
+      emit_insn (gen_aarch64_vec_concat (narrow_mode, target, v0,
 					 XVECEXP (vals, 0, 1)));
      return;
    }
 
-  /* Count the number of variable elements to initialise.  */
+  rtx_vector_builder builder (mode, n_elts, 1);
   for (int i = 0; i < n_elts; ++i)
+    builder.quick_push (XVECEXP (vals, 0, i));
+  builder.finalize ();
+
+  aarch64_expand_vector_init (target, builder);
+}
+
+static void
+aarch64_expand_vector_init (rtx target, rtx_vector_builder &v)
+{
+  machine_mode mode = GET_MODE (target);
+  scalar_mode inner_mode = GET_MODE_INNER (mode);
+  /* The number of vector elements which are not constant.  */
+  unsigned n_var = 0;
+  rtx any_const = NULL_RTX;
+  /* The first element of vals.  */
+  rtx v0 = v.elt (0);
+  /* Get the number of elements to insert into an Advanced SIMD vector.
+     If we have more than one element per pattern then we use the constant
+     number of elements in a full vector.
+     If we only have one element per pattern we use the number of patterns as
+     this may be lower than the number of elements in a full vector, which
+     means they repeat and we should use a duplicate of the smaller vector.  */
+  unsigned n_elts
+    = v.nelts_per_pattern () == 1 ? v.npatterns ()
+				  : v.full_nelts ().coeffs[0];
+
+  /* Count the number of variable elements to initialise.  */
+  for (unsigned i = 0; i < n_elts ; ++i)
     {
-      rtx x = XVECEXP (vals, 0, i);
+      rtx x = v.elt (i);
       if (!(CONST_INT_P (x) || CONST_DOUBLE_P (x)))
 	++n_var;
       else
 	any_const = x;
-
-      all_same &= rtx_equal_p (x, v0);
     }
 
   /* No variable elements, hand off to aarch64_simd_make_constant which knows
      how best to handle this.  */
   if (n_var == 0)
     {
-      rtx constant = aarch64_simd_make_constant (vals);
+      rtx constant = aarch64_simd_make_constant (v.build ());
       if (constant != NULL_RTX)
 	{
 	  emit_move_insn (target, constant);
@@ -21861,7 +21903,7 @@ aarch64_expand_vector_init (rtx target, rtx vals)
     }
 
   /* Splat a single non-constant element if we can.  */
-  if (all_same)
+  if (n_elts == 1)
     {
       rtx x = copy_to_mode_reg (inner_mode, v0);
       aarch64_emit_move (target, gen_vec_duplicate (mode, x));
@@ -21879,14 +21921,15 @@ aarch64_expand_vector_init (rtx target, rtx vals)
      and matches[X][1] with the count of duplicate elements (if X is the
      earliest element which has duplicates).  */
 
-  if (n_var == n_elts && n_elts <= 16)
+  if (n_var == n_elts)
     {
-      int matches[16][2] = {0};
-      for (int i = 0; i < n_elts; i++)
+      gcc_assert (n_elts <= 16);
+      unsigned matches[16][2] = {0};
+      for (unsigned i = 0; i < n_elts; i++)
 	{
-	  for (int j = 0; j <= i; j++)
+	  for (unsigned j = 0; j <= i; j++)
 	    {
-	      if (rtx_equal_p (XVECEXP (vals, 0, i), XVECEXP (vals, 0, j)))
+	      if (rtx_equal_p (v.elt (i), v.elt (j)))
 		{
 		  matches[i][0] = j;
 		  matches[j][1]++;
@@ -21894,9 +21937,9 @@ aarch64_expand_vector_init (rtx target, rtx vals)
 		}
 	    }
 	}
-      int maxelement = 0;
-      int maxv = 0;
-      for (int i = 0; i < n_elts; i++)
+      unsigned maxelement = 0;
+      unsigned maxv = 0;
+      for (unsigned i = 0; i < n_elts; i++)
 	if (matches[i][1] > maxv)
 	  {
 	    maxelement = i;
@@ -21915,8 +21958,8 @@ aarch64_expand_vector_init (rtx target, rtx vals)
 		  || inner_mode == E_DFmode))
 
 	    {
-	      rtx x0 = XVECEXP (vals, 0, 0);
-	      rtx x1 = XVECEXP (vals, 0, 1);
+	      rtx x0 = v.elt (0);
+	      rtx x1 = v.elt (1);
 	      /* Combine can pick up this case, but handling it directly
 		 here leaves clearer RTL.
 
@@ -21939,24 +21982,26 @@ aarch64_expand_vector_init (rtx target, rtx vals)
 	     vector register.  For big-endian we want that position to hold
 	     the last element of VALS.  */
 	  maxelement = BYTES_BIG_ENDIAN ? n_elts - 1 : 0;
-	  rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement));
+	  rtx x = copy_to_mode_reg (inner_mode, v.elt (maxelement));
 	  aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode));
 	}
       else
 	{
-	  rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement));
+	  rtx x = copy_to_mode_reg (inner_mode, v.elt (maxelement));
 	  aarch64_emit_move (target, gen_vec_duplicate (mode, x));
 	}
 
       /* Insert the rest.  */
-      for (int i = 0; i < n_elts; i++)
+      for (unsigned i = 0; i < n_elts; i++)
 	{
-	  rtx x = XVECEXP (vals, 0, i);
+	  rtx x = v.elt (i);
 	  if (matches[i][0] == maxelement)
 	    continue;
 	  x = copy_to_mode_reg (inner_mode, x);
 	  emit_insn (GEN_FCN (icode) (target, x, GEN_INT (i)));
 	}
+	if (!known_eq (v.full_nelts (), n_elts))
+	  aarch64_vec_duplicate (target, mode, GET_MODE (v0), n_elts);
       return;
     }
 
@@ -21965,19 +22010,19 @@ aarch64_expand_vector_init (rtx target, rtx vals)
      can.  */
   if (n_var != n_elts)
     {
-      rtx copy = copy_rtx (vals);
+      rtx copy = v.build ();
 
       /* Load constant part of vector.  We really don't care what goes into the
 	 parts we will overwrite, but we're more likely to be able to load the
 	 constant efficiently if it has fewer, larger, repeating parts
 	 (see aarch64_simd_valid_immediate).  */
-      for (int i = 0; i < n_elts; i++)
+      for (unsigned i = 0; i < n_elts; i++)
 	{
-	  rtx x = XVECEXP (vals, 0, i);
+	  rtx x = XVECEXP (copy, 0, i);
 	  if (CONST_INT_P (x) || CONST_DOUBLE_P (x))
 	    continue;
 	  rtx subst = any_const;
-	  for (int bit = n_elts / 2; bit > 0; bit /= 2)
+	  for (unsigned bit = n_elts / 2; bit > 0; bit /= 2)
 	    {
 	      /* Look in the copied vector, as more elements are const.  */
 	      rtx test = XVECEXP (copy, 0, i ^ bit);
@@ -21989,18 +22034,21 @@ aarch64_expand_vector_init (rtx target, rtx vals)
 	    }
 	  XVECEXP (copy, 0, i) = subst;
 	}
+      gcc_assert (GET_MODE (target) == GET_MODE (copy));
       aarch64_expand_vector_init (target, copy);
     }
 
   /* Insert the variable lanes directly.  */
-  for (int i = 0; i < n_elts; i++)
+  for (unsigned i = 0; i < n_elts; i++)
     {
-      rtx x = XVECEXP (vals, 0, i);
+      rtx x = v.elt (i);
       if (CONST_INT_P (x) || CONST_DOUBLE_P (x))
 	continue;
       x = copy_to_mode_reg (inner_mode, x);
       emit_insn (GEN_FCN (icode) (target, x, GEN_INT (i)));
     }
+  if (!known_eq (v.full_nelts (), n_elts))
+    aarch64_vec_duplicate (target, mode, inner_mode, n_elts);
 }
 
 /* Emit RTL corresponding to:
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c
index 8ab117c4dcd7a731abc7e1b039e1faf0dfa09a5d..b307d2791824dd9c30200931452b2636708b5035 100644
--- a/gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c
@@ -96,8 +96,8 @@ CONS2_FN (4, float);
 
 /*
 ** cons2_8_float:
-**	dup	v([0-9]+)\.4s, .*
-**	...
+**	ins	v0\.s\[1\], v1\.s\[0\]
+**	dup	v([0-9]+)\.2d, v0\.d\[0\]
 **	stp	q\1, q\1, \[x0\]
 **	stp	q\1, q\1, \[x0, #?32\]
 **	ret
diff --git a/gcc/testsuite/gcc.target/aarch64/vect_init.c b/gcc/testsuite/gcc.target/aarch64/vect_init.c
new file mode 100644
index 0000000000000000000000000000000000000000..546e44e96f4db60d289b4bc0ebfecbe18c81b4cc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect_init.c
@@ -0,0 +1,144 @@
+#include <arm_neon.h>
+
+/*
+** int32_0:
+**	fmov	s0, w0
+**	ins	v0.s\[1\], w1
+**	dup	v0.2d, v0.d\[0\]
+**	ret
+*/
+
+int32x4_t int32_0 (int a, int b)
+{
+  int32x4_t v = {a, b, a, b};
+  return v;
+}
+/*
+** int32_1:
+**	dup	v0.4s, w0
+**	ret
+*/
+
+int32x4_t int32_1 (int a)
+{
+  int32x4_t v = {a, a, a, a};
+  return v;
+}
+
+/*
+** int16_0:
+**	sxth	w0, w0
+**	fmov	s0, w0
+**	ins	v0.h\[1\], w1
+**	ins	v0.h\[2\], w2
+**	ins	v0.h\[3\], w3
+**	dup	v0.2d, v0.d\[0\]
+**	ret
+*/
+
+int16x8_t int16_0 (int16_t a, int16_t b, int16_t c, int16_t d)
+{
+  int16x8_t v = {a, b, c, d,
+		 a, b, c, d};
+  return v;
+}
+
+/*
+** int16_1:
+**	sxth	w0, w0
+**	fmov	s0, w0
+**	ins	v0.h\[1\], w1
+**	dup	v0.4s, v0.s\[0\]
+**	ret
+*/
+
+int16x8_t int16_1 (int16_t a, int16_t b)
+{
+  int16x8_t v = {a, b, a, b,
+		 a, b, a, b};
+  return v;
+}
+
+/*
+** int16_2:
+**	dup	v0.8h, w0
+**	ret
+*/
+
+int16x8_t int16_2 (int16_t a)
+{
+  int16x8_t v = {a, a, a, a,
+		 a, a, a, a};
+  return v;
+}
+
+/*
+** int8_0:
+**	sxtb	w0, w0
+**	fmov	s0, w0
+**	ins	v0.b\[1\], w1
+**	ins	v0.b\[2\], w2
+**	ins	v0.b\[3\], w3
+**	ins	v0.b\[4\], w4
+**	ins	v0.b\[5\], w5
+**	ins	v0.b\[6\], w6
+**	ins	v0.b\[7\], w7
+**	dup	v0.2d, v0.d\[0\]
+**	ret
+*/
+
+int8x16_t int8_0 (int8_t a, int8_t b, int8_t c, int8_t d, int8_t e, int8_t f,
+		   int8_t g, int8_t h)
+{
+  int8x16_t v = {a, b, c, d, e, f, g, h,
+                 a, b, c, d, e, f, g, h};
+  return v;
+}
+
+/*
+** int8_1:
+**	sxtb	w0, w0
+**	fmov	s0, w0
+**	ins	v0.b\[1\], w1
+**	ins	v0.b\[2\], w2
+**	ins	v0.b\[3\], w3
+**	dup	v0.4s, v0.s\[0\]
+**	ret
+*/
+
+int8x16_t int8_1 (int8_t a, int8_t b, int8_t c, int8_t d)
+{
+  int8x16_t v = {a, b, c, d, a, b, c, d,
+                 a, b, c, d, a, b, c, d};
+  return v;
+}
+
+/*
+** int8_2:
+**	sxtb	w0, w0
+**	fmov	s0, w0
+**	ins	v0.b\[1\], w1
+**	dup	v0.8h, v0.h\[0\]
+**	ret
+*/
+
+int8x16_t int8_2 (int8_t a, int8_t b)
+{
+  int8x16_t v = {a, b, a, b, a, b, a, b,
+                 a, b, a, b, a, b, a, b};
+  return v;
+}
+
+/*
+** int8_3:
+**	dup	v0.16b, w0
+**	ret
+*/
+
+int8x16_t int8_3 (int8_t a)
+{
+  int8x16_t v = {a, a, a, a, a, a, a, a,
+                 a, a, a, a, a, a, a, a};
+  return v;
+}
+

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 3/4] match.pd: Teach forwprop to handle VLA VEC_PERM_EXPRs with VLS CONSTRUCTORs as arguments
  2022-08-05 12:50 [PATCH 0/4] aarch64: Improve codegen for dups and constructors Andre Vieira (lists)
  2022-08-05 12:53 ` [PATCH 1/4] aarch64: encourage use of GPR input for SIMD inserts Andre Vieira (lists)
  2022-08-05 12:55 ` [PATCH 2/4]aarch64: Change aarch64_expand_vector_init to use rtx_vector_builder Andre Vieira (lists)
@ 2022-08-05 12:56 ` Andre Vieira (lists)
  2022-08-05 14:53   ` Prathamesh Kulkarni
  2022-08-05 12:58 ` [PATCH 4/4][RFC] VLA Constructor Andre Vieira (lists)
  3 siblings, 1 reply; 7+ messages in thread
From: Andre Vieira (lists) @ 2022-08-05 12:56 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Sandiford, Prathamesh Kulkarni

[-- Attachment #1: Type: text/plain, Size: 244 bytes --]

Hi,

This patch is part of the WIP patch that follows in this series. It's 
goal is to teach forwprop to handle VLA VEC_PERM_EXPRs with VLS 
CONSTRUCTORs as arguments as preparation for the 'VLA constructor' hook 
approach.

Kind Regards,
Andre

[-- Attachment #2: sve_const_dup_3.patch --]
[-- Type: text/plain, Size: 1831 bytes --]

diff --git a/gcc/match.pd b/gcc/match.pd
index 9736393061aac61d4d53aaad6cf6b2c97a7d4679..3c3c0c6a88b35a6e42c506f6c4603680fe6e4318 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7852,14 +7852,24 @@ and,
     if (!tree_to_vec_perm_builder (&builder, op2))
       return NULL_TREE;
 
+    /* FIXME: disable folding of a VEC_PERM_EXPR with a VLA mask and VLS
+       CONSTRUCTORS, since that would yield a VLA CONSTRUCTOR which we
+       currently do not support.  */
+    if (!TYPE_VECTOR_SUBPARTS (type).is_constant ()
+	&& (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op0)).is_constant ()
+	    || TYPE_VECTOR_SUBPARTS (TREE_TYPE (op1)).is_constant ()))
+      return NULL_TREE;
+
     /* Create a vec_perm_indices for the integer vector.  */
     poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (type);
     bool single_arg = (op0 == op1);
     vec_perm_indices sel (builder, single_arg ? 1 : 2, nelts);
   }
-  (if (sel.series_p (0, 1, 0, 1))
+  (if (sel.series_p (0, 1, 0, 1)
+       && useless_type_conversion_p (type, TREE_TYPE (op0)))
    { op0; }
-   (if (sel.series_p (0, 1, nelts, 1))
+   (if (sel.series_p (0, 1, nelts, 1)
+	&& useless_type_conversion_p (type, TREE_TYPE (op1)))
     { op1; }
     (with
      {
diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index fdc4bc8909d2763876550e53277ff2b3dcca796a..cda91c21c476ea8611e12c593bfa64e1d71dd29e 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -2661,7 +2661,7 @@ simplify_permutation (gimple_stmt_iterator *gsi)
 
       /* Shuffle of a constructor.  */
       bool ret = false;
-      tree res_type = TREE_TYPE (arg0);
+      tree res_type = TREE_TYPE (gimple_get_lhs (stmt));
       tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2);
       if (!opt
 	  || (TREE_CODE (opt) != CONSTRUCTOR && TREE_CODE (opt) != VECTOR_CST))

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 4/4][RFC] VLA Constructor
  2022-08-05 12:50 [PATCH 0/4] aarch64: Improve codegen for dups and constructors Andre Vieira (lists)
                   ` (2 preceding siblings ...)
  2022-08-05 12:56 ` [PATCH 3/4] match.pd: Teach forwprop to handle VLA VEC_PERM_EXPRs with VLS CONSTRUCTORs as arguments Andre Vieira (lists)
@ 2022-08-05 12:58 ` Andre Vieira (lists)
  2022-08-08 12:12   ` Richard Biener
  3 siblings, 1 reply; 7+ messages in thread
From: Andre Vieira (lists) @ 2022-08-05 12:58 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Sandiford, Prathamesh Kulkarni

[-- Attachment #1: Type: text/plain, Size: 153 bytes --]

This isn't really a 'PATCH' yet, it's something I was working on but had 
to put on hold. Feel free to re-use any bits or trash all of it if you'd 
like.

[-- Attachment #2: sve_const_dup_4.patch --]
[-- Type: text/plain, Size: 15627 bytes --]

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 82f9eba5c397af04924bdebdc684a1d77682d3fd..08625aad7b1a8dc9c9f8c491cb13d8af0b46a946 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -842,13 +842,45 @@ public:
     for (unsigned int i = 0; i < nargs; ++i)
       {
 	tree elt = gimple_call_arg (f.call, i);
-	if (!CONSTANT_CLASS_P (elt))
-	  return NULL;
 	builder.quick_push (elt);
 	for (unsigned int j = 1; j < factor; ++j)
 	  builder.quick_push (build_zero_cst (TREE_TYPE (vec_type)));
       }
-    return gimple_build_assign (f.lhs, builder.build ());
+    builder.finalize ();
+    unsigned int n_elts
+      = builder.nelts_per_pattern () == 1 ? builder.npatterns ()
+					  : builder.full_nelts ().coeffs[0];
+
+    if (n_elts == 1)
+      return gimple_build_assign (f.lhs, build1 (VEC_DUPLICATE_EXPR, vec_type,
+						 builder.elt (0)));
+    tree list = NULL_TREE;
+    tree *pp = &list;
+    for (unsigned int i = 0; i < n_elts; ++i)
+      {
+	*pp = build_tree_list (NULL, builder.elt (i) PASS_MEM_STAT);
+	pp = &TREE_CHAIN (*pp);
+      }
+
+    poly_uint64 vec_len = TYPE_VECTOR_SUBPARTS (vec_type);
+    vec_perm_builder sel (vec_len, n_elts, 1);
+    for (unsigned int i = 0; i < n_elts; i++)
+      sel.quick_push (i);
+    vec_perm_indices indices (sel, 1, n_elts);
+
+    tree elt_type = TREE_TYPE (vec_type);
+
+    tree ctor_type = build_vector_type (elt_type, n_elts);
+    tree ctor = make_ssa_name_fn (cfun, ctor_type, 0);
+    gimple *ctor_stmt
+      = gimple_build_assign (ctor,
+			     build_constructor_from_list (ctor_type, list));
+    gsi_insert_before (f.gsi, ctor_stmt, GSI_SAME_STMT);
+
+    tree mask_type = build_vector_type (ssizetype, vec_len);
+    tree mask = vec_perm_indices_to_tree (mask_type, indices);
+    return gimple_build_assign (f.lhs, fold_build3 (VEC_PERM_EXPR, vec_type,
+						    ctor, ctor, mask));
   }
 
   rtx
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index bd60e65b0c3f05f1c931f03807170f3b9d699de5..dec935211e5a064239c858880a696e6ca3fe1ae2 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -2544,6 +2544,17 @@
   }
 )
 
+;; Duplicate an Advanced SIMD vector to fill an SVE vector (LE version).
+(define_insn "*aarch64_vec_duplicate_reg<mode>_le"
+  [(set (match_operand:SVE_FULL 0 "register_operand" "=w,w")
+	(vec_duplicate:SVE_FULL
+	  (match_operand:<VEL> 1 "register_operand" "w,r")))]
+  "TARGET_SVE && !BYTES_BIG_ENDIAN"
+  "@
+   mov\t%0.<Vetype>, %<vwcore>1
+   mov\t%0.<Vetype>, %<Vetype>1"
+)
+
 ;; Duplicate an Advanced SIMD vector to fill an SVE vector (BE version).
 ;; The SVE register layout puts memory lane N into (architectural)
 ;; register lane N, whereas the Advanced SIMD layout puts the memory
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index a08043e18d609e258ebfe033875201163d129aba..9b118e4101d0a5995a833769433be49321ab2151 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -6033,7 +6033,6 @@ rtx
 aarch64_expand_sve_dupq (rtx target, machine_mode mode, rtx src)
 {
   machine_mode src_mode = GET_MODE (src);
-  gcc_assert (GET_MODE_INNER (mode) == GET_MODE_INNER (src_mode));
   insn_code icode = (BYTES_BIG_ENDIAN
 		     ? code_for_aarch64_vec_duplicate_vq_be (mode)
 		     : code_for_aarch64_vec_duplicate_vq_le (mode));
@@ -21806,20 +21805,29 @@ aarch64_simd_make_constant (rtx vals)
 }
 
 static void
-aarch64_vec_duplicate (rtx target, machine_mode mode, machine_mode element_mode,
+aarch64_vec_duplicate (rtx target, rtx op, machine_mode mode, machine_mode element_mode,
 		       int narrow_n_elts)
 {
   poly_uint64 size = narrow_n_elts * GET_MODE_BITSIZE (element_mode);
-  scalar_mode i_mode = int_mode_for_size (size, 0).require ();
   machine_mode o_mode;
-  if (aarch64_sve_mode_p (mode))
-    o_mode = aarch64_full_sve_mode (i_mode).require ();
+  rtx input, output;
+  bool sve = aarch64_sve_mode_p (mode);
+  if (sve && known_eq (size, 128U))
+    {
+      o_mode = mode;
+      output = target;
+      input = op;
+    }
   else
-    o_mode
-      = aarch64_simd_container_mode (i_mode,
-				     GET_MODE_BITSIZE (mode));
-  rtx input = simplify_gen_subreg (i_mode, target, mode, 0);
-  rtx output = simplify_gen_subreg (o_mode, target, mode, 0);
+    {
+      scalar_mode i_mode = int_mode_for_size (size, 0).require ();
+      o_mode
+	= sve ? aarch64_full_sve_mode (i_mode).require ()
+	      : aarch64_simd_container_mode (i_mode,
+					     GET_MODE_BITSIZE (mode));
+      input = simplify_gen_subreg (i_mode, op, GET_MODE (op), 0);
+      output = simplify_gen_subreg (o_mode, target, mode, 0);
+    }
   aarch64_emit_move (output, gen_vec_duplicate (o_mode, input));
 }
 
@@ -21910,6 +21918,16 @@ aarch64_expand_vector_init (rtx target, rtx_vector_builder &v)
       return;
     }
 
+  /* We are constructing a VLS vector that we may later duplicate into a VLA
+     one.  Actually maybe split this into one for ASIMD and one for SVE? */
+  machine_mode real_mode = mode;
+  rtx real_target = target;
+  if (aarch64_sve_mode_p (real_mode))
+    {
+      mode = aarch64_vq_mode (GET_MODE_INNER (real_mode)).require ();
+      target = simplify_gen_subreg (mode, target, real_mode, 0);
+    }
+
   enum insn_code icode = optab_handler (vec_set_optab, mode);
   gcc_assert (icode != CODE_FOR_nothing);
 
@@ -22000,8 +22018,8 @@ aarch64_expand_vector_init (rtx target, rtx_vector_builder &v)
 	  x = copy_to_mode_reg (inner_mode, x);
 	  emit_insn (GEN_FCN (icode) (target, x, GEN_INT (i)));
 	}
-	if (!known_eq (v.full_nelts (), n_elts))
-	  aarch64_vec_duplicate (target, mode, GET_MODE (v0), n_elts);
+      if (!known_eq (v.full_nelts (), n_elts))
+	aarch64_vec_duplicate (real_target, target, real_mode, GET_MODE (v0), n_elts);
       return;
     }
 
@@ -22048,7 +22066,7 @@ aarch64_expand_vector_init (rtx target, rtx_vector_builder &v)
       emit_insn (GEN_FCN (icode) (target, x, GEN_INT (i)));
     }
   if (!known_eq (v.full_nelts (), n_elts))
-    aarch64_vec_duplicate (target, mode, inner_mode, n_elts);
+    aarch64_vec_duplicate (real_target, target, real_mode, inner_mode, n_elts);
 }
 
 /* Emit RTL corresponding to:
@@ -23947,11 +23965,7 @@ aarch64_evpc_sve_dup (struct expand_vec_perm_d *d)
   if (BYTES_BIG_ENDIAN
       || !d->one_vector_p
       || d->vec_flags != VEC_SVE_DATA
-      || d->op_vec_flags != VEC_ADVSIMD
-      || d->perm.encoding ().nelts_per_pattern () != 1
-      || !known_eq (d->perm.encoding ().npatterns (),
-		    GET_MODE_NUNITS (d->op_mode))
-      || !known_eq (GET_MODE_BITSIZE (d->op_mode), 128))
+      || d->perm.encoding ().nelts_per_pattern () != 1)
     return false;
 
   int npatterns = d->perm.encoding ().npatterns ();
@@ -23962,7 +23976,10 @@ aarch64_evpc_sve_dup (struct expand_vec_perm_d *d)
   if (d->testing_p)
     return true;
 
-  aarch64_expand_sve_dupq (d->target, GET_MODE (d->target), d->op0);
+  machine_mode mode = GET_MODE (d->target);
+  machine_mode element_mode = GET_MODE_INNER (mode);
+  aarch64_vec_duplicate (d->target, d->op0, mode, element_mode,
+			 d->perm.encoding ().npatterns ());
   return true;
 }
 
@@ -24194,6 +24211,15 @@ aarch64_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode,
   return ret;
 }
 
+/* Implement TARGET_VECTORIZE_VLA_CONSTRUCTOR.  */
+
+static bool
+aarch64_vectorize_vla_constructor (rtx target, rtx_vector_builder &builder)
+{
+  aarch64_expand_vector_init (target, builder);
+  return true;
+}
+
 /* Generate a byte permute mask for a register of mode MODE,
    which has NUNITS units.  */
 
@@ -27667,6 +27693,10 @@ aarch64_libgcc_floating_mode_supported_p
 #define TARGET_VECTORIZE_VEC_PERM_CONST \
   aarch64_vectorize_vec_perm_const
 
+#undef TARGET_VECTORIZE_VLA_CONSTRUCTOR
+#define TARGET_VECTORIZE_VLA_CONSTRUCTOR \
+  aarch64_vectorize_vla_constructor
+
 #undef TARGET_VECTORIZE_RELATED_MODE
 #define TARGET_VECTORIZE_RELATED_MODE aarch64_vectorize_related_mode
 #undef TARGET_VECTORIZE_GET_MASK_MODE
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index b0ea39884aa3ced5c0ccc1e792088aa66997ec3b..eda3f014984f62d96d7fe0b3c0c439905375f25a 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6112,6 +6112,11 @@ instruction pattern.  There is no need for the hook to handle these two
 implementation approaches itself.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_VECTORIZE_VLA_CONSTRUCTOR (rtx @var{target}, rtx_vector_builder @var{&builder})
+This hook is used to expand a vla constructor into @var{target}
+using the rtx_vector_builder @var{builder}.
+@end deftypefn
+
 @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION (unsigned @var{code}, tree @var{vec_type_out}, tree @var{vec_type_in})
 This hook should return the decl of a function that implements the
 vectorized variant of the function with the @code{combined_fn} code
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index f869ddd5e5b8b7acbd8e9765fb103af24a1085b6..07f4f77877b18a23f6fd205a8dd8daf1a03c2923 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4164,6 +4164,8 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_VECTORIZE_VEC_PERM_CONST
 
+@hook TARGET_VECTORIZE_VLA_CONSTRUCTOR
+
 @hook TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
 
 @hook TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION
diff --git a/gcc/expr.cc b/gcc/expr.cc
index f9753d48245d56039206647be8576246a3b25ed3..b9eb550cac4c68464c95cffa8da19b3984b80782 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -10264,6 +10264,44 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 
     case VEC_PERM_EXPR:
       {
+	if (TREE_CODE (treeop2) == VECTOR_CST
+	    && targetm.vectorize.vla_constructor)
+	  {
+	    tree ctor0, ctor1;
+	    if (TREE_CODE (treeop0) == SSA_NAME
+		&& is_gimple_assign (SSA_NAME_DEF_STMT (treeop0)))
+	      ctor0 = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (treeop0));
+	    else
+	      ctor0 = treeop0;
+	    if (TREE_CODE (treeop1) == SSA_NAME
+		&& is_gimple_assign (SSA_NAME_DEF_STMT (treeop1)))
+	      ctor1 = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (treeop1));
+	    else
+	      ctor1 = treeop1;
+
+	    if (TREE_CODE (ctor0) == CONSTRUCTOR
+		&& TREE_CODE (ctor1) == CONSTRUCTOR)
+	      {
+
+		unsigned int nelts = vector_cst_encoded_nelts (treeop2);
+		unsigned int ctor_nelts = CONSTRUCTOR_NELTS (ctor0);
+		machine_mode mode = GET_MODE (target);
+		rtx_vector_builder builder (mode, nelts, 1);
+		for (unsigned int i = 0; i < nelts; ++i)
+		  {
+		    unsigned HOST_WIDE_INT index
+		      = tree_to_uhwi (VECTOR_CST_ENCODED_ELT (treeop2, i));
+		    tree op
+		      = index >= ctor_nelts
+			? CONSTRUCTOR_ELT (ctor1, index - ctor_nelts)->value
+			: CONSTRUCTOR_ELT (ctor0, index)->value;
+		    builder.quick_push (expand_normal (op));
+		  }
+		builder.finalize ();
+		if (targetm.vectorize.vla_constructor (target, builder))
+		  return target;
+	      }
+	  }
 	expand_operands (treeop0, treeop1, target, &op0, &op1, EXPAND_NORMAL);
 	vec_perm_builder sel;
 	if (TREE_CODE (treeop2) == VECTOR_CST
diff --git a/gcc/target.def b/gcc/target.def
index 2a7fa68f83dd15dcdd2c332e8431e6142ec7d305..3c219b6a90d9cc1a6393a3ebc24e54fcf14c6377 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1902,6 +1902,13 @@ implementation approaches itself.",
 	const vec_perm_indices &sel),
  NULL)
 
+DEFHOOK
+(vla_constructor,
+ "This hook is used to expand a vla constructor into @var{target}\n\
+using the rtx_vector_builder @var{builder}.",
+ bool, (rtx target, rtx_vector_builder &builder),
+ NULL)
+
 /* Return true if the target supports misaligned store/load of a
    specific factor denoted in the third parameter.  The last parameter
    is true if the access is defined in a packed struct.  */
diff --git a/gcc/target.h b/gcc/target.h
index d6fa6931499d15edff3e5af3e429540d001c7058..b46b8f0d7a9c52f6efe6acf10f589703cec3bd08 100644
--- a/gcc/target.h
+++ b/gcc/target.h
@@ -262,6 +262,8 @@ enum poly_value_estimate_kind
 extern bool verify_type_context (location_t, type_context_kind, const_tree,
 				 bool = false);
 
+class rtx_vector_builder;
+
 /* The target structure.  This holds all the backend hooks.  */
 #define DEFHOOKPOD(NAME, DOC, TYPE, INIT) TYPE NAME;
 #define DEFHOOK(NAME, DOC, TYPE, PARAMS, INIT) TYPE (* NAME) PARAMS;
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_opt_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_opt_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..01f652931555534f43e0487766c568c72a5df686
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_opt_1.c
@@ -0,0 +1,134 @@
+/* { dg-options { "-O2" } } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+#include <arm_sve.h>
+
+/*
+** test0:
+**	ins	v0.s\[1\], v1.s\[0\]
+**	mov	z0.d, d0
+**	ret
+*/
+svfloat32_t test0(float x, float y) {
+    return svdupq_n_f32(x, y, x, y);
+}
+/*
+** test1:
+**	mov	z0.s, s0
+**	ret
+*/
+
+svfloat32_t test1(float x) {
+    return svdupq_n_f32(x, x, x, x);
+}
+
+/*
+** test2:
+**	mov	z0.s, w0
+**	ret
+*/
+
+svint32_t test2(int x) {
+    return svdupq_n_s32(x, x, x, x);
+}
+
+/*
+** test3:
+**	sxth	w0, w0
+**	fmov	d0, x0
+**	ins	v0.h\[1\], w1
+**	ins	v0.h\[2\], w2
+**	ins	v0.h\[3\], w3
+**	mov	z0.d, d0
+**	ret
+*/
+
+svint16_t test3(short a, short b, short c, short d)
+{
+    return svdupq_n_s16(a, b, c, d, a, b, c, d);
+}
+
+/*
+** test4:
+**	dup	v0.4h, w0
+**	ins	v0.h\[1\], w1
+**	ins	v0.h\[3\], w1
+**	mov	z0.d, d0
+**	ret
+*/
+
+svint16_t test4(short a, short b)
+{
+    return svdupq_n_s16(a, b, a, b, a, b, a, b);
+}
+
+/*
+** test5:
+**	mov	z0.h, w0
+**	ret
+*/
+
+svint16_t test5(short a)
+{
+    return svdupq_n_s16(a, a, a, a, a, a, a, a);
+}
+/*
+** test6:
+**	sxtb	w0, w0
+**	fmov	d0, x0
+**	ins	v0.b\[1\], w1
+**	ins	v0.b\[2\], w2
+**	ins	v0.b\[3\], w3
+**	ins	v0.b\[4\], w4
+**	ins	v0.b\[5\], w5
+**	ins	v0.b\[6\], w6
+**	ins	v0.b\[7\], w7
+**	mov	z0.d, d0
+**	ret
+*/
+
+svint8_t test6(char a, char b, char c, char d, char e, char f, char g, char h)
+{
+    return svdupq_n_s8(a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h);
+}
+
+/*
+** test7:
+**	dup	v0.8b, w0
+**	ins	v0.b\[1\], w1
+**	ins	v0.b\[2\], w2
+**	ins	v0.b\[3\], w3
+**	mov	z0.s, s0
+**	ret
+*/
+
+svint8_t test7(char a, char b, char c, char d)
+{
+    return svdupq_n_s8(a, b, c, d, a, b, c, d, a, b, c, d, a, b, c, d);
+}
+
+
+// We can do better than this
+/*
+**	sxtb	w0, w0
+**	fmov	d0, x0
+**	ins	v0.d\[1\], x1
+**	ins	v0.b\[1\], w1
+**	mov	z0.h, h0
+**	ret
+*/
+
+svint8_t test8(char a, char b)
+{
+    return svdupq_n_s8(a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b);
+}
+
+/*
+** test9:
+**	mov	z0.b, w0
+**	ret
+*/
+
+svint8_t test9(char a)
+{
+    return svdupq_n_s8(a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a);
+}
diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
index 350129555a0c71c0896c4f1003163f3b3557c11b..eaae1eefe02af3f51073310e7d17c33286b2bead 100644
--- a/gcc/tree-vect-generic.cc
+++ b/gcc/tree-vect-generic.cc
@@ -1513,6 +1513,11 @@ lower_vec_perm (gimple_stmt_iterator *gsi)
   if (!TYPE_VECTOR_SUBPARTS (vect_type).is_constant (&elements))
     return;
 
+  /* It is possible to have a VEC_PERM_EXPR with a VLA mask and a VLS
+     CONSTRUCTOR, this should return a VLA type, so we can't lower it.  */
+  if (!TYPE_VECTOR_SUBPARTS (mask_type).is_constant ())
+    return;
+
   if (TREE_CODE (mask) == SSA_NAME)
     {
       gimple *def_stmt = SSA_NAME_DEF_STMT (mask);

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 3/4] match.pd: Teach forwprop to handle VLA VEC_PERM_EXPRs with VLS CONSTRUCTORs as arguments
  2022-08-05 12:56 ` [PATCH 3/4] match.pd: Teach forwprop to handle VLA VEC_PERM_EXPRs with VLS CONSTRUCTORs as arguments Andre Vieira (lists)
@ 2022-08-05 14:53   ` Prathamesh Kulkarni
  0 siblings, 0 replies; 7+ messages in thread
From: Prathamesh Kulkarni @ 2022-08-05 14:53 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc-patches, Richard Sandiford

On Fri, 5 Aug 2022 at 18:26, Andre Vieira (lists)
<andre.simoesdiasvieira@arm.com> wrote:
>
> Hi,
>
> This patch is part of the WIP patch that follows in this series. It's
> goal is to teach forwprop to handle VLA VEC_PERM_EXPRs with VLS
> CONSTRUCTORs as arguments as preparation for the 'VLA constructor' hook
> approach.

      /* Shuffle of a constructor.  */
       bool ret = false;
-      tree res_type = TREE_TYPE (arg0);
+      tree res_type = TREE_TYPE (gimple_get_lhs (stmt));
       tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2);
       if (!opt
         || (TREE_CODE (opt) != CONSTRUCTOR && TREE_CODE (opt) != VECTOR_CST))

This has to be TREE_TYPE (arg0). I had changed it to TREE_TYPE
(gimple_assign_lhs (stmt)) and it caused
several ICE's on ppc64le (PR106360)
For details, see:
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598611.html
I currently have a patch in review that extends fold_vec_perm to
handle differing vector lengths:
https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599126.html

Thanks,
Prathamesh
>
> Kind Regards,
> Andre

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 4/4][RFC] VLA Constructor
  2022-08-05 12:58 ` [PATCH 4/4][RFC] VLA Constructor Andre Vieira (lists)
@ 2022-08-08 12:12   ` Richard Biener
  0 siblings, 0 replies; 7+ messages in thread
From: Richard Biener @ 2022-08-08 12:12 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: GCC Patches, Richard Sandiford

On Fri, Aug 5, 2022 at 2:59 PM Andre Vieira (lists) via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> This isn't really a 'PATCH' yet, it's something I was working on but had
> to put on hold. Feel free to re-use any bits or trash all of it if you'd
> like.

@@ -10264,6 +10264,44 @@ expand_expr_real_2 (sepops ops, rtx target,
machine_mode tmode,

     case VEC_PERM_EXPR:
       {
+       if (TREE_CODE (treeop2) == VECTOR_CST
+           && targetm.vectorize.vla_constructor)
+         {
+           tree ctor0, ctor1;
+           if (TREE_CODE (treeop0) == SSA_NAME
+               && is_gimple_assign (SSA_NAME_DEF_STMT (treeop0)))
+             ctor0 = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (treeop0));
+           else
+             ctor0 = treeop0;
+           if (TREE_CODE (treeop1) == SSA_NAME
+               && is_gimple_assign (SSA_NAME_DEF_STMT (treeop1)))
+             ctor1 = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (treeop1));

just to say - you can't lookup things like this, you have to go through the TER
machinery, otherwise the expansions for the CTOR elements might be
clobbered already.  That means to be fully effective doing this during RTL
expansion is likely limited.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-08-08 12:13 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-05 12:50 [PATCH 0/4] aarch64: Improve codegen for dups and constructors Andre Vieira (lists)
2022-08-05 12:53 ` [PATCH 1/4] aarch64: encourage use of GPR input for SIMD inserts Andre Vieira (lists)
2022-08-05 12:55 ` [PATCH 2/4]aarch64: Change aarch64_expand_vector_init to use rtx_vector_builder Andre Vieira (lists)
2022-08-05 12:56 ` [PATCH 3/4] match.pd: Teach forwprop to handle VLA VEC_PERM_EXPRs with VLS CONSTRUCTORs as arguments Andre Vieira (lists)
2022-08-05 14:53   ` Prathamesh Kulkarni
2022-08-05 12:58 ` [PATCH 4/4][RFC] VLA Constructor Andre Vieira (lists)
2022-08-08 12:12   ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).