public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] gcc: Add vec_select -> subreg RTL simplification
@ 2021-07-02  9:53 Jonathan Wright
  2021-07-07 13:35 ` [PATCH V2] " Jonathan Wright
  0 siblings, 1 reply; 6+ messages in thread
From: Jonathan Wright @ 2021-07-02  9:53 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Sandiford, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 3221 bytes --]

Hi,

As subject, this patch adds a new RTL simplification for the case of a
VEC_SELECT selecting the low part of a vector. The simplification
returns a SUBREG.

The primary goal of this patch is to enable better combinations of
Neon RTL patterns - specifically allowing generation of 'write-to-
high-half' narrowing intructions.

Adding this RTL simplification means that the expected results for a
number of tests need to be updated:
* aarch64 Neon: Update the scan-assembler regex for intrinsics tests
  to expect a scalar register instead of lane 0 of a vector.
* aarch64 SVE: Likewise.
* arm MVE: Use lane 1 instead of lane 0 for lane-extraction
  intrinsics tests (as the move instructions get optimized away for
  lane 0.)

Regression tested and bootstrapped on aarch64-none-linux-gnu,
x86_64-unknown-linux-gnu, arm-none-linux-gnueabihf and
aarch64_be-none-linux-gnu - no issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-06-08  Jonathan Wright  <jonathan.wright@arm.com>

	* combine.c (combine_simplify_rtx): Add vec_select -> subreg
	simplification.
	* config/aarch64/aarch64.md (*zero_extend<SHORT:mode><GPI:mode>2_aarch64):
	Add Neon to general purpose register case for zero-extend
	pattern.
	* config/arm/vfp.md (*arm_movsi_vfp): Remove "*" from *t -> r
	case to prevent some cases opting to go through memory.
	* cse.c (fold_rtx): Add vec_select -> subreg simplification.
	* simplify-rtx.c (simplify_context::simplify_binary_operation_1):
	Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/extract_zero_extend.c: Remove dump scan
	for RTL pattern match.
	* gcc.target/aarch64/simd/vmulx_laneq_f64_1.c: Update
	scan-assembler regex to look for a scalar register instead of
	lane 0 of a vector.
	* gcc.target/aarch64/simd/vmulx_laneq_f64_1.c: Likewise.
	* gcc.target/aarch64/simd/vmulxd_laneq_f64_1.c: Likewise.
	* gcc.target/aarch64/simd/vmulxs_lane_f32_1.c: Likewise.
	* gcc.target/aarch64/simd/vmulxs_laneq_f32_1.c: Likewise.
	* gcc.target/aarch64/simd/vqdmlalh_lane_s16.c: Likewise.
	* gcc.target/aarch64/simd/vqdmlals_lane_s32.c: Likewise.
	* gcc.target/aarch64/simd/vqdmlslh_lane_s16.c: Likewise.
	* gcc.target/aarch64/simd/vqdmlsls_lane_s32.c: Likewise.
	* gcc.target/aarch64/simd/vqdmullh_lane_s16.c: Likewise.
	* gcc.target/aarch64/simd/vqdmullh_laneq_s16.c: Likewise.
	* gcc.target/aarch64/simd/vqdmulls_lane_s32.c: Likewise.
	* gcc.target/aarch64/simd/vqdmulls_laneq_s32.c: Likewise.
	* gcc.target/aarch64/sve/dup_lane_1.c: Likewise.
	* gcc.target/aarch64/sve/live_1.c: Update scan-assembler regex
	cases to look for 'b' and 'h' registers instead of 'w'.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c: Extract
	lane 1 as the moves for lane 0 now get optimized away.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c: Likewise.

[-- Attachment #2: rb14526.patch --]
[-- Type: application/octet-stream, Size: 24917 bytes --]

diff --git a/gcc/combine.c b/gcc/combine.c
index 6476812a21268e28219d1e302ee1c979d528a6ca..965b1a69ab2162a537b5846f0563f5120090fb22 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -6276,6 +6276,36 @@ combine_simplify_rtx (rtx x, machine_mode op0_mode, int in_dest,
 			      - 1,
 			      0));
       break;
+    case VEC_SELECT:
+      {
+	rtx trueop0 = XEXP (x, 0);
+	mode = GET_MODE (trueop0);
+	rtx trueop1 = XEXP (x, 1);
+	int nunits;
+	/* If we select a low-part subreg, return that.  */
+	if (GET_MODE_NUNITS (mode).is_constant (&nunits)
+	    && targetm.can_change_mode_class (mode, GET_MODE (x), ALL_REGS))
+	  {
+	    int flag = 0;
+	    int offset = BYTES_BIG_ENDIAN ? nunits - XVECLEN (trueop1, 0) : 0;
+
+	    for (int i = 0; i < XVECLEN (trueop1, 0); i++)
+	      {
+		if (i + offset != INTVAL (XVECEXP (trueop1, 0, i)))
+		  {
+		    flag = 1;
+		    break;
+		  }
+	      }
+
+	    if (flag == 0)
+	      {
+		rtx new_rtx = lowpart_subreg (GET_MODE (x), trueop0, mode);
+		if (new_rtx != NULL_RTX)
+		  return new_rtx;
+	      }
+	  }
+      }
 
     default:
       break;
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index aef6da9732d45b3586bad5ba57dafa438374ac3c..f12a0bebd3d6dd3381ac8248cd3fa3f519115105 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1884,15 +1884,16 @@
 )
 
 (define_insn "*zero_extend<SHORT:mode><GPI:mode>2_aarch64"
-  [(set (match_operand:GPI 0 "register_operand" "=r,r,w")
-        (zero_extend:GPI (match_operand:SHORT 1 "nonimmediate_operand" "r,m,m")))]
+  [(set (match_operand:GPI 0 "register_operand" "=r,r,w,r")
+        (zero_extend:GPI (match_operand:SHORT 1 "nonimmediate_operand" "r,m,m,w")))]
   ""
   "@
    and\t%<GPI:w>0, %<GPI:w>1, <SHORT:short_mask>
    ldr<SHORT:size>\t%w0, %1
-   ldr\t%<SHORT:size>0, %1"
-  [(set_attr "type" "logic_imm,load_4,f_loads")
-   (set_attr "arch" "*,*,fp")]
+   ldr\t%<SHORT:size>0, %1
+   umov\t%w0, %1.<SHORT:size>[0]"
+  [(set_attr "type" "logic_imm,load_4,f_loads,neon_to_gp")
+   (set_attr "arch" "*,*,fp,fp")]
 )
 
 (define_expand "<optab>qihi2"
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 55b6c1ac585a4cae0789c3afc0fccfc05a6d3653..93e963696dad30f29a76025696670f8b31bf2c35 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -224,7 +224,7 @@
 ;; problems because small constants get converted into adds.
 (define_insn "*arm_movsi_vfp"
   [(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,r,r,rk,m ,*t,r,*t,*t, *Uv")
-      (match_operand:SI 1 "general_operand"	   "rk, I,K,j,mi,rk,r,*t,*t,*Uvi,*t"))]
+      (match_operand:SI 1 "general_operand"	   "rk, I,K,j,mi,rk,r,t,*t,*Uvi,*t"))]
   "TARGET_ARM && TARGET_HARD_FLOAT
    && (   s_register_operand (operands[0], SImode)
        || s_register_operand (operands[1], SImode))"
diff --git a/gcc/cse.c b/gcc/cse.c
index 4b7cbdce600e9d0e1d4768c17a99381c76e1cef1..51e0599d2a34b19a2a8b71780e47a25027afea1c 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -3171,6 +3171,36 @@ fold_rtx (rtx x, rtx_insn *insn)
       if (NO_FUNCTION_CSE && CONSTANT_P (XEXP (XEXP (x, 0), 0)))
 	return x;
       break;
+    case VEC_SELECT:
+      {
+	rtx trueop0 = XEXP (x, 0);
+	mode = GET_MODE (trueop0);
+	rtx trueop1 = XEXP (x, 1);
+	int nunits;
+	/* If we select a low-part subreg, return that.  */
+	if (GET_MODE_NUNITS (mode).is_constant (&nunits)
+	    && targetm.can_change_mode_class (mode, GET_MODE (x), ALL_REGS))
+	  {
+	    int flag = 0;
+	    int offset = BYTES_BIG_ENDIAN ? nunits - XVECLEN (trueop1, 0) : 0;
+
+	    for (int i = 0; i < XVECLEN (trueop1, 0); i++)
+	      {
+		if (i + offset != INTVAL (XVECEXP (trueop1, 0, i)))
+		  {
+		    flag = 1;
+		    break;
+		  }
+	      }
+
+	    if (flag == 0)
+	      {
+		rtx new_rtx = lowpart_subreg (GET_MODE (x), trueop0, mode);
+		if (new_rtx != NULL_RTX)
+		  return new_rtx;
+	      }
+	  }
+      }
 
     /* Anything else goes through the loop below.  */
     default:
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index c82101c73a46e300bc65eb2104a2205433ff5d24..3b41588932e0801fd379e9aa36fa5b094b33d15e 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -4201,6 +4201,34 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
 		return trueop0;
 	    }
 
+	  /* If we select a low-part subreg, return that.  */
+	  int nunits;
+	  if (GET_MODE_NUNITS (GET_MODE (trueop0)).is_constant (&nunits)
+	      && targetm.can_change_mode_class (GET_MODE (trueop0), mode,
+						ALL_REGS))
+	    {
+	      int flag = 0;
+	      int offset = BYTES_BIG_ENDIAN ? nunits - XVECLEN (trueop1, 0)
+					    : 0;
+
+	      for (int i = 0; i < XVECLEN (trueop1, 0); i++)
+		{
+		  if (i + offset != INTVAL (XVECEXP (trueop1, 0, i)))
+		    {
+		      flag = 1;
+		      break;
+		    }
+		}
+
+	      if (flag == 0)
+		{
+		  rtx new_rtx = lowpart_subreg (mode, trueop0,
+						GET_MODE (trueop0));
+		  if (new_rtx != NULL_RTX)
+		    return new_rtx;
+		}
+	    }
+
 	  /* If we build {a,b} then permute it, build the result directly.  */
 	  if (XVECLEN (trueop1, 0) == 2
 	      && CONST_INT_P (XVECEXP (trueop1, 0, 0))
diff --git a/gcc/testsuite/gcc.target/aarch64/extract_zero_extend.c b/gcc/testsuite/gcc.target/aarch64/extract_zero_extend.c
index 0209305cd55b0b62b794f790a1cc3606fcc7a44b..193b945b41ad821da6d1112ffae79ca463b4a5e4 100644
--- a/gcc/testsuite/gcc.target/aarch64/extract_zero_extend.c
+++ b/gcc/testsuite/gcc.target/aarch64/extract_zero_extend.c
@@ -70,12 +70,3 @@ foo_siv4hi (siv4hi a)
 
 /* { dg-final { scan-assembler-times "umov\\t" 8 } } */
 /* { dg-final { scan-assembler-not "and\\t" } } */
-
-/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv16qi" "final" } } */
-/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv8qi" "final" } } */
-/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv8hi" "final" } } */
-/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv4hi" "final" } } */
-/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extendsiv16qi" "final" } } */
-/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extendsiv8qi" "final" } } */
-/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extendsiv8hi" "final" } } */
-/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extendsiv4hi" "final" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vmulx_laneq_f64_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vmulx_laneq_f64_1.c
index db79d5355bc925098555788c0dd09c99029576c7..9ef001eb3bad40ea09008d1d79b2211ff81f911a 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vmulx_laneq_f64_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vmulx_laneq_f64_1.c
@@ -72,5 +72,5 @@ main (void)
   set_and_test_case3 ();
   return 0;
 }
-/* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[dD\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]\n" 1 } } */
 /* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[dD\]\\\[1\\\]\n" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vmulxd_laneq_f64_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vmulxd_laneq_f64_1.c
index 3f8303c574ff40967c5b9ce5a152d70c4a11a9dc..232ade910472bf2ea3aa182f4216f55c8403b45b 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vmulxd_laneq_f64_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vmulxd_laneq_f64_1.c
@@ -58,5 +58,5 @@ main (void)
   set_and_test_case3 ();
   return 0;
 }
-/* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[dD\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]\n" 1 } } */
 /* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[dD\]\\\[1\\\]\n" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vmulxs_lane_f32_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vmulxs_lane_f32_1.c
index 124dcd8c4ec187b38ffb03606fad4121d9280451..37aa0ec270c29d998973ef37acd4d06470caf1f1 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vmulxs_lane_f32_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vmulxs_lane_f32_1.c
@@ -57,5 +57,5 @@ main (void)
   set_and_test_case3 ();
   return 0;
 }
-/* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]\n" 1 } } */
 /* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[1\\\]\n" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vmulxs_laneq_f32_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vmulxs_laneq_f32_1.c
index 255f0968822ffee7f3429c5997b02e3fcfca68f3..c9f2484975a66afd7d69e7fc1d9ea023a655a4d6 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vmulxs_laneq_f32_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vmulxs_laneq_f32_1.c
@@ -79,7 +79,7 @@ main (void)
   set_and_test_case3 ();
   return 0;
 }
-/* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]\n" 1 } } */
 /* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[1\\\]\n" 1 } } */
 /* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[2\\\]\n" 1 } } */
 /* { dg-final { scan-assembler-times "fmulx\[ \t\]+\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[3\\\]\n" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlalh_lane_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlalh_lane_s16.c
index 21ae724cf0ede2378cc21a2b151e948ddb198137..6b96d1cbf0fa0de7c79811abcce25990867549ab 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlalh_lane_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlalh_lane_s16.c
@@ -11,4 +11,4 @@ t_vqdmlalh_lane_s16 (int32_t a, int16_t b, int16x4_t c)
   return vqdmlalh_lane_s16 (a, b, c, 0);
 }
 
-/* { dg-final { scan-assembler-times "sqdmlal\[ \t\]+\[sS\]\[0-9\]+, ?\[hH\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[hH\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { scan-assembler-times "sqdmlal\[ \t\]+\[sS\]\[0-9\]+, ?\[hH\]\[0-9\]+, ?\[hH\]\[0-9\]\n" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlals_lane_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlals_lane_s32.c
index 79db7b73de07000c4a0546c2afa5e3b27584ebe9..a780ddbe2f90a0750497448ed05f0be61bd173c0 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlals_lane_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlals_lane_s32.c
@@ -11,4 +11,4 @@ t_vqdmlals_lane_s32 (int64_t a, int32_t b, int32x2_t c)
   return vqdmlals_lane_s32 (a, b, c, 0);
 }
 
-/* { dg-final { scan-assembler-times "sqdmlal\[ \t\]+\[dD\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { scan-assembler-times "sqdmlal\[ \t\]+\[dD\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]\n" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlslh_lane_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlslh_lane_s16.c
index 185507b9362527b842d6f0f07934e19f77e61c97..8bbac1a3c59f60844fb75aeec57adf1b8b830d2a 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlslh_lane_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlslh_lane_s16.c
@@ -11,4 +11,4 @@ t_vqdmlslh_lane_s16 (int32_t a, int16_t b, int16x4_t c)
   return vqdmlslh_lane_s16 (a, b, c, 0);
 }
 
-/* { dg-final { scan-assembler-times "sqdmlsl\[ \t\]+\[sS\]\[0-9\]+, ?\[hH\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[hH\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { scan-assembler-times "sqdmlsl\[ \t\]+\[sS\]\[0-9\]+, ?\[hH\]\[0-9\]+, ?\[hH\]\[0-9\]\n" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsls_lane_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsls_lane_s32.c
index f692923850e959946c7113b5b60bcef052938b75..069ba918d5bbae20bda5fa6b3c23e41dd8068b40 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsls_lane_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsls_lane_s32.c
@@ -11,4 +11,4 @@ t_vqdmlsls_lane_s32 (int64_t a, int32_t b, int32x2_t c)
   return vqdmlsls_lane_s32 (a, b, c, 0);
 }
 
-/* { dg-final { scan-assembler-times "sqdmlsl\[ \t\]+\[dD\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { scan-assembler-times "sqdmlsl\[ \t\]+\[dD\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]\n" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmullh_lane_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmullh_lane_s16.c
index debf191abc71429cb26e1478ca837cc7734760d2..fcd496b1aaa773204053bec6a0d3b764a71fcf63 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vqdmullh_lane_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmullh_lane_s16.c
@@ -11,4 +11,4 @@ t_vqdmullh_lane_s16 (int16_t a, int16x4_t b)
   return vqdmullh_lane_s16 (a, b, 0);
 }
 
-/* { dg-final { scan-assembler-times "sqdmull\[ \t\]+\[sS\]\[0-9\]+, ?\[hH\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[hH\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { scan-assembler-times "sqdmull\[ \t\]+\[sS\]\[0-9\]+, ?\[hH\]\[0-9\]+, ?\[hH\]\[0-9\]\n" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmullh_laneq_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmullh_laneq_s16.c
index e810c4713bcc66f3e8aa04cba9304325d7e62a25..db77fff27f3ec4838f9e2d06f0d9cede495dedac 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vqdmullh_laneq_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmullh_laneq_s16.c
@@ -11,4 +11,4 @@ t_vqdmullh_laneq_s16 (int16_t a, int16x8_t b)
   return vqdmullh_laneq_s16 (a, b, 0);
 }
 
-/* { dg-final { scan-assembler-times "sqdmull\[ \t\]+\[sS\]\[0-9\]+, ?\[hH\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[hH\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { scan-assembler-times "sqdmull\[ \t\]+\[sS\]\[0-9\]+, ?\[hH\]\[0-9\]+, ?\[hH\]\[0-9\]\n" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmulls_lane_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmulls_lane_s32.c
index a5fe60fbe16983bef97c688948743b2052109e96..04bbe7f9daf19b93ef48779452ff03898cc62c19 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vqdmulls_lane_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmulls_lane_s32.c
@@ -11,4 +11,4 @@ t_vqdmulls_lane_s32 (int32_t a, int32x2_t b)
   return vqdmulls_lane_s32 (a, b, 0);
 }
 
-/* { dg-final { scan-assembler-times "sqdmull\[ \t\]+\[dD\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { scan-assembler-times "sqdmull\[ \t\]+\[dD\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]\n" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmulls_laneq_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmulls_laneq_s32.c
index bd856d8e71fb1210ecec46f116c47645bbdef4e4..e8e236894fbb7d029995dcb7f9938c4f0c4511f2 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vqdmulls_laneq_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmulls_laneq_s32.c
@@ -11,4 +11,4 @@ t_vqdmulls_laneq_s32 (int32_t a, int32x4_t b)
   return vqdmulls_laneq_s32 (a, b, 0);
 }
 
-/* { dg-final { scan-assembler-times "sqdmull\[ \t\]+\[dD\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { scan-assembler-times "sqdmull\[ \t\]+\[dD\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]\n" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/dup_lane_1.c b/gcc/testsuite/gcc.target/aarch64/sve/dup_lane_1.c
index 532847bb7e50095217988fbd66e9c58a006fdfc7..14c1f5ab4c2de84bf923eae5ae26e1bdd81cd6ef 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/dup_lane_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/dup_lane_1.c
@@ -56,15 +56,27 @@ TEST_ALL (DUP_LANE)
 
 /* { dg-final { scan-assembler-not {\ttbl\t} } } */
 
-/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.d, z[0-9]+\.d\[0\]} 2 } } */
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, d[0-9]} 2 {
+		target { aarch64_little_endian } } } } */
+/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.d, z[0-9]+\.d\[0\]} 2 {
+		target { aarch64_big_endian } } } } */
 /* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.d, z[0-9]+\.d\[2\]} 2 } } */
 /* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.d, z[0-9]+\.d\[3\]} 2 } } */
-/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.s, z[0-9]+\.s\[0\]} 2 } } */
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.s, s[0-9]} 2 {
+		target { aarch64_little_endian } } } } */
+/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.s, z[0-9]+\.s\[0\]} 2 {
+		target { aarch64_big_endian } } } } */
 /* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.s, z[0-9]+\.s\[5\]} 2 } } */
 /* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.s, z[0-9]+\.s\[7\]} 2 } } */
-/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.h, z[0-9]+\.h\[0\]} 2 } } */
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, h[0-9]} 2 {
+		target { aarch64_little_endian } } } } */
+/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.h, z[0-9]+\.h\[0\]} 2 {
+		target { aarch64_big_endian } } } } */
 /* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.h, z[0-9]+\.h\[6\]} 2 } } */
 /* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.h, z[0-9]+\.h\[15\]} 2 } } */
-/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.b, z[0-9]+\.b\[0\]} 1 } } */
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.b, b[0-9]} 1 {
+		target { aarch64_little_endian } } } } */
+/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.b, z[0-9]+\.b\[0\]} 1 {
+		target { aarch64_big_endian } } } } */
 /* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.b, z[0-9]+\.b\[19\]} 1 } } */
 /* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.b, z[0-9]+\.b\[31\]} 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/live_1.c b/gcc/testsuite/gcc.target/aarch64/sve/live_1.c
index e8d92ec7e9f57a4f2d1c2fd8b259a41d87eb03c3..80ee176d1807bf628ad47551d69ff5d84deda79e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/live_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/live_1.c
@@ -32,10 +32,9 @@ TEST_ALL (EXTRACT_LAST)
 /* { dg-final { scan-assembler-times {\twhilelo\tp[0-7].s, } 4 } } */
 /* { dg-final { scan-assembler-times {\twhilelo\tp[0-7].d, } 4 } } */
 
-/* { dg-final { scan-assembler-times {\tlastb\tw[0-9]+, p[0-7], z[0-9]+\.b\n} 1 } } */
-/* { dg-final { scan-assembler-times {\tlastb\tw[0-9]+, p[0-7], z[0-9]+\.h\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tlastb\tb[0-9]+, p[0-7], z[0-9]+\.b\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tlastb\th[0-9]+, p[0-7], z[0-9]+\.h\n} 2 } } */
 /* { dg-final { scan-assembler-times {\tlastb\tw[0-9]+, p[0-7], z[0-9]+\.s\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tlastb\tx[0-9]+, p[0-7], z[0-9]+\.d\n} 1 } } */
-/* { dg-final { scan-assembler-times {\tlastb\th[0-9]+, p[0-7], z[0-9]+\.h\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tlastb\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tlastb\td[0-9]+, p[0-7], z[0-9]+\.d\n} 1 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c
index 2a5aa63f4572a666e50d7825c8820d49eb9cd70e..a92e1d47393ac1e6d5d39d967787c4a88f16d0f9 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c
@@ -8,7 +8,7 @@
 float16_t
 foo (float16x8_t a)
 {
-  return vgetq_lane_f16 (a, 0);
+  return vgetq_lane_f16 (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.u16"  }  } */
@@ -16,7 +16,7 @@ foo (float16x8_t a)
 float16_t
 foo1 (float16x8_t a)
 {
-  return vgetq_lane (a, 0);
+  return vgetq_lane (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c
index f1839cccffe1c34478f2372cd20b47761357b142..98319eff5c0f5825edd3563b8fa018a437fa3458 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c
@@ -8,7 +8,7 @@
 float32_t
 foo (float32x4_t a)
 {
-  return vgetq_lane_f32 (a, 0);
+  return vgetq_lane_f32 (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.32"  }  } */
@@ -16,7 +16,7 @@ foo (float32x4_t a)
 float32_t
 foo1 (float32x4_t a)
 {
-  return vgetq_lane (a, 0);
+  return vgetq_lane (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c
index ed1c2178839568dcc3eea3342606ba8eff57ea72..c9eefeb9972eaac8168218b5c10c5efaa2e59fce 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c
@@ -8,7 +8,7 @@
 int16_t
 foo (int16x8_t a)
 {
-  return vgetq_lane_s16 (a, 0);
+  return vgetq_lane_s16 (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.s16"  }  } */
@@ -16,7 +16,7 @@ foo (int16x8_t a)
 int16_t
 foo1 (int16x8_t a)
 {
-  return vgetq_lane (a, 0);
+  return vgetq_lane (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.s16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c
index c87ed93e70def5bbf6b1055d99656f7386f97ea8..0925a25bb45df9708d46038b5f534a02a2d6dbbb 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c
@@ -8,7 +8,7 @@
 int32_t
 foo (int32x4_t a)
 {
-  return vgetq_lane_s32 (a, 0);
+  return vgetq_lane_s32 (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.32"  }  } */
@@ -16,7 +16,7 @@ foo (int32x4_t a)
 int32_t
 foo1 (int32x4_t a)
 {
-  return vgetq_lane (a, 0);
+  return vgetq_lane (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c
index 11242ff3bc090a11bf7f8f163f0348824158bed7..5b76e3da5562fb8e2a2a49de851bed3329bc6ea0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c
@@ -8,7 +8,7 @@
 int8_t
 foo (int8x16_t a)
 {
-  return vgetq_lane_s8 (a, 0);
+  return vgetq_lane_s8 (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.s8"  }  } */
@@ -16,7 +16,7 @@ foo (int8x16_t a)
 int8_t
 foo1 (int8x16_t a)
 {
-  return vgetq_lane (a, 0);
+  return vgetq_lane (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.s8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c
index 2788b585535c46a3271be65849b1ba058df1adcf..c4a3fb0d3794c67a789c3c479fa7ca6415da35c4 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c
@@ -8,7 +8,7 @@
 uint16_t
 foo (uint16x8_t a)
 {
-  return vgetq_lane_u16 (a, 0);
+  return vgetq_lane_u16 (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.u16"  }  } */
@@ -16,7 +16,7 @@ foo (uint16x8_t a)
 uint16_t
 foo1 (uint16x8_t a)
 {
-  return vgetq_lane (a, 0);
+  return vgetq_lane (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c
index 721c5a5ffd77cd1ad038d44f32fa197fe2687311..d79837023248e84d4c30774afc07e243edc8ba65 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c
@@ -8,7 +8,7 @@
 uint32_t
 foo (uint32x4_t a)
 {
-  return vgetq_lane_u32 (a, 0);
+  return vgetq_lane_u32 (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.32"  }  } */
@@ -16,7 +16,7 @@ foo (uint32x4_t a)
 uint32_t
 foo1 (uint32x4_t a)
 {
-  return vgetq_lane (a, 0);
+  return vgetq_lane (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c
index 2bcaeac3fe1f5775f448d7f702ea139726fadcc3..631d995dc17f99c7a30cb9cbf56883f818fa2b1d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c
@@ -8,7 +8,7 @@
 uint8_t
 foo (uint8x16_t a)
 {
-  return vgetq_lane_u8 (a, 0);
+  return vgetq_lane_u8 (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.u8"  }  } */
@@ -16,7 +16,7 @@ foo (uint8x16_t a)
 uint8_t
 foo1 (uint8x16_t a)
 {
-  return vgetq_lane (a, 0);
+  return vgetq_lane (a, 1);
 }
 
 /* { dg-final { scan-assembler "vmov.u8"  }  } */

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-08-03  9:36 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-02  9:53 [PATCH] gcc: Add vec_select -> subreg RTL simplification Jonathan Wright
2021-07-07 13:35 ` [PATCH V2] " Jonathan Wright
2021-07-12 15:30   ` Richard Sandiford
2021-07-15  9:09     ` Christophe Lyon
2021-07-15 13:06       ` Jonathan Wright
2021-08-03  9:36         ` Christophe Lyon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).