public inbox for gcc-patches@gcc.gnu.org
* [PATCH] aarch64: Model zero-high-half semantics of SQXTUN instruction in RTL
@ 2021-06-15  9:52 Jonathan Wright
  2021-06-16  9:10 ` [PATCH V2] " Jonathan Wright
  0 siblings, 1 reply; 3+ messages in thread
From: Jonathan Wright @ 2021-06-15  9:52 UTC (permalink / raw)
  To: gcc-patches


Hi,

As subject, this patch first splits the aarch64_sqmovun<mode> pattern
into separate scalar and vector variants. It then further splits the
vector pattern into big/little-endian variants that model the
zero-high-half semantics of the underlying instruction. Modeling these
semantics allows for better RTL combinations and also removes some
register-allocation issues, as the compiler now knows that the
operation is totally destructive.
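
To make the instruction's behaviour concrete: SQXTUN (signed
saturating extract unsigned narrow) clamps each signed source lane
into the unsigned range of the half-width result type. A minimal
scalar C model of one lane of the 16-bit-to-8-bit case (the helper
name is invented for illustration; this is not part of the patch):

```c
#include <stdint.h>

/* Scalar model of one SQXTUN lane for the 16-bit case: saturate a
   signed 16-bit value into the unsigned 8-bit range [0, 255].  The
   real instruction does this for every lane at once and, as modeled
   by this patch, zeroes the upper half of the destination register
   in its 64-bit-result form.  */
static uint8_t sqxtun_lane (int16_t x)
{
  if (x < 0)
    return 0;
  if (x > 255)
    return 255;
  return (uint8_t) x;
}
```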

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-06-14  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64-simd-builtins.def: Split generator
	for aarch64_sqmovun builtins into scalar and vector variants.
	* config/aarch64/aarch64-simd.md (aarch64_sqmovun<mode>):
	Split into scalar and vector variants. Change vector variant
	to an expander that emits the correct instruction depending
	on endianness.
	(aarch64_sqmovun<mode>_insn_le): Define.
	(aarch64_sqmovun<mode>_insn_be): Define.

[-- Attachment #2: rb14564.patch --]
[-- Type: application/octet-stream, Size: 3576 bytes --]

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 18baa6720b09b2ebda8577b809f8a8683f8b44f0..2adb4b127527794d19b2bbd4859f089d3da47763 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -263,7 +263,9 @@
   BUILTIN_VQ_HSI (TERNOP, smlal_hi_n, 0, NONE)
   BUILTIN_VQ_HSI (TERNOPU, umlal_hi_n, 0, NONE)
 
-  BUILTIN_VSQN_HSDI (UNOPUS, sqmovun, 0, NONE)
+  /* Implemented by aarch64_sqmovun<mode>.  */
+  BUILTIN_VQN (UNOPUS, sqmovun, 0, NONE)
+  BUILTIN_SD_HSDI (UNOPUS, sqmovun, 0, NONE)
 
   /* Implemented by aarch64_sqxtun2<mode>.  */
   BUILTIN_VQN (BINOP_UUS, sqxtun2, 0, NONE)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index b23556b551cbbef420950007e9714acf190a534d..59779b851fbeecb17cd2cddbb0ed8770a22762b5 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4870,17 +4870,6 @@
   [(set_attr "type" "neon_qadd<q>")]
 )
 
-;; sqmovun
-
-(define_insn "aarch64_sqmovun<mode>"
-  [(set (match_operand:<VNARROWQ> 0 "register_operand" "=w")
-	(unspec:<VNARROWQ> [(match_operand:VSQN_HSDI 1 "register_operand" "w")]
-                            UNSPEC_SQXTUN))]
-   "TARGET_SIMD"
-   "sqxtun\\t%<vn2>0<Vmntype>, %<v>1<Vmtype>"
-   [(set_attr "type" "neon_sat_shift_imm_narrow_q")]
-)
-
 ;; sqmovn and uqmovn
 
 (define_insn "aarch64_<su>qmovn<mode>"
@@ -4931,6 +4920,61 @@
   }
 )
 
+;; sqmovun
+
+(define_insn "aarch64_sqmovun<mode>"
+  [(set (match_operand:<VNARROWQ> 0 "register_operand" "=w")
+	(unspec:<VNARROWQ> [(match_operand:SD_HSDI 1 "register_operand" "w")]
+			   UNSPEC_SQXTUN))]
+   "TARGET_SIMD"
+   "sqxtun\\t%<vn2>0<Vmntype>, %<v>1<Vmtype>"
+   [(set_attr "type" "neon_sat_shift_imm_narrow_q")]
+)
+
+(define_insn "aarch64_sqmovun<mode>_insn_le"
+  [(set (match_operand:<VNARROWQ2> 0 "register_operand" "=w")
+	(vec_concat:<VNARROWQ2>
+	  (unspec:<VNARROWQ> [(match_operand:VQN 1 "register_operand" "w")]
+			     UNSPEC_SQXTUN)
+	  (match_operand:<VNARROWQ> 2 "aarch64_simd_or_scalar_imm_zero")))]
+  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
+  "sqxtun\\t%<vn2>0<Vmntype>, %<v>1<Vmtype>"
+  [(set_attr "type" "neon_sat_shift_imm_narrow_q")]
+)
+
+(define_insn "aarch64_sqmovun<mode>_insn_be"
+  [(set (match_operand:<VNARROWQ2> 0 "register_operand" "=w")
+	(vec_concat:<VNARROWQ2>
+	  (match_operand:<VNARROWQ> 2 "aarch64_simd_or_scalar_imm_zero")
+	  (unspec:<VNARROWQ> [(match_operand:VQN 1 "register_operand" "w")]
+			     UNSPEC_SQXTUN)))]
+  "TARGET_SIMD && BYTES_BIG_ENDIAN"
+  "sqxtun\\t%<vn2>0<Vmntype>, %<v>1<Vmtype>"
+  [(set_attr "type" "neon_sat_shift_imm_narrow_q")]
+)
+
+(define_expand "aarch64_sqmovun<mode>"
+  [(set (match_operand:<VNARROWQ> 0 "register_operand")
+	(unspec:<VNARROWQ> [(match_operand:VQN 1 "register_operand")]
+			   UNSPEC_SQXTUN))]
+  "TARGET_SIMD"
+  {
+    rtx tmp = gen_reg_rtx (<VNARROWQ2>mode);
+    if (BYTES_BIG_ENDIAN)
+      emit_insn (gen_aarch64_sqmovun<mode>_insn_be (tmp, operands[1],
+				CONST0_RTX (<VNARROWQ>mode)));
+    else
+      emit_insn (gen_aarch64_sqmovun<mode>_insn_le (tmp, operands[1],
+				CONST0_RTX (<VNARROWQ>mode)));
+
+    /* The intrinsic expects a narrow result, so emit a subreg that will get
+       optimized away as appropriate.  */
+    emit_move_insn (operands[0], lowpart_subreg (<VNARROWQ>mode, tmp,
+						 <VNARROWQ2>mode));
+    DONE;
+  }
+)
+
 (define_insn "aarch64_sqxtun2<mode>_le"
   [(set (match_operand:<VNARROWQ2> 0 "register_operand" "=w")
 	(vec_concat:<VNARROWQ2>

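Read at the value level, the little-endian insn pattern in the patch
says: the full 128-bit destination equals the saturated narrow lanes
concatenated with a zero vector. A portable scalar sketch of that
semantics for the V4SI -> V4HI case (an illustrative model with
invented helper names, not compiler code):

```c
#include <stdint.h>

/* Saturate a signed 32-bit lane to the unsigned 16-bit range,
   as SQXTUN does per lane.  */
static uint16_t sat_u16 (int32_t x)
{
  if (x < 0)
    return 0;
  if (x > 65535)
    return 65535;
  return (uint16_t) x;
}

/* Model of aarch64_sqmovun<mode>_insn_le for V4SI -> V4HI: lanes 0-3
   of the 8-lane result hold the narrowed values (the vec_concat's
   first operand on little-endian), lanes 4-7 come from the zero
   operand.  */
static void sqxtun_zero_high (const int32_t src[4], uint16_t dst[8])
{
  for (int i = 0; i < 4; i++)
    dst[i] = sat_u16 (src[i]);
  for (int i = 4; i < 8; i++)
    dst[i] = 0;
}
```

Because the zeroing of the high half is now explicit in the RTL,
combine can fold away a user-written zeroing of those lanes instead of
emitting a separate instruction for it.
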

* [PATCH V2] aarch64: Model zero-high-half semantics of SQXTUN instruction in RTL
  2021-06-15  9:52 [PATCH] aarch64: Model zero-high-half semantics of SQXTUN instruction in RTL Jonathan Wright
@ 2021-06-16  9:10 ` Jonathan Wright
  2021-06-16 12:30   ` Richard Sandiford
  0 siblings, 1 reply; 3+ messages in thread
From: Jonathan Wright @ 2021-06-16  9:10 UTC (permalink / raw)
  To: gcc-patches; +Cc: Kyrylo Tkachov, Richard Sandiford


Hi,

Version 2 of the patch adds tests to verify the benefit of this change.

Ok for master?

Thanks,
Jonathan

---
gcc/ChangeLog:

2021-06-14  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64-simd-builtins.def: Split generator
	for aarch64_sqmovun builtins into scalar and vector variants.
	* config/aarch64/aarch64-simd.md (aarch64_sqmovun<mode>):
	Split into scalar and vector variants. Change vector variant
	to an expander that emits the correct instruction depending
	on endianness.
	(aarch64_sqmovun<mode>_insn_le): Define.
	(aarch64_sqmovun<mode>_insn_be): Define.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/narrow_zero_high_half.c: Add new tests.

[-- Attachment #2: rb14564.patch --]
[-- Type: application/octet-stream, Size: 4712 bytes --]

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 18baa6720b09b2ebda8577b809f8a8683f8b44f0..2adb4b127527794d19b2bbd4859f089d3da47763 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -263,7 +263,9 @@
   BUILTIN_VQ_HSI (TERNOP, smlal_hi_n, 0, NONE)
   BUILTIN_VQ_HSI (TERNOPU, umlal_hi_n, 0, NONE)
 
-  BUILTIN_VSQN_HSDI (UNOPUS, sqmovun, 0, NONE)
+  /* Implemented by aarch64_sqmovun<mode>.  */
+  BUILTIN_VQN (UNOPUS, sqmovun, 0, NONE)
+  BUILTIN_SD_HSDI (UNOPUS, sqmovun, 0, NONE)
 
   /* Implemented by aarch64_sqxtun2<mode>.  */
   BUILTIN_VQN (BINOP_UUS, sqxtun2, 0, NONE)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index b23556b551cbbef420950007e9714acf190a534d..59779b851fbeecb17cd2cddbb0ed8770a22762b5 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4870,17 +4870,6 @@
   [(set_attr "type" "neon_qadd<q>")]
 )
 
-;; sqmovun
-
-(define_insn "aarch64_sqmovun<mode>"
-  [(set (match_operand:<VNARROWQ> 0 "register_operand" "=w")
-	(unspec:<VNARROWQ> [(match_operand:VSQN_HSDI 1 "register_operand" "w")]
-                            UNSPEC_SQXTUN))]
-   "TARGET_SIMD"
-   "sqxtun\\t%<vn2>0<Vmntype>, %<v>1<Vmtype>"
-   [(set_attr "type" "neon_sat_shift_imm_narrow_q")]
-)
-
 ;; sqmovn and uqmovn
 
 (define_insn "aarch64_<su>qmovn<mode>"
@@ -4931,6 +4920,61 @@
   }
 )
 
+;; sqmovun
+
+(define_insn "aarch64_sqmovun<mode>"
+  [(set (match_operand:<VNARROWQ> 0 "register_operand" "=w")
+	(unspec:<VNARROWQ> [(match_operand:SD_HSDI 1 "register_operand" "w")]
+			   UNSPEC_SQXTUN))]
+   "TARGET_SIMD"
+   "sqxtun\\t%<vn2>0<Vmntype>, %<v>1<Vmtype>"
+   [(set_attr "type" "neon_sat_shift_imm_narrow_q")]
+)
+
+(define_insn "aarch64_sqmovun<mode>_insn_le"
+  [(set (match_operand:<VNARROWQ2> 0 "register_operand" "=w")
+	(vec_concat:<VNARROWQ2>
+	  (unspec:<VNARROWQ> [(match_operand:VQN 1 "register_operand" "w")]
+			     UNSPEC_SQXTUN)
+	  (match_operand:<VNARROWQ> 2 "aarch64_simd_or_scalar_imm_zero")))]
+  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
+  "sqxtun\\t%<vn2>0<Vmntype>, %<v>1<Vmtype>"
+  [(set_attr "type" "neon_sat_shift_imm_narrow_q")]
+)
+
+(define_insn "aarch64_sqmovun<mode>_insn_be"
+  [(set (match_operand:<VNARROWQ2> 0 "register_operand" "=w")
+	(vec_concat:<VNARROWQ2>
+	  (match_operand:<VNARROWQ> 2 "aarch64_simd_or_scalar_imm_zero")
+	  (unspec:<VNARROWQ> [(match_operand:VQN 1 "register_operand" "w")]
+			     UNSPEC_SQXTUN)))]
+  "TARGET_SIMD && BYTES_BIG_ENDIAN"
+  "sqxtun\\t%<vn2>0<Vmntype>, %<v>1<Vmtype>"
+  [(set_attr "type" "neon_sat_shift_imm_narrow_q")]
+)
+
+(define_expand "aarch64_sqmovun<mode>"
+  [(set (match_operand:<VNARROWQ> 0 "register_operand")
+	(unspec:<VNARROWQ> [(match_operand:VQN 1 "register_operand")]
+			   UNSPEC_SQXTUN))]
+  "TARGET_SIMD"
+  {
+    rtx tmp = gen_reg_rtx (<VNARROWQ2>mode);
+    if (BYTES_BIG_ENDIAN)
+      emit_insn (gen_aarch64_sqmovun<mode>_insn_be (tmp, operands[1],
+				CONST0_RTX (<VNARROWQ>mode)));
+    else
+      emit_insn (gen_aarch64_sqmovun<mode>_insn_le (tmp, operands[1],
+				CONST0_RTX (<VNARROWQ>mode)));
+
+    /* The intrinsic expects a narrow result, so emit a subreg that will get
+       optimized away as appropriate.  */
+    emit_move_insn (operands[0], lowpart_subreg (<VNARROWQ>mode, tmp,
+						 <VNARROWQ2>mode));
+    DONE;
+  }
+)
+
 (define_insn "aarch64_sqxtun2<mode>_le"
   [(set (match_operand:<VNARROWQ2> 0 "register_operand" "=w")
 	(vec_concat:<VNARROWQ2>
diff --git a/gcc/testsuite/gcc.target/aarch64/narrow_zero_high_half.c b/gcc/testsuite/gcc.target/aarch64/narrow_zero_high_half.c
index 78c474f3025cbf56d14323d8f05bfb73e003ebfd..27660be963c080a05f489e94c453044588f30f5f 100644
--- a/gcc/testsuite/gcc.target/aarch64/narrow_zero_high_half.c
+++ b/gcc/testsuite/gcc.target/aarch64/narrow_zero_high_half.c
@@ -63,6 +63,10 @@ TEST_UNARY (vmovn, uint8x16_t, uint16x8_t, u16, u8)
 TEST_UNARY (vmovn, uint16x8_t, uint32x4_t, u32, u16)
 TEST_UNARY (vmovn, uint32x4_t, uint64x2_t, u64, u32)
 
+TEST_UNARY (vqmovun, uint8x16_t, int16x8_t, s16, u8)
+TEST_UNARY (vqmovun, uint16x8_t, int32x4_t, s32, u16)
+TEST_UNARY (vqmovun, uint32x4_t, int64x2_t, s64, u32)
+
 /* { dg-final { scan-assembler-not "dup\\t" } } */
 
 /* { dg-final { scan-assembler-times "\\trshrn\\tv" 6} }  */
@@ -74,3 +78,4 @@ TEST_UNARY (vmovn, uint32x4_t, uint64x2_t, u64, u32)
 /* { dg-final { scan-assembler-times "\\tsqrshrn\\tv" 3} }  */
 /* { dg-final { scan-assembler-times "\\tuqrshrn\\tv" 3} }  */
 /* { dg-final { scan-assembler-times "\\txtn\\tv" 6} }  */
+/* { dg-final { scan-assembler-times "\\tsqxtun\\tv" 3} }  */
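
At the value level, the expander's round trip (narrow into a 128-bit
temporary whose high half is zero, then take the low 64 bits via
lowpart_subreg) returns exactly the plain narrowing result, which is
why the subreg can be optimized away. A portable scalar sketch of that
equivalence (all helper names invented for illustration):

```c
#include <stdint.h>
#include <string.h>

/* Saturate one signed 32-bit lane to the unsigned 16-bit range.  */
static uint16_t sat_u16 (int32_t x)
{
  return x < 0 ? 0 : x > 65535 ? 65535 : (uint16_t) x;
}

/* Model of the expander for V4SI -> V4HI: narrow into the low half of
   a zeroed 8-lane temporary, then keep only the low four lanes (the
   lowpart_subreg just reinterprets those bytes on little-endian).  */
static void sqxtun_via_wide_tmp (const int32_t src[4], uint16_t out[4])
{
  uint16_t tmp[8] = { 0 };   /* high half starts (and stays) zero */
  for (int i = 0; i < 4; i++)
    tmp[i] = sat_u16 (src[i]);
  memcpy (out, tmp, sizeof (uint16_t) * 4);   /* low half only */
}
```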


* Re: [PATCH V2] aarch64: Model zero-high-half semantics of SQXTUN instruction in RTL
  2021-06-16  9:10 ` [PATCH V2] " Jonathan Wright
@ 2021-06-16 12:30   ` Richard Sandiford
  0 siblings, 0 replies; 3+ messages in thread
From: Richard Sandiford @ 2021-06-16 12:30 UTC (permalink / raw)
  To: Jonathan Wright; +Cc: gcc-patches, Kyrylo Tkachov

Jonathan Wright <Jonathan.Wright@arm.com> writes:
> Hi,
>
> Version 2 of the patch adds tests to verify the benefit of this change.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
> gcc/ChangeLog:
>
> 2021-06-14  Jonathan Wright  <jonathan.wright@arm.com>
>
>         * config/aarch64/aarch64-simd-builtins.def: Split generator
>         for aarch64_sqmovun builtins into scalar and vector variants.
>         * config/aarch64/aarch64-simd.md (aarch64_sqmovun<mode>):
>         Split into scalar and vector variants. Change vector variant
>         to an expander that emits the correct instruction depending
>         on endianness.
>         (aarch64_sqmovun<mode>_insn_le): Define.
>         (aarch64_sqmovun<mode>_insn_be): Define.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/aarch64/narrow_zero_high_half.c: Add new tests.

OK, thanks.



