public inbox for gcc-patches@gcc.gnu.org
* [AArch64][PATCH 1/7] Add support for ARMv8.1 Adv.SIMD instructions.
@ 2015-10-23 12:19 Matthew Wahab
  2015-10-23 12:19 ` [AArch64][PATCH 2/7] Add sqrdmlah, sqrdmlsh instructions Matthew Wahab
                   ` (6 more replies)
  0 siblings, 7 replies; 30+ messages in thread
From: Matthew Wahab @ 2015-10-23 12:19 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1240 bytes --]

The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
sqrdmlah and sqrdmlsh. This patch series adds the instructions to the
AArch64 backend together with the ACLE feature macro and NEON intrinsics
to make use of them. The instructions are enabled when -march=armv8.1-a
is selected.

To support execution tests for the instructions, code is also added to
the testsuite to check the target capabilities and to specify required
compiler options.

This patch adds target feature macros for the instructions. Subsequent
patches:
- add the instructions to the aarch64-simd patterns,
- add GCC builtins to generate the instructions,
- add the ACLE feature macro __ARM_FEATURE_QRDMX,
- add support for ARMv8.1-A Adv.SIMD tests to the dejagnu support code,
- add NEON intrinsics for the basic form of the instructions.
- add NEON intrinsics for the *_lane forms of the instructions.

Tested the series for aarch64-none-linux-gnu with native bootstrap and
make check on an ARMv8 architecture. Also tested aarch64-none-elf with
cross-compiled check-gcc on an ARMv8.1 emulator.

Ok for trunk?
Matthew

gcc/
2015-10-23  Matthew Wahab  <matthew.wahab@arm.com>

	* config/aarch64/aarch64.h (AARCH64_ISA_RDMA): New.
	(TARGET_SIMD_RDMA): New.


[-- Attachment #2: 0001-Add-RDMA-target-feature.patch --]
[-- Type: text/x-patch, Size: 1326 bytes --]

From 4933ff4839406cdff2d2ec87920cab257a90474d Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 27 Aug 2015 13:31:17 +0100
Subject: [PATCH 1/7] Add RDMA target feature.

Change-Id: Ic22d5ae4c8dc012bd8e63dfd82a21935f44be50c
---
 gcc/config/aarch64/aarch64.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index b041a1e..c67eac9 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -157,6 +157,7 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_ISA_FP             (aarch64_isa_flags & AARCH64_FL_FP)
 #define AARCH64_ISA_SIMD           (aarch64_isa_flags & AARCH64_FL_SIMD)
 #define AARCH64_ISA_LSE		   (aarch64_isa_flags & AARCH64_FL_LSE)
+#define AARCH64_ISA_RDMA	   (aarch64_isa_flags & AARCH64_FL_RDMA)
 
 /* Crypto is an optional extension to AdvSIMD.  */
 #define TARGET_CRYPTO (TARGET_SIMD && AARCH64_ISA_CRYPTO)
@@ -181,6 +182,9 @@ extern unsigned aarch64_architecture_version;
   ((aarch64_fix_a53_err835769 == 2)	\
   ? TARGET_FIX_ERR_A53_835769_DEFAULT : aarch64_fix_a53_err835769)
 
+/* ARMv8.1 Adv.SIMD support.  */
+#define TARGET_SIMD_RDMA (TARGET_SIMD && AARCH64_ISA_RDMA)
+
 /* Standard register usage.  */
 
 /* 31 64-bit general purpose registers R0-R30:
-- 
2.1.4



* [AArch64][PATCH 2/7] Add sqrdmlah, sqrdmlsh instructions.
  2015-10-23 12:19 [AArch64][PATCH 1/7] Add support for ARMv8.1 Adv.SIMD instructions Matthew Wahab
@ 2015-10-23 12:19 ` Matthew Wahab
  2015-10-27 11:19   ` James Greenhalgh
  2015-10-23 12:21 ` [AArch64][PATCH 3/7] Add builtins for ARMv8.1 Adv.SIMD instructions Matthew Wahab
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Matthew Wahab @ 2015-10-23 12:19 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1216 bytes --]

The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
sqrdmlah and sqrdmlsh. This patch adds the instructions to the
aarch64-simd patterns, making them conditional on the TARGET_SIMD_RDMA
feature macro introduced in the previous patch.

The instruction patterns are defined using unspec expressions so that
they are only generated through the builtins added later in this patch
series. To simplify the definitions, the iterator SQRDMLAH and the
attribute rdma_as are added to iterate over the add (sqrdmlah) and
subtract (sqrdmlsh) forms of the instructions.

Tested the series for aarch64-none-linux-gnu with native bootstrap and
make check on an ARMv8 architecture. Also tested aarch64-none-elf with
cross-compiled check-gcc on an ARMv8.1 emulator.

Ok for trunk?
Matthew

gcc/
2015-10-23  Matthew Wahab  <matthew.wahab@arm.com>

	* config/aarch64/aarch64-simd.md
	(aarch64_sqmovun<mode>): Fix some white-space.
	(aarch64_<sur>qmovun<mode>): Likewise.
	(aarch64_sqrdml<SQRDMLAH:rdma_as>h<mode>): New.
	(aarch64_sqrdml<SQRDMLAH:rdma_as>h_lane<mode>): New.
	(aarch64_sqrdml<SQRDMLAH:rdma_as>h_laneq<mode>): New.
	* config/aarch64/iterators.md (UNSPEC_SQRDMLAH): New.
	(UNSPEC_SQRDMLSH): New.
	(SQRDMLAH): New.
	(rdma_as): New.


[-- Attachment #2: 0002-Add-RDMA-simd-instruction-patterns.patch --]
[-- Type: text/x-patch, Size: 5447 bytes --]

From 3505963108eac78ad5e224a0e558cce82ac8e127 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Mon, 7 Sep 2015 18:57:37 +0100
Subject: [PATCH 2/7] Add RDMA simd instruction patterns.

Change-Id: I87043d052c660b7ce9b6d881293abe880efb795e
---
 gcc/config/aarch64/aarch64-simd.md | 94 +++++++++++++++++++++++++++++++++++++-
 gcc/config/aarch64/iterators.md    |  6 +++
 2 files changed, 98 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 167277e..cf87ac2 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2852,7 +2852,7 @@
    "TARGET_SIMD"
    "sqxtun\\t%<vn2>0<Vmntype>, %<v>1<Vmtype>"
    [(set_attr "type" "neon_sat_shift_imm_narrow_q")]
- )
+)
 
 ;; sqmovn and uqmovn
 
@@ -2863,7 +2863,7 @@
   "TARGET_SIMD"
   "<sur>qxtn\\t%<vn2>0<Vmntype>, %<v>1<Vmtype>"
    [(set_attr "type" "neon_sat_shift_imm_narrow_q")]
- )
+)
 
 ;; <su>q<absneg>
 
@@ -2951,6 +2951,96 @@
   [(set_attr "type" "neon_sat_mul_<Vetype>_scalar<q>")]
 )
 
+;; sqrdml[as]h.
+
+(define_insn "aarch64_sqrdml<SQRDMLAH:rdma_as>h<mode>"
+  [(set (match_operand:VSDQ_HSI 0 "register_operand" "=w")
+	(unspec:VSDQ_HSI
+	  [(match_operand:VSDQ_HSI 1 "register_operand" "0")
+	   (match_operand:VSDQ_HSI 2 "register_operand" "w")
+	   (match_operand:VSDQ_HSI 3 "register_operand" "w")]
+	  SQRDMLAH))]
+   "TARGET_SIMD_RDMA"
+   "sqrdml<SQRDMLAH:rdma_as>h\\t%<v>0<Vmtype>, %<v>2<Vmtype>, %<v>3<Vmtype>"
+   [(set_attr "type" "neon_sat_mla_<Vetype>_long")]
+)
+
+;; sqrdml[as]h_lane.
+
+(define_insn "aarch64_sqrdml<SQRDMLAH:rdma_as>h_lane<mode>"
+  [(set (match_operand:VDQHS 0 "register_operand" "=w")
+	(unspec:VDQHS
+	  [(match_operand:VDQHS 1 "register_operand" "0")
+	   (match_operand:VDQHS 2 "register_operand" "w")
+	   (vec_select:<VEL>
+	     (match_operand:<VCOND> 3 "register_operand" "w")
+	     (parallel [(match_operand:SI 4 "immediate_operand" "i")]))]
+	  SQRDMLAH))]
+   "TARGET_SIMD_RDMA"
+   {
+     operands[4] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[4])));
+     return
+      "sqrdml<SQRDMLAH:rdma_as>h\\t%0.<Vtype>, %2.<Vtype>, %3.<Vetype>[%4]";
+   }
+   [(set_attr "type" "neon_sat_mla_<Vetype>_scalar_long")]
+)
+
+(define_insn "aarch64_sqrdml<SQRDMLAH:rdma_as>h_lane<mode>"
+  [(set (match_operand:SD_HSI 0 "register_operand" "=w")
+	(unspec:SD_HSI
+	  [(match_operand:SD_HSI 1 "register_operand" "0")
+	   (match_operand:SD_HSI 2 "register_operand" "w")
+	   (vec_select:<VEL>
+	     (match_operand:<VCOND> 3 "register_operand" "w")
+	     (parallel [(match_operand:SI 4 "immediate_operand" "i")]))]
+	  SQRDMLAH))]
+   "TARGET_SIMD_RDMA"
+   {
+     operands[4] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[4])));
+     return
+      "sqrdml<SQRDMLAH:rdma_as>h\\t%<v>0, %<v>2, %3.<Vetype>[%4]";
+   }
+   [(set_attr "type" "neon_sat_mla_<Vetype>_scalar_long")]
+)
+
+;; sqrdml[as]h_laneq.
+
+(define_insn "aarch64_sqrdml<SQRDMLAH:rdma_as>h_laneq<mode>"
+  [(set (match_operand:VDQHS 0 "register_operand" "=w")
+	(unspec:VDQHS
+	  [(match_operand:VDQHS 1 "register_operand" "0")
+	   (match_operand:VDQHS 2 "register_operand" "w")
+	   (vec_select:<VEL>
+	     (match_operand:<VCONQ> 3 "register_operand" "w")
+	     (parallel [(match_operand:SI 4 "immediate_operand" "i")]))]
+	  SQRDMLAH))]
+   "TARGET_SIMD_RDMA"
+   {
+     operands[4] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[4])));
+     return
+      "sqrdml<SQRDMLAH:rdma_as>h\\t%0.<Vtype>, %2.<Vtype>, %3.<Vetype>[%4]";
+   }
+   [(set_attr "type" "neon_sat_mla_<Vetype>_scalar_long")]
+)
+
+(define_insn "aarch64_sqrdml<SQRDMLAH:rdma_as>h_laneq<mode>"
+  [(set (match_operand:SD_HSI 0 "register_operand" "=w")
+	(unspec:SD_HSI
+	  [(match_operand:SD_HSI 1 "register_operand" "0")
+	   (match_operand:SD_HSI 2 "register_operand" "w")
+	   (vec_select:<VEL>
+	     (match_operand:<VCONQ> 3 "register_operand" "w")
+	     (parallel [(match_operand:SI 4 "immediate_operand" "i")]))]
+	  SQRDMLAH))]
+   "TARGET_SIMD_RDMA"
+   {
+     operands[4] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[4])));
+     return
+      "sqrdml<SQRDMLAH:rdma_as>h\\t%<v>0, %<v>2, %3.<Vetype>[%4]";
+   }
+   [(set_attr "type" "neon_sat_mla_<Vetype>_scalar_long")]
+)
+
 ;; vqdml[sa]l
 
 (define_insn "aarch64_sqdml<SBINQOPS:as>l<mode>"
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 964f8f1..409ba7b 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -303,6 +303,8 @@
     UNSPEC_PMULL2       ; Used in aarch64-simd.md.
     UNSPEC_REV_REGLIST  ; Used in aarch64-simd.md.
     UNSPEC_VEC_SHR      ; Used in aarch64-simd.md.
+    UNSPEC_SQRDMLAH     ; Used in aarch64-simd.md.
+    UNSPEC_SQRDMLSH     ; Used in aarch64-simd.md.
 ])
 
 ;; -------------------------------------------------------------------
@@ -932,6 +934,8 @@
                                UNSPEC_SQSHRN UNSPEC_UQSHRN
                                UNSPEC_SQRSHRN UNSPEC_UQRSHRN])
 
+(define_int_iterator SQRDMLAH [UNSPEC_SQRDMLAH UNSPEC_SQRDMLSH])
+
 (define_int_iterator PERMUTE [UNSPEC_ZIP1 UNSPEC_ZIP2
 			      UNSPEC_TRN1 UNSPEC_TRN2
 			      UNSPEC_UZP1 UNSPEC_UZP2])
@@ -1096,3 +1100,5 @@
 			  (UNSPEC_SHA1M "m")])
 
 (define_int_attr sha256_op [(UNSPEC_SHA256H "") (UNSPEC_SHA256H2 "2")])
+
+(define_int_attr rdma_as [(UNSPEC_SQRDMLAH "a") (UNSPEC_SQRDMLSH "s")])
-- 
2.1.4



* [AArch64][PATCH 3/7] Add builtins for ARMv8.1 Adv.SIMD instructions.
  2015-10-23 12:19 [AArch64][PATCH 1/7] Add support for ARMv8.1 Adv.SIMD instructions Matthew Wahab
  2015-10-23 12:19 ` [AArch64][PATCH 2/7] Add sqrdmlah, sqrdmlsh instructions Matthew Wahab
@ 2015-10-23 12:21 ` Matthew Wahab
  2015-10-27 11:20   ` James Greenhalgh
  2015-10-23 12:24 ` [AArch64][PATCH 4/7] Add ACLE feature macro for ARMv8.1 Adv.SIMD instructions Matthew Wahab
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Matthew Wahab @ 2015-10-23 12:21 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 649 bytes --]

The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
sqrdmlah and sqrdmlsh. This patch adds the GCC builtins to generate the new
instructions which are needed for the NEON intrinsics added later in
this series.

Tested the series for aarch64-none-linux-gnu with native bootstrap and
make check on an ARMv8 architecture. Also tested aarch64-none-elf with
cross-compiled check-gcc on an ARMv8.1 emulator.

Ok for trunk?
Matthew

gcc/
2015-10-23  Matthew Wahab  <matthew.wahab@arm.com>

	* config/aarch64/aarch64-simd-builtins.def
	(sqrdmlah, sqrdmlsh): New.
	(sqrdmlah_lane, sqrdmlsh_lane): New.
	(sqrdmlah_laneq, sqrdmlsh_laneq): New.


[-- Attachment #2: 0003-Add-builtins-for-RDMA-instructions.patch --]
[-- Type: text/x-patch, Size: 1280 bytes --]

From b4a480cf0e38caa156b2fa15fc30b12ab8e0e7ad Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 27 Aug 2015 13:15:34 +0100
Subject: [PATCH 3/7] Add builtins for RDMA instructions.

Change-Id: I5156884010b1f6171583229c816aef4daab23b8f
---
 gcc/config/aarch64/aarch64-simd-builtins.def | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 654e963..4cc4559 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -412,3 +412,17 @@
 
   /* Implemented by aarch64_tbx4v8qi.  */
   VAR1 (TERNOP, tbx4, 0, v8qi)
+
+  /* Builtins for ARMv8.1 Adv.SIMD instructions.  */
+
+  /* Implemented by aarch64_sqrdml<SQRDMLAH:rdma_as>h<mode>.  */
+  BUILTIN_VSDQ_HSI (TERNOP, sqrdmlah, 0)
+  BUILTIN_VSDQ_HSI (TERNOP, sqrdmlsh, 0)
+
+  /* Implemented by aarch64_sqrdml<SQRDMLAH:rdma_as>h_lane<mode>.  */
+  BUILTIN_VSDQ_HSI (QUADOP_LANE, sqrdmlah_lane, 0)
+  BUILTIN_VSDQ_HSI (QUADOP_LANE, sqrdmlsh_lane, 0)
+
+  /* Implemented by aarch64_sqrdml<SQRDMLAH:rdma_as>h_laneq<mode>.  */
+  BUILTIN_VSDQ_HSI (QUADOP_LANE, sqrdmlah_laneq, 0)
+  BUILTIN_VSDQ_HSI (QUADOP_LANE, sqrdmlsh_laneq, 0)
-- 
2.1.4



* [AArch64][dejagnu][PATCH 5/7] Dejagnu support for ARMv8.1 Adv.SIMD.
  2015-10-23 12:19 [AArch64][PATCH 1/7] Add support for ARMv8.1 Adv.SIMD instructions Matthew Wahab
                   ` (2 preceding siblings ...)
  2015-10-23 12:24 ` [AArch64][PATCH 4/7] Add ACLE feature macro for ARMv8.1 Adv.SIMD instructions Matthew Wahab
@ 2015-10-23 12:24 ` Matthew Wahab
  2015-10-24  8:04   ` Bernhard Reutner-Fischer
  2015-10-23 12:30 ` [AArch64][PATCH 6/7] Add NEON intrinsics vqrdmlah and vqrdmlsh Matthew Wahab
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Matthew Wahab @ 2015-10-23 12:24 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1112 bytes --]

The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
sqrdmlah and sqrdmlsh. This patch adds support in Dejagnu for ARMv8.1
Adv.SIMD specifiers and checks.

The new test options are
- { dg-add-options arm_v8_1a_neon }: Add compiler options needed to
   enable ARMv8.1 Adv.SIMD.
- { dg-require-effective-target arm_v8_1a_neon_hw }: Require a target
   capable of executing ARMv8.1 Adv.SIMD instructions.

The new options support AArch64 only.
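As a hypothetical sketch of how the two directives combine in an execution test (the file body is illustrative; the vqrdmlah_s16 intrinsic it uses is only added in patch 6, and the real tests appear in patches 6 and 7):

```c
/* { dg-do run } */
/* { dg-require-effective-target arm_v8_1a_neon_hw } */
/* { dg-add-options arm_v8_1a_neon } */

#include <arm_neon.h>

int
main (void)
{
  /* Each lane computes 0 + ((2 * 16384 * 16384) >> 16) with rounding,
     which saturates to the int16_t maximum.  */
  int16x4_t r = vqrdmlah_s16 (vdup_n_s16 (0), vdup_n_s16 (16384),
			      vdup_n_s16 (16384));
  return vget_lane_s16 (r, 0) == 32767 ? 0 : 1;
}
```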

Tested the series for aarch64-none-linux-gnu with native bootstrap and
make check on an ARMv8 architecture. Also tested aarch64-none-elf with
cross-compiled check-gcc on an ARMv8.1 emulator.

Ok for trunk?
Matthew

gcc/testsuite
2015-10-23  Matthew Wahab  <matthew.wahab@arm.com>

	* lib/target-supports.exp (add_options_for_arm_v8_1a_neon): New.
	(check_effective_target_arm_arch_FUNC_ok)
	(add_options_for_arm_arch_FUNC)
	(check_effective_target_arm_arch_FUNC_multilib): Add "armv8.1-a"
	to the list to be generated.
	(check_effective_target_arm_v8_1a_neon_ok_nocache): New.
	(check_effective_target_arm_v8_1a_neon_ok): New.
	(check_effective_target_arm_v8_1a_neon_hw): New.


[-- Attachment #2: 0005-Testsuite-Add-dejagnu-options-for-armv8.1-neon.patch --]
[-- Type: text/x-patch, Size: 3156 bytes --]

From 4c218c6972f510aee2b438180084baafda80b37f Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 27 Aug 2015 13:41:15 +0100
Subject: [PATCH 5/7] [Testsuite] Add dejagnu options for armv8.1 neon

Change-Id: Ic8edc48aa701aa159303f13154710a6fdae816d0
---
 gcc/testsuite/lib/target-supports.exp | 50 ++++++++++++++++++++++++++++++++++-
 1 file changed, 49 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 4d5b0a3d..b03ea02 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2700,6 +2700,16 @@ proc add_options_for_arm_v8_neon { flags } {
     return "$flags $et_arm_v8_neon_flags -march=armv8-a"
 }
 
+# Add the options needed for ARMv8.1 Adv.SIMD.
+
+proc add_options_for_arm_v8_1a_neon { flags } {
+    if { [istarget aarch64*-*-*] } {
+	return "$flags -march=armv8.1-a"
+    } else {
+	return "$flags"
+    }
+}
+
 proc add_options_for_arm_crc { flags } {
     if { ! [check_effective_target_arm_crc_ok] } {
         return "$flags"
@@ -2984,7 +2994,8 @@ foreach { armfunc armflag armdef } { v4 "-march=armv4 -marm" __ARM_ARCH_4__
 				     v7r "-march=armv7-r" __ARM_ARCH_7R__
 				     v7m "-march=armv7-m -mthumb" __ARM_ARCH_7M__
 				     v7em "-march=armv7e-m -mthumb" __ARM_ARCH_7EM__
-				     v8a "-march=armv8-a" __ARM_ARCH_8A__ } {
+				     v8a "-march=armv8-a" __ARM_ARCH_8A__
+				     v8_1a "-march=armv8.1-a" __ARM_ARCH_8A__ } {
     eval [string map [list FUNC $armfunc FLAG $armflag DEF $armdef ] {
 	proc check_effective_target_arm_arch_FUNC_ok { } {
 	    if { [ string match "*-marm*" "FLAG" ] &&
@@ -3141,6 +3152,22 @@ proc check_effective_target_arm_neonv2_hw { } {
     } [add_options_for_arm_neonv2 ""]]
 }
 
+# Return 1 if the target supports the ARMv8.1 Adv.SIMD extension, 0
+# otherwise.  The test is valid for AArch64.
+
+proc check_effective_target_arm_v8_1a_neon_ok_nocache { } {
+    return [check_no_compiler_messages_nocache arm_v8_1a_neon_ok assembly {
+	#if !defined (__ARM_FEATURE_QRDMX)
+	#error FOO
+	#endif
+    } [add_options_for_arm_v8_1a_neon ""]]
+}
+
+proc check_effective_target_arm_v8_1a_neon_ok { } {
+    return [check_cached_effective_target arm_v8_1a_neon_ok \
+		check_effective_target_arm_v8_1a_neon_ok_nocache]
+}
+
 # Return 1 if the target supports executing ARMv8 NEON instructions, 0
 # otherwise.
 
@@ -3159,6 +3186,27 @@ proc check_effective_target_arm_v8_neon_hw { } {
     } [add_options_for_arm_v8_neon ""]]
 }
 
+# Return 1 if the target supports executing the ARMv8.1 Adv.SIMD extension, 0
+# otherwise.  The test is valid for AArch64.
+
+proc check_effective_target_arm_v8_1a_neon_hw { } {
+    return [check_runtime_nocache arm_v8_1a_neon_hw_available {
+	int
+	main (void)
+	{
+	  long long a = 0, b = 1;
+	  long long result = 0;
+
+	  asm ("sqrdmlah %s0,%s1,%s2"
+	       : "=w"(result)
+	       : "w"(a), "w"(b)
+	       : /* No clobbers.  */);
+
+	  return result;
+	}
+    }  [add_options_for_arm_v8_1a_neon ""]]
+}
+
 # Return 1 if this is a ARM target with NEON enabled.
 
 proc check_effective_target_arm_neon { } {
-- 
2.1.4



* [AArch64][PATCH 4/7] Add ACLE feature macro for ARMv8.1 Adv.SIMD instructions.
  2015-10-23 12:19 [AArch64][PATCH 1/7] Add support for ARMv8.1 Adv.SIMD instructions Matthew Wahab
  2015-10-23 12:19 ` [AArch64][PATCH 2/7] Add sqrdmlah, sqrdmlsh instructions Matthew Wahab
  2015-10-23 12:21 ` [AArch64][PATCH 3/7] Add builtins for ARMv8.1 Adv.SIMD instructions Matthew Wahab
@ 2015-10-23 12:24 ` Matthew Wahab
  2015-10-27 11:36   ` James Greenhalgh
  2015-10-23 12:24 ` [AArch64][dejagnu][PATCH 5/7] Dejagnu support for ARMv8.1 Adv.SIMD Matthew Wahab
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Matthew Wahab @ 2015-10-23 12:24 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 640 bytes --]

The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
sqrdmlah and sqrdmlsh. This patch adds the feature macro
__ARM_FEATURE_QRDMX to indicate the presence of these instructions,
generating it when the feature is available, as it is when
-march=armv8.1-a is selected.

Tested the series for aarch64-none-linux-gnu with native bootstrap and
make check on an ARMv8 architecture. Also tested aarch64-none-elf with
cross-compiled check-gcc on an ARMv8.1 emulator.

Ok for trunk?
Matthew

gcc/
2015-10-23  Matthew Wahab  <matthew.wahab@arm.com>

	* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Add
	ARM_FEATURE_QRDMX.


[-- Attachment #2: 0004-Add-ACLE-QRDMX-feature-macro.patch --]
[-- Type: text/x-patch, Size: 849 bytes --]

From 3af8c483a2def95abec264ca8591547d6c0e0b3e Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 27 Aug 2015 13:31:49 +0100
Subject: [PATCH 4/7] Add ACLE QRDMX feature macro.

Change-Id: I91af172637603ea89fc93a8e715973d7d304a92f
---
 gcc/config/aarch64/aarch64-c.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c
index 303025f..ad95c78 100644
--- a/gcc/config/aarch64/aarch64-c.c
+++ b/gcc/config/aarch64/aarch64-c.c
@@ -126,6 +126,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
   aarch64_def_or_undef (TARGET_ILP32, "__ILP32__", pfile);
 
   aarch64_def_or_undef (TARGET_CRYPTO, "__ARM_FEATURE_CRYPTO", pfile);
+  aarch64_def_or_undef (TARGET_SIMD_RDMA, "__ARM_FEATURE_QRDMX", pfile);
 }
 
 /* Implement TARGET_CPU_CPP_BUILTINS.  */
-- 
2.1.4



* [AArch64][PATCH 6/7] Add NEON intrinsics vqrdmlah and vqrdmlsh.
  2015-10-23 12:19 [AArch64][PATCH 1/7] Add support for ARMv8.1 Adv.SIMD instructions Matthew Wahab
                   ` (3 preceding siblings ...)
  2015-10-23 12:24 ` [AArch64][dejagnu][PATCH 5/7] Dejagnu support for ARMv8.1 Adv.SIMD Matthew Wahab
@ 2015-10-23 12:30 ` Matthew Wahab
  2015-10-30 12:53   ` Christophe Lyon
  2015-11-23 13:37   ` James Greenhalgh
  2015-10-23 12:34 ` [AArch64][PATCH 7/7] Add NEON intrinsics vqrdmlah_lane and vqrdmlsh_lane Matthew Wahab
  2015-10-27 10:54 ` [AArch64][PATCH 1/7] Add support for ARMv8.1 Adv.SIMD instructions James Greenhalgh
  6 siblings, 2 replies; 30+ messages in thread
From: Matthew Wahab @ 2015-10-23 12:30 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 969 bytes --]

The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
sqrdmlah and sqrdmlsh. This patch adds the NEON intrinsics vqrdmlah and
vqrdmlsh for these instructions. The new intrinsics are of the form
vqrdml{as}h[q]_<type>.

Tested the series for aarch64-none-linux-gnu with native bootstrap and
make check on an ARMv8 architecture. Also tested aarch64-none-elf with
cross-compiled check-gcc on an ARMv8.1 emulator.

Ok for trunk?
Matthew

gcc/
2015-10-23  Matthew Wahab  <matthew.wahab@arm.com>

	* config/aarch64/arm_neon.h (vqrdmlah_s16, vqrdmlah_s32): New.
	(vqrdmlahq_s16, vqrdmlahq_s32): New.
	(vqrdmlsh_s16, vqrdmlsh_s32): New.
	(vqrdmlshq_s16, vqrdmlshq_s32): New.

gcc/testsuite
2015-10-23  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh.inc: New file,
	support code for vqrdml{as}h tests.
	* gcc.target/aarch64/advsimd-intrinsics/vqrdmlah.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh.c: New.


[-- Attachment #2: 0006-Add-neon-intrinsics-vqrdmlah-vqrdmlsh.patch --]
[-- Type: text/x-patch, Size: 14671 bytes --]

From 611e1232a59dfe42f2cd9666680407d67abcfea5 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 27 Aug 2015 13:22:41 +0100
Subject: [PATCH 6/7] Add neon intrinsics: vqrdmlah, vqrdmlsh.

Change-Id: I5c7f8d36ee980d280c1d50f6f212b286084c5acf
---
 gcc/config/aarch64/arm_neon.h                      |  53 ++++++++
 .../aarch64/advsimd-intrinsics/vqrdmlXh.inc        | 138 +++++++++++++++++++++
 .../aarch64/advsimd-intrinsics/vqrdmlah.c          |  57 +++++++++
 .../aarch64/advsimd-intrinsics/vqrdmlsh.c          |  61 +++++++++
 4 files changed, 309 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlah.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh.c

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index e186348..9e73809 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -2649,6 +2649,59 @@ vqrdmulhq_s32 (int32x4_t __a, int32x4_t __b)
   return (int32x4_t) __builtin_aarch64_sqrdmulhv4si (__a, __b);
 }
 
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.1-a")
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlah_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
+{
+  return (int16x4_t) __builtin_aarch64_sqrdmlahv4hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlah_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
+{
+  return (int32x2_t) __builtin_aarch64_sqrdmlahv2si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlahq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c)
+{
+  return (int16x8_t) __builtin_aarch64_sqrdmlahv8hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlahq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
+{
+  return (int32x4_t) __builtin_aarch64_sqrdmlahv4si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlsh_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
+{
+  return (int16x4_t) __builtin_aarch64_sqrdmlshv4hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlsh_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
+{
+  return (int32x2_t) __builtin_aarch64_sqrdmlshv2si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlshq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c)
+{
+  return (int16x8_t) __builtin_aarch64_sqrdmlshv8hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlshq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
+{
+  return (int32x4_t) __builtin_aarch64_sqrdmlshv4si (__a, __b, __c);
+}
+
+#pragma GCC pop_options
+
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vcreate_s8 (uint64_t __a)
 {
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh.inc b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh.inc
new file mode 100644
index 0000000..a504ca6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh.inc
@@ -0,0 +1,138 @@
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1 (NAME)
+
+void FNNAME (INSN) (void)
+{
+  /* vector_res = vqrdmlah (vector, vector2, vector3),
+     then store the result.  */
+#define TEST_VQRDMLAH2(INSN, Q, T1, T2, W, N, EXPECTED_CUMULATIVE_SAT, CMT) \
+  Set_Neon_Cumulative_Sat (0, VECT_VAR (vector_res, T1, W, N));		\
+  VECT_VAR (vector_res, T1, W, N) =					\
+    INSN##Q##_##T2##W (VECT_VAR (vector, T1, W, N),			\
+		       VECT_VAR (vector2, T1, W, N),			\
+		       VECT_VAR (vector3, T1, W, N));			\
+  vst1##Q##_##T2##W (VECT_VAR (result, T1, W, N),			\
+		     VECT_VAR (vector_res, T1, W, N));			\
+  CHECK_CUMULATIVE_SAT (TEST_MSG, T1, W, N,				\
+			EXPECTED_CUMULATIVE_SAT, CMT)
+
+  /* Two auxiliary macros are necessary to expand INSN.  */
+#define TEST_VQRDMLAH1(INSN, Q, T1, T2, W, N, EXPECTED_CUMULATIVE_SAT, CMT) \
+  TEST_VQRDMLAH2 (INSN, Q, T1, T2, W, N, EXPECTED_CUMULATIVE_SAT, CMT)
+
+#define TEST_VQRDMLAH(Q, T1, T2, W, N, EXPECTED_CUMULATIVE_SAT, CMT)	\
+  TEST_VQRDMLAH1 (INSN, Q, T1, T2, W, N, EXPECTED_CUMULATIVE_SAT, CMT)
+
+  DECL_VARIABLE (vector, int, 16, 4);
+  DECL_VARIABLE (vector, int, 32, 2);
+  DECL_VARIABLE (vector, int, 16, 8);
+  DECL_VARIABLE (vector, int, 32, 4);
+
+  DECL_VARIABLE (vector_res, int, 16, 4);
+  DECL_VARIABLE (vector_res, int, 32, 2);
+  DECL_VARIABLE (vector_res, int, 16, 8);
+  DECL_VARIABLE (vector_res, int, 32, 4);
+
+  DECL_VARIABLE (vector2, int, 16, 4);
+  DECL_VARIABLE (vector2, int, 32, 2);
+  DECL_VARIABLE (vector2, int, 16, 8);
+  DECL_VARIABLE (vector2, int, 32, 4);
+
+  DECL_VARIABLE (vector3, int, 16, 4);
+  DECL_VARIABLE (vector3, int, 32, 2);
+  DECL_VARIABLE (vector3, int, 16, 8);
+  DECL_VARIABLE (vector3, int, 32, 4);
+
+  clean_results ();
+
+  VLOAD (vector, buffer, , int, s, 16, 4);
+  VLOAD (vector, buffer, , int, s, 32, 2);
+  VLOAD (vector, buffer, q, int, s, 16, 8);
+  VLOAD (vector, buffer, q, int, s, 32, 4);
+
+  /* Initialize vector2.  */
+  VDUP (vector2, , int, s, 16, 4, 0x5555);
+  VDUP (vector2, , int, s, 32, 2, 0xBB);
+  VDUP (vector2, q, int, s, 16, 8, 0xBB);
+  VDUP (vector2, q, int, s, 32, 4, 0x22);
+
+  /* Initialize vector3.  */
+  VDUP (vector3, , int, s, 16, 4, 0x5555);
+  VDUP (vector3, , int, s, 32, 2, 0xBB);
+  VDUP (vector3, q, int, s, 16, 8, 0x33);
+  VDUP (vector3, q, int, s, 32, 4, 0x22);
+
+#define CMT ""
+  TEST_VQRDMLAH ( , int, s, 16, 4, expected_cumulative_sat, CMT);
+  TEST_VQRDMLAH ( , int, s, 32, 2, expected_cumulative_sat, CMT);
+  TEST_VQRDMLAH (q, int, s, 16, 8, expected_cumulative_sat, CMT);
+  TEST_VQRDMLAH (q, int, s, 32, 4, expected_cumulative_sat, CMT);
+
+  CHECK (TEST_MSG, int, 16, 4, PRIx16, expected, CMT);
+  CHECK (TEST_MSG, int, 32, 2, PRIx32, expected, CMT);
+  CHECK (TEST_MSG, int, 16, 8, PRIx16, expected, CMT);
+  CHECK (TEST_MSG, int, 32, 4, PRIx32, expected, CMT);
+
+  /* Now use input values such that the multiplication causes
+     saturation.  */
+#define TEST_MSG_MUL " (check mul cumulative saturation)"
+  VDUP (vector, , int, s, 16, 4, 0x8000);
+  VDUP (vector, , int, s, 32, 2, 0x80000000);
+  VDUP (vector, q, int, s, 16, 8, 0x8000);
+  VDUP (vector, q, int, s, 32, 4, 0x80000000);
+  VDUP (vector2, , int, s, 16, 4, 0x8000);
+  VDUP (vector2, , int, s, 32, 2, 0x80000000);
+  VDUP (vector2, q, int, s, 16, 8, 0x8000);
+  VDUP (vector2, q, int, s, 32, 4, 0x80000000);
+  VDUP (vector3, , int, s, 16, 4, 0x8000);
+  VDUP (vector3, , int, s, 32, 2, 0x80000000);
+  VDUP (vector3, q, int, s, 16, 8, 0x8000);
+  VDUP (vector3, q, int, s, 32, 4, 0x80000000);
+
+  TEST_VQRDMLAH ( , int, s, 16, 4, expected_cumulative_sat_mul, TEST_MSG_MUL);
+  TEST_VQRDMLAH ( , int, s, 32, 2, expected_cumulative_sat_mul, TEST_MSG_MUL);
+  TEST_VQRDMLAH (q, int, s, 16, 8, expected_cumulative_sat_mul, TEST_MSG_MUL);
+  TEST_VQRDMLAH (q, int, s, 32, 4, expected_cumulative_sat_mul, TEST_MSG_MUL);
+
+  CHECK (TEST_MSG, int, 16, 4, PRIx16, expected_mul, TEST_MSG_MUL);
+  CHECK (TEST_MSG, int, 32, 2, PRIx32, expected_mul, TEST_MSG_MUL);
+  CHECK (TEST_MSG, int, 16, 8, PRIx16, expected_mul, TEST_MSG_MUL);
+  CHECK (TEST_MSG, int, 32, 4, PRIx32, expected_mul, TEST_MSG_MUL);
+
+  /* Use input values where rounding produces a result equal to the
+     saturation value, but does not set the saturation flag.  */
+#define TEST_MSG_ROUND " (check rounding)"
+  VDUP (vector, , int, s, 16, 4, 0x8000);
+  VDUP (vector, , int, s, 32, 2, 0x80000000);
+  VDUP (vector, q, int, s, 16, 8, 0x8000);
+  VDUP (vector, q, int, s, 32, 4, 0x80000000);
+  VDUP (vector2, , int, s, 16, 4, 0x8001);
+  VDUP (vector2, , int, s, 32, 2, 0x80000001);
+  VDUP (vector2, q, int, s, 16, 8, 0x8001);
+  VDUP (vector2, q, int, s, 32, 4, 0x80000001);
+  VDUP (vector3, , int, s, 16, 4, 0x8001);
+  VDUP (vector3, , int, s, 32, 2, 0x80000001);
+  VDUP (vector3, q, int, s, 16, 8, 0x8001);
+  VDUP (vector3, q, int, s, 32, 4, 0x80000001);
+
+  TEST_VQRDMLAH ( , int, s, 16, 4, expected_cumulative_sat_round, \
+		 TEST_MSG_ROUND);
+  TEST_VQRDMLAH ( , int, s, 32, 2, expected_cumulative_sat_round, \
+		 TEST_MSG_ROUND);
+  TEST_VQRDMLAH (q, int, s, 16, 8, expected_cumulative_sat_round, \
+		 TEST_MSG_ROUND);
+  TEST_VQRDMLAH (q, int, s, 32, 4, expected_cumulative_sat_round, \
+		 TEST_MSG_ROUND);
+
+  CHECK (TEST_MSG, int, 16, 4, PRIx16, expected_round, TEST_MSG_ROUND);
+  CHECK (TEST_MSG, int, 32, 2, PRIx32, expected_round, TEST_MSG_ROUND);
+  CHECK (TEST_MSG, int, 16, 8, PRIx16, expected_round, TEST_MSG_ROUND);
+  CHECK (TEST_MSG, int, 32, 4, PRIx32, expected_round, TEST_MSG_ROUND);
+}
+
+int
+main (void)
+{
+  FNNAME (INSN) ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlah.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlah.c
new file mode 100644
index 0000000..148d94c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlah.c
@@ -0,0 +1,57 @@
+/* { dg-require-effective-target arm_v8_1a_neon_hw } */
+/* { dg-add-options arm_v8_1a_neon } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected values of cumulative_saturation flag.  */
+int VECT_VAR (expected_cumulative_sat, int, 16, 4) = 0;
+int VECT_VAR (expected_cumulative_sat, int, 32, 2) = 0;
+int VECT_VAR (expected_cumulative_sat, int, 16, 8) = 0;
+int VECT_VAR (expected_cumulative_sat, int, 32, 4) = 0;
+
+/* Expected results.  */
+VECT_VAR_DECL (expected, int, 16, 4) [] = { 0x38d3, 0x38d4, 0x38d5, 0x38d6 };
+VECT_VAR_DECL (expected, int, 32, 2) [] = { 0xfffffff0, 0xfffffff1 };
+VECT_VAR_DECL (expected, int, 16, 8) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3,
+					    0xfff4, 0xfff5, 0xfff6, 0xfff7 };
+VECT_VAR_DECL (expected, int, 32, 4) [] = { 0xfffffff0, 0xfffffff1,
+					    0xfffffff2, 0xfffffff3 };
+
+/* Expected values of cumulative_saturation flag when multiplication
+   saturates.  */
+int VECT_VAR (expected_cumulative_sat_mul, int, 16, 4) = 0;
+int VECT_VAR (expected_cumulative_sat_mul, int, 32, 2) = 0;
+int VECT_VAR (expected_cumulative_sat_mul, int, 16, 8) = 0;
+int VECT_VAR (expected_cumulative_sat_mul, int, 32, 4) = 0;
+
+/* Expected results when multiplication saturates.  */
+VECT_VAR_DECL (expected_mul, int, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_mul, int, 32, 2) [] = { 0x0, 0x0 };
+VECT_VAR_DECL (expected_mul, int, 16, 8) [] = { 0x0, 0x0, 0x0, 0x0,
+						0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_mul, int, 32, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+
+/* Expected values of cumulative_saturation flag when rounding
+   should not cause saturation.  */
+int VECT_VAR (expected_cumulative_sat_round, int, 16, 4) = 0;
+int VECT_VAR (expected_cumulative_sat_round, int, 32, 2) = 0;
+int VECT_VAR (expected_cumulative_sat_round, int, 16, 8) = 0;
+int VECT_VAR (expected_cumulative_sat_round, int, 32, 4) = 0;
+
+/* Expected results when rounding should not cause saturation.  */
+VECT_VAR_DECL (expected_round, int, 16, 4) [] = { 0xfffe, 0xfffe,
+						  0xfffe, 0xfffe };
+VECT_VAR_DECL (expected_round, int, 32, 2) [] = { 0xfffffffe, 0xfffffffe };
+VECT_VAR_DECL (expected_round, int, 16, 8) [] = { 0xfffe, 0xfffe,
+						  0xfffe, 0xfffe,
+						  0xfffe, 0xfffe,
+						  0xfffe, 0xfffe };
+VECT_VAR_DECL (expected_round, int, 32, 4) [] = { 0xfffffffe, 0xfffffffe,
+						  0xfffffffe, 0xfffffffe };
+
+#define INSN vqrdmlah
+#define TEST_MSG "VQRDMLAH"
+
+#include "vqrdmlXh.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh.c
new file mode 100644
index 0000000..91c3b34
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh.c
@@ -0,0 +1,61 @@
+/* { dg-require-effective-target arm_v8_1a_neon_hw } */
+/* { dg-add-options arm_v8_1a_neon } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected values of cumulative_saturation flag.  */
+int VECT_VAR (expected_cumulative_sat, int, 16, 4) = 0;
+int VECT_VAR (expected_cumulative_sat, int, 32, 2) = 0;
+int VECT_VAR (expected_cumulative_sat, int, 16, 8) = 0;
+int VECT_VAR (expected_cumulative_sat, int, 32, 4) = 0;
+
+/* Expected results.  */
+VECT_VAR_DECL (expected, int, 16, 4) [] = { 0xc70d, 0xc70e, 0xc70f, 0xc710 };
+VECT_VAR_DECL (expected, int, 32, 2) [] = { 0xfffffff0, 0xfffffff1 };
+VECT_VAR_DECL (expected, int, 16, 8) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3,
+					    0xfff4, 0xfff5, 0xfff6, 0xfff7 };
+VECT_VAR_DECL (expected, int, 32, 4) [] = { 0xfffffff0, 0xfffffff1,
+					    0xfffffff2, 0xfffffff3 };
+
+/* Expected values of cumulative_saturation flag when multiplication
+   saturates.  */
+int VECT_VAR (expected_cumulative_sat_mul, int, 16, 4) = 1;
+int VECT_VAR (expected_cumulative_sat_mul, int, 32, 2) = 1;
+int VECT_VAR (expected_cumulative_sat_mul, int, 16, 8) = 1;
+int VECT_VAR (expected_cumulative_sat_mul, int, 32, 4) = 1;
+
+/* Expected results when multiplication saturates.  */
+VECT_VAR_DECL (expected_mul, int, 16, 4) [] = { 0x8000, 0x8000,
+						0x8000, 0x8000 };
+VECT_VAR_DECL (expected_mul, int, 32, 2) [] = { 0x80000000, 0x80000000 };
+VECT_VAR_DECL (expected_mul, int, 16, 8) [] = { 0x8000, 0x8000,
+						0x8000, 0x8000,
+						0x8000, 0x8000,
+						0x8000, 0x8000 };
+VECT_VAR_DECL (expected_mul, int, 32, 4) [] = { 0x80000000, 0x80000000,
+						0x80000000, 0x80000000 };
+
+/* Expected values of cumulative_saturation flag when rounding
+   should not cause saturation.  */
+int VECT_VAR (expected_cumulative_sat_round, int, 16, 4) = 1;
+int VECT_VAR (expected_cumulative_sat_round, int, 32, 2) = 1;
+int VECT_VAR (expected_cumulative_sat_round, int, 16, 8) = 1;
+int VECT_VAR (expected_cumulative_sat_round, int, 32, 4) = 1;
+
+/* Expected results when rounding should not cause saturation.  */
+VECT_VAR_DECL (expected_round, int, 16, 4) [] = { 0x8000, 0x8000,
+						  0x8000, 0x8000 };
+VECT_VAR_DECL (expected_round, int, 32, 2) [] = { 0x80000000, 0x80000000 };
+VECT_VAR_DECL (expected_round, int, 16, 8) [] = { 0x8000, 0x8000,
+						  0x8000, 0x8000,
+						  0x8000, 0x8000,
+						  0x8000, 0x8000 };
+VECT_VAR_DECL (expected_round, int, 32, 4) [] = { 0x80000000, 0x80000000,
+						  0x80000000, 0x80000000 };
+
+#define INSN vqrdmlsh
+#define TEST_MSG "VQRDMLSH"
+
+#include "vqrdmlXh.inc"
-- 
2.1.4


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [AArch64][PATCH 7/7] Add NEON intrinsics vqrdmlah_lane and vqrdmlsh_lane.
  2015-10-23 12:19 [AArch64][PATCH 1/7] Add support for ARMv8.1 Adv.SIMD instructions Matthew Wahab
                   ` (4 preceding siblings ...)
  2015-10-23 12:30 ` [AArch64][PATCH 6/7] Add NEON intrinsics vqrdmlah and vqrdmlsh Matthew Wahab
@ 2015-10-23 12:34 ` Matthew Wahab
  2015-11-23 13:45   ` James Greenhalgh
  2015-10-27 10:54 ` [AArch64][PATCH 1/7] Add support for ARMv8.1 Adv.SIMD instructions James Greenhalgh
  6 siblings, 1 reply; 30+ messages in thread
From: Matthew Wahab @ 2015-10-23 12:34 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1497 bytes --]

The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
sqrdmlah and sqrdmlsh. This patch adds the NEON intrinsics vqrdmlah_lane
and vqrdmlsh_lane for these instructions. The new intrinsics are of the
form vqrdml{as}h[q]_lane_<type>.

Tested the series for aarch64-none-linux-gnu with native bootstrap and
make check on an ARMv8 architecture. Also tested aarch64-none-elf with
cross-compiled check-gcc on an ARMv8.1 emulator.

Ok for trunk?
Matthew

gcc/
2015-10-23  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc/config/aarch64/arm_neon.h
	(vqrdmlah_laneq_s16, vqrdmlah_laneq_s32): New.
	(vqrdmlahq_laneq_s16, vqrdmlahq_laneq_s32): New.
	(vqrdmlsh_laneq_s16, vqrdmlsh_laneq_s32): New.
	(vqrdmlshq_laneq_s16, vqrdmlshq_laneq_s32): New.
	(vqrdmlah_lane_s16, vqrdmlah_lane_s32): New.
	(vqrdmlahq_lane_s16, vqrdmlahq_lane_s32): New.
	(vqrdmlahh_s16, vqrdmlahh_lane_s16, vqrdmlahh_laneq_s16): New.
	(vqrdmlahs_s32, vqrdmlahs_lane_s32, vqrdmlahs_laneq_s32): New.
	(vqrdmlsh_lane_s16, vqrdmlsh_lane_s32): New.
	(vqrdmlshq_lane_s16, vqrdmlshq_lane_s32): New.
	(vqrdmlshh_s16, vqrdmlshh_lane_s16, vqrdmlshh_laneq_s16): New.
	(vqrdmlshs_s32, vqrdmlshs_lane_s32, vqrdmlshs_laneq_s32): New.

gcc/testsuite
2015-10-23  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh_lane.inc: New file,
	support code for vqrdml{as}h_lane tests.
	* gcc.target/aarch64/advsimd-intrinsics/vqrdmlah_lane.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh_lane.c: New.


[-- Attachment #2: 0007-Add-neon-intrinsics-vqrdmlah_lane-vqrdmlsh_lane.patch --]
[-- Type: text/x-patch, Size: 20186 bytes --]

From a2399818dba85ff2801a28bad77ef51697990da7 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 27 Aug 2015 14:17:26 +0100
Subject: [PATCH 7/7] Add neon intrinsics: vqrdmlah_lane, vqrdmlsh_lane.

Change-Id: I6d7a372e0a5b83ef0846ab62abbe9b24ada69fc4
---
 gcc/config/aarch64/arm_neon.h                      | 182 +++++++++++++++++++++
 .../aarch64/advsimd-intrinsics/vqrdmlXh_lane.inc   | 154 +++++++++++++++++
 .../aarch64/advsimd-intrinsics/vqrdmlah_lane.c     |  57 +++++++
 .../aarch64/advsimd-intrinsics/vqrdmlsh_lane.c     |  61 +++++++
 4 files changed, 454 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh_lane.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlah_lane.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh_lane.c

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 9e73809..9b68e4a 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -10675,6 +10675,59 @@ vqrdmulhq_laneq_s32 (int32x4_t __a, int32x4_t __b, const int __c)
   return __builtin_aarch64_sqrdmulh_laneqv4si (__a, __b, __c);
 }
 
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.1-a")
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlah_laneq_s16 (int16x4_t __a, int16x4_t __b, int16x8_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_laneqv4hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlah_laneq_s32 (int32x2_t __a, int32x2_t __b, int32x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_laneqv2si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlahq_laneq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_laneqv8hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlahq_laneq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_laneqv4si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlsh_laneq_s16 (int16x4_t __a, int16x4_t __b, int16x8_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlsh_laneqv4hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlsh_laneq_s32 (int32x2_t __a, int32x2_t __b, int32x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlsh_laneqv2si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlshq_laneq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlsh_laneqv8hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlshq_laneq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlsh_laneqv4si (__a, __b, __c, __d);
+}
+
+#pragma GCC pop_options
+
 /* Table intrinsics.  */
 
 __extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
@@ -20014,6 +20067,135 @@ vqrdmulhs_laneq_s32 (int32_t __a, int32x4_t __b, const int __c)
   return __builtin_aarch64_sqrdmulh_laneqsi (__a, __b, __c);
 }
 
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.1-a")
+
+/* vqrdmlah.  */
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlah_lane_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_lanev4hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlah_lane_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_lanev2si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlahq_lane_s16 (int16x8_t __a, int16x8_t __b, int16x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_lanev8hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlahq_lane_s32 (int32x4_t __a, int32x4_t __b, int32x2_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_lanev4si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vqrdmlahh_s16 (int16_t __a, int16_t __b, int16_t __c)
+{
+  return (int16_t) __builtin_aarch64_sqrdmlahhi (__a, __b, __c);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vqrdmlahh_lane_s16 (int16_t __a, int16_t __b, int16x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_lanehi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vqrdmlahh_laneq_s16 (int16_t __a, int16_t __b, int16x8_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_laneqhi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vqrdmlahs_s32 (int32_t __a, int32_t __b, int32_t __c)
+{
+  return (int32_t) __builtin_aarch64_sqrdmlahsi (__a, __b, __c);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vqrdmlahs_lane_s32 (int32_t __a, int32_t __b, int32x2_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_lanesi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vqrdmlahs_laneq_s32 (int32_t __a, int32_t __b, int32x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_laneqsi (__a, __b, __c, __d);
+}
+
+/* vqrdmlsh.  */
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlsh_lane_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlsh_lanev4hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlsh_lane_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlsh_lanev2si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlshq_lane_s16 (int16x8_t __a, int16x8_t __b, int16x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlsh_lanev8hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlshq_lane_s32 (int32x4_t __a, int32x4_t __b, int32x2_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlsh_lanev4si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vqrdmlshh_s16 (int16_t __a, int16_t __b, int16_t __c)
+{
+  return (int16_t) __builtin_aarch64_sqrdmlshhi (__a, __b, __c);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vqrdmlshh_lane_s16 (int16_t __a, int16_t __b, int16x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlsh_lanehi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vqrdmlshh_laneq_s16 (int16_t __a, int16_t __b, int16x8_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlsh_laneqhi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vqrdmlshs_s32 (int32_t __a, int32_t __b, int32_t __c)
+{
+  return (int32_t) __builtin_aarch64_sqrdmlshsi (__a, __b, __c);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vqrdmlshs_lane_s32 (int32_t __a, int32_t __b, int32x2_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlsh_lanesi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vqrdmlshs_laneq_s32 (int32_t __a, int32_t __b, int32x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlsh_laneqsi (__a, __b, __c, __d);
+}
+
+#pragma GCC pop_options
+
 /* vqrshl */
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh_lane.inc b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh_lane.inc
new file mode 100644
index 0000000..a855502
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh_lane.inc
@@ -0,0 +1,154 @@
+#define FNNAME1(NAME) exec_ ## NAME ## _lane
+#define FNNAME(NAME) FNNAME1 (NAME)
+
+void FNNAME (INSN) (void)
+{
+  /* vector_res = vqrdmlXh_lane (vector, vector2, vector3, lane),
+     then store the result.  */
+#define TEST_VQRDMLXH_LANE2(INSN, Q, T1, T2, W, N, N2, L,		\
+			    EXPECTED_CUMULATIVE_SAT, CMT)		\
+  Set_Neon_Cumulative_Sat (0, VECT_VAR (vector_res, T1, W, N));		\
+  VECT_VAR (vector_res, T1, W, N) =					\
+    INSN##Q##_lane_##T2##W (VECT_VAR (vector, T1, W, N),		\
+			    VECT_VAR (vector2, T1, W, N),		\
+			    VECT_VAR (vector3, T1, W, N2),		\
+			    L);						\
+  vst1##Q##_##T2##W (VECT_VAR (result, T1, W, N),			\
+		     VECT_VAR (vector_res, T1, W, N));			\
+  CHECK_CUMULATIVE_SAT (TEST_MSG, T1, W, N, EXPECTED_CUMULATIVE_SAT, CMT)
+
+  /* Two auxiliary macros are necessary to expand INSN.  */
+#define TEST_VQRDMLXH_LANE1(INSN, Q, T1, T2, W, N, N2, L,	\
+			    EXPECTED_CUMULATIVE_SAT, CMT)	\
+  TEST_VQRDMLXH_LANE2 (INSN, Q, T1, T2, W, N, N2, L,		\
+		       EXPECTED_CUMULATIVE_SAT, CMT)
+
+#define TEST_VQRDMLXH_LANE(Q, T1, T2, W, N, N2, L,		\
+			   EXPECTED_CUMULATIVE_SAT, CMT)	\
+  TEST_VQRDMLXH_LANE1 (INSN, Q, T1, T2, W, N, N2, L,		\
+		       EXPECTED_CUMULATIVE_SAT, CMT)
+
+
+  DECL_VARIABLE (vector, int, 16, 4);
+  DECL_VARIABLE (vector, int, 32, 2);
+  DECL_VARIABLE (vector, int, 16, 8);
+  DECL_VARIABLE (vector, int, 32, 4);
+
+  DECL_VARIABLE (vector_res, int, 16, 4);
+  DECL_VARIABLE (vector_res, int, 32, 2);
+  DECL_VARIABLE (vector_res, int, 16, 8);
+  DECL_VARIABLE (vector_res, int, 32, 4);
+
+  DECL_VARIABLE (vector2, int, 16, 4);
+  DECL_VARIABLE (vector2, int, 32, 2);
+  DECL_VARIABLE (vector2, int, 16, 8);
+  DECL_VARIABLE (vector2, int, 32, 4);
+
+  DECL_VARIABLE (vector3, int, 16, 4);
+  DECL_VARIABLE (vector3, int, 32, 2);
+  DECL_VARIABLE (vector3, int, 16, 8);
+  DECL_VARIABLE (vector3, int, 32, 4);
+
+  clean_results ();
+
+  VLOAD (vector, buffer, , int, s, 16, 4);
+  VLOAD (vector, buffer, , int, s, 32, 2);
+
+  VLOAD (vector, buffer, q, int, s, 16, 8);
+  VLOAD (vector, buffer, q, int, s, 32, 4);
+
+  /* Initialize vector2.  */
+  VDUP (vector2, , int, s, 16, 4, 0x5555);
+  VDUP (vector2, , int, s, 32, 2, 0xBB);
+  VDUP (vector2, q, int, s, 16, 8, 0xBB);
+  VDUP (vector2, q, int, s, 32, 4, 0x22);
+
+  /* Initialize vector3.  */
+  VDUP (vector3, , int, s, 16, 4, 0x5555);
+  VDUP (vector3, , int, s, 32, 2, 0xBB);
+  VDUP (vector3, q, int, s, 16, 8, 0x33);
+  VDUP (vector3, q, int, s, 32, 4, 0x22);
+
+  /* Choose lane arbitrarily.  */
+#define CMT ""
+  TEST_VQRDMLXH_LANE (, int, s, 16, 4, 4, 2, expected_cumulative_sat, CMT);
+  TEST_VQRDMLXH_LANE (, int, s, 32, 2, 2, 1, expected_cumulative_sat, CMT);
+  TEST_VQRDMLXH_LANE (q, int, s, 16, 8, 4, 3, expected_cumulative_sat, CMT);
+  TEST_VQRDMLXH_LANE (q, int, s, 32, 4, 2, 0, expected_cumulative_sat, CMT);
+
+  CHECK (TEST_MSG, int, 16, 4, PRIx16, expected, CMT);
+  CHECK (TEST_MSG, int, 32, 2, PRIx32, expected, CMT);
+  CHECK (TEST_MSG, int, 16, 8, PRIx16, expected, CMT);
+  CHECK (TEST_MSG, int, 32, 4, PRIx32, expected, CMT);
+
+  /* Now use input values such that the multiplication causes
+     saturation.  */
+#define TEST_MSG_MUL " (check mul cumulative saturation)"
+  VDUP (vector, , int, s, 16, 4, 0x8000);
+  VDUP (vector, , int, s, 32, 2, 0x80000000);
+  VDUP (vector, q, int, s, 16, 8, 0x8000);
+  VDUP (vector, q, int, s, 32, 4, 0x80000000);
+
+  VDUP (vector2, , int, s, 16, 4, 0x8000);
+  VDUP (vector2, , int, s, 32, 2, 0x80000000);
+  VDUP (vector2, q, int, s, 16, 8, 0x8000);
+  VDUP (vector2, q, int, s, 32, 4, 0x80000000);
+
+  VDUP (vector3, , int, s, 16, 4, 0x8000);
+  VDUP (vector3, , int, s, 32, 2, 0x80000000);
+  VDUP (vector3, q, int, s, 16, 8, 0x8000);
+  VDUP (vector3, q, int, s, 32, 4, 0x80000000);
+
+  TEST_VQRDMLXH_LANE (, int, s, 16, 4, 4, 2, expected_cumulative_sat_mul,
+		      TEST_MSG_MUL);
+  TEST_VQRDMLXH_LANE (, int, s, 32, 2, 2, 1, expected_cumulative_sat_mul,
+		      TEST_MSG_MUL);
+  TEST_VQRDMLXH_LANE (q, int, s, 16, 8, 4, 3, expected_cumulative_sat_mul,
+		      TEST_MSG_MUL);
+  TEST_VQRDMLXH_LANE (q, int, s, 32, 4, 2, 0, expected_cumulative_sat_mul,
+		      TEST_MSG_MUL);
+
+  CHECK (TEST_MSG, int, 16, 4, PRIx16, expected_mul, TEST_MSG_MUL);
+  CHECK (TEST_MSG, int, 32, 2, PRIx32, expected_mul, TEST_MSG_MUL);
+  CHECK (TEST_MSG, int, 16, 8, PRIx16, expected_mul, TEST_MSG_MUL);
+  CHECK (TEST_MSG, int, 32, 4, PRIx32, expected_mul, TEST_MSG_MUL);
+
+  VDUP (vector, , int, s, 16, 4, 0x8000);
+  VDUP (vector, , int, s, 32, 2, 0x80000000);
+  VDUP (vector, q, int, s, 16, 8, 0x8000);
+  VDUP (vector, q, int, s, 32, 4, 0x80000000);
+
+  VDUP (vector2, , int, s, 16, 4, 0x8001);
+  VDUP (vector2, , int, s, 32, 2, 0x80000001);
+  VDUP (vector2, q, int, s, 16, 8, 0x8001);
+  VDUP (vector2, q, int, s, 32, 4, 0x80000001);
+
+  VDUP (vector3, , int, s, 16, 4, 0x8001);
+  VDUP (vector3, , int, s, 32, 2, 0x80000001);
+  VDUP (vector3, q, int, s, 16, 8, 0x8001);
+  VDUP (vector3, q, int, s, 32, 4, 0x80000001);
+
+  /* Use input values where rounding produces a result equal to the
+     saturation value, but does not set the saturation flag.  */
+#define TEST_MSG_ROUND " (check rounding)"
+  TEST_VQRDMLXH_LANE (, int, s, 16, 4, 4, 2, expected_cumulative_sat_round,
+		      TEST_MSG_ROUND);
+  TEST_VQRDMLXH_LANE (, int, s, 32, 2, 2, 1, expected_cumulative_sat_round,
+		      TEST_MSG_ROUND);
+  TEST_VQRDMLXH_LANE (q, int, s, 16, 8, 4, 3, expected_cumulative_sat_round,
+		      TEST_MSG_ROUND);
+  TEST_VQRDMLXH_LANE (q, int, s, 32, 4, 2, 0, expected_cumulative_sat_round,
+		      TEST_MSG_ROUND);
+
+  CHECK (TEST_MSG, int, 16, 4, PRIx16, expected_round, TEST_MSG_ROUND);
+  CHECK (TEST_MSG, int, 32, 2, PRIx32, expected_round, TEST_MSG_ROUND);
+  CHECK (TEST_MSG, int, 16, 8, PRIx16, expected_round, TEST_MSG_ROUND);
+  CHECK (TEST_MSG, int, 32, 4, PRIx32, expected_round, TEST_MSG_ROUND);
+}
+
+int
+main (void)
+{
+  FNNAME (INSN) ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlah_lane.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlah_lane.c
new file mode 100644
index 0000000..ed43e01
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlah_lane.c
@@ -0,0 +1,57 @@
+/* { dg-require-effective-target arm_v8_1a_neon_hw } */
+/* { dg-add-options arm_v8_1a_neon } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected values of cumulative_saturation flag.  */
+int VECT_VAR (expected_cumulative_sat, int, 16, 4) = 0;
+int VECT_VAR (expected_cumulative_sat, int, 32, 2) = 0;
+int VECT_VAR (expected_cumulative_sat, int, 16, 8) = 0;
+int VECT_VAR (expected_cumulative_sat, int, 32, 4) = 0;
+
+/* Expected results.  */
+VECT_VAR_DECL (expected, int, 16, 4) [] = { 0x38d3, 0x38d4, 0x38d5, 0x38d6 };
+VECT_VAR_DECL (expected, int, 32, 2) [] = { 0xfffffff0, 0xfffffff1 };
+VECT_VAR_DECL (expected, int, 16, 8) [] = { 0x006d, 0x006e, 0x006f, 0x0070,
+					    0x0071, 0x0072, 0x0073, 0x0074 };
+VECT_VAR_DECL (expected, int, 32, 4) [] = { 0xfffffff0, 0xfffffff1,
+					    0xfffffff2, 0xfffffff3 };
+
+/* Expected values of cumulative_saturation flag when multiplication
+   saturates.  */
+int VECT_VAR (expected_cumulative_sat_mul, int, 16, 4) = 0;
+int VECT_VAR (expected_cumulative_sat_mul, int, 32, 2) = 0;
+int VECT_VAR (expected_cumulative_sat_mul, int, 16, 8) = 0;
+int VECT_VAR (expected_cumulative_sat_mul, int, 32, 4) = 0;
+
+/* Expected results when multiplication saturates.  */
+VECT_VAR_DECL (expected_mul, int, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_mul, int, 32, 2) [] = { 0x0, 0x0 };
+VECT_VAR_DECL (expected_mul, int, 16, 8) [] = { 0x0, 0x0, 0x0, 0x0,
+						0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_mul, int, 32, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+
+/* Expected values of cumulative_saturation flag when rounding
+   should not cause saturation.  */
+int VECT_VAR (expected_cumulative_sat_round, int, 16, 4) = 0;
+int VECT_VAR (expected_cumulative_sat_round, int, 32, 2) = 0;
+int VECT_VAR (expected_cumulative_sat_round, int, 16, 8) = 0;
+int VECT_VAR (expected_cumulative_sat_round, int, 32, 4) = 0;
+
+/* Expected results when rounding should not cause saturation.  */
+VECT_VAR_DECL (expected_round, int, 16, 4) [] = { 0xfffe, 0xfffe,
+						  0xfffe, 0xfffe };
+VECT_VAR_DECL (expected_round, int, 32, 2) [] = { 0xfffffffe, 0xfffffffe };
+VECT_VAR_DECL (expected_round, int, 16, 8) [] = { 0xfffe, 0xfffe,
+						  0xfffe, 0xfffe,
+						  0xfffe, 0xfffe,
+						  0xfffe, 0xfffe };
+VECT_VAR_DECL (expected_round, int, 32, 4) [] = { 0xfffffffe, 0xfffffffe,
+						  0xfffffffe, 0xfffffffe };
+
+#define INSN vqrdmlah
+#define TEST_MSG "VQRDMLAH_LANE"
+
+#include "vqrdmlXh_lane.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh_lane.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh_lane.c
new file mode 100644
index 0000000..6010b42
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh_lane.c
@@ -0,0 +1,61 @@
+/* { dg-require-effective-target arm_v8_1a_neon_hw } */
+/* { dg-add-options arm_v8_1a_neon } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected values of cumulative_saturation flag.  */
+int VECT_VAR (expected_cumulative_sat, int, 16, 4) = 0;
+int VECT_VAR (expected_cumulative_sat, int, 32, 2) = 0;
+int VECT_VAR (expected_cumulative_sat, int, 16, 8) = 0;
+int VECT_VAR (expected_cumulative_sat, int, 32, 4) = 0;
+
+/* Expected results.  */
+VECT_VAR_DECL (expected, int, 16, 4) [] = { 0xc70d, 0xc70e, 0xc70f, 0xc710 };
+VECT_VAR_DECL (expected, int, 32, 2) [] = { 0xfffffff0, 0xfffffff1 };
+VECT_VAR_DECL (expected, int, 16, 8) [] = { 0xff73, 0xff74, 0xff75, 0xff76,
+					    0xff77, 0xff78, 0xff79, 0xff7a };
+VECT_VAR_DECL (expected, int, 32, 4) [] = { 0xfffffff0, 0xfffffff1,
+					    0xfffffff2, 0xfffffff3 };
+
+/* Expected values of cumulative_saturation flag when multiplication
+   saturates.  */
+int VECT_VAR (expected_cumulative_sat_mul, int, 16, 4) = 1;
+int VECT_VAR (expected_cumulative_sat_mul, int, 32, 2) = 1;
+int VECT_VAR (expected_cumulative_sat_mul, int, 16, 8) = 1;
+int VECT_VAR (expected_cumulative_sat_mul, int, 32, 4) = 1;
+
+/* Expected results when multiplication saturates.  */
+VECT_VAR_DECL (expected_mul, int, 16, 4) [] = { 0x8000, 0x8000,
+						0x8000, 0x8000 };
+VECT_VAR_DECL (expected_mul, int, 32, 2) [] = { 0x80000000, 0x80000000 };
+VECT_VAR_DECL (expected_mul, int, 16, 8) [] = { 0x8000, 0x8000,
+						0x8000, 0x8000,
+						0x8000, 0x8000,
+						0x8000, 0x8000 };
+VECT_VAR_DECL (expected_mul, int, 32, 4) [] = { 0x80000000, 0x80000000,
+						0x80000000, 0x80000000 };
+
+/* Expected values of cumulative_saturation flag when rounding
+   should not cause saturation.  */
+int VECT_VAR (expected_cumulative_sat_round, int, 16, 4) = 1;
+int VECT_VAR (expected_cumulative_sat_round, int, 32, 2) = 1;
+int VECT_VAR (expected_cumulative_sat_round, int, 16, 8) = 1;
+int VECT_VAR (expected_cumulative_sat_round, int, 32, 4) = 1;
+
+/* Expected results when rounding should not cause saturation.  */
+VECT_VAR_DECL (expected_round, int, 16, 4) [] = { 0x8000, 0x8000,
+						  0x8000, 0x8000 };
+VECT_VAR_DECL (expected_round, int, 32, 2) [] = { 0x80000000, 0x80000000 };
+VECT_VAR_DECL (expected_round, int, 16, 8) [] = { 0x8000, 0x8000,
+						  0x8000, 0x8000,
+						  0x8000, 0x8000,
+						  0x8000, 0x8000 };
+VECT_VAR_DECL (expected_round, int, 32, 4) [] = { 0x80000000, 0x80000000,
+						  0x80000000, 0x80000000 };
+
+#define INSN vqrdmlsh
+#define TEST_MSG "VQRDMLSH_LANE"
+
+#include "vqrdmlXh_lane.inc"
-- 
2.1.4



* Re: [AArch64][dejagnu][PATCH 5/7] Dejagnu support for ARMv8.1 Adv.SIMD.
  2015-10-23 12:24 ` [AArch64][dejagnu][PATCH 5/7] Dejagnu support for ARMv8.1 Adv.SIMD Matthew Wahab
@ 2015-10-24  8:04   ` Bernhard Reutner-Fischer
  2015-10-27 15:32     ` Matthew Wahab
  0 siblings, 1 reply; 30+ messages in thread
From: Bernhard Reutner-Fischer @ 2015-10-24  8:04 UTC (permalink / raw)
  To: Matthew Wahab, gcc-patches

On October 23, 2015 2:24:26 PM GMT+02:00, Matthew Wahab <matthew.wahab@foss.arm.com> wrote:
>The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
>sqrdmlah and sqrdmlsh. This patch adds support in Dejagnu for ARMv8.1
>Adv.SIMD specifiers and checks.
>
>The new test options are
>- { dg-add-options arm_v8_1a_neon }: Add compiler options needed to
>   enable ARMv8.1 Adv.SIMD.
>- { dg-require-effective-target arm_v8_1a_neon_hw }: Require a target
>   capable of executing ARMv8.1 Adv.SIMD instructions.
>
>The new options support AArch64 only.
>
>Tested the series for aarch64-none-linux-gnu with native bootstrap and
>make check on an ARMv8 architecture. Also tested aarch64-none-elf with
>cross-compiled check-gcc on an ARMv8.1 emulator.


 
+# Return 1 if the target supports the ARMv8.1 Adv.SIMD extension, 0
+# otherwise.  The test is valid for AArch64.
+
+proc check_effective_target_arm_v8_1a_neon_ok_nocache { } {
+    return [check_no_compiler_messages_nocache arm_v8_1a_neon_ok assembly {
+	#if !defined (__ARM_FEATURE_QRDMX)
+	#error FOO
+	#endif
+    } [add_options_for_arm_v8_1a_neon ""]]
+}

Please error with something more meaningful than FOO; !__ARM_FEATURE_QRDMX comes to mind.
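
Concretely, the suggestion would change the probe body along these lines
(a sketch only; the message text is the reviewer's suggestion and the
final wording is up to the patch author):

```c
#if !defined (__ARM_FEATURE_QRDMX)
#error "__ARM_FEATURE_QRDMX not defined"
#endif
```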

TIA,


* Re: [AArch64][PATCH 1/7] Add support for ARMv8.1 Adv.SIMD instructions.
  2015-10-23 12:19 [AArch64][PATCH 1/7] Add support for ARMv8.1 Adv.SIMD instructions Matthew Wahab
                   ` (5 preceding siblings ...)
  2015-10-23 12:34 ` [AArch64][PATCH 7/7] Add NEON intrinsics vqrdmlah_lane and vqrdmlsh_lane Matthew Wahab
@ 2015-10-27 10:54 ` James Greenhalgh
  6 siblings, 0 replies; 30+ messages in thread
From: James Greenhalgh @ 2015-10-27 10:54 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Fri, Oct 23, 2015 at 01:16:25PM +0100, Matthew Wahab wrote:
> The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
> sqrdmlah and sqrdmlsh. This patch series adds the instructions to the
> AArch64 backend together with the ACLE feature macro and NEON intrinsics
> to make use of them. The instructions are enabled when -march=armv8.1-a
> is selected.
> 
> To support execution tests for the instructions, code is also added to
> the testsuite to check the target capabilities and to specify required
> compiler options.
> 
> This patch adds target feature macros for the instructions. Subsequent
> patches:
> - add the instructions to the aarch64-simd patterns,
> - add GCC builtins to generate the instructions,
> - add the ACLE feature macro __ARM_FEATURE_QRDMX,
> - add support for ARMv8.1-A Adv.SIMD tests to the dejagnu support code,
> - add NEON intrinsics for the basic form of the instructions.
> - add NEON intrinsics for the *_lane forms of the instructions.
> 
> Tested the series for aarch64-none-linux-gnu with native bootstrap and
> make check on an ARMv8 architecture. Also tested aarch64-none-elf with
> cross-compiled check-gcc on an ARMv8.1 emulator.
> 
> Ok for trunk?
> Matthew

OK.

Thanks,
James

> 
> gcc/
> 2015-10-23  Matthew Wahab  <matthew.wahab@arm.com>
> 
> 	* config/aarch64/aarch64.h (AARCH64_ISA_RDMA): New.
> 	(TARGET_SIMD_RDMA): New.
> 


* Re: [AArch64][PATCH 2/7] Add sqrdmah, sqrdmsh instructions.
  2015-10-23 12:19 ` [AArch64][PATCH 2/7] Add sqrdmah, sqrdmsh instructions Matthew Wahab
@ 2015-10-27 11:19   ` James Greenhalgh
  2015-10-27 16:12     ` Matthew Wahab
  0 siblings, 1 reply; 30+ messages in thread
From: James Greenhalgh @ 2015-10-27 11:19 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Fri, Oct 23, 2015 at 01:19:20PM +0100, Matthew Wahab wrote:
> The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
> sqrdmlah and sqrdmlsh. This patch adds the instructions to the
> aarch64-simd patterns, making them conditional on the TARGET_SIMD_RDMA
> feature macro introduced in the previous patch.
> 
> The instructions patterns are defined using unspec expressions, so that
> they are only generated through builtins added by this patch series. To
> simplify the definition, iterators SQRDMLAH and rdma_as are added, to
> iterate over the add (sqrdmlah) and subtract (sqrdmlsh) forms of the
> instructions.
> 
> Tested the series for aarch64-none-linux-gnu with native bootstrap and
> make check on an ARMv8 architecture. Also tested aarch64-none-elf with
> cross-compiled check-gcc on an ARMv8.1 emulator.
> 
> Ok for trunk?
> Matthew

OK with the name of the iterator fixed to something more clear as to what
you are iterating over.

> gcc/
> 2015-10-23  Matthew Wahab  <matthew.wahab@arm.com>
> 
> 	* config/aarch64/aarch64-simd.md
> 	(aarch64_sqmovun<mode>): Fix some white-space.
> 	(aarch64_<sur>qmovun<mode>): Likewise.
> 	(aarch64_sqrdml<SQRDMLAH:rdma_as>h<mode>): New.
> 	(aarch64_sqrdml<SQRDMLAH:rdma_as>h_lane<mode>): New.
> 	(aarch64_sqrdml<SQRDMLAH:rdma_as>h_laneq<mode>): New.
> 	* config/aarch64/iterators.md (UNSPEC_SQRDMLAH): New.
> 	(UNSPEC_SQRDMLSH): New.
> 	(SQRDMLAH): New.
> 	(rdma_as): New.
> 

> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index 964f8f1..409ba7b 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -303,6 +303,8 @@
>      UNSPEC_PMULL2       ; Used in aarch64-simd.md.
>      UNSPEC_REV_REGLIST  ; Used in aarch64-simd.md.
>      UNSPEC_VEC_SHR      ; Used in aarch64-simd.md.
> +    UNSPEC_SQRDMLAH     ; Used in aarch64-simd.md.
> +    UNSPEC_SQRDMLSH     ; Used in aarch64-simd.md.
>  ])
>  
>  ;; -------------------------------------------------------------------
> @@ -932,6 +934,8 @@
>                                 UNSPEC_SQSHRN UNSPEC_UQSHRN
>                                 UNSPEC_SQRSHRN UNSPEC_UQRSHRN])
>  
> +(define_int_iterator SQRDMLAH [UNSPEC_SQRDMLAH UNSPEC_SQRDMLSH])
> +

This name does not make it clear that you will iterate over an "A" and an
"S" form. I'd like to see a clearer naming choice, RDMAS? SQRDMLHADDSUB? etc.

Thanks,
James


* Re: [AArch64][PATCH 3/7] Add builtins for ARMv8.1 Adv.SIMD,instructions.
  2015-10-23 12:21 ` [AArch64][PATCH 3/7] Add builtins for ARMv8.1 Adv.SIMD,instructions Matthew Wahab
@ 2015-10-27 11:20   ` James Greenhalgh
  0 siblings, 0 replies; 30+ messages in thread
From: James Greenhalgh @ 2015-10-27 11:20 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Fri, Oct 23, 2015 at 01:20:55PM +0100, Matthew Wahab wrote:
> The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
> sqrdmlah and sqrdmlsh. This patch adds the GCC builtins to generate the new
> instructions which are needed for the NEON intrinsics added later in
> this series.
> 
> Tested the series for aarch64-none-linux-gnu with native bootstrap and
> make check on an ARMv8 architecture. Also tested aarch64-none-elf with
> cross-compiled check-gcc on an ARMv8.1 emulator.
> 
> Ok for trunk?
> Matthew

OK.

Thanks,
James

> 
> gcc/
> 2015-10-23  Matthew Wahab  <matthew.wahab@arm.com>
> 
> 	* config/aarch64/aarch64-simd-builtins.def
> 	(sqrdmlah, sqrdmlsh): New.
> 	(sqrdmlah_lane, sqrdmlsh_lane): New.
> 	(sqrdmlah_laneq, sqrdmlsh_laneq): New.
> 


* Re: [AArch64][PATCH 4/7] Add ACLE feature macro for ARMv8.1,Adv.SIMD instructions.
  2015-10-23 12:24 ` [AArch64][PATCH 4/7] Add ACLE feature macro for ARMv8.1,Adv.SIMD instructions Matthew Wahab
@ 2015-10-27 11:36   ` James Greenhalgh
  2015-11-17 13:21     ` James Greenhalgh
  0 siblings, 1 reply; 30+ messages in thread
From: James Greenhalgh @ 2015-10-27 11:36 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches, Richard Earnshaw, Marcus Shawcroft

On Fri, Oct 23, 2015 at 01:22:16PM +0100, Matthew Wahab wrote:
> The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
> sqrdmlah and sqrdmlsh. This patch adds the feature macro
> __ARM_FEATURE_QRDMX to indicate the presence of these instructions,
> generating it when the feature is available, as it is when
> -march=armv8.1-a is selected.
> 
> Tested the series for aarch64-none-linux-gnu with native bootstrap and
> make check on an ARMv8 architecture. Also tested aarch64-none-elf with
> cross-compiled check-gcc on an ARMv8.1 emulator.
> 
> Ok for trunk?
> Matthew

I don't see this macro documented in the versions of ACLE available from
the ARM documentation sites, and googling doesn't show anything other
than your patches. You don't explicitly mention anywhere in cover text for
this series where these new features are (or will be?) documented.

Could you please write a more complete description of where these new
macros and intrinsics come from and what they are intended to do? I would
not like to accept them without some confidence that these names have
been finalized, and I am nervous about having the best description of the
behaviour of them be the GCC source code.

Richard, Marcus?

Thanks,
James

> 
> gcc/
> 2015-10-23  Matthew Wahab  <matthew.wahab@arm.com>
> 
> 	* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Add
> 	ARM_FEATURE_QRDMX.
> 


* Re: [AArch64][dejagnu][PATCH 5/7] Dejagnu support for ARMv8.1 Adv.SIMD.
  2015-10-24  8:04   ` Bernhard Reutner-Fischer
@ 2015-10-27 15:32     ` Matthew Wahab
  2015-11-23 12:34       ` James Greenhalgh
  0 siblings, 1 reply; 30+ messages in thread
From: Matthew Wahab @ 2015-10-27 15:32 UTC (permalink / raw)
  To: Bernhard Reutner-Fischer, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1458 bytes --]

On 24/10/15 08:16, Bernhard Reutner-Fischer wrote:
> On October 23, 2015 2:24:26 PM GMT+02:00, Matthew Wahab <matthew.wahab@foss.arm.com> wrote:
>> The ARMv8.1 architecture extension adds two Adv.SIMD instructions.
>> This patch adds support in Dejagnu for ARMv8.1 Adv.SIMD specifiers
>> and checks.
>>
>> The new test options are
>> - { dg-add-options arm_v8_1a_neon }: Add compiler options needed to
>>    enable ARMv8.1 Adv.SIMD.
>> - { dg-require-effective-target arm_v8_1a_neon_hw }: Require a target
>>    capable of executing ARMv8.1 Adv.SIMD instructions.
>>
>
> Please error with something more meaningful than FOO, !__ARM_FEATURE_QRDMX comes to mind.
>
> TIA,
>

I've reworked the patch so that the error is "__ARM_FEATURE_QRDMX not
defined" and also strengthened the check_effective_target tests.

Retested for aarch64-none-elf with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested with a version of the compiler that
doesn't define the ACLE feature macro.

Matthew

gcc/testsuite
2015-10-27  Matthew Wahab  <matthew.wahab@arm.com>

	* lib/target-supports.exp (add_options_for_arm_v8_1a_neon): New.
	(check_effective_target_arm_arch_FUNC_ok)
	(add_options_for_arm_arch_FUNC)
	(check_effective_target_arm_arch_FUNC_multilib): Add "armv8.1-a"
	to the list to be generated.
	(check_effective_target_arm_v8_1a_neon_ok_nocache): New.
	(check_effective_target_arm_v8_1a_neon_ok): New.
	(check_effective_target_arm_v8_1a_neon_hw): New.



[-- Attachment #2: 0005-Testsuite-Add-dejagnu-options-for-armv8.1-neon.patch --]
[-- Type: text/x-patch, Size: 3320 bytes --]

From b12969882298cb79737e882c48398c58a45161b9 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Mon, 26 Oct 2015 14:58:36 +0000
Subject: [PATCH 5/7] [Testsuite] Add dejagnu options for armv8.1 neon

Change-Id: Ib58b8c4930ad3971af3ea682eda043e14cd2e8b3
---
 gcc/testsuite/lib/target-supports.exp | 56 ++++++++++++++++++++++++++++++++++-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 4d5b0a3d..0fb679d 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2700,6 +2700,16 @@ proc add_options_for_arm_v8_neon { flags } {
     return "$flags $et_arm_v8_neon_flags -march=armv8-a"
 }
 
+# Add the options needed for ARMv8.1 Adv.SIMD.
+
+proc add_options_for_arm_v8_1a_neon { flags } {
+    if { [istarget aarch64*-*-*] } {
+	return "$flags -march=armv8.1-a"
+    } else {
+	return "$flags"
+    }
+}
+
 proc add_options_for_arm_crc { flags } {
     if { ! [check_effective_target_arm_crc_ok] } {
         return "$flags"
@@ -2984,7 +2994,8 @@ foreach { armfunc armflag armdef } { v4 "-march=armv4 -marm" __ARM_ARCH_4__
 				     v7r "-march=armv7-r" __ARM_ARCH_7R__
 				     v7m "-march=armv7-m -mthumb" __ARM_ARCH_7M__
 				     v7em "-march=armv7e-m -mthumb" __ARM_ARCH_7EM__
-				     v8a "-march=armv8-a" __ARM_ARCH_8A__ } {
+				     v8a "-march=armv8-a" __ARM_ARCH_8A__
+				     v8_1a "-march=armv8.1a" __ARM_ARCH_8A__ } {
     eval [string map [list FUNC $armfunc FLAG $armflag DEF $armdef ] {
 	proc check_effective_target_arm_arch_FUNC_ok { } {
 	    if { [ string match "*-marm*" "FLAG" ] &&
@@ -3141,6 +3152,25 @@ proc check_effective_target_arm_neonv2_hw { } {
     } [add_options_for_arm_neonv2 ""]]
 }
 
+# Return 1 if the target supports the ARMv8.1 Adv.SIMD extension, 0
+# otherwise.  The test is valid for AArch64.
+
+proc check_effective_target_arm_v8_1a_neon_ok_nocache { } {
+    if { ![istarget aarch64*-*-*] } {
+	return 0
+    }
+    return [check_no_compiler_messages_nocache arm_v8_1a_neon_ok assembly {
+	#if !defined (__ARM_FEATURE_QRDMX)
+	#error "__ARM_FEATURE_QRDMX not defined"
+	#endif
+    } [add_options_for_arm_v8_1a_neon ""]]
+}
+
+proc check_effective_target_arm_v8_1a_neon_ok { } {
+    return [check_cached_effective_target arm_v8_1a_neon_ok \
+		check_effective_target_arm_v8_1a_neon_ok_nocache]
+}
+
 # Return 1 if the target supports executing ARMv8 NEON instructions, 0
 # otherwise.
 
@@ -3159,6 +3189,30 @@ proc check_effective_target_arm_v8_neon_hw { } {
     } [add_options_for_arm_v8_neon ""]]
 }
 
+# Return 1 if the target supports executing the ARMv8.1 Adv.SIMD extension, 0
+# otherwise.  The test is valid for AArch64.
+
+proc check_effective_target_arm_v8_1a_neon_hw { } {
+    if { ![check_effective_target_arm_v8_1a_neon_ok] } {
+	return 0;
+    }
+    return [check_runtime_nocache arm_v8_1a_neon_hw_available {
+	int
+	main (void)
+	{
+	  long long a = 0, b = 1;
+	  long long result = 0;
+
+	  asm ("sqrdmlah %s0,%s1,%s2"
+	       : "=w"(result)
+	       : "w"(a), "w"(b)
+	       : /* No clobbers.  */);
+
+	  return result;
+	}
+    }  [add_options_for_arm_v8_1a_neon ""]]
+}
+
 # Return 1 if this is a ARM target with NEON enabled.
 
 proc check_effective_target_arm_neon { } {
-- 
2.1.4



* Re: [AArch64][PATCH 2/7] Add sqrdmah, sqrdmsh instructions.
  2015-10-27 11:19   ` James Greenhalgh
@ 2015-10-27 16:12     ` Matthew Wahab
  2015-10-27 16:30       ` James Greenhalgh
  0 siblings, 1 reply; 30+ messages in thread
From: Matthew Wahab @ 2015-10-27 16:12 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: gcc-patches

On 27/10/15 11:18, James Greenhalgh wrote:

>>   ;; -------------------------------------------------------------------
>> @@ -932,6 +934,8 @@
>>                                  UNSPEC_SQSHRN UNSPEC_UQSHRN
>>                                  UNSPEC_SQRSHRN UNSPEC_UQRSHRN])
>>
>> +(define_int_iterator SQRDMLAH [UNSPEC_SQRDMLAH UNSPEC_SQRDMLSH])
>> +
>
> This name does not make it clear that you will iterate over an "A" and an
> "S" form. I'd like to see a clearer naming choice, RDMAS? SQRDMLHADDSUB? etc.

SQRDMLHADDSUB is a little difficult to read. How about SQRDMLH_AS, to keep the link 
to the instruction?

Matthew



* Re: [AArch64][PATCH 2/7] Add sqrdmah, sqrdmsh instructions.
  2015-10-27 16:12     ` Matthew Wahab
@ 2015-10-27 16:30       ` James Greenhalgh
  0 siblings, 0 replies; 30+ messages in thread
From: James Greenhalgh @ 2015-10-27 16:30 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Tue, Oct 27, 2015 at 04:11:07PM +0000, Matthew Wahab wrote:
> On 27/10/15 11:18, James Greenhalgh wrote:
> 
> >>  ;; -------------------------------------------------------------------
> >>@@ -932,6 +934,8 @@
> >>                                 UNSPEC_SQSHRN UNSPEC_UQSHRN
> >>                                 UNSPEC_SQRSHRN UNSPEC_UQRSHRN])
> >>
> >>+(define_int_iterator SQRDMLAH [UNSPEC_SQRDMLAH UNSPEC_SQRDMLSH])
> >>+
> >
> >This name does not make it clear that you will iterate over an "A" and an
> >"S" form. I'd like to see a clearer naming choice, RDMAS? SQRDMLHADDSUB? etc.
> 
> SQRDMLHADDSUB is a little difficult to read. How about SQRDMLH_AS,
> to keep the link to the instruction?

Sounds good to me.

Thanks,
James
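[Editorial note: for reference, the agreed renaming would look roughly like
this in iterators.md. This is a sketch, not text from the committed patch;
the rdma_as attribute values are an assumption inferred from the pattern
names aarch64_sqrdml<SQRDMLAH:rdma_as>h<mode> in the ChangeLog.]

```
(define_int_iterator SQRDMLH_AS [UNSPEC_SQRDMLAH UNSPEC_SQRDMLSH])

;; Map each unspec to the "a"/"s" letter of the mnemonic, so a single
;; pattern expands to both aarch64_sqrdmlah<mode> and
;; aarch64_sqrdmlsh<mode>.
(define_int_attr rdma_as [(UNSPEC_SQRDMLAH "a")
			  (UNSPEC_SQRDMLSH "s")])
```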


* Re: [AArch64][PATCH 6/7] Add NEON intrinsics vqrdmlah and vqrdmlsh.
  2015-10-23 12:30 ` [AArch64][PATCH 6/7] Add NEON intrinsics vqrdmlah and vqrdmlsh Matthew Wahab
@ 2015-10-30 12:53   ` Christophe Lyon
  2015-10-30 15:56     ` Matthew Wahab
  2015-11-23 13:37   ` James Greenhalgh
  1 sibling, 1 reply; 30+ messages in thread
From: Christophe Lyon @ 2015-10-30 12:53 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On 23 October 2015 at 14:26, Matthew Wahab <matthew.wahab@foss.arm.com> wrote:
> The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
> sqrdmlah and sqrdmlsh. This patch adds the NEON intrinsics vqrdmlah and
> vqrdmlsh for these instructions. The new intrinsics are of the form
> vqrdml{as}h[q]_<type>.
>
> Tested the series for aarch64-none-linux-gnu with native bootstrap and
> make check on an ARMv8 architecture. Also tested aarch64-none-elf with
> cross-compiled check-gcc on an ARMv8.1 emulator.
>

Is there a publicly available simulator for v8.1? QEMU or Foundation Model?


> Ok for trunk?
> Matthew
>
> gcc/
> 2015-10-23  Matthew Wahab  <matthew.wahab@arm.com>
>
>         * gcc/config/aarch64/arm_neon.h (vqrdmlah_s16, vqrdmlah_s32): New.
>         (vqrdmlahq_s16, vqrdmlahq_s32): New.
>         (vqrdmlsh_s16, vqrdmlsh_s32): New.
>         (vqrdmlshq_s16, vqrdmlshq_s32): New.
>
> gcc/testsuite
> 2015-10-23  Matthew Wahab  <matthew.wahab@arm.com>
>
>         * gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh.inc: New file,
>         support code for vqrdml{as}h tests.
>         * gcc.target/aarch64/advsimd-intrinsics/vqrdmlah.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh.c: New.
>


* Re: [AArch64][PATCH 6/7] Add NEON intrinsics vqrdmlah and vqrdmlsh.
  2015-10-30 12:53   ` Christophe Lyon
@ 2015-10-30 15:56     ` Matthew Wahab
  2015-11-09 13:31       ` Christophe Lyon
  0 siblings, 1 reply; 30+ messages in thread
From: Matthew Wahab @ 2015-10-30 15:56 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: gcc-patches

On 30/10/15 12:51, Christophe Lyon wrote:
> On 23 October 2015 at 14:26, Matthew Wahab <matthew.wahab@foss.arm.com> wrote:
>> The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
>> sqrdmlah and sqrdmlsh. This patch adds the NEON intrinsics vqrdmlah and
>> vqrdmlsh for these instructions. The new intrinsics are of the form
>> vqrdml{as}h[q]_<type>.
>>
>> Tested the series for aarch64-none-linux-gnu with native bootstrap and
>> make check on an ARMv8 architecture. Also tested aarch64-none-elf with
>> cross-compiled check-gcc on an ARMv8.1 emulator.
>>
>
> Is there a publicly available simulator for v8.1? QEMU or Foundation Model?
>

Sorry, I don't know.
Matthew


* Re: [AArch64][PATCH 6/7] Add NEON intrinsics vqrdmlah and vqrdmlsh.
  2015-10-30 15:56     ` Matthew Wahab
@ 2015-11-09 13:31       ` Christophe Lyon
  2015-11-09 13:53         ` Matthew Wahab
  0 siblings, 1 reply; 30+ messages in thread
From: Christophe Lyon @ 2015-11-09 13:31 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On 30 October 2015 at 16:52, Matthew Wahab <matthew.wahab@foss.arm.com> wrote:
> On 30/10/15 12:51, Christophe Lyon wrote:
>>
>> On 23 October 2015 at 14:26, Matthew Wahab <matthew.wahab@foss.arm.com>
>> wrote:
>>>
>>> The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
>>> sqrdmlah and sqrdmlsh. This patch adds the NEON intrinsics vqrdmlah and
>>> vqrdmlsh for these instructions. The new intrinsics are of the form
>>> vqrdml{as}h[q]_<type>.
>>>
>>> Tested the series for aarch64-none-linux-gnu with native bootstrap and
>>> make check on an ARMv8 architecture. Also tested aarch64-none-elf with
>>> cross-compiled check-gcc on an ARMv8.1 emulator.
>>>
>>
>> Is there a publicly available simulator for v8.1? QEMU or Foundation
>> Model?
>>
>
> Sorry, I don't know.
> Matthew
>

So, what will happen to the testsuite once this is committed?
Are we going to see FAILs when using QEMU?

Thanks,

Christophe.


* Re: [AArch64][PATCH 6/7] Add NEON intrinsics vqrdmlah and vqrdmlsh.
  2015-11-09 13:31       ` Christophe Lyon
@ 2015-11-09 13:53         ` Matthew Wahab
  0 siblings, 0 replies; 30+ messages in thread
From: Matthew Wahab @ 2015-11-09 13:53 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: gcc-patches

On 09/11/15 13:31, Christophe Lyon wrote:
> On 30 October 2015 at 16:52, Matthew Wahab <matthew.wahab@foss.arm.com> wrote:
>> On 30/10/15 12:51, Christophe Lyon wrote:
>>>
>>> On 23 October 2015 at 14:26, Matthew Wahab <matthew.wahab@foss.arm.com>
>>> wrote:
>>>>
>>>> The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
>>>> sqrdmlah and sqrdmlsh. This patch adds the NEON intrinsics vqrdmlah and
>>>> vqrdmlsh for these instructions. The new intrinsics are of the form
>>>> vqrdml{as}h[q]_<type>.
>>>>
>>>> Tested the series for aarch64-none-linux-gnu with native bootstrap and
>>>> make check on an ARMv8 architecture. Also tested aarch64-none-elf with
>>>> cross-compiled check-gcc on an ARMv8.1 emulator.
>>>
>>> Is there a publicly available simulator for v8.1? QEMU or Foundation
>>> Model?
>>
>> Sorry, I don't know.
>> Matthew
>>
>
> So, what will happen to the testsuite once this is committed?
> Are we going to see FAILs when using QEMU?
>

No, the check at the top of the  test files

+/* { dg-require-effective-target arm_v8_1a_neon_hw } */

should make this test UNSUPPORTED if the HW/simulator can't execute it.
(Support for this check is added in patch #5 in this series.) Note that
the aarch64-none-linux make check was run on ARMv8 HW, which can't
execute the test and correctly reported it as unsupported.
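
[Editorial note: to make the mechanism concrete, a test that executes the
new instructions would typically open with directives along these lines.
This is a sketch; the surrounding test body and file name are hypothetical.]

```
/* { dg-do run } */
/* { dg-require-effective-target arm_v8_1a_neon_hw } */
/* { dg-add-options arm_v8_1a_neon } */
```

On a target where the effective-target check fails, dejagnu reports the
whole test as UNSUPPORTED instead of running (and failing) it.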

Matthew


* Re: [AArch64][PATCH 4/7] Add ACLE feature macro for ARMv8.1,Adv.SIMD instructions.
  2015-10-27 11:36   ` James Greenhalgh
@ 2015-11-17 13:21     ` James Greenhalgh
  0 siblings, 0 replies; 30+ messages in thread
From: James Greenhalgh @ 2015-11-17 13:21 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches, Richard Earnshaw, Marcus Shawcroft

On Tue, Oct 27, 2015 at 11:33:21AM +0000, James Greenhalgh wrote:
> On Fri, Oct 23, 2015 at 01:22:16PM +0100, Matthew Wahab wrote:
> > The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
> > sqrdmlah and sqrdmlsh. This patch adds the feature macro
> > __ARM_FEATURE_QRDMX to indicate the presence of these instructions,
> > generating it when the feature is available, as it is when
> > -march=armv8.1-a is selected.
> > 
> > Tested the series for aarch64-none-linux-gnu with native bootstrap and
> > make check on an ARMv8 architecture. Also tested aarch64-none-elf with
> > cross-compiled check-gcc on an ARMv8.1 emulator.
> > 
> > Ok for trunk?
> > Matthew
> 
> I don't see this macro documented in the versions of ACLE available from
> the ARM documentation sites, and googling doesn't show anything other
> than your patches. You don't explicitly mention anywhere in cover text for
> this series where these new features are (or will be?) documented.
> 
> Could you please write a more complete description of where these new
> macros and intrinsics come from and what they are intended to do? I would
> not like to accept them without some confidence that these names have
> been finalized, and I am nervous about having the best description of the
> behaviour of them be the GCC source code.

This macro and the intrinsics included in this patch set are as they will
appear in a future release of ACLE.

__ARM_FEATURE_QRDMX will be defined to 1 if the SQRDMLAH and SQRDMLSH
instructions are available.

The intrinsics added take this form for the non-lane intrinsics:

  int16x4_t vqrdmlah_s16 (int16x4_t a, int16x4_t b, int16x4_t c)
    a -> Vd.4H, b -> Vn.4H, c-> Vm.4h
    VQRDMLAH Vd.4H,Vn.4H,Vm.4H
    Vd.4H -> result

And this form for the lane intrinsics:

  int16x4_t vqrdmlah_lane_s16 (int16x4_t a, int16x4_t b,
			       int16x4_t v, const int lane)
    a -> Vd.4H, b -> Vn.4H, v -> Vm.4h, 0 <= lane <= 3
    VQRDMLAH Vd.4H,Vn.4H,Vm.H[lane]
    Vd.4H -> result

Using the same syntax as is in the ARM Neon Intrinsics Reference [1].

These intrinsics are only available when __ARM_FEATURE_QRDMX is defined.

With all that said...

This patch is OK, but please fix the ChangeLog entry:

> > 	* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Add
> > 	ARM_FEATURE_QRDMX.

  s/ARM_FEATURE_QRDMX/__ARM_FEATURE_QRDMX/

Thanks,
James

---
[1]: http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf
  


* Re: [AArch64][dejagnu][PATCH 5/7] Dejagnu support for ARMv8.1 Adv.SIMD.
  2015-10-27 15:32     ` Matthew Wahab
@ 2015-11-23 12:34       ` James Greenhalgh
  2015-11-23 16:40         ` Matthew Wahab
  0 siblings, 1 reply; 30+ messages in thread
From: James Greenhalgh @ 2015-11-23 12:34 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: Bernhard Reutner-Fischer, gcc-patches

On Tue, Oct 27, 2015 at 03:32:04PM +0000, Matthew Wahab wrote:
> On 24/10/15 08:16, Bernhard Reutner-Fischer wrote:
> >On October 23, 2015 2:24:26 PM GMT+02:00, Matthew Wahab <matthew.wahab@foss.arm.com> wrote:
> >>The ARMv8.1 architecture extension adds two Adv.SIMD instructions.
> >>This patch adds support in Dejagnu for ARMv8.1 Adv.SIMD specifiers
> >>and checks.
> >>
> >>The new test options are
> >>- { dg-add-options arm_v8_1a_neon }: Add compiler options needed to
> >>   enable ARMv8.1 Adv.SIMD.
> >>- { dg-require-effective-target arm_v8_1a_neon_hw }: Require a target
> >>   capable of executing ARMv8.1 Adv.SIMD instructions.
> >>
> >
> >Please error with something more meaningful than FOO, !__ARM_FEATURE_QRDMX comes to mind.
> >
> >TIA,
> >
> 
> I've reworked the patch so that the error is "__ARM_FEATURE_QRDMX not
> defined" and also strengthened the check_effective_target tests.
> 
> Retested for aarch64-none-elf with cross-compiled check-gcc on an
> ARMv8.1 emulator. Also tested with a version of the compiler that
> doesn't define the ACLE feature macro.

Hi Matthew,

I have a couple of comments below. Neither need to block the patch, but
I'd appreciate a reply before I say OK.

> From b12969882298cb79737e882c48398c58a45161b9 Mon Sep 17 00:00:00 2001
> From: Matthew Wahab <matthew.wahab@arm.com>
> Date: Mon, 26 Oct 2015 14:58:36 +0000
> Subject: [PATCH 5/7] [Testsuite] Add dejagnu options for armv8.1 neon
> 
> Change-Id: Ib58b8c4930ad3971af3ea682eda043e14cd2e8b3
> ---
>  gcc/testsuite/lib/target-supports.exp | 56 ++++++++++++++++++++++++++++++++++-
>  1 file changed, 55 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> index 4d5b0a3d..0fb679d 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -2700,6 +2700,16 @@ proc add_options_for_arm_v8_neon { flags } {
>      return "$flags $et_arm_v8_neon_flags -march=armv8-a"
>  }
>  
> +# Add the options needed for ARMv8.1 Adv.SIMD.
> +
> +proc add_options_for_arm_v8_1a_neon { flags } {
> +    if { [istarget aarch64*-*-*] } {
> +	return "$flags -march=armv8.1-a"

Should this be -march=armv8.1-a+simd or some other feature flag?

> +    } else {
> +	return "$flags"
> +    }
> +}
> +
>  proc add_options_for_arm_crc { flags } {
>      if { ! [check_effective_target_arm_crc_ok] } {
>          return "$flags"
> @@ -2984,7 +2994,8 @@ foreach { armfunc armflag armdef } { v4 "-march=armv4 -marm" __ARM_ARCH_4__
>  				     v7r "-march=armv7-r" __ARM_ARCH_7R__
>  				     v7m "-march=armv7-m -mthumb" __ARM_ARCH_7M__
>  				     v7em "-march=armv7e-m -mthumb" __ARM_ARCH_7EM__
> -				     v8a "-march=armv8-a" __ARM_ARCH_8A__ } {
> +				     v8a "-march=armv8-a" __ARM_ARCH_8A__
> +				     v8_1a "-march=armv8.1a" __ARM_ARCH_8A__ } {
>      eval [string map [list FUNC $armfunc FLAG $armflag DEF $armdef ] {
>  	proc check_effective_target_arm_arch_FUNC_ok { } {
>  	    if { [ string match "*-marm*" "FLAG" ] &&
> @@ -3141,6 +3152,25 @@ proc check_effective_target_arm_neonv2_hw { } {
>      } [add_options_for_arm_neonv2 ""]]
>  }
>  
> +# Return 1 if the target supports the ARMv8.1 Adv.SIMD extension, 0
> +# otherwise.  The test is valid for AArch64.
> +
> +proc check_effective_target_arm_v8_1a_neon_ok_nocache { } {
> +    if { ![istarget aarch64*-*-*] } {
> +	return 0
> +    }
> +    return [check_no_compiler_messages_nocache arm_v8_1a_neon_ok assembly {
> +	#if !defined (__ARM_FEATURE_QRDMX)
> +	#error "__ARM_FEATURE_QRDMX not defined"
> +	#endif
> +    } [add_options_for_arm_v8_1a_neon ""]]
> +}
> +
> +proc check_effective_target_arm_v8_1a_neon_ok { } {
> +    return [check_cached_effective_target arm_v8_1a_neon_ok \
> +		check_effective_target_arm_v8_1a_neon_ok_nocache]
> +}
> +
>  # Return 1 if the target supports executing ARMv8 NEON instructions, 0
>  # otherwise.
>  
> @@ -3159,6 +3189,30 @@ proc check_effective_target_arm_v8_neon_hw { } {
>      } [add_options_for_arm_v8_neon ""]]
>  }
>  
> +# Return 1 if the target supports executing the ARMv8.1 Adv.SIMD extension, 0
> +# otherwise.  The test is valid for AArch64.
> +
> +proc check_effective_target_arm_v8_1a_neon_hw { } {
> +    if { ![check_effective_target_arm_v8_1a_neon_ok] } {
> +	return 0;
> +    }
> +    return [check_runtime_nocache arm_v8_1a_neon_hw_available {
> +	int
> +	main (void)
> +	{
> +	  long long a = 0, b = 1;
> +	  long long result = 0;
> +
> +	  asm ("sqrdmlah %s0,%s1,%s2"
> +	       : "=w"(result)
> +	       : "w"(a), "w"(b)
> +	       : /* No clobbers.  */);

Hm, those types look wrong, I guess this works but it is an unusual way
to write it. I presume this is to avoid including arm_neon.h each time, but
you could just directly use the internal type names for the arm_neon types.
That is to say __Int32x4_t (or whichever mode you intend to use).

> +
> +	  return result;
> +	}
> +    }  [add_options_for_arm_v8_1a_neon ""]]
> +}
> +
>  # Return 1 if this is a ARM target with NEON enabled.
>  
>  proc check_effective_target_arm_neon { } {

Thanks,
James


* Re: [AArch64][PATCH 6/7] Add NEON intrinsics vqrdmlah and vqrdmlsh.
  2015-10-23 12:30 ` [AArch64][PATCH 6/7] Add NEON intrinsics vqrdmlah and vqrdmlsh Matthew Wahab
  2015-10-30 12:53   ` Christophe Lyon
@ 2015-11-23 13:37   ` James Greenhalgh
  2015-11-25 10:15     ` Matthew Wahab
  1 sibling, 1 reply; 30+ messages in thread
From: James Greenhalgh @ 2015-11-23 13:37 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Fri, Oct 23, 2015 at 01:26:11PM +0100, Matthew Wahab wrote:
> The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
> sqrdmlah and sqrdmlsh. This patch adds the NEON intrinsics vqrdmlah and
> vqrdmlsh for these instructions. The new intrinsics are of the form
> vqrdml{as}h[q]_<type>.
> 
> Tested the series for aarch64-none-linux-gnu with native bootstrap and
> make check on an ARMv8 architecture. Also tested aarch64-none-elf with
> cross-compiled check-gcc on an ARMv8.1 emulator.
> 
> Ok for trunk?
> Matthew
> 
> gcc/
> 2015-10-23  Matthew Wahab  <matthew.wahab@arm.com>
> 
> 	* gcc/config/aarch64/arm_neon.h (vqrdmlah_s16, vqrdmlah_s32): New.
> 	(vqrdmlahq_s16, vqrdmlahq_s32): New.
> 	(vqrdmlsh_s16, vqrdmlsh_s32): New.
> 	(vqrdmlshq_s16, vqrdmlshq_s32): New.
> 
> gcc/testsuite
> 2015-10-23  Matthew Wahab  <matthew.wahab@arm.com>
> 
> 	* gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh.inc: New file,
> 	support code for vqrdml{as}h tests.
> 	* gcc.target/aarch64/advsimd-intrinsics/vqrdmlah.c: New.
> 	* gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh.c: New.
> 

> From 611e1232a59dfe42f2cd9666680407d67abcfea5 Mon Sep 17 00:00:00 2001
> From: Matthew Wahab <matthew.wahab@arm.com>
> Date: Thu, 27 Aug 2015 13:22:41 +0100
> Subject: [PATCH 6/7] Add neon intrinsics: vqrdmlah, vqrdmlsh.
> 
> Change-Id: I5c7f8d36ee980d280c1d50f6f212b286084c5acf
> ---
>  gcc/config/aarch64/arm_neon.h                      |  53 ++++++++
>  .../aarch64/advsimd-intrinsics/vqrdmlXh.inc        | 138 +++++++++++++++++++++
>  .../aarch64/advsimd-intrinsics/vqrdmlah.c          |  57 +++++++++
>  .../aarch64/advsimd-intrinsics/vqrdmlsh.c          |  61 +++++++++
>  4 files changed, 309 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh.inc
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlah.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh.c
> 
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index e186348..9e73809 100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -2649,6 +2649,59 @@ vqrdmulhq_s32 (int32x4_t __a, int32x4_t __b)
>    return (int32x4_t) __builtin_aarch64_sqrdmulhv4si (__a, __b);
>  }
>  
> +#pragma GCC push_options
> +#pragma GCC target ("arch=armv8.1-a")

Can we please patch the documentation to make it clear that -march=armv8.1-a
always implies -march=armv8.1-a+rdma? The documentation around which
feature modifiers are implied by which architecture levels leaves much
to be desired.

> +
> +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
> +vqrdmlah_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
> +{
> +  return (int16x4_t) __builtin_aarch64_sqrdmlahv4hi (__a, __b, __c);

We don't need this cast (likewise the other instances)?

Thanks,
James



* Re: [AArch64][PATCH 7/7] Add NEON intrinsics vqrdmlah_lane and vqrdmlsh_lane.
  2015-10-23 12:34 ` [AArch64][PATCH 7/7] Add NEON intrinsics vqrdmlah_lane and vqrdmlsh_lane Matthew Wahab
@ 2015-11-23 13:45   ` James Greenhalgh
  2015-11-25 10:25     ` Matthew Wahab
  0 siblings, 1 reply; 30+ messages in thread
From: James Greenhalgh @ 2015-11-23 13:45 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Fri, Oct 23, 2015 at 01:30:46PM +0100, Matthew Wahab wrote:
> The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
> sqrdmlah and sqrdmlsh. This patch adds the NEON intrinsics vqrdmlah_lane
> and vqrdmlsh_lane for these instructions. The new intrinsics are of the
> form vqrdml{as}h[q]_lane_<type>.
> 
> Tested the series for aarch64-none-linux-gnu with native bootstrap and
> make check on an ARMv8 architecture. Also tested aarch64-none-elf with
> cross-compiled check-gcc on an ARMv8.1 emulator.
> 
> Ok for trunk?
> Matthew
> 
> gcc/
> 2015-10-23  Matthew Wahab  <matthew.wahab@arm.com>
> 
> 	* gcc/config/aarch64/arm_neon.h
> 	(vqrdmlah_laneq_s16, vqrdmlah_laneq_s32): New.
> 	(vqrdmlahq_laneq_s16, vqrdmlahq_laneq_s32): New.
> 	(vqrdmlsh_laneq_s16, vqrdmlsh_laneq_s32): New.
> 	(vqrdmlshq_laneq_s16, vqrdmlshq_laneq_s32): New.
> 	(vqrdmlah_lane_s16, vqrdmlah_lane_s32): New.
> 	(vqrdmlahq_lane_s16, vqrdmlahq_lane_s32): New.
> 	(vqrdmlahh_s16, vqrdmlahh_lane_s16, vqrdmlahh_laneq_s16): New.
> 	(vqrdmlahs_s32, vqrdmlahs_lane_s32, vqrdmlahs_laneq_s32): New.
> 	(vqrdmlsh_lane_s16, vqrdmlsh_lane_s32): New.
> 	(vqrdmlshq_lane_s16, vqrdmlshq_lane_s32): New.
> 	(vqrdmlshh_s16, vqrdmlshh_lane_s16, vqrdmlshh_laneq_s16): New.
> 	(vqrdmlshs_s32, vqrdmlshs_lane_s32, vqrdmlshs_laneq_s32): New.
> 
> gcc/testsuite
> 2015-10-23  Matthew Wahab  <matthew.wahab@arm.com>
> 
> 	* gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh_lane.inc: New file,
> 	support code for vqrdml{as}h_lane tests.
> 	* gcc.target/aarch64/advsimd-intrinsics/vqrdmlah_lane.c: New.
> 	* gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh_lane.c: New.
> 

> From a2399818dba85ff2801a28bad77ef51697990da7 Mon Sep 17 00:00:00 2001
> From: Matthew Wahab <matthew.wahab@arm.com>
> Date: Thu, 27 Aug 2015 14:17:26 +0100
> Subject: [PATCH 7/7] Add neon intrinsics: vqrdmlah_lane, vqrdmlsh_lane.
> 
> Change-Id: I6d7a372e0a5b83ef0846ab62abbe9b24ada69fc4
> ---
>  gcc/config/aarch64/arm_neon.h                      | 182 +++++++++++++++++++++
>  .../aarch64/advsimd-intrinsics/vqrdmlXh_lane.inc   | 154 +++++++++++++++++
>  .../aarch64/advsimd-intrinsics/vqrdmlah_lane.c     |  57 +++++++
>  .../aarch64/advsimd-intrinsics/vqrdmlsh_lane.c     |  61 +++++++
>  4 files changed, 454 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh_lane.inc
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlah_lane.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh_lane.c
> 
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 9e73809..9b68e4a 100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -10675,6 +10675,59 @@ vqrdmulhq_laneq_s32 (int32x4_t __a, int32x4_t __b, const int __c)
>    return __builtin_aarch64_sqrdmulh_laneqv4si (__a, __b, __c);
>  }
>  
> +#pragma GCC push_options
> +#pragma GCC target ("arch=armv8.1-a")

Rather than strict alphabetical order, can we group everything which is
under one set of extensions together, to save on the push_options/pop_options
pairs.

This patch is OK with that change.

Thanks,
James


* Re: [AArch64][dejagnu][PATCH 5/7] Dejagnu support for ARMv8.1 Adv.SIMD.
  2015-11-23 12:34       ` James Greenhalgh
@ 2015-11-23 16:40         ` Matthew Wahab
  2015-11-25 10:14           ` Matthew Wahab
  0 siblings, 1 reply; 30+ messages in thread
From: Matthew Wahab @ 2015-11-23 16:40 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: Bernhard Reutner-Fischer, gcc-patches

On 23/11/15 12:24, James Greenhalgh wrote:
> On Tue, Oct 27, 2015 at 03:32:04PM +0000, Matthew Wahab wrote:
>> On 24/10/15 08:16, Bernhard Reutner-Fischer wrote:
>>> On October 23, 2015 2:24:26 PM GMT+02:00, Matthew Wahab <matthew.wahab@foss.arm.com> wrote:
>>>> The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
>>>> sqrdmlah and sqrdmlsh.
>>>> This
>>>> patch adds support in Dejagnu for ARMv8.1 Adv.SIMD specifiers and
>>>> checks.
>>>>
>>>> The new test options are
>>>> - { dg-add-options arm_v8_1a_neon }: Add compiler options needed to
>>>>    enable ARMv8.1 Adv.SIMD.
>>>> - { dg-require-effective-target arm_v8_1a_neon_hw }: Require a target
>>>>    capable of executing ARMv8.1 Adv.SIMD instructions.
>>>>

> Hi Matthew,
>
> I have a couple of comments below. Neither need to block the patch, but
> I'd appreciate a reply before I say OK.
>
>>
>> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
>> index 4d5b0a3d..0fb679d 100644
>> --- a/gcc/testsuite/lib/target-supports.exp
>> +++ b/gcc/testsuite/lib/target-supports.exp
>> @@ -2700,6 +2700,16 @@ proc add_options_for_arm_v8_neon { flags } {
>>       return "$flags $et_arm_v8_neon_flags -march=armv8-a"
>>   }
>>
>> +# Add the options needed for ARMv8.1 Adv.SIMD.
>> +
>> +proc add_options_for_arm_v8_1a_neon { flags } {
>> +    if { [istarget aarch64*-*-*] } {
>> +	return "$flags -march=armv8.1-a"
>
> Should this be -march=armv8.1-a+simd or some other feature flag?
>

I think it should be armv8.1-a only. +simd is enabled by all -march settings, so it 
seems redundant to add it here. An alternative is to add +rdma, but that's also 
enabled by armv8.1-a. (I've a patch at 
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01973.html which gets rid of +rdma as 
part of an armv8.1-a command-line clean-up.)
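
A quick way to see the implication in practice (a sketch only; the file name is
illustrative and it assumes an AArch64 cross-compiler):

```c
/* probe.c: compile with  aarch64-none-elf-gcc -march=armv8.1-a -c probe.c
   If armv8.1-a implies the RDMA extension, this compiles cleanly even
   though no explicit +rdma modifier appears on the command line.  */
#ifndef __ARM_FEATURE_QRDMX
# error "armv8.1-a did not imply the RDMA (sqrdmlah/sqrdmlsh) extension"
#endif
int probe_unused;
```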

>> +# Return 1 if the target supports executing the ARMv8.1 Adv.SIMD extension, 0
>> +# otherwise.  The test is valid for AArch64.
>> +
>> +proc check_effective_target_arm_v8_1a_neon_hw { } {
>> +    if { ![check_effective_target_arm_v8_1a_neon_ok] } {
>> +	return 0;
>> +    }
>> +    return [check_runtime_nocache arm_v8_1a_neon_hw_available {
>> +	int
>> +	main (void)
>> +	{
>> +	  long long a = 0, b = 1;
>> +	  long long result = 0;
>> +
>> +	  asm ("sqrdmlah %s0,%s1,%s2"
>> +	       : "=w"(result)
>> +	       : "w"(a), "w"(b)
>> +	       : /* No clobbers.  */);
>
> Hm, those types look wrong, I guess this works but it is an unusual way
> to write it. I presume this is to avoid including arm_neon.h each time, but
> you could just directly use the internal type names for the arm_neon types.
> That is to say __Int32x4_t (or whichever mode you intend to use).
>

I'll rework the patch to use the internal type names.

Matthew


* Re: [AArch64][dejagnu][PATCH 5/7] Dejagnu support for ARMv8.1 Adv.SIMD.
  2015-11-23 16:40         ` Matthew Wahab
@ 2015-11-25 10:14           ` Matthew Wahab
  2015-11-25 10:57             ` James Greenhalgh
  0 siblings, 1 reply; 30+ messages in thread
From: Matthew Wahab @ 2015-11-25 10:14 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: Bernhard Reutner-Fischer, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2524 bytes --]

On 23/11/15 16:38, Matthew Wahab wrote:
> On 23/11/15 12:24, James Greenhalgh wrote:
>> On Tue, Oct 27, 2015 at 03:32:04PM +0000, Matthew Wahab wrote:
>>> On 24/10/15 08:16, Bernhard Reutner-Fischer wrote:
>>>> On October 23, 2015 2:24:26 PM GMT+02:00, Matthew Wahab
>>>> <matthew.wahab@foss.arm.com> wrote:
>>>>> The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
>>>>> sqrdmlah and sqrdmlsh.
>>>>> This
>>>>> patch adds support in Dejagnu for ARMv8.1 Adv.SIMD specifiers and
>>>>> checks.
>>>>>
>>>>> The new test options are
>>>>> - { dg-add-options arm_v8_1a_neon }: Add compiler options needed to
>>>>>    enable ARMv8.1 Adv.SIMD.
>>>>> - { dg-require-effective-target arm_v8_1a_neon_hw }: Require a target
>>>>>    capable of executing ARMv8.1 Adv.SIMD instructions.
>>>>>

>>> +# Return 1 if the target supports executing the ARMv8.1 Adv.SIMD extension, 0
>>> +# otherwise.  The test is valid for AArch64.
>>> +
>>> +proc check_effective_target_arm_v8_1a_neon_hw { } {
>>> +    if { ![check_effective_target_arm_v8_1a_neon_ok] } {
>>> +    return 0;
>>> +    }
>>> +    return [check_runtime_nocache arm_v8_1a_neon_hw_available {
>>> +    int
>>> +    main (void)
>>> +    {
>>> +      long long a = 0, b = 1;
>>> +      long long result = 0;
>>> +
>>> +      asm ("sqrdmlah %s0,%s1,%s2"
>>> +           : "=w"(result)
>>> +           : "w"(a), "w"(b)
>>> +           : /* No clobbers.  */);
>>
>> Hm, those types look wrong, I guess this works but it is an unusual way
>> to write it. I presume this is to avoid including arm_neon.h each time, but
>> you could just directly use the internal type names for the arm_neon types.
>> That is to say __Int32x4_t (or whichever mode you intend to use).
>>
>
> I'll rework the patch to use the internal type names.

Attached, the reworked patch which uses internal type __Int32x2_t and
cleans up the assembler.

Retested aarch64-none-elf with cross-compiled check-gcc on an ARMv8.1
emulator. Also re-ran the cross-compiled
gcc.target/aarch64/advsimd-intrinsics tests for aarch64-none-elf on an
ARMv8 emulator.

Matthew

gcc/testsuite
2015-11-24  Matthew Wahab  <matthew.wahab@arm.com>

	* lib/target-supports.exp (add_options_for_arm_v8_1a_neon): New.
	(check_effective_target_arm_arch_FUNC_ok)
	(add_options_for_arm_arch_FUNC)
	(check_effective_target_arm_arch_FUNC_multilib): Add "armv8.1-a"
	to the list to be generated.
	(check_effective_target_arm_v8_1a_neon_ok_nocache): New.
	(check_effective_target_arm_v8_1a_neon_ok): New.
	(check_effective_target_arm_v8_1a_neon_hw): New.




[-- Attachment #2: 0005-Testsuite-Add-dejagnu-options-for-armv8.1-neon.patch --]
[-- Type: text/x-patch, Size: 3356 bytes --]

From 262c24946b2da5833a30b2e3e696bb7ea271059f Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Mon, 26 Oct 2015 14:58:36 +0000
Subject: [PATCH 5/7] [Testsuite] Add dejagnu options for armv8.1 neon

Change-Id: Ib58b8c4930ad3971af3ea682eda043e14cd2e8b3
---
 gcc/testsuite/lib/target-supports.exp | 57 ++++++++++++++++++++++++++++++++++-
 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 3eb46f2..dcd51fd 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2816,6 +2816,16 @@ proc add_options_for_arm_v8_neon { flags } {
     return "$flags $et_arm_v8_neon_flags -march=armv8-a"
 }
 
+# Add the options needed for ARMv8.1 Adv.SIMD.
+
+proc add_options_for_arm_v8_1a_neon { flags } {
+    if { [istarget aarch64*-*-*] } {
+	return "$flags -march=armv8.1-a"
+    } else {
+	return "$flags"
+    }
+}
+
 proc add_options_for_arm_crc { flags } {
     if { ! [check_effective_target_arm_crc_ok] } {
         return "$flags"
@@ -3102,7 +3112,8 @@ foreach { armfunc armflag armdef } { v4 "-march=armv4 -marm" __ARM_ARCH_4__
 				     v7r "-march=armv7-r" __ARM_ARCH_7R__
 				     v7m "-march=armv7-m -mthumb" __ARM_ARCH_7M__
 				     v7em "-march=armv7e-m -mthumb" __ARM_ARCH_7EM__
-				     v8a "-march=armv8-a" __ARM_ARCH_8A__ } {
+				     v8a "-march=armv8-a" __ARM_ARCH_8A__
+				     v8_1a "-march=armv8.1a" __ARM_ARCH_8A__ } {
     eval [string map [list FUNC $armfunc FLAG $armflag DEF $armdef ] {
 	proc check_effective_target_arm_arch_FUNC_ok { } {
 	    if { [ string match "*-marm*" "FLAG" ] &&
@@ -3259,6 +3270,25 @@ proc check_effective_target_arm_neonv2_hw { } {
     } [add_options_for_arm_neonv2 ""]]
 }
 
+# Return 1 if the target supports the ARMv8.1 Adv.SIMD extension, 0
+# otherwise.  The test is valid for AArch64.
+
+proc check_effective_target_arm_v8_1a_neon_ok_nocache { } {
+    if { ![istarget aarch64*-*-*] } {
+	return 0
+    }
+    return [check_no_compiler_messages_nocache arm_v8_1a_neon_ok assembly {
+	#if !defined (__ARM_FEATURE_QRDMX)
+	#error "__ARM_FEATURE_QRDMX not defined"
+	#endif
+    } [add_options_for_arm_v8_1a_neon ""]]
+}
+
+proc check_effective_target_arm_v8_1a_neon_ok { } {
+    return [check_cached_effective_target arm_v8_1a_neon_ok \
+		check_effective_target_arm_v8_1a_neon_ok_nocache]
+}
+
 # Return 1 if the target supports executing ARMv8 NEON instructions, 0
 # otherwise.
 
@@ -3277,6 +3307,31 @@ proc check_effective_target_arm_v8_neon_hw { } {
     } [add_options_for_arm_v8_neon ""]]
 }
 
+# Return 1 if the target supports executing the ARMv8.1 Adv.SIMD extension, 0
+# otherwise.  The test is valid for AArch64.
+
+proc check_effective_target_arm_v8_1a_neon_hw { } {
+    if { ![check_effective_target_arm_v8_1a_neon_ok] } {
+	return 0;
+    }
+    return [check_runtime_nocache arm_v8_1a_neon_hw_available {
+	int
+	main (void)
+	{
+	  __Int32x2_t a = {0, 1};
+	  __Int32x2_t b = {0, 2};
+	  __Int32x2_t result;
+
+	  asm ("sqrdmlah %0.2s, %1.2s, %2.2s"
+	       : "=w"(result)
+	       : "w"(a), "w"(b)
+	       : /* No clobbers.  */);
+
+	  return result[0];
+	}
+    }  [add_options_for_arm_v8_1a_neon ""]]
+}
+
 # Return 1 if this is a ARM target with NEON enabled.
 
 proc check_effective_target_arm_neon { } {
-- 
2.1.4



* Re: [AArch64][PATCH 6/7] Add NEON intrinsics vqrdmlah and vqrdmlsh.
  2015-11-23 13:37   ` James Greenhalgh
@ 2015-11-25 10:15     ` Matthew Wahab
  2015-11-25 10:58       ` James Greenhalgh
  0 siblings, 1 reply; 30+ messages in thread
From: Matthew Wahab @ 2015-11-25 10:15 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2381 bytes --]

On 23/11/15 13:35, James Greenhalgh wrote:
> On Fri, Oct 23, 2015 at 01:26:11PM +0100, Matthew Wahab wrote:
>> The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
>> sqrdmlah and sqrdmlsh. This patch adds the NEON intrinsics vqrdmlah and
>> vqrdmlsh for these instructions. The new intrinsics are of the form
>> vqrdml{as}h[q]_<type>.
>>

>> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
>> index e186348..9e73809 100644
>> --- a/gcc/config/aarch64/arm_neon.h
>> +++ b/gcc/config/aarch64/arm_neon.h
>> @@ -2649,6 +2649,59 @@ vqrdmulhq_s32 (int32x4_t __a, int32x4_t __b)
>>     return (int32x4_t) __builtin_aarch64_sqrdmulhv4si (__a, __b);
>>   }
>>
>> +#pragma GCC push_options
>> +#pragma GCC target ("arch=armv8.1-a")
>
> Can we please patch the documentation to make it clear that -march=armv8.1-a
> always implies -march=armv8.1-a+rdma?  The documentation around which
> feature modifiers are implied when leaves much to be desired.

I'll rework the documentation as part of the (separate) command-line clean-up patch.

>> +
>> +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
>> +vqrdmlah_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
>> +{
>> +  return (int16x4_t) __builtin_aarch64_sqrdmlahv4hi (__a, __b, __c);
>
> We don't need this cast (likewise the other instances)?
>

Attached, a reworked patch that removes the casts from the new
intrinsics. It also moves the new intrinsics to before the crypto
intrinsics. The intention is that the intrinsics added in this and the
next patch in the set are put in the same place and bracketed by a
single target pragma.

Retested aarch64-none-elf with cross-compiled check-gcc on an ARMv8.1
emulator. Also re-ran the cross-compiled
gcc.target/aarch64/advsimd-intrinsics tests for aarch64-none-elf on an
ARMv8 emulator.

Matthew

gcc/
2015-11-24  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc/config/aarch64/arm_neon.h (vqrdmlah_s16, vqrdmlah_s32): New.
	(vqrdmlahq_s16, vqrdmlahq_s32): New.
	(vqrdmlsh_s16, vqrdmlsh_s32): New.
	(vqrdmlshq_s16, vqrdmlshq_s32): New.

gcc/testsuite
2015-11-24  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh.inc: New file,
	support code for vqrdml{as}h tests.
	* gcc.target/aarch64/advsimd-intrinsics/vqrdmlah.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh.c: New.



[-- Attachment #2: 0006-Add-neon-intrinsics-vqrdmlah-vqrdmlsh.patch --]
[-- Type: text/x-patch, Size: 14589 bytes --]

From e623828ac2d033a9a51766d9843a650aab9f42e9 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 27 Aug 2015 13:22:41 +0100
Subject: [PATCH 6/7] Add neon intrinsics: vqrdmlah, vqrdmlsh.

Change-Id: I5c7f8d36ee980d280c1d50f6f212b286084c5acf
---
 gcc/config/aarch64/arm_neon.h                      |  53 ++++++++
 .../aarch64/advsimd-intrinsics/vqrdmlXh.inc        | 138 +++++++++++++++++++++
 .../aarch64/advsimd-intrinsics/vqrdmlah.c          |  57 +++++++++
 .../aarch64/advsimd-intrinsics/vqrdmlsh.c          |  61 +++++++++
 4 files changed, 309 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlah.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh.c

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 138b108..63f1627 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -11213,6 +11213,59 @@ vbslq_u64 (uint64x2_t __a, uint64x2_t __b, uint64x2_t __c)
   return __builtin_aarch64_simd_bslv2di_uuuu (__a, __b, __c);
 }
 
+/* ARMv8.1 intrinsics.  */
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.1-a")
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlah_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
+{
+  return __builtin_aarch64_sqrdmlahv4hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlah_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
+{
+  return __builtin_aarch64_sqrdmlahv2si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlahq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c)
+{
+  return __builtin_aarch64_sqrdmlahv8hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlahq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
+{
+  return __builtin_aarch64_sqrdmlahv4si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlsh_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
+{
+  return __builtin_aarch64_sqrdmlshv4hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlsh_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
+{
+  return __builtin_aarch64_sqrdmlshv2si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlshq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c)
+{
+  return __builtin_aarch64_sqrdmlshv8hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlshq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
+{
+  return __builtin_aarch64_sqrdmlshv4si (__a, __b, __c);
+}
+#pragma GCC pop_options
+
 #pragma GCC push_options
 #pragma GCC target ("+nothing+crypto")
 /* vaes  */
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh.inc b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh.inc
new file mode 100644
index 0000000..a504ca6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh.inc
@@ -0,0 +1,138 @@
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1 (NAME)
+
+void FNNAME (INSN) (void)
+{
+  /* vector_res = vqrdmlah (vector, vector2, vector3),
+     then store the result.  */
+#define TEST_VQRDMLAH2(INSN, Q, T1, T2, W, N, EXPECTED_CUMULATIVE_SAT, CMT) \
+  Set_Neon_Cumulative_Sat (0, VECT_VAR (vector_res, T1, W, N));		\
+  VECT_VAR (vector_res, T1, W, N) =					\
+    INSN##Q##_##T2##W (VECT_VAR (vector, T1, W, N),			\
+		       VECT_VAR (vector2, T1, W, N),			\
+		       VECT_VAR (vector3, T1, W, N));			\
+  vst1##Q##_##T2##W (VECT_VAR (result, T1, W, N),			\
+		     VECT_VAR (vector_res, T1, W, N));			\
+  CHECK_CUMULATIVE_SAT (TEST_MSG, T1, W, N,				\
+			EXPECTED_CUMULATIVE_SAT, CMT)
+
+  /* Two auxiliary macros are necessary to expand INSN.  */
+#define TEST_VQRDMLAH1(INSN, Q, T1, T2, W, N, EXPECTED_CUMULATIVE_SAT, CMT) \
+  TEST_VQRDMLAH2 (INSN, Q, T1, T2, W, N, EXPECTED_CUMULATIVE_SAT, CMT)
+
+#define TEST_VQRDMLAH(Q, T1, T2, W, N, EXPECTED_CUMULATIVE_SAT, CMT)	\
+  TEST_VQRDMLAH1 (INSN, Q, T1, T2, W, N, EXPECTED_CUMULATIVE_SAT, CMT)
+
+  DECL_VARIABLE (vector, int, 16, 4);
+  DECL_VARIABLE (vector, int, 32, 2);
+  DECL_VARIABLE (vector, int, 16, 8);
+  DECL_VARIABLE (vector, int, 32, 4);
+
+  DECL_VARIABLE (vector_res, int, 16, 4);
+  DECL_VARIABLE (vector_res, int, 32, 2);
+  DECL_VARIABLE (vector_res, int, 16, 8);
+  DECL_VARIABLE (vector_res, int, 32, 4);
+
+  DECL_VARIABLE (vector2, int, 16, 4);
+  DECL_VARIABLE (vector2, int, 32, 2);
+  DECL_VARIABLE (vector2, int, 16, 8);
+  DECL_VARIABLE (vector2, int, 32, 4);
+
+  DECL_VARIABLE (vector3, int, 16, 4);
+  DECL_VARIABLE (vector3, int, 32, 2);
+  DECL_VARIABLE (vector3, int, 16, 8);
+  DECL_VARIABLE (vector3, int, 32, 4);
+
+  clean_results ();
+
+  VLOAD (vector, buffer, , int, s, 16, 4);
+  VLOAD (vector, buffer, , int, s, 32, 2);
+  VLOAD (vector, buffer, q, int, s, 16, 8);
+  VLOAD (vector, buffer, q, int, s, 32, 4);
+
+  /* Initialize vector2.  */
+  VDUP (vector2, , int, s, 16, 4, 0x5555);
+  VDUP (vector2, , int, s, 32, 2, 0xBB);
+  VDUP (vector2, q, int, s, 16, 8, 0xBB);
+  VDUP (vector2, q, int, s, 32, 4, 0x22);
+
+  /* Initialize vector3.  */
+  VDUP (vector3, , int, s, 16, 4, 0x5555);
+  VDUP (vector3, , int, s, 32, 2, 0xBB);
+  VDUP (vector3, q, int, s, 16, 8, 0x33);
+  VDUP (vector3, q, int, s, 32, 4, 0x22);
+
+#define CMT ""
+  TEST_VQRDMLAH ( , int, s, 16, 4, expected_cumulative_sat, CMT);
+  TEST_VQRDMLAH ( , int, s, 32, 2, expected_cumulative_sat, CMT);
+  TEST_VQRDMLAH (q, int, s, 16, 8, expected_cumulative_sat, CMT);
+  TEST_VQRDMLAH (q, int, s, 32, 4, expected_cumulative_sat, CMT);
+
+  CHECK (TEST_MSG, int, 16, 4, PRIx16, expected, CMT);
+  CHECK (TEST_MSG, int, 32, 2, PRIx32, expected, CMT);
+  CHECK (TEST_MSG, int, 16, 8, PRIx16, expected, CMT);
+  CHECK (TEST_MSG, int, 32, 4, PRIx32, expected, CMT);
+
+  /* Now use input values such that the multiplication causes
+     saturation.  */
+#define TEST_MSG_MUL " (check mul cumulative saturation)"
+  VDUP (vector, , int, s, 16, 4, 0x8000);
+  VDUP (vector, , int, s, 32, 2, 0x80000000);
+  VDUP (vector, q, int, s, 16, 8, 0x8000);
+  VDUP (vector, q, int, s, 32, 4, 0x80000000);
+  VDUP (vector2, , int, s, 16, 4, 0x8000);
+  VDUP (vector2, , int, s, 32, 2, 0x80000000);
+  VDUP (vector2, q, int, s, 16, 8, 0x8000);
+  VDUP (vector2, q, int, s, 32, 4, 0x80000000);
+  VDUP (vector3, , int, s, 16, 4, 0x8000);
+  VDUP (vector3, , int, s, 32, 2, 0x80000000);
+  VDUP (vector3, q, int, s, 16, 8, 0x8000);
+  VDUP (vector3, q, int, s, 32, 4, 0x80000000);
+
+  TEST_VQRDMLAH ( , int, s, 16, 4, expected_cumulative_sat_mul, TEST_MSG_MUL);
+  TEST_VQRDMLAH ( , int, s, 32, 2, expected_cumulative_sat_mul, TEST_MSG_MUL);
+  TEST_VQRDMLAH (q, int, s, 16, 8, expected_cumulative_sat_mul, TEST_MSG_MUL);
+  TEST_VQRDMLAH (q, int, s, 32, 4, expected_cumulative_sat_mul, TEST_MSG_MUL);
+
+  CHECK (TEST_MSG, int, 16, 4, PRIx16, expected_mul, TEST_MSG_MUL);
+  CHECK (TEST_MSG, int, 32, 2, PRIx32, expected_mul, TEST_MSG_MUL);
+  CHECK (TEST_MSG, int, 16, 8, PRIx16, expected_mul, TEST_MSG_MUL);
+  CHECK (TEST_MSG, int, 32, 4, PRIx32, expected_mul, TEST_MSG_MUL);
+
+  /* Use input values where rounding produces a result equal to the
+     saturation value, but does not set the saturation flag.  */
+#define TEST_MSG_ROUND " (check rounding)"
+  VDUP (vector, , int, s, 16, 4, 0x8000);
+  VDUP (vector, , int, s, 32, 2, 0x80000000);
+  VDUP (vector, q, int, s, 16, 8, 0x8000);
+  VDUP (vector, q, int, s, 32, 4, 0x80000000);
+  VDUP (vector2, , int, s, 16, 4, 0x8001);
+  VDUP (vector2, , int, s, 32, 2, 0x80000001);
+  VDUP (vector2, q, int, s, 16, 8, 0x8001);
+  VDUP (vector2, q, int, s, 32, 4, 0x80000001);
+  VDUP (vector3, , int, s, 16, 4, 0x8001);
+  VDUP (vector3, , int, s, 32, 2, 0x80000001);
+  VDUP (vector3, q, int, s, 16, 8, 0x8001);
+  VDUP (vector3, q, int, s, 32, 4, 0x80000001);
+
+  TEST_VQRDMLAH ( , int, s, 16, 4, expected_cumulative_sat_round, \
+		 TEST_MSG_ROUND);
+  TEST_VQRDMLAH ( , int, s, 32, 2, expected_cumulative_sat_round, \
+		 TEST_MSG_ROUND);
+  TEST_VQRDMLAH (q, int, s, 16, 8, expected_cumulative_sat_round, \
+		 TEST_MSG_ROUND);
+  TEST_VQRDMLAH (q, int, s, 32, 4, expected_cumulative_sat_round, \
+		 TEST_MSG_ROUND);
+
+  CHECK (TEST_MSG, int, 16, 4, PRIx16, expected_round, TEST_MSG_ROUND);
+  CHECK (TEST_MSG, int, 32, 2, PRIx32, expected_round, TEST_MSG_ROUND);
+  CHECK (TEST_MSG, int, 16, 8, PRIx16, expected_round, TEST_MSG_ROUND);
+  CHECK (TEST_MSG, int, 32, 4, PRIx32, expected_round, TEST_MSG_ROUND);
+}
+
+int
+main (void)
+{
+  FNNAME (INSN) ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlah.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlah.c
new file mode 100644
index 0000000..148d94c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlah.c
@@ -0,0 +1,57 @@
+/* { dg-require-effective-target arm_v8_1a_neon_hw } */
+/* { dg-add-options arm_v8_1a_neon } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected values of cumulative_saturation flag.  */
+int VECT_VAR (expected_cumulative_sat, int, 16, 4) = 0;
+int VECT_VAR (expected_cumulative_sat, int, 32, 2) = 0;
+int VECT_VAR (expected_cumulative_sat, int, 16, 8) = 0;
+int VECT_VAR (expected_cumulative_sat, int, 32, 4) = 0;
+
+/* Expected results.  */
+VECT_VAR_DECL (expected, int, 16, 4) [] = { 0x38d3, 0x38d4, 0x38d5, 0x38d6 };
+VECT_VAR_DECL (expected, int, 32, 2) [] = { 0xfffffff0, 0xfffffff1 };
+VECT_VAR_DECL (expected, int, 16, 8) [] = { 0xfff0,  0xfff1, 0xfff2,  0xfff3,
+					    0xfff4, 0xfff5, 0xfff6, 0xfff7 };
+VECT_VAR_DECL (expected, int, 32, 4) [] = { 0xfffffff0, 0xfffffff1,
+					    0xfffffff2, 0xfffffff3 };
+
+/* Expected values of cumulative_saturation flag when multiplication
+   saturates.  */
+int VECT_VAR (expected_cumulative_sat_mul, int, 16, 4) = 0;
+int VECT_VAR (expected_cumulative_sat_mul, int, 32, 2) = 0;
+int VECT_VAR (expected_cumulative_sat_mul, int, 16, 8) = 0;
+int VECT_VAR (expected_cumulative_sat_mul, int, 32, 4) = 0;
+
+/* Expected results when multiplication saturates.  */
+VECT_VAR_DECL (expected_mul, int, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_mul, int, 32, 2) [] = { 0x0, 0x0 };
+VECT_VAR_DECL (expected_mul, int, 16, 8) [] = { 0x0, 0x0, 0x0, 0x0,
+						0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_mul, int, 32, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+
+/* Expected values of cumulative_saturation flag when rounding
+   should not cause saturation.  */
+int VECT_VAR (expected_cumulative_sat_round, int, 16, 4) = 0;
+int VECT_VAR (expected_cumulative_sat_round, int, 32, 2) = 0;
+int VECT_VAR (expected_cumulative_sat_round, int, 16, 8) = 0;
+int VECT_VAR (expected_cumulative_sat_round, int, 32, 4) = 0;
+
+/* Expected results when rounding should not cause saturation.  */
+VECT_VAR_DECL (expected_round, int, 16, 4) [] = { 0xfffe, 0xfffe,
+						  0xfffe, 0xfffe };
+VECT_VAR_DECL (expected_round, int, 32, 2) [] = { 0xfffffffe, 0xfffffffe };
+VECT_VAR_DECL (expected_round, int, 16, 8) [] = { 0xfffe, 0xfffe,
+						  0xfffe, 0xfffe,
+						  0xfffe, 0xfffe,
+						  0xfffe, 0xfffe };
+VECT_VAR_DECL (expected_round, int, 32, 4) [] = { 0xfffffffe, 0xfffffffe,
+						  0xfffffffe, 0xfffffffe };
+
+#define INSN vqrdmlah
+#define TEST_MSG "VQRDMLAH"
+
+#include "vqrdmlXh.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh.c
new file mode 100644
index 0000000..91c3b34
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh.c
@@ -0,0 +1,61 @@
+/* { dg-require-effective-target arm_v8_1a_neon_hw } */
+/* { dg-add-options arm_v8_1a_neon } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected values of cumulative_saturation flag.  */
+int VECT_VAR (expected_cumulative_sat, int, 16, 4) = 0;
+int VECT_VAR (expected_cumulative_sat, int, 32, 2) = 0;
+int VECT_VAR (expected_cumulative_sat, int, 16, 8) = 0;
+int VECT_VAR (expected_cumulative_sat, int, 32, 4) = 0;
+
+/* Expected results.  */
+VECT_VAR_DECL (expected, int, 16, 4) [] = { 0xc70d, 0xc70e, 0xc70f, 0xc710 };
+VECT_VAR_DECL (expected, int, 32, 2) [] = { 0xfffffff0, 0xfffffff1 };
+VECT_VAR_DECL (expected, int, 16, 8) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3,
+					    0xfff4, 0xfff5, 0xfff6, 0xfff7 };
+VECT_VAR_DECL (expected, int, 32, 4) [] = { 0xfffffff0, 0xfffffff1,
+					    0xfffffff2, 0xfffffff3 };
+
+/* Expected values of cumulative_saturation flag when multiplication
+   saturates.  */
+int VECT_VAR (expected_cumulative_sat_mul, int, 16, 4) = 1;
+int VECT_VAR (expected_cumulative_sat_mul, int, 32, 2) = 1;
+int VECT_VAR (expected_cumulative_sat_mul, int, 16, 8) = 1;
+int VECT_VAR (expected_cumulative_sat_mul, int, 32, 4) = 1;
+
+/* Expected results when multiplication saturates.  */
+VECT_VAR_DECL (expected_mul, int, 16, 4) [] = { 0x8000, 0x8000,
+						0x8000, 0x8000 };
+VECT_VAR_DECL (expected_mul, int, 32, 2) [] = { 0x80000000, 0x80000000 };
+VECT_VAR_DECL (expected_mul, int, 16, 8) [] = { 0x8000, 0x8000,
+						0x8000, 0x8000,
+						0x8000, 0x8000,
+						0x8000, 0x8000 };
+VECT_VAR_DECL (expected_mul, int, 32, 4) [] = { 0x80000000, 0x80000000,
+						0x80000000, 0x80000000 };
+
+/* Expected values of cumulative_saturation flag when rounding
+   should not cause saturation.  */
+int VECT_VAR (expected_cumulative_sat_round, int, 16, 4) = 1;
+int VECT_VAR (expected_cumulative_sat_round, int, 32, 2) = 1;
+int VECT_VAR (expected_cumulative_sat_round, int, 16, 8) = 1;
+int VECT_VAR (expected_cumulative_sat_round, int, 32, 4) = 1;
+
+/* Expected results when rounding should not cause saturation.  */
+VECT_VAR_DECL (expected_round, int, 16, 4) [] = { 0x8000, 0x8000,
+						  0x8000, 0x8000 };
+VECT_VAR_DECL (expected_round, int, 32, 2) [] = { 0x80000000, 0x80000000 };
+VECT_VAR_DECL (expected_round, int, 16, 8) [] = { 0x8000, 0x8000,
+						  0x8000, 0x8000,
+						  0x8000, 0x8000,
+						  0x8000, 0x8000 };
+VECT_VAR_DECL (expected_round, int, 32, 4) [] = { 0x80000000, 0x80000000,
+						  0x80000000, 0x80000000 };
+
+#define INSN vqrdmlsh
+#define TEST_MSG "VQRDMLSH"
+
+#include "vqrdmlXh.inc"
-- 
2.1.4



* Re: [AArch64][PATCH 7/7] Add NEON intrinsics vqrdmlah_lane and vqrdmlsh_lane.
  2015-11-23 13:45   ` James Greenhalgh
@ 2015-11-25 10:25     ` Matthew Wahab
  2015-11-25 11:11       ` James Greenhalgh
  0 siblings, 1 reply; 30+ messages in thread
From: Matthew Wahab @ 2015-11-25 10:25 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2321 bytes --]

On 23/11/15 13:37, James Greenhalgh wrote:
> On Fri, Oct 23, 2015 at 01:30:46PM +0100, Matthew Wahab wrote:
>> The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
>> sqrdmlah and sqrdmlsh. This patch adds the NEON intrinsics vqrdmlah_lane
>> and vqrdmlsh_lane for these instructions. The new intrinsics are of the
>> form vqrdml{as}h[q]_lane_<type>.
>>

>> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
>> index 9e73809..9b68e4a 100644
>> --- a/gcc/config/aarch64/arm_neon.h
>> +++ b/gcc/config/aarch64/arm_neon.h
>> @@ -10675,6 +10675,59 @@ vqrdmulhq_laneq_s32 (int32x4_t __a, int32x4_t __b, const int __c)
>>     return __builtin_aarch64_sqrdmulh_laneqv4si (__a, __b, __c);
>>   }
>>
>> +#pragma GCC push_options
>> +#pragma GCC target ("arch=armv8.1-a")
>
> Rather than strict alphabetical order, can we group everything which is
> under one set of extensions together, to save on the push_options/pop_options
> pairs.
>

Attached, the reworked patch that keeps the ARMv8.1 intrinsics together,
bracketed by a single target pragma.
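
The resulting layout looks roughly like this (an illustrative fragment of the
grouping requested, not the exact patch text; the real definitions live in
arm_neon.h):

```c
/* One push/pop pair brackets every ARMv8.1-A intrinsic, rather than one
   pair per patch in the series.  */
#pragma GCC push_options
#pragma GCC target ("arch=armv8.1-a")

/* ... vqrdmlah/vqrdmlsh basic forms (patch 6/7) ... */
/* ... vqrdmlah_lane/vqrdmlsh_lane and scalar forms (patch 7/7) ... */

#pragma GCC pop_options
```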

Retested aarch64-none-elf with cross-compiled check-gcc on an ARMv8.1
emulator. Also re-ran the cross-compiled
gcc.target/aarch64/advsimd-intrinsics tests for aarch64-none-elf on an
ARMv8 emulator.

Matthew

gcc/
2015-11-24  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc/config/aarch64/arm_neon.h
	(vqrdmlah_laneq_s16, vqrdmlah_laneq_s32): New.
	(vqrdmlahq_laneq_s16, vqrdmlahq_laneq_s32): New.
	(vqrdmlsh_laneq_s16, vqrdmlsh_laneq_s32): New.
	(vqrdmlshq_laneq_s16, vqrdmlshq_laneq_s32): New.
	(vqrdmlah_lane_s16, vqrdmlah_lane_s32): New.
	(vqrdmlahq_lane_s16, vqrdmlahq_lane_s32): New.
	(vqrdmlahh_s16, vqrdmlahh_lane_s16, vqrdmlahh_laneq_s16): New.
	(vqrdmlahs_s32, vqrdmlahs_lane_s32, vqrdmlahs_laneq_s32): New.
	(vqrdmlsh_lane_s16, vqrdmlsh_lane_s32): New.
	(vqrdmlshq_lane_s16, vqrdmlshq_lane_s32): New.
	(vqrdmlshh_s16, vqrdmlshh_lane_s16, vqrdmlshh_laneq_s16): New.
	(vqrdmlshs_s32, vqrdmlshs_lane_s32, vqrdmlshs_laneq_s32): New.

gcc/testsuite
2015-11-24  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh_lane.inc: New file,
	support code for vqrdml{as}h_lane tests.
	* gcc.target/aarch64/advsimd-intrinsics/vqrdmlah_lane.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh_lane.c: New.


[-- Attachment #2: 0007-Add-neon-intrinsics-vqrdmlah_lane-vqrdmlsh_lane.patch --]
[-- Type: text/x-patch, Size: 19650 bytes --]

From 03cb214eaf07cceb65f0dc07dca1be739bfe5375 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 27 Aug 2015 14:17:26 +0100
Subject: [PATCH 7/7] Add neon intrinsics: vqrdmlah_lane, vqrdmlsh_lane.

---
 gcc/config/aarch64/arm_neon.h                      | 168 +++++++++++++++++++++
 .../aarch64/advsimd-intrinsics/vqrdmlXh_lane.inc   | 154 +++++++++++++++++++
 .../aarch64/advsimd-intrinsics/vqrdmlah_lane.c     |  57 +++++++
 .../aarch64/advsimd-intrinsics/vqrdmlsh_lane.c     |  61 ++++++++
 4 files changed, 440 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh_lane.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlah_lane.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh_lane.c

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 63f1627..56db339 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -11264,6 +11264,174 @@ vqrdmlshq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
 {
   return __builtin_aarch64_sqrdmlshv4si (__a, __b, __c);
 }
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlah_laneq_s16 (int16x4_t __a, int16x4_t __b, int16x8_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_laneqv4hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlah_laneq_s32 (int32x2_t __a, int32x2_t __b, int32x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_laneqv2si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlahq_laneq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_laneqv8hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlahq_laneq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_laneqv4si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlsh_laneq_s16 (int16x4_t __a, int16x4_t __b, int16x8_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlsh_laneqv4hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlsh_laneq_s32 (int32x2_t __a, int32x2_t __b, int32x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlsh_laneqv2si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlshq_laneq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlsh_laneqv8hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlshq_laneq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlsh_laneqv4si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlah_lane_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_lanev4hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlah_lane_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_lanev2si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlahq_lane_s16 (int16x8_t __a, int16x8_t __b, int16x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_lanev8hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlahq_lane_s32 (int32x4_t __a, int32x4_t __b, int32x2_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_lanev4si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vqrdmlahh_s16 (int16_t __a, int16_t __b, int16_t __c)
+{
+  return (int16_t) __builtin_aarch64_sqrdmlahhi (__a, __b, __c);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vqrdmlahh_lane_s16 (int16_t __a, int16_t __b, int16x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_lanehi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vqrdmlahh_laneq_s16 (int16_t __a, int16_t __b, int16x8_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_laneqhi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vqrdmlahs_s32 (int32_t __a, int32_t __b, int32_t __c)
+{
+  return (int32_t) __builtin_aarch64_sqrdmlahsi (__a, __b, __c);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vqrdmlahs_lane_s32 (int32_t __a, int32_t __b, int32x2_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_lanesi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vqrdmlahs_laneq_s32 (int32_t __a, int32_t __b, int32x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_laneqsi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlsh_lane_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlsh_lanev4hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlsh_lane_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlsh_lanev2si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlshq_lane_s16 (int16x8_t __a, int16x8_t __b, int16x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlsh_lanev8hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlshq_lane_s32 (int32x4_t __a, int32x4_t __b, int32x2_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlsh_lanev4si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vqrdmlshh_s16 (int16_t __a, int16_t __b, int16_t __c)
+{
+  return (int16_t) __builtin_aarch64_sqrdmlshhi (__a, __b, __c);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vqrdmlshh_lane_s16 (int16_t __a, int16_t __b, int16x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlsh_lanehi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vqrdmlshh_laneq_s16 (int16_t __a, int16_t __b, int16x8_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlsh_laneqhi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vqrdmlshs_s32 (int32_t __a, int32_t __b, int32_t __c)
+{
+  return (int32_t) __builtin_aarch64_sqrdmlshsi (__a, __b, __c);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vqrdmlshs_lane_s32 (int32_t __a, int32_t __b, int32x2_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlsh_lanesi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vqrdmlshs_laneq_s32 (int32_t __a, int32_t __b, int32x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlsh_laneqsi (__a, __b, __c, __d);
+}
 #pragma GCC pop_options
 
 #pragma GCC push_options
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh_lane.inc b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh_lane.inc
new file mode 100644
index 0000000..a855502
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh_lane.inc
@@ -0,0 +1,154 @@
+#define FNNAME1(NAME) exec_ ## NAME ## _lane
+#define FNNAME(NAME) FNNAME1 (NAME)
+
+void FNNAME (INSN) (void)
+{
+  /* vector_res = vqrdmlXh_lane (vector, vector2, vector3, lane),
+     then store the result.  */
+#define TEST_VQRDMLXH_LANE2(INSN, Q, T1, T2, W, N, N2, L,		\
+			    EXPECTED_CUMULATIVE_SAT, CMT)		\
+  Set_Neon_Cumulative_Sat (0, VECT_VAR (vector_res, T1, W, N));		\
+  VECT_VAR (vector_res, T1, W, N) =					\
+    INSN##Q##_lane_##T2##W (VECT_VAR (vector, T1, W, N),		\
+			    VECT_VAR (vector2, T1, W, N),		\
+			    VECT_VAR (vector3, T1, W, N2),		\
+			    L);						\
+  vst1##Q##_##T2##W (VECT_VAR (result, T1, W, N),			\
+		     VECT_VAR (vector_res, T1, W, N));			\
+  CHECK_CUMULATIVE_SAT (TEST_MSG, T1, W, N, EXPECTED_CUMULATIVE_SAT, CMT)
+
+  /* Two auxiliary macros are necessary to expand INSN.  */
+#define TEST_VQRDMLXH_LANE1(INSN, Q, T1, T2, W, N, N2, L,	\
+			    EXPECTED_CUMULATIVE_SAT, CMT)	\
+  TEST_VQRDMLXH_LANE2 (INSN, Q, T1, T2, W, N, N2, L,		\
+		       EXPECTED_CUMULATIVE_SAT, CMT)
+
+#define TEST_VQRDMLXH_LANE(Q, T1, T2, W, N, N2, L,		\
+			   EXPECTED_CUMULATIVE_SAT, CMT)	\
+  TEST_VQRDMLXH_LANE1 (INSN, Q, T1, T2, W, N, N2, L,		\
+		       EXPECTED_CUMULATIVE_SAT, CMT)
+
+
+  DECL_VARIABLE (vector, int, 16, 4);
+  DECL_VARIABLE (vector, int, 32, 2);
+  DECL_VARIABLE (vector, int, 16, 8);
+  DECL_VARIABLE (vector, int, 32, 4);
+
+  DECL_VARIABLE (vector_res, int, 16, 4);
+  DECL_VARIABLE (vector_res, int, 32, 2);
+  DECL_VARIABLE (vector_res, int, 16, 8);
+  DECL_VARIABLE (vector_res, int, 32, 4);
+
+  DECL_VARIABLE (vector2, int, 16, 4);
+  DECL_VARIABLE (vector2, int, 32, 2);
+  DECL_VARIABLE (vector2, int, 16, 8);
+  DECL_VARIABLE (vector2, int, 32, 4);
+
+  DECL_VARIABLE (vector3, int, 16, 4);
+  DECL_VARIABLE (vector3, int, 32, 2);
+  DECL_VARIABLE (vector3, int, 16, 8);
+  DECL_VARIABLE (vector3, int, 32, 4);
+
+  clean_results ();
+
+  VLOAD (vector, buffer, , int, s, 16, 4);
+  VLOAD (vector, buffer, , int, s, 32, 2);
+
+  VLOAD (vector, buffer, q, int, s, 16, 8);
+  VLOAD (vector, buffer, q, int, s, 32, 4);
+
+  /* Initialize vector2.  */
+  VDUP (vector2, , int, s, 16, 4, 0x5555);
+  VDUP (vector2, , int, s, 32, 2, 0xBB);
+  VDUP (vector2, q, int, s, 16, 8, 0xBB);
+  VDUP (vector2, q, int, s, 32, 4, 0x22);
+
+  /* Initialize vector3.  */
+  VDUP (vector3, , int, s, 16, 4, 0x5555);
+  VDUP (vector3, , int, s, 32, 2, 0xBB);
+  VDUP (vector3, q, int, s, 16, 8, 0x33);
+  VDUP (vector3, q, int, s, 32, 4, 0x22);
+
+  /* Choose lane arbitrarily.  */
+#define CMT ""
+  TEST_VQRDMLXH_LANE (, int, s, 16, 4, 4, 2, expected_cumulative_sat, CMT);
+  TEST_VQRDMLXH_LANE (, int, s, 32, 2, 2, 1, expected_cumulative_sat, CMT);
+  TEST_VQRDMLXH_LANE (q, int, s, 16, 8, 4, 3, expected_cumulative_sat, CMT);
+  TEST_VQRDMLXH_LANE (q, int, s, 32, 4, 2, 0, expected_cumulative_sat, CMT);
+
+  CHECK (TEST_MSG, int, 16, 4, PRIx16, expected, CMT);
+  CHECK (TEST_MSG, int, 32, 2, PRIx32, expected, CMT);
+  CHECK (TEST_MSG, int, 16, 8, PRIx16, expected, CMT);
+  CHECK (TEST_MSG, int, 32, 4, PRIx32, expected, CMT);
+
+  /* Now use input values such that the multiplication causes
+     saturation.  */
+#define TEST_MSG_MUL " (check mul cumulative saturation)"
+  VDUP (vector, , int, s, 16, 4, 0x8000);
+  VDUP (vector, , int, s, 32, 2, 0x80000000);
+  VDUP (vector, q, int, s, 16, 8, 0x8000);
+  VDUP (vector, q, int, s, 32, 4, 0x80000000);
+
+  VDUP (vector2, , int, s, 16, 4, 0x8000);
+  VDUP (vector2, , int, s, 32, 2, 0x80000000);
+  VDUP (vector2, q, int, s, 16, 8, 0x8000);
+  VDUP (vector2, q, int, s, 32, 4, 0x80000000);
+
+  VDUP (vector3, , int, s, 16, 4, 0x8000);
+  VDUP (vector3, , int, s, 32, 2, 0x80000000);
+  VDUP (vector3, q, int, s, 16, 8, 0x8000);
+  VDUP (vector3, q, int, s, 32, 4, 0x80000000);
+
+  TEST_VQRDMLXH_LANE (, int, s, 16, 4, 4, 2, expected_cumulative_sat_mul,
+		      TEST_MSG_MUL);
+  TEST_VQRDMLXH_LANE (, int, s, 32, 2, 2, 1, expected_cumulative_sat_mul,
+		      TEST_MSG_MUL);
+  TEST_VQRDMLXH_LANE (q, int, s, 16, 8, 4, 3, expected_cumulative_sat_mul,
+		      TEST_MSG_MUL);
+  TEST_VQRDMLXH_LANE (q, int, s, 32, 4, 2, 0, expected_cumulative_sat_mul,
+		      TEST_MSG_MUL);
+
+  CHECK (TEST_MSG, int, 16, 4, PRIx16, expected_mul, TEST_MSG_MUL);
+  CHECK (TEST_MSG, int, 32, 2, PRIx32, expected_mul, TEST_MSG_MUL);
+  CHECK (TEST_MSG, int, 16, 8, PRIx16, expected_mul, TEST_MSG_MUL);
+  CHECK (TEST_MSG, int, 32, 4, PRIx32, expected_mul, TEST_MSG_MUL);
+
+  VDUP (vector, , int, s, 16, 4, 0x8000);
+  VDUP (vector, , int, s, 32, 2, 0x80000000);
+  VDUP (vector, q, int, s, 16, 8, 0x8000);
+  VDUP (vector, q, int, s, 32, 4, 0x80000000);
+
+  VDUP (vector2, , int, s, 16, 4, 0x8001);
+  VDUP (vector2, , int, s, 32, 2, 0x80000001);
+  VDUP (vector2, q, int, s, 16, 8, 0x8001);
+  VDUP (vector2, q, int, s, 32, 4, 0x80000001);
+
+  VDUP (vector3, , int, s, 16, 4, 0x8001);
+  VDUP (vector3, , int, s, 32, 2, 0x80000001);
+  VDUP (vector3, q, int, s, 16, 8, 0x8001);
+  VDUP (vector3, q, int, s, 32, 4, 0x80000001);
+
+  /* Use input values where rounding produces a result equal to the
+     saturation value; the flag depends on the operation.  */
+#define TEST_MSG_ROUND " (check rounding)"
+  TEST_VQRDMLXH_LANE (, int, s, 16, 4, 4, 2, expected_cumulative_sat_round,
+		      TEST_MSG_ROUND);
+  TEST_VQRDMLXH_LANE (, int, s, 32, 2, 2, 1, expected_cumulative_sat_round,
+		      TEST_MSG_ROUND);
+  TEST_VQRDMLXH_LANE (q, int, s, 16, 8, 4, 3, expected_cumulative_sat_round,
+		      TEST_MSG_ROUND);
+  TEST_VQRDMLXH_LANE (q, int, s, 32, 4, 2, 0, expected_cumulative_sat_round,
+		      TEST_MSG_ROUND);
+
+  CHECK (TEST_MSG, int, 16, 4, PRIx16, expected_round, TEST_MSG_ROUND);
+  CHECK (TEST_MSG, int, 32, 2, PRIx32, expected_round, TEST_MSG_ROUND);
+  CHECK (TEST_MSG, int, 16, 8, PRIx16, expected_round, TEST_MSG_ROUND);
+  CHECK (TEST_MSG, int, 32, 4, PRIx32, expected_round, TEST_MSG_ROUND);
+}
+
+int
+main (void)
+{
+  FNNAME (INSN) ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlah_lane.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlah_lane.c
new file mode 100644
index 0000000..ed43e01
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlah_lane.c
@@ -0,0 +1,57 @@
+/* { dg-require-effective-target arm_v8_1a_neon_hw } */
+/* { dg-add-options arm_v8_1a_neon } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected values of cumulative_saturation flag.  */
+int VECT_VAR (expected_cumulative_sat, int, 16, 4) = 0;
+int VECT_VAR (expected_cumulative_sat, int, 32, 2) = 0;
+int VECT_VAR (expected_cumulative_sat, int, 16, 8) = 0;
+int VECT_VAR (expected_cumulative_sat, int, 32, 4) = 0;
+
+/* Expected results.  */
+VECT_VAR_DECL (expected, int, 16, 4) [] = { 0x38d3, 0x38d4, 0x38d5, 0x38d6 };
+VECT_VAR_DECL (expected, int, 32, 2) [] = { 0xfffffff0, 0xfffffff1 };
+VECT_VAR_DECL (expected, int, 16, 8) [] = { 0x006d, 0x006e, 0x006f, 0x0070,
+					    0x0071, 0x0072, 0x0073, 0x0074 };
+VECT_VAR_DECL (expected, int, 32, 4) [] = { 0xfffffff0, 0xfffffff1,
+					    0xfffffff2, 0xfffffff3 };
+
+/* Expected values of cumulative_saturation flag when multiplication
+   saturates.  */
+int VECT_VAR (expected_cumulative_sat_mul, int, 16, 4) = 0;
+int VECT_VAR (expected_cumulative_sat_mul, int, 32, 2) = 0;
+int VECT_VAR (expected_cumulative_sat_mul, int, 16, 8) = 0;
+int VECT_VAR (expected_cumulative_sat_mul, int, 32, 4) = 0;
+
+/* Expected results when multiplication saturates.  */
+VECT_VAR_DECL (expected_mul, int, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_mul, int, 32, 2) [] = { 0x0, 0x0 };
+VECT_VAR_DECL (expected_mul, int, 16, 8) [] = { 0x0, 0x0, 0x0, 0x0,
+						0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_mul, int, 32, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+
+/* Expected values of cumulative_saturation flag when rounding
+   should not cause saturation.  */
+int VECT_VAR (expected_cumulative_sat_round, int, 16, 4) = 0;
+int VECT_VAR (expected_cumulative_sat_round, int, 32, 2) = 0;
+int VECT_VAR (expected_cumulative_sat_round, int, 16, 8) = 0;
+int VECT_VAR (expected_cumulative_sat_round, int, 32, 4) = 0;
+
+/* Expected results when rounding should not cause saturation.  */
+VECT_VAR_DECL (expected_round, int, 16, 4) [] = { 0xfffe, 0xfffe,
+						  0xfffe, 0xfffe };
+VECT_VAR_DECL (expected_round, int, 32, 2) [] = { 0xfffffffe, 0xfffffffe };
+VECT_VAR_DECL (expected_round, int, 16, 8) [] = { 0xfffe, 0xfffe,
+						 0xfffe, 0xfffe,
+						 0xfffe, 0xfffe,
+						 0xfffe, 0xfffe };
+VECT_VAR_DECL (expected_round, int, 32, 4) [] = { 0xfffffffe, 0xfffffffe,
+						  0xfffffffe, 0xfffffffe };
+
+#define INSN vqrdmlah
+#define TEST_MSG "VQRDMLAH_LANE"
+
+#include "vqrdmlXh_lane.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh_lane.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh_lane.c
new file mode 100644
index 0000000..6010b42
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh_lane.c
@@ -0,0 +1,61 @@
+/* { dg-require-effective-target arm_v8_1a_neon_hw } */
+/* { dg-add-options arm_v8_1a_neon } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected values of cumulative_saturation flag.  */
+int VECT_VAR (expected_cumulative_sat, int, 16, 4) = 0;
+int VECT_VAR (expected_cumulative_sat, int, 32, 2) = 0;
+int VECT_VAR (expected_cumulative_sat, int, 16, 8) = 0;
+int VECT_VAR (expected_cumulative_sat, int, 32, 4) = 0;
+
+/* Expected results.  */
+VECT_VAR_DECL (expected, int, 16, 4) [] = { 0xc70d, 0xc70e, 0xc70f, 0xc710 };
+VECT_VAR_DECL (expected, int, 32, 2) [] = { 0xfffffff0, 0xfffffff1 };
+VECT_VAR_DECL (expected, int, 16, 8) [] = { 0xff73, 0xff74, 0xff75, 0xff76,
+					    0xff77, 0xff78, 0xff79, 0xff7a };
+VECT_VAR_DECL (expected, int, 32, 4) [] = { 0xfffffff0, 0xfffffff1,
+					    0xfffffff2, 0xfffffff3 };
+
+/* Expected values of cumulative_saturation flag when multiplication
+   saturates.  */
+int VECT_VAR (expected_cumulative_sat_mul, int, 16, 4) = 1;
+int VECT_VAR (expected_cumulative_sat_mul, int, 32, 2) = 1;
+int VECT_VAR (expected_cumulative_sat_mul, int, 16, 8) = 1;
+int VECT_VAR (expected_cumulative_sat_mul, int, 32, 4) = 1;
+
+/* Expected results when multiplication saturates.  */
+VECT_VAR_DECL (expected_mul, int, 16, 4) [] = { 0x8000, 0x8000,
+						0x8000, 0x8000 };
+VECT_VAR_DECL (expected_mul, int, 32, 2) [] = { 0x80000000, 0x80000000 };
+VECT_VAR_DECL (expected_mul, int, 16, 8) [] = { 0x8000, 0x8000,
+						0x8000, 0x8000,
+						0x8000, 0x8000,
+						0x8000, 0x8000 };
+VECT_VAR_DECL (expected_mul, int, 32, 4) [] = { 0x80000000, 0x80000000,
+						0x80000000, 0x80000000 };
+
+/* Expected values of cumulative_saturation flag for the rounding
+   check; for vqrdmlsh these inputs do saturate.  */
+int VECT_VAR (expected_cumulative_sat_round, int, 16, 4) = 1;
+int VECT_VAR (expected_cumulative_sat_round, int, 32, 2) = 1;
+int VECT_VAR (expected_cumulative_sat_round, int, 16, 8) = 1;
+int VECT_VAR (expected_cumulative_sat_round, int, 32, 4) = 1;
+
+/* Expected results for the rounding check.  */
+VECT_VAR_DECL (expected_round, int, 16, 4) [] = { 0x8000, 0x8000,
+						  0x8000, 0x8000 };
+VECT_VAR_DECL (expected_round, int, 32, 2) [] = { 0x80000000, 0x80000000 };
+VECT_VAR_DECL (expected_round, int, 16, 8) [] = { 0x8000, 0x8000,
+						  0x8000, 0x8000,
+						  0x8000, 0x8000,
+						  0x8000, 0x8000 };
+VECT_VAR_DECL (expected_round, int, 32, 4) [] = { 0x80000000, 0x80000000,
+						  0x80000000, 0x80000000 };
+
+#define INSN vqrdmlsh
+#define TEST_MSG "VQRDMLSH_LANE"
+
+#include "vqrdmlXh_lane.inc"
-- 
2.1.4


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [AArch64][dejagnu][PATCH 5/7] Dejagnu support for ARMv8.1 Adv.SIMD.
  2015-11-25 10:14           ` Matthew Wahab
@ 2015-11-25 10:57             ` James Greenhalgh
  0 siblings, 0 replies; 30+ messages in thread
From: James Greenhalgh @ 2015-11-25 10:57 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: Bernhard Reutner-Fischer, gcc-patches

On Wed, Nov 25, 2015 at 10:10:49AM +0000, Matthew Wahab wrote:
> On 23/11/15 16:38, Matthew Wahab wrote:
> >On 23/11/15 12:24, James Greenhalgh wrote:
> >>On Tue, Oct 27, 2015 at 03:32:04PM +0000, Matthew Wahab wrote:
> >>>On 24/10/15 08:16, Bernhard Reutner-Fischer wrote:
> >>>>On October 23, 2015 2:24:26 PM GMT+02:00, Matthew Wahab
> >>>><matthew.wahab@foss.arm.com> wrote:
> >>>>>The ARMv8.1 architecture extension adds two Adv.SIMD instructions.
> >>>>>This
> >>>>>patch adds support in Dejagnu for ARMv8.1 Adv.SIMD specifiers and
> >>>>>checks.
> >>>>>
> >>>>>The new test options are
> >>>>>- { dg-add-options arm_v8_1a_neon }: Add compiler options needed to
> >>>>>   enable ARMv8.1 Adv.SIMD.
> >>>>>- { dg-require-effective-target arm_v8_1a_neon_hw }: Require a target
> >>>>>   capable of executing ARMv8.1 Adv.SIMD instructions.
> >>>>>
> 
> >>>+# Return 1 if the target supports executing the ARMv8.1 Adv.SIMD extension, 0
> >>>+# otherwise.  The test is valid for AArch64.
> >>>+
> >>>+proc check_effective_target_arm_v8_1a_neon_hw { } {
> >>>+    if { ![check_effective_target_arm_v8_1a_neon_ok] } {
> >>>+    return 0;
> >>>+    }
> >>>+    return [check_runtime_nocache arm_v8_1a_neon_hw_available {
> >>>+    int
> >>>+    main (void)
> >>>+    {
> >>>+      long long a = 0, b = 1;
> >>>+      long long result = 0;
> >>>+
> >>>+      asm ("sqrdmlah %s0,%s1,%s2"
> >>>+           : "=w"(result)
> >>>+           : "w"(a), "w"(b)
> >>>+           : /* No clobbers.  */);
> >>
> >>Hm, those types look wrong, I guess this works but it is an unusual way
> >>to write it. I presume this is to avoid including arm_neon.h each time, but
> >>you could just directly use the internal type names for the arm_neon types.
> >>That is to say __Int32x4_t (or whichever mode you intend to use).
> >>
> >
> >I'll rework the patch to use the internal type names.
> 
> Attached, the reworked patch which uses internal type __Int32x2_t and
> cleans up the assembler.
> 
> Retested aarch64-none-elf with cross-compiled check-gcc on an ARMv8.1
> emulator. Also re-ran the cross-compiled
> gcc.target/aarch64/advsimd-intrinsics tests for aarch64-none-elf on an
> ARMv8 emulator.

OK.

Thanks,
James

> gcc/testsuite
> 2015-11-24  Matthew Wahab  <matthew.wahab@arm.com>
> 
> 	* lib/target-supports.exp (add_options_for_arm_v8_1a_neon): New.
> 	(check_effective_target_arm_arch_FUNC_ok)
> 	(add_options_for_arm_arch_FUNC)
> 	(check_effective_target_arm_arch_FUNC_multilib): Add "armv8.1-a"
> 	to the list to be generated.
> 	(check_effective_target_arm_v8_1a_neon_ok_nocache): New.
> 	(check_effective_target_arm_v8_1a_neon_ok): New.
> 	(check_effective_target_arm_v8_1a_neon_hw): New.


* Re: [AArch64][PATCH 6/7] Add NEON intrinsics vqrdmlah and vqrdmlsh.
  2015-11-25 10:15     ` Matthew Wahab
@ 2015-11-25 10:58       ` James Greenhalgh
  0 siblings, 0 replies; 30+ messages in thread
From: James Greenhalgh @ 2015-11-25 10:58 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Wed, Nov 25, 2015 at 10:14:10AM +0000, Matthew Wahab wrote:
> On 23/11/15 13:35, James Greenhalgh wrote:
> >On Fri, Oct 23, 2015 at 01:26:11PM +0100, Matthew Wahab wrote:
> >>The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
> >>sqrdmlah and sqrdmlsh. This patch adds the NEON intrinsics vqrdmlah and
> >>vqrdmlsh for these instructions. The new intrinsics are of the form
> >>vqrdml{as}h[q]_<type>.
> >>
> 
> >>diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> >>index e186348..9e73809 100644
> >>--- a/gcc/config/aarch64/arm_neon.h
> >>+++ b/gcc/config/aarch64/arm_neon.h
> >>@@ -2649,6 +2649,59 @@ vqrdmulhq_s32 (int32x4_t __a, int32x4_t __b)
> >>    return (int32x4_t) __builtin_aarch64_sqrdmulhv4si (__a, __b);
> >>  }
> >>
> >>+#pragma GCC push_options
> >>+#pragma GCC target ("arch=armv8.1-a")
> >
> >Can we please patch the documentation to make it clear that -march=armv8.1-a
> >always implies -march=armv8.1-a+rdma ? The documentation around which
> >feature modifiers are implied when leaves much to be desired.
> 
> I'll rework the documentation as part of the (separate) command-line clean-up patch.
> 
> >>+
> >>+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
> >>+vqrdmlah_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
> >>+{
> >>+  return (int16x4_t) __builtin_aarch64_sqrdmlahv4hi (__a, __b, __c);
> >
> >We don't need this cast (likewise the other instances)?
> >
> 
> Attached, a reworked patch that removes the casts from the new
> intrinsics. It also moves the new intrinsics to before the crypto
> intrinsics. The intention is that the intrinsics added in this and the
> next patch in the set are put in the same place and bracketed by a
> single target pragma.
> 
> Retested aarch64-none-elf with cross-compiled check-gcc on an ARMv8.1
> emulator. Also re-ran the cross-compiled
> gcc.target/aarch64/advsimd-intrinsics tests for aarch64-none-elf on an
> ARMv8 emulator.

OK.

Thanks,
James

> gcc/
> 2015-11-24  Matthew Wahab  <matthew.wahab@arm.com>
> 
> 	* config/aarch64/arm_neon.h (vqrdmlah_s16, vqrdmlah_s32): New.
> 	(vqrdmlahq_s16, vqrdmlahq_s32): New.
> 	(vqrdmlsh_s16, vqrdmlsh_s32): New.
> 	(vqrdmlshq_s16, vqrdmlshq_s32): New.
> 
> gcc/testsuite
> 2015-11-24  Matthew Wahab  <matthew.wahab@arm.com>
> 
> 	* gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh.inc: New file,
> 	support code for vqrdml{as}h tests.
> 	* gcc.target/aarch64/advsimd-intrinsics/vqrdmlah.c: New.
> 	* gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh.c: New.


* Re: [AArch64][PATCH 7/7] Add NEON intrinsics vqrdmlah_lane and vqrdmlsh_lane.
  2015-11-25 10:25     ` Matthew Wahab
@ 2015-11-25 11:11       ` James Greenhalgh
  0 siblings, 0 replies; 30+ messages in thread
From: James Greenhalgh @ 2015-11-25 11:11 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Wed, Nov 25, 2015 at 10:15:45AM +0000, Matthew Wahab wrote:
> On 23/11/15 13:37, James Greenhalgh wrote:
> >On Fri, Oct 23, 2015 at 01:30:46PM +0100, Matthew Wahab wrote:
> >>The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
> >>sqrdmlah and sqrdmlsh. This patch adds the NEON intrinsics vqrdmlah_lane
> >>and vqrdmlsh_lane for these instructions. The new intrinsics are of the
> >>form vqrdml{as}h[q]_lane_<type>.
> >>
> 
> >>diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> >>index 9e73809..9b68e4a 100644
> >>--- a/gcc/config/aarch64/arm_neon.h
> >>+++ b/gcc/config/aarch64/arm_neon.h
> >>@@ -10675,6 +10675,59 @@ vqrdmulhq_laneq_s32 (int32x4_t __a, int32x4_t __b, const int __c)
> >>    return __builtin_aarch64_sqrdmulh_laneqv4si (__a, __b, __c);
> >>  }
> >>
> >>+#pragma GCC push_options
> >>+#pragma GCC target ("arch=armv8.1-a")
> >
> >Rather than strict alphabetical order, can we group everything which is
> >under one set of extensions together, to save on the push_options/pop_options
> >pairs.
> >
> 
> Attached the reworked patch that keeps the ARMv8.1 intrinsics together,
> bracketed by a single target pragma.
> 
> Retested aarch64-none-elf with cross-compiled check-gcc on an ARMv8.1
> emulator. Also re-ran the cross-compiled
> gcc.target/aarch64/advsimd-intrinsics tests for aarch64-none-elf on an
> ARMv8 emulator.

OK.

Thanks,
James

> gcc/
> 2015-11-24  Matthew Wahab  <matthew.wahab@arm.com>
> 
> 	* config/aarch64/arm_neon.h
> 	(vqrdmlah_laneq_s16, vqrdmlah_laneq_s32): New.
> 	(vqrdmlahq_laneq_s16, vqrdmlahq_laneq_s32): New.
> 	(vqrdmlsh_laneq_s16, vqrdmlsh_laneq_s32): New.
> 	(vqrdmlshq_laneq_s16, vqrdmlshq_laneq_s32): New.
> 	(vqrdmlah_lane_s16, vqrdmlah_lane_s32): New.
> 	(vqrdmlahq_lane_s16, vqrdmlahq_lane_s32): New.
> 	(vqrdmlahh_s16, vqrdmlahh_lane_s16, vqrdmlahh_laneq_s16): New.
> 	(vqrdmlahs_s32, vqrdmlahs_lane_s32, vqrdmlahs_laneq_s32): New.
> 	(vqrdmlsh_lane_s16, vqrdmlsh_lane_s32): New.
> 	(vqrdmlshq_lane_s16, vqrdmlshq_lane_s32): New.
> 	(vqrdmlshh_s16, vqrdmlshh_lane_s16, vqrdmlshh_laneq_s16): New.
> 	(vqrdmlshs_s32, vqrdmlshs_lane_s32, vqrdmlshs_laneq_s32): New.
> 
> gcc/testsuite
> 2015-11-24  Matthew Wahab  <matthew.wahab@arm.com>
> 
> 	* gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh_lane.inc: New file,
> 	support code for vqrdml{as}h_lane tests.
> 	* gcc.target/aarch64/advsimd-intrinsics/vqrdmlah_lane.c: New.
> 	* gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh_lane.c: New.



end of thread, other threads:[~2015-11-25 10:58 UTC | newest]

Thread overview: 30+ messages
-- links below jump to the message on this page --
2015-10-23 12:19 [AArch64][PATCH 1/7] Add support for ARMv8.1 Adv.SIMD,instructions Matthew Wahab
2015-10-23 12:19 ` [AArch64][PATCH 2/7] Add sqrdmah, sqrdmsh instructions Matthew Wahab
2015-10-27 11:19   ` James Greenhalgh
2015-10-27 16:12     ` Matthew Wahab
2015-10-27 16:30       ` James Greenhalgh
2015-10-23 12:21 ` [AArch64][PATCH 3/7] Add builtins for ARMv8.1 Adv.SIMD,instructions Matthew Wahab
2015-10-27 11:20   ` James Greenhalgh
2015-10-23 12:24 ` [AArch64][PATCH 4/7] Add ACLE feature macro for ARMv8.1,Adv.SIMD instructions Matthew Wahab
2015-10-27 11:36   ` James Greenhalgh
2015-11-17 13:21     ` James Greenhalgh
2015-10-23 12:24 ` [AArch64][dejagnu][PATCH 5/7] Dejagnu support for ARMv8.1 Adv.SIMD Matthew Wahab
2015-10-24  8:04   ` Bernhard Reutner-Fischer
2015-10-27 15:32     ` Matthew Wahab
2015-11-23 12:34       ` James Greenhalgh
2015-11-23 16:40         ` Matthew Wahab
2015-11-25 10:14           ` Matthew Wahab
2015-11-25 10:57             ` James Greenhalgh
2015-10-23 12:30 ` [AArch64][PATCH 6/7] Add NEON intrinsics vqrdmlah and vqrdmlsh Matthew Wahab
2015-10-30 12:53   ` Christophe Lyon
2015-10-30 15:56     ` Matthew Wahab
2015-11-09 13:31       ` Christophe Lyon
2015-11-09 13:53         ` Matthew Wahab
2015-11-23 13:37   ` James Greenhalgh
2015-11-25 10:15     ` Matthew Wahab
2015-11-25 10:58       ` James Greenhalgh
2015-10-23 12:34 ` [AArch64][PATCH 7/7] Add NEON intrinsics vqrdmlah_lane and vqrdmlsh_lane Matthew Wahab
2015-11-23 13:45   ` James Greenhalgh
2015-11-25 10:25     ` Matthew Wahab
2015-11-25 11:11       ` James Greenhalgh
2015-10-27 10:54 ` [AArch64][PATCH 1/7] Add support for ARMv8.1 Adv.SIMD,instructions James Greenhalgh
