public inbox for gcc-patches@gcc.gnu.org
* [AArch64][1/14] ARMv8.2-A FP16 data processing intrinsics
       [not found] <67f7b93f-0a92-de8f-8c50-5b4b573fed3a@foss.arm.com>
       [not found] ` <99eb95e3-5e9c-c6c9-b85f-e67d15f4859a@foss.arm.com>
@ 2016-07-07 16:14 ` Jiong Wang
  2016-07-08 14:07   ` James Greenhalgh
  1 sibling, 1 reply; 32+ messages in thread
From: Jiong Wang @ 2016-07-07 16:14 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1923 bytes --]

Several data-processing instructions are agnostic to the type of their
operands.  This patch adds the mapping between the new FP16 intrinsics
and those bit- and lane-manipulation instructions.

No ARMv8.2-A FP16 extension hardware support is required for these
intrinsics.
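
As a usage sketch (illustration only, not part of the patch; assumes a
toolchain with this series applied), the new intrinsics map straight
onto the existing permute and dup instructions:

#include <arm_neon.h>

/* vtrn_f16 interleaves the even and odd lanes of its inputs, so
   r.val[0] = {a0, b0, a2, b2} and r.val[1] = {a1, b1, a3, b3};
   it should compile to a TRN1/TRN2 pair.  */
float16x4x2_t
transpose_f16 (float16x4_t a, float16x4_t b)
{
  return vtrn_f16 (a, b);
}

/* Broadcast lane 2 to all four lanes; should compile to a single
   DUP v0.4h, v1.h[2].  */
float16x4_t
dup2_f16 (float16x4_t a)
{
  return vdup_lane_f16 (a, 2);
}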

gcc/
2016-07-07  Jiong Wang  <jiong.wang@arm.com>

         * config/aarch64/aarch64-simd.md
         (aarch64_<PERMUTE:perm_insn><PERMUTE:perm_hilo><mode>): Use VALL_F16.
         (aarch64_ext<mode>): Likewise.
         (aarch64_rev<REVERSE:rev_op><mode>): Likewise.
         * config/aarch64/aarch64.c (aarch64_evpc_trn): Support V4HFmode
         and V8HFmode.
         (aarch64_evpc_uzp): Likewise.
         (aarch64_evpc_zip): Likewise.
         (aarch64_evpc_ext): Likewise.
         (aarch64_evpc_rev): Likewise.
         * config/aarch64/arm_neon.h (__aarch64_vdup_lane_f16): New.
         (__aarch64_vdup_laneq_f16): New.
         (__aarch64_vdupq_lane_f16): New.
         (__aarch64_vdupq_laneq_f16): New.
         (vbsl_f16): New.
         (vbslq_f16): New.
         (vdup_n_f16): New.
         (vdupq_n_f16): New.
         (vdup_lane_f16): New.
         (vdup_laneq_f16): New.
         (vdupq_lane_f16): New.
         (vdupq_laneq_f16): New.
         (vduph_lane_f16): New.
         (vduph_laneq_f16): New.
         (vext_f16): New.
         (vextq_f16): New.
         (vmov_n_f16): New.
         (vmovq_n_f16): New.
         (vrev64_f16): New.
         (vrev64q_f16): New.
         (vtrn1_f16): New.
         (vtrn1q_f16): New.
         (vtrn2_f16): New.
         (vtrn2q_f16): New.
         (vtrn_f16): New.
         (vtrnq_f16): New.
         (__INTERLEAVE_LIST): Support float16x4_t, float16x8_t.
         (vuzp1_f16): New.
         (vuzp1q_f16): New.
         (vuzp2_f16): New.
         (vuzp2q_f16): New.
         (vzip1_f16): New.
         (vzip1q_f16): New.
         (vzip2_f16): New.
         (vzip2q_f16): New.
         (vld1_dup_f16): Reimplement using vdup_n_f16.
         (vld1q_dup_f16): Reimplement using vdupq_n_f16.

[-- Attachment #2: 0001-1-14-ARMv8.2-FP16-data-processing-intrinsics.patch --]
[-- Type: text/x-patch, Size: 25537 bytes --]

From b12677052e69b67310c1d63360db2793354414cb Mon Sep 17 00:00:00 2001
From: Jiong Wang <jiong.wang@arm.com>
Date: Tue, 7 Jun 2016 17:01:22 +0100
Subject: [PATCH 01/14] [1/14] ARMv8.2 FP16 data processing intrinsics

---
 gcc/config/aarch64/aarch64-simd.md |  22 +--
 gcc/config/aarch64/aarch64.c       |  16 +++
 gcc/config/aarch64/arm_neon.h      | 275 ++++++++++++++++++++++++++++++++++++-
 3 files changed, 298 insertions(+), 15 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index c8a5e3e..74dfe28 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -5161,10 +5161,10 @@
 )
 
 (define_insn "aarch64_<PERMUTE:perm_insn><PERMUTE:perm_hilo><mode>"
-  [(set (match_operand:VALL 0 "register_operand" "=w")
-	(unspec:VALL [(match_operand:VALL 1 "register_operand" "w")
-		      (match_operand:VALL 2 "register_operand" "w")]
-		       PERMUTE))]
+  [(set (match_operand:VALL_F16 0 "register_operand" "=w")
+	(unspec:VALL_F16 [(match_operand:VALL_F16 1 "register_operand" "w")
+			  (match_operand:VALL_F16 2 "register_operand" "w")]
+	 PERMUTE))]
   "TARGET_SIMD"
   "<PERMUTE:perm_insn><PERMUTE:perm_hilo>\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
   [(set_attr "type" "neon_permute<q>")]
@@ -5172,11 +5172,11 @@
 
 ;; Note immediate (third) operand is lane index not byte index.
 (define_insn "aarch64_ext<mode>"
-  [(set (match_operand:VALL 0 "register_operand" "=w")
-        (unspec:VALL [(match_operand:VALL 1 "register_operand" "w")
-                      (match_operand:VALL 2 "register_operand" "w")
-                      (match_operand:SI 3 "immediate_operand" "i")]
-                     UNSPEC_EXT))]
+  [(set (match_operand:VALL_F16 0 "register_operand" "=w")
+        (unspec:VALL_F16 [(match_operand:VALL_F16 1 "register_operand" "w")
+			  (match_operand:VALL_F16 2 "register_operand" "w")
+			  (match_operand:SI 3 "immediate_operand" "i")]
+	 UNSPEC_EXT))]
   "TARGET_SIMD"
 {
   operands[3] = GEN_INT (INTVAL (operands[3])
@@ -5187,8 +5187,8 @@
 )
 
 (define_insn "aarch64_rev<REVERSE:rev_op><mode>"
-  [(set (match_operand:VALL 0 "register_operand" "=w")
-	(unspec:VALL [(match_operand:VALL 1 "register_operand" "w")]
+  [(set (match_operand:VALL_F16 0 "register_operand" "=w")
+	(unspec:VALL_F16 [(match_operand:VALL_F16 1 "register_operand" "w")]
                     REVERSE))]
   "TARGET_SIMD"
   "rev<REVERSE:rev_op>\\t%0.<Vtype>, %1.<Vtype>"
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b60e5c5..358d35c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -12053,6 +12053,8 @@ aarch64_evpc_trn (struct expand_vec_perm_d *d)
 	case V4SImode: gen = gen_aarch64_trn2v4si; break;
 	case V2SImode: gen = gen_aarch64_trn2v2si; break;
 	case V2DImode: gen = gen_aarch64_trn2v2di; break;
+	case V4HFmode: gen = gen_aarch64_trn2v4hf; break;
+	case V8HFmode: gen = gen_aarch64_trn2v8hf; break;
 	case V4SFmode: gen = gen_aarch64_trn2v4sf; break;
 	case V2SFmode: gen = gen_aarch64_trn2v2sf; break;
 	case V2DFmode: gen = gen_aarch64_trn2v2df; break;
@@ -12071,6 +12073,8 @@ aarch64_evpc_trn (struct expand_vec_perm_d *d)
 	case V4SImode: gen = gen_aarch64_trn1v4si; break;
 	case V2SImode: gen = gen_aarch64_trn1v2si; break;
 	case V2DImode: gen = gen_aarch64_trn1v2di; break;
+	case V4HFmode: gen = gen_aarch64_trn1v4hf; break;
+	case V8HFmode: gen = gen_aarch64_trn1v8hf; break;
 	case V4SFmode: gen = gen_aarch64_trn1v4sf; break;
 	case V2SFmode: gen = gen_aarch64_trn1v2sf; break;
 	case V2DFmode: gen = gen_aarch64_trn1v2df; break;
@@ -12136,6 +12140,8 @@ aarch64_evpc_uzp (struct expand_vec_perm_d *d)
 	case V4SImode: gen = gen_aarch64_uzp2v4si; break;
 	case V2SImode: gen = gen_aarch64_uzp2v2si; break;
 	case V2DImode: gen = gen_aarch64_uzp2v2di; break;
+	case V4HFmode: gen = gen_aarch64_uzp2v4hf; break;
+	case V8HFmode: gen = gen_aarch64_uzp2v8hf; break;
 	case V4SFmode: gen = gen_aarch64_uzp2v4sf; break;
 	case V2SFmode: gen = gen_aarch64_uzp2v2sf; break;
 	case V2DFmode: gen = gen_aarch64_uzp2v2df; break;
@@ -12154,6 +12160,8 @@ aarch64_evpc_uzp (struct expand_vec_perm_d *d)
 	case V4SImode: gen = gen_aarch64_uzp1v4si; break;
 	case V2SImode: gen = gen_aarch64_uzp1v2si; break;
 	case V2DImode: gen = gen_aarch64_uzp1v2di; break;
+	case V4HFmode: gen = gen_aarch64_uzp1v4hf; break;
+	case V8HFmode: gen = gen_aarch64_uzp1v8hf; break;
 	case V4SFmode: gen = gen_aarch64_uzp1v4sf; break;
 	case V2SFmode: gen = gen_aarch64_uzp1v2sf; break;
 	case V2DFmode: gen = gen_aarch64_uzp1v2df; break;
@@ -12224,6 +12232,8 @@ aarch64_evpc_zip (struct expand_vec_perm_d *d)
 	case V4SImode: gen = gen_aarch64_zip2v4si; break;
 	case V2SImode: gen = gen_aarch64_zip2v2si; break;
 	case V2DImode: gen = gen_aarch64_zip2v2di; break;
+	case V4HFmode: gen = gen_aarch64_zip2v4hf; break;
+	case V8HFmode: gen = gen_aarch64_zip2v8hf; break;
 	case V4SFmode: gen = gen_aarch64_zip2v4sf; break;
 	case V2SFmode: gen = gen_aarch64_zip2v2sf; break;
 	case V2DFmode: gen = gen_aarch64_zip2v2df; break;
@@ -12242,6 +12252,8 @@ aarch64_evpc_zip (struct expand_vec_perm_d *d)
 	case V4SImode: gen = gen_aarch64_zip1v4si; break;
 	case V2SImode: gen = gen_aarch64_zip1v2si; break;
 	case V2DImode: gen = gen_aarch64_zip1v2di; break;
+	case V4HFmode: gen = gen_aarch64_zip1v4hf; break;
+	case V8HFmode: gen = gen_aarch64_zip1v8hf; break;
 	case V4SFmode: gen = gen_aarch64_zip1v4sf; break;
 	case V2SFmode: gen = gen_aarch64_zip1v2sf; break;
 	case V2DFmode: gen = gen_aarch64_zip1v2df; break;
@@ -12286,6 +12298,8 @@ aarch64_evpc_ext (struct expand_vec_perm_d *d)
     case V8HImode: gen = gen_aarch64_extv8hi; break;
     case V2SImode: gen = gen_aarch64_extv2si; break;
     case V4SImode: gen = gen_aarch64_extv4si; break;
+    case V4HFmode: gen = gen_aarch64_extv4hf; break;
+    case V8HFmode: gen = gen_aarch64_extv8hf; break;
     case V2SFmode: gen = gen_aarch64_extv2sf; break;
     case V4SFmode: gen = gen_aarch64_extv4sf; break;
     case V2DImode: gen = gen_aarch64_extv2di; break;
@@ -12361,6 +12375,8 @@ aarch64_evpc_rev (struct expand_vec_perm_d *d)
 	case V2SImode: gen = gen_aarch64_rev64v2si;  break;
 	case V4SFmode: gen = gen_aarch64_rev64v4sf;  break;
 	case V2SFmode: gen = gen_aarch64_rev64v2sf;  break;
+	case V8HFmode: gen = gen_aarch64_rev64v8hf;  break;
+	case V4HFmode: gen = gen_aarch64_rev64v4hf;  break;
 	default:
 	  return false;
 	}
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 4e36c6a..b7b1eb8 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -466,6 +466,8 @@ typedef struct poly16x8x4_t
 #define __aarch64_vdup_lane_any(__size, __q, __a, __b) \
   vdup##__q##_n_##__size (__aarch64_vget_lane_any (__a, __b))
 
+#define __aarch64_vdup_lane_f16(__a, __b) \
+   __aarch64_vdup_lane_any (f16, , __a, __b)
 #define __aarch64_vdup_lane_f32(__a, __b) \
    __aarch64_vdup_lane_any (f32, , __a, __b)
 #define __aarch64_vdup_lane_f64(__a, __b) \
@@ -492,6 +494,8 @@ typedef struct poly16x8x4_t
    __aarch64_vdup_lane_any (u64, , __a, __b)
 
 /* __aarch64_vdup_laneq internal macros.  */
+#define __aarch64_vdup_laneq_f16(__a, __b) \
+   __aarch64_vdup_lane_any (f16, , __a, __b)
 #define __aarch64_vdup_laneq_f32(__a, __b) \
    __aarch64_vdup_lane_any (f32, , __a, __b)
 #define __aarch64_vdup_laneq_f64(__a, __b) \
@@ -518,6 +522,8 @@ typedef struct poly16x8x4_t
    __aarch64_vdup_lane_any (u64, , __a, __b)
 
 /* __aarch64_vdupq_lane internal macros.  */
+#define __aarch64_vdupq_lane_f16(__a, __b) \
+   __aarch64_vdup_lane_any (f16, q, __a, __b)
 #define __aarch64_vdupq_lane_f32(__a, __b) \
    __aarch64_vdup_lane_any (f32, q, __a, __b)
 #define __aarch64_vdupq_lane_f64(__a, __b) \
@@ -544,6 +550,8 @@ typedef struct poly16x8x4_t
    __aarch64_vdup_lane_any (u64, q, __a, __b)
 
 /* __aarch64_vdupq_laneq internal macros.  */
+#define __aarch64_vdupq_laneq_f16(__a, __b) \
+   __aarch64_vdup_lane_any (f16, q, __a, __b)
 #define __aarch64_vdupq_laneq_f32(__a, __b) \
    __aarch64_vdup_lane_any (f32, q, __a, __b)
 #define __aarch64_vdupq_laneq_f64(__a, __b) \
@@ -10369,6 +10377,12 @@ vaddvq_f64 (float64x2_t __a)
 
 /* vbsl  */
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vbsl_f16 (uint16x4_t __a, float16x4_t __b, float16x4_t __c)
+{
+  return __builtin_aarch64_simd_bslv4hf_suss (__a, __b, __c);
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vbsl_f32 (uint32x2_t __a, float32x2_t __b, float32x2_t __c)
 {
@@ -10444,6 +10458,12 @@ vbsl_u64 (uint64x1_t __a, uint64x1_t __b, uint64x1_t __c)
       {__builtin_aarch64_simd_bsldi_uuuu (__a[0], __b[0], __c[0])};
 }
 
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vbslq_f16 (uint16x8_t __a, float16x8_t __b, float16x8_t __c)
+{
+  return __builtin_aarch64_simd_bslv8hf_suss (__a, __b, __c);
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vbslq_f32 (uint32x4_t __a, float32x4_t __b, float32x4_t __c)
 {
@@ -12967,6 +12987,12 @@ vcvtpq_u64_f64 (float64x2_t __a)
 
 /* vdup_n  */
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vdup_n_f16 (float16_t __a)
+{
+  return (float16x4_t) {__a, __a, __a, __a};
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vdup_n_f32 (float32_t __a)
 {
@@ -13041,6 +13067,12 @@ vdup_n_u64 (uint64_t __a)
 
 /* vdupq_n  */
 
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vdupq_n_f16 (float16_t __a)
+{
+  return (float16x8_t) {__a, __a, __a, __a, __a, __a, __a, __a};
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vdupq_n_f32 (float32_t __a)
 {
@@ -13118,6 +13150,12 @@ vdupq_n_u64 (uint64_t __a)
 
 /* vdup_lane  */
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vdup_lane_f16 (float16x4_t __a, const int __b)
+{
+  return __aarch64_vdup_lane_f16 (__a, __b);
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vdup_lane_f32 (float32x2_t __a, const int __b)
 {
@@ -13192,6 +13230,12 @@ vdup_lane_u64 (uint64x1_t __a, const int __b)
 
 /* vdup_laneq  */
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vdup_laneq_f16 (float16x8_t __a, const int __b)
+{
+  return __aarch64_vdup_laneq_f16 (__a, __b);
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vdup_laneq_f32 (float32x4_t __a, const int __b)
 {
@@ -13265,6 +13309,13 @@ vdup_laneq_u64 (uint64x2_t __a, const int __b)
 }
 
 /* vdupq_lane  */
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vdupq_lane_f16 (float16x4_t __a, const int __b)
+{
+  return __aarch64_vdupq_lane_f16 (__a, __b);
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vdupq_lane_f32 (float32x2_t __a, const int __b)
 {
@@ -13338,6 +13389,13 @@ vdupq_lane_u64 (uint64x1_t __a, const int __b)
 }
 
 /* vdupq_laneq  */
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vdupq_laneq_f16 (float16x8_t __a, const int __b)
+{
+  return __aarch64_vdupq_laneq_f16 (__a, __b);
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vdupq_laneq_f32 (float32x4_t __a, const int __b)
 {
@@ -13430,6 +13488,13 @@ vdupb_lane_u8 (uint8x8_t __a, const int __b)
 }
 
 /* vduph_lane  */
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vduph_lane_f16 (float16x4_t __a, const int __b)
+{
+  return __aarch64_vget_lane_any (__a, __b);
+}
+
 __extension__ static __inline poly16_t __attribute__ ((__always_inline__))
 vduph_lane_p16 (poly16x4_t __a, const int __b)
 {
@@ -13449,6 +13514,7 @@ vduph_lane_u16 (uint16x4_t __a, const int __b)
 }
 
 /* vdups_lane  */
+
 __extension__ static __inline float32_t __attribute__ ((__always_inline__))
 vdups_lane_f32 (float32x2_t __a, const int __b)
 {
@@ -13509,6 +13575,13 @@ vdupb_laneq_u8 (uint8x16_t __a, const int __b)
 }
 
 /* vduph_laneq  */
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vduph_laneq_f16 (float16x8_t __a, const int __b)
+{
+  return __aarch64_vget_lane_any (__a, __b);
+}
+
 __extension__ static __inline poly16_t __attribute__ ((__always_inline__))
 vduph_laneq_p16 (poly16x8_t __a, const int __b)
 {
@@ -13528,6 +13601,7 @@ vduph_laneq_u16 (uint16x8_t __a, const int __b)
 }
 
 /* vdups_laneq  */
+
 __extension__ static __inline float32_t __attribute__ ((__always_inline__))
 vdups_laneq_f32 (float32x4_t __a, const int __b)
 {
@@ -13567,6 +13641,19 @@ vdupd_laneq_u64 (uint64x2_t __a, const int __b)
 
 /* vext  */
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vext_f16 (float16x4_t __a, float16x4_t __b, __const int __c)
+{
+  __AARCH64_LANE_CHECK (__a, __c);
+#ifdef __AARCH64EB__
+  return __builtin_shuffle (__b, __a,
+			    (uint16x4_t) {4 - __c, 5 - __c, 6 - __c, 7 - __c});
+#else
+  return __builtin_shuffle (__a, __b,
+			    (uint16x4_t) {__c, __c + 1, __c + 2, __c + 3});
+#endif
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vext_f32 (float32x2_t __a, float32x2_t __b, __const int __c)
 {
@@ -13698,6 +13785,22 @@ vext_u64 (uint64x1_t __a, uint64x1_t __b, __const int __c)
   return __a;
 }
 
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vextq_f16 (float16x8_t __a, float16x8_t __b, __const int __c)
+{
+  __AARCH64_LANE_CHECK (__a, __c);
+#ifdef __AARCH64EB__
+  return __builtin_shuffle (__b, __a,
+			    (uint16x8_t) {8 - __c, 9 - __c, 10 - __c, 11 - __c,
+					  12 - __c, 13 - __c, 14 - __c,
+					  15 - __c});
+#else
+  return __builtin_shuffle (__a, __b,
+			    (uint16x8_t) {__c, __c + 1, __c + 2, __c + 3,
+					  __c + 4, __c + 5, __c + 6, __c + 7});
+#endif
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vextq_f32 (float32x4_t __a, float32x4_t __b, __const int __c)
 {
@@ -14333,8 +14436,7 @@ vld1q_u64 (const uint64_t *a)
 __extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
 vld1_dup_f16 (const float16_t* __a)
 {
-  float16_t __f = *__a;
-  return (float16x4_t) { __f, __f, __f, __f };
+  return vdup_n_f16 (*__a);
 }
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
@@ -14414,8 +14516,7 @@ vld1_dup_u64 (const uint64_t* __a)
 __extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
 vld1q_dup_f16 (const float16_t* __a)
 {
-  float16_t __f = *__a;
-  return (float16x8_t) { __f, __f, __f, __f, __f, __f, __f, __f };
+  return vdupq_n_f16 (*__a);
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
@@ -18018,6 +18119,12 @@ vmlsq_laneq_u32 (uint32x4_t __a, uint32x4_t __b,
 
 /* vmov_n_  */
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmov_n_f16 (float16_t __a)
+{
+  return vdup_n_f16 (__a);
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vmov_n_f32 (float32_t __a)
 {
@@ -18090,6 +18197,12 @@ vmov_n_u64 (uint64_t __a)
   return (uint64x1_t) {__a};
 }
 
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmovq_n_f16 (float16_t __a)
+{
+  return vdupq_n_f16 (__a);
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vmovq_n_f32 (float32_t __a)
 {
@@ -20834,6 +20947,12 @@ vrev32q_u16 (uint16x8_t a)
   return __builtin_shuffle (a, (uint16x8_t) { 1, 0, 3, 2, 5, 4, 7, 6 });
 }
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrev64_f16 (float16x4_t __a)
+{
+  return __builtin_shuffle (__a, (uint16x4_t) { 3, 2, 1, 0 });
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vrev64_f32 (float32x2_t a)
 {
@@ -20888,6 +21007,12 @@ vrev64_u32 (uint32x2_t a)
   return __builtin_shuffle (a, (uint32x2_t) { 1, 0 });
 }
 
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrev64q_f16 (float16x8_t __a)
+{
+  return __builtin_shuffle (__a, (uint16x8_t) { 3, 2, 1, 0, 7, 6, 5, 4 });
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vrev64q_f32 (float32x4_t a)
 {
@@ -23840,6 +23965,16 @@ vtbx4_p8 (poly8x8_t __r, poly8x8x4_t __tab, uint8x8_t __idx)
 
 /* vtrn */
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vtrn1_f16 (float16x4_t __a, float16x4_t __b)
+{
+#ifdef __AARCH64EB__
+  return __builtin_shuffle (__a, __b, (uint16x4_t) {5, 1, 7, 3});
+#else
+  return __builtin_shuffle (__a, __b, (uint16x4_t) {0, 4, 2, 6});
+#endif
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vtrn1_f32 (float32x2_t __a, float32x2_t __b)
 {
@@ -23930,6 +24065,16 @@ vtrn1_u32 (uint32x2_t __a, uint32x2_t __b)
 #endif
 }
 
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vtrn1q_f16 (float16x8_t __a, float16x8_t __b)
+{
+#ifdef __AARCH64EB__
+  return __builtin_shuffle (__a, __b, (uint16x8_t) {9, 1, 11, 3, 13, 5, 15, 7});
+#else
+  return __builtin_shuffle (__a, __b, (uint16x8_t) {0, 8, 2, 10, 4, 12, 6, 14});
+#endif
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vtrn1q_f32 (float32x4_t __a, float32x4_t __b)
 {
@@ -24056,6 +24201,16 @@ vtrn1q_u64 (uint64x2_t __a, uint64x2_t __b)
 #endif
 }
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vtrn2_f16 (float16x4_t __a, float16x4_t __b)
+{
+#ifdef __AARCH64EB__
+  return __builtin_shuffle (__a, __b, (uint16x4_t) {4, 0, 6, 2});
+#else
+  return __builtin_shuffle (__a, __b, (uint16x4_t) {1, 5, 3, 7});
+#endif
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vtrn2_f32 (float32x2_t __a, float32x2_t __b)
 {
@@ -24146,6 +24301,16 @@ vtrn2_u32 (uint32x2_t __a, uint32x2_t __b)
 #endif
 }
 
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vtrn2q_f16 (float16x8_t __a, float16x8_t __b)
+{
+#ifdef __AARCH64EB__
+  return __builtin_shuffle (__a, __b, (uint16x8_t) {8, 0, 10, 2, 12, 4, 14, 6});
+#else
+  return __builtin_shuffle (__a, __b, (uint16x8_t) {1, 9, 3, 11, 5, 13, 7, 15});
+#endif
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vtrn2q_f32 (float32x4_t __a, float32x4_t __b)
 {
@@ -24272,6 +24437,12 @@ vtrn2q_u64 (uint64x2_t __a, uint64x2_t __b)
 #endif
 }
 
+__extension__ static __inline float16x4x2_t __attribute__ ((__always_inline__))
+vtrn_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return (float16x4x2_t) {vtrn1_f16 (__a, __b), vtrn2_f16 (__a, __b)};
+}
+
 __extension__ static __inline float32x2x2_t __attribute__ ((__always_inline__))
 vtrn_f32 (float32x2_t a, float32x2_t b)
 {
@@ -24326,6 +24497,12 @@ vtrn_u32 (uint32x2_t a, uint32x2_t b)
   return (uint32x2x2_t) {vtrn1_u32 (a, b), vtrn2_u32 (a, b)};
 }
 
+__extension__ static __inline float16x8x2_t __attribute__ ((__always_inline__))
+vtrnq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return (float16x8x2_t) {vtrn1q_f16 (__a, __b), vtrn2q_f16 (__a, __b)};
+}
+
 __extension__ static __inline float32x4x2_t __attribute__ ((__always_inline__))
 vtrnq_f32 (float32x4_t a, float32x4_t b)
 {
@@ -24574,6 +24751,7 @@ vuqaddd_s64 (int64_t __a, uint64_t __b)
   }
 
 #define __INTERLEAVE_LIST(op)					\
+  __DEFINTERLEAVE (op, float16x4x2_t, float16x4_t, f16,)	\
   __DEFINTERLEAVE (op, float32x2x2_t, float32x2_t, f32,)	\
   __DEFINTERLEAVE (op, poly8x8x2_t, poly8x8_t, p8,)		\
   __DEFINTERLEAVE (op, poly16x4x2_t, poly16x4_t, p16,)		\
@@ -24583,6 +24761,7 @@ vuqaddd_s64 (int64_t __a, uint64_t __b)
   __DEFINTERLEAVE (op, uint8x8x2_t, uint8x8_t, u8,)		\
   __DEFINTERLEAVE (op, uint16x4x2_t, uint16x4_t, u16,)		\
   __DEFINTERLEAVE (op, uint32x2x2_t, uint32x2_t, u32,)		\
+  __DEFINTERLEAVE (op, float16x8x2_t, float16x8_t, f16, q)	\
   __DEFINTERLEAVE (op, float32x4x2_t, float32x4_t, f32, q)	\
   __DEFINTERLEAVE (op, poly8x16x2_t, poly8x16_t, p8, q)		\
   __DEFINTERLEAVE (op, poly16x8x2_t, poly16x8_t, p16, q)	\
@@ -24595,6 +24774,16 @@ vuqaddd_s64 (int64_t __a, uint64_t __b)
 
 /* vuzp */
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vuzp1_f16 (float16x4_t __a, float16x4_t __b)
+{
+#ifdef __AARCH64EB__
+  return __builtin_shuffle (__a, __b, (uint16x4_t) {5, 7, 1, 3});
+#else
+  return __builtin_shuffle (__a, __b, (uint16x4_t) {0, 2, 4, 6});
+#endif
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vuzp1_f32 (float32x2_t __a, float32x2_t __b)
 {
@@ -24685,6 +24874,16 @@ vuzp1_u32 (uint32x2_t __a, uint32x2_t __b)
 #endif
 }
 
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vuzp1q_f16 (float16x8_t __a, float16x8_t __b)
+{
+#ifdef __AARCH64EB__
+  return __builtin_shuffle (__a, __b, (uint16x8_t) {9, 11, 13, 15, 1, 3, 5, 7});
+#else
+  return __builtin_shuffle (__a, __b, (uint16x8_t) {0, 2, 4, 6, 8, 10, 12, 14});
+#endif
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vuzp1q_f32 (float32x4_t __a, float32x4_t __b)
 {
@@ -24811,6 +25010,16 @@ vuzp1q_u64 (uint64x2_t __a, uint64x2_t __b)
 #endif
 }
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vuzp2_f16 (float16x4_t __a, float16x4_t __b)
+{
+#ifdef __AARCH64EB__
+  return __builtin_shuffle (__a, __b, (uint16x4_t) {4, 6, 0, 2});
+#else
+  return __builtin_shuffle (__a, __b, (uint16x4_t) {1, 3, 5, 7});
+#endif
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vuzp2_f32 (float32x2_t __a, float32x2_t __b)
 {
@@ -24901,6 +25110,16 @@ vuzp2_u32 (uint32x2_t __a, uint32x2_t __b)
 #endif
 }
 
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vuzp2q_f16 (float16x8_t __a, float16x8_t __b)
+{
+#ifdef __AARCH64EB__
+  return __builtin_shuffle (__a, __b, (uint16x8_t) {8, 10, 12, 14, 0, 2, 4, 6});
+#else
+  return __builtin_shuffle (__a, __b, (uint16x8_t) {1, 3, 5, 7, 9, 11, 13, 15});
+#endif
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vuzp2q_f32 (float32x4_t __a, float32x4_t __b)
 {
@@ -25031,6 +25250,16 @@ __INTERLEAVE_LIST (uzp)
 
 /* vzip */
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vzip1_f16 (float16x4_t __a, float16x4_t __b)
+{
+#ifdef __AARCH64EB__
+  return __builtin_shuffle (__a, __b, (uint16x4_t) {6, 2, 7, 3});
+#else
+  return __builtin_shuffle (__a, __b, (uint16x4_t) {0, 4, 1, 5});
+#endif
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vzip1_f32 (float32x2_t __a, float32x2_t __b)
 {
@@ -25121,6 +25350,18 @@ vzip1_u32 (uint32x2_t __a, uint32x2_t __b)
 #endif
 }
 
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vzip1q_f16 (float16x8_t __a, float16x8_t __b)
+{
+#ifdef __AARCH64EB__
+  return __builtin_shuffle (__a, __b,
+			    (uint16x8_t) {12, 4, 13, 5, 14, 6, 15, 7});
+#else
+  return __builtin_shuffle (__a, __b,
+			    (uint16x8_t) {0, 8, 1, 9, 2, 10, 3, 11});
+#endif
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vzip1q_f32 (float32x4_t __a, float32x4_t __b)
 {
@@ -25250,6 +25491,16 @@ vzip1q_u64 (uint64x2_t __a, uint64x2_t __b)
 #endif
 }
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vzip2_f16 (float16x4_t __a, float16x4_t __b)
+{
+#ifdef __AARCH64EB__
+  return __builtin_shuffle (__a, __b, (uint16x4_t) {4, 0, 5, 1});
+#else
+  return __builtin_shuffle (__a, __b, (uint16x4_t) {2, 6, 3, 7});
+#endif
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vzip2_f32 (float32x2_t __a, float32x2_t __b)
 {
@@ -25340,6 +25591,18 @@ vzip2_u32 (uint32x2_t __a, uint32x2_t __b)
 #endif
 }
 
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vzip2q_f16 (float16x8_t __a, float16x8_t __b)
+{
+#ifdef __AARCH64EB__
+  return __builtin_shuffle (__a, __b,
+			    (uint16x8_t) {8, 0, 9, 1, 10, 2, 11, 3});
+#else
+  return __builtin_shuffle (__a, __b,
+			    (uint16x8_t) {4, 12, 5, 13, 6, 14, 7, 15});
+#endif
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vzip2q_f32 (float32x4_t __a, float32x4_t __b)
 {
@@ -25479,6 +25742,7 @@ __INTERLEAVE_LIST (zip)
 #undef __aarch64_vget_lane_any
 
 #undef __aarch64_vdup_lane_any
+#undef __aarch64_vdup_lane_f16
 #undef __aarch64_vdup_lane_f32
 #undef __aarch64_vdup_lane_f64
 #undef __aarch64_vdup_lane_p8
@@ -25491,6 +25755,7 @@ __INTERLEAVE_LIST (zip)
 #undef __aarch64_vdup_lane_u16
 #undef __aarch64_vdup_lane_u32
 #undef __aarch64_vdup_lane_u64
+#undef __aarch64_vdup_laneq_f16
 #undef __aarch64_vdup_laneq_f32
 #undef __aarch64_vdup_laneq_f64
 #undef __aarch64_vdup_laneq_p8
@@ -25503,6 +25768,7 @@ __INTERLEAVE_LIST (zip)
 #undef __aarch64_vdup_laneq_u16
 #undef __aarch64_vdup_laneq_u32
 #undef __aarch64_vdup_laneq_u64
+#undef __aarch64_vdupq_lane_f16
 #undef __aarch64_vdupq_lane_f32
 #undef __aarch64_vdupq_lane_f64
 #undef __aarch64_vdupq_lane_p8
@@ -25515,6 +25781,7 @@ __INTERLEAVE_LIST (zip)
 #undef __aarch64_vdupq_lane_u16
 #undef __aarch64_vdupq_lane_u32
 #undef __aarch64_vdupq_lane_u64
+#undef __aarch64_vdupq_laneq_f16
 #undef __aarch64_vdupq_laneq_f32
 #undef __aarch64_vdupq_laneq_f64
 #undef __aarch64_vdupq_laneq_p8
-- 
2.5.0

* [AArch64][2/14] ARMv8.2-A FP16 one operand vector intrinsics
       [not found] ` <99eb95e3-5e9c-c6c9-b85f-e67d15f4859a@foss.arm.com>
@ 2016-07-07 16:14   ` Jiong Wang
  2016-07-20 17:00     ` Jiong Wang
       [not found]   ` <21c3c64f-95ad-c127-3f8a-4afd236aae33@foss.arm.com>
  1 sibling, 1 reply; 32+ messages in thread
From: Jiong Wang @ 2016-07-07 16:14 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 3561 bytes --]

This patch adds the ARMv8.2-A FP16 one-operand vector intrinsics.

We introduce new mode iterators to cover the HF modes; patterns that
were using the old mode iterators are switched to the new ones.

We can't simply extend an existing iterator such as VDQF to cover the
HF modes, because not all patterns using VDQF gain FP16 support.  We
therefore introduce new, temporary iterators and apply them only to
those patterns that do have FP16 support.
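
As a usage sketch (illustration only, not part of the patch; needs a
toolchain with this series applied and -march=armv8.2-a+fp16):

#include <arm_neon.h>

/* Round to nearest with ties to even, then convert each lane to
   signed 16-bit; should compile to a single FCVTNS v0.4h, v1.4h.  */
int16x4_t
to_s16_nearest (float16x4_t a)
{
  return vcvtn_s16_f16 (a);
}

/* Lane-wise "is negative" test: true lanes yield 0xffff; should
   compile to FCMLT v0.4h, v1.4h, #0.0.  */
uint16x4_t
negative_mask (float16x4_t a)
{
  return vcltz_f16 (a);
}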

gcc/
2016-07-07  Jiong Wang  <jiong.wang@arm.com>

         * config/aarch64/aarch64-builtins.c (TYPES_BINOP_USS): New.
         * config/aarch64/aarch64-simd-builtins.def: Register new builtins.
         * config/aarch64/aarch64-simd.md (aarch64_rsqrte<mode>): Extend
         to HF modes.
         (neg<mode>2): Likewise.
         (abs<mode>2): Likewise.
         (<frint_pattern><mode>2): Likewise.
         (l<fcvt_pattern><su_optab><VDQF:mode><fcvt_target>2): Likewise.
         (<optab><VDQF:mode><fcvt_target>2): Likewise.
         (<fix_trunc_optab><VDQF:mode><fcvt_target>2): Likewise.
         (ftrunc<VDQF:mode>2): Likewise.
         (<optab><fcvt_target><VDQF:mode>2): Likewise.
         (sqrt<mode>2): Likewise.
         (aarch64_frecpe<mode>): Likewise.
         (aarch64_cm<optab><mode>): Likewise.
         * config/aarch64/iterators.md (VHSDF, VHSDF_DF, VHSDF_SDF): New.
         (VDQF_COND, fcvt_target, FCVT_TARGET, h_con): Extend mode
         attribute to HF modes.
         (stype): New.
         * config/aarch64/arm_neon.h (vabs_f16): New.
         (vabsq_f16): Likewise.
         (vceqz_f16): Likewise.
         (vceqzq_f16): Likewise.
         (vcgez_f16): Likewise.
         (vcgezq_f16): Likewise.
         (vcgtz_f16): Likewise.
         (vcgtzq_f16): Likewise.
         (vclez_f16): Likewise.
         (vclezq_f16): Likewise.
         (vcltz_f16): Likewise.
         (vcltzq_f16): Likewise.
         (vcvt_f16_s16): Likewise.
         (vcvtq_f16_s16): Likewise.
         (vcvt_f16_u16): Likewise.
         (vcvtq_f16_u16): Likewise.
         (vcvt_s16_f16): Likewise.
         (vcvtq_s16_f16): Likewise.
         (vcvt_u16_f16): Likewise.
         (vcvtq_u16_f16): Likewise.
         (vcvta_s16_f16): Likewise.
         (vcvtaq_s16_f16): Likewise.
         (vcvta_u16_f16): Likewise.
         (vcvtaq_u16_f16): Likewise.
         (vcvtm_s16_f16): Likewise.
         (vcvtmq_s16_f16): Likewise.
         (vcvtm_u16_f16): Likewise.
         (vcvtmq_u16_f16): Likewise.
         (vcvtn_s16_f16): Likewise.
         (vcvtnq_s16_f16): Likewise.
         (vcvtn_u16_f16): Likewise.
         (vcvtnq_u16_f16): Likewise.
         (vcvtp_s16_f16): Likewise.
         (vcvtpq_s16_f16): Likewise.
         (vcvtp_u16_f16): Likewise.
         (vcvtpq_u16_f16): Likewise.
         (vneg_f16): Likewise.
         (vnegq_f16): Likewise.
         (vrecpe_f16): Likewise.
         (vrecpeq_f16): Likewise.
         (vrnd_f16): Likewise.
         (vrndq_f16): Likewise.
         (vrnda_f16): Likewise.
         (vrndaq_f16): Likewise.
         (vrndi_f16): Likewise.
         (vrndiq_f16): Likewise.
         (vrndm_f16): Likewise.
         (vrndmq_f16): Likewise.
         (vrndn_f16): Likewise.
         (vrndnq_f16): Likewise.
         (vrndp_f16): Likewise.
         (vrndpq_f16): Likewise.
         (vrndx_f16): Likewise.
         (vrndxq_f16): Likewise.
         (vrsqrte_f16): Likewise.
         (vrsqrteq_f16): Likewise.
         (vsqrt_f16): Likewise.
         (vsqrtq_f16): Likewise.

[-- Attachment #2: 0002-2-14-ARMv8.2-FP16-one-operand-vector-intrinsics.patch --]
[-- Type: text/x-patch, Size: 26827 bytes --]

From 3ab3e91e81aa1aa01894a07083e226779145ec88 Mon Sep 17 00:00:00 2001
From: Jiong Wang <jiong.wang@arm.com>
Date: Wed, 8 Jun 2016 09:30:16 +0100
Subject: [PATCH 02/14] [2/14] ARMv8.2 FP16 one operand vector intrinsics

---
 gcc/config/aarch64/aarch64-builtins.c        |   4 +
 gcc/config/aarch64/aarch64-simd-builtins.def |  56 ++++-
 gcc/config/aarch64/aarch64-simd.md           |  78 +++---
 gcc/config/aarch64/arm_neon.h                | 361 ++++++++++++++++++++++++++-
 gcc/config/aarch64/iterators.md              |  37 ++-
 5 files changed, 478 insertions(+), 58 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 6b90b2a..af5fac5 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -139,6 +139,10 @@ aarch64_types_binop_ssu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_unsigned };
 #define TYPES_BINOP_SSU (aarch64_types_binop_ssu_qualifiers)
 static enum aarch64_type_qualifiers
+aarch64_types_binop_uss_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_none, qualifier_none };
+#define TYPES_BINOP_USS (aarch64_types_binop_uss_qualifiers)
+static enum aarch64_type_qualifiers
 aarch64_types_binopp_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_poly, qualifier_poly, qualifier_poly };
 #define TYPES_BINOPP (aarch64_types_binopp_qualifiers)
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index df0a7d8..3e48046 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -42,7 +42,7 @@
   BUILTIN_VDC (COMBINE, combine, 0)
   BUILTIN_VB (BINOP, pmul, 0)
   BUILTIN_VALLF (BINOP, fmulx, 0)
-  BUILTIN_VDQF_DF (UNOP, sqrt, 2)
+  BUILTIN_VHSDF_DF (UNOP, sqrt, 2)
   BUILTIN_VD_BHSI (BINOP, addp, 0)
   VAR1 (UNOP, addp, 0, di)
   BUILTIN_VDQ_BHSI (UNOP, clrsb, 2)
@@ -266,23 +266,29 @@
   BUILTIN_VDQF (BINOP, smin_nanp, 0)
 
   /* Implemented by <frint_pattern><mode>2.  */
-  BUILTIN_VDQF (UNOP, btrunc, 2)
-  BUILTIN_VDQF (UNOP, ceil, 2)
-  BUILTIN_VDQF (UNOP, floor, 2)
-  BUILTIN_VDQF (UNOP, nearbyint, 2)
-  BUILTIN_VDQF (UNOP, rint, 2)
-  BUILTIN_VDQF (UNOP, round, 2)
-  BUILTIN_VDQF_DF (UNOP, frintn, 2)
+  BUILTIN_VHSDF (UNOP, btrunc, 2)
+  BUILTIN_VHSDF (UNOP, ceil, 2)
+  BUILTIN_VHSDF (UNOP, floor, 2)
+  BUILTIN_VHSDF (UNOP, nearbyint, 2)
+  BUILTIN_VHSDF (UNOP, rint, 2)
+  BUILTIN_VHSDF (UNOP, round, 2)
+  BUILTIN_VHSDF_DF (UNOP, frintn, 2)
 
   /* Implemented by l<fcvt_pattern><su_optab><VQDF:mode><vcvt_target>2.  */
+  VAR1 (UNOP, lbtruncv4hf, 2, v4hi)
+  VAR1 (UNOP, lbtruncv8hf, 2, v8hi)
   VAR1 (UNOP, lbtruncv2sf, 2, v2si)
   VAR1 (UNOP, lbtruncv4sf, 2, v4si)
   VAR1 (UNOP, lbtruncv2df, 2, v2di)
 
+  VAR1 (UNOPUS, lbtruncuv4hf, 2, v4hi)
+  VAR1 (UNOPUS, lbtruncuv8hf, 2, v8hi)
   VAR1 (UNOPUS, lbtruncuv2sf, 2, v2si)
   VAR1 (UNOPUS, lbtruncuv4sf, 2, v4si)
   VAR1 (UNOPUS, lbtruncuv2df, 2, v2di)
 
+  VAR1 (UNOP, lroundv4hf, 2, v4hi)
+  VAR1 (UNOP, lroundv8hf, 2, v8hi)
   VAR1 (UNOP, lroundv2sf, 2, v2si)
   VAR1 (UNOP, lroundv4sf, 2, v4si)
   VAR1 (UNOP, lroundv2df, 2, v2di)
@@ -290,38 +296,52 @@
   VAR1 (UNOP, lroundsf, 2, si)
   VAR1 (UNOP, lrounddf, 2, di)
 
+  VAR1 (UNOPUS, lrounduv4hf, 2, v4hi)
+  VAR1 (UNOPUS, lrounduv8hf, 2, v8hi)
   VAR1 (UNOPUS, lrounduv2sf, 2, v2si)
   VAR1 (UNOPUS, lrounduv4sf, 2, v4si)
   VAR1 (UNOPUS, lrounduv2df, 2, v2di)
   VAR1 (UNOPUS, lroundusf, 2, si)
   VAR1 (UNOPUS, lroundudf, 2, di)
 
+  VAR1 (UNOP, lceilv4hf, 2, v4hi)
+  VAR1 (UNOP, lceilv8hf, 2, v8hi)
   VAR1 (UNOP, lceilv2sf, 2, v2si)
   VAR1 (UNOP, lceilv4sf, 2, v4si)
   VAR1 (UNOP, lceilv2df, 2, v2di)
 
+  VAR1 (UNOPUS, lceiluv4hf, 2, v4hi)
+  VAR1 (UNOPUS, lceiluv8hf, 2, v8hi)
   VAR1 (UNOPUS, lceiluv2sf, 2, v2si)
   VAR1 (UNOPUS, lceiluv4sf, 2, v4si)
   VAR1 (UNOPUS, lceiluv2df, 2, v2di)
   VAR1 (UNOPUS, lceilusf, 2, si)
   VAR1 (UNOPUS, lceiludf, 2, di)
 
+  VAR1 (UNOP, lfloorv4hf, 2, v4hi)
+  VAR1 (UNOP, lfloorv8hf, 2, v8hi)
   VAR1 (UNOP, lfloorv2sf, 2, v2si)
   VAR1 (UNOP, lfloorv4sf, 2, v4si)
   VAR1 (UNOP, lfloorv2df, 2, v2di)
 
+  VAR1 (UNOPUS, lflooruv4hf, 2, v4hi)
+  VAR1 (UNOPUS, lflooruv8hf, 2, v8hi)
   VAR1 (UNOPUS, lflooruv2sf, 2, v2si)
   VAR1 (UNOPUS, lflooruv4sf, 2, v4si)
   VAR1 (UNOPUS, lflooruv2df, 2, v2di)
   VAR1 (UNOPUS, lfloorusf, 2, si)
   VAR1 (UNOPUS, lfloorudf, 2, di)
 
+  VAR1 (UNOP, lfrintnv4hf, 2, v4hi)
+  VAR1 (UNOP, lfrintnv8hf, 2, v8hi)
   VAR1 (UNOP, lfrintnv2sf, 2, v2si)
   VAR1 (UNOP, lfrintnv4sf, 2, v4si)
   VAR1 (UNOP, lfrintnv2df, 2, v2di)
   VAR1 (UNOP, lfrintnsf, 2, si)
   VAR1 (UNOP, lfrintndf, 2, di)
 
+  VAR1 (UNOPUS, lfrintnuv4hf, 2, v4hi)
+  VAR1 (UNOPUS, lfrintnuv8hf, 2, v8hi)
   VAR1 (UNOPUS, lfrintnuv2sf, 2, v2si)
   VAR1 (UNOPUS, lfrintnuv4sf, 2, v4si)
   VAR1 (UNOPUS, lfrintnuv2df, 2, v2di)
@@ -329,10 +349,14 @@
   VAR1 (UNOPUS, lfrintnudf, 2, di)
 
   /* Implemented by <optab><fcvt_target><VDQF:mode>2.  */
+  VAR1 (UNOP, floatv4hi, 2, v4hf)
+  VAR1 (UNOP, floatv8hi, 2, v8hf)
   VAR1 (UNOP, floatv2si, 2, v2sf)
   VAR1 (UNOP, floatv4si, 2, v4sf)
   VAR1 (UNOP, floatv2di, 2, v2df)
 
+  VAR1 (UNOP, floatunsv4hi, 2, v4hf)
+  VAR1 (UNOP, floatunsv8hi, 2, v8hf)
   VAR1 (UNOP, floatunsv2si, 2, v2sf)
   VAR1 (UNOP, floatunsv4si, 2, v4sf)
   VAR1 (UNOP, floatunsv2di, 2, v2df)
@@ -358,13 +382,13 @@
 
   BUILTIN_VDQ_SI (UNOP, urecpe, 0)
 
-  BUILTIN_VDQF (UNOP, frecpe, 0)
+  BUILTIN_VHSDF (UNOP, frecpe, 0)
   BUILTIN_VDQF (BINOP, frecps, 0)
 
   /* Implemented by a mixture of abs2 patterns.  Note the DImode builtin is
      only ever used for the int64x1_t intrinsic, there is no scalar version.  */
   BUILTIN_VSDQ_I_DI (UNOP, abs, 0)
-  BUILTIN_VDQF (UNOP, abs, 2)
+  BUILTIN_VHSDF (UNOP, abs, 2)
 
   BUILTIN_VQ_HSF (UNOP, vec_unpacks_hi_, 10)
   VAR1 (BINOP, float_truncate_hi_, 0, v4sf)
@@ -457,7 +481,7 @@
   BUILTIN_VALLF (SHIFTIMM_USS, fcvtzu, 3)
 
   /* Implemented by aarch64_rsqrte<mode>.  */
-  BUILTIN_VALLF (UNOP, rsqrte, 0)
+  BUILTIN_VHSDF_SDF (UNOP, rsqrte, 0)
 
   /* Implemented by aarch64_rsqrts<mode>.  */
   BUILTIN_VALLF (BINOP, rsqrts, 0)
@@ -467,3 +491,13 @@
 
   /* Implemented by aarch64_faddp<mode>.  */
   BUILTIN_VDQF (BINOP, faddp, 0)
+
+  /* Implemented by aarch64_cm<optab><mode>.  */
+  BUILTIN_VHSDF_SDF (BINOP_USS, cmeq, 0)
+  BUILTIN_VHSDF_SDF (BINOP_USS, cmge, 0)
+  BUILTIN_VHSDF_SDF (BINOP_USS, cmgt, 0)
+  BUILTIN_VHSDF_SDF (BINOP_USS, cmle, 0)
+  BUILTIN_VHSDF_SDF (BINOP_USS, cmlt, 0)
+
+  /* Implemented by neg<mode>2.  */
+  BUILTIN_VHSDF (UNOP, neg, 2)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 74dfe28..da6dd52 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -383,12 +383,12 @@
 )
 
 (define_insn "aarch64_rsqrte<mode>"
-  [(set (match_operand:VALLF 0 "register_operand" "=w")
-	(unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w")]
+  [(set (match_operand:VHSDF_SDF 0 "register_operand" "=w")
+	(unspec:VHSDF_SDF [(match_operand:VHSDF_SDF 1 "register_operand" "w")]
 		     UNSPEC_RSQRTE))]
   "TARGET_SIMD"
   "frsqrte\\t%<v>0<Vmtype>, %<v>1<Vmtype>"
-  [(set_attr "type" "neon_fp_rsqrte_<Vetype><q>")])
+  [(set_attr "type" "neon_fp_rsqrte_<stype><q>")])
 
 (define_insn "aarch64_rsqrts<mode>"
   [(set (match_operand:VALLF 0 "register_operand" "=w")
@@ -1510,19 +1510,19 @@
 )
 
 (define_insn "neg<mode>2"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
-       (neg:VDQF (match_operand:VDQF 1 "register_operand" "w")))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+       (neg:VHSDF (match_operand:VHSDF 1 "register_operand" "w")))]
  "TARGET_SIMD"
  "fneg\\t%0.<Vtype>, %1.<Vtype>"
-  [(set_attr "type" "neon_fp_neg_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_neg_<stype><q>")]
 )
 
 (define_insn "abs<mode>2"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
-       (abs:VDQF (match_operand:VDQF 1 "register_operand" "w")))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+       (abs:VHSDF (match_operand:VHSDF 1 "register_operand" "w")))]
  "TARGET_SIMD"
  "fabs\\t%0.<Vtype>, %1.<Vtype>"
-  [(set_attr "type" "neon_fp_abs_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_abs_<stype><q>")]
 )
 
 (define_insn "fma<mode>4"
@@ -1680,24 +1680,24 @@
 ;; Vector versions of the floating-point frint patterns.
 ;; Expands to btrunc, ceil, floor, nearbyint, rint, round, frintn.
 (define_insn "<frint_pattern><mode>2"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-	(unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")]
-		      FRINT))]
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+	(unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")]
+		       FRINT))]
   "TARGET_SIMD"
   "frint<frint_suffix>\\t%0.<Vtype>, %1.<Vtype>"
-  [(set_attr "type" "neon_fp_round_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_round_<stype><q>")]
 )
 
 ;; Vector versions of the fcvt standard patterns.
 ;; Expands to lbtrunc, lround, lceil, lfloor
-(define_insn "l<fcvt_pattern><su_optab><VDQF:mode><fcvt_target>2"
+(define_insn "l<fcvt_pattern><su_optab><VHSDF:mode><fcvt_target>2"
   [(set (match_operand:<FCVT_TARGET> 0 "register_operand" "=w")
 	(FIXUORS:<FCVT_TARGET> (unspec:<FCVT_TARGET>
-			       [(match_operand:VDQF 1 "register_operand" "w")]
+			       [(match_operand:VHSDF 1 "register_operand" "w")]
 			       FCVT)))]
   "TARGET_SIMD"
   "fcvt<frint_suffix><su>\\t%0.<Vtype>, %1.<Vtype>"
-  [(set_attr "type" "neon_fp_to_int_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_to_int_<stype><q>")]
 )
 
 (define_insn "*aarch64_fcvt<su_optab><VDQF:mode><fcvt_target>2_mult"
@@ -1720,36 +1720,36 @@
   [(set_attr "type" "neon_fp_to_int_<Vetype><q>")]
 )
 
-(define_expand "<optab><VDQF:mode><fcvt_target>2"
+(define_expand "<optab><VHSDF:mode><fcvt_target>2"
   [(set (match_operand:<FCVT_TARGET> 0 "register_operand")
 	(FIXUORS:<FCVT_TARGET> (unspec:<FCVT_TARGET>
-			       [(match_operand:VDQF 1 "register_operand")]
+			       [(match_operand:VHSDF 1 "register_operand")]
 			       UNSPEC_FRINTZ)))]
   "TARGET_SIMD"
   {})
 
-(define_expand "<fix_trunc_optab><VDQF:mode><fcvt_target>2"
+(define_expand "<fix_trunc_optab><VHSDF:mode><fcvt_target>2"
   [(set (match_operand:<FCVT_TARGET> 0 "register_operand")
 	(FIXUORS:<FCVT_TARGET> (unspec:<FCVT_TARGET>
-			       [(match_operand:VDQF 1 "register_operand")]
-			       UNSPEC_FRINTZ)))]
+			       [(match_operand:VHSDF 1 "register_operand")]
+			        UNSPEC_FRINTZ)))]
   "TARGET_SIMD"
   {})
 
-(define_expand "ftrunc<VDQF:mode>2"
-  [(set (match_operand:VDQF 0 "register_operand")
-	(unspec:VDQF [(match_operand:VDQF 1 "register_operand")]
-		      UNSPEC_FRINTZ))]
+(define_expand "ftrunc<VHSDF:mode>2"
+  [(set (match_operand:VHSDF 0 "register_operand")
+	(unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")]
+		       UNSPEC_FRINTZ))]
   "TARGET_SIMD"
   {})
 
-(define_insn "<optab><fcvt_target><VDQF:mode>2"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-	(FLOATUORS:VDQF
+(define_insn "<optab><fcvt_target><VHSDF:mode>2"
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+	(FLOATUORS:VHSDF
 	  (match_operand:<FCVT_TARGET> 1 "register_operand" "w")))]
   "TARGET_SIMD"
   "<su_optab>cvtf\\t%0.<Vtype>, %1.<Vtype>"
-  [(set_attr "type" "neon_int_to_fp_<Vetype><q>")]
+  [(set_attr "type" "neon_int_to_fp_<stype><q>")]
 )
 
 ;; Conversions between vectors of floats and doubles.
@@ -4247,14 +4247,14 @@
   [(set (match_operand:<V_cmp_result> 0 "register_operand" "=w,w")
 	(neg:<V_cmp_result>
 	  (COMPARISONS:<V_cmp_result>
-	    (match_operand:VALLF 1 "register_operand" "w,w")
-	    (match_operand:VALLF 2 "aarch64_simd_reg_or_zero" "w,YDz")
+	    (match_operand:VHSDF_SDF 1 "register_operand" "w,w")
+	    (match_operand:VHSDF_SDF 2 "aarch64_simd_reg_or_zero" "w,YDz")
 	  )))]
   "TARGET_SIMD"
   "@
   fcm<n_optab>\t%<v>0<Vmtype>, %<v><cmp_1><Vmtype>, %<v><cmp_2><Vmtype>
   fcm<optab>\t%<v>0<Vmtype>, %<v>1<Vmtype>, 0"
-  [(set_attr "type" "neon_fp_compare_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_compare_<stype><q>")]
 )
 
 ;; fac(ge|gt)
@@ -4299,11 +4299,11 @@
 ;; sqrt
 
 (define_insn "sqrt<mode>2"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-        (sqrt:VDQF (match_operand:VDQF 1 "register_operand" "w")))]
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+        (sqrt:VHSDF (match_operand:VHSDF 1 "register_operand" "w")))]
   "TARGET_SIMD"
   "fsqrt\\t%0.<Vtype>, %1.<Vtype>"
-  [(set_attr "type" "neon_fp_sqrt_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_sqrt_<stype><q>")]
 )
 
 ;; Patterns for vector struct loads and stores.
@@ -5355,12 +5355,12 @@
 )
 
 (define_insn "aarch64_frecpe<mode>"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-	(unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")]
-		    UNSPEC_FRECPE))]
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+	(unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")]
+	 UNSPEC_FRECPE))]
   "TARGET_SIMD"
   "frecpe\\t%0.<Vtype>, %1.<Vtype>"
-  [(set_attr "type" "neon_fp_recpe_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_recpe_<stype><q>")]
 )
 
 (define_insn "aarch64_frecp<FRECP:frecp_suffix><mode>"
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index b7b1eb8..3018049 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -25739,6 +25739,365 @@ __INTERLEAVE_LIST (zip)
 
 /* End of optimal implementations in approved order.  */
 
+#pragma GCC pop_options
+
+/* ARMv8.2-A FP16 intrinsics.  */
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+fp16")
+
+/* ARMv8.2-A FP16 one operand vector intrinsics.  */
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vabs_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_absv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vabsq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_absv8hf (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vceqz_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_cmeqv4hf_uss (__a, vdup_n_f16 (0.0f));
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vceqzq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_cmeqv8hf_uss (__a, vdupq_n_f16 (0.0f));
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcgez_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_cmgev4hf_uss (__a, vdup_n_f16 (0.0f));
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcgezq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_cmgev8hf_uss (__a, vdupq_n_f16 (0.0f));
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcgtz_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_cmgtv4hf_uss (__a, vdup_n_f16 (0.0f));
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcgtzq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_cmgtv8hf_uss (__a, vdupq_n_f16 (0.0f));
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vclez_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_cmlev4hf_uss (__a, vdup_n_f16 (0.0f));
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vclezq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_cmlev8hf_uss (__a, vdupq_n_f16 (0.0f));
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcltz_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_cmltv4hf_uss (__a, vdup_n_f16 (0.0f));
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcltzq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_cmltv8hf_uss (__a, vdupq_n_f16 (0.0f));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcvt_f16_s16 (int16x4_t __a)
+{
+  return __builtin_aarch64_floatv4hiv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vcvtq_f16_s16 (int16x8_t __a)
+{
+  return __builtin_aarch64_floatv8hiv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcvt_f16_u16 (uint16x4_t __a)
+{
+  return __builtin_aarch64_floatunsv4hiv4hf ((int16x4_t) __a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vcvtq_f16_u16 (uint16x8_t __a)
+{
+  return __builtin_aarch64_floatunsv8hiv8hf ((int16x8_t) __a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcvt_s16_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_lbtruncv4hfv4hi (__a);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcvtq_s16_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_lbtruncv8hfv8hi (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcvt_u16_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_lbtruncuv4hfv4hi_us (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcvtq_u16_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_lbtruncuv8hfv8hi_us (__a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcvta_s16_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_lroundv4hfv4hi (__a);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcvtaq_s16_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_lroundv8hfv8hi (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcvta_u16_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_lrounduv4hfv4hi_us (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcvtaq_u16_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_lrounduv8hfv8hi_us (__a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcvtm_s16_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_lfloorv4hfv4hi (__a);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcvtmq_s16_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_lfloorv8hfv8hi (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcvtm_u16_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_lflooruv4hfv4hi_us (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcvtmq_u16_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_lflooruv8hfv8hi_us (__a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcvtn_s16_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_lfrintnv4hfv4hi (__a);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcvtnq_s16_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_lfrintnv8hfv8hi (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcvtn_u16_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_lfrintnuv4hfv4hi_us (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcvtnq_u16_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_lfrintnuv8hfv8hi_us (__a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcvtp_s16_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_lceilv4hfv4hi (__a);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcvtpq_s16_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_lceilv8hfv8hi (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcvtp_u16_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_lceiluv4hfv4hi_us (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcvtpq_u16_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_lceiluv8hfv8hi_us (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vneg_f16 (float16x4_t __a)
+{
+  return -__a;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vnegq_f16 (float16x8_t __a)
+{
+  return -__a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrecpe_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_frecpev4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrecpeq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_frecpev8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrnd_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_btruncv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_btruncv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrnda_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_roundv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndaq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_roundv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrndi_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_nearbyintv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndiq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_nearbyintv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrndm_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_floorv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndmq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_floorv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrndn_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_frintnv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndnq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_frintnv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrndp_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_ceilv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndpq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_ceilv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrndx_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_rintv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndxq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_rintv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrsqrte_f16 (float16x4_t a)
+{
+  return __builtin_aarch64_rsqrtev4hf (a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrsqrteq_f16 (float16x8_t a)
+{
+  return __builtin_aarch64_rsqrtev8hf (a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vsqrt_f16 (float16x4_t a)
+{
+  return __builtin_aarch64_sqrtv4hf (a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vsqrtq_f16 (float16x8_t a)
+{
+  return __builtin_aarch64_sqrtv8hf (a);
+}
+
+#pragma GCC pop_options
+
 #undef __aarch64_vget_lane_any
 
 #undef __aarch64_vdup_lane_any
@@ -25795,6 +26154,4 @@ __INTERLEAVE_LIST (zip)
 #undef __aarch64_vdupq_laneq_u32
 #undef __aarch64_vdupq_laneq_u64
 
-#pragma GCC pop_options
-
 #endif
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index e8fbb12..2687d74 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -88,11 +88,20 @@
 ;; Vector Float modes suitable for moving, loading and storing.
 (define_mode_iterator VDQF_F16 [V4HF V8HF V2SF V4SF V2DF])
 
-;; Vector Float modes, barring HF modes.
+;; Vector Float modes.
 (define_mode_iterator VDQF [V2SF V4SF V2DF])
+(define_mode_iterator VHSDF [(V4HF "TARGET_SIMD_F16INST")
+			     (V8HF "TARGET_SIMD_F16INST")
+			     V2SF V4SF V2DF])
 
 ;; Vector Float modes, and DF.
 (define_mode_iterator VDQF_DF [V2SF V4SF V2DF DF])
+(define_mode_iterator VHSDF_DF [(V4HF "TARGET_SIMD_F16INST")
+				(V8HF "TARGET_SIMD_F16INST")
+				V2SF V4SF V2DF DF])
+(define_mode_iterator VHSDF_SDF [(V4HF "TARGET_SIMD_F16INST")
+				 (V8HF "TARGET_SIMD_F16INST")
+				 V2SF V4SF V2DF SF DF])
 
 ;; Vector single Float modes.
 (define_mode_iterator VDQSF [V2SF V4SF])
@@ -366,7 +375,8 @@
 		    (V4HI "") (V8HI "")
 		    (V2SI "") (V4SI  "")
 		    (V2DI "") (V2SF "")
-		    (V4SF "") (V2DF "")])
+		    (V4SF "") (V4HF "")
+		    (V8HF "") (V2DF "")])
 
 ;; For scalar usage of vector/FP registers, narrowing
 (define_mode_attr vn2 [(QI "") (HI "b") (SI "h") (DI "s")
@@ -447,6 +457,16 @@
 			  (QI "b")   (HI "h")
 			  (SI "s")   (DI "d")])
 
+;; Vetype is used everywhere for both scheduling types and assembly
+;; output, but the two are not always the same, for example for HF
+;; modes on some instructions.  stype is defined to represent the
+;; scheduling type more accurately.
+(define_mode_attr stype [(V8QI "b") (V16QI "b") (V4HI "s") (V8HI "s")
+			 (V2SI "s") (V4SI "s") (V2DI "d") (V4HF "s")
+			 (V8HF "s") (V2SF "s") (V4SF "s") (V2DF "d")
+			 (HF "s") (SF "s") (DF "d") (QI "b") (HI "s")
+			 (SI "s") (DI "d")])
+
 ;; Mode-to-bitwise operation type mapping.
 (define_mode_attr Vbtype [(V8QI "8b")  (V16QI "16b")
 			  (V4HI "8b") (V8HI  "16b")
@@ -655,11 +675,15 @@
   [(QI "b") (HI "h") (SI "") (DI "")])
 
 (define_mode_attr fcvt_target [(V2DF "v2di") (V4SF "v4si") (V2SF "v2si")
-			       (V2DI "v2df") (V4SI "v4sf") (V2SI "v2sf")
-			       (SF "si") (DF "di") (SI "sf") (DI "df")])
+                               (V2DI "v2df") (V4SI "v4sf") (V2SI "v2sf")
+			       (SF "si") (DF "di") (SI "sf") (DI "df")
+			       (V4HF "v4hi") (V8HF "v8hi") (V4HI "v4hf")
+			       (V8HI "v8hf")])
 (define_mode_attr FCVT_TARGET [(V2DF "V2DI") (V4SF "V4SI") (V2SF "V2SI")
-			       (V2DI "V2DF") (V4SI "V4SF") (V2SI "V2SF")
-			       (SF "SI") (DF "DI") (SI "SF") (DI "DF")])
+                               (V2DI "V2DF") (V4SI "V4SF") (V2SI "V2SF")
+			       (SF "SI") (DF "DI") (SI "SF") (DI "DF")
+			       (V4HF "V4HI") (V8HF "V8HI") (V4HI "V4HF")
+			       (V8HI "V8HF")])
 
 
 ;; for the inequal width integer to fp conversions
@@ -687,6 +711,7 @@
 ;; the 'x' constraint.  All other modes may use the 'w' constraint.
 (define_mode_attr h_con [(V2SI "w") (V4SI "w")
 			 (V4HI "x") (V8HI "x")
+			 (V4HF "w") (V8HF "w")
 			 (V2SF "w") (V4SF "w")
 			 (V2DF "w") (DF "w")])
 
-- 
2.5.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [AArch64][3/14] ARMv8.2-A FP16 two operands vector intrinsics
       [not found]   ` <21c3c64f-95ad-c127-3f8a-4afd236aae33@foss.arm.com>
@ 2016-07-07 16:15     ` Jiong Wang
  2016-07-20 17:01       ` Jiong Wang
       [not found]     ` <938d13c1-39be-5fe3-9997-e55942bbd163@foss.arm.com>
  1 sibling, 1 reply; 32+ messages in thread
From: Jiong Wang @ 2016-07-07 16:15 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 3282 bytes --]

This patch adds the ARMv8.2-A FP16 two-operand vector intrinsics.
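
A quick user-side sketch (hypothetical, not part of the patch) showing
two of the new intrinsics together; it assumes a compiler invoked with
-march=armv8.2-a+fp16:

#include <arm_neon.h>

/* Lane-wise: does |a - b| exceed the given threshold?  vabd_f16 maps
   to FABD and vcgt_f16 to FCMGT on the .4H arrangement.  */
uint16x4_t
abd_exceeds (float16x4_t a, float16x4_t b, float16x4_t thresh)
{
  return vcgt_f16 (vabd_f16 (a, b), thresh);
}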

gcc/
2016-07-07  Jiong Wang <jiong.wang@arm.com>

         * config/aarch64/aarch64-simd-builtins.def: Register new builtins.
         * config/aarch64/aarch64-simd.md
         (aarch64_rsqrts<mode>): Extend to HF modes.
         (fabd<mode>3): Likewise.
         (<FCVT_F2FIXED:fcvt_fixed_insn><VHSDF:mode>3): Likewise.
         (<FCVT_FIXED2F:fcvt_fixed_insn><VDQ_HSDI:mode>3): Likewise.
         (aarch64_<maxmin_uns>p<mode>): Likewise.
         (<su><maxmin><mode>3): Likewise.
         (<maxmin_uns><mode>3): Likewise.
         (<fmaxmin><mode>3): Likewise.
         (aarch64_faddp<mode>): Likewise.
         (aarch64_fmulx<mode>): Likewise.
         (aarch64_frecps<mode>): Likewise.
         (*aarch64_fac<optab><mode>): Rename to aarch64_fac<optab><mode>.
         (add<mode>3): Extend to HF modes.
         (sub<mode>3): Likewise.
         (mul<mode>3): Likewise.
         (div<mode>3): Likewise.
         * config/aarch64/iterators.md (VDQ_HSDI, VSDQ_HSDI): New mode
         iterators.
         * config/aarch64/arm_neon.h (vadd_f16): New.
         (vaddq_f16): Likewise.
         (vabd_f16): Likewise.
         (vabdq_f16): Likewise.
         (vcage_f16): Likewise.
         (vcageq_f16): Likewise.
         (vcagt_f16): Likewise.
         (vcagtq_f16): Likewise.
         (vcale_f16): Likewise.
         (vcaleq_f16): Likewise.
         (vcalt_f16): Likewise.
         (vcaltq_f16): Likewise.
         (vceq_f16): Likewise.
         (vceqq_f16): Likewise.
         (vcge_f16): Likewise.
         (vcgeq_f16): Likewise.
         (vcgt_f16): Likewise.
         (vcgtq_f16): Likewise.
         (vcle_f16): Likewise.
         (vcleq_f16): Likewise.
         (vclt_f16): Likewise.
         (vcltq_f16): Likewise.
         (vcvt_n_f16_s16): Likewise.
         (vcvtq_n_f16_s16): Likewise.
         (vcvt_n_f16_u16): Likewise.
         (vcvtq_n_f16_u16): Likewise.
         (vcvt_n_s16_f16): Likewise.
         (vcvtq_n_s16_f16): Likewise.
         (vcvt_n_u16_f16): Likewise.
         (vcvtq_n_u16_f16): Likewise.
         (vdiv_f16): Likewise.
         (vdivq_f16): Likewise.
         (vdup_lane_f16): Likewise.
         (vdup_laneq_f16): Likewise.
         (vdupq_lane_f16): Likewise.
         (vdupq_laneq_f16): Likewise.
         (vdups_lane_f16): Likewise.
         (vdups_laneq_f16): Likewise.
         (vmax_f16): Likewise.
         (vmaxq_f16): Likewise.
         (vmaxnm_f16): Likewise.
         (vmaxnmq_f16): Likewise.
         (vmin_f16): Likewise.
         (vminq_f16): Likewise.
         (vminnm_f16): Likewise.
         (vminnmq_f16): Likewise.
         (vmul_f16): Likewise.
         (vmulq_f16): Likewise.
         (vmulx_f16): Likewise.
         (vmulxq_f16): Likewise.
         (vpadd_f16): Likewise.
         (vpaddq_f16): Likewise.
         (vpmax_f16): Likewise.
         (vpmaxq_f16): Likewise.
         (vpmaxnm_f16): Likewise.
         (vpmaxnmq_f16): Likewise.
         (vpmin_f16): Likewise.
         (vpminq_f16): Likewise.
         (vpminnm_f16): Likewise.
         (vpminnmq_f16): Likewise.
         (vrecps_f16): Likewise.
         (vrecpsq_f16): Likewise.
         (vrsqrts_f16): Likewise.
         (vrsqrtsq_f16): Likewise.
         (vsub_f16): Likewise.
         (vsubq_f16): Likewise.


[-- Attachment #2: 0003-3-14-ARMv8.2-FP16-two-operands-vector-intrinsics.patch --]
[-- Type: text/x-patch, Size: 26986 bytes --]

commit 5ed72d355491365b3af5883cdc5a4fdaf5cb545b
Author: Jiong Wang <jiong.wang@arm.com>
Date:   Wed Jun 8 10:10:28 2016 +0100

    [3/14] ARMv8.2 FP16 two operands vector intrinsics

 gcc/config/aarch64/aarch64-simd-builtins.def |  40 +--
 gcc/config/aarch64/aarch64-simd.md           | 152 +++++------
 gcc/config/aarch64/arm_neon.h                | 362 +++++++++++++++++++++++++++
 gcc/config/aarch64/iterators.md              |  10 +
 4 files changed, 473 insertions(+), 91 deletions(-)
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 3e48046..fe17298 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -41,7 +41,7 @@
 
   BUILTIN_VDC (COMBINE, combine, 0)
   BUILTIN_VB (BINOP, pmul, 0)
-  BUILTIN_VALLF (BINOP, fmulx, 0)
+  BUILTIN_VHSDF_SDF (BINOP, fmulx, 0)
   BUILTIN_VHSDF_DF (UNOP, sqrt, 2)
   BUILTIN_VD_BHSI (BINOP, addp, 0)
   VAR1 (UNOP, addp, 0, di)
@@ -248,22 +248,22 @@
   BUILTIN_VDQ_BHSI (BINOP, smin, 3)
   BUILTIN_VDQ_BHSI (BINOP, umax, 3)
   BUILTIN_VDQ_BHSI (BINOP, umin, 3)
-  BUILTIN_VDQF (BINOP, smax_nan, 3)
-  BUILTIN_VDQF (BINOP, smin_nan, 3)
+  BUILTIN_VHSDF (BINOP, smax_nan, 3)
+  BUILTIN_VHSDF (BINOP, smin_nan, 3)
 
   /* Implemented by <fmaxmin><mode>3.  */
-  BUILTIN_VDQF (BINOP, fmax, 3)
-  BUILTIN_VDQF (BINOP, fmin, 3)
+  BUILTIN_VHSDF (BINOP, fmax, 3)
+  BUILTIN_VHSDF (BINOP, fmin, 3)
 
   /* Implemented by aarch64_<maxmin_uns>p<mode>.  */
   BUILTIN_VDQ_BHSI (BINOP, smaxp, 0)
   BUILTIN_VDQ_BHSI (BINOP, sminp, 0)
   BUILTIN_VDQ_BHSI (BINOP, umaxp, 0)
   BUILTIN_VDQ_BHSI (BINOP, uminp, 0)
-  BUILTIN_VDQF (BINOP, smaxp, 0)
-  BUILTIN_VDQF (BINOP, sminp, 0)
-  BUILTIN_VDQF (BINOP, smax_nanp, 0)
-  BUILTIN_VDQF (BINOP, smin_nanp, 0)
+  BUILTIN_VHSDF (BINOP, smaxp, 0)
+  BUILTIN_VHSDF (BINOP, sminp, 0)
+  BUILTIN_VHSDF (BINOP, smax_nanp, 0)
+  BUILTIN_VHSDF (BINOP, smin_nanp, 0)
 
   /* Implemented by <frint_pattern><mode>2.  */
   BUILTIN_VHSDF (UNOP, btrunc, 2)
@@ -383,7 +383,7 @@
   BUILTIN_VDQ_SI (UNOP, urecpe, 0)
 
   BUILTIN_VHSDF (UNOP, frecpe, 0)
-  BUILTIN_VDQF (BINOP, frecps, 0)
+  BUILTIN_VHSDF (BINOP, frecps, 0)
 
   /* Implemented by a mixture of abs2 patterns.  Note the DImode builtin is
      only ever used for the int64x1_t intrinsic, there is no scalar version.  */
@@ -475,22 +475,22 @@
   BUILTIN_VSDQ_HSI (QUADOP_LANE, sqrdmlsh_laneq, 0)
 
   /* Implemented by <FCVT_F2FIXED/FIXED2F:fcvt_fixed_insn><*><*>3.  */
-  BUILTIN_VSDQ_SDI (SHIFTIMM, scvtf, 3)
-  BUILTIN_VSDQ_SDI (FCVTIMM_SUS, ucvtf, 3)
-  BUILTIN_VALLF (SHIFTIMM, fcvtzs, 3)
-  BUILTIN_VALLF (SHIFTIMM_USS, fcvtzu, 3)
+  BUILTIN_VSDQ_HSDI (SHIFTIMM, scvtf, 3)
+  BUILTIN_VSDQ_HSDI (FCVTIMM_SUS, ucvtf, 3)
+  BUILTIN_VHSDF_SDF (SHIFTIMM, fcvtzs, 3)
+  BUILTIN_VHSDF_SDF (SHIFTIMM_USS, fcvtzu, 3)
 
   /* Implemented by aarch64_rsqrte<mode>.  */
   BUILTIN_VHSDF_SDF (UNOP, rsqrte, 0)
 
   /* Implemented by aarch64_rsqrts<mode>.  */
-  BUILTIN_VALLF (BINOP, rsqrts, 0)
+  BUILTIN_VHSDF_SDF (BINOP, rsqrts, 0)
 
   /* Implemented by fabd<mode>3.  */
-  BUILTIN_VALLF (BINOP, fabd, 3)
+  BUILTIN_VHSDF_SDF (BINOP, fabd, 3)
 
   /* Implemented by aarch64_faddp<mode>.  */
-  BUILTIN_VDQF (BINOP, faddp, 0)
+  BUILTIN_VHSDF (BINOP, faddp, 0)
 
   /* Implemented by aarch64_cm<optab><mode>.  */
   BUILTIN_VHSDF_SDF (BINOP_USS, cmeq, 0)
@@ -501,3 +501,9 @@
 
   /* Implemented by neg<mode>2.  */
   BUILTIN_VHSDF (UNOP, neg, 2)
+
+  /* Implemented by aarch64_fac<optab><mode>.  */
+  BUILTIN_VHSDF_SDF (BINOP_USS, faclt, 0)
+  BUILTIN_VHSDF_SDF (BINOP_USS, facle, 0)
+  BUILTIN_VHSDF_SDF (BINOP_USS, facgt, 0)
+  BUILTIN_VHSDF_SDF (BINOP_USS, facge, 0)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index da6dd52..0a80adb 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -391,13 +391,13 @@
   [(set_attr "type" "neon_fp_rsqrte_<stype><q>")])
 
 (define_insn "aarch64_rsqrts<mode>"
-  [(set (match_operand:VALLF 0 "register_operand" "=w")
-	(unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w")
-	       (match_operand:VALLF 2 "register_operand" "w")]
-		     UNSPEC_RSQRTS))]
+  [(set (match_operand:VHSDF_SDF 0 "register_operand" "=w")
+	(unspec:VHSDF_SDF [(match_operand:VHSDF_SDF 1 "register_operand" "w")
+			   (match_operand:VHSDF_SDF 2 "register_operand" "w")]
+	 UNSPEC_RSQRTS))]
   "TARGET_SIMD"
   "frsqrts\\t%<v>0<Vmtype>, %<v>1<Vmtype>, %<v>2<Vmtype>"
-  [(set_attr "type" "neon_fp_rsqrts_<Vetype><q>")])
+  [(set_attr "type" "neon_fp_rsqrts_<stype><q>")])
 
 (define_expand "rsqrt<mode>2"
   [(set (match_operand:VALLF 0 "register_operand" "=w")
@@ -475,14 +475,14 @@
 )
 
 (define_insn "fabd<mode>3"
-  [(set (match_operand:VALLF 0 "register_operand" "=w")
-	(abs:VALLF
-	  (minus:VALLF
-	    (match_operand:VALLF 1 "register_operand" "w")
-	    (match_operand:VALLF 2 "register_operand" "w"))))]
+  [(set (match_operand:VHSDF_SDF 0 "register_operand" "=w")
+	(abs:VHSDF_SDF
+	  (minus:VHSDF_SDF
+	    (match_operand:VHSDF_SDF 1 "register_operand" "w")
+	    (match_operand:VHSDF_SDF 2 "register_operand" "w"))))]
   "TARGET_SIMD"
   "fabd\t%<v>0<Vmtype>, %<v>1<Vmtype>, %<v>2<Vmtype>"
-  [(set_attr "type" "neon_fp_abd_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_abd_<stype><q>")]
 )
 
 (define_insn "and<mode>3"
@@ -1062,10 +1062,10 @@
 
 ;; Pairwise FP Max/Min operations.
 (define_insn "aarch64_<maxmin_uns>p<mode>"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
-       (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")
-		     (match_operand:VDQF 2 "register_operand" "w")]
-		    FMAXMINV))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+       (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")
+		      (match_operand:VHSDF 2 "register_operand" "w")]
+		      FMAXMINV))]
  "TARGET_SIMD"
  "<maxmin_uns_op>p\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
   [(set_attr "type" "neon_minmax<q>")]
@@ -1474,39 +1474,39 @@
 ;; FP arithmetic operations.
 
 (define_insn "add<mode>3"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
-       (plus:VDQF (match_operand:VDQF 1 "register_operand" "w")
-		  (match_operand:VDQF 2 "register_operand" "w")))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+       (plus:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
+		   (match_operand:VHSDF 2 "register_operand" "w")))]
  "TARGET_SIMD"
  "fadd\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
-  [(set_attr "type" "neon_fp_addsub_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_addsub_<stype><q>")]
 )
 
 (define_insn "sub<mode>3"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
-       (minus:VDQF (match_operand:VDQF 1 "register_operand" "w")
-		   (match_operand:VDQF 2 "register_operand" "w")))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+       (minus:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
+		    (match_operand:VHSDF 2 "register_operand" "w")))]
  "TARGET_SIMD"
  "fsub\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
-  [(set_attr "type" "neon_fp_addsub_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_addsub_<stype><q>")]
 )
 
 (define_insn "mul<mode>3"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
-       (mult:VDQF (match_operand:VDQF 1 "register_operand" "w")
-		  (match_operand:VDQF 2 "register_operand" "w")))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+       (mult:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
+		   (match_operand:VHSDF 2 "register_operand" "w")))]
  "TARGET_SIMD"
  "fmul\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
-  [(set_attr "type" "neon_fp_mul_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_mul_<stype><q>")]
 )
 
 (define_insn "div<mode>3"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
-       (div:VDQF (match_operand:VDQF 1 "register_operand" "w")
-		 (match_operand:VDQF 2 "register_operand" "w")))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+       (div:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
+		  (match_operand:VHSDF 2 "register_operand" "w")))]
  "TARGET_SIMD"
  "fdiv\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
-  [(set_attr "type" "neon_fp_div_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_div_<stype><q>")]
 )
 
 (define_insn "neg<mode>2"
@@ -1771,24 +1771,24 @@
 
 ;; Convert between fixed-point and floating-point (vector modes)
 
-(define_insn "<FCVT_F2FIXED:fcvt_fixed_insn><VDQF:mode>3"
-  [(set (match_operand:<VDQF:FCVT_TARGET> 0 "register_operand" "=w")
-	(unspec:<VDQF:FCVT_TARGET> [(match_operand:VDQF 1 "register_operand" "w")
-				    (match_operand:SI 2 "immediate_operand" "i")]
+(define_insn "<FCVT_F2FIXED:fcvt_fixed_insn><VHSDF:mode>3"
+  [(set (match_operand:<VHSDF:FCVT_TARGET> 0 "register_operand" "=w")
+	(unspec:<VHSDF:FCVT_TARGET> [(match_operand:VHSDF 1 "register_operand" "w")
+				     (match_operand:SI 2 "immediate_operand" "i")]
 	 FCVT_F2FIXED))]
   "TARGET_SIMD"
   "<FCVT_F2FIXED:fcvt_fixed_insn>\t%<v>0<Vmtype>, %<v>1<Vmtype>, #%2"
-  [(set_attr "type" "neon_fp_to_int_<VDQF:Vetype><q>")]
+  [(set_attr "type" "neon_fp_to_int_<VHSDF:stype><q>")]
 )
 
-(define_insn "<FCVT_FIXED2F:fcvt_fixed_insn><VDQ_SDI:mode>3"
-  [(set (match_operand:<VDQ_SDI:FCVT_TARGET> 0 "register_operand" "=w")
-	(unspec:<VDQ_SDI:FCVT_TARGET> [(match_operand:VDQ_SDI 1 "register_operand" "w")
-				       (match_operand:SI 2 "immediate_operand" "i")]
+(define_insn "<FCVT_FIXED2F:fcvt_fixed_insn><VDQ_HSDI:mode>3"
+  [(set (match_operand:<VDQ_HSDI:FCVT_TARGET> 0 "register_operand" "=w")
+	(unspec:<VDQ_HSDI:FCVT_TARGET> [(match_operand:VDQ_HSDI 1 "register_operand" "w")
+					(match_operand:SI 2 "immediate_operand" "i")]
 	 FCVT_FIXED2F))]
   "TARGET_SIMD"
   "<FCVT_FIXED2F:fcvt_fixed_insn>\t%<v>0<Vmtype>, %<v>1<Vmtype>, #%2"
-  [(set_attr "type" "neon_int_to_fp_<VDQ_SDI:Vetype><q>")]
+  [(set_attr "type" "neon_int_to_fp_<VDQ_HSDI:stype><q>")]
 )
 
 ;; ??? Note that the vectorizer usage of the vec_unpacks_[lo/hi] patterns
@@ -1947,33 +1947,33 @@
 ;; NaNs.
 
 (define_insn "<su><maxmin><mode>3"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-        (FMAXMIN:VDQF (match_operand:VDQF 1 "register_operand" "w")
-		   (match_operand:VDQF 2 "register_operand" "w")))]
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+        (FMAXMIN:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
+		        (match_operand:VHSDF 2 "register_operand" "w")))]
   "TARGET_SIMD"
   "f<maxmin>nm\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
-  [(set_attr "type" "neon_fp_minmax_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_minmax_<stype><q>")]
 )
 
 (define_insn "<maxmin_uns><mode>3"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-       (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")
-		     (match_operand:VDQF 2 "register_operand" "w")]
-		    FMAXMIN_UNS))]
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+       (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")
+		      (match_operand:VHSDF 2 "register_operand" "w")]
+		      FMAXMIN_UNS))]
   "TARGET_SIMD"
   "<maxmin_uns_op>\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
-  [(set_attr "type" "neon_fp_minmax_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_minmax_<stype><q>")]
 )
 
 ;; Auto-vectorized forms for the IEEE-754 fmax()/fmin() functions
 (define_insn "<fmaxmin><mode>3"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-	(unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")
-		      (match_operand:VDQF 2 "register_operand" "w")]
-		      FMAXMIN))]
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+	(unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")
+		       (match_operand:VHSDF 2 "register_operand" "w")]
+		       FMAXMIN))]
   "TARGET_SIMD"
   "<fmaxmin_op>\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
-  [(set_attr "type" "neon_fp_minmax_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_minmax_<stype><q>")]
 )
 
 ;; 'across lanes' add.
@@ -1993,13 +1993,13 @@
 )
 
 (define_insn "aarch64_faddp<mode>"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
-       (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")
-		     (match_operand:VDQF 2 "register_operand" "w")]
-		     UNSPEC_FADDV))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+       (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")
+		      (match_operand:VHSDF 2 "register_operand" "w")]
+	UNSPEC_FADDV))]
  "TARGET_SIMD"
  "faddp\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
-  [(set_attr "type" "neon_fp_reduc_add_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_reduc_add_<stype><q>")]
 )
 
 (define_insn "aarch64_reduc_plus_internal<mode>"
@@ -2995,13 +2995,14 @@
 ;; fmulx.
 
 (define_insn "aarch64_fmulx<mode>"
-  [(set (match_operand:VALLF 0 "register_operand" "=w")
-	(unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w")
-		       (match_operand:VALLF 2 "register_operand" "w")]
-		      UNSPEC_FMULX))]
+  [(set (match_operand:VHSDF_SDF 0 "register_operand" "=w")
+	(unspec:VHSDF_SDF
+	  [(match_operand:VHSDF_SDF 1 "register_operand" "w")
+	   (match_operand:VHSDF_SDF 2 "register_operand" "w")]
+	   UNSPEC_FMULX))]
  "TARGET_SIMD"
  "fmulx\t%<v>0<Vmtype>, %<v>1<Vmtype>, %<v>2<Vmtype>"
- [(set_attr "type" "neon_fp_mul_<Vetype>")]
+ [(set_attr "type" "neon_fp_mul_<stype>")]
 )
 
 ;; vmulxq_lane_f32, and vmulx_laneq_f32
@@ -4261,16 +4262,18 @@
 ;; Note we can also handle what would be fac(le|lt) by
 ;; generating fac(ge|gt).
 
-(define_insn "*aarch64_fac<optab><mode>"
+(define_insn "aarch64_fac<optab><mode>"
   [(set (match_operand:<V_cmp_result> 0 "register_operand" "=w")
 	(neg:<V_cmp_result>
 	  (FAC_COMPARISONS:<V_cmp_result>
-	    (abs:VALLF (match_operand:VALLF 1 "register_operand" "w"))
-	    (abs:VALLF (match_operand:VALLF 2 "register_operand" "w"))
+	    (abs:VHSDF_SDF
+	      (match_operand:VHSDF_SDF 1 "register_operand" "w"))
+	    (abs:VHSDF_SDF
+	      (match_operand:VHSDF_SDF 2 "register_operand" "w"))
   )))]
   "TARGET_SIMD"
   "fac<n_optab>\t%<v>0<Vmtype>, %<v><cmp_1><Vmtype>, %<v><cmp_2><Vmtype>"
-  [(set_attr "type" "neon_fp_compare_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_compare_<stype><q>")]
 )
 
 ;; addp
@@ -5373,13 +5376,14 @@
 )
 
 (define_insn "aarch64_frecps<mode>"
-  [(set (match_operand:VALLF 0 "register_operand" "=w")
-	(unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w")
-		     (match_operand:VALLF 2 "register_operand" "w")]
-		    UNSPEC_FRECPS))]
+  [(set (match_operand:VHSDF_SDF 0 "register_operand" "=w")
+	(unspec:VHSDF_SDF
+	  [(match_operand:VHSDF_SDF 1 "register_operand" "w")
+	  (match_operand:VHSDF_SDF 2 "register_operand" "w")]
+	  UNSPEC_FRECPS))]
   "TARGET_SIMD"
   "frecps\\t%<v>0<Vmtype>, %<v>1<Vmtype>, %<v>2<Vmtype>"
-  [(set_attr "type" "neon_fp_recps_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_recps_<stype><q>")]
 )
 
 (define_insn "aarch64_urecpe<mode>"
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 3018049..e78ff43 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -26096,6 +26096,368 @@ vsqrtq_f16 (float16x8_t a)
   return __builtin_aarch64_sqrtv8hf (a);
 }
 
+/* ARMv8.2-A FP16 two operands vector intrinsics.  */
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vadd_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __a + __b;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vaddq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __a + __b;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vabd_f16 (float16x4_t a, float16x4_t b)
+{
+  return __builtin_aarch64_fabdv4hf (a, b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vabdq_f16 (float16x8_t a, float16x8_t b)
+{
+  return __builtin_aarch64_fabdv8hf (a, b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcage_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_facgev4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcageq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_facgev8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcagt_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_facgtv4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcagtq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_facgtv8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcale_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_faclev4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcaleq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_faclev8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcalt_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_facltv4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcaltq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_facltv8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vceq_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_cmeqv4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vceqq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_cmeqv8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcge_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_cmgev4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcgeq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_cmgev8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcgt_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_cmgtv4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcgtq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_cmgtv8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcle_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_cmlev4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcleq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_cmlev8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vclt_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_cmltv4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcltq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_cmltv8hf_uss (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcvt_n_f16_s16 (int16x4_t __a, const int __b)
+{
+  return __builtin_aarch64_scvtfv4hi (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vcvtq_n_f16_s16 (int16x8_t __a, const int __b)
+{
+  return __builtin_aarch64_scvtfv8hi (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcvt_n_f16_u16 (uint16x4_t __a, const int __b)
+{
+  return __builtin_aarch64_ucvtfv4hi_sus (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vcvtq_n_f16_u16 (uint16x8_t __a, const int __b)
+{
+  return __builtin_aarch64_ucvtfv8hi_sus (__a, __b);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcvt_n_s16_f16 (float16x4_t __a, const int __b)
+{
+  return __builtin_aarch64_fcvtzsv4hf (__a, __b);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcvtq_n_s16_f16 (float16x8_t __a, const int __b)
+{
+  return __builtin_aarch64_fcvtzsv8hf (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcvt_n_u16_f16 (float16x4_t __a, const int __b)
+{
+  return __builtin_aarch64_fcvtzuv4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcvtq_n_u16_f16 (float16x8_t __a, const int __b)
+{
+  return __builtin_aarch64_fcvtzuv8hf_uss (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vdiv_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __a / __b;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vdivq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __a / __b;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmax_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_smax_nanv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmaxq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_smax_nanv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmaxnm_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_fmaxv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmaxnmq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_fmaxv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmin_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_smin_nanv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vminq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_smin_nanv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vminnm_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_fminv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vminnmq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_fminv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmul_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __a * __b;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __a * __b;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmulx_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_fmulxv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulxq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_fmulxv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vpadd_f16 (float16x4_t a, float16x4_t b)
+{
+  return __builtin_aarch64_faddpv4hf (a, b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vpaddq_f16 (float16x8_t a, float16x8_t b)
+{
+  return __builtin_aarch64_faddpv8hf (a, b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vpmax_f16 (float16x4_t a, float16x4_t b)
+{
+  return __builtin_aarch64_smax_nanpv4hf (a, b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vpmaxq_f16 (float16x8_t a, float16x8_t b)
+{
+  return __builtin_aarch64_smax_nanpv8hf (a, b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vpmaxnm_f16 (float16x4_t a, float16x4_t b)
+{
+  return __builtin_aarch64_smaxpv4hf (a, b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vpmaxnmq_f16 (float16x8_t a, float16x8_t b)
+{
+  return __builtin_aarch64_smaxpv8hf (a, b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vpmin_f16 (float16x4_t a, float16x4_t b)
+{
+  return __builtin_aarch64_smin_nanpv4hf (a, b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vpminq_f16 (float16x8_t a, float16x8_t b)
+{
+  return __builtin_aarch64_smin_nanpv8hf (a, b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vpminnm_f16 (float16x4_t a, float16x4_t b)
+{
+  return __builtin_aarch64_sminpv4hf (a, b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vpminnmq_f16 (float16x8_t a, float16x8_t b)
+{
+  return __builtin_aarch64_sminpv8hf (a, b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrecps_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_frecpsv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrecpsq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_frecpsv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrsqrts_f16 (float16x4_t a, float16x4_t b)
+{
+  return __builtin_aarch64_rsqrtsv4hf (a, b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrsqrtsq_f16 (float16x8_t a, float16x8_t b)
+{
+  return __builtin_aarch64_rsqrtsv8hf (a, b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vsub_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __a - __b;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vsubq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __a - __b;
+}
+
 #pragma GCC pop_options
 
 #undef __aarch64_vget_lane_any
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index af5eda9..35190b4 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -166,9 +166,19 @@
 ;; Vector modes for S and D
 (define_mode_iterator VDQ_SDI [V2SI V4SI V2DI])
 
+;; Vector modes for H, S and D
+(define_mode_iterator VDQ_HSDI [(V4HI "TARGET_SIMD_F16INST")
+				(V8HI "TARGET_SIMD_F16INST")
+				V2SI V4SI V2DI])
+
 ;; Scalar and Vector modes for S and D
 (define_mode_iterator VSDQ_SDI [V2SI V4SI V2DI SI DI])
 
+;; Scalar and Vector modes for S and D, Vector modes for H.
+(define_mode_iterator VSDQ_HSDI [(V4HI "TARGET_SIMD_F16INST")
+				 (V8HI "TARGET_SIMD_F16INST")
+				 V2SI V4SI V2DI SI DI])
+
 ;; Vector modes for Q and H types.
 (define_mode_iterator VDQQH [V8QI V16QI V4HI V8HI])
 
^ permalink raw reply	[flat|nested] 32+ messages in thread

* [AArch64][5/14] ARMv8.2-A FP16 lane vector intrinsics
       [not found]       ` <a12ecde7-2ac1-0539-334e-9a33395dd3eb@foss.arm.com>
@ 2016-07-07 16:16         ` Jiong Wang
  2016-07-25 11:06           ` James Greenhalgh
       [not found]         ` <a3eeda81-cb1c-6d9e-706d-c5c067a90d74@foss.arm.com>
  1 sibling, 1 reply; 32+ messages in thread
From: Jiong Wang @ 2016-07-07 16:16 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1665 bytes --]

This patch adds the ARMv8.2-A FP16 lane vector intrinsics.

Lane intrinsics are generally derivatives of multiply intrinsics,
including multiply-accumulate.  All the necessary backend support for
them is already there except for fmulx; the implementations are largely
a combination of the existing multiply intrinsics with vdup intrinsics.
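
For example (a hypothetical sketch, not part of the patch), the lane
forms let a whole vector be scaled by one element, with the vdup
expected to fold into the indexed form of the multiply:

#include <arm_neon.h>

/* Multiply every lane of a by lane 2 of b; with -march=armv8.2-a+fp16
   this should become a single FMUL Vd.4H, Vn.4H, Vm.H[2].  */
float16x4_t
scale_by_lane2 (float16x4_t a, float16x4_t b)
{
  return vmul_lane_f16 (a, b, 2);
}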

2016-07-07  Jiong Wang <jiong.wang@arm.com>

gcc/
         * config/aarch64/aarch64-simd.md (*aarch64_mulx_elt_to_64v2df):
         Rename to "*aarch64_mulx_elt_from_dup<mode>".
         (*aarch64_mul3_elt<mode>): Update schedule type.
         (*aarch64_mul3_elt_from_dup<mode>): Likewise.
         (*aarch64_fma4_elt_from_dup<mode>): Likewise.
         (*aarch64_fnma4_elt_from_dup<mode>): Likewise.
         * config/aarch64/iterators.md (VMUL): Support half-precision
         float modes.
         (f, fp): Support HF modes.
         * config/aarch64/arm_neon.h (vfma_lane_f16): New.
         (vfmaq_lane_f16): Likewise.
         (vfma_laneq_f16): Likewise.
         (vfmaq_laneq_f16): Likewise.
         (vfma_n_f16): Likewise.
         (vfmaq_n_f16): Likewise.
         (vfms_lane_f16): Likewise.
         (vfmsq_lane_f16): Likewise.
         (vfms_laneq_f16): Likewise.
         (vfmsq_laneq_f16): Likewise.
         (vfms_n_f16): Likewise.
         (vfmsq_n_f16): Likewise.
         (vmul_lane_f16): Likewise.
         (vmulq_lane_f16): Likewise.
         (vmul_laneq_f16): Likewise.
         (vmulq_laneq_f16): Likewise.
         (vmul_n_f16): Likewise.
         (vmulq_n_f16): Likewise.
         (vmulx_lane_f16): Likewise.
         (vmulxq_lane_f16): Likewise.
         (vmulx_laneq_f16): Likewise.
         (vmulxq_laneq_f16): Likewise.



[-- Attachment #2: 0005-5-14-ARMv8.2-FP16-lane-vector-intrinsics.patch --]
[-- Type: text/x-patch, Size: 9710 bytes --]

From 25ed161255c4f0155f3c69c1ee4ec0e071ed115c Mon Sep 17 00:00:00 2001
From: Jiong Wang <jiong.wang@arm.com>
Date: Wed, 8 Jun 2016 10:22:38 +0100
Subject: [PATCH 05/14] [5/14] ARMv8.2 FP16 lane vector intrinsics

---
 gcc/config/aarch64/aarch64-simd.md |  28 ++++---
 gcc/config/aarch64/arm_neon.h      | 154 +++++++++++++++++++++++++++++++++++++
 gcc/config/aarch64/iterators.md    |   7 +-
 3 files changed, 173 insertions(+), 16 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 576ad3c..c0600df 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -351,7 +351,7 @@
     operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])));
     return "<f>mul\\t%0.<Vtype>, %3.<Vtype>, %1.<Vetype>[%2]";
   }
-  [(set_attr "type" "neon<fp>_mul_<Vetype>_scalar<q>")]
+  [(set_attr "type" "neon<fp>_mul_<stype>_scalar<q>")]
 )
 
 (define_insn "*aarch64_mul3_elt_<vswap_width_name><mode>"
@@ -379,7 +379,7 @@
       (match_operand:VMUL 2 "register_operand" "w")))]
   "TARGET_SIMD"
   "<f>mul\t%0.<Vtype>, %2.<Vtype>, %1.<Vetype>[0]";
-  [(set_attr "type" "neon<fp>_mul_<Vetype>_scalar<q>")]
+  [(set_attr "type" "neon<fp>_mul_<stype>_scalar<q>")]
 )
 
 (define_insn "aarch64_rsqrte<mode>"
@@ -1579,7 +1579,7 @@
       (match_operand:VMUL 3 "register_operand" "0")))]
   "TARGET_SIMD"
   "fmla\t%0.<Vtype>, %2.<Vtype>, %1.<Vetype>[0]"
-  [(set_attr "type" "neon<fp>_mla_<Vetype>_scalar<q>")]
+  [(set_attr "type" "neon<fp>_mla_<stype>_scalar<q>")]
 )
 
 (define_insn "*aarch64_fma4_elt_to_64v2df"
@@ -1657,7 +1657,7 @@
       (match_operand:VMUL 3 "register_operand" "0")))]
   "TARGET_SIMD"
   "fmls\t%0.<Vtype>, %2.<Vtype>, %1.<Vetype>[0]"
-  [(set_attr "type" "neon<fp>_mla_<Vetype>_scalar<q>")]
+  [(set_attr "type" "neon<fp>_mla_<stype>_scalar<q>")]
 )
 
 (define_insn "*aarch64_fnma4_elt_to_64v2df"
@@ -3044,20 +3044,18 @@
   [(set_attr "type" "neon_fp_mul_<Vetype><q>")]
 )
 
-;; vmulxq_lane_f64
+;; vmulxq_lane
 
-(define_insn "*aarch64_mulx_elt_to_64v2df"
-  [(set (match_operand:V2DF 0 "register_operand" "=w")
-	(unspec:V2DF
-	 [(match_operand:V2DF 1 "register_operand" "w")
-	  (vec_duplicate:V2DF
-	    (match_operand:DF 2 "register_operand" "w"))]
+(define_insn "*aarch64_mulx_elt_from_dup<mode>"
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+	(unspec:VHSDF
+	 [(match_operand:VHSDF 1 "register_operand" "w")
+	  (vec_duplicate:VHSDF
+	    (match_operand:<VEL> 2 "register_operand" "w"))]
 	 UNSPEC_FMULX))]
   "TARGET_SIMD"
-  {
-    return "fmulx\t%0.2d, %1.2d, %2.d[0]";
-  }
-  [(set_attr "type" "neon_fp_mul_d_scalar_q")]
+  "fmulx\t%0.<Vtype>, %1.<Vtype>, %2.<Vetype>[0]";
+  [(set_attr "type" "neon<fp>_mul_<stype>_scalar<q>")]
 )
 
 ;; vmulxs_lane_f32, vmulxs_laneq_f32
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index ad5b6fa..b09a3a7 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -26484,6 +26484,160 @@ vfmsq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
   return __builtin_aarch64_fnmav8hf (__b, __c, __a);
 }
 
+/* ARMv8.2-A FP16 lane vector intrinsics.  */
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vfma_lane_f16 (float16x4_t __a, float16x4_t __b,
+	       float16x4_t __c, const int __lane)
+{
+  return vfma_f16 (__a, __b, __aarch64_vdup_lane_f16 (__c, __lane));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vfmaq_lane_f16 (float16x8_t __a, float16x8_t __b,
+		float16x4_t __c, const int __lane)
+{
+  return vfmaq_f16 (__a, __b, __aarch64_vdupq_lane_f16 (__c, __lane));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vfma_laneq_f16 (float16x4_t __a, float16x4_t __b,
+		float16x8_t __c, const int __lane)
+{
+  return vfma_f16 (__a, __b, __aarch64_vdup_laneq_f16 (__c, __lane));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vfmaq_laneq_f16 (float16x8_t __a, float16x8_t __b,
+		 float16x8_t __c, const int __lane)
+{
+  return vfmaq_f16 (__a, __b, __aarch64_vdupq_laneq_f16 (__c, __lane));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vfma_n_f16 (float16x4_t __a, float16x4_t __b, float16_t __c)
+{
+  return vfma_f16 (__a, __b, vdup_n_f16 (__c));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vfmaq_n_f16 (float16x8_t __a, float16x8_t __b, float16_t __c)
+{
+  return vfmaq_f16 (__a, __b, vdupq_n_f16 (__c));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vfms_lane_f16 (float16x4_t __a, float16x4_t __b,
+	       float16x4_t __c, const int __lane)
+{
+  return vfms_f16 (__a, __b, __aarch64_vdup_lane_f16 (__c, __lane));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vfmsq_lane_f16 (float16x8_t __a, float16x8_t __b,
+		float16x4_t __c, const int __lane)
+{
+  return vfmsq_f16 (__a, __b, __aarch64_vdupq_lane_f16 (__c, __lane));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vfms_laneq_f16 (float16x4_t __a, float16x4_t __b,
+		float16x8_t __c, const int __lane)
+{
+  return vfms_f16 (__a, __b, __aarch64_vdup_laneq_f16 (__c, __lane));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vfmsq_laneq_f16 (float16x8_t __a, float16x8_t __b,
+		 float16x8_t __c, const int __lane)
+{
+  return vfmsq_f16 (__a, __b, __aarch64_vdupq_laneq_f16 (__c, __lane));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vfms_n_f16 (float16x4_t __a, float16x4_t __b, float16_t __c)
+{
+  return vfms_f16 (__a, __b, vdup_n_f16 (__c));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vfmsq_n_f16 (float16x8_t __a, float16x8_t __b, float16_t __c)
+{
+  return vfmsq_f16 (__a, __b, vdupq_n_f16 (__c));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmul_lane_f16 (float16x4_t __a, float16x4_t __b, const int __lane)
+{
+  return vmul_f16 (__a, vdup_n_f16 (__aarch64_vget_lane_any (__b, __lane)));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulq_lane_f16 (float16x8_t __a, float16x4_t __b, const int __lane)
+{
+  return vmulq_f16 (__a, vdupq_n_f16 (__aarch64_vget_lane_any (__b, __lane)));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmul_laneq_f16 (float16x4_t __a, float16x8_t __b, const int __lane)
+{
+  return vmul_f16 (__a, vdup_n_f16 (__aarch64_vget_lane_any (__b, __lane)));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulq_laneq_f16 (float16x8_t __a, float16x8_t __b, const int __lane)
+{
+  return vmulq_f16 (__a, vdupq_n_f16 (__aarch64_vget_lane_any (__b, __lane)));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmul_n_f16 (float16x4_t __a, float16_t __b)
+{
+  return vmul_lane_f16 (__a, vdup_n_f16 (__b), 0);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulq_n_f16 (float16x8_t __a, float16_t __b)
+{
+  return vmulq_laneq_f16 (__a, vdupq_n_f16 (__b), 0);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmulx_lane_f16 (float16x4_t __a, float16x4_t __b, const int __lane)
+{
+  return vmulx_f16 (__a, __aarch64_vdup_lane_f16 (__b, __lane));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulxq_lane_f16 (float16x8_t __a, float16x4_t __b, const int __lane)
+{
+  return vmulxq_f16 (__a, __aarch64_vdupq_lane_f16 (__b, __lane));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmulx_laneq_f16 (float16x4_t __a, float16x8_t __b, const int __lane)
+{
+  return vmulx_f16 (__a, __aarch64_vdup_laneq_f16 (__b, __lane));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulxq_laneq_f16 (float16x8_t __a, float16x8_t __b, const int __lane)
+{
+  return vmulxq_f16 (__a, __aarch64_vdupq_laneq_f16 (__b, __lane));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmulx_n_f16 (float16x4_t __a, float16_t __b)
+{
+  return vmulx_f16 (__a, vdup_n_f16 (__b));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulxq_n_f16 (float16x8_t __a, float16_t __b)
+{
+  return vmulxq_f16 (__a, vdupq_n_f16 (__b));
+}
+
 #pragma GCC pop_options
 
 #undef __aarch64_vget_lane_any
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 35190b4..8d4dc6c 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -218,7 +218,10 @@
 (define_mode_iterator DX [DI DF])
 
 ;; Modes available for <f>mul lane operations.
-(define_mode_iterator VMUL [V4HI V8HI V2SI V4SI V2SF V4SF V2DF])
+(define_mode_iterator VMUL [V4HI V8HI V2SI V4SI
+			    (V4HF "TARGET_SIMD_F16INST")
+			    (V8HF "TARGET_SIMD_F16INST")
+			    V2SF V4SF V2DF])
 
 ;; Modes available for <f>mul lane operations changing lane count.
 (define_mode_iterator VMUL_CHANGE_NLANES [V4HI V8HI V2SI V4SI V2SF V4SF])
@@ -730,6 +733,7 @@
 		     (V4HI "")  (V8HI  "")
 		     (V2SI "")  (V4SI  "")
 		     (DI   "")  (V2DI  "")
+		     (V4HF "f") (V8HF  "f")
 		     (V2SF "f") (V4SF  "f")
 		     (V2DF "f") (DF    "f")])
 
@@ -738,6 +742,7 @@
 		      (V4HI "")  (V8HI  "")
 		      (V2SI "")  (V4SI  "")
 		      (DI   "")  (V2DI  "")
+		      (V4HF "_fp") (V8HF  "_fp")
 		      (V2SF "_fp") (V4SF  "_fp")
 		      (V2DF "_fp") (DF    "_fp")
 		      (SF "_fp")])
-- 
2.5.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [AArch64][4/14] ARMv8.2-A FP16 three operands vector intrinsics
       [not found]     ` <938d13c1-39be-5fe3-9997-e55942bbd163@foss.arm.com>
@ 2016-07-07 16:16       ` Jiong Wang
  2016-07-25 11:05         ` James Greenhalgh
       [not found]       ` <a12ecde7-2ac1-0539-334e-9a33395dd3eb@foss.arm.com>
  1 sibling, 1 reply; 32+ messages in thread
From: Jiong Wang @ 2016-07-07 16:16 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 504 bytes --]

This patch adds the ARMv8.2-A FP16 three-operand vector intrinsics.

The three-operand intrinsics comprise only fma and fms.
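
A note on operand order, with a hypothetical usage sketch (not part of
the patch): vfma_f16 (a, b, c) computes a + b * c, i.e. the addend
comes first, as in the existing single-precision vfma intrinsics:

#include <arm_neon.h>

/* acc + x * y in one fused step; with -march=armv8.2-a+fp16 this
   should map to a single FMLA Vd.4H, Vn.4H, Vm.4H.  */
float16x4_t
madd (float16x4_t acc, float16x4_t x, float16x4_t y)
{
  return vfma_f16 (acc, x, y);
}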

2016-07-07  Jiong Wang <jiong.wang@arm.com>

gcc/
         * config/aarch64/aarch64-simd-builtins.def: Register new builtins.
         * config/aarch64/aarch64-simd.md (fma<mode>4): Extend to HF modes.
         (fnma<mode>4): Likewise.
         * config/aarch64/arm_neon.h (vfma_f16): New.
         (vfmaq_f16): Likewise.
         (vfms_f16): Likewise.
         (vfmsq_f16): Likewise.


[-- Attachment #2: 0004-4-14-ARMv8.2-FP16-three-operands-vector-intrinsics.patch --]
[-- Type: text/x-patch, Size: 4034 bytes --]

From dc2121d586b759b864d9653e188a14d1f7296f25 Mon Sep 17 00:00:00 2001
From: Jiong Wang <jiong.wang@arm.com>
Date: Wed, 8 Jun 2016 10:21:25 +0100
Subject: [PATCH 04/14] [4/14] ARMv8.2 FP16 three operands vector intrinsics

---
 gcc/config/aarch64/aarch64-simd-builtins.def |  4 +++-
 gcc/config/aarch64/aarch64-simd.md           | 28 ++++++++++++++--------------
 gcc/config/aarch64/arm_neon.h                | 26 ++++++++++++++++++++++++++
 3 files changed, 43 insertions(+), 15 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index fe17298..6ff5063 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -405,7 +405,9 @@
   BUILTIN_VALL_F16 (STORE1, st1, 0)
 
   /* Implemented by fma<mode>4.  */
-  BUILTIN_VDQF (TERNOP, fma, 4)
+  BUILTIN_VHSDF (TERNOP, fma, 4)
+  /* Implemented by fnma<mode>4.  */
+  BUILTIN_VHSDF (TERNOP, fnma, 4)
 
   /* Implemented by aarch64_simd_bsl<mode>.  */
   BUILTIN_VDQQH (BSL_P, simd_bsl, 0)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 0a80adb..576ad3c 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1526,13 +1526,13 @@
 )
 
 (define_insn "fma<mode>4"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-       (fma:VDQF (match_operand:VDQF 1 "register_operand" "w")
-                (match_operand:VDQF 2 "register_operand" "w")
-                (match_operand:VDQF 3 "register_operand" "0")))]
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+       (fma:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
+		  (match_operand:VHSDF 2 "register_operand" "w")
+		  (match_operand:VHSDF 3 "register_operand" "0")))]
   "TARGET_SIMD"
  "fmla\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
-  [(set_attr "type" "neon_fp_mla_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_mla_<stype><q>")]
 )
 
 (define_insn "*aarch64_fma4_elt<mode>"
@@ -1599,15 +1599,15 @@
 )
 
 (define_insn "fnma<mode>4"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-	(fma:VDQF
-	  (match_operand:VDQF 1 "register_operand" "w")
-          (neg:VDQF
-	    (match_operand:VDQF 2 "register_operand" "w"))
-	  (match_operand:VDQF 3 "register_operand" "0")))]
-  "TARGET_SIMD"
- "fmls\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
-  [(set_attr "type" "neon_fp_mla_<Vetype><q>")]
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+	(fma:VHSDF
+	  (match_operand:VHSDF 1 "register_operand" "w")
+          (neg:VHSDF
+	    (match_operand:VHSDF 2 "register_operand" "w"))
+	  (match_operand:VHSDF 3 "register_operand" "0")))]
+  "TARGET_SIMD"
+  "fmls\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
+  [(set_attr "type" "neon_fp_mla_<stype><q>")]
 )
 
 (define_insn "*aarch64_fnma4_elt<mode>"
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index e78ff43..ad5b6fa 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -26458,6 +26458,32 @@ vsubq_f16 (float16x8_t __a, float16x8_t __b)
   return __a - __b;
 }
 
+/* ARMv8.2-A FP16 three operands vector intrinsics.  */
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vfma_f16 (float16x4_t __a, float16x4_t __b, float16x4_t __c)
+{
+  return __builtin_aarch64_fmav4hf (__b, __c, __a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vfmaq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
+{
+  return __builtin_aarch64_fmav8hf (__b, __c, __a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vfms_f16 (float16x4_t __a, float16x4_t __b, float16x4_t __c)
+{
+  return __builtin_aarch64_fnmav4hf (__b, __c, __a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vfmsq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
+{
+  return __builtin_aarch64_fnmav8hf (__b, __c, __a);
+}
+
 #pragma GCC pop_options
 
 #undef __aarch64_vget_lane_any
-- 
2.5.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [AArch64][7/14] ARMv8.2-A FP16 one operand scalar intrinsics
       [not found]           ` <cf21a824-01c3-0969-d12b-884c4e70e7f1@foss.arm.com>
@ 2016-07-07 16:17             ` Jiong Wang
       [not found]               ` <b6150268-1e2d-3fc6-17c9-7bde47e2534e@foss.arm.com>
       [not found]             ` <c9ed296a-1105-6bda-1927-e72be567c590@foss.arm.com>
  1 sibling, 1 reply; 32+ messages in thread
From: Jiong Wang @ 2016-07-07 16:17 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 3301 bytes --]

This patch adds the ARMv8.2-A FP16 one-operand scalar intrinsics.

Scalar intrinsics are kept in arm_fp16.h instead of arm_neon.h.
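
A hypothetical usage sketch (not part of the patch), assuming
-march=armv8.2-a+fp16; only arm_fp16.h is needed for the scalar forms:

#include <arm_fp16.h>

/* Round x to nearest-even, then test the result for >= 0.0;
   vrndnh_f16 maps to FRINTN and vcgezh_f16 to a scalar FCMGE
   against zero, both on H registers.  */
uint16_t
round_is_nonneg (float16_t x)
{
  return vcgezh_f16 (vrndnh_f16 (x));
}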

gcc/
2016-07-07  Jiong Wang <jiong.wang@arm.com>

         * config.gcc (aarch64*-*-*): Install arm_fp16.h.
         * config/aarch64/aarch64-builtins.c (hi_UP): New.
         * config/aarch64/aarch64-simd-builtins.def: Register new builtins.
         * config/aarch64/aarch64-simd.md (aarch64_rsqrte<mode>): Extend
         to HF mode.
         (aarch64_frecp<FRECP:frecp_suffix><mode>): Likewise.
         (aarch64_cm<optab><mode>): Likewise.
         * config/aarch64/aarch64.md (<frint_pattern><mode>2): Likewise.
         (l<fcvt_pattern><su_optab><GPF:mode><GPI:mode>2): Likewise.
         (fix_trunc<GPF:mode><GPI:mode>2): Likewise.
         (sqrt<mode>2): Likewise.
         (abs<mode>2): Likewise.
         (<optab><mode>hf2): New pattern for HF mode.
         (<optab>hihf2): Likewise.
         * config/aarch64/arm_neon.h: Include arm_fp16.h.
         * config/aarch64/iterators.md (GPF_F16): New.
         (GPI_I16): Likewise.
         (VHSDF_HSDF): Likewise.
         (w1): Support HF mode.
         (w2): Likewise.
         (v): Likewise.
         (s): Likewise.
         (q): Likewise.
         (Vmtype): Likewise.
         (V_cmp_result): Likewise.
         (fcvt_iesize): Likewise.
         (FCVT_IESIZE): Likewise.
         * config/aarch64/arm_fp16.h: New file.
         (vabsh_f16): New.
         (vceqzh_f16): Likewise.
         (vcgezh_f16): Likewise.
         (vcgtzh_f16): Likewise.
         (vclezh_f16): Likewise.
         (vcltzh_f16): Likewise.
         (vcvth_f16_s16): Likewise.
         (vcvth_f16_s32): Likewise.
         (vcvth_f16_s64): Likewise.
         (vcvth_f16_u16): Likewise.
         (vcvth_f16_u32): Likewise.
         (vcvth_f16_u64): Likewise.
         (vcvth_s16_f16): Likewise.
         (vcvth_s32_f16): Likewise.
         (vcvth_s64_f16): Likewise.
         (vcvth_u16_f16): Likewise.
         (vcvth_u32_f16): Likewise.
         (vcvth_u64_f16): Likewise.
         (vcvtah_s16_f16): Likewise.
         (vcvtah_s32_f16): Likewise.
         (vcvtah_s64_f16): Likewise.
         (vcvtah_u16_f16): Likewise.
         (vcvtah_u32_f16): Likewise.
         (vcvtah_u64_f16): Likewise.
         (vcvtmh_s16_f16): Likewise.
         (vcvtmh_s32_f16): Likewise.
         (vcvtmh_s64_f16): Likewise.
         (vcvtmh_u16_f16): Likewise.
         (vcvtmh_u32_f16): Likewise.
         (vcvtmh_u64_f16): Likewise.
         (vcvtnh_s16_f16): Likewise.
         (vcvtnh_s32_f16): Likewise.
         (vcvtnh_s64_f16): Likewise.
         (vcvtnh_u16_f16): Likewise.
         (vcvtnh_u32_f16): Likewise.
         (vcvtnh_u64_f16): Likewise.
         (vcvtph_s16_f16): Likewise.
         (vcvtph_s32_f16): Likewise.
         (vcvtph_s64_f16): Likewise.
         (vcvtph_u16_f16): Likewise.
         (vcvtph_u32_f16): Likewise.
         (vcvtph_u64_f16): Likewise.
         (vnegh_f16): Likewise.
         (vrecpeh_f16): Likewise.
         (vrecpxh_f16): Likewise.
         (vrndh_f16): Likewise.
         (vrndah_f16): Likewise.
         (vrndih_f16): Likewise.
         (vrndmh_f16): Likewise.
         (vrndnh_f16): Likewise.
         (vrndph_f16): Likewise.
         (vrndxh_f16): Likewise.
         (vrsqrteh_f16): Likewise.
         (vsqrth_f16): Likewise.

[-- Attachment #2: 0007-7-14-ARMv8.2-FP16-one-operand-scalar-intrinsics.patch --]
[-- Type: text/x-patch, Size: 27873 bytes --]

From f5f32c0867397594ae4e914acc69bc30d9b15ce9 Mon Sep 17 00:00:00 2001
From: Jiong Wang <jiong.wang@arm.com>
Date: Wed, 8 Jun 2016 10:31:40 +0100
Subject: [PATCH 07/14] [7/14] ARMv8.2 FP16 one operand scalar intrinsics

---
 gcc/config.gcc                               |   2 +-
 gcc/config/aarch64/aarch64-builtins.c        |   1 +
 gcc/config/aarch64/aarch64-simd-builtins.def |  54 +++-
 gcc/config/aarch64/aarch64-simd.md           |  42 ++-
 gcc/config/aarch64/aarch64.md                |  52 ++--
 gcc/config/aarch64/arm_fp16.h                | 365 +++++++++++++++++++++++++++
 gcc/config/aarch64/arm_neon.h                |   2 +
 gcc/config/aarch64/iterators.md              |  32 ++-
 8 files changed, 495 insertions(+), 55 deletions(-)
 create mode 100644 gcc/config/aarch64/arm_fp16.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index e47535b..13fefee 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -307,7 +307,7 @@ m32c*-*-*)
         ;;
 aarch64*-*-*)
 	cpu_type=aarch64
-	extra_headers="arm_neon.h arm_acle.h"
+	extra_headers="arm_fp16.h arm_neon.h arm_acle.h"
 	c_target_objs="aarch64-c.o"
 	cxx_target_objs="aarch64-c.o"
 	extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o"
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index af5fac5..ca91d91 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -62,6 +62,7 @@
 #define si_UP    SImode
 #define sf_UP    SFmode
 #define hi_UP    HImode
+#define hf_UP    HFmode
 #define qi_UP    QImode
 #define UP(X) X##_UP
 
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 64c5f86..6a74daa 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -274,6 +274,14 @@
   BUILTIN_VHSDF (UNOP, round, 2)
   BUILTIN_VHSDF_DF (UNOP, frintn, 2)
 
+  VAR1 (UNOP, btrunc, 2, hf)
+  VAR1 (UNOP, ceil, 2, hf)
+  VAR1 (UNOP, floor, 2, hf)
+  VAR1 (UNOP, frintn, 2, hf)
+  VAR1 (UNOP, nearbyint, 2, hf)
+  VAR1 (UNOP, rint, 2, hf)
+  VAR1 (UNOP, round, 2, hf)
+
   /* Implemented by l<fcvt_pattern><su_optab><VQDF:mode><vcvt_target>2.  */
   VAR1 (UNOP, lbtruncv4hf, 2, v4hi)
   VAR1 (UNOP, lbtruncv8hf, 2, v8hi)
@@ -292,7 +300,8 @@
   VAR1 (UNOP, lroundv2sf, 2, v2si)
   VAR1 (UNOP, lroundv4sf, 2, v4si)
   VAR1 (UNOP, lroundv2df, 2, v2di)
-  /* Implemented by l<fcvt_pattern><su_optab><GPF:mode><GPI:mode>2.  */
+  /* Implemented by l<fcvt_pattern><su_optab><GPF_F16:mode><GPI:mode>2.  */
+  BUILTIN_GPI_I16 (UNOP, lroundhf, 2)
   VAR1 (UNOP, lroundsf, 2, si)
   VAR1 (UNOP, lrounddf, 2, di)
 
@@ -301,6 +310,7 @@
   VAR1 (UNOPUS, lrounduv2sf, 2, v2si)
   VAR1 (UNOPUS, lrounduv4sf, 2, v4si)
   VAR1 (UNOPUS, lrounduv2df, 2, v2di)
+  BUILTIN_GPI_I16 (UNOPUS, lrounduhf, 2)
   VAR1 (UNOPUS, lroundusf, 2, si)
   VAR1 (UNOPUS, lroundudf, 2, di)
 
@@ -309,12 +319,14 @@
   VAR1 (UNOP, lceilv2sf, 2, v2si)
   VAR1 (UNOP, lceilv4sf, 2, v4si)
   VAR1 (UNOP, lceilv2df, 2, v2di)
+  BUILTIN_GPI_I16 (UNOP, lceilhf, 2)
 
   VAR1 (UNOPUS, lceiluv4hf, 2, v4hi)
   VAR1 (UNOPUS, lceiluv8hf, 2, v8hi)
   VAR1 (UNOPUS, lceiluv2sf, 2, v2si)
   VAR1 (UNOPUS, lceiluv4sf, 2, v4si)
   VAR1 (UNOPUS, lceiluv2df, 2, v2di)
+  BUILTIN_GPI_I16 (UNOPUS, lceiluhf, 2)
   VAR1 (UNOPUS, lceilusf, 2, si)
   VAR1 (UNOPUS, lceiludf, 2, di)
 
@@ -323,12 +335,14 @@
   VAR1 (UNOP, lfloorv2sf, 2, v2si)
   VAR1 (UNOP, lfloorv4sf, 2, v4si)
   VAR1 (UNOP, lfloorv2df, 2, v2di)
+  BUILTIN_GPI_I16 (UNOP, lfloorhf, 2)
 
   VAR1 (UNOPUS, lflooruv4hf, 2, v4hi)
   VAR1 (UNOPUS, lflooruv8hf, 2, v8hi)
   VAR1 (UNOPUS, lflooruv2sf, 2, v2si)
   VAR1 (UNOPUS, lflooruv4sf, 2, v4si)
   VAR1 (UNOPUS, lflooruv2df, 2, v2di)
+  BUILTIN_GPI_I16 (UNOPUS, lflooruhf, 2)
   VAR1 (UNOPUS, lfloorusf, 2, si)
   VAR1 (UNOPUS, lfloorudf, 2, di)
 
@@ -337,6 +351,7 @@
   VAR1 (UNOP, lfrintnv2sf, 2, v2si)
   VAR1 (UNOP, lfrintnv4sf, 2, v4si)
   VAR1 (UNOP, lfrintnv2df, 2, v2di)
+  BUILTIN_GPI_I16 (UNOP, lfrintnhf, 2)
   VAR1 (UNOP, lfrintnsf, 2, si)
   VAR1 (UNOP, lfrintndf, 2, di)
 
@@ -345,6 +360,7 @@
   VAR1 (UNOPUS, lfrintnuv2sf, 2, v2si)
   VAR1 (UNOPUS, lfrintnuv4sf, 2, v4si)
   VAR1 (UNOPUS, lfrintnuv2df, 2, v2di)
+  BUILTIN_GPI_I16 (UNOPUS, lfrintnuhf, 2)
   VAR1 (UNOPUS, lfrintnusf, 2, si)
   VAR1 (UNOPUS, lfrintnudf, 2, di)
 
@@ -376,9 +392,9 @@
 
   /* Implemented by
      aarch64_frecp<FRECP:frecp_suffix><mode>.  */
-  BUILTIN_GPF (UNOP, frecpe, 0)
+  BUILTIN_GPF_F16 (UNOP, frecpe, 0)
   BUILTIN_GPF (BINOP, frecps, 0)
-  BUILTIN_GPF (UNOP, frecpx, 0)
+  BUILTIN_GPF_F16 (UNOP, frecpx, 0)
 
   BUILTIN_VDQ_SI (UNOP, urecpe, 0)
 
@@ -389,6 +405,7 @@
      only ever used for the int64x1_t intrinsic, there is no scalar version.  */
   BUILTIN_VSDQ_I_DI (UNOP, abs, 0)
   BUILTIN_VHSDF (UNOP, abs, 2)
+  VAR1 (UNOP, abs, 2, hf)
 
   BUILTIN_VQ_HSF (UNOP, vec_unpacks_hi_, 10)
   VAR1 (BINOP, float_truncate_hi_, 0, v4sf)
@@ -483,7 +500,7 @@
   BUILTIN_VHSDF_SDF (SHIFTIMM_USS, fcvtzu, 3)
 
   /* Implemented by aarch64_rsqrte<mode>.  */
-  BUILTIN_VHSDF_SDF (UNOP, rsqrte, 0)
+  BUILTIN_VHSDF_HSDF (UNOP, rsqrte, 0)
 
   /* Implemented by aarch64_rsqrts<mode>.  */
   BUILTIN_VHSDF_SDF (BINOP, rsqrts, 0)
@@ -495,17 +512,34 @@
   BUILTIN_VHSDF (BINOP, faddp, 0)
 
   /* Implemented by aarch64_cm<optab><mode>.  */
-  BUILTIN_VHSDF_SDF (BINOP_USS, cmeq, 0)
-  BUILTIN_VHSDF_SDF (BINOP_USS, cmge, 0)
-  BUILTIN_VHSDF_SDF (BINOP_USS, cmgt, 0)
-  BUILTIN_VHSDF_SDF (BINOP_USS, cmle, 0)
-  BUILTIN_VHSDF_SDF (BINOP_USS, cmlt, 0)
+  BUILTIN_VHSDF_HSDF (BINOP_USS, cmeq, 0)
+  BUILTIN_VHSDF_HSDF (BINOP_USS, cmge, 0)
+  BUILTIN_VHSDF_HSDF (BINOP_USS, cmgt, 0)
+  BUILTIN_VHSDF_HSDF (BINOP_USS, cmle, 0)
+  BUILTIN_VHSDF_HSDF (BINOP_USS, cmlt, 0)
 
   /* Implemented by neg<mode>2.  */
-  BUILTIN_VHSDF (UNOP, neg, 2)
+  BUILTIN_VHSDF_HSDF (UNOP, neg, 2)
 
   /* Implemented by aarch64_fac<optab><mode>.  */
   BUILTIN_VHSDF_SDF (BINOP_USS, faclt, 0)
   BUILTIN_VHSDF_SDF (BINOP_USS, facle, 0)
   BUILTIN_VHSDF_SDF (BINOP_USS, facgt, 0)
   BUILTIN_VHSDF_SDF (BINOP_USS, facge, 0)
+
+  /* Implemented by sqrt<mode>2.  */
+  VAR1 (UNOP, sqrt, 2, hf)
+
+  /* Implemented by <optab><mode>hf2.  */
+  VAR1 (UNOP, floatdi, 2, hf)
+  VAR1 (UNOP, floatsi, 2, hf)
+  VAR1 (UNOP, floathi, 2, hf)
+  VAR1 (UNOPUS, floatunsdi, 2, hf)
+  VAR1 (UNOPUS, floatunssi, 2, hf)
+  VAR1 (UNOPUS, floatunshi, 2, hf)
+  BUILTIN_GPI_I16 (UNOP, fix_trunchf, 2)
+  BUILTIN_GPI (UNOP, fix_truncsf, 2)
+  BUILTIN_GPI (UNOP, fix_truncdf, 2)
+  BUILTIN_GPI_I16 (UNOPUS, fixuns_trunchf, 2)
+  BUILTIN_GPI (UNOPUS, fixuns_truncsf, 2)
+  BUILTIN_GPI (UNOPUS, fixuns_truncdf, 2)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index d5b25fa..6e6c4ac 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -383,8 +383,8 @@
 )
 
 (define_insn "aarch64_rsqrte<mode>"
-  [(set (match_operand:VHSDF_SDF 0 "register_operand" "=w")
-	(unspec:VHSDF_SDF [(match_operand:VHSDF_SDF 1 "register_operand" "w")]
+  [(set (match_operand:VHSDF_HSDF 0 "register_operand" "=w")
+	(unspec:VHSDF_HSDF [(match_operand:VHSDF_HSDF 1 "register_operand" "w")]
 		     UNSPEC_RSQRTE))]
   "TARGET_SIMD"
   "frsqrte\\t%<v>0<Vmtype>, %<v>1<Vmtype>"
@@ -1700,6 +1700,32 @@
   [(set_attr "type" "neon_fp_to_int_<stype><q>")]
 )
 
+;; HF Scalar variants of related SIMD instructions.
+(define_insn "l<fcvt_pattern><su_optab>hfhi2"
+  [(set (match_operand:HI 0 "register_operand" "=w")
+	(FIXUORS:HI (unspec:HF [(match_operand:HF 1 "register_operand" "w")]
+		      FCVT)))]
+  "TARGET_SIMD_F16INST"
+  "fcvt<frint_suffix><su>\t%h0, %h1"
+  [(set_attr "type" "neon_fp_to_int_s")]
+)
+
+(define_insn "<optab>_trunchfhi2"
+  [(set (match_operand:HI 0 "register_operand" "=w")
+        (FIXUORS:HI (match_operand:HF 1 "register_operand" "w")))]
+  "TARGET_SIMD_F16INST"
+  "fcvtz<su>\t%h0, %h1"
+  [(set_attr "type" "neon_fp_to_int_s")]
+)
+
+(define_insn "<optab>hihf2"
+  [(set (match_operand:HF 0 "register_operand" "=w")
+        (FLOATUORS:HF (match_operand:HI 1 "register_operand" "w")))]
+  "TARGET_SIMD_F16INST"
+  "<su_optab>cvtf\t%h0, %h1"
+  [(set_attr "type" "neon_int_to_fp_s")]
+)
+
 (define_insn "*aarch64_fcvt<su_optab><VDQF:mode><fcvt_target>2_mult"
   [(set (match_operand:<FCVT_TARGET> 0 "register_operand" "=w")
 	(FIXUORS:<FCVT_TARGET> (unspec:<FCVT_TARGET>
@@ -4246,8 +4272,8 @@
   [(set (match_operand:<V_cmp_result> 0 "register_operand" "=w,w")
 	(neg:<V_cmp_result>
 	  (COMPARISONS:<V_cmp_result>
-	    (match_operand:VHSDF_SDF 1 "register_operand" "w,w")
-	    (match_operand:VHSDF_SDF 2 "aarch64_simd_reg_or_zero" "w,YDz")
+	    (match_operand:VHSDF_HSDF 1 "register_operand" "w,w")
+	    (match_operand:VHSDF_HSDF 2 "aarch64_simd_reg_or_zero" "w,YDz")
 	  )))]
   "TARGET_SIMD"
   "@
@@ -5365,12 +5391,12 @@
 )
 
 (define_insn "aarch64_frecp<FRECP:frecp_suffix><mode>"
-  [(set (match_operand:GPF 0 "register_operand" "=w")
-	(unspec:GPF [(match_operand:GPF 1 "register_operand" "w")]
-		    FRECP))]
+  [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+	(unspec:GPF_F16 [(match_operand:GPF_F16 1 "register_operand" "w")]
+	 FRECP))]
   "TARGET_SIMD"
   "frecp<FRECP:frecp_suffix>\\t%<s>0, %<s>1"
-  [(set_attr "type" "neon_fp_recp<FRECP:frecp_suffix>_<GPF:Vetype><GPF:q>")]
+  [(set_attr "type" "neon_fp_recp<FRECP:frecp_suffix>_<GPF_F16:stype><GPF_F16:q>")]
 )
 
 (define_insn "aarch64_frecps<mode>"
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b3ae42b..520026d 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4452,22 +4452,22 @@
 ;; Expands to btrunc, ceil, floor, nearbyint, rint, round, frintn.
 
 (define_insn "<frint_pattern><mode>2"
-  [(set (match_operand:GPF 0 "register_operand" "=w")
-	(unspec:GPF [(match_operand:GPF 1 "register_operand" "w")]
+  [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+	(unspec:GPF_F16 [(match_operand:GPF_F16 1 "register_operand" "w")]
 	 FRINT))]
   "TARGET_FLOAT"
   "frint<frint_suffix>\\t%<s>0, %<s>1"
-  [(set_attr "type" "f_rint<s>")]
+  [(set_attr "type" "f_rint<stype>")]
 )
 
 ;; frcvt floating-point round to integer and convert standard patterns.
 ;; Expands to lbtrunc, lceil, lfloor, lround.
-(define_insn "l<fcvt_pattern><su_optab><GPF:mode><GPI:mode>2"
+(define_insn "l<fcvt_pattern><su_optab><GPF_F16:mode><GPI:mode>2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
-	(FIXUORS:GPI (unspec:GPF [(match_operand:GPF 1 "register_operand" "w")]
+	(FIXUORS:GPI (unspec:GPF_F16 [(match_operand:GPF_F16 1 "register_operand" "w")]
 		      FCVT)))]
   "TARGET_FLOAT"
-  "fcvt<frint_suffix><su>\\t%<GPI:w>0, %<GPF:s>1"
+  "fcvt<frint_suffix><su>\\t%<GPI:w>0, %<GPF_F16:s>1"
   [(set_attr "type" "f_cvtf2i")]
 )
 
@@ -4595,19 +4595,11 @@
   [(set_attr "type" "f_cvt")]
 )
 
-(define_insn "fix_trunc<GPF:mode><GPI:mode>2"
+(define_insn "<optab>_trunc<GPF_F16:mode><GPI:mode>2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
-        (fix:GPI (match_operand:GPF 1 "register_operand" "w")))]
+        (FIXUORS:GPI (match_operand:GPF_F16 1 "register_operand" "w")))]
   "TARGET_FLOAT"
-  "fcvtzs\\t%<GPI:w>0, %<GPF:s>1"
-  [(set_attr "type" "f_cvtf2i")]
-)
-
-(define_insn "fixuns_trunc<GPF:mode><GPI:mode>2"
-  [(set (match_operand:GPI 0 "register_operand" "=r")
-        (unsigned_fix:GPI (match_operand:GPF 1 "register_operand" "w")))]
-  "TARGET_FLOAT"
-  "fcvtzu\\t%<GPI:w>0, %<GPF:s>1"
+  "fcvtz<su>\t%<GPI:w>0, %<GPF_F16:s>1"
   [(set_attr "type" "f_cvtf2i")]
 )
 
@@ -4631,6 +4623,14 @@
   [(set_attr "type" "f_cvti2f")]
 )
 
+(define_insn "<optab><mode>hf2"
+  [(set (match_operand:HF 0 "register_operand" "=w")
+        (FLOATUORS:HF (match_operand:GPI 1 "register_operand" "r")))]
+  "TARGET_FP_F16INST"
+  "<su_optab>cvtf\t%h0, %<w>1"
+  [(set_attr "type" "f_cvti2f")]
+)
+
 ;; Convert between fixed-point and floating-point (scalar modes)
 
 (define_insn "<FCVT_F2FIXED:fcvt_fixed_insn><GPF:mode>3"
@@ -4726,27 +4726,27 @@
 )
 
 (define_insn "neg<mode>2"
-  [(set (match_operand:GPF 0 "register_operand" "=w")
-        (neg:GPF (match_operand:GPF 1 "register_operand" "w")))]
+  [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+        (neg:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")))]
   "TARGET_FLOAT"
   "fneg\\t%<s>0, %<s>1"
-  [(set_attr "type" "ffarith<s>")]
+  [(set_attr "type" "ffarith<stype>")]
 )
 
 (define_insn "sqrt<mode>2"
-  [(set (match_operand:GPF 0 "register_operand" "=w")
-        (sqrt:GPF (match_operand:GPF 1 "register_operand" "w")))]
+  [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+        (sqrt:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")))]
   "TARGET_FLOAT"
   "fsqrt\\t%<s>0, %<s>1"
-  [(set_attr "type" "fsqrt<s>")]
+  [(set_attr "type" "fsqrt<stype>")]
 )
 
 (define_insn "abs<mode>2"
-  [(set (match_operand:GPF 0 "register_operand" "=w")
-        (abs:GPF (match_operand:GPF 1 "register_operand" "w")))]
+  [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+        (abs:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")))]
   "TARGET_FLOAT"
   "fabs\\t%<s>0, %<s>1"
-  [(set_attr "type" "ffarith<s>")]
+  [(set_attr "type" "ffarith<stype>")]
 )
 
 ;; Given that smax/smin do not specify the result when either input is NaN,
diff --git a/gcc/config/aarch64/arm_fp16.h b/gcc/config/aarch64/arm_fp16.h
new file mode 100644
index 0000000..818aa61
--- /dev/null
+++ b/gcc/config/aarch64/arm_fp16.h
@@ -0,0 +1,365 @@
+/* ARM FP16 scalar intrinsics include file.
+
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   Contributed by ARM Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _AARCH64_FP16_H_
+#define _AARCH64_FP16_H_
+
+#include <stdint.h>
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+fp16")
+
+typedef __fp16 float16_t;
+
+/* ARMv8.2-A FP16 one operand scalar intrinsics.  */
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vabsh_f16 (float16_t __a)
+{
+  return __builtin_aarch64_abshf (__a);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vceqzh_f16 (float16_t __a)
+{
+  return __builtin_aarch64_cmeqhf_uss (__a, 0.0f);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcgezh_f16 (float16_t __a)
+{
+  return __builtin_aarch64_cmgehf_uss (__a, 0.0f);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcgtzh_f16 (float16_t __a)
+{
+  return __builtin_aarch64_cmgthf_uss (__a, 0.0f);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vclezh_f16 (float16_t __a)
+{
+  return __builtin_aarch64_cmlehf_uss (__a, 0.0f);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcltzh_f16 (float16_t __a)
+{
+  return __builtin_aarch64_cmlthf_uss (__a, 0.0f);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_f16_s16 (int16_t __a)
+{
+  return __builtin_aarch64_floathihf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_f16_s32 (int32_t __a)
+{
+  return __builtin_aarch64_floatsihf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_f16_s64 (int64_t __a)
+{
+  return __builtin_aarch64_floatdihf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_f16_u16 (uint16_t __a)
+{
+  return __builtin_aarch64_floatunshihf_us (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_f16_u32 (uint32_t __a)
+{
+  return __builtin_aarch64_floatunssihf_us (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_f16_u64 (uint64_t __a)
+{
+  return __builtin_aarch64_floatunsdihf_us (__a);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vcvth_s16_f16 (float16_t __a)
+{
+  return __builtin_aarch64_fix_trunchfhi (__a);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvth_s32_f16 (float16_t __a)
+{
+  return __builtin_aarch64_fix_trunchfsi (__a);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+vcvth_s64_f16 (float16_t __a)
+{
+  return __builtin_aarch64_fix_trunchfdi (__a);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcvth_u16_f16 (float16_t __a)
+{
+  return __builtin_aarch64_fixuns_trunchfhi_us (__a);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcvth_u32_f16 (float16_t __a)
+{
+  return __builtin_aarch64_fixuns_trunchfsi_us (__a);
+}
+
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+vcvth_u64_f16 (float16_t __a)
+{
+  return __builtin_aarch64_fixuns_trunchfdi_us (__a);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vcvtah_s16_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lroundhfhi (__a);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvtah_s32_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lroundhfsi (__a);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+vcvtah_s64_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lroundhfdi (__a);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcvtah_u16_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lrounduhfhi_us (__a);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcvtah_u32_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lrounduhfsi_us (__a);
+}
+
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+vcvtah_u64_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lrounduhfdi_us (__a);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vcvtmh_s16_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lfloorhfhi (__a);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvtmh_s32_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lfloorhfsi (__a);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+vcvtmh_s64_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lfloorhfdi (__a);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcvtmh_u16_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lflooruhfhi_us (__a);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcvtmh_u32_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lflooruhfsi_us (__a);
+}
+
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+vcvtmh_u64_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lflooruhfdi_us (__a);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vcvtnh_s16_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lfrintnhfhi (__a);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvtnh_s32_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lfrintnhfsi (__a);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+vcvtnh_s64_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lfrintnhfdi (__a);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcvtnh_u16_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lfrintnuhfhi_us (__a);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcvtnh_u32_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lfrintnuhfsi_us (__a);
+}
+
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+vcvtnh_u64_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lfrintnuhfdi_us (__a);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vcvtph_s16_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lceilhfhi (__a);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvtph_s32_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lceilhfsi (__a);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+vcvtph_s64_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lceilhfdi (__a);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcvtph_u16_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lceiluhfhi_us (__a);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcvtph_u32_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lceiluhfsi_us (__a);
+}
+
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+vcvtph_u64_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lceiluhfdi_us (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vnegh_f16 (float16_t __a)
+{
+  return __builtin_aarch64_neghf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrecpeh_f16 (float16_t __a)
+{
+  return __builtin_aarch64_frecpehf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrecpxh_f16 (float16_t __a)
+{
+  return __builtin_aarch64_frecpxhf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndh_f16 (float16_t __a)
+{
+  return __builtin_aarch64_btrunchf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndah_f16 (float16_t __a)
+{
+  return __builtin_aarch64_roundhf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndih_f16 (float16_t __a)
+{
+  return __builtin_aarch64_nearbyinthf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndmh_f16 (float16_t __a)
+{
+  return __builtin_aarch64_floorhf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndnh_f16 (float16_t __a)
+{
+  return __builtin_aarch64_frintnhf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndph_f16 (float16_t __a)
+{
+  return __builtin_aarch64_ceilhf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndxh_f16 (float16_t __a)
+{
+  return __builtin_aarch64_rinthf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrsqrteh_f16 (float16_t __a)
+{
+  return __builtin_aarch64_rsqrtehf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vsqrth_f16 (float16_t __a)
+{
+  return __builtin_aarch64_sqrthf (__a);
+}
+
+#pragma GCC pop_options
+
+#endif
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index f3e5d0e..e727ff1 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -25743,6 +25743,8 @@ __INTERLEAVE_LIST (zip)
 
 /* ARMv8.2-A FP16 intrinsics.  */
 
+#include "arm_fp16.h"
+
 #pragma GCC push_options
 #pragma GCC target ("arch=armv8.2-a+fp16")
 
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 011b937..20d0f1b 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -26,6 +26,9 @@
 ;; Iterator for General Purpose Integer registers (32- and 64-bit modes)
 (define_mode_iterator GPI [SI DI])
 
+;; Iterator for HI, SI, DI, some instructions can only work on these modes.
+(define_mode_iterator GPI_I16 [(HI "AARCH64_ISA_F16") SI DI])
+
 ;; Iterator for QI and HI modes
 (define_mode_iterator SHORT [QI HI])
 
@@ -38,6 +41,9 @@
 ;; Iterator for General Purpose Floating-point registers (32- and 64-bit modes)
 (define_mode_iterator GPF [SF DF])
 
+;; Iterator for all scalar floating point modes (HF, SF, DF)
+(define_mode_iterator GPF_F16 [(HF "AARCH64_ISA_F16") SF DF])
+
 ;; Iterator for all scalar floating point modes (HF, SF, DF and TF)
 (define_mode_iterator GPF_TF_F16 [HF SF DF TF])
 
@@ -102,6 +108,11 @@
 (define_mode_iterator VHSDF_SDF [(V4HF "TARGET_SIMD_F16INST")
 				 (V8HF "TARGET_SIMD_F16INST")
 				 V2SF V4SF V2DF SF DF])
+(define_mode_iterator VHSDF_HSDF [(V4HF "TARGET_SIMD_F16INST")
+				  (V8HF "TARGET_SIMD_F16INST")
+				  V2SF V4SF V2DF
+				  (HF "TARGET_SIMD_F16INST")
+				  SF DF])
 
 ;; Vector single Float modes.
 (define_mode_iterator VDQSF [V2SF V4SF])
@@ -372,8 +383,8 @@
 (define_mode_attr w [(QI "w") (HI "w") (SI "w") (DI "x") (SF "s") (DF "d")])
 
 ;; For inequal width int to float conversion
-(define_mode_attr w1 [(SF "w") (DF "x")])
-(define_mode_attr w2 [(SF "x") (DF "w")])
+(define_mode_attr w1 [(HF "w") (SF "w") (DF "x")])
+(define_mode_attr w2 [(HF "x") (SF "x") (DF "w")])
 
 (define_mode_attr short_mask [(HI "65535") (QI "255")])
 
@@ -385,7 +396,7 @@
 
 ;; For scalar usage of vector/FP registers
 (define_mode_attr v [(QI "b") (HI "h") (SI "s") (DI "d")
-		    (SF "s") (DF "d")
+		    (HF  "h") (SF "s") (DF "d")
 		    (V8QI "") (V16QI "")
 		    (V4HI "") (V8HI "")
 		    (V2SI "") (V4SI  "")
@@ -416,7 +427,7 @@
 (define_mode_attr vas [(DI "") (SI ".2s")])
 
 ;; Map a floating point mode to the appropriate register name prefix
-(define_mode_attr s [(SF "s") (DF "d")])
+(define_mode_attr s [(HF "h") (SF "s") (DF "d")])
 
 ;; Give the length suffix letter for a sign- or zero-extension.
 (define_mode_attr size [(QI "b") (HI "h") (SI "w")])
@@ -452,8 +463,8 @@
 			 (V4SF ".4s") (V2DF ".2d")
 			 (DI   "")    (SI   "")
 			 (HI   "")    (QI   "")
-			 (TI   "")    (SF   "")
-			 (DF   "")])
+			 (TI   "")    (HF   "")
+			 (SF   "")    (DF   "")])
 
 ;; Register suffix narrowed modes for VQN.
 (define_mode_attr Vmntype [(V8HI ".8b") (V4SI ".4h")
@@ -468,6 +479,7 @@
 			  (V2DI "d") (V4HF "h")
 			  (V8HF "h") (V2SF  "s")
 			  (V4SF "s") (V2DF  "d")
+			  (HF   "h")
 			  (SF   "s") (DF  "d")
 			  (QI "b")   (HI "h")
 			  (SI "s")   (DI "d")])
@@ -639,7 +651,7 @@
 				(V4HF "V4HI") (V8HF  "V8HI")
 				(V2SF "V2SI") (V4SF  "V4SI")
 				(V2DF "V2DI") (DF    "DI")
-				(SF   "SI")])
+				(SF   "SI")   (HF    "HI")])
 
 ;; Lower case mode of results of comparison operations.
 (define_mode_attr v_cmp_result [(V8QI "v8qi") (V16QI "v16qi")
@@ -702,8 +714,8 @@
 
 
 ;; for the inequal width integer to fp conversions
-(define_mode_attr fcvt_iesize [(SF "di") (DF "si")])
-(define_mode_attr FCVT_IESIZE [(SF "DI") (DF "SI")])
+(define_mode_attr fcvt_iesize [(HF "di") (SF "di") (DF "si")])
+(define_mode_attr FCVT_IESIZE [(HF "DI") (SF "DI") (DF "SI")])
 
 (define_mode_attr VSWAP_WIDTH [(V8QI "V16QI") (V16QI "V8QI")
 				(V4HI "V8HI") (V8HI  "V4HI")
@@ -757,7 +769,7 @@
 		     (V4HF "") (V8HF "_q")
 		     (V2SF "") (V4SF  "_q")
 			       (V2DF  "_q")
-		     (QI "") (HI "") (SI "") (DI "") (SF "") (DF "")])
+		     (QI "") (HI "") (SI "") (DI "") (HF "") (SF "") (DF "")])
 
 (define_mode_attr vp [(V8QI "v") (V16QI "v")
 		      (V4HI "v") (V8HI  "v")
-- 
2.5.0





^ permalink raw reply	[flat|nested] 32+ messages in thread

* [AArch64][6/14] ARMv8.2-A FP16 reduction vector intrinsics
       [not found]         ` <a3eeda81-cb1c-6d9e-706d-c5c067a90d74@foss.arm.com>
@ 2016-07-07 16:17           ` Jiong Wang
  2016-07-25 11:06             ` James Greenhalgh
       [not found]           ` <cf21a824-01c3-0969-d12b-884c4e70e7f1@foss.arm.com>
  1 sibling, 1 reply; 32+ messages in thread
From: Jiong Wang @ 2016-07-07 16:17 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 465 bytes --]

This patch adds ARMv8.2-A FP16 reduction vector intrinsics.

gcc/
2016-07-07  Jiong Wang <jiong.wang@arm.com>

         * config/aarch64/arm_neon.h (vmaxv_f16): New.
         (vmaxvq_f16): Likewise.
         (vminv_f16): Likewise.
         (vminvq_f16): Likewise.
         (vmaxnmv_f16): Likewise.
         (vmaxnmvq_f16): Likewise.
         (vminnmv_f16): Likewise.
         (vminnmvq_f16): Likewise.
         * config/aarch64/iterators.md (vp): Support HF modes.
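
For illustration only (not part of the patch), a minimal sketch of the
new reduction intrinsics, assuming -march=armv8.2-a+fp16 on an AArch64
toolchain with this series applied:

#include <arm_neon.h>

/* Range (max minus min) of a 4-lane FP16 vector.  */
float16_t
vec_range_f16 (float16x4_t v)
{
  float16_t hi = vmaxv_f16 (v);   /* FMAXV: NaN-propagating maximum.  */
  float16_t lo = vminv_f16 (v);   /* FMINV: NaN-propagating minimum.  */
  return hi - lo;                 /* Scalar FP16 subtract on return.  */
}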


[-- Attachment #2: 0006-6-14-ARMv8.2-FP16-reduction-vector-intrinsics.patch --]
[-- Type: text/x-patch, Size: 5492 bytes --]

From 514e5d195867d2f53fac50804748976626748f81 Mon Sep 17 00:00:00 2001
From: Jiong Wang <jiong.wang@arm.com>
Date: Wed, 8 Jun 2016 10:23:17 +0100
Subject: [PATCH 06/14] [6/14] ARMv8.2 FP16 reduction vector intrinsics

---
 gcc/config/aarch64/aarch64-simd-builtins.def |  8 ++---
 gcc/config/aarch64/aarch64-simd.md           | 12 +++----
 gcc/config/aarch64/arm_neon.h                | 50 ++++++++++++++++++++++++++++
 gcc/config/aarch64/iterators.md              |  7 ++--
 4 files changed, 65 insertions(+), 12 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 6ff5063..64c5f86 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -234,12 +234,12 @@
   BUILTIN_VALL (UNOP, reduc_plus_scal_, 10)
 
   /* Implemented by reduc_<maxmin_uns>_scal_<mode> (producing scalar).  */
-  BUILTIN_VDQIF (UNOP, reduc_smax_scal_, 10)
-  BUILTIN_VDQIF (UNOP, reduc_smin_scal_, 10)
+  BUILTIN_VDQIF_F16 (UNOP, reduc_smax_scal_, 10)
+  BUILTIN_VDQIF_F16 (UNOP, reduc_smin_scal_, 10)
   BUILTIN_VDQ_BHSI (UNOPU, reduc_umax_scal_, 10)
   BUILTIN_VDQ_BHSI (UNOPU, reduc_umin_scal_, 10)
-  BUILTIN_VDQF (UNOP, reduc_smax_nan_scal_, 10)
-  BUILTIN_VDQF (UNOP, reduc_smin_nan_scal_, 10)
+  BUILTIN_VHSDF (UNOP, reduc_smax_nan_scal_, 10)
+  BUILTIN_VHSDF (UNOP, reduc_smin_nan_scal_, 10)
 
   /* Implemented by <maxmin><mode>3.
      smax variants map to fmaxnm,
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index c0600df..d5b25fa 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2073,8 +2073,8 @@
 ;; gimple_fold'd to the REDUC_(MAX|MIN)_EXPR tree code.  (This is FP smax/smin).
 (define_expand "reduc_<maxmin_uns>_scal_<mode>"
   [(match_operand:<VEL> 0 "register_operand")
-   (unspec:VDQF [(match_operand:VDQF 1 "register_operand")]
-		FMAXMINV)]
+   (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")]
+		  FMAXMINV)]
   "TARGET_SIMD"
   {
     rtx elt = GEN_INT (ENDIAN_LANE_N (<MODE>mode, 0));
@@ -2121,12 +2121,12 @@
 )
 
 (define_insn "aarch64_reduc_<maxmin_uns>_internal<mode>"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
-       (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")]
-		    FMAXMINV))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+       (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")]
+		      FMAXMINV))]
  "TARGET_SIMD"
  "<maxmin_uns_op><vp>\\t%<Vetype>0, %1.<Vtype>"
-  [(set_attr "type" "neon_fp_reduc_minmax_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_reduc_minmax_<stype><q>")]
 )
 
 ;; aarch64_simd_bsl may compile to any of bsl/bif/bit depending on register
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index b09a3a7..f3e5d0e 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -26638,6 +26638,56 @@ vmulxq_n_f16 (float16x8_t __a, float16_t __b)
   return vmulxq_f16 (__a, vdupq_n_f16 (__b));
 }
 
+/* ARMv8.2-A FP16 reduction vector intrinsics.  */
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmaxv_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_reduc_smax_nan_scal_v4hf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmaxvq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_reduc_smax_nan_scal_v8hf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vminv_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_reduc_smin_nan_scal_v4hf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vminvq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_reduc_smin_nan_scal_v8hf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmaxnmv_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_reduc_smax_scal_v4hf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmaxnmvq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_reduc_smax_scal_v8hf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vminnmv_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_reduc_smin_scal_v4hf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vminnmvq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_reduc_smin_scal_v8hf (__a);
+}
+
 #pragma GCC pop_options
 
 #undef __aarch64_vget_lane_any
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 8d4dc6c..011b937 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -159,6 +159,8 @@
 
 ;; Vector modes except double int.
 (define_mode_iterator VDQIF [V8QI V16QI V4HI V8HI V2SI V4SI V2SF V4SF V2DF])
+(define_mode_iterator VDQIF_F16 [V8QI V16QI V4HI V8HI V2SI V4SI
+                                 V4HF V8HF V2SF V4SF V2DF])
 
 ;; Vector modes for S type.
 (define_mode_iterator VDQ_SI [V2SI V4SI])
@@ -760,8 +762,9 @@
 (define_mode_attr vp [(V8QI "v") (V16QI "v")
 		      (V4HI "v") (V8HI  "v")
 		      (V2SI "p") (V4SI  "v")
-		      (V2DI  "p") (V2DF  "p")
-		      (V2SF "p") (V4SF  "v")])
+		      (V2DI "p") (V2DF  "p")
+		      (V2SF "p") (V4SF  "v")
+		      (V4HF "v") (V8HF  "v")])
 
 (define_mode_attr vsi2qi [(V2SI "v8qi") (V4SI "v16qi")])
 (define_mode_attr VSI2QI [(V2SI "V8QI") (V4SI "V16QI")])
-- 
2.5.0





^ permalink raw reply	[flat|nested] 32+ messages in thread

* [AArch64][8/14] ARMv8.2-A FP16 two operands scalar intrinsics
       [not found]             ` <c9ed296a-1105-6bda-1927-e72be567c590@foss.arm.com>
       [not found]               ` <d91fc169-1317-55ed-c36c-6dc5dac088cc@foss.arm.com>
@ 2016-07-07 16:18               ` Jiong Wang
  2016-07-20 17:01                 ` Jiong Wang
  1 sibling, 1 reply; 32+ messages in thread
From: Jiong Wang @ 2016-07-07 16:18 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2199 bytes --]

This patch adds ARMv8.2-A FP16 two-operand scalar intrinsics.

gcc/
2016-07-07  Jiong Wang <jiong.wang@arm.com>

         * config/aarch64/aarch64-simd-builtins.def: Register new builtins.
         * config/aarch64/aarch64.md
         (<FCVT_F2FIXED:fcvt_fixed_insn>hf<mode>3): New.
         (<FCVT_FIXED2F:fcvt_fixed_insn><mode>hf3): Likewise.
         (add<mode>3): Likewise.
         (sub<mode>3): Likewise.
         (mul<mode>3): Likewise.
         (div<mode>3): Likewise.
         (<fmaxmin><mode>3): Extend to HF.
         * config/aarch64/aarch64-simd.md (aarch64_rsqrts<mode>): Likewise.
         (fabd<mode>3): Likewise.
         (<FCVT_F2FIXED:fcvt_fixed_insn><VHSDF_HSDF:mode>3): Likewise.
         (<FCVT_FIXED2F:fcvt_fixed_insn><VHSDI_HSDI:mode>3): Likewise.
         (aarch64_fmulx<mode>): Likewise.
         (aarch64_fac<optab><mode>): Likewise.
         (aarch64_frecps<mode>): Likewise.
         (<FCVT_F2FIXED:fcvt_fixed_insn>hfhi3): New.
         (<FCVT_FIXED2F:fcvt_fixed_insn>hihf3): Likewise.
         * config/aarch64/iterators.md (VHSDF_SDF): Delete.
         (VSDQ_HSDI): Support HI.
         (fcvt_target, FCVT_TARGET): Likewise.
         * config/aarch64/arm_fp16.h: (vaddh_f16): New.
         (vsubh_f16): Likewise.
         (vabdh_f16): Likewise.
         (vcageh_f16): Likewise.
         (vcagth_f16): Likewise.
         (vcaleh_f16): Likewise.
         (vcalth_f16): Likewise.
         (vceqh_f16): Likewise.
         (vcgeh_f16): Likewise.
         (vcgth_f16): Likewise.
         (vcleh_f16): Likewise.
         (vclth_f16): Likewise.
         (vcvth_n_f16_s16): Likewise.
         (vcvth_n_f16_s32): Likewise.
         (vcvth_n_f16_s64): Likewise.
         (vcvth_n_f16_u16): Likewise.
         (vcvth_n_f16_u32): Likewise.
         (vcvth_n_f16_u64): Likewise.
         (vcvth_n_s16_f16): Likewise.
         (vcvth_n_s32_f16): Likewise.
         (vcvth_n_s64_f16): Likewise.
         (vcvth_n_u16_f16): Likewise.
         (vcvth_n_u32_f16): Likewise.
         (vcvth_n_u64_f16): Likewise.
         (vdivh_f16): Likewise.
         (vmaxh_f16): Likewise.
         (vmaxnmh_f16): Likewise.
         (vminh_f16): Likewise.
         (vminnmh_f16): Likewise.
         (vmulh_f16): Likewise.
         (vmulxh_f16): Likewise.
         (vrecpsh_f16): Likewise.
         (vrsqrtsh_f16): Likewise.
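
For illustration only (not part of the patch), a minimal sketch of a
few of the new two-operand scalar intrinsics, assuming
-march=armv8.2-a+fp16:

#include <arm_fp16.h>

float16_t
scaled_distance (float16_t a, float16_t b)
{
  float16_t d = vabdh_f16 (a, b);         /* FABD: |a - b|.  */
  float16_t s = vcvth_n_f16_s16 (3, 1);   /* SCVTF #1: 3 / 2^1 = 1.5.  */
  /* FACGE: nonzero when |a| >= |b|.  */
  return vcageh_f16 (a, b) ? vmulh_f16 (d, s) : vdivh_f16 (d, s);
}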


[-- Attachment #2: 0008-8-14-ARMv8.2-FP16-two-operands-scalar-intrinsics.patch --]
[-- Type: text/x-patch, Size: 19003 bytes --]

From 59446f3e1ce914b1102320e0d81654f211fad07d Mon Sep 17 00:00:00 2001
From: Jiong Wang <jiong.wang@arm.com>
Date: Thu, 9 Jun 2016 11:02:39 +0100
Subject: [PATCH 08/14] [8/14] ARMv8.2 FP16 two operands scalar intrinsics

---
 gcc/config/aarch64/aarch64-simd-builtins.def |  31 +++--
 gcc/config/aarch64/aarch64-simd.md           |  40 +++---
 gcc/config/aarch64/aarch64.md                |  88 ++++++++----
 gcc/config/aarch64/arm_fp16.h                | 200 +++++++++++++++++++++++++++
 gcc/config/aarch64/iterators.md              |  11 +-
 5 files changed, 309 insertions(+), 61 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 6a74daa..b32fdfe 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -41,7 +41,7 @@
 
   BUILTIN_VDC (COMBINE, combine, 0)
   BUILTIN_VB (BINOP, pmul, 0)
-  BUILTIN_VHSDF_SDF (BINOP, fmulx, 0)
+  BUILTIN_VHSDF_HSDF (BINOP, fmulx, 0)
   BUILTIN_VHSDF_DF (UNOP, sqrt, 2)
   BUILTIN_VD_BHSI (BINOP, addp, 0)
   VAR1 (UNOP, addp, 0, di)
@@ -393,13 +393,12 @@
   /* Implemented by
      aarch64_frecp<FRECP:frecp_suffix><mode>.  */
   BUILTIN_GPF_F16 (UNOP, frecpe, 0)
-  BUILTIN_GPF (BINOP, frecps, 0)
   BUILTIN_GPF_F16 (UNOP, frecpx, 0)
 
   BUILTIN_VDQ_SI (UNOP, urecpe, 0)
 
   BUILTIN_VHSDF (UNOP, frecpe, 0)
-  BUILTIN_VHSDF (BINOP, frecps, 0)
+  BUILTIN_VHSDF_HSDF (BINOP, frecps, 0)
 
   /* Implemented by a mixture of abs2 patterns.  Note the DImode builtin is
      only ever used for the int64x1_t intrinsic, there is no scalar version.  */
@@ -496,17 +495,23 @@
   /* Implemented by <FCVT_F2FIXED/FIXED2F:fcvt_fixed_insn><*><*>3.  */
   BUILTIN_VSDQ_HSDI (SHIFTIMM, scvtf, 3)
   BUILTIN_VSDQ_HSDI (FCVTIMM_SUS, ucvtf, 3)
-  BUILTIN_VHSDF_SDF (SHIFTIMM, fcvtzs, 3)
-  BUILTIN_VHSDF_SDF (SHIFTIMM_USS, fcvtzu, 3)
+  BUILTIN_VHSDF_HSDF (SHIFTIMM, fcvtzs, 3)
+  BUILTIN_VHSDF_HSDF (SHIFTIMM_USS, fcvtzu, 3)
+  VAR1 (SHIFTIMM, scvtfsi, 3, hf)
+  VAR1 (SHIFTIMM, scvtfdi, 3, hf)
+  VAR1 (FCVTIMM_SUS, ucvtfsi, 3, hf)
+  VAR1 (FCVTIMM_SUS, ucvtfdi, 3, hf)
+  BUILTIN_GPI (SHIFTIMM, fcvtzshf, 3)
+  BUILTIN_GPI (SHIFTIMM_USS, fcvtzuhf, 3)
 
   /* Implemented by aarch64_rsqrte<mode>.  */
   BUILTIN_VHSDF_HSDF (UNOP, rsqrte, 0)
 
   /* Implemented by aarch64_rsqrts<mode>.  */
-  BUILTIN_VHSDF_SDF (BINOP, rsqrts, 0)
+  BUILTIN_VHSDF_HSDF (BINOP, rsqrts, 0)
 
   /* Implemented by fabd<mode>3.  */
-  BUILTIN_VHSDF_SDF (BINOP, fabd, 3)
+  BUILTIN_VHSDF_HSDF (BINOP, fabd, 3)
 
   /* Implemented by aarch64_faddp<mode>.  */
   BUILTIN_VHSDF (BINOP, faddp, 0)
@@ -522,10 +527,10 @@
   BUILTIN_VHSDF_HSDF (UNOP, neg, 2)
 
   /* Implemented by aarch64_fac<optab><mode>.  */
-  BUILTIN_VHSDF_SDF (BINOP_USS, faclt, 0)
-  BUILTIN_VHSDF_SDF (BINOP_USS, facle, 0)
-  BUILTIN_VHSDF_SDF (BINOP_USS, facgt, 0)
-  BUILTIN_VHSDF_SDF (BINOP_USS, facge, 0)
+  BUILTIN_VHSDF_HSDF (BINOP_USS, faclt, 0)
+  BUILTIN_VHSDF_HSDF (BINOP_USS, facle, 0)
+  BUILTIN_VHSDF_HSDF (BINOP_USS, facgt, 0)
+  BUILTIN_VHSDF_HSDF (BINOP_USS, facge, 0)
 
   /* Implemented by sqrt<mode>2.  */
   VAR1 (UNOP, sqrt, 2, hf)
@@ -543,3 +548,7 @@
   BUILTIN_GPI_I16 (UNOPUS, fixuns_trunchf, 2)
   BUILTIN_GPI (UNOPUS, fixuns_truncsf, 2)
   BUILTIN_GPI (UNOPUS, fixuns_truncdf, 2)
+
+  /* Implemented by <fmaxmin><mode>3.  */
+  VAR1 (BINOP, fmax, 3, hf)
+  VAR1 (BINOP, fmin, 3, hf)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 6e6c4ac..bc02833 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -391,9 +391,9 @@
   [(set_attr "type" "neon_fp_rsqrte_<stype><q>")])
 
 (define_insn "aarch64_rsqrts<mode>"
-  [(set (match_operand:VHSDF_SDF 0 "register_operand" "=w")
-	(unspec:VHSDF_SDF [(match_operand:VHSDF_SDF 1 "register_operand" "w")
-			   (match_operand:VHSDF_SDF 2 "register_operand" "w")]
+  [(set (match_operand:VHSDF_HSDF 0 "register_operand" "=w")
+	(unspec:VHSDF_HSDF [(match_operand:VHSDF_HSDF 1 "register_operand" "w")
+			    (match_operand:VHSDF_HSDF 2 "register_operand" "w")]
 	 UNSPEC_RSQRTS))]
   "TARGET_SIMD"
   "frsqrts\\t%<v>0<Vmtype>, %<v>1<Vmtype>, %<v>2<Vmtype>"
@@ -475,11 +475,11 @@
 )
 
 (define_insn "fabd<mode>3"
-  [(set (match_operand:VHSDF_SDF 0 "register_operand" "=w")
-	(abs:VHSDF_SDF
-	  (minus:VHSDF_SDF
-	    (match_operand:VHSDF_SDF 1 "register_operand" "w")
-	    (match_operand:VHSDF_SDF 2 "register_operand" "w"))))]
+  [(set (match_operand:VHSDF_HSDF 0 "register_operand" "=w")
+	(abs:VHSDF_HSDF
+	  (minus:VHSDF_HSDF
+	    (match_operand:VHSDF_HSDF 1 "register_operand" "w")
+	    (match_operand:VHSDF_HSDF 2 "register_operand" "w"))))]
   "TARGET_SIMD"
   "fabd\t%<v>0<Vmtype>, %<v>1<Vmtype>, %<v>2<Vmtype>"
   [(set_attr "type" "neon_fp_abd_<stype><q>")]
@@ -3021,10 +3021,10 @@
 ;; fmulx.
 
 (define_insn "aarch64_fmulx<mode>"
-  [(set (match_operand:VHSDF_SDF 0 "register_operand" "=w")
-	(unspec:VHSDF_SDF
-	  [(match_operand:VHSDF_SDF 1 "register_operand" "w")
-	   (match_operand:VHSDF_SDF 2 "register_operand" "w")]
+  [(set (match_operand:VHSDF_HSDF 0 "register_operand" "=w")
+	(unspec:VHSDF_HSDF
+	  [(match_operand:VHSDF_HSDF 1 "register_operand" "w")
+	   (match_operand:VHSDF_HSDF 2 "register_operand" "w")]
 	   UNSPEC_FMULX))]
  "TARGET_SIMD"
  "fmulx\t%<v>0<Vmtype>, %<v>1<Vmtype>, %<v>2<Vmtype>"
@@ -4290,10 +4290,10 @@
   [(set (match_operand:<V_cmp_result> 0 "register_operand" "=w")
 	(neg:<V_cmp_result>
 	  (FAC_COMPARISONS:<V_cmp_result>
-	    (abs:VHSDF_SDF
-	      (match_operand:VHSDF_SDF 1 "register_operand" "w"))
-	    (abs:VHSDF_SDF
-	      (match_operand:VHSDF_SDF 2 "register_operand" "w"))
+	    (abs:VHSDF_HSDF
+	      (match_operand:VHSDF_HSDF 1 "register_operand" "w"))
+	    (abs:VHSDF_HSDF
+	      (match_operand:VHSDF_HSDF 2 "register_operand" "w"))
   )))]
   "TARGET_SIMD"
   "fac<n_optab>\t%<v>0<Vmtype>, %<v><cmp_1><Vmtype>, %<v><cmp_2><Vmtype>"
@@ -5400,10 +5400,10 @@
 )
 
 (define_insn "aarch64_frecps<mode>"
-  [(set (match_operand:VHSDF_SDF 0 "register_operand" "=w")
-	(unspec:VHSDF_SDF
-	  [(match_operand:VHSDF_SDF 1 "register_operand" "w")
-	  (match_operand:VHSDF_SDF 2 "register_operand" "w")]
+  [(set (match_operand:VHSDF_HSDF 0 "register_operand" "=w")
+	(unspec:VHSDF_HSDF
+	  [(match_operand:VHSDF_HSDF 1 "register_operand" "w")
+	  (match_operand:VHSDF_HSDF 2 "register_operand" "w")]
 	  UNSPEC_FRECPS))]
   "TARGET_SIMD"
   "frecps\\t%<v>0<Vmtype>, %<v>1<Vmtype>, %<v>2<Vmtype>"
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 520026d..81a4f20 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4661,38 +4661,78 @@
    (set_attr "simd" "*, yes")]
 )
 
+(define_insn "<FCVT_F2FIXED:fcvt_fixed_insn>hf<mode>3"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(unspec:GPI [(match_operand:HF 1 "register_operand" "w")
+		     (match_operand:SI 2 "immediate_operand" "i")]
+	 FCVT_F2FIXED))]
+  "TARGET_FP_F16INST"
+   "<FCVT_F2FIXED:fcvt_fixed_insn>\t%<GPI:w>0, %h1, #%2"
+  [(set_attr "type" "f_cvtf2i")]
+)
+
+(define_insn "<FCVT_FIXED2F:fcvt_fixed_insn><mode>hf3"
+  [(set (match_operand:HF 0 "register_operand" "=w")
+	(unspec:HF [(match_operand:GPI 1 "register_operand" "r")
+		    (match_operand:SI 2 "immediate_operand" "i")]
+	 FCVT_FIXED2F))]
+  "TARGET_FP_F16INST"
+  "<FCVT_FIXED2F:fcvt_fixed_insn>\t%h0, %<GPI:w>1, #%2"
+  [(set_attr "type" "f_cvti2f")]
+)
+
+(define_insn "<FCVT_F2FIXED:fcvt_fixed_insn>hf3"
+  [(set (match_operand:HI 0 "register_operand" "=w")
+	(unspec:HI [(match_operand:HF 1 "register_operand" "w")
+		    (match_operand:SI 2 "immediate_operand" "i")]
+	 FCVT_F2FIXED))]
+  "TARGET_SIMD"
+  "<FCVT_F2FIXED:fcvt_fixed_insn>\t%h0, %h1, #%2"
+  [(set_attr "type" "neon_fp_to_int_s")]
+)
+
+(define_insn "<FCVT_FIXED2F:fcvt_fixed_insn>hi3"
+  [(set (match_operand:HF 0 "register_operand" "=w")
+	(unspec:HF [(match_operand:HI 1 "register_operand" "w")
+		    (match_operand:SI 2 "immediate_operand" "i")]
+	 FCVT_FIXED2F))]
+  "TARGET_SIMD"
+  "<FCVT_FIXED2F:fcvt_fixed_insn>\t%h0, %h1, #%2"
+  [(set_attr "type" "neon_int_to_fp_s")]
+)
+
 ;; -------------------------------------------------------------------
 ;; Floating-point arithmetic
 ;; -------------------------------------------------------------------
 
 (define_insn "add<mode>3"
-  [(set (match_operand:GPF 0 "register_operand" "=w")
-        (plus:GPF
-         (match_operand:GPF 1 "register_operand" "w")
-         (match_operand:GPF 2 "register_operand" "w")))]
+  [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+        (plus:GPF_F16
+         (match_operand:GPF_F16 1 "register_operand" "w")
+         (match_operand:GPF_F16 2 "register_operand" "w")))]
   "TARGET_FLOAT"
   "fadd\\t%<s>0, %<s>1, %<s>2"
-  [(set_attr "type" "fadd<s>")]
+  [(set_attr "type" "fadd<stype>")]
 )
 
 (define_insn "sub<mode>3"
-  [(set (match_operand:GPF 0 "register_operand" "=w")
-        (minus:GPF
-         (match_operand:GPF 1 "register_operand" "w")
-         (match_operand:GPF 2 "register_operand" "w")))]
+  [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+        (minus:GPF_F16
+         (match_operand:GPF_F16 1 "register_operand" "w")
+         (match_operand:GPF_F16 2 "register_operand" "w")))]
   "TARGET_FLOAT"
   "fsub\\t%<s>0, %<s>1, %<s>2"
-  [(set_attr "type" "fadd<s>")]
+  [(set_attr "type" "fadd<stype>")]
 )
 
 (define_insn "mul<mode>3"
-  [(set (match_operand:GPF 0 "register_operand" "=w")
-        (mult:GPF
-         (match_operand:GPF 1 "register_operand" "w")
-         (match_operand:GPF 2 "register_operand" "w")))]
+  [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+        (mult:GPF_F16
+         (match_operand:GPF_F16 1 "register_operand" "w")
+         (match_operand:GPF_F16 2 "register_operand" "w")))]
   "TARGET_FLOAT"
   "fmul\\t%<s>0, %<s>1, %<s>2"
-  [(set_attr "type" "fmul<s>")]
+  [(set_attr "type" "fmul<stype>")]
 )
 
 (define_insn "*fnmul<mode>3"
@@ -4716,13 +4756,13 @@
 )
 
 (define_insn "div<mode>3"
-  [(set (match_operand:GPF 0 "register_operand" "=w")
-        (div:GPF
-         (match_operand:GPF 1 "register_operand" "w")
-         (match_operand:GPF 2 "register_operand" "w")))]
+  [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+        (div:GPF_F16
+         (match_operand:GPF_F16 1 "register_operand" "w")
+         (match_operand:GPF_F16 2 "register_operand" "w")))]
   "TARGET_FLOAT"
   "fdiv\\t%<s>0, %<s>1, %<s>2"
-  [(set_attr "type" "fdiv<s>")]
+  [(set_attr "type" "fdiv<stype>")]
 )
 
 (define_insn "neg<mode>2"
@@ -4773,13 +4813,13 @@
 
 ;; Scalar forms for the IEEE-754 fmax()/fmin() functions
 (define_insn "<fmaxmin><mode>3"
-  [(set (match_operand:GPF 0 "register_operand" "=w")
-	(unspec:GPF [(match_operand:GPF 1 "register_operand" "w")
-		     (match_operand:GPF 2 "register_operand" "w")]
+  [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+	(unspec:GPF_F16 [(match_operand:GPF_F16 1 "register_operand" "w")
+		     (match_operand:GPF_F16 2 "register_operand" "w")]
 		     FMAXMIN))]
   "TARGET_FLOAT"
   "<fmaxmin_op>\\t%<s>0, %<s>1, %<s>2"
-  [(set_attr "type" "f_minmax<s>")]
+  [(set_attr "type" "f_minmax<stype>")]
 )
 
 ;; For copysign (x, y), we want to generate:
diff --git a/gcc/config/aarch64/arm_fp16.h b/gcc/config/aarch64/arm_fp16.h
index 818aa61..21edc65 100644
--- a/gcc/config/aarch64/arm_fp16.h
+++ b/gcc/config/aarch64/arm_fp16.h
@@ -360,6 +360,206 @@ vsqrth_f16 (float16_t __a)
   return __builtin_aarch64_sqrthf (__a);
 }
 
+/* ARMv8.2-A FP16 two operands scalar intrinsics.  */
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vaddh_f16 (float16_t __a, float16_t __b)
+{
+  return __a + __b;
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vabdh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_fabdhf (__a, __b);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcageh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_facgehf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcagth_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_facgthf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcaleh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_faclehf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcalth_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_faclthf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vceqh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_cmeqhf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcgeh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_cmgehf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcgth_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_cmgthf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcleh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_cmlehf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vclth_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_cmlthf_uss (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_n_f16_s16 (int16_t __a, const int __b)
+{
+  return __builtin_aarch64_scvtfhi (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_n_f16_s32 (int32_t __a, const int __b)
+{
+  return __builtin_aarch64_scvtfsihf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_n_f16_s64 (int64_t __a, const int __b)
+{
+  return __builtin_aarch64_scvtfdihf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_n_f16_u16 (uint16_t __a, const int __b)
+{
+  return __builtin_aarch64_ucvtfhi_sus (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_n_f16_u32 (uint32_t __a, const int __b)
+{
+  return __builtin_aarch64_ucvtfsihf_sus (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_n_f16_u64 (uint64_t __a, const int __b)
+{
+  return __builtin_aarch64_ucvtfdihf_sus (__a, __b);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vcvth_n_s16_f16 (float16_t __a, const int __b)
+{
+  return __builtin_aarch64_fcvtzshf (__a, __b);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvth_n_s32_f16 (float16_t __a, const int __b)
+{
+  return __builtin_aarch64_fcvtzshfsi (__a, __b);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+vcvth_n_s64_f16 (float16_t __a, const int __b)
+{
+  return __builtin_aarch64_fcvtzshfdi (__a, __b);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcvth_n_u16_f16 (float16_t __a, const int __b)
+{
+  return __builtin_aarch64_fcvtzuhf_uss (__a, __b);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcvth_n_u32_f16 (float16_t __a, const int __b)
+{
+  return __builtin_aarch64_fcvtzuhfsi_uss (__a, __b);
+}
+
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+vcvth_n_u64_f16 (float16_t __a, const int __b)
+{
+  return __builtin_aarch64_fcvtzuhfdi_uss (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vdivh_f16 (float16_t __a, float16_t __b)
+{
+  return __a / __b;
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmaxh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_fmaxhf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmaxnmh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_fmaxhf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vminh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_fminhf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vminnmh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_fminhf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmulh_f16 (float16_t __a, float16_t __b)
+{
+  return __a * __b;
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmulxh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_fmulxhf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrecpsh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_frecpshf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrsqrtsh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_rsqrtshf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vsubh_f16 (float16_t __a, float16_t __b)
+{
+  return __a - __b;
+}
+
 #pragma GCC pop_options
 
 #endif
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 20d0f1b..91e2e64 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -105,9 +105,6 @@
 (define_mode_iterator VHSDF_DF [(V4HF "TARGET_SIMD_F16INST")
 				(V8HF "TARGET_SIMD_F16INST")
 				V2SF V4SF V2DF DF])
-(define_mode_iterator VHSDF_SDF [(V4HF "TARGET_SIMD_F16INST")
-				 (V8HF "TARGET_SIMD_F16INST")
-				 V2SF V4SF V2DF SF DF])
 (define_mode_iterator VHSDF_HSDF [(V4HF "TARGET_SIMD_F16INST")
 				  (V8HF "TARGET_SIMD_F16INST")
 				  V2SF V4SF V2DF
@@ -190,7 +187,9 @@
 ;; Scalar and Vector modes for S and D, Vector modes for H.
 (define_mode_iterator VSDQ_HSDI [(V4HI "TARGET_SIMD_F16INST")
 				 (V8HI "TARGET_SIMD_F16INST")
-				 V2SI V4SI V2DI SI DI])
+				 V2SI V4SI V2DI
+				 (HI "TARGET_SIMD_F16INST")
+				 SI DI])
 
 ;; Vector modes for Q and H types.
 (define_mode_iterator VDQQH [V8QI V16QI V4HI V8HI])
@@ -705,12 +704,12 @@
 			       (V2DI "v2df") (V4SI "v4sf") (V2SI "v2sf")
 			       (SF "si") (DF "di") (SI "sf") (DI "df")
 			       (V4HF "v4hi") (V8HF "v8hi") (V4HI "v4hf")
-			       (V8HI "v8hf")])
+			       (V8HI "v8hf") (HF "hi") (HI "hf")])
 (define_mode_attr FCVT_TARGET [(V2DF "V2DI") (V4SF "V4SI") (V2SF "V2SI")
 			       (V2DI "V2DF") (V4SI "V4SF") (V2SI "V2SF")
 			       (SF "SI") (DF "DI") (SI "SF") (DI "DF")
 			       (V4HF "V4HI") (V8HF "V8HI") (V4HI "V4HF")
-			       (V8HI "V8HF")])
+			       (V8HI "V8HF") (HF "HI") (HI "HF")])
 
 
 ;; for the inequal width integer to fp conversions
-- 
2.5.0






^ permalink raw reply	[flat|nested] 32+ messages in thread

* [AArch64][10/14] ARMv8.2-A FP16 lane scalar intrinsics
       [not found]                 ` <94dcb98c-81c6-a1d5-bb1a-ff8278f0a07b@foss.arm.com>
       [not found]                   ` <82155ca9-a506-b1fc-bdd4-6a637dc66a1e@foss.arm.com>
@ 2016-07-07 16:18                   ` Jiong Wang
  2016-07-25 11:16                     ` James Greenhalgh
  1 sibling, 1 reply; 32+ messages in thread
From: Jiong Wang @ 2016-07-07 16:18 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 432 bytes --]

This patch adds ARMv8.2-A FP16 lane scalar intrinsics.

gcc/
2016-07-07  Jiong Wang <jiong.wang@arm.com>

         * config/aarch64/arm_neon.h (vfmah_lane_f16): New.
         (vfmah_laneq_f16): Likewise.
         (vfmsh_lane_f16): Likewise.
         (vfmsh_laneq_f16): Likewise.
         (vmulh_lane_f16): Likewise.
         (vmulh_laneq_f16): Likewise.
         (vmulxh_lane_f16): Likewise.
         (vmulxh_laneq_f16): Likewise.
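
For illustration only (not part of the patch), a minimal sketch of the
new lane scalar intrinsics, assuming -march=armv8.2-a+fp16; the lane
indices must be compile-time constants:

#include <arm_neon.h>

float16_t
fma_then_mulx (float16_t acc, float16_t a, float16x4_t c)
{
  acc = vfmah_lane_f16 (acc, a, c, 0);   /* FMLA: acc += a * c[0].  */
  return vmulxh_lane_f16 (acc, c, 1);    /* FMULX against lane 1.  */
}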


[-- Attachment #2: 0010-10-14-ARMv8.2-FP16-lane-scalar-intrinsics.patch --]
[-- Type: text/x-patch, Size: 4197 bytes --]

From bcbe5035746c5684a3b9f0b62310f6aa276db364 Mon Sep 17 00:00:00 2001
From: Jiong Wang <jiong.wang@arm.com>
Date: Thu, 9 Jun 2016 11:06:29 +0100
Subject: [PATCH 10/14] [10/14] ARMv8.2 FP16 lane scalar intrinsics

---
 gcc/config/aarch64/arm_neon.h | 52 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index e727ff1..09095d1 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -26488,6 +26488,20 @@ vfmsq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
 
 /* ARMv8.2-A FP16 lane vector intrinsics.  */
 
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vfmah_lane_f16 (float16_t __a, float16_t __b,
+		float16x4_t __c, const int __lane)
+{
+  return vfmah_f16 (__a, __b, __aarch64_vget_lane_any (__c, __lane));
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vfmah_laneq_f16 (float16_t __a, float16_t __b,
+		 float16x8_t __c, const int __lane)
+{
+  return vfmah_f16 (__a, __b, __aarch64_vget_lane_any (__c, __lane));
+}
+
 __extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
 vfma_lane_f16 (float16x4_t __a, float16x4_t __b,
 	       float16x4_t __c, const int __lane)
@@ -26528,6 +26542,20 @@ vfmaq_n_f16 (float16x8_t __a, float16x8_t __b, float16_t __c)
   return vfmaq_f16 (__a, __b, vdupq_n_f16 (__c));
 }
 
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vfmsh_lane_f16 (float16_t __a, float16_t __b,
+		float16x4_t __c, const int __lane)
+{
+  return vfmsh_f16 (__a, __b, __aarch64_vget_lane_any (__c, __lane));
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vfmsh_laneq_f16 (float16_t __a, float16_t __b,
+		 float16x8_t __c, const int __lane)
+{
+  return vfmsh_f16 (__a, __b, __aarch64_vget_lane_any (__c, __lane));
+}
+
 __extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
 vfms_lane_f16 (float16x4_t __a, float16x4_t __b,
 	       float16x4_t __c, const int __lane)
@@ -26568,6 +26596,12 @@ vfmsq_n_f16 (float16x8_t __a, float16x8_t __b, float16_t __c)
   return vfmsq_f16 (__a, __b, vdupq_n_f16 (__c));
 }
 
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmulh_lane_f16 (float16_t __a, float16x4_t __b, const int __lane)
+{
+  return __a * __aarch64_vget_lane_any (__b, __lane);
+}
+
 __extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
 vmul_lane_f16 (float16x4_t __a, float16x4_t __b, const int __lane)
 {
@@ -26580,6 +26614,12 @@ vmulq_lane_f16 (float16x8_t __a, float16x4_t __b, const int __lane)
   return vmulq_f16 (__a, vdupq_n_f16 (__aarch64_vget_lane_any (__b, __lane)));
 }
 
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmulh_laneq_f16 (float16_t __a, float16x8_t __b, const int __lane)
+{
+  return __a * __aarch64_vget_lane_any (__b, __lane);
+}
+
 __extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
 vmul_laneq_f16 (float16x4_t __a, float16x8_t __b, const int __lane)
 {
@@ -26604,6 +26644,12 @@ vmulq_n_f16 (float16x8_t __a, float16_t __b)
   return vmulq_laneq_f16 (__a, vdupq_n_f16 (__b), 0);
 }
 
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmulxh_lane_f16 (float16_t __a, float16x4_t __b, const int __lane)
+{
+  return vmulxh_f16 (__a, __aarch64_vget_lane_any (__b, __lane));
+}
+
 __extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
 vmulx_lane_f16 (float16x4_t __a, float16x4_t __b, const int __lane)
 {
@@ -26616,6 +26662,12 @@ vmulxq_lane_f16 (float16x8_t __a, float16x4_t __b, const int __lane)
   return vmulxq_f16 (__a, __aarch64_vdupq_lane_f16 (__b, __lane));
 }
 
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmulxh_laneq_f16 (float16_t __a, float16x8_t __b, const int __lane)
+{
+  return vmulxh_f16 (__a, __aarch64_vget_lane_any (__b, __lane));
+}
+
 __extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
 vmulx_laneq_f16 (float16x4_t __a, float16x8_t __b, const int __lane)
 {
-- 
2.5.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [AArch64][11/14] ARMv8.2-A FP16 testsuite selector
       [not found]                   ` <82155ca9-a506-b1fc-bdd4-6a637dc66a1e@foss.arm.com>
@ 2016-07-07 16:18                     ` Jiong Wang
  2016-10-10  8:57                       ` James Greenhalgh
       [not found]                     ` <135287e5-6fc1-4957-d320-16f38260fa28@foss.arm.com>
  1 sibling, 1 reply; 32+ messages in thread
From: Jiong Wang @ 2016-07-07 16:18 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1125 bytes --]

ARMv8.2-A adds support for scalar and vector FP16 instructions to both
ARM and AArch64. This patch adds support for testing code that uses the
new instructions on AArch64 targets. It is based on the target-support
code previously added for ARM (AArch32). A sketch of a test guarded by
the updated selectors follows the ChangeLog below.

The patch
- Updates effective-target directives arm_v8_2a_fp16_scalar_ok,
   arm_v8_2a_fp16_scalar_hw, arm_v8_2a_fp16_neon_ok and
   arm_v8_2a_fp16_neon_hw to check for target and hardware support of
   FP16 instructions on AArch64.

gcc/testsuite/
2016-07-07  Matthew Wahab <matthew.wahab@arm.com>
             Jiong Wang <jiong.wang@arm.com>

         * target-supports.exp (add_options_for_arm_v8_2a_fp16_scalar):
         Mention AArch64 support.
         (add_options_for_arm_v8_2a_fp16_neon): Likewise.
         (check_effective_target_arm_v8_2a_fp16_scalar_ok_nocache): Support
         AArch64 targets.
         (check_effective_target_arm_v8_2a_fp16_neon_ok_nocache): Support
         AArch64 targets.
         (check_effective_target_arm_v8_2a_fp16_scalar_hw): Support AArch64
         targets.
         (check_effective_target_arm_v8_2a_fp16_neon_hw): Likewise.
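
As an illustration (a sketch only, not part of the patch), a runtime
test guarded by the updated selectors would follow the usual pattern:

/* { dg-do run } */
/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
/* { dg-add-options arm_v8_2a_fp16_scalar } */

extern void abort (void);

int
main (void)
{
  volatile __fp16 a = 2.0;

  /* Performs FP16 arithmetic at run time, so the test must only run
     when the hardware supports the FP16 extension.  */
  if (a + a != 4.0)
    abort ();
  return 0;
}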


[-- Attachment #2: 0011-11-14-TESTSUITE-selector-for-ARMv8.2-A-FP16-extensio.patch --]
[-- Type: text/x-patch, Size: 4650 bytes --]

From be7793ef912b7aac0e548f528e54aaf5ddd40a4e Mon Sep 17 00:00:00 2001
From: Jiong Wang <jiong.wang@arm.com>
Date: Wed, 6 Jul 2016 14:43:31 +0100
Subject: [PATCH 11/14] [11/14] TESTSUITE selector for ARMv8.2-A FP16 extension

---
 gcc/testsuite/lib/target-supports.exp | 50 ++++++++++++++++++++++++++---------
 1 file changed, 38 insertions(+), 12 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index a13e852..812e85a 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2905,7 +2905,7 @@ proc add_options_for_arm_v8_1a_neon { flags } {
 }
 
 # Add the options needed for ARMv8.2 with the scalar FP16 extension.
-# Also adds the ARMv8 FP options for ARM.
+# Also adds the ARMv8 FP options for ARM and for AArch64.
 
 proc add_options_for_arm_v8_2a_fp16_scalar { flags } {
     if { ! [check_effective_target_arm_v8_2a_fp16_scalar_ok] } {
@@ -2916,7 +2916,7 @@ proc add_options_for_arm_v8_2a_fp16_scalar { flags } {
 }
 
 # Add the options needed for ARMv8.2 with the FP16 extension.  Also adds
-# the ARMv8 NEON options for ARM.
+# the ARMv8 NEON options for ARM and for AArch64.
 
 proc add_options_for_arm_v8_2a_fp16_neon { flags } {
     if { ! [check_effective_target_arm_v8_2a_fp16_neon_ok] } {
@@ -3487,14 +3487,14 @@ proc check_effective_target_arm_v8_1a_neon_ok { } {
 }
 
 # Return 1 if the target supports ARMv8.2 scalar FP16 arithmetic
-# instructions, 0 otherwise.  The test is valid for ARM.  Record the
-# command line options needed.
+# instructions, 0 otherwise.  The test is valid for ARM and for AArch64.
+# Record the command line options needed.
 
 proc check_effective_target_arm_v8_2a_fp16_scalar_ok_nocache { } {
     global et_arm_v8_2a_fp16_scalar_flags
     set et_arm_v8_2a_fp16_scalar_flags ""
 
-    if { ![istarget arm*-*-*] } {
+    if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
 	return 0;
     }
 
@@ -3522,14 +3522,14 @@ proc check_effective_target_arm_v8_2a_fp16_scalar_ok { } {
 }
 
 # Return 1 if the target supports ARMv8.2 Adv.SIMD FP16 arithmetic
-# instructions, 0 otherwise.  The test is valid for ARM.  Record the
-# command line options needed.
+# instructions, 0 otherwise.  The test is valid for ARM and for AArch64.
+# Record the command line options needed.
 
 proc check_effective_target_arm_v8_2a_fp16_neon_ok_nocache { } {
     global et_arm_v8_2a_fp16_neon_flags
     set et_arm_v8_2a_fp16_neon_flags ""
 
-    if { ![istarget arm*-*-*] } {
+    if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
 	return 0;
     }
 
@@ -3613,7 +3613,8 @@ proc check_effective_target_arm_v8_1a_neon_hw { } {
 }
 
 # Return 1 if the target supports executing floating point instructions from
-# ARMv8.2 with the FP16 extension, 0 otherwise.  The test is valid for ARM.
+# ARMv8.2 with the FP16 extension, 0 otherwise.  The test is valid for ARM and
+# for AArch64.
 
 proc check_effective_target_arm_v8_2a_fp16_scalar_hw { } {
     if { ![check_effective_target_arm_v8_2a_fp16_scalar_ok] } {
@@ -3626,19 +3627,30 @@ proc check_effective_target_arm_v8_2a_fp16_scalar_hw { } {
 	  __fp16 a = 1.0;
 	  __fp16 result;
 
+	  #ifdef __ARM_ARCH_ISA_A64
+
+	  asm ("fabs %h0, %h1"
+	       : "=w"(result)
+	       : "w"(a)
+	       : /* No clobbers.  */);
+
+	  #else
+
 	  asm ("vabs.f16 %0, %1"
 	       : "=w"(result)
 	       : "w"(a)
 	       : /* No clobbers.  */);
 
+	  #endif
+
 	  return (result == 1.0) ? 0 : 1;
 	}
     } [add_options_for_arm_v8_2a_fp16_scalar ""]]
 }
 
-# Return 1 if the target supports executing instructions Adv.SIMD
-# instructions from ARMv8.2 with the FP16 extension, 0 otherwise.  The
-# test is valid for ARM.
+# Return 1 if the target supports executing Adv.SIMD instructions from ARMv8.2
+# with the FP16 extension, 0 otherwise.  The test is valid for ARM and for
+# AArch64.
 
 proc check_effective_target_arm_v8_2a_fp16_neon_hw { } {
     if { ![check_effective_target_arm_v8_2a_fp16_neon_ok] } {
@@ -3648,6 +3660,18 @@ proc check_effective_target_arm_v8_2a_fp16_neon_hw { } {
 	int
 	main (void)
 	{
+	  #ifdef __ARM_ARCH_ISA_A64
+
+	  __Float16x4_t a = {1.0, -1.0, 1.0, -1.0};
+	  __Float16x4_t result;
+
+	  asm ("fabs %0.4h, %1.4h"
+	       : "=w"(result)
+	       : "w"(a)
+	       : /* No clobbers.  */);
+
+	  #else
+
 	  __simd64_float16_t a = {1.0, -1.0, 1.0, -1.0};
 	  __simd64_float16_t result;
 
@@ -3656,6 +3680,8 @@ proc check_effective_target_arm_v8_2a_fp16_neon_hw { } {
 	       : "w"(a)
 	       : /* No clobbers.  */);
 
+	  #endif
+
 	  return (result[0] == 1.0) ? 0 : 1;
 	}
     } [add_options_for_arm_v8_2a_fp16_neon ""]]
-- 
2.5.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [AArch64][9/14] ARMv8.2-A FP16 three operands scalar intrinsics
       [not found]               ` <d91fc169-1317-55ed-c36c-6dc5dac088cc@foss.arm.com>
@ 2016-07-07 16:18                 ` Jiong Wang
  2016-07-25 11:15                   ` James Greenhalgh
       [not found]                 ` <94dcb98c-81c6-a1d5-bb1a-ff8278f0a07b@foss.arm.com>
  1 sibling, 1 reply; 32+ messages in thread
From: Jiong Wang @ 2016-07-07 16:18 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 362 bytes --]

This patch adds the ARMv8.2-A FP16 three-operand scalar intrinsics; a
short semantics sketch follows the ChangeLog below.

gcc/
2016-07-07  Jiong Wang <jiong.wang@arm.com>

         * config/aarch64/aarch64-simd-builtins.def: Register new builtins.
         * config/aarch64/aarch64.md (fma): New for HF.
         (fnma): Likewise.
         * config/aarch64/arm_fp16.h (vfmah_f16): New.
         (vfmsh_f16): Likewise.
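
For clarity, the intended semantics (a sketch only; assumes
-march=armv8.2-a+fp16 on AArch64): vfmah_f16 (a, b, c) is a fused
a + b * c, and vfmsh_f16 (a, b, c) is a fused a - b * c, matching the
fmahf/fnmahf builtin mapping in the patch:

#include <arm_fp16.h>
#include <stdio.h>

int
main (void)
{
  float16_t a = 1.0, b = 2.0, c = 3.0;

  float16_t r0 = vfmah_f16 (a, b, c);	/* Fused a + b * c = 7.  */
  float16_t r1 = vfmsh_f16 (a, b, c);	/* Fused a - b * c = -5.  */

  printf ("%f %f\n", (double) r0, (double) r1);
  return 0;
}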


[-- Attachment #2: 0009-9-14-ARMv8.2-FP16-three-operands-scalar-intrinsics.patch --]
[-- Type: text/x-patch, Size: 3293 bytes --]

From 292838908f82ed8b1e0dbf79e451cbb82841f9ed Mon Sep 17 00:00:00 2001
From: Jiong Wang <jiong.wang@arm.com>
Date: Thu, 9 Jun 2016 11:05:27 +0100
Subject: [PATCH 09/14] [9/14] ARMv8.2 FP16 three operands scalar intrinsics

---
 gcc/config/aarch64/aarch64-simd-builtins.def |  2 ++
 gcc/config/aarch64/aarch64.md                | 21 +++++++++++----------
 gcc/config/aarch64/arm_fp16.h                | 14 ++++++++++++++
 3 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index b32fdfe..bc5eda6 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -422,8 +422,10 @@
 
   /* Implemented by fma<mode>4.  */
   BUILTIN_VHSDF (TERNOP, fma, 4)
+  VAR1 (TERNOP, fma, 4, hf)
   /* Implemented by fnma<mode>4.  */
   BUILTIN_VHSDF (TERNOP, fnma, 4)
+  VAR1 (TERNOP, fnma, 4, hf)
 
   /* Implemented by aarch64_simd_bsl<mode>.  */
   BUILTIN_VDQQH (BSL_P, simd_bsl, 0)
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 81a4f20..5664fd1 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4493,23 +4493,24 @@
 ;; fma - no throw
 
 (define_insn "fma<mode>4"
-  [(set (match_operand:GPF 0 "register_operand" "=w")
-        (fma:GPF (match_operand:GPF 1 "register_operand" "w")
-		 (match_operand:GPF 2 "register_operand" "w")
-		 (match_operand:GPF 3 "register_operand" "w")))]
+  [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+        (fma:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")
+		     (match_operand:GPF_F16 2 "register_operand" "w")
+		     (match_operand:GPF_F16 3 "register_operand" "w")))]
   "TARGET_FLOAT"
   "fmadd\\t%<s>0, %<s>1, %<s>2, %<s>3"
-  [(set_attr "type" "fmac<s>")]
+  [(set_attr "type" "fmac<stype>")]
 )
 
 (define_insn "fnma<mode>4"
-  [(set (match_operand:GPF 0 "register_operand" "=w")
-	(fma:GPF (neg:GPF (match_operand:GPF 1 "register_operand" "w"))
-		 (match_operand:GPF 2 "register_operand" "w")
-		 (match_operand:GPF 3 "register_operand" "w")))]
+  [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+	(fma:GPF_F16
+	  (neg:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w"))
+	  (match_operand:GPF_F16 2 "register_operand" "w")
+	  (match_operand:GPF_F16 3 "register_operand" "w")))]
   "TARGET_FLOAT"
   "fmsub\\t%<s>0, %<s>1, %<s>2, %<s>3"
-  [(set_attr "type" "fmac<s>")]
+  [(set_attr "type" "fmac<stype>")]
 )
 
 (define_insn "fms<mode>4"
diff --git a/gcc/config/aarch64/arm_fp16.h b/gcc/config/aarch64/arm_fp16.h
index 21edc65..4b7c2dd 100644
--- a/gcc/config/aarch64/arm_fp16.h
+++ b/gcc/config/aarch64/arm_fp16.h
@@ -560,6 +560,20 @@ vsubh_f16 (float16_t __a, float16_t __b)
   return __a - __b;
 }
 
+/* ARMv8.2-A FP16 three operands scalar intrinsics.  */
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vfmah_f16 (float16_t __a, float16_t __b, float16_t __c)
+{
+  return __builtin_aarch64_fmahf (__b, __c, __a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vfmsh_f16 (float16_t __a, float16_t __b, float16_t __c)
+{
+  return __builtin_aarch64_fnmahf (__b, __c, __a);
+}
+
 #pragma GCC pop_options
 
 #endif
-- 
2.5.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [AArch64][13/14] ARMv8.2-A testsuite for new vector intrinsics
       [not found]                       ` <cdb3640f-134a-f2be-c728-b1467fb7aaf9@foss.arm.com>
@ 2016-07-07 16:19                         ` Jiong Wang
  2016-10-10  9:55                           ` James Greenhalgh
       [not found]                         ` <c5443f0d-577b-776b-4c97-7b16b06f8264@foss.arm.com>
  1 sibling, 1 reply; 32+ messages in thread
From: Jiong Wang @ 2016-07-07 16:19 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1153 bytes --]

This patch contains test cases for the new vector intrinsics that are
only available on AArch64; a note on the expected values follows the
ChangeLog below.


gcc/testsuite/
2016-07-07  Jiong Wang <jiong.wang@arm.com>

         * gcc.target/aarch64/advsimd-intrinsics/vdiv_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vfmas_lane_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vfmas_n_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vmaxnmv_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vmaxv_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vminnmv_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vminv_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vmul_lane_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vmulx_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vmulx_n_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vpminmaxnm_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vrndi_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vsqrt_f16_1.c: New.
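
A note on the expected values in these tests: entries such as 0x7C00
and 0xFC00 are the IEEE binary16 encodings of +Inf and -Inf, which
arise whenever the exact result overflows the ~65504 maximum of
__fp16.  A minimal sketch (not part of the patch; assumes AArch64,
where __fp16 is available) to inspect such an encoding:

#include <stdio.h>
#include <string.h>

int
main (void)
{
  __fp16 h = 123412.43;		/* Overflows binary16, rounds to +Inf.  */
  unsigned short bits;

  memcpy (&bits, &h, sizeof bits);
  printf ("0x%04X\n", bits);	/* Prints 0x7C00.  */
  return 0;
}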


[-- Attachment #2: 0013-13-14-TESTSUITE-for-new-vector-intrinsics.patch --]
[-- Type: text/x-patch, Size: 121783 bytes --]

From 774c4cf2488dff693c7130a62561f2da88639283 Mon Sep 17 00:00:00 2001
From: Jiong Wang <jiong.wang@arm.com>
Date: Tue, 5 Jul 2016 10:39:28 +0100
Subject: [PATCH 13/14] [13/14] TESTSUITE for new vector intrinsics

---
 .../aarch64/advsimd-intrinsics/vdiv_f16_1.c        |  86 ++
 .../aarch64/advsimd-intrinsics/vfmas_lane_f16_1.c  | 908 +++++++++++++++++++++
 .../aarch64/advsimd-intrinsics/vfmas_n_f16_1.c     | 469 +++++++++++
 .../aarch64/advsimd-intrinsics/vmaxnmv_f16_1.c     | 131 +++
 .../aarch64/advsimd-intrinsics/vmaxv_f16_1.c       | 131 +++
 .../aarch64/advsimd-intrinsics/vminnmv_f16_1.c     | 131 +++
 .../aarch64/advsimd-intrinsics/vminv_f16_1.c       | 131 +++
 .../aarch64/advsimd-intrinsics/vmul_lane_f16_1.c   | 454 +++++++++++
 .../aarch64/advsimd-intrinsics/vmulx_f16_1.c       |  84 ++
 .../aarch64/advsimd-intrinsics/vmulx_lane_f16_1.c  | 452 ++++++++++
 .../aarch64/advsimd-intrinsics/vmulx_n_f16_1.c     | 177 ++++
 .../aarch64/advsimd-intrinsics/vpminmaxnm_f16_1.c  | 114 +++
 .../aarch64/advsimd-intrinsics/vrndi_f16_1.c       |  71 ++
 .../aarch64/advsimd-intrinsics/vsqrt_f16_1.c       |  72 ++
 14 files changed, 3411 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdiv_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmas_lane_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmas_n_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxnmv_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxv_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminnmv_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminv_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_lane_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_n_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpminmaxnm_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndi_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsqrt_f16_1.c

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdiv_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdiv_f16_1.c
new file mode 100644
index 0000000..c0103fb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdiv_f16_1.c
@@ -0,0 +1,86 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon } */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (13.4)
+#define B FP16_C (-56.8)
+#define C FP16_C (-34.8)
+#define D FP16_C (12)
+#define E FP16_C (63.1)
+#define F FP16_C (19.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (77)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (11.23)
+#define L FP16_C (98)
+#define M FP16_C (87.1)
+#define N FP16_C (-8)
+#define O FP16_C (-1.1)
+#define P FP16_C (-9.7)
+
+/* Expected results for vdiv.  */
+VECT_VAR_DECL (expected_div_static, hfloat, 16, 4) []
+  = { 0x32CC /* A / E.  */, 0xC1F3 /* B / F.  */,
+      0x4740 /* C / G.  */, 0x30FD /* D / H.  */ };
+
+VECT_VAR_DECL (expected_div_static, hfloat, 16, 8) []
+  = { 0x32CC /* A / E.  */, 0xC1F3 /* B / F.  */,
+      0x4740 /* C / G.  */, 0x30FD /* D / H.  */,
+      0x201D /* I / M.  */, 0x48E0 /* J / N.  */,
+      0xC91B /* K / O.  */, 0xC90D /* L / P.  */ };
+
+void exec_vdiv_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VDIV (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE(vsrc_1, float, 16, 4);
+  DECL_VARIABLE(vsrc_2, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+
+  DECL_VARIABLE (vector_res, float, 16, 4)
+    = vdiv_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		VECT_VAR (vsrc_2, float, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_div_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VDIVQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE(vsrc_1, float, 16, 8);
+  DECL_VARIABLE(vsrc_2, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
+  VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+
+  DECL_VARIABLE (vector_res, float, 16, 8)
+    = vdivq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		 VECT_VAR (vsrc_2, float, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_div_static, "");
+}
+
+int
+main (void)
+{
+  exec_vdiv_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmas_lane_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmas_lane_f16_1.c
new file mode 100644
index 0000000..00c95d3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmas_lane_f16_1.c
@@ -0,0 +1,908 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon } */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A0 FP16_C (123.4)
+#define A1 FP16_C (-5.8)
+#define A2 FP16_C (-0.0)
+#define A3 FP16_C (10)
+#define A4 FP16_C (123412.43)
+#define A5 FP16_C (-5.8)
+#define A6 FP16_C (90.8)
+#define A7 FP16_C (24)
+
+#define B0 FP16_C (23.4)
+#define B1 FP16_C (-5.8)
+#define B2 FP16_C (8.9)
+#define B3 FP16_C (4.0)
+#define B4 FP16_C (3.4)
+#define B5 FP16_C (-550.8)
+#define B6 FP16_C (-31.8)
+#define B7 FP16_C (20000.0)
+
+/* Expected results for vfma_lane.  */
+VECT_VAR_DECL (expected0_static, hfloat, 16, 4) []
+  = { 0x613E /* A0 + B0 * B0.  */,
+      0xD86D /* A1 + B1 * B0.  */,
+      0x5A82 /* A2 + B2 * B0.  */,
+      0x567A /* A3 + B3 * B0.  */};
+
+VECT_VAR_DECL (expected1_static, hfloat, 16, 4) []
+  = { 0xCA33 /* A0 + B0 * B1.  */,
+      0x4EF6 /* A1 + B1 * B1.  */,
+      0xD274 /* A2 + B2 * B1.  */,
+      0xCA9A /* A3 + B3 * B1.  */ };
+
+VECT_VAR_DECL (expected2_static, hfloat, 16, 4) []
+  = { 0x5D2F /* A0 + B0 * B2.  */,
+      0xD32D /* A1 + B1 * B2.  */,
+      0x54F3 /* A2 + B2 * B2.  */,
+      0x51B3 /* A3 + B3 * B2.  */ };
+
+VECT_VAR_DECL (expected3_static, hfloat, 16, 4) []
+  = { 0x5AC8 /* A0 + B0 * B3.  */,
+      0xCF40 /* A1 + B1 * B3.  */,
+      0x5073 /* A2 + B2 * B3.  */,
+      0x4E80 /* A3 + B3 * B3.  */ };
+
+/* Expected results for vfmaq_lane.  */
+VECT_VAR_DECL (expected0_static, hfloat, 16, 8) []
+  = { 0x613E /* A0 + B0 * B0.  */,
+      0xD86D /* A1 + B1 * B0.  */,
+      0x5A82 /* A2 + B2 * B0.  */,
+      0x567A /* A3 + B3 * B0.  */,
+      0x7C00 /* A4 + B4 * B0.  */,
+      0xF24D /* A5 + B5 * B0.  */,
+      0xE11B /* A6 + B6 * B0.  */,
+      0x7C00 /* A7 + B7 * B0.  */ };
+
+VECT_VAR_DECL (expected1_static, hfloat, 16, 8) []
+  = { 0xCA33 /* A0 + B0 * B1.  */,
+      0x4EF6 /* A1 + B1 * B1.  */,
+      0xD274 /* A2 + B2 * B1.  */,
+      0xCA9A /* A3 + B3 * B1.  */,
+      0x7C00 /* A4 + B4 * B1.  */,
+      0x6A3B /* A5 + B5 * B1.  */,
+      0x5C4D /* A6 + B6 * B1.  */,
+      0xFC00 /* A7 + B7 * B1.  */ };
+
+VECT_VAR_DECL (expected2_static, hfloat, 16, 8) []
+  = { 0x5D2F /* A0 + B0 * B2.  */,
+      0xD32D /* A1 + B1 * B2.  */,
+      0x54F3 /* A2 + B2 * B2.  */,
+      0x51B3 /* A3 + B3 * B2.  */,
+      0x7C00 /* A4 + B4 * B2.  */,
+      0xECCB /* A5 + B5 * B2.  */,
+      0xDA01 /* A6 + B6 * B2.  */,
+      0x7C00 /* A7 + B7 * B2.  */ };
+
+VECT_VAR_DECL (expected3_static, hfloat, 16, 8) []
+  = { 0x5AC8 /* A0 + B0 * B3.  */,
+      0xCF40 /* A1 + B1 * B3.  */,
+      0x5073 /* A2 + B2 * B3.  */,
+      0x4E80 /* A3 + B3 * B3.  */,
+      0x7C00 /* A4 + B4 * B3.  */,
+      0xE851 /* A5 + B5 * B3.  */,
+      0xD08C /* A6 + B6 * B3.  */,
+      0x7C00 /* A7 + B7 * B3.  */ };
+
+/* Expected results for vfma_laneq.  */
+VECT_VAR_DECL (expected0_laneq_static, hfloat, 16, 4) []
+  = { 0x613E /* A0 + B0 * B0.  */,
+      0xD86D /* A1 + B1 * B0.  */,
+      0x5A82 /* A2 + B2 * B0.  */,
+      0x567A /* A3 + B3 * B0.  */ };
+
+VECT_VAR_DECL (expected1_laneq_static, hfloat, 16, 4) []
+  = { 0xCA33 /* A0 + B0 * B1.  */,
+      0x4EF6 /* A1 + B1 * B1.  */,
+      0xD274 /* A2 + B2 * B1.  */,
+      0xCA9A /* A3 + B3 * B1.  */ };
+
+VECT_VAR_DECL (expected2_laneq_static, hfloat, 16, 4) []
+  = { 0x5D2F /* A0 + B0 * B2.  */,
+      0xD32D /* A1 + B1 * B2.  */,
+      0x54F3 /* A2 + B2 * B2.  */,
+      0x51B3 /* A3 + B3 * B2.  */ };
+
+VECT_VAR_DECL (expected3_laneq_static, hfloat, 16, 4) []
+  = { 0x5AC8 /* A0 + B0 * B3.  */,
+      0xCF40 /* A1 + B1 * B3.  */,
+      0x5073 /* A2 + B2 * B3.  */,
+      0x4E80 /* A3 + B3 * B3.  */ };
+
+VECT_VAR_DECL (expected4_laneq_static, hfloat, 16, 4) []
+  = { 0x5A58 /* A0 + B0 * B4.  */,
+      0xCE62 /* A1 + B1 * B4.  */,
+      0x4F91 /* A2 + B2 * B4.  */,
+      0x4DE6 /* A3 + B3 * B4.  */ };
+
+VECT_VAR_DECL (expected5_laneq_static, hfloat, 16, 4) []
+  = { 0xF23D /* A0 + B0 * B5.  */,
+      0x6A3B /* A1 + B1 * B5.  */,
+      0xECCA /* A2 + B2 * B5.  */,
+      0xE849 /* A3 + B3 * B5.  */ };
+
+VECT_VAR_DECL (expected6_laneq_static, hfloat, 16, 4) []
+  = { 0xE0DA /* A0 + B0 * B6.  */,
+      0x5995 /* A1 + B1 * B6.  */,
+      0xDC6C /* A2 + B2 * B6.  */,
+      0xD753 /* A3 + B3 * B6.  */ };
+
+VECT_VAR_DECL (expected7_laneq_static, hfloat, 16, 4) []
+  = { 0x7C00 /* A0 + B0 * B7.  */,
+      0xFC00 /* A1 + B1 * B7.  */,
+      0x7C00 /* A2 + B2 * B7.  */,
+      0x7C00 /* A3 + B3 * B7.  */ };
+
+/* Expected results for vfmaq_laneq.  */
+VECT_VAR_DECL (expected0_laneq_static, hfloat, 16, 8) []
+  = { 0x613E /* A0 + B0 * B0.  */,
+      0xD86D /* A1 + B1 * B0.  */,
+      0x5A82 /* A2 + B2 * B0.  */,
+      0x567A /* A3 + B3 * B0.  */,
+      0x7C00 /* A4 + B4 * B0.  */,
+      0xF24D /* A5 + B5 * B0.  */,
+      0xE11B /* A6 + B6 * B0.  */,
+      0x7C00 /* A7 + B7 * B0.  */ };
+
+VECT_VAR_DECL (expected1_laneq_static, hfloat, 16, 8) []
+  = { 0xCA33 /* A0 + B0 * B1.  */,
+      0x4EF6 /* A1 + B1 * B1.  */,
+      0xD274 /* A2 + B2 * B1.  */,
+      0xCA9A /* A3 + B3 * B1.  */,
+      0x7C00 /* A4 + B4 * B1.  */,
+      0x6A3B /* A5 + B5 * B1.  */,
+      0x5C4D /* A6 + B6 * B1.  */,
+      0xFC00 /* A7 + B7 * B1.  */ };
+
+VECT_VAR_DECL (expected2_laneq_static, hfloat, 16, 8) []
+  = { 0x5D2F /* A0 + B0 * B2.  */,
+      0xD32D /* A1 + B1 * B2.  */,
+      0x54F3 /* A2 + B2 * B2.  */,
+      0x51B3 /* A3 + B3 * B2.  */,
+      0x7C00 /* A4 + B4 * B2.  */,
+      0xECCB /* A5 + B5 * B2.  */,
+      0xDA01 /* A6 + B6 * B2.  */,
+      0x7C00 /* A7 + B7 * B2.  */ };
+
+VECT_VAR_DECL (expected3_laneq_static, hfloat, 16, 8) []
+  = { 0x5AC8 /* A0 + B0 * B3.  */,
+      0xCF40 /* A1 + B1 * B3.  */,
+      0x5073 /* A2 + B2 * B3.  */,
+      0x4E80 /* A3 + B3 * B3.  */,
+      0x7C00 /* A4 + B4 * B3.  */,
+      0xE851 /* A5 + B5 * B3.  */,
+      0xD08C /* A6 + B6 * B3.  */,
+      0x7C00 /* A7 + B7 * B3.  */ };
+
+VECT_VAR_DECL (expected4_laneq_static, hfloat, 16, 8) []
+  = { 0x5A58 /* A0 + B0 * B4.  */,
+      0xCE62 /* A1 + B1 * B4.  */,
+      0x4F91 /* A2 + B2 * B4.  */,
+      0x4DE6 /* A3 + B3 * B4.  */,
+      0x7C00 /* A4 + B4 * B4.  */,
+      0xE757 /* A5 + B5 * B4.  */,
+      0xCC54 /* A6 + B6 * B4.  */,
+      0x7C00 /* A7 + B7 * B4.  */ };
+
+VECT_VAR_DECL (expected5_laneq_static, hfloat, 16, 8) []
+  = { 0xF23D /* A0 + B0 * B5.  */,
+      0x6A3B /* A1 + B1 * B5.  */,
+      0xECCA /* A2 + B2 * B5.  */,
+      0xE849 /* A3 + B3 * B5.  */,
+      0x7C00 /* A4 + B4 * B5.  */,
+      0x7C00 /* A5 + B5 * B5.  */,
+      0x744D /* A6 + B6 * B5.  */,
+      0xFC00 /* A7 + B7 * B5.  */ };
+
+VECT_VAR_DECL (expected6_laneq_static, hfloat, 16, 8) []
+  = { 0xE0DA /* A0 + B0 * B6.  */,
+      0x5995 /* A1 + B1 * B6.  */,
+      0xDC6C /* A2 + B2 * B6.  */,
+      0xD753 /* A3 + B3 * B6.  */,
+      0x7C00 /* A4 + B4 * B6.  */,
+      0x7447 /* A5 + B5 * B6.  */,
+      0x644E /* A6 + B6 * B6.  */,
+      0xFC00 /* A7 + B7 * B6.  */ };
+
+VECT_VAR_DECL (expected7_laneq_static, hfloat, 16, 8) []
+  = { 0x7C00 /* A0 + B0 * B7.  */,
+      0xFC00 /* A1 + B1 * B7.  */,
+      0x7C00 /* A2 + B2 * B7.  */,
+      0x7C00 /* A3 + B3 * B7.  */,
+      0x7C00 /* A4 + B4 * B7.  */,
+      0xFC00 /* A5 + B5 * B7.  */,
+      0xFC00 /* A6 + B6 * B7.  */,
+      0x7C00 /* A7 + B7 * B7.  */ };
+
+/* Expected results for vfms_lane.  */
+VECT_VAR_DECL (expected0_fms_static, hfloat, 16, 4) []
+  = { 0xDEA2 /* A0 + (-B0) * B0.  */,
+      0x5810 /* A1 + (-B1) * B0.  */,
+      0xDA82 /* A2 + (-B2) * B0.  */,
+      0xD53A /* A3 + (-B3) * B0.  */ };
+
+VECT_VAR_DECL (expected1_fms_static, hfloat, 16, 4) []
+  = { 0x5C0D /* A0 + (-B0) * B1.  */,
+      0xD0EE /* A1 + (-B1) * B1.  */,
+      0x5274 /* A2 + (-B2) * B1.  */,
+      0x5026 /* A3 + (-B3) * B1.  */ };
+
+VECT_VAR_DECL (expected2_fms_static, hfloat, 16, 4) []
+  = { 0xD54E /* A0 + (-B0) * B2.  */,
+      0x51BA /* A1 + (-B1) * B2.  */,
+      0xD4F3 /* A2 + (-B2) * B2.  */,
+      0xCE66 /* A3 + (-B3) * B2.  */ };
+
+VECT_VAR_DECL (expected3_fms_static, hfloat, 16, 4) []
+  = { 0x4F70 /* A0 + (-B0) * B3.  */,
+      0x4C5A /* A1 + (-B1) * B3.  */,
+      0xD073 /* A2 + (-B2) * B3.  */,
+      0xC600 /* A3 + (-B3) * B3.  */ };
+
+/* Expected results for vfmsq_lane.  */
+VECT_VAR_DECL (expected0_fms_static, hfloat, 16, 8) []
+  = { 0xDEA2 /* A0 + (-B0) * B0.  */,
+      0x5810 /* A1 + (-B1) * B0.  */,
+      0xDA82 /* A2 + (-B2) * B0.  */,
+      0xD53A /* A3 + (-B3) * B0.  */,
+      0x7C00 /* A4 + (-B4) * B0.  */,
+      0x724B /* A5 + (-B5) * B0.  */,
+      0x6286 /* A6 + (-B6) * B0.  */,
+      0xFC00 /* A7 + (-B7) * B0.  */ };
+
+VECT_VAR_DECL (expected1_fms_static, hfloat, 16, 8) []
+  = { 0x5C0D /* A0 + (-B0) * B1.  */,
+      0xD0EE /* A1 + (-B1) * B1.  */,
+      0x5274 /* A2 + (-B2) * B1.  */,
+      0x5026 /* A3 + (-B3) * B1.  */,
+      0x7C00 /* A4 + (-B4) * B1.  */,
+      0xEA41 /* A5 + (-B5) * B1.  */,
+      0xD5DA /* A6 + (-B6) * B1.  */,
+      0x7C00 /* A7 + (-B7) * B1.  */ };
+
+VECT_VAR_DECL (expected2_fms_static, hfloat, 16, 8) []
+  = { 0xD54E /* A0 + (-B0) * B2.  */,
+      0x51BA /* A1 + (-B1) * B2.  */,
+      0xD4F3 /* A2 + (-B2) * B2.  */,
+      0xCE66 /* A3 + (-B3) * B2.  */,
+      0x7C00 /* A4 + (-B4) * B2.  */,
+      0x6CC8 /* A5 + (-B5) * B2.  */,
+      0x5DD7 /* A6 + (-B6) * B2.  */,
+      0xFC00 /* A7 + (-B7) * B2.  */ };
+
+VECT_VAR_DECL (expected3_fms_static, hfloat, 16, 8) []
+  = { 0x4F70 /* A0 + (-B0) * B3.  */,
+      0x4C5A /* A1 + (-B1) * B3.  */,
+      0xD073 /* A2 + (-B2) * B3.  */,
+      0xC600 /* A3 + (-B3) * B3.  */,
+      0x7C00 /* A4 + (-B4) * B3.  */,
+      0x684B /* A5 + (-B5) * B3.  */,
+      0x5AD0 /* A6 + (-B6) * B3.  */,
+      0xFC00 /* A7 + (-B7) * B3.  */ };
+
+/* Expected results for vfms_laneq.  */
+VECT_VAR_DECL (expected0_fms_laneq_static, hfloat, 16, 4) []
+  = { 0xDEA2 /* A0 + (-B0) * B0.  */,
+      0x5810 /* A1 + (-B1) * B0.  */,
+      0xDA82 /* A2 + (-B2) * B0.  */,
+      0xD53A /* A3 + (-B3) * B0.  */ };
+
+VECT_VAR_DECL (expected1_fms_laneq_static, hfloat, 16, 4) []
+  = { 0x5C0D /* A0 + (-B0) * B1.  */,
+      0xD0EE /* A1 + (-B1) * B1.  */,
+      0x5274 /* A2 + (-B2) * B1.  */,
+      0x5026 /* A3 + (-B3) * B1.  */ };
+
+VECT_VAR_DECL (expected2_fms_laneq_static, hfloat, 16, 4) []
+  = { 0xD54E /* A0 + (-B0) * B2.  */,
+      0x51BA /* A1 + (-B1) * B2.  */,
+      0xD4F3 /* A2 + (-B2) * B2.  */,
+      0xCE66 /* A3 + (-B3) * B2.  */ };
+
+VECT_VAR_DECL (expected3_fms_laneq_static, hfloat, 16, 4) []
+  = { 0x4F70 /* A0 + (-B0) * B3.  */,
+      0x4C5A /* A1 + (-B1) * B3.  */,
+      0xD073 /* A2 + (-B2) * B3.  */,
+      0xC600 /* A3 + (-B3) * B3.  */ };
+
+VECT_VAR_DECL (expected4_fms_laneq_static, hfloat, 16, 4) []
+  = { 0x5179 /* A0 + (-B0) * B4.  */,
+      0x4AF6 /* A1 + (-B1) * B4.  */,
+      0xCF91 /* A2 + (-B2) * B4.  */,
+      0xC334 /* A3 + (-B3) * B4.  */ };
+
+VECT_VAR_DECL (expected5_fms_laneq_static, hfloat, 16, 4) []
+  = { 0x725C /* A0 + (-B0) * B5.  */,
+      0xEA41 /* A1 + (-B1) * B5.  */,
+      0x6CCA /* A2 + (-B2) * B5.  */,
+      0x6853 /* A3 + (-B3) * B5.  */ };
+
+VECT_VAR_DECL (expected6_fms_laneq_static, hfloat, 16, 4) []
+  = { 0x62C7 /* A0 + (-B0) * B6.  */,
+      0xD9F2 /* A1 + (-B1) * B6.  */,
+      0x5C6C /* A2 + (-B2) * B6.  */,
+      0x584A /* A3 + (-B3) * B6.  */ };
+
+VECT_VAR_DECL (expected7_fms_laneq_static, hfloat, 16, 4) []
+  = { 0xFC00 /* A0 + (-B0) * B7.  */,
+      0x7C00 /* A1 + (-B1) * B7.  */,
+      0xFC00 /* A2 + (-B2) * B7.  */,
+      0xFC00 /* A3 + (-B3) * B7.  */ };
+
+/* Expected results for vfmsq_laneq.  */
+VECT_VAR_DECL (expected0_fms_laneq_static, hfloat, 16, 8) []
+  = { 0xDEA2 /* A0 + (-B0) * B0.  */,
+      0x5810 /* A1 + (-B1) * B0.  */,
+      0xDA82 /* A2 + (-B2) * B0.  */,
+      0xD53A /* A3 + (-B3) * B0.  */,
+      0x7C00 /* A4 + (-B4) * B0.  */,
+      0x724B /* A5 + (-B5) * B0.  */,
+      0x6286 /* A6 + (-B6) * B0.  */,
+      0xFC00 /* A7 + (-B7) * B0.  */ };
+
+VECT_VAR_DECL (expected1_fms_laneq_static, hfloat, 16, 8) []
+  = { 0x5C0D /* A0 + (-B0) * B1.  */,
+      0xD0EE /* A1 + (-B1) * B1.  */,
+      0x5274 /* A2 + (-B2) * B1.  */,
+      0x5026 /* A3 + (-B3) * B1.  */,
+      0x7C00 /* A4 + (-B4) * B1.  */,
+      0xEA41 /* A5 + (-B5) * B1.  */,
+      0xD5DA /* A6 + (-B6) * B1.  */,
+      0x7C00 /* A7 + (-B7) * B1.  */ };
+
+VECT_VAR_DECL (expected2_fms_laneq_static, hfloat, 16, 8) []
+  = { 0xD54E /* A0 + (-B0) * B2.  */,
+      0x51BA /* A1 + (-B1) * B2.  */,
+      0xD4F3 /* A2 + (-B2) * B2.  */,
+      0xCE66 /* A3 + (-B3) * B2.  */,
+      0x7C00 /* A4 + (-B4) * B2.  */,
+      0x6CC8 /* A5 + (-B5) * B2.  */,
+      0x5DD7 /* A6 + (-B6) * B2.  */,
+      0xFC00 /* A7 + (-B7) * B2.  */ };
+
+VECT_VAR_DECL (expected3_fms_laneq_static, hfloat, 16, 8) []
+  = { 0x4F70 /* A0 + (-B0) * B3.  */,
+      0x4C5A /* A1 + (-B1) * B3.  */,
+      0xD073 /* A2 + (-B2) * B3.  */,
+      0xC600 /* A3 + (-B3) * B3.  */,
+      0x7C00 /* A4 + (-B4) * B3.  */,
+      0x684B /* A5 + (-B5) * B3.  */,
+      0x5AD0 /* A6 + (-B6) * B3.  */,
+      0xFC00 /* A7 + (-B7) * B3.  */ };
+
+VECT_VAR_DECL (expected4_fms_laneq_static, hfloat, 16, 8) []
+  = { 0x5179 /* A0 + (-B0) * B4.  */,
+      0x4AF6 /* A1 + (-B1) * B4.  */,
+      0xCF91 /* A2 + (-B2) * B4.  */,
+      0xC334 /* A3 + (-B3) * B4.  */,
+      0x7C00 /* A4 + (-B4) * B4.  */,
+      0x674C /* A5 + (-B5) * B4.  */,
+      0x5A37 /* A6 + (-B6) * B4.  */,
+      0xFC00 /* A7 + (-B7) * B4.  */ };
+
+VECT_VAR_DECL (expected5_fms_laneq_static, hfloat, 16, 8) []
+  = { 0x725C /* A0 + (-B0) * B5.  */,
+      0xEA41 /* A1 + (-B1) * B5.  */,
+      0x6CCA /* A2 + (-B2) * B5.  */,
+      0x6853 /* A3 + (-B3) * B5.  */,
+      0x7C00 /* A4 + (-B4) * B5.  */,
+      0xFC00 /* A5 + (-B5) * B5.  */,
+      0xF441 /* A6 + (-B6) * B5.  */,
+      0x7C00 /* A7 + (-B7) * B5.  */ };
+
+VECT_VAR_DECL (expected6_fms_laneq_static, hfloat, 16, 8) []
+  = { 0x62C7 /* A0 + (-B0) * B6.  */,
+      0xD9F2 /* A1 + (-B1) * B6.  */,
+      0x5C6C /* A2 + (-B2) * B6.  */,
+      0x584A /* A3 + (-B3) * B6.  */,
+      0x7C00 /* A4 + (-B4) * B6.  */,
+      0xF447 /* A5 + (-B5) * B6.  */,
+      0xE330 /* A6 + (-B6) * B6.  */,
+      0x7C00 /* A7 + (-B7) * B6.  */ };
+
+VECT_VAR_DECL (expected7_fms_laneq_static, hfloat, 16, 8) []
+  = { 0xFC00 /* A0 + (-B0) * B7.  */,
+      0x7C00 /* A1 + (-B1) * B7.  */,
+      0xFC00 /* A2 + (-B2) * B7.  */,
+      0xFC00 /* A3 + (-B3) * B7.  */,
+      0x7C00 /* A4 + (-B4) * B7.  */,
+      0x7C00 /* A5 + (-B5) * B7.  */,
+      0x7C00 /* A6 + (-B6) * B7.  */,
+      0xFC00 /* A7 + (-B7) * B7.  */ };
+
+void exec_vfmas_lane_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VFMA_LANE (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE(vsrc_1, float, 16, 4);
+  DECL_VARIABLE(vsrc_2, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A0, A1, A2, A3};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {B0, B1, B2, B3};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4)
+    = vfma_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		     VECT_VAR (vsrc_2, float, 16, 4),
+		     VECT_VAR (vsrc_2, float, 16, 4), 0);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected0_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vfma_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		     VECT_VAR (vsrc_2, float, 16, 4),
+		     VECT_VAR (vsrc_2, float, 16, 4), 1);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected1_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vfma_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		     VECT_VAR (vsrc_2, float, 16, 4),
+		     VECT_VAR (vsrc_2, float, 16, 4), 2);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected2_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vfma_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		     VECT_VAR (vsrc_2, float, 16, 4),
+		     VECT_VAR (vsrc_2, float, 16, 4), 3);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected3_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VFMAQ_LANE (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE(vsrc_1, float, 16, 8);
+  DECL_VARIABLE(vsrc_2, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A0, A1, A2, A3, A4, A5, A6, A7};
+  VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {B0, B1, B2, B3, B4, B5, B6, B7};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8)
+    = vfmaq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		      VECT_VAR (vsrc_2, float, 16, 8),
+		      VECT_VAR (vsrc_2, float, 16, 4), 0);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected0_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmaq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		      VECT_VAR (vsrc_2, float, 16, 8),
+		      VECT_VAR (vsrc_2, float, 16, 4), 1);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected1_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmaq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		      VECT_VAR (vsrc_2, float, 16, 8),
+		      VECT_VAR (vsrc_2, float, 16, 4), 2);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected2_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmaq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		      VECT_VAR (vsrc_2, float, 16, 8),
+		      VECT_VAR (vsrc_2, float, 16, 4), 3);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected3_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VFMA_LANEQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE(vsrc_3, float, 16, 8);
+  VECT_VAR_DECL (buf_src_3, float, 16, 8) [] = {B0, B1, B2, B3, B4, B5, B6, B7};
+  VLOAD (vsrc_3, buf_src_3, q, float, f, 16, 8);
+  VECT_VAR (vector_res, float, 16, 4)
+    = vfma_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		      VECT_VAR (vsrc_2, float, 16, 4),
+		      VECT_VAR (vsrc_3, float, 16, 8), 0);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected0_laneq_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vfma_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		      VECT_VAR (vsrc_2, float, 16, 4),
+		      VECT_VAR (vsrc_3, float, 16, 8), 1);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected1_laneq_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vfma_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		      VECT_VAR (vsrc_2, float, 16, 4),
+		      VECT_VAR (vsrc_3, float, 16, 8), 2);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected2_laneq_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vfma_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		      VECT_VAR (vsrc_2, float, 16, 4),
+		      VECT_VAR (vsrc_3, float, 16, 8), 3);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected3_laneq_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vfma_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		      VECT_VAR (vsrc_2, float, 16, 4),
+		      VECT_VAR (vsrc_3, float, 16, 8), 4);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected4_laneq_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vfma_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		      VECT_VAR (vsrc_2, float, 16, 4),
+		      VECT_VAR (vsrc_3, float, 16, 8), 5);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected5_laneq_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vfma_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		      VECT_VAR (vsrc_2, float, 16, 4),
+		      VECT_VAR (vsrc_3, float, 16, 8), 6);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected6_laneq_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vfma_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		      VECT_VAR (vsrc_2, float, 16, 4),
+		      VECT_VAR (vsrc_3, float, 16, 8), 7);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected7_laneq_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VFMAQ_LANEQ (FP16)"
+  clean_results ();
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmaq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		       VECT_VAR (vsrc_2, float, 16, 8),
+		       VECT_VAR (vsrc_3, float, 16, 8), 0);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected0_laneq_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmaq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		       VECT_VAR (vsrc_2, float, 16, 8),
+		       VECT_VAR (vsrc_3, float, 16, 8), 1);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected1_laneq_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmaq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		       VECT_VAR (vsrc_2, float, 16, 8),
+		       VECT_VAR (vsrc_3, float, 16, 8), 2);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected2_laneq_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmaq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		       VECT_VAR (vsrc_2, float, 16, 8),
+		       VECT_VAR (vsrc_3, float, 16, 8), 3);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected3_laneq_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmaq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		       VECT_VAR (vsrc_2, float, 16, 8),
+		       VECT_VAR (vsrc_3, float, 16, 8), 4);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected4_laneq_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmaq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		       VECT_VAR (vsrc_2, float, 16, 8),
+		       VECT_VAR (vsrc_3, float, 16, 8), 5);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected5_laneq_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmaq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		       VECT_VAR (vsrc_2, float, 16, 8),
+		       VECT_VAR (vsrc_3, float, 16, 8), 6);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected6_laneq_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmaq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		       VECT_VAR (vsrc_2, float, 16, 8),
+		       VECT_VAR (vsrc_3, float, 16, 8), 7);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected7_laneq_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VFMS_LANE (FP16)"
+  clean_results ();
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vfms_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		     VECT_VAR (vsrc_2, float, 16, 4),
+		     VECT_VAR (vsrc_2, float, 16, 4), 0);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected0_fms_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vfms_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		     VECT_VAR (vsrc_2, float, 16, 4),
+		     VECT_VAR (vsrc_2, float, 16, 4), 1);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected1_fms_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vfms_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		     VECT_VAR (vsrc_2, float, 16, 4),
+		     VECT_VAR (vsrc_2, float, 16, 4), 2);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected2_fms_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vfms_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		     VECT_VAR (vsrc_2, float, 16, 4),
+		     VECT_VAR (vsrc_2, float, 16, 4), 3);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected3_fms_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VFMSQ_LANE (FP16)"
+  clean_results ();
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmsq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		      VECT_VAR (vsrc_2, float, 16, 8),
+		      VECT_VAR (vsrc_2, float, 16, 4), 0);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected0_fms_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmsq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		      VECT_VAR (vsrc_2, float, 16, 8),
+		      VECT_VAR (vsrc_2, float, 16, 4), 1);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected1_fms_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmsq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		      VECT_VAR (vsrc_2, float, 16, 8),
+		      VECT_VAR (vsrc_2, float, 16, 4), 2);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected2_fms_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmsq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		      VECT_VAR (vsrc_2, float, 16, 8),
+		      VECT_VAR (vsrc_2, float, 16, 4), 3);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected3_fms_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VFMS_LANEQ (FP16)"
+  clean_results ();
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vfms_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		      VECT_VAR (vsrc_2, float, 16, 4),
+		      VECT_VAR (vsrc_3, float, 16, 8), 0);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected0_fms_laneq_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vfms_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		      VECT_VAR (vsrc_2, float, 16, 4),
+		      VECT_VAR (vsrc_3, float, 16, 8), 1);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected1_fms_laneq_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vfms_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		      VECT_VAR (vsrc_2, float, 16, 4),
+		      VECT_VAR (vsrc_3, float, 16, 8), 2);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected2_fms_laneq_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vfms_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		      VECT_VAR (vsrc_2, float, 16, 4),
+		      VECT_VAR (vsrc_3, float, 16, 8), 3);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected3_fms_laneq_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vfms_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		      VECT_VAR (vsrc_2, float, 16, 4),
+		      VECT_VAR (vsrc_3, float, 16, 8), 4);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected4_fms_laneq_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vfms_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		      VECT_VAR (vsrc_2, float, 16, 4),
+		      VECT_VAR (vsrc_3, float, 16, 8), 5);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected5_fms_laneq_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vfms_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		      VECT_VAR (vsrc_2, float, 16, 4),
+		      VECT_VAR (vsrc_3, float, 16, 8), 6);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected6_fms_laneq_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vfms_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		      VECT_VAR (vsrc_2, float, 16, 4),
+		      VECT_VAR (vsrc_3, float, 16, 8), 7);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected7_fms_laneq_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VFMSQ_LANEQ (FP16)"
+  clean_results ();
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmsq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		       VECT_VAR (vsrc_2, float, 16, 8),
+		       VECT_VAR (vsrc_3, float, 16, 8), 0);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected0_fms_laneq_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmsq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		       VECT_VAR (vsrc_2, float, 16, 8),
+		       VECT_VAR (vsrc_3, float, 16, 8), 1);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected1_fms_laneq_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmsq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		       VECT_VAR (vsrc_2, float, 16, 8),
+		       VECT_VAR (vsrc_3, float, 16, 8), 2);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected2_fms_laneq_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmsq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		       VECT_VAR (vsrc_2, float, 16, 8),
+		       VECT_VAR (vsrc_3, float, 16, 8), 3);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected3_fms_laneq_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmsq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		       VECT_VAR (vsrc_2, float, 16, 8),
+		       VECT_VAR (vsrc_3, float, 16, 8), 4);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected4_fms_laneq_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmsq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		       VECT_VAR (vsrc_2, float, 16, 8),
+		       VECT_VAR (vsrc_3, float, 16, 8), 5);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected5_fms_laneq_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmsq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		       VECT_VAR (vsrc_2, float, 16, 8),
+		       VECT_VAR (vsrc_3, float, 16, 8), 6);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected6_fms_laneq_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmsq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		       VECT_VAR (vsrc_2, float, 16, 8),
+		       VECT_VAR (vsrc_3, float, 16, 8), 7);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected7_fms_laneq_static, "");
+}
+
+int
+main (void)
+{
+  exec_vfmas_lane_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmas_n_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmas_n_f16_1.c
new file mode 100644
index 0000000..f01aefb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmas_n_f16_1.c
@@ -0,0 +1,469 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon } */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A0 FP16_C (123.4)
+#define A1 FP16_C (-5.8)
+#define A2 FP16_C (-0.0)
+#define A3 FP16_C (10)
+#define A4 FP16_C (123412.43)
+#define A5 FP16_C (-5.8)
+#define A6 FP16_C (90.8)
+#define A7 FP16_C (24)
+
+#define B0 FP16_C (23.4)
+#define B1 FP16_C (-5.8)
+#define B2 FP16_C (8.9)
+#define B3 FP16_C (4.0)
+#define B4 FP16_C (3.4)
+#define B5 FP16_C (-550.8)
+#define B6 FP16_C (-31.8)
+#define B7 FP16_C (20000.0)
+
+/* Expected results for vfma_n.  */
+VECT_VAR_DECL (expected_fma0_static, hfloat, 16, 4) []
+  = { 0x613E /* A0 + B0 * B0.  */,
+      0xD86D /* A1 + B1 * B0.  */,
+      0x5A82 /* A2 + B2 * B0.  */,
+      0x567A /* A3 + B3 * B0.  */ };
+
+VECT_VAR_DECL (expected_fma1_static, hfloat, 16, 4) []
+  = { 0xCA33 /* A0 + B0 * B1.  */,
+      0x4EF6 /* A1 + B1 * B1.  */,
+      0xD274 /* A2 + B2 * B1.  */,
+      0xCA9A /* A3 + B3 * B1.  */ };
+
+VECT_VAR_DECL (expected_fma2_static, hfloat, 16, 4) []
+  = { 0x5D2F /* A0 + B0 * B2.  */,
+      0xD32D /* A1 + B1 * B2.  */,
+      0x54F3 /* A2 + B2 * B2.  */,
+      0x51B3 /* A3 + B3 * B2.  */ };
+
+VECT_VAR_DECL (expected_fma3_static, hfloat, 16, 4) []
+  = { 0x5AC8 /* A0 + B0 * B3.  */,
+      0xCF40 /* A1 + B1 * B3.  */,
+      0x5073 /* A2 + B2 * B3.  */,
+      0x4E80 /* A3 + B3 * B3.  */ };
+
+VECT_VAR_DECL (expected_fma0_static, hfloat, 16, 8) []
+  = { 0x613E /* A0 + B0 * B0.  */,
+      0xD86D /* A1 + B1 * B0.  */,
+      0x5A82 /* A2 + B2 * B0.  */,
+      0x567A /* A3 + B3 * B0.  */,
+      0x7C00 /* A4 + B4 * B0.  */,
+      0xF24D /* A5 + B5 * B0.  */,
+      0xE11B /* A6 + B6 * B0.  */,
+      0x7C00 /* A7 + B7 * B0.  */ };
+
+VECT_VAR_DECL (expected_fma1_static, hfloat, 16, 8) []
+  = { 0xCA33 /* A0 + B0 * B1.  */,
+      0x4EF6 /* A1 + B1 * B1.  */,
+      0xD274 /* A2 + B2 * B1.  */,
+      0xCA9A /* A3 + B3 * B1.  */,
+      0x7C00 /* A4 + B4 * B1.  */,
+      0x6A3B /* A5 + B5 * B1.  */,
+      0x5C4D /* A6 + B6 * B1.  */,
+      0xFC00 /* A7 + B7 * B1.  */ };
+
+VECT_VAR_DECL (expected_fma2_static, hfloat, 16, 8) []
+  = { 0x5D2F /* A0 + B0 * B2.  */,
+      0xD32D /* A1 + B1 * B2.  */,
+      0x54F3 /* A2 + B2 * B2.  */,
+      0x51B3 /* A3 + B3 * B2.  */,
+      0x7C00 /* A4 + B4 * B2.  */,
+      0xECCB /* A5 + B5 * B2.  */,
+      0xDA01 /* A6 + B6 * B2.  */,
+      0x7C00 /* A7 + B7 * B2.  */ };
+
+VECT_VAR_DECL (expected_fma3_static, hfloat, 16, 8) []
+  = { 0x5AC8 /* A0 + B0 * B3.  */,
+      0xCF40 /* A1 + B1 * B3.  */,
+      0x5073 /* A2 + B2 * B3.  */,
+      0x4E80 /* A3 + B3 * B3.  */,
+      0x7C00 /* A4 + B4 * B3.  */,
+      0xE851 /* A5 + B5 * B3.  */,
+      0xD08C /* A6 + B6 * B3.  */,
+      0x7C00 /* A7 + B7 * B3.  */ };
+
+VECT_VAR_DECL (expected_fma4_static, hfloat, 16, 8) []
+  = { 0x5A58 /* A0 + B0 * B4.  */,
+      0xCE62 /* A1 + B1 * B4.  */,
+      0x4F91 /* A2 + B2 * B4.  */,
+      0x4DE6 /* A3 + B3 * B4.  */,
+      0x7C00 /* A4 + B4 * B4.  */,
+      0xE757 /* A5 + B5 * B4.  */,
+      0xCC54 /* A6 + B6 * B4.  */,
+      0x7C00 /* A7 + B7 * B4.  */ };
+
+VECT_VAR_DECL (expected_fma5_static, hfloat, 16, 8) []
+  = { 0xF23D /* A0 + B0 * B5.  */,
+      0x6A3B /* A1 + B1 * B5.  */,
+      0xECCA /* A2 + B2 * B5.  */,
+      0xE849 /* A3 + B3 * B5.  */,
+      0x7C00 /* A4 + B4 * B5.  */,
+      0x7C00 /* A5 + B5 * B5.  */,
+      0x744D /* A6 + B6 * B5.  */,
+      0xFC00 /* A7 + B7 * B5.  */ };
+
+VECT_VAR_DECL (expected_fma6_static, hfloat, 16, 8) []
+  = { 0xE0DA /* A0 + B0 * B6.  */,
+      0x5995 /* A1 + B1 * B6.  */,
+      0xDC6C /* A2 + B2 * B6.  */,
+      0xD753 /* A3 + B3 * B6.  */,
+      0x7C00 /* A4 + B4 * B6.  */,
+      0x7447 /* A5 + B5 * B6.  */,
+      0x644E /* A6 + B6 * B6.  */,
+      0xFC00 /* A7 + B7 * B6.  */ };
+
+VECT_VAR_DECL (expected_fma7_static, hfloat, 16, 8) []
+  = { 0x7C00 /* A0 + B0 * B7.  */,
+      0xFC00 /* A1 + B1 * B7.  */,
+      0x7C00 /* A2 + B2 * B7.  */,
+      0x7C00 /* A3 + B3 * B7.  */,
+      0x7C00 /* A4 + B4 * B7.  */,
+      0xFC00 /* A5 + B5 * B7.  */,
+      0xFC00 /* A6 + B6 * B7.  */,
+      0x7C00 /* A7 + B7 * B7.  */ };
+
+/* Expected results for vfms_n.  */
+VECT_VAR_DECL (expected_fms0_static, hfloat, 16, 4) []
+  = { 0xDEA2 /* A0 + (-B0) * B0.  */,
+      0x5810 /* A1 + (-B1) * B0.  */,
+      0xDA82 /* A2 + (-B2) * B0.  */,
+      0xD53A /* A3 + (-B3) * B0.  */ };
+
+VECT_VAR_DECL (expected_fms1_static, hfloat, 16, 4) []
+  = { 0x5C0D /* A0 + (-B0) * B1.  */,
+      0xD0EE /* A1 + (-B1) * B1.  */,
+      0x5274 /* A2 + (-B2) * B1.  */,
+      0x5026 /* A3 + (-B3) * B1.  */ };
+
+VECT_VAR_DECL (expected_fms2_static, hfloat, 16, 4) []
+  = { 0xD54E /* A0 + (-B0) * B2.  */,
+      0x51BA /* A1 + (-B1) * B2.  */,
+      0xD4F3 /* A2 + (-B2) * B2.  */,
+      0xCE66 /* A3 + (-B3) * B2.  */ };
+
+VECT_VAR_DECL (expected_fms3_static, hfloat, 16, 4) []
+  = { 0x4F70 /* A0 + (-B0) * B3.  */,
+      0x4C5A /* A1 + (-B1) * B3.  */,
+      0xD073 /* A2 + (-B2) * B3.  */,
+      0xC600 /* A3 + (-B3) * B3.  */ };
+
+VECT_VAR_DECL (expected_fms0_static, hfloat, 16, 8) []
+  = { 0xDEA2 /* A0 + (-B0) * B0.  */,
+      0x5810 /* A1 + (-B1) * B0.  */,
+      0xDA82 /* A2 + (-B2) * B0.  */,
+      0xD53A /* A3 + (-B3) * B0.  */,
+      0x7C00 /* A4 + (-B4) * B0.  */,
+      0x724B /* A5 + (-B5) * B0.  */,
+      0x6286 /* A6 + (-B6) * B0.  */,
+      0xFC00 /* A7 + (-B7) * B0.  */ };
+
+VECT_VAR_DECL (expected_fms1_static, hfloat, 16, 8) []
+  = { 0x5C0D /* A0 + (-B0) * B1.  */,
+      0xD0EE /* A1 + (-B1) * B1.  */,
+      0x5274 /* A2 + (-B2) * B1.  */,
+      0x5026 /* A3 + (-B3) * B1.  */,
+      0x7C00 /* A4 + (-B4) * B1.  */,
+      0xEA41 /* A5 + (-B5) * B1.  */,
+      0xD5DA /* A6 + (-B6) * B1.  */,
+      0x7C00 /* A7 + (-B7) * B1.  */ };
+
+VECT_VAR_DECL (expected_fms2_static, hfloat, 16, 8) []
+  = { 0xD54E /* A0 + (-B0) * B2.  */,
+      0x51BA /* A1 + (-B1) * B2.  */,
+      0xD4F3 /* A2 + (-B2) * B2.  */,
+      0xCE66 /* A3 + (-B3) * B2.  */,
+      0x7C00 /* A4 + (-B4) * B2.  */,
+      0x6CC8 /* A5 + (-B5) * B2.  */,
+      0x5DD7 /* A6 + (-B6) * B2.  */,
+      0xFC00 /* A7 + (-B7) * B2.  */ };
+
+VECT_VAR_DECL (expected_fms3_static, hfloat, 16, 8) []
+  = { 0x4F70 /* A0 + (-B0) * B3.  */,
+      0x4C5A /* A1 + (-B1) * B3.  */,
+      0xD073 /* A2 + (-B2) * B3.  */,
+      0xC600 /* A3 + (-B3) * B3.  */,
+      0x7C00 /* A4 + (-B4) * B3.  */,
+      0x684B /* A5 + (-B5) * B3.  */,
+      0x5AD0 /* A6 + (-B6) * B3.  */,
+      0xFC00 /* A7 + (-B7) * B3.  */ };
+
+VECT_VAR_DECL (expected_fms4_static, hfloat, 16, 8) []
+  = { 0x5179 /* A0 + (-B0) * B4.  */,
+      0x4AF6 /* A1 + (-B1) * B4.  */,
+      0xCF91 /* A2 + (-B2) * B4.  */,
+      0xC334 /* A3 + (-B3) * B4.  */,
+      0x7C00 /* A4 + (-B4) * B4.  */,
+      0x674C /* A5 + (-B5) * B4.  */,
+      0x5A37 /* A6 + (-B6) * B4.  */,
+      0xFC00 /* A7 + (-B7) * B4.  */ };
+
+VECT_VAR_DECL (expected_fms5_static, hfloat, 16, 8) []
+  = { 0x725C /* A0 + (-B0) * B5.  */,
+      0xEA41 /* A1 + (-B1) * B5.  */,
+      0x6CCA /* A2 + (-B2) * B5.  */,
+      0x6853 /* A3 + (-B3) * B5.  */,
+      0x7C00 /* A4 + (-B4) * B5.  */,
+      0xFC00 /* A5 + (-B5) * B5.  */,
+      0xF441 /* A6 + (-B6) * B5.  */,
+      0x7C00 /* A7 + (-B7) * B5.  */ };
+
+VECT_VAR_DECL (expected_fms6_static, hfloat, 16, 8) []
+  = { 0x62C7 /* A0 + (-B0) * B6.  */,
+      0xD9F2 /* A1 + (-B1) * B6.  */,
+      0x5C6C /* A2 + (-B2) * B6.  */,
+      0x584A /* A3 + (-B3) * B6.  */,
+      0x7C00 /* A4 + (-B4) * B6.  */,
+      0xF447 /* A5 + (-B5) * B6.  */,
+      0xE330 /* A6 + (-B6) * B6.  */,
+      0x7C00 /* A7 + (-B7) * B6.  */ };
+
+VECT_VAR_DECL (expected_fms7_static, hfloat, 16, 8) []
+  = { 0xFC00 /* A0 + (-B0) * B7.  */,
+      0x7C00 /* A1 + (-B1) * B7.  */,
+      0xFC00 /* A2 + (-B2) * B7.  */,
+      0xFC00 /* A3 + (-B3) * B7.  */,
+      0x7C00 /* A4 + (-B4) * B7.  */,
+      0x7C00 /* A5 + (-B5) * B7.  */,
+      0x7C00 /* A6 + (-B6) * B7.  */,
+      0xFC00 /* A7 + (-B7) * B7.  */ };
+
+void exec_vfmas_n_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VFMA_N (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE(vsrc_1, float, 16, 4);
+  DECL_VARIABLE(vsrc_2, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A0, A1, A2, A3};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {B0, B1, B2, B3};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4)
+    = vfma_n_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		  VECT_VAR (vsrc_2, float, 16, 4), B0);
+
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_fma0_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vfma_n_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		  VECT_VAR (vsrc_2, float, 16, 4), B1);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_fma1_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vfma_n_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		  VECT_VAR (vsrc_2, float, 16, 4), B2);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_fma2_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vfma_n_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		  VECT_VAR (vsrc_2, float, 16, 4), B3);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_fma3_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VFMAQ_N (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE(vsrc_1, float, 16, 8);
+  DECL_VARIABLE(vsrc_2, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A0, A1, A2, A3, A4, A5, A6, A7};
+  VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {B0, B1, B2, B3, B4, B5, B6, B7};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8)
+    = vfmaq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		   VECT_VAR (vsrc_2, float, 16, 8), B0);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fma0_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmaq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		   VECT_VAR (vsrc_2, float, 16, 8), B1);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fma1_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmaq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		   VECT_VAR (vsrc_2, float, 16, 8), B2);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fma2_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmaq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		   VECT_VAR (vsrc_2, float, 16, 8), B3);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fma3_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmaq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		   VECT_VAR (vsrc_2, float, 16, 8), B4);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fma4_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmaq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		   VECT_VAR (vsrc_2, float, 16, 8), B5);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fma5_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmaq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		   VECT_VAR (vsrc_2, float, 16, 8), B6);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fma6_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmaq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		   VECT_VAR (vsrc_2, float, 16, 8), B7);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fma7_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VFMA_N (FP16)"
+  clean_results ();
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vfms_n_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		  VECT_VAR (vsrc_2, float, 16, 4), B0);
+
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_fms0_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vfms_n_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		  VECT_VAR (vsrc_2, float, 16, 4), B1);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_fms1_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vfms_n_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		  VECT_VAR (vsrc_2, float, 16, 4), B2);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_fms2_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vfms_n_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		  VECT_VAR (vsrc_2, float, 16, 4), B3);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_fms3_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VFMAQ_N (FP16)"
+  clean_results ();
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmsq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		   VECT_VAR (vsrc_2, float, 16, 8), B0);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fms0_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmsq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		   VECT_VAR (vsrc_2, float, 16, 8), B1);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fms1_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmsq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		   VECT_VAR (vsrc_2, float, 16, 8), B2);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fms2_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmsq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		   VECT_VAR (vsrc_2, float, 16, 8), B3);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fms3_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmsq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		   VECT_VAR (vsrc_2, float, 16, 8), B4);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fms4_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmsq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		   VECT_VAR (vsrc_2, float, 16, 8), B5);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fms5_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmsq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		   VECT_VAR (vsrc_2, float, 16, 8), B6);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fms6_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vfmsq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		   VECT_VAR (vsrc_2, float, 16, 8), B7);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fms7_static, "");
+}
+
+int
+main (void)
+{
+  exec_vfmas_n_f16 ();
+  return 0;
+}
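
Two notes on the file above.  First, the _n forms broadcast the scalar
multiplier, so element i computes a[i] + b[i] * n (fma) or
a[i] - b[i] * n (fms); in terms of the intrinsics this series adds, the
relation is (a sketch, wrapper name ours):

#include <arm_neon.h>

/* Sketch: the _n form is the vector fma with the scalar broadcast.  */
static float16x4_t
ref_vfma_n_f16 (float16x4_t a, float16x4_t b, float16_t n)
{
  return vfma_f16 (a, b, vdup_n_f16 (n));
}

Second, the 0x7C00/0xFC00 entries in the expected tables encode +/-Inf:
binary16's largest finite value is 65504, so a result such as
A7 + B7 * B0 = 24 + 20000 * 23.4 overflows and rounds to +Inf.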
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxnmv_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxnmv_f16_1.c
new file mode 100644
index 0000000..ce9872f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxnmv_f16_1.c
@@ -0,0 +1,131 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon } */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A0 FP16_C (34.8)
+#define B0 FP16_C (__builtin_nanf (""))
+#define C0 FP16_C (-__builtin_nanf (""))
+#define D0 FP16_C (0.0)
+
+#define A1 FP16_C (1025.8)
+#define B1 FP16_C (13.4)
+#define C1 FP16_C (__builtin_nanf (""))
+#define D1 FP16_C (10)
+#define E1 FP16_C (-0.0)
+#define F1 FP16_C (-__builtin_nanf (""))
+#define G1 FP16_C (0.0)
+#define H1 FP16_C (10)
+
+/* Expected results for vmaxnmv.  */
+uint16_t expect = 0x505A /* A0.  */;
+uint16_t expect_alt = 0x6402 /* A1.  */;
+
+void exec_vmaxnmv_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VMAXNMV (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE(vsrc, float, 16, 4);
+  VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A0, B0, C0, D0};
+  VLOAD (vsrc, buf_src, , float, f, 16, 4);
+  float16_t vector_res = vmaxnmv_f16 (VECT_VAR (vsrc, float, 16, 4));
+
+  if (* (uint16_t *) &vector_res != expect)
+    abort ();
+
+  VECT_VAR_DECL (buf_src1, float, 16, 4) [] = {B0, A0, C0, D0};
+  VLOAD (vsrc, buf_src1, , float, f, 16, 4);
+  vector_res = vmaxnmv_f16 (VECT_VAR (vsrc, float, 16, 4));
+
+  if (* (uint16_t *) &vector_res != expect)
+    abort ();
+
+  VECT_VAR_DECL (buf_src2, float, 16, 4) [] = {B0, C0, A0, D0};
+  VLOAD (vsrc, buf_src2, , float, f, 16, 4);
+  vector_res = vmaxnmv_f16 (VECT_VAR (vsrc, float, 16, 4));
+
+  if (* (uint16_t *) &vector_res != expect)
+    abort ();
+
+  VECT_VAR_DECL (buf_src3, float, 16, 4) [] = {B0, C0, D0, A0};
+  VLOAD (vsrc, buf_src3, , float, f, 16, 4);
+  vector_res = vmaxnmv_f16 (VECT_VAR (vsrc, float, 16, 4));
+
+  if (* (uint16_t *) &vector_res != expect)
+    abort ();
+
+#undef TEST_MSG
+#define TEST_MSG "VMAXNMVQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE(vsrc, float, 16, 8);
+  VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A1, B1, C1, D1, E1, F1, G1, H1};
+  VLOAD (vsrc, buf_src, q, float, f, 16, 8);
+  vector_res = vmaxnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+
+  VECT_VAR_DECL (buf_src1, float, 16, 8) [] = {B1, A1, C1, D1, E1, F1, G1, H1};
+  VLOAD (vsrc, buf_src1, q, float, f, 16, 8);
+  vector_res = vmaxnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+
+  VECT_VAR_DECL (buf_src2, float, 16, 8) [] = {B1, C1, A1, D1, E1, F1, G1, H1};
+  VLOAD (vsrc, buf_src2, q, float, f, 16, 8);
+  vector_res = vmaxnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+
+  VECT_VAR_DECL (buf_src3, float, 16, 8) [] = {B1, C1, D1, A1, E1, F1, G1, H1};
+  VLOAD (vsrc, buf_src3, q, float, f, 16, 8);
+  vector_res = vmaxnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+
+  VECT_VAR_DECL (buf_src4, float, 16, 8) [] = {B1, C1, D1, E1, A1, F1, G1, H1};
+  VLOAD (vsrc, buf_src4, q, float, f, 16, 8);
+  vector_res = vmaxnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+
+  VECT_VAR_DECL (buf_src5, float, 16, 8) [] = {B1, C1, D1, E1, F1, A1, G1, H1};
+  VLOAD (vsrc, buf_src5, q, float, f, 16, 8);
+  vector_res = vmaxnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+
+  VECT_VAR_DECL (buf_src6, float, 16, 8) [] = {B1, C1, D1, E1, F1, G1, A1, H1};
+  VLOAD (vsrc, buf_src6, q, float, f, 16, 8);
+  vector_res = vmaxnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+
+  VECT_VAR_DECL (buf_src7, float, 16, 8) [] = {B1, C1, D1, E1, F1, G1, H1, A1};
+  VLOAD (vsrc, buf_src7, q, float, f, 16, 8);
+  vector_res = vmaxnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+}
+
+int
+main (void)
+{
+  exec_vmaxnmv_f16 ();
+  return 0;
+}
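
The expected values above depend on the FMAXNM rule that a quiet NaN
loses to a number, which is why A0/A1 win regardless of where the NaN
lanes sit.  C's fmaxf has the same quiet-NaN behaviour, giving a
compact scalar model of the reduction (a sketch; the helper name is
ours):

#include <math.h>

/* Sketch of the vmaxnmv{q}_f16 reduction: fmaxf, like FMAXNM, returns
   the numeric operand when the other is a quiet NaN.  */
static __fp16
ref_vmaxnmv_f16 (const __fp16 *v, int n)
{
  float m = (float) v[0];
  for (int i = 1; i < n; i++)
    m = fmaxf (m, (float) v[i]);
  return (__fp16) m;
}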
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxv_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxv_f16_1.c
new file mode 100644
index 0000000..39c4897
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxv_f16_1.c
@@ -0,0 +1,131 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon } */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A0 FP16_C (123.4)
+#define B0 FP16_C (-567.8)
+#define C0 FP16_C (34.8)
+#define D0 FP16_C (0.0)
+
+#define A1 FP16_C (1025.8)
+#define B1 FP16_C (13.4)
+#define C1 FP16_C (-567.8)
+#define D1 FP16_C (10)
+#define E1 FP16_C (-0.0)
+#define F1 FP16_C (567.8)
+#define G1 FP16_C (0.0)
+#define H1 FP16_C (10)
+
+/* Expected results for vmaxv.  */
+uint16_t expect = 0x57B6 /* A0.  */;
+uint16_t expect_alt = 0x6402 /* A1.  */;
+
+void exec_vmaxv_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VMAXV (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE(vsrc, float, 16, 4);
+  VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A0, B0, C0, D0};
+  VLOAD (vsrc, buf_src, , float, f, 16, 4);
+  float16_t vector_res = vmaxv_f16 (VECT_VAR (vsrc, float, 16, 4));
+
+  if (* (uint16_t *) &vector_res != expect)
+    abort ();
+
+  VECT_VAR_DECL (buf_src1, float, 16, 4) [] = {B0, A0, C0, D0};
+  VLOAD (vsrc, buf_src1, , float, f, 16, 4);
+  vector_res = vmaxv_f16 (VECT_VAR (vsrc, float, 16, 4));
+
+  if (* (uint16_t *) &vector_res != expect)
+    abort ();
+
+  VECT_VAR_DECL (buf_src2, float, 16, 4) [] = {B0, C0, A0, D0};
+  VLOAD (vsrc, buf_src2, , float, f, 16, 4);
+  vector_res = vmaxv_f16 (VECT_VAR (vsrc, float, 16, 4));
+
+  if (* (uint16_t *) &vector_res != expect)
+    abort ();
+
+  VECT_VAR_DECL (buf_src3, float, 16, 4) [] = {B0, C0, D0, A0};
+  VLOAD (vsrc, buf_src3, , float, f, 16, 4);
+  vector_res = vmaxv_f16 (VECT_VAR (vsrc, float, 16, 4));
+
+  if (* (uint16_t *) &vector_res != expect)
+    abort ();
+
+#undef TEST_MSG
+#define TEST_MSG "VMAXVQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE(vsrc, float, 16, 8);
+  VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A1, B1, C1, D1, E1, F1, G1, H1};
+  VLOAD (vsrc, buf_src, q, float, f, 16, 8);
+  vector_res = vmaxvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+
+  VECT_VAR_DECL (buf_src1, float, 16, 8) [] = {B1, A1, C1, D1, E1, F1, G1, H1};
+  VLOAD (vsrc, buf_src1, q, float, f, 16, 8);
+  vector_res = vmaxvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+
+  VECT_VAR_DECL (buf_src2, float, 16, 8) [] = {B1, C1, A1, D1, E1, F1, G1, H1};
+  VLOAD (vsrc, buf_src2, q, float, f, 16, 8);
+  vector_res = vmaxvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+
+  VECT_VAR_DECL (buf_src3, float, 16, 8) [] = {B1, C1, D1, A1, E1, F1, G1, H1};
+  VLOAD (vsrc, buf_src3, q, float, f, 16, 8);
+  vector_res = vmaxvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+
+  VECT_VAR_DECL (buf_src4, float, 16, 8) [] = {B1, C1, D1, E1, A1, F1, G1, H1};
+  VLOAD (vsrc, buf_src4, q, float, f, 16, 8);
+  vector_res = vmaxvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+
+  VECT_VAR_DECL (buf_src5, float, 16, 8) [] = {B1, C1, D1, E1, F1, A1, G1, H1};
+  VLOAD (vsrc, buf_src5, q, float, f, 16, 8);
+  vector_res = vmaxvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+
+  VECT_VAR_DECL (buf_src6, float, 16, 8) [] = {B1, C1, D1, E1, F1, G1, A1, H1};
+  VLOAD (vsrc, buf_src6, q, float, f, 16, 8);
+  vector_res = vmaxvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+
+  VECT_VAR_DECL (buf_src7, float, 16, 8) [] = {B1, C1, D1, E1, F1, G1, H1, A1};
+  VLOAD (vsrc, buf_src7, q, float, f, 16, 8);
+  vector_res = vmaxvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+}
+
+int
+main (void)
+{
+  exec_vmaxv_f16 ();
+  return 0;
+}
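
Unlike vmaxnmv, vmaxv reduces with FMAX semantics, which propagate a
NaN operand, and that is why this test feeds only numeric lanes.  The
reduction is then a plain comparison scan (a sketch; the helper name
is ours):

/* Sketch of the vmaxv{q}_f16 reduction over numeric inputs.  */
static __fp16
ref_vmaxv_f16 (const __fp16 *v, int n)
{
  __fp16 m = v[0];
  for (int i = 1; i < n; i++)
    if (v[i] > m)
      m = v[i];
  return m;
}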
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminnmv_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminnmv_f16_1.c
new file mode 100644
index 0000000..b7c5101
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminnmv_f16_1.c
@@ -0,0 +1,131 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon } */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A0 FP16_C (-567.8)
+#define B0 FP16_C (__builtin_nanf (""))
+#define C0 FP16_C (34.8)
+#define D0 FP16_C (-__builtin_nanf (""))
+
+#define A1 FP16_C (-567.8)
+#define B1 FP16_C (1025.8)
+#define C1 FP16_C (-__builtin_nanf (""))
+#define D1 FP16_C (10)
+#define E1 FP16_C (-0.0)
+#define F1 FP16_C (__builtin_nanf (""))
+#define G1 FP16_C (0.0)
+#define H1 FP16_C (10)
+
+/* Expected results for vminnmv.  */
+uint16_t expect = 0xE070 /* A0.  */;
+uint16_t expect_alt = 0xE070 /* A1.  */;
+
+void exec_vminnmv_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VMINNMV (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE(vsrc, float, 16, 4);
+  VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A0, B0, C0, D0};
+  VLOAD (vsrc, buf_src, , float, f, 16, 4);
+  float16_t vector_res = vminnmv_f16 (VECT_VAR (vsrc, float, 16, 4));
+
+  if (* (uint16_t *) &vector_res != expect)
+    abort ();
+
+  VECT_VAR_DECL (buf_src1, float, 16, 4) [] = {B0, A0, C0, D0};
+  VLOAD (vsrc, buf_src1, , float, f, 16, 4);
+  vector_res = vminnmv_f16 (VECT_VAR (vsrc, float, 16, 4));
+
+  if (* (uint16_t *) &vector_res != expect)
+    abort ();
+
+  VECT_VAR_DECL (buf_src2, float, 16, 4) [] = {B0, C0, A0, D0};
+  VLOAD (vsrc, buf_src2, , float, f, 16, 4);
+  vector_res = vminnmv_f16 (VECT_VAR (vsrc, float, 16, 4));
+
+  if (* (uint16_t *) &vector_res != expect)
+    abort ();
+
+  VECT_VAR_DECL (buf_src3, float, 16, 4) [] = {B0, C0, D0, A0};
+  VLOAD (vsrc, buf_src3, , float, f, 16, 4);
+  vector_res = vminnmv_f16 (VECT_VAR (vsrc, float, 16, 4));
+
+  if (* (uint16_t *) &vector_res != expect)
+    abort ();
+
+#undef TEST_MSG
+#define TEST_MSG "VMINNMVQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE(vsrc, float, 16, 8);
+  VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A1, B1, C1, D1, E1, F1, G1, H1};
+  VLOAD (vsrc, buf_src, q, float, f, 16, 8);
+  vector_res = vminnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+
+  VECT_VAR_DECL (buf_src1, float, 16, 8) [] = {B1, A1, C1, D1, E1, F1, G1, H1};
+  VLOAD (vsrc, buf_src1, q, float, f, 16, 8);
+  vector_res = vminnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+
+  VECT_VAR_DECL (buf_src2, float, 16, 8) [] = {B1, C1, A1, D1, E1, F1, G1, H1};
+  VLOAD (vsrc, buf_src2, q, float, f, 16, 8);
+  vector_res = vminnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+
+  VECT_VAR_DECL (buf_src3, float, 16, 8) [] = {B1, C1, D1, A1, E1, F1, G1, H1};
+  VLOAD (vsrc, buf_src3, q, float, f, 16, 8);
+  vector_res = vminnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+
+  VECT_VAR_DECL (buf_src4, float, 16, 8) [] = {B1, C1, D1, E1, A1, F1, G1, H1};
+  VLOAD (vsrc, buf_src4, q, float, f, 16, 8);
+  vector_res = vminnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+
+  VECT_VAR_DECL (buf_src5, float, 16, 8) [] = {B1, C1, D1, E1, F1, A1, G1, H1};
+  VLOAD (vsrc, buf_src5, q, float, f, 16, 8);
+  vector_res = vminnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+
+  VECT_VAR_DECL (buf_src6, float, 16, 8) [] = {B1, C1, D1, E1, F1, G1, A1, H1};
+  VLOAD (vsrc, buf_src6, q, float, f, 16, 8);
+  vector_res = vminnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+
+  VECT_VAR_DECL (buf_src7, float, 16, 8) [] = {B1, C1, D1, E1, F1, G1, H1, A1};
+  VLOAD (vsrc, buf_src7, q, float, f, 16, 8);
+  vector_res = vminnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+}
+
+int
+main (void)
+{
+  exec_vminnmv_f16 ();
+  return 0;
+}
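
These reduction tests compare bit patterns, not values: a
floating-point compare would treat 0x0000 and 0x8000 as equal and
could never match a NaN, so the checks read the result through a
uint16_t pointer.  The same check without pointer type punning (a
sketch; the helper is ours, not part of the patch):

#include <stdint.h>
#include <string.h>

/* Sketch: compare the binary16 encoding of r against the expected bit
   pattern via memcpy instead of a pointer cast.  */
static int
bits_match (__fp16 r, uint16_t expect)
{
  uint16_t bits;
  memcpy (&bits, &r, sizeof bits);
  return bits == expect;
}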
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminv_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminv_f16_1.c
new file mode 100644
index 0000000..c454a53
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminv_f16_1.c
@@ -0,0 +1,131 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon } */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A0 FP16_C (-567.8)
+#define B0 FP16_C (123.4)
+#define C0 FP16_C (34.8)
+#define D0 FP16_C (0.0)
+
+#define A1 FP16_C (-567.8)
+#define B1 FP16_C (1025.8)
+#define C1 FP16_C (13.4)
+#define D1 FP16_C (10)
+#define E1 FP16_C (-0.0)
+#define F1 FP16_C (567.8)
+#define G1 FP16_C (0.0)
+#define H1 FP16_C (10)
+
+/* Expected results for vminv.  */
+uint16_t expect = 0xE070 /* A0.  */;
+uint16_t expect_alt = 0xE070 /* A1.  */;
+
+void exec_vminv_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VMINV (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE(vsrc, float, 16, 4);
+  VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A0, B0, C0, D0};
+  VLOAD (vsrc, buf_src, , float, f, 16, 4);
+  float16_t vector_res = vminv_f16 (VECT_VAR (vsrc, float, 16, 4));
+
+  if (* (uint16_t *) &vector_res != expect)
+    abort ();
+
+  VECT_VAR_DECL (buf_src1, float, 16, 4) [] = {B0, A0, C0, D0};
+  VLOAD (vsrc, buf_src1, , float, f, 16, 4);
+  vector_res = vminv_f16 (VECT_VAR (vsrc, float, 16, 4));
+
+  if (* (uint16_t *) &vector_res != expect)
+    abort ();
+
+  VECT_VAR_DECL (buf_src2, float, 16, 4) [] = {B0, C0, A0, D0};
+  VLOAD (vsrc, buf_src2, , float, f, 16, 4);
+  vector_res = vminv_f16 (VECT_VAR (vsrc, float, 16, 4));
+
+  if (* (uint16_t *) &vector_res != expect)
+    abort ();
+
+  VECT_VAR_DECL (buf_src3, float, 16, 4) [] = {B0, C0, D0, A0};
+  VLOAD (vsrc, buf_src3, , float, f, 16, 4);
+  vector_res = vminv_f16 (VECT_VAR (vsrc, float, 16, 4));
+
+  if (* (uint16_t *) &vector_res != expect)
+    abort ();
+
+#undef TEST_MSG
+#define TEST_MSG "VMINVQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE(vsrc, float, 16, 8);
+  VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A1, B1, C1, D1, E1, F1, G1, H1};
+  VLOAD (vsrc, buf_src, q, float, f, 16, 8);
+  vector_res = vminvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+
+  VECT_VAR_DECL (buf_src1, float, 16, 8) [] = {B1, A1, C1, D1, E1, F1, G1, H1};
+  VLOAD (vsrc, buf_src1, q, float, f, 16, 8);
+  vector_res = vminvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+
+  VECT_VAR_DECL (buf_src2, float, 16, 8) [] = {B1, C1, A1, D1, E1, F1, G1, H1};
+  VLOAD (vsrc, buf_src2, q, float, f, 16, 8);
+  vector_res = vminvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+
+  VECT_VAR_DECL (buf_src3, float, 16, 8) [] = {B1, C1, D1, A1, E1, F1, G1, H1};
+  VLOAD (vsrc, buf_src3, q, float, f, 16, 8);
+  vector_res = vminvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+
+  VECT_VAR_DECL (buf_src4, float, 16, 8) [] = {B1, C1, D1, E1, A1, F1, G1, H1};
+  VLOAD (vsrc, buf_src4, q, float, f, 16, 8);
+  vector_res = vminvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+
+  VECT_VAR_DECL (buf_src5, float, 16, 8) [] = {B1, C1, D1, E1, F1, A1, G1, H1};
+  VLOAD (vsrc, buf_src5, q, float, f, 16, 8);
+  vector_res = vminvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+
+  VECT_VAR_DECL (buf_src6, float, 16, 8) [] = {B1, C1, D1, E1, F1, G1, A1, H1};
+  VLOAD (vsrc, buf_src6, q, float, f, 16, 8);
+  vector_res = vminvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+
+  VECT_VAR_DECL (buf_src7, float, 16, 8) [] = {B1, C1, D1, E1, F1, G1, H1, A1};
+  VLOAD (vsrc, buf_src7, q, float, f, 16, 8);
+  vector_res = vminvq_f16 (VECT_VAR (vsrc, float, 16, 8));
+
+  if (* (uint16_t *) &vector_res != expect_alt)
+    abort ();
+}
+
+int
+main (void)
+{
+  exec_vminv_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_lane_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_lane_f16_1.c
new file mode 100644
index 0000000..1719d56
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_lane_f16_1.c
@@ -0,0 +1,454 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon } */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (13.4)
+#define B FP16_C (-56.8)
+#define C FP16_C (-34.8)
+#define D FP16_C (12)
+#define E FP16_C (63.1)
+#define F FP16_C (19.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (77)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (11.23)
+#define L FP16_C (98)
+#define M FP16_C (87.1)
+#define N FP16_C (-8)
+#define O FP16_C (-1.1)
+#define P FP16_C (-9.7)
+
+/* Expected results for vmul_lane.  */
+VECT_VAR_DECL (expected0_static, hfloat, 16, 4) []
+  = { 0x629B /* A * E.  */,
+      0xEB00 /* B * E.  */,
+      0xE84A /* C * E.  */,
+      0x61EA /* D * E.  */ };
+
+VECT_VAR_DECL (expected1_static, hfloat, 16, 4) []
+  = { 0x5BFF /* A * F.  */,
+      0xE43D /* B * F.  */,
+      0xE131 /* C * F.  */,
+      0x5B29 /* D * F.  */ };
+
+VECT_VAR_DECL (expected2_static, hfloat, 16, 4) []
+  = { 0xD405 /* A * G.  */,
+      0x5C43 /* B * G.  */,
+      0x5939 /* C * G.  */,
+      0xD334 /* D * G.  */ };
+
+VECT_VAR_DECL (expected3_static, hfloat, 16, 4) []
+  = { 0x6408 /* A * H.  */,
+      0xEC46 /* B * H.  */,
+      0xE93C /* C * H.  */,
+      0x6338 /* D * H.  */ };
+
+/* Expected results for vmulq_lane.  */
+VECT_VAR_DECL (expected0_static, hfloat, 16, 8) []
+  = { 0x629B /* A * E.  */,
+      0xEB00 /* B * E.  */,
+      0xE84A /* C * E.  */,
+      0x61EA /* D * E.  */,
+      0x5186 /* I * E.  */,
+      0xECCE /* J * E.  */,
+      0x6189 /* K * E.  */,
+      0x6E0A /* L * E.  */ };
+
+VECT_VAR_DECL (expected1_static, hfloat, 16, 8) []
+  = { 0x5BFF /* A * F.  */,
+      0xE43D /* B * F.  */,
+      0xE131 /* C * F.  */,
+      0x5B29 /* D * F.  */,
+      0x4AAF /* I * F.  */,
+      0xE5D1 /* J * F.  */,
+      0x5AB3 /* K * F.  */,
+      0x674F /* L * F.  */ };
+
+VECT_VAR_DECL (expected2_static, hfloat, 16, 8) []
+  = { 0xD405 /* A * G.  */,
+      0x5C43 /* B * G.  */,
+      0x5939 /* C * G.  */,
+      0xD334 /* D * G.  */,
+      0xC2B9 /* I * G.  */,
+      0x5DDA /* J * G.  */,
+      0xD2BD /* K * G.  */,
+      0xDF5A /* L * G.  */ };
+
+VECT_VAR_DECL (expected3_static, hfloat, 16, 8) []
+  = { 0x6408 /* A * H.  */,
+      0xEC46 /* B * H.  */,
+      0xE93C /* C * H.  */,
+      0x6338 /* D * H.  */,
+      0x52BD /* I * H.  */,
+      0xEDDE /* J * H.  */,
+      0x62C1 /* K * H.  */,
+      0x6F5E /* L * H.  */ };
+
+/* Expected results for vmul_laneq.  */
+VECT_VAR_DECL (expected_laneq0_static, hfloat, 16, 4) []
+  = { 0x629B /* A * E.  */,
+      0xEB00 /* B * E.  */,
+      0xE84A /* C * E.  */,
+      0x61EA /* D * E.  */ };
+
+VECT_VAR_DECL (expected_laneq1_static, hfloat, 16, 4) []
+  = { 0x5BFF /* A * F.  */,
+      0xE43D /* B * F.  */,
+      0xE131 /* C * F.  */,
+      0x5B29 /* D * F.  */ };
+
+VECT_VAR_DECL (expected_laneq2_static, hfloat, 16, 4) []
+  = { 0xD405 /* A * G.  */,
+      0x5C43 /* B * G.  */,
+      0x5939 /* C * G.  */,
+      0xD334 /* D * G.  */ };
+
+VECT_VAR_DECL (expected_laneq3_static, hfloat, 16, 4) []
+  = { 0x6408 /* A * H.  */,
+      0xEC46 /* B * H.  */,
+      0xE93C /* C * H.  */,
+      0x6338 /* D * H.  */ };
+
+VECT_VAR_DECL (expected_laneq4_static, hfloat, 16, 4) []
+  = { 0x648F /* A * M.  */,
+      0xECD5 /* B * M.  */,
+      0xE9ED /* C * M.  */,
+      0x6416 /* D * M.  */ };
+
+VECT_VAR_DECL (expected_laneq5_static, hfloat, 16, 4) []
+  = { 0xD6B3 /* A * N.  */,
+      0x5F1A /* B * N.  */,
+      0x5C5A /* C * N.  */,
+      0xD600 /* D * N.  */ };
+
+VECT_VAR_DECL (expected_laneq6_static, hfloat, 16, 4) []
+  = { 0xCB5E /* A * O.  */,
+      0x53CF /* B * O.  */,
+      0x50C9 /* C * O.  */,
+      0xCA99 /* D * O.  */ };
+
+VECT_VAR_DECL (expected_laneq7_static, hfloat, 16, 4) []
+  = { 0xD810 /* A * P.  */,
+      0x604F /* B * P.  */,
+      0x5D47 /* C * P.  */,
+      0xD747 /* D * P.  */ };
+
+/* Expected results for vmulq_laneq.  */
+VECT_VAR_DECL (expected_laneq0_static, hfloat, 16, 8) []
+  = { 0x629B /* A * E.  */,
+      0xEB00 /* B * E.  */,
+      0xE84A /* C * E.  */,
+      0x61EA /* D * E.  */,
+      0x5186 /* I * E.  */,
+      0xECCE /* J * E.  */,
+      0x6189 /* K * E.  */,
+      0x6E0A /* L * E.  */ };
+
+VECT_VAR_DECL (expected_laneq1_static, hfloat, 16, 8) []
+  = { 0x5BFF /* A * F.  */,
+      0xE43D /* B * F.  */,
+      0xE131 /* C * F.  */,
+      0x5B29 /* D * F.  */,
+      0x4AAF /* I * F.  */,
+      0xE5D1 /* J * F.  */,
+      0x5AB3 /* K * F.  */,
+      0x674F /* L * F.  */ };
+
+VECT_VAR_DECL (expected_laneq2_static, hfloat, 16, 8) []
+  = { 0xD405 /* A * G.  */,
+      0x5C43 /* B * G.  */,
+      0x5939 /* C * G.  */,
+      0xD334 /* D * G.  */,
+      0xC2B9 /* I * G.  */,
+      0x5DDA /* J * G.  */,
+      0xD2BD /* K * G.  */,
+      0xDF5A /* L * G.  */ };
+
+VECT_VAR_DECL (expected_laneq3_static, hfloat, 16, 8) []
+  = { 0x6408 /* A * H.  */,
+      0xEC46 /* B * H.  */,
+      0xE93C /* C * H.  */,
+      0x6338 /* D * H.  */,
+      0x52BD /* I * H.  */,
+      0xEDDE /* J * H.  */,
+      0x62C1 /* K * H.  */,
+      0x6F5E /* L * H.  */ };
+
+VECT_VAR_DECL (expected_laneq4_static, hfloat, 16, 8) []
+  = { 0x648F /* A * M.  */,
+      0xECD5 /* B * M.  */,
+      0xE9ED /* C * M.  */,
+      0x6416 /* D * M.  */,
+      0x53A0 /* I * M.  */,
+      0xEEA3 /* J * M.  */,
+      0x63A4 /* K * M.  */,
+      0x702B /* L * M.  */ };
+
+VECT_VAR_DECL (expected_laneq5_static, hfloat, 16, 8) []
+  = { 0xD6B3 /* A * N.  */,
+      0x5F1A /* B * N.  */,
+      0x5C5A /* C * N.  */,
+      0xD600 /* D * N.  */,
+      0xC59A /* I * N.  */,
+      0x60E0 /* J * N.  */,
+      0xD59D /* K * N.  */,
+      0xE220 /* L * N.  */ };
+
+VECT_VAR_DECL (expected_laneq6_static, hfloat, 16, 8) []
+  = { 0xCB5E /* A * O.  */,
+      0x53CF /* B * O.  */,
+      0x50C9 /* C * O.  */,
+      0xCA99 /* D * O.  */,
+      0xBA29 /* I * O.  */,
+      0x555C /* J * O.  */,
+      0xCA2C /* K * O.  */,
+      0xD6BC /* L * O.  */ };
+
+VECT_VAR_DECL (expected_laneq7_static, hfloat, 16, 8) []
+  = { 0xD810 /* A * P.  */,
+      0x604F /* B * P.  */,
+      0x5D47 /* C * P.  */,
+      0xD747 /* D * P.  */,
+      0xC6CB /* I * P.  */,
+      0x61EA /* J * P.  */,
+      0xD6CF /* K * P.  */,
+      0xE36E /* L * P.  */ };
+
+void exec_vmul_lane_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VMUL_LANE (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE(vsrc_1, float, 16, 4);
+  DECL_VARIABLE(vsrc_2, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4)
+    = vmul_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		     VECT_VAR (vsrc_2, float, 16, 4), 0);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected0_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vmul_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		     VECT_VAR (vsrc_2, float, 16, 4), 1);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected1_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vmul_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		     VECT_VAR (vsrc_2, float, 16, 4), 2);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected2_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vmul_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		     VECT_VAR (vsrc_2, float, 16, 4), 3);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected3_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VMULQ_LANE (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE(vsrc_1, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8)
+    = vmulq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		      VECT_VAR (vsrc_2, float, 16, 4), 0);
+
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected0_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vmulq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		      VECT_VAR (vsrc_2, float, 16, 4), 1);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected1_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vmulq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		      VECT_VAR (vsrc_2, float, 16, 4), 2);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected2_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vmulq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		      VECT_VAR (vsrc_2, float, 16, 4), 3);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected3_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VMUL_LANEQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE(vsrc_2, float, 16, 8);
+  VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
+  VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+  VECT_VAR (vector_res, float, 16, 4)
+    = vmul_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		      VECT_VAR (vsrc_2, float, 16, 8), 0);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq0_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vmul_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		      VECT_VAR (vsrc_2, float, 16, 8), 1);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq1_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vmul_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		      VECT_VAR (vsrc_2, float, 16, 8), 2);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq2_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vmul_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		      VECT_VAR (vsrc_2, float, 16, 8), 3);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq3_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vmul_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		      VECT_VAR (vsrc_2, float, 16, 8), 4);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq4_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vmul_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		      VECT_VAR (vsrc_2, float, 16, 8), 5);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq5_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vmul_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		      VECT_VAR (vsrc_2, float, 16, 8), 6);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq6_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vmul_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		      VECT_VAR (vsrc_2, float, 16, 8), 7);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq7_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VMULQ_LANEQ (FP16)"
+  clean_results ();
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vmulq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		       VECT_VAR (vsrc_2, float, 16, 8), 0);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq0_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vmulq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		       VECT_VAR (vsrc_2, float, 16, 8), 1);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq1_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vmulq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		       VECT_VAR (vsrc_2, float, 16, 8), 2);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq2_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vmulq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		       VECT_VAR (vsrc_2, float, 16, 8), 3);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq3_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vmulq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		       VECT_VAR (vsrc_2, float, 16, 8), 4);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq4_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vmulq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		       VECT_VAR (vsrc_2, float, 16, 8), 5);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq5_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vmulq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		       VECT_VAR (vsrc_2, float, 16, 8), 6);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq6_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vmulq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		       VECT_VAR (vsrc_2, float, 16, 8), 7);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq7_static, "");
+}
+
+int
+main (void)
+{
+  exec_vmul_lane_f16 ();
+  return 0;
+}
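
The lane and laneq multiply forms differ only in where the multiplier
comes from: element i of the result is a[i] * v[lane], with the laneq
forms selecting from a 128-bit vector.  Scalar model (a sketch; the
helper name is ours):

/* Sketch: element i of vmul_lane{q}_f16 (a, v, lane) is a[i] * v[lane].
   The float product of two binary16 values is exact, so the single
   conversion back to __fp16 rounds correctly.  */
static void
ref_vmul_lane_f16 (const __fp16 *a, const __fp16 *v, int lane,
		   __fp16 *out, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = (__fp16) ((float) a[i] * (float) v[lane]);
}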
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_f16_1.c
new file mode 100644
index 0000000..51bbead
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_f16_1.c
@@ -0,0 +1,84 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon } */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (13.4)
+#define B FP16_C (__builtin_inff ())
+#define C FP16_C (-34.8)
+#define D FP16_C (-__builtin_inff ())
+#define E FP16_C (63.1)
+#define F FP16_C (0.0)
+#define G FP16_C (-4.8)
+#define H FP16_C (0.0)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-__builtin_inff ())
+#define K FP16_C (11.23)
+#define L FP16_C (98)
+#define M FP16_C (87.1)
+#define N FP16_C (-0.0)
+#define O FP16_C (-1.1)
+#define P FP16_C (7)
+
+/* Expected results for vmulx.  */
+VECT_VAR_DECL (expected_static, hfloat, 16, 4) []
+  = { 0x629B /* A * E.  */, 0x4000 /* FP16_C (2.0f).  */,
+      0x5939 /* C * G.  */, 0xC000 /* FP16_C (-2.0f).  */ };
+
+VECT_VAR_DECL (expected_static, hfloat, 16, 8) []
+  = { 0x629B /* A * E.  */, 0x4000 /* FP16_C (2.0f).  */,
+      0x5939 /* C * G.  */, 0xC000 /* FP16_C (-2.0f).  */,
+      0x53A0 /* I * M.  */, 0x4000 /* FP16_C (2.0f).  */,
+      0xCA2C /* K * O.  */, 0x615C /* L * P.  */ };
+
+void exec_vmulx_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VMULX (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE(vsrc_1, float, 16, 4);
+  DECL_VARIABLE(vsrc_2, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4)
+    = vmulx_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		 VECT_VAR (vsrc_2, float, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VMULXQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE(vsrc_1, float, 16, 8);
+  DECL_VARIABLE(vsrc_2, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
+  VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8)
+    = vmulxq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		  VECT_VAR (vsrc_2, float, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_static, "");
+}
+
+int
+main (void)
+{
+  exec_vmulx_f16 ();
+  return 0;
+}
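
The expected values above capture what separates FMULX from an
ordinary multiply: the IEEE invalid combination of zero and infinity
yields +/-2.0 with the product's sign instead of NaN, which is where
the 0x4000/0xC000 entries come from.  Scalar model (a sketch; the
helper name is ours):

#include <math.h>

/* Sketch of FMULX: 0 * Inf in either order gives +/-2.0 with the
   product's sign; every other combination multiplies as usual.  */
static __fp16
ref_fmulx (__fp16 a, __fp16 b)
{
  float fa = (float) a, fb = (float) b;
  if ((fa == 0.0f && isinf (fb)) || (isinf (fa) && fb == 0.0f))
    return (signbit (fa) != signbit (fb)) ? (__fp16) -2.0f
					  : (__fp16) 2.0f;
  return (__fp16) (fa * fb);
}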
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f16_1.c
new file mode 100644
index 0000000..f90a36d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f16_1.c
@@ -0,0 +1,452 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon } */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (13.4)
+#define B FP16_C (__builtin_inff ())
+#define C FP16_C (-34.8)
+#define D FP16_C (-__builtin_inff ())
+#define E FP16_C (-0.0)
+#define F FP16_C (19.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (0.0)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (-__builtin_inff ())
+#define L FP16_C (98)
+#define M FP16_C (87.1)
+#define N FP16_C (-8)
+#define O FP16_C (-1.1)
+#define P FP16_C (-0.0)
+
+/* Expected results for vmulx_lane.  */
+VECT_VAR_DECL (expected0_static, hfloat, 16, 4) []
+  = { 0x8000 /* A * E.  */,
+      0xC000 /* FP16_C (-2.0f).  */,
+      0x0000 /* C * E.  */,
+      0x4000 /* FP16_C (2.0f).  */ };
+
+VECT_VAR_DECL (expected1_static, hfloat, 16, 4) []
+  = { 0x5BFF /* A * F.  */,
+      0x7C00 /* B * F.  */,
+      0xE131 /* C * F.  */,
+      0xFC00 /* D * F.  */ };
+
+VECT_VAR_DECL (expected2_static, hfloat, 16, 4) []
+  = { 0xD405 /* A * G.  */,
+      0xFC00 /* B * G.  */,
+      0x5939 /* C * G.  */,
+      0x7C00 /* D * G.  */ };
+
+VECT_VAR_DECL (expected3_static, hfloat, 16, 4) []
+  = { 0x0000 /* A * H.  */,
+      0x4000 /* FP16_C (2.0f).  */,
+      0x8000 /* C * H.  */,
+      0xC000 /* FP16_C (-2.0f).  */ };
+
+/* Expected results for vmulxq_lane.  */
+VECT_VAR_DECL (expected0_static, hfloat, 16, 8) []
+  = { 0x8000 /* A * E.  */,
+      0xC000 /* FP16_C (-2.0f).  */,
+      0x0000 /* C * E.  */,
+      0x4000 /* FP16_C (2.0f).  */,
+      0x8000 /* I * E.  */,
+      0x0000 /* J * E.  */,
+      0x4000 /* FP16_C (2.0f).  */,
+      0x8000 /* L * E.  */ };
+
+VECT_VAR_DECL (expected1_static, hfloat, 16, 8) []
+  = { 0x5BFF /* A * F.  */,
+      0x7C00 /* B * F.  */,
+      0xE131 /* C * F.  */,
+      0xFC00 /* D * F.  */,
+      0x4AAF /* I * F.  */,
+      0xE5D1 /* J * F.  */,
+      0xFC00 /* K * F.  */,
+      0x674F /* L * F.  */ };
+
+VECT_VAR_DECL (expected2_static, hfloat, 16, 8) []
+  = { 0xD405 /* A * G.  */,
+      0xFC00 /* B * G.  */,
+      0x5939 /* C * G.  */,
+      0x7C00 /* D * G.  */,
+      0xC2B9 /* I * G.  */,
+      0x5DDA /* J * G.  */,
+      0x7C00 /* K * G.  */,
+      0xDF5A /* L * G.  */ };
+
+VECT_VAR_DECL (expected3_static, hfloat, 16, 8) []
+  = { 0x0000 /* A * H.  */,
+      0x4000 /* FP16_C (2.0f).  */,
+      0x8000 /* C * H.  */,
+      0xC000 /* FP16_C (-2.0f).  */,
+      0x0000 /* I * H.  */,
+      0x8000 /* J * H.  */,
+      0xC000 /* FP16_C (-2.0f).  */,
+      0x0000 /* L * H.  */ };
+
+/* Expected results for vmulx_laneq.  */
+VECT_VAR_DECL (expected_laneq0_static, hfloat, 16, 4) []
+  = { 0x8000 /* A * E.  */,
+      0xC000 /* FP16_C (-2.0f).  */,
+      0x0000 /* C * E.  */,
+      0x4000 /* FP16_C (2.0f).  */ };
+
+VECT_VAR_DECL (expected_laneq1_static, hfloat, 16, 4) []
+  = { 0x5BFF /* A * F.  */,
+      0x7C00 /* B * F.  */,
+      0xE131 /* C * F.  */,
+      0xFC00 /* D * F.  */ };
+
+VECT_VAR_DECL (expected_laneq2_static, hfloat, 16, 4) []
+  = { 0xD405 /* A * G.  */,
+      0xFC00 /* B * G.  */,
+      0x5939 /* C * G.  */,
+      0x7C00 /* D * G.  */ };
+
+VECT_VAR_DECL (expected_laneq3_static, hfloat, 16, 4) []
+  = { 0x0000 /* A * H.  */,
+      0x4000 /* FP16_C (2.0f).  */,
+      0x8000 /* C * H.  */,
+      0xC000 /* FP16_C (-2.0f).  */ };
+
+VECT_VAR_DECL (expected_laneq4_static, hfloat, 16, 4) []
+  = { 0x648F /* A * M.  */,
+      0x7C00 /* B * M.  */,
+      0xE9ED /* C * M.  */,
+      0xFC00 /* D * M.  */ };
+
+VECT_VAR_DECL (expected_laneq5_static, hfloat, 16, 4) []
+  = { 0xD6B3 /* A * N.  */,
+      0xFC00 /* B * N.  */,
+      0x5C5A /* C * N.  */,
+      0x7C00 /* D * N.  */ };
+
+VECT_VAR_DECL (expected_laneq6_static, hfloat, 16, 4) []
+  = { 0xCB5E /* A * O.  */,
+      0xFC00 /* B * O.  */,
+      0x50C9 /* C * O.  */,
+      0x7C00 /* D * O.  */ };
+
+VECT_VAR_DECL (expected_laneq7_static, hfloat, 16, 4) []
+  = { 0x8000 /* A * P.  */,
+      0xC000 /* FP16_C (-2.0f).  */,
+      0x0000 /* C * P.  */,
+      0x4000 /* FP16_C (2.0f).  */ };
+
+VECT_VAR_DECL (expected_laneq0_static, hfloat, 16, 8) []
+  = { 0x8000 /* A * E.  */,
+      0xC000 /* FP16_C (-2.0f).  */,
+      0x0000 /* C * E.  */,
+      0x4000 /* FP16_C (2.0f).  */,
+      0x8000 /* I * E.  */,
+      0x0000 /* J * E.  */,
+      0x4000 /* FP16_C (2.0f).  */,
+      0x8000 /* L * E.  */ };
+
+VECT_VAR_DECL (expected_laneq1_static, hfloat, 16, 8) []
+  = { 0x5BFF /* A * F.  */,
+      0x7C00 /* B * F.  */,
+      0xE131 /* C * F.  */,
+      0xFC00 /* D * F.  */,
+      0x4AAF /* I * F.  */,
+      0xE5D1 /* J * F.  */,
+      0xFC00 /* K * F.  */,
+      0x674F /* L * F.  */ };
+
+VECT_VAR_DECL (expected_laneq2_static, hfloat, 16, 8) []
+  = { 0xD405 /* A * G.  */,
+      0xFC00 /* B * G.  */,
+      0x5939 /* C * G.  */,
+      0x7C00 /* D * G.  */,
+      0xC2B9 /* I * G.  */,
+      0x5DDA /* J * G.  */,
+      0x7C00 /* K * G.  */,
+      0xDF5A /* L * G.  */ };
+
+VECT_VAR_DECL (expected_laneq3_static, hfloat, 16, 8) []
+  = { 0x0000 /* A * H.  */,
+      0x4000 /* FP16_C (2.0f).  */,
+      0x8000 /* C * H.  */,
+      0xC000 /* FP16_C (-2.0f).  */,
+      0x0000 /* I * H.  */,
+      0x8000 /* J * H.  */,
+      0xC000 /* FP16_C (-2.0f).  */,
+      0x0000 /* L * H.  */ };
+
+VECT_VAR_DECL (expected_laneq4_static, hfloat, 16, 8) []
+  = { 0x648F /* A * M.  */,
+      0x7C00 /* B * M.  */,
+      0xE9ED /* C * M.  */,
+      0xFC00 /* D * M.  */,
+      0x53A0 /* I * M.  */,
+      0xEEA3 /* J * M.  */,
+      0xFC00 /* K * M.  */,
+      0x702B /* L * M.  */ };
+
+VECT_VAR_DECL (expected_laneq5_static, hfloat, 16, 8) []
+  = { 0xD6B3 /* A * N.  */,
+      0xFC00 /* B * N.  */,
+      0x5C5A /* C * N.  */,
+      0x7C00 /* D * N.  */,
+      0xC59A /* I * N.  */,
+      0x60E0 /* J * N.  */,
+      0x7C00 /* K * N.  */,
+      0xE220 /* L * N.  */ };
+
+VECT_VAR_DECL (expected_laneq6_static, hfloat, 16, 8) []
+  = { 0xCB5E /* A * O.  */,
+      0xFC00 /* B * O.  */,
+      0x50C9 /* C * O.  */,
+      0x7C00 /* D * O.  */,
+      0xBA29 /* I * O.  */,
+      0x555C /* J * O.  */,
+      0x7C00 /* K * O.  */,
+      0xD6BC /* L * O.  */ };
+
+VECT_VAR_DECL (expected_laneq7_static, hfloat, 16, 8) []
+  = { 0x8000 /* A * P.  */,
+      0xC000 /* FP16_C (-2.0f).  */,
+      0x0000 /* C * P.  */,
+      0x4000 /* FP16_C (2.0f).  */,
+      0x8000 /* I * P.  */,
+      0x0000 /* J * P.  */,
+      0x4000 /* FP16_C (2.0f).  */,
+      0x8000 /* L * P.  */ };
+
+void exec_vmulx_lane_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VMULX_LANE (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE(vsrc_1, float, 16, 4);
+  DECL_VARIABLE(vsrc_2, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4)
+    = vmulx_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		      VECT_VAR (vsrc_2, float, 16, 4), 0);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected0_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vmulx_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		      VECT_VAR (vsrc_2, float, 16, 4), 1);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected1_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vmulx_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		      VECT_VAR (vsrc_2, float, 16, 4), 2);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected2_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vmulx_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		      VECT_VAR (vsrc_2, float, 16, 4), 3);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected3_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VMULXQ_LANE (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8)
+    = vmulxq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		       VECT_VAR (vsrc_2, float, 16, 4), 0);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected0_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vmulxq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		       VECT_VAR (vsrc_2, float, 16, 4), 1);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected1_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vmulxq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		       VECT_VAR (vsrc_2, float, 16, 4), 2);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected2_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vmulxq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		       VECT_VAR (vsrc_2, float, 16, 4), 3);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected3_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VMULX_LANEQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_2, float, 16, 8);
+  VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
+  VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+  VECT_VAR (vector_res, float, 16, 4)
+    = vmulx_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		       VECT_VAR (vsrc_2, float, 16, 8), 0);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq0_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vmulx_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		       VECT_VAR (vsrc_2, float, 16, 8), 1);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq1_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vmulx_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		       VECT_VAR (vsrc_2, float, 16, 8), 2);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq2_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vmulx_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		       VECT_VAR (vsrc_2, float, 16, 8), 3);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq3_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vmulx_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		       VECT_VAR (vsrc_2, float, 16, 8), 4);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq4_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vmulx_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		       VECT_VAR (vsrc_2, float, 16, 8), 5);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq5_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vmulx_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		       VECT_VAR (vsrc_2, float, 16, 8), 6);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq6_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vmulx_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		       VECT_VAR (vsrc_2, float, 16, 8), 7);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq7_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VMULXQ_LANEQ (FP16)"
+  clean_results ();
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vmulxq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+			VECT_VAR (vsrc_2, float, 16, 8), 0);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq0_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vmulxq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+			VECT_VAR (vsrc_2, float, 16, 8), 1);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq1_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vmulxq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+			VECT_VAR (vsrc_2, float, 16, 8), 2);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq2_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vmulxq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+			VECT_VAR (vsrc_2, float, 16, 8), 3);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq3_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vmulxq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+			VECT_VAR (vsrc_2, float, 16, 8), 4);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq4_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vmulxq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+			VECT_VAR (vsrc_2, float, 16, 8), 5);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq5_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vmulxq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+			VECT_VAR (vsrc_2, float, 16, 8), 6);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq6_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vmulxq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+			VECT_VAR (vsrc_2, float, 16, 8), 7);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq7_static, "");
+}
+
+int
+main (void)
+{
+  exec_vmulx_lane_f16 ();
+  return 0;
+}
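
The FP16_C (±2.0f) entries in the expected arrays above are intentional:
FMULX behaves like FMUL except when one operand is ±0.0 and the other is
±Inf, in which case it returns ±2.0 with the sign produced by the usual
sign rules. A minimal scalar model of that rule, for cross-checking the
tables (mulx_ref is an illustrative helper, not part of the testsuite):

  #include <math.h>

  /* Reference semantics for FMULX: identical to an ordinary multiply,
     except that (0 * Inf), in either order, yields 2.0 carrying the
     XOR of the operand signs instead of the default NaN.  */
  static double
  mulx_ref (double a, double b)
  {
    if ((a == 0.0 && isinf (b)) || (isinf (a) && b == 0.0))
      return copysign (2.0, copysign (1.0, a) * copysign (1.0, b));
    return a * b;
  }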
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_n_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_n_f16_1.c
new file mode 100644
index 0000000..140647b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_n_f16_1.c
@@ -0,0 +1,177 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon } */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (13.4)
+#define B FP16_C (__builtin_inff ())
+#define C FP16_C (-34.8)
+#define D FP16_C (-__builtin_inff ())
+#define E FP16_C (-0.0)
+#define F FP16_C (19.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (0.0)
+
+float16_t elemE = E;
+float16_t elemF = F;
+float16_t elemG = G;
+float16_t elemH = H;
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (11.23)
+#define L FP16_C (98)
+#define M FP16_C (87.1)
+#define N FP16_C (-8)
+#define O FP16_C (-1.1)
+#define P FP16_C (-9.7)
+
+/* Expected results for vmulx_n.  */
+VECT_VAR_DECL (expected0_static, hfloat, 16, 4) []
+  = { 0x8000 /* A * E.  */,
+      0xC000 /* FP16_C (-2.0f).  */,
+      0x0000 /* C * E.  */,
+      0x4000 /* FP16_C (2.0f).  */ };
+
+VECT_VAR_DECL (expected1_static, hfloat, 16, 4) []
+  = { 0x5BFF /* A * F.  */,
+      0x7C00 /* B * F.  */,
+      0xE131 /* C * F.  */,
+      0xFC00 /* D * F.  */ };
+
+VECT_VAR_DECL (expected2_static, hfloat, 16, 4) []
+  = { 0xD405 /* A * G.  */,
+      0xFC00 /* B * G.  */,
+      0x5939 /* C * G.  */,
+      0x7C00 /* D * G.  */ };
+
+VECT_VAR_DECL (expected3_static, hfloat, 16, 4) []
+  = { 0x0000 /* A * H.  */,
+      0x4000 /* FP16_C (2.0f).  */,
+      0x8000 /* C * H.  */,
+      0xC000 /* FP16_C (-2.0f).  */ };
+
+VECT_VAR_DECL (expected0_static, hfloat, 16, 8) []
+  = { 0x8000 /* A * E.  */,
+      0xC000 /* FP16_C (-2.0f).  */,
+      0x0000 /* C * E.  */,
+      0x4000 /* FP16_C (2.0f).  */,
+      0x8000 /* I * E.  */,
+      0x0000 /* J * E.  */,
+      0x8000 /* K * E.  */,
+      0x8000 /* L * E.  */ };
+
+VECT_VAR_DECL (expected1_static, hfloat, 16, 8) []
+  = { 0x5BFF /* A * F.  */,
+      0x7C00 /* B * F.  */,
+      0xE131 /* C * F.  */,
+      0xFC00 /* D * F.  */,
+      0x4AAF /* I * F.  */,
+      0xE5D1 /* J * F.  */,
+      0x5AB3 /* K * F.  */,
+      0x674F /* L * F.  */ };
+
+VECT_VAR_DECL (expected2_static, hfloat, 16, 8) []
+  = { 0xD405 /* A * G.  */,
+      0xFC00 /* B * G.  */,
+      0x5939 /* C * G.  */,
+      0x7C00 /* D * G.  */,
+      0xC2B9 /* I * G.  */,
+      0x5DDA /* J * G.  */,
+      0xD2BD /* K * G.  */,
+      0xDF5A /* L * G.  */ };
+
+VECT_VAR_DECL (expected3_static, hfloat, 16, 8) []
+  = { 0x0000 /* A * H.  */,
+      0x4000 /* FP16_C (2.0f).  */,
+      0x8000 /* C * H.  */,
+      0xC000 /* FP16_C (-2.0f).  */,
+      0x0000 /* I * H.  */,
+      0x8000 /* J * H.  */,
+      0x0000 /* K * H.  */,
+      0x0000 /* L * H.  */ };
+
+void exec_vmulx_n_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VMULX_N (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4)
+    = vmulx_n_f16 (VECT_VAR (vsrc_1, float, 16, 4), elemE);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected0_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vmulx_n_f16 (VECT_VAR (vsrc_1, float, 16, 4), elemF);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected1_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vmulx_n_f16 (VECT_VAR (vsrc_1, float, 16, 4), elemG);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected2_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vmulx_n_f16 (VECT_VAR (vsrc_1, float, 16, 4), elemH);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected3_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VMULXQ_N (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8)
+    = vmulxq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8), elemE);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected0_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vmulxq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8), elemF);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected1_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vmulxq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8), elemG);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected2_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vmulxq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8), elemH);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected3_static, "");
+}
+
+int
+main (void)
+{
+  exec_vmulx_n_f16 ();
+  return 0;
+}
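
All of the expected values in these tests are raw IEEE 754 binary16 bit
patterns. They are straightforward to regenerate on an AArch64 host, where
GCC's __fp16 uses the IEEE half-precision storage format; a small helper
along these lines (f16_bits is illustrative, not testsuite code) reproduces
any entry, e.g. f16_bits (1024.0f) == 0x6400:

  #include <stdint.h>
  #include <string.h>

  /* Convert a float to its binary16 bit pattern by storing through
     __fp16 (the IEEE half-precision storage format on AArch64).  */
  static uint16_t
  f16_bits (float x)
  {
    __fp16 h = (__fp16) x;
    uint16_t bits;
    memcpy (&bits, &h, sizeof (bits));
    return bits;
  }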
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpminmaxnm_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpminmaxnm_f16_1.c
new file mode 100644
index 0000000..c8df677
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpminmaxnm_f16_1.c
@@ -0,0 +1,114 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon } */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (__builtin_nanf ("")) /* NaN */
+#define C FP16_C (-34.8)
+#define D FP16_C (1024)
+#define E FP16_C (663.1)
+#define F FP16_C (169.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (-__builtin_nanf ("")) /* NaN */
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (101.23)
+#define L FP16_C (-1098)
+#define M FP16_C (870.1)
+#define N FP16_C (-8781)
+#define O FP16_C (__builtin_inff ()) /* +Inf */
+#define P FP16_C (-__builtin_inff ()) /* -Inf */
+
+
+/* Expected results for vpminnm.  */
+VECT_VAR_DECL (expected_min_static, hfloat, 16, 4) []
+  = { 0x57B6 /* A.  */, 0xD05A /* C.  */, 0x5949 /* F.  */, 0xC4CD /* G.  */ };
+
+VECT_VAR_DECL (expected_min_static, hfloat, 16, 8) []
+  = { 0x57B6 /* A.  */, 0xD05A /* C.  */, 0xD4E0 /* J.  */, 0xE44A /* L.  */,
+      0x5949 /* F.  */, 0xC4CD /* G.  */, 0xF04A /* N.  */, 0xFC00 /* P.  */ };
+
+/* Expected results for vpmaxnm.  */
+VECT_VAR_DECL (expected_max_static, hfloat, 16, 4) []
+  = { 0x57B6 /* A.  */, 0x6400 /* D.  */, 0x612E /* E.  */, 0xC4CD /* G.  */ };
+
+VECT_VAR_DECL (expected_max_static, hfloat, 16, 8) []
+  = { 0x57B6 /* A.  */, 0x6400 /* D.  */, 0x399A /* I.  */, 0x5654 /* K.  */,
+      0x612E /* E.  */, 0xC4CD /* G.  */, 0x62CC /* M.  */, 0x7C00 /* O.  */ };
+
+void exec_vpminmaxnm_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VPMINNM (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 4);
+  DECL_VARIABLE (vsrc_2, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4)
+    = vpminnm_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		   VECT_VAR (vsrc_2, float, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_min_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VPMINNMQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 8);
+  DECL_VARIABLE (vsrc_2, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
+  VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8)
+    = vpminnmq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		    VECT_VAR (vsrc_2, float, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_min_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VPMAXNM (FP16)"
+  clean_results ();
+
+  VECT_VAR (vector_res, float, 16, 4)
+    = vpmaxnm_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		   VECT_VAR (vsrc_2, float, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_max_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VPMAXNMQ (FP16)"
+  clean_results ();
+
+  VECT_VAR (vector_res, float, 16, 8)
+    = vpmaxnmq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		    VECT_VAR (vsrc_2, float, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_max_static, "");
+}
+
+int
+main (void)
+{
+  exec_vpminmaxnm_f16 ();
+  return 0;
+}
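
The B and H inputs above are NaNs, yet the expected pairwise results
contain only their partners A and G: FMINNM and FMAXNM implement IEEE
754-2008 minNum/maxNum, which treat a quiet NaN as a missing value and
return the other operand. C99 fmin/fmax have the same quiet-NaN behaviour,
so a scalar reference looks like this (ignoring signalling NaNs and the
ordering of ±0.0, neither of which these inputs exercise):

  #include <math.h>

  /* minNum: a single quiet NaN operand is dropped in favour of the
     numeric operand; only (NaN, NaN) produces NaN.  */
  static double
  minnm_ref (double a, double b)
  {
    if (isnan (a))
      return b;
    if (isnan (b))
      return a;
    return a < b ? a : b;
  }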
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndi_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndi_f16_1.c
new file mode 100644
index 0000000..7a4620b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndi_f16_1.c
@@ -0,0 +1,71 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon } */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define RNDI_A 0x57B0 /* FP16_C (123).  */
+#define B FP16_C (-567.5)
+#define RNDI_B 0xE070 /* FP16_C (-568).  */
+#define C FP16_C (-34.8)
+#define RNDI_C 0xD060 /* FP16_C (-35).  */
+#define D FP16_C (1024)
+#define RNDI_D 0x6400 /* FP16_C (1024).  */
+#define E FP16_C (663.1)
+#define RNDI_E 0x612E /* FP16_C (663).  */
+#define F FP16_C (169.1)
+#define RNDI_F 0x5948 /* FP16_C (169).  */
+#define G FP16_C (-4.8)
+#define RNDI_G 0xC500 /* FP16_C (-5).  */
+#define H FP16_C (77.5)
+#define RNDI_H 0x54E0 /* FP16_C (78).  */
+
+/* Expected results for vrndi.  */
+VECT_VAR_DECL (expected_static, hfloat, 16, 4) []
+  = { RNDI_A, RNDI_B, RNDI_C, RNDI_D };
+
+VECT_VAR_DECL (expected_static, hfloat, 16, 8) []
+  = { RNDI_A, RNDI_B, RNDI_C, RNDI_D, RNDI_E, RNDI_F, RNDI_G, RNDI_H };
+
+void exec_vrndi_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VRNDI (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 4);
+  VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc, buf_src, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4)
+    = vrndi_f16 (VECT_VAR (vsrc, float, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VRNDIQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 8);
+  VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A, B, C, D, E, F, G, H};
+  VLOAD (vsrc, buf_src, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8)
+    = vrndiq_f16 (VECT_VAR (vsrc, float, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_static, "");
+}
+
+int
+main (void)
+{
+  exec_vrndi_f16 ();
+  return 0;
+}
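
The RNDI_* constants assume the default rounding mode: FRINTI rounds to
integral using the current FPCR mode and, unlike FRINTX, does not raise the
inexact exception, so C's nearbyint is the closest analogue (rint
corresponds to FRINTX). Under round-to-nearest, ties-to-even this gives
nearbyint (77.5) == 78.0 and nearbyint (-567.5) == -568.0, matching RNDI_H
and RNDI_B:

  #include <math.h>

  /* Reference for FRINTI under the dynamic rounding mode; with the
     default mode, halfway cases round to the even integer.  */
  static double
  rndi_ref (double x)
  {
    return nearbyint (x);
  }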
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsqrt_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsqrt_f16_1.c
new file mode 100644
index 0000000..82249a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsqrt_f16_1.c
@@ -0,0 +1,72 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon } */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (567.8)
+#define C FP16_C (34.8)
+#define D FP16_C (1024)
+#define E FP16_C (663.1)
+#define F FP16_C (144.0)
+#define G FP16_C (4.8)
+#define H FP16_C (77)
+
+#define SQRT_A 0x498E /* FP16_C (__builtin_sqrtf (123.4)).  */
+#define SQRT_B 0x4DF5 /* FP16_C (__builtin_sqrtf (567.8)).  */
+#define SQRT_C 0x45E6 /* FP16_C (__builtin_sqrtf (34.8)).  */
+#define SQRT_D 0x5000 /* FP16_C (__builtin_sqrtf (1024)).  */
+#define SQRT_E 0x4E70 /* FP16_C (__builtin_sqrtf (663.1)).  */
+#define SQRT_F 0x4A00 /* FP16_C (__builtin_sqrtf (144.0)).  */
+#define SQRT_G 0x4062 /* FP16_C (__builtin_sqrtf (4.8)).  */
+#define SQRT_H 0x4863 /* FP16_C (__builtin_sqrtf (77)).  */
+
+/* Expected results for vsqrt.  */
+VECT_VAR_DECL (expected_static, hfloat, 16, 4) []
+  = { SQRT_A, SQRT_B, SQRT_C, SQRT_D };
+
+VECT_VAR_DECL (expected_static, hfloat, 16, 8) []
+  = { SQRT_A, SQRT_B, SQRT_C, SQRT_D, SQRT_E, SQRT_F, SQRT_G, SQRT_H };
+
+void exec_vsqrt_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VSQRT (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 4);
+  VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc, buf_src, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4)
+    = vsqrt_f16 (VECT_VAR (vsrc, float, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VSQRTQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 8);
+  VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A, B, C, D, E, F, G, H};
+  VLOAD (vsrc, buf_src, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8)
+    = vsqrtq_f16 (VECT_VAR (vsrc, float, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_static, "");
+}
+
+int
+main (void)
+{
+  exec_vsqrt_f16 ();
+  return 0;
+}
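
The SQRT_* constants were derived by taking the single-precision square
root and narrowing to binary16. That double rounding agrees with a
correctly rounded half-precision FSQRT for these particular inputs, as can
be spot-checked with a standalone sketch on an AArch64 host (not testsuite
code):

  #include <math.h>
  #include <stdint.h>
  #include <string.h>

  int
  main (void)
  {
    __fp16 h = (__fp16) sqrtf (144.0f);  /* Exactly 12.0.  */
    uint16_t bits;
    memcpy (&bits, &h, sizeof (bits));
    return bits == 0x4A00 ? 0 : 1;       /* Matches SQRT_F above.  */
  }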
-- 
2.5.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [AArch64][14/14] ARMv8.2-A testsuite for new scalar intrinsics
       [not found]                         ` <c5443f0d-577b-776b-4c97-7b16b06f8264@foss.arm.com>
@ 2016-07-07 16:19                           ` Jiong Wang
  2016-10-10  9:56                             ` James Greenhalgh
  0 siblings, 1 reply; 32+ messages in thread
From: Jiong Wang @ 2016-07-07 16:19 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 4448 bytes --]

This patch adds test cases for the new scalar intrinsics that are only
available on AArch64.
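
Each test follows the shared template pattern: define an input array, an
expected array, and a few macros naming the intrinsic and its types, then
include unary_scalar_op.inc or binary_scalar_op.inc, which expands into a
loop applying the intrinsic element by element and aborting on any
mismatch. Reduced to its essentials, the shape is roughly this (a
hand-written sketch using vabsh_f16, assuming an AArch64 compiler with the
FP16 scalar extension enabled; the real templates also handle error
reporting):

  #include <arm_fp16.h>

  extern void abort (void);

  float16_t input[] = { -1.0, 2.5, -0.5, 8.0 };
  float16_t expected[] = { 1.0, 2.5, 0.5, 8.0 };

  int
  main (void)
  {
    unsigned int i;
    for (i = 0; i < sizeof (input) / sizeof (input[0]); i++)
      if (vabsh_f16 (input[i]) != expected[i])
        abort ();
    return 0;
  }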

gcc/testsuite/
2016-07-07  Jiong Wang  <jiong.wang@arm.com>

         * gcc.target/aarch64/advsimd-intrinsics/unary_scalar_op.inc:
         Support FMT64.
         * gcc.target/aarch64/advsimd-intrinsics/vabdh_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcageh_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcagth_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcaleh_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcalth_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vceqh_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vceqzh_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcgeh_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcgezh_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcgth_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcgtzh_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcleh_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vclezh_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vclth_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcltzh_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvtah_s16_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvtah_s64_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvtah_u16_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvtah_u64_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s64_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u64_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s64_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u64_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s16_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s64_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u16_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u64_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvth_s16_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvth_s64_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvth_u16_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvth_u64_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s16_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s64_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u16_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u64_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s16_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s64_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u16_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u64_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvtph_s16_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvtph_s64_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvtph_u16_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vcvtph_u64_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vfmash_lane_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vmaxh_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vminh_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vmulh_lane_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vmulxh_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vmulxh_lane_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vrecpeh_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vrecpsh_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vrecpxh_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vrsqrteh_f16_1.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vrsqrtsh_f16_1.c: New.

[-- Attachment #2: 0014-14-14-TESTSUITE-for-new-scalar-intrinsics.patch --]
[-- Type: text/x-patch, Size: 77562 bytes --]

From cd8b8df4d6841d0e94aa0f55013f580eb81ce4c0 Mon Sep 17 00:00:00 2001
From: Jiong Wang <jiong.wang@arm.com>
Date: Tue, 5 Jul 2016 13:44:03 +0100
Subject: [PATCH 14/14] [14/14] TESTSUITE for new scalar intrinsics

---
 .../aarch64/advsimd-intrinsics/unary_scalar_op.inc |   1 +
 .../aarch64/advsimd-intrinsics/vabdh_f16_1.c       |  44 +++++++
 .../aarch64/advsimd-intrinsics/vcageh_f16_1.c      |  21 +++
 .../aarch64/advsimd-intrinsics/vcagth_f16_1.c      |  20 +++
 .../aarch64/advsimd-intrinsics/vcaleh_f16_1.c      |  21 +++
 .../aarch64/advsimd-intrinsics/vcalth_f16_1.c      |  21 +++
 .../aarch64/advsimd-intrinsics/vceqh_f16_1.c       |  20 +++
 .../aarch64/advsimd-intrinsics/vceqzh_f16_1.c      |  20 +++
 .../aarch64/advsimd-intrinsics/vcgeh_f16_1.c       |  21 +++
 .../aarch64/advsimd-intrinsics/vcgezh_f16_1.c      |  21 +++
 .../aarch64/advsimd-intrinsics/vcgth_f16_1.c       |  21 +++
 .../aarch64/advsimd-intrinsics/vcgtzh_f16_1.c      |  21 +++
 .../aarch64/advsimd-intrinsics/vcleh_f16_1.c       |  21 +++
 .../aarch64/advsimd-intrinsics/vclezh_f16_1.c      |  20 +++
 .../aarch64/advsimd-intrinsics/vclth_f16_1.c       |  21 +++
 .../aarch64/advsimd-intrinsics/vcltzh_f16_1.c      |  20 +++
 .../aarch64/advsimd-intrinsics/vcvtah_s16_f16_1.c  |  23 ++++
 .../aarch64/advsimd-intrinsics/vcvtah_s64_f16_1.c  |  23 ++++
 .../aarch64/advsimd-intrinsics/vcvtah_u16_f16_1.c  |  23 ++++
 .../aarch64/advsimd-intrinsics/vcvtah_u64_f16_1.c  |  23 ++++
 .../aarch64/advsimd-intrinsics/vcvth_f16_s16_1.c   |  25 ++++
 .../aarch64/advsimd-intrinsics/vcvth_f16_s64_1.c   |  25 ++++
 .../aarch64/advsimd-intrinsics/vcvth_f16_u16_1.c   |  25 ++++
 .../aarch64/advsimd-intrinsics/vcvth_f16_u64_1.c   |  25 ++++
 .../aarch64/advsimd-intrinsics/vcvth_n_f16_s16_1.c |  46 +++++++
 .../aarch64/advsimd-intrinsics/vcvth_n_f16_s64_1.c |  46 +++++++
 .../aarch64/advsimd-intrinsics/vcvth_n_f16_u16_1.c |  46 +++++++
 .../aarch64/advsimd-intrinsics/vcvth_n_f16_u64_1.c |  46 +++++++
 .../aarch64/advsimd-intrinsics/vcvth_n_s16_f16_1.c |  29 +++++
 .../aarch64/advsimd-intrinsics/vcvth_n_s64_f16_1.c |  29 +++++
 .../aarch64/advsimd-intrinsics/vcvth_n_u16_f16_1.c |  29 +++++
 .../aarch64/advsimd-intrinsics/vcvth_n_u64_f16_1.c |  29 +++++
 .../aarch64/advsimd-intrinsics/vcvth_s16_f16_1.c   |  23 ++++
 .../aarch64/advsimd-intrinsics/vcvth_s64_f16_1.c   |  23 ++++
 .../aarch64/advsimd-intrinsics/vcvth_u16_f16_1.c   |  23 ++++
 .../aarch64/advsimd-intrinsics/vcvth_u64_f16_1.c   |  23 ++++
 .../aarch64/advsimd-intrinsics/vcvtmh_s16_f16_1.c  |  23 ++++
 .../aarch64/advsimd-intrinsics/vcvtmh_s64_f16_1.c  |  23 ++++
 .../aarch64/advsimd-intrinsics/vcvtmh_u16_f16_1.c  |  23 ++++
 .../aarch64/advsimd-intrinsics/vcvtmh_u64_f16_1.c  |  23 ++++
 .../aarch64/advsimd-intrinsics/vcvtnh_s16_f16_1.c  |  23 ++++
 .../aarch64/advsimd-intrinsics/vcvtnh_s64_f16_1.c  |  23 ++++
 .../aarch64/advsimd-intrinsics/vcvtnh_u16_f16_1.c  |  23 ++++
 .../aarch64/advsimd-intrinsics/vcvtnh_u64_f16_1.c  |  23 ++++
 .../aarch64/advsimd-intrinsics/vcvtph_s16_f16_1.c  |  23 ++++
 .../aarch64/advsimd-intrinsics/vcvtph_s64_f16_1.c  |  23 ++++
 .../aarch64/advsimd-intrinsics/vcvtph_u16_f16_1.c  |  23 ++++
 .../aarch64/advsimd-intrinsics/vcvtph_u64_f16_1.c  |  23 ++++
 .../aarch64/advsimd-intrinsics/vfmash_lane_f16_1.c | 143 +++++++++++++++++++++
 .../aarch64/advsimd-intrinsics/vmaxh_f16_1.c       |  34 +++++
 .../aarch64/advsimd-intrinsics/vminh_f16_1.c       |  34 +++++
 .../aarch64/advsimd-intrinsics/vmulh_lane_f16_1.c  |  90 +++++++++++++
 .../aarch64/advsimd-intrinsics/vmulxh_f16_1.c      |  50 +++++++
 .../aarch64/advsimd-intrinsics/vmulxh_lane_f16_1.c |  91 +++++++++++++
 .../aarch64/advsimd-intrinsics/vrecpeh_f16_1.c     |  42 ++++++
 .../aarch64/advsimd-intrinsics/vrecpsh_f16_1.c     |  50 +++++++
 .../aarch64/advsimd-intrinsics/vrecpxh_f16_1.c     |  32 +++++
 .../aarch64/advsimd-intrinsics/vrsqrteh_f16_1.c    |  30 +++++
 .../aarch64/advsimd-intrinsics/vrsqrtsh_f16_1.c    |  50 +++++++
 59 files changed, 1840 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabdh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcageh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcagth_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcaleh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcalth_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqzh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgeh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgezh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgth_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgtzh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcleh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclezh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclth_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcltzh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_s16_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_s64_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_u16_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_u64_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s64_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u64_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s64_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u64_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s16_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s64_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u16_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u64_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_s16_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_s64_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_u16_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_u64_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s16_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s64_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u16_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u64_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s16_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s64_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u16_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u64_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_s16_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_s64_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_u16_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_u64_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmash_lane_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulh_lane_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulxh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulxh_lane_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecpeh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecpsh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecpxh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrsqrteh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrsqrtsh_f16_1.c

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/unary_scalar_op.inc b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/unary_scalar_op.inc
index 86403d2..66c8906 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/unary_scalar_op.inc
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/unary_scalar_op.inc
@@ -64,6 +64,7 @@ extern void abort ();
 /* Format strings for error reporting.  */
 #define FMT16 "0x%04x"
 #define FMT32 "0x%08x"
+#define FMT64 "0x%016lx"
 #define FMT CAT (FMT,OUTPUT_TYPE_SIZE)
 
 /* Type construction: forms TS_t, where T is the base type and S the size in
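
A note on the FMT64 addition above: printing a 64-bit value requires a
64-bit length modifier, so a plain "0x%016x" would be undefined behaviour;
"0x%016lx" is correct for LP64 AArch64, and the fully portable spelling
uses PRIx64:

  #include <inttypes.h>
  #include <stdio.h>

  int
  main (void)
  {
    uint64_t v = 0x0123456789ABCDEFULL;  /* Arbitrary example value.  */
    /* PRIx64 expands to the right length modifier on any platform.  */
    printf ("0x%016" PRIx64 "\n", v);
    return 0;
  }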
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabdh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabdh_f16_1.c
new file mode 100644
index 0000000..3a5efa5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabdh_f16_1.c
@@ -0,0 +1,44 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+#define INFF __builtin_inf ()
+
+/* Expected results.
+   Absolute difference between INPUT1 and INPUT2 in binary_scalar_op.inc.  */
+uint16_t expected[] =
+{
+  0x3C00,
+  0x3C00,
+  0x4654,
+  0x560E,
+  0x4900,
+  0x36B8,
+  0x419A,
+  0x4848,
+  0x3D34,
+  0x4CEC,
+  0x4791,
+  0x3F34,
+  0x484D,
+  0x4804,
+  0x469C,
+  0x4CEB,
+  0x7C00,
+  0x7C00
+};
+
+#define TEST_MSG "VABDH_F16"
+#define INSN_NAME vabdh_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for binary scalar operations.  */
+#include "binary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcageh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcageh_f16_1.c
new file mode 100644
index 0000000..f8c8c79
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcageh_f16_1.c
@@ -0,0 +1,21 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+uint16_t expected[] = { 0x0, 0x0, 0x0, 0x0, 0x0, 0xFFFF, 0xFFFF, 0x0, 0xFFFF,
+			0x0, 0x0, 0x0, 0x0, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF,
+			0xFFFF};
+
+#define TEST_MSG "VCAGEH_F16"
+#define INSN_NAME vcageh_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for binary scalar operations.  */
+#include "binary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcagth_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcagth_f16_1.c
new file mode 100644
index 0000000..23c11a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcagth_f16_1.c
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+uint16_t expected[] = { 0x0, 0x0, 0x0, 0x0, 0x0, 0xFFFF, 0xFFFF, 0x0, 0xFFFF,
+			0x0, 0x0, 0x0, 0x0, 0xFFFF, 0xFFFF, 0xFFFF, 0x0, 0x0};
+
+#define TEST_MSG "VCAGTH_F16"
+#define INSN_NAME vcagth_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for binary scalar operations.  */
+#include "binary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcaleh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcaleh_f16_1.c
new file mode 100644
index 0000000..ae4c8b5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcaleh_f16_1.c
@@ -0,0 +1,21 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+uint16_t expected[] = { 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0x0, 0x0,
+			0xFFFF, 0x0, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0x0, 0x0,
+			0x0, 0xFFFF, 0xFFFF};
+
+#define TEST_MSG "VCALEH_F16"
+#define INSN_NAME vcaleh_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for binary scalar operations.  */
+#include "binary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcalth_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcalth_f16_1.c
new file mode 100644
index 0000000..56a6533
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcalth_f16_1.c
@@ -0,0 +1,21 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+uint16_t expected[] = { 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0x0, 0x0,
+			0xFFFF, 0x0, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0x0, 0x0,
+			0x0, 0x0, 0x0};
+
+#define TEST_MSG "VCALTH_F16"
+#define INSN_NAME vcalth_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for binary scalar operations.  */
+#include "binary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqh_f16_1.c
new file mode 100644
index 0000000..fb54e96
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqh_f16_1.c
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+uint16_t expected[] = { 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+			0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0};
+
+#define TEST_MSG "VCEQH_F16"
+#define INSN_NAME vceqh_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for binary scalar operations.  */
+#include "binary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqzh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqzh_f16_1.c
new file mode 100644
index 0000000..57c765c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqzh_f16_1.c
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+uint16_t expected[] = { 0xFFFF, 0xFFFF, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+			0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0};
+
+#define TEST_MSG "VCEQZH_F16"
+#define INSN_NAME vceqzh_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgeh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgeh_f16_1.c
new file mode 100644
index 0000000..f9a5bbe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgeh_f16_1.c
@@ -0,0 +1,21 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+uint16_t expected[] = { 0x0, 0x0, 0xFFFF, 0x0, 0x0, 0xFFFF, 0x0, 0xFFFF,
+			0x0, 0x0, 0xFFFF, 0x0, 0xFFFF, 0xFFFF, 0x0, 0xFFFF,
+			0xFFFF, 0x0};
+
+#define TEST_MSG "VCGEH_F16"
+#define INSN_NAME vcgeh_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for binary scalar operations.  */
+#include "binary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgezh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgezh_f16_1.c
new file mode 100644
index 0000000..a5997cc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgezh_f16_1.c
@@ -0,0 +1,21 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+uint16_t expected[] = { 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0x0,
+			0xFFFF, 0x0, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF,
+			0x0, 0xFFFF, 0xFFFF, 0x0};
+
+#define TEST_MSG "VCGEZH_F16"
+#define INSN_NAME vcgezh_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgth_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgth_f16_1.c
new file mode 100644
index 0000000..f0a37e8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgth_f16_1.c
@@ -0,0 +1,21 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+uint16_t expected[] = { 0x0, 0x0, 0xFFFF, 0x0, 0x0, 0xFFFF, 0x0, 0xFFFF,
+			0x0, 0x0, 0xFFFF, 0x0, 0xFFFF, 0xFFFF, 0x0, 0xFFFF,
+			0xFFFF, 0x0};
+
+#define TEST_MSG "VCGTH_F16"
+#define INSN_NAME vcgth_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for binary scalar operations.  */
+#include "binary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgtzh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgtzh_f16_1.c
new file mode 100644
index 0000000..41e57a2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgtzh_f16_1.c
@@ -0,0 +1,21 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+uint16_t expected[] = { 0x0, 0x0, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0x0, 0xFFFF,
+			0x0, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0x0,
+			0xFFFF, 0xFFFF, 0x0};
+
+#define TEST_MSG "VCGTZH_F16"
+#define INSN_NAME vcgtzh_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcleh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcleh_f16_1.c
new file mode 100644
index 0000000..e19eb51
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcleh_f16_1.c
@@ -0,0 +1,21 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+uint16_t expected[] = { 0xFFFF, 0xFFFF, 0x0, 0xFFFF, 0xFFFF, 0x0, 0xFFFF, 0x0,
+			0xFFFF, 0xFFFF, 0x0, 0xFFFF, 0x0, 0x0, 0xFFFF, 0x0, 0x0,
+			0xFFFF};
+
+#define TEST_MSG "VCLEH_F16"
+#define INSN_NAME vcleh_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for binary scalar operations.  */
+#include "binary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclezh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclezh_f16_1.c
new file mode 100644
index 0000000..6d09db9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclezh_f16_1.c
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+uint16_t expected[] = { 0xFFFF, 0xFFFF, 0x0, 0x0, 0x0, 0x0, 0xFFFF, 0x0, 0xFFFF,
+			0x0, 0x0, 0x0, 0x0, 0x0, 0xFFFF, 0x0, 0x0, 0xFFFF};
+
+#define TEST_MSG "VCLEZH_F16"
+#define INSN_NAME vclezh_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclth_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclth_f16_1.c
new file mode 100644
index 0000000..f81c900
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclth_f16_1.c
@@ -0,0 +1,21 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+uint16_t expected[] = { 0xFFFF, 0xFFFF, 0x0, 0xFFFF, 0xFFFF, 0x0, 0xFFFF, 0x0,
+			0xFFFF, 0xFFFF, 0x0, 0xFFFF, 0x0, 0x0, 0xFFFF, 0x0, 0x0,
+			0xFFFF};
+
+#define TEST_MSG "VCLTH_F16"
+#define INSN_NAME vclth_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for binary scalar operations.  */
+#include "binary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcltzh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcltzh_f16_1.c
new file mode 100644
index 0000000..00f6923
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcltzh_f16_1.c
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+uint16_t expected[] = { 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xFFFF, 0x0, 0xFFFF,
+			0x0, 0x0, 0x0, 0x0, 0x0, 0xFFFF, 0x0, 0x0, 0xFFFF};
+
+#define TEST_MSG "VCltZH_F16"
+#define INSN_NAME vcltzh_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_s16_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_s16_f16_1.c
new file mode 100644
index 0000000..2084c30
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_s16_f16_1.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] = { 123.9, -56.8, 0.7, 24.6, -63.5, 169.4, -4.3, 77.0 };
+int16_t expected[] = { 124, -57, 1, 25, -64, 169, -4, 77 };
+
+#define TEST_MSG "VCVTAH_S16_F16"
+#define INSN_NAME vcvtah_s16_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE int16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_s64_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_s64_f16_1.c
new file mode 100644
index 0000000..a27871b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_s64_f16_1.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] = { 123.9, -56.8, 0.7, 24.6, -63.5, 169.4, -4.3, 77.0 };
+int64_t expected[] = { 124, -57, 1, 25, -64, 169, -4, 77 };
+
+#define TEST_MSG "VCVTAH_S64_F16"
+#define INSN_NAME vcvtah_s64_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE int64_t
+#define OUTPUT_TYPE_SIZE 64
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_u16_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_u16_f16_1.c
new file mode 100644
index 0000000..0642ae0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_u16_f16_1.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] = { 123.9, 56.8, 0.7, 24.6, 63.5, 169.4, 4.3, 77.0 };
+uint16_t expected[] = { 124, 57, 1, 25, 64, 169, 4, 77 };
+
+#define TEST_MSG "VCVTAH_u16_F16"
+#define INSN_NAME vcvtah_u16_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_u64_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_u64_f16_1.c
new file mode 100644
index 0000000..2d197b4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_u64_f16_1.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] = { 123.9, 56.8, 0.7, 24.6, 63.5, 169.4, 4.3, 77.0 };
+uint64_t expected[] = { 124, 57, 1, 25, 64, 169, 4, 77 };
+
+#define TEST_MSG "VCVTAH_u64_F16"
+#define INSN_NAME vcvtah_u64_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint64_t
+#define OUTPUT_TYPE_SIZE 64
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s16_1.c
new file mode 100644
index 0000000..540b637
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s16_1.c
@@ -0,0 +1,25 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+int16_t input[] = { 123, -567, 0, 1024, -63, 169, -4, 77 };
+uint16_t expected[] = { 0x57B0 /* 123.0.  */, 0xE06E /* -567.0.  */,
+			0x0000 /* 0.0.  */, 0x6400 /* 1024.  */,
+			0xD3E0 /* -63.  */, 0x5948 /* 169.  */,
+			0xC400 /* -4.  */, 0x54D0 /* 77.  */ };
+
+#define TEST_MSG "VCVTH_F16_S16"
+#define INSN_NAME vcvth_f16_s16
+
+#define EXPECTED expected
+
+#define INPUT input
+#define INPUT_TYPE int16_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s64_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s64_1.c
new file mode 100644
index 0000000..5f17dbe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s64_1.c
@@ -0,0 +1,25 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+int64_t input[] = { 123, -567, 0, 1024, -63, 169, -4, 77 };
+uint16_t expected[] = { 0x57B0 /* 123.0.  */, 0xE06E /* -567.0.  */,
+			0x0000 /* 0.0.  */, 0x6400 /* 1024.  */,
+			0xD3E0 /* -63.  */, 0x5948 /* 169.  */,
+			0xC400 /* -4.  */, 0x54D0 /* 77.  */ };
+
+#define TEST_MSG "VCVTH_F16_S64"
+#define INSN_NAME vcvth_f16_s64
+
+#define EXPECTED expected
+
+#define INPUT input
+#define INPUT_TYPE int64_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u16_1.c
new file mode 100644
index 0000000..426700c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u16_1.c
@@ -0,0 +1,25 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+uint16_t input[] = { 123, 567, 0, 1024, 63, 169, 4, 77 };
+uint16_t expected[] = { 0x57B0 /* 123.0.  */, 0x606E /* 567.0.  */,
+			0x0000 /* 0.0.  */, 0x6400 /* 1024.0.  */,
+			0x53E0 /* 63.0.  */, 0x5948 /* 169.0.  */,
+			0x4400 /* 4.0.  */, 0x54D0 /* 77.0.  */ };
+
+#define TEST_MSG "VCVTH_F16_U16"
+#define INSN_NAME vcvth_f16_u16
+
+#define EXPECTED expected
+
+#define INPUT input
+#define INPUT_TYPE uint16_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u64_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u64_1.c
new file mode 100644
index 0000000..3413de0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u64_1.c
@@ -0,0 +1,25 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+uint64_t input[] = { 123, 567, 0, 1024, 63, 169, 4, 77 };
+uint16_t expected[] = { 0x57B0 /* 123.0.  */, 0x606E /* 567.0.  */,
+			0x0000 /* 0.0.  */, 0x6400 /* 1024.0.  */,
+			0x53E0 /* 63.0.  */, 0x5948 /* 169.0.  */,
+			0x4400 /* 4.0.  */, 0x54D0 /* 77.0.  */ };
+
+#define TEST_MSG "VCVTH_F16_U64"
+#define INSN_NAME vcvth_f16_u64
+
+#define EXPECTED expected
+
+#define INPUT input
+#define INPUT_TYPE uint64_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s16_1.c
new file mode 100644
index 0000000..25265d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s16_1.c
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+int16_t input[] = { 1, 10, 48, 100, -1, -10, 7, -7 };
+
+/* Expected results (16-bit hexadecimal representation).  */
+uint16_t expected_1[] = { 0x3800 /* 0.5.  */,
+			  0x4500 /* 5.  */,
+			  0x4E00 /* 24.  */,
+			  0x5240 /* 50.  */,
+			  0xB800 /* -0.5.  */,
+			  0xC500 /* -5.  */,
+			  0x4300 /* 3.5.  */,
+			  0xC300 /* -3.5.  */ };
+
+uint16_t expected_2[] = { 0x3400 /* 0.25.  */,
+			  0x4100 /* 2.5.  */,
+			  0x4A00 /* 12.  */,
+			  0x4E40 /* 25.  */,
+			  0xB400 /* -0.25.  */,
+			  0xC100 /* -2.5.  */,
+			  0x3F00 /* 1.75.  */,
+			  0xBF00 /* -1.75.  */ };
+
+#define TEST_MSG "VCVTH_N_F16_S16"
+#define INSN_NAME vcvth_n_f16_s16
+
+#define INPUT input
+#define EXPECTED_1 expected_1
+#define EXPECTED_2 expected_2
+
+#define INPUT_TYPE int16_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+#define SCALAR_OPERANDS
+#define SCALAR_1 1
+#define SCALAR_2 2
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s64_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s64_1.c
new file mode 100644
index 0000000..f0adb09
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s64_1.c
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+int64_t input[] = { 1, 10, 48, 100, -1, -10, 7, -7 };
+
+/* Expected results (16-bit hexadecimal representation).  */
+uint16_t expected_1[] = { 0x3800 /* 0.5.  */,
+			  0x4500 /* 5.  */,
+			  0x4E00 /* 24.  */,
+			  0x5240 /* 50.  */,
+			  0xB800 /* -0.5.  */,
+			  0xC500 /* -5.  */,
+			  0x4300 /* 3.5.  */,
+			  0xC300 /* -3.5.  */ };
+
+uint16_t expected_2[] = { 0x3400 /* 0.25.  */,
+			  0x4100 /* 2.5.  */,
+			  0x4A00 /* 12.  */,
+			  0x4E40 /* 25.  */,
+			  0xB400 /* -0.25.  */,
+			  0xC100 /* -2.5.  */,
+			  0x3F00 /* 1.75.  */,
+			  0xBF00 /* -1.75.  */ };
+
+#define TEST_MSG "VCVTH_N_F16_S64"
+#define INSN_NAME vcvth_n_f16_s64
+
+#define INPUT input
+#define EXPECTED_1 expected_1
+#define EXPECTED_2 expected_2
+
+#define INPUT_TYPE int64_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+#define SCALAR_OPERANDS
+#define SCALAR_1 1
+#define SCALAR_2 2
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u16_1.c
new file mode 100644
index 0000000..74c4e60
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u16_1.c
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+uint16_t input[] = { 1, 10, 48, 100, 1000, 0, 500, 9 };
+
+/* Expected results (16-bit hexadecimal representation).  */
+uint16_t expected_1[] = { 0x3800 /* 0.5.  */,
+			  0x4500 /* 5.  */,
+			  0x4E00 /* 24.  */,
+			  0x5240 /* 50.  */,
+			  0x5FD0 /* 500.  */,
+			  0x0000 /* 0.0.  */,
+			  0x5BD0 /* 250.  */,
+			  0x4480 /* 4.5.  */ };
+
+uint16_t expected_2[] = { 0x3400 /* 0.25.  */,
+			  0x4100 /* 2.5.  */,
+			  0x4A00 /* 12.  */,
+			  0x4E40 /* 25.  */,
+			  0x5BD0 /* 250.  */,
+			  0x0000 /* 0.0.  */,
+			  0x57D0 /* 125.  */,
+			  0x4080 /* 2.25.  */ };
+
+#define TEST_MSG "VCVTH_N_F16_U16"
+#define INSN_NAME vcvth_n_f16_u16
+
+#define INPUT input
+#define EXPECTED_1 expected_1
+#define EXPECTED_2 expected_2
+
+#define INPUT_TYPE uint16_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+#define SCALAR_OPERANDS
+#define SCALAR_1 1
+#define SCALAR_2 2
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u64_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u64_1.c
new file mode 100644
index 0000000..b393767
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u64_1.c
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+uint64_t input[] = { 1, 10, 48, 100, 1000, 0, 500, 9 };
+
+/* Expected results (16-bit hexadecimal representation).  */
+uint16_t expected_1[] = { 0x3800 /* 0.5.  */,
+			  0x4500 /* 5.  */,
+			  0x4E00 /* 24.  */,
+			  0x5240 /* 50.  */,
+			  0x5FD0 /* 500.  */,
+			  0x0000 /* 0.0.  */,
+			  0x5BD0 /* 250.  */,
+			  0x4480 /* 4.5.  */ };
+
+uint16_t expected_2[] = { 0x3400 /* 0.25.  */,
+			  0x4100 /* 2.5.  */,
+			  0x4A00 /* 12.  */,
+			  0x4E40 /* 25.  */,
+			  0x5BD0 /* 250.  */,
+			  0x0000 /* 0.0.  */,
+			  0x57D0 /* 125.  */,
+			  0x4080 /* 2.25.  */ };
+
+#define TEST_MSG "VCVTH_N_F16_U64"
+#define INSN_NAME vcvth_n_f16_u64
+
+#define INPUT input
+#define EXPECTED_1 expected_1
+#define EXPECTED_2 expected_2
+
+#define INPUT_TYPE uint64_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+#define SCALAR_OPERANDS
+#define SCALAR_1 1
+#define SCALAR_2 2
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s16_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s16_f16_1.c
new file mode 100644
index 0000000..247f7c9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s16_f16_1.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] = { 2.5, 100, 7.1, -9.9, -5.0, 9.1, -4.8, 77 };
+int16_t expected_1[] = { 5, 200, 14, -19, -10, 18, -9, 154 };
+int16_t expected_2[] = { 10, 400, 28, -39, -20, 36, -19, 308 };
+
+#define TEST_MSG "VCVTH_N_S16_F16"
+#define INSN_NAME vcvth_n_s16_f16
+
+#define INPUT input
+#define EXPECTED_1 expected_1
+#define EXPECTED_2 expected_2
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE int16_t
+#define OUTPUT_TYPE_SIZE 16
+
+#define SCALAR_OPERANDS
+#define SCALAR_1 1
+#define SCALAR_2 2
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s64_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s64_f16_1.c
new file mode 100644
index 0000000..27502c2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s64_f16_1.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] = { 2.5, 100, 7.1, -9.9, -5.0, 9.1, -4.8, 77 };
+int64_t expected_1[] = { 5, 200, 14, -19, -10, 18, -9, 154 };
+int64_t expected_2[] = { 10, 400, 28, -39, -20, 36, -19, 308 };
+
+#define TEST_MSG "VCVTH_N_S64_F16"
+#define INSN_NAME vcvth_n_s64_f16
+
+#define INPUT input
+#define EXPECTED_1 expected_1
+#define EXPECTED_2 expected_2
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE int64_t
+#define OUTPUT_TYPE_SIZE 64
+
+#define SCALAR_OPERANDS
+#define SCALAR_1 1
+#define SCALAR_2 2
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u16_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u16_f16_1.c
new file mode 100644
index 0000000..e5f57f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u16_f16_1.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] = { 2.5, 100, 7.1, 9.9, 5.0, 9.1, 4.8, 77 };
+uint16_t expected_1[] = { 5, 200, 14, 19, 10, 18, 9, 154 };
+uint16_t expected_2[] = { 10, 400, 28, 39, 20, 36, 19, 308 };
+
+#define TEST_MSG "VCVTH_N_U16_F16"
+#define INSN_NAME vcvth_n_u16_f16
+
+#define INPUT input
+#define EXPECTED_1 expected_1
+#define EXPECTED_2 expected_2
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint16_t
+#define OUTPUT_TYPE_SIZE 16
+
+#define SCALAR_OPERANDS
+#define SCALAR_1 1
+#define SCALAR_2 2
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u64_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u64_f16_1.c
new file mode 100644
index 0000000..cfc33c2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u64_f16_1.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] = { 2.5, 100, 7.1, 9.9, 5.0, 9.1, 4.8, 77 };
+uint64_t expected_1[] = { 5, 200, 14, 19, 10, 18, 9, 154 };
+uint64_t expected_2[] = { 10, 400, 28, 39, 20, 36, 19, 308 };
+
+#define TEST_MSG "VCVTH_N_U64_F16"
+#define INSN_NAME vcvth_n_u64_f16
+
+#define INPUT input
+#define EXPECTED_1 expected_1
+#define EXPECTED_2 expected_2
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint64_t
+#define OUTPUT_TYPE_SIZE 64
+
+#define SCALAR_OPERANDS
+#define SCALAR_1 1
+#define SCALAR_2 2
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_s16_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_s16_f16_1.c
new file mode 100644
index 0000000..9965654
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_s16_f16_1.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] = { 123.9, -56.8, 0.7, 24.6, -63.5, 169.4, -4.3, 77.0 };
+int16_t expected[] = { 123, -56, 0, 24, -63, 169, -4, 77 };
+
+#define TEST_MSG "VCVTH_S16_F16"
+#define INSN_NAME vcvth_s16_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE int16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_s64_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_s64_f16_1.c
new file mode 100644
index 0000000..c7b3d17
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_s64_f16_1.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] = { 123.9, -56.8, 0.7, 24.6, -63.5, 169.4, -4.3, 77.0 };
+int64_t expected[] = { 123, -56, 0, 24, -63, 169, -4, 77 };
+
+#define TEST_MSG "VCVTH_S64_F16"
+#define INSN_NAME vcvth_s64_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE int64_t
+#define OUTPUT_TYPE_SIZE 64
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_u16_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_u16_f16_1.c
new file mode 100644
index 0000000..e3c5d3a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_u16_f16_1.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] = { 123.9, 56.8, 0.7, 24.6, 63.5, 169.4, 4.3, 77.0 };
+uint16_t expected[] = { 123, 56, 0, 24, 63, 169, 4, 77 };
+
+#define TEST_MSG "VCVTH_u16_F16"
+#define INSN_NAME vcvth_u16_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_u64_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_u64_f16_1.c
new file mode 100644
index 0000000..a904e5e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_u64_f16_1.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] = { 123.9, 56.8, 0.7, 24.6, 63.5, 169.4, 4.3, 77.0 };
+uint64_t expected[] = { 123, 56, 0, 24, 63, 169, 4, 77 };
+
+#define TEST_MSG "VCVTH_u64_F16"
+#define INSN_NAME vcvth_u64_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint64_t
+#define OUTPUT_TYPE_SIZE 64
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s16_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s16_f16_1.c
new file mode 100644
index 0000000..ef0132a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s16_f16_1.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] = { 123.9, -56.8, 0.7, 24.6, -63.5, 169.4, -4.3, 77.0 };
+int16_t expected[] = { 123, -57, 0, 24, -64, 169, -5, 77 };
+
+#define TEST_MSG "VCVTMH_S16_F16"
+#define INSN_NAME vcvtmh_s16_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE int16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s64_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s64_f16_1.c
new file mode 100644
index 0000000..7b5b16f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s64_f16_1.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] = { 123.9, -56.8, 0.7, 24.6, -63.5, 169.4, -4.3, 77.0 };
+int64_t expected[] = { 123, -57, 0, 24, -64, 169, -5, 77 };
+
+#define TEST_MSG "VCVTMH_S64_F16"
+#define INSN_NAME vcvtmh_s64_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE int64_t
+#define OUTPUT_TYPE_SIZE 64
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u16_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u16_f16_1.c
new file mode 100644
index 0000000..db56171
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u16_f16_1.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] = { 123.9, 56.8, 0.7, 24.6, 63.5, 169.4, 4.3, 77.0 };
+uint16_t expected[] = { 123, 56, 0, 24, 63, 169, 4, 77 };
+
+#define TEST_MSG "VCVTMH_u16_F16"
+#define INSN_NAME vcvtmh_u16_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u64_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u64_f16_1.c
new file mode 100644
index 0000000..cae69a3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u64_f16_1.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] = { 123.9, 56.8, 0.7, 24.6, 63.5, 169.4, 4.3, 77.0 };
+uint64_t expected[] = { 123, 56, 0, 24, 63, 169, 4, 77 };
+
+#define TEST_MSG "VCVTMH_u64_F16"
+#define INSN_NAME vcvtmh_u64_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint64_t
+#define OUTPUT_TYPE_SIZE 64
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s16_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s16_f16_1.c
new file mode 100644
index 0000000..dec8d85
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s16_f16_1.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] = { 123.9, -56.8, 0.7, 24.6, -63.5, 169.4, -4.3, 77.0 };
+int16_t expected[] = { 124, -57, 1, 25, -64, 169, -4, 77 };
+
+#define TEST_MSG "VCVTNH_S16_F16"
+#define INSN_NAME vcvtnh_s16_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE int16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s64_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s64_f16_1.c
new file mode 100644
index 0000000..0048b5b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s64_f16_1.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] = { 123.9, -56.8, 0.7, 24.6, -63.5, 169.4, -4.3, 77.0 };
+int64_t expected[] = { 124, -57, 1, 25, -64, 169, -4, 77 };
+
+#define TEST_MSG "VCVTNH_S64_F16"
+#define INSN_NAME vcvtnh_s64_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE int64_t
+#define OUTPUT_TYPE_SIZE 64
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u16_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u16_f16_1.c
new file mode 100644
index 0000000..0a95cea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u16_f16_1.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] = { 123.9, 56.8, 0.7, 24.6, 63.5, 169.4, 4.3, 77.0 };
+uint16_t expected[] = { 124, 57, 1, 25, 64, 169, 4, 77 };
+
+#define TEST_MSG "VCVTNH_u16_F16"
+#define INSN_NAME vcvtnh_u16_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u64_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u64_f16_1.c
new file mode 100644
index 0000000..3b1b273
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u64_f16_1.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] = { 123.9, 56.8, 0.7, 24.6, 63.5, 169.4, 4.3, 77.0 };
+uint64_t expected[] = { 124, 57, 1, 25, 64, 169, 4, 77 };
+
+#define TEST_MSG "VCVTNH_u64_F16"
+#define INSN_NAME vcvtnh_u64_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint64_t
+#define OUTPUT_TYPE_SIZE 64
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_s16_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_s16_f16_1.c
new file mode 100644
index 0000000..5ff0d22
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_s16_f16_1.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] = { 123.9, -56.8, 0.7, 24.6, -63.5, 169.4, -4.3, 77.0 };
+int16_t expected[] = { 124, -56, 1, 25, -63, 170, -4, 77 };
+
+#define TEST_MSG "VCVTPH_S16_F16"
+#define INSN_NAME vcvtph_s16_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE int16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_s64_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_s64_f16_1.c
new file mode 100644
index 0000000..290c5b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_s64_f16_1.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] = { 123.9, -56.8, 0.7, 24.6, -63.5, 169.4, -4.3, 77.0 };
+int64_t expected[] = { 124, -56, 1, 25, -63, 170, -4, 77 };
+
+#define TEST_MSG "VCVTPH_S64_F16"
+#define INSN_NAME vcvtph_s64_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE int64_t
+#define OUTPUT_TYPE_SIZE 64
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_u16_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_u16_f16_1.c
new file mode 100644
index 0000000..e367dad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_u16_f16_1.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] = { 123.9, 56.8, 0.7, 24.6, 63.5, 169.4, 4.3, 77.0 };
+uint16_t expected[] = { 124, 57, 1, 25, 64, 170, 5, 77 };
+
+#define TEST_MSG "VCVTPH_u16_F16"
+#define INSN_NAME vcvtph_u16_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_u64_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_u64_f16_1.c
new file mode 100644
index 0000000..0229099
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_u64_f16_1.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] = { 123.9, 56.8, 0.7, 24.6, 63.5, 169.4, 4.3, 77.0 };
+uint64_t expected[] = { 124, 57, 1, 25, 64, 170, 5, 77 };
+
+#define TEST_MSG "VCVTPH_u64_F16"
+#define INSN_NAME vcvtph_u64_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint64_t
+#define OUTPUT_TYPE_SIZE 64
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmash_lane_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmash_lane_f16_1.c
new file mode 100644
index 0000000..ea751da
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmash_lane_f16_1.c
@@ -0,0 +1,144 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon } */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A0 FP16_C (123.4)
+#define B0 FP16_C (-5.8)
+#define C0 FP16_C (-3.8)
+#define D0 FP16_C (10)
+
+#define A1 FP16_C (12.4)
+#define B1 FP16_C (-5.8)
+#define C1 FP16_C (90.8)
+#define D1 FP16_C (24)
+
+#define A2 FP16_C (23.4)
+#define B2 FP16_C (-5.8)
+#define C2 FP16_C (8.9)
+#define D2 FP16_C (4)
+
+#define E0 FP16_C (3.4)
+#define F0 FP16_C (-55.8)
+#define G0 FP16_C (-31.8)
+#define H0 FP16_C (2)
+
+#define E1 FP16_C (123.4)
+#define F1 FP16_C (-5.8)
+#define G1 FP16_C (-3.8)
+#define H1 FP16_C (102)
+
+#define E2 FP16_C (4.9)
+#define F2 FP16_C (-15.8)
+#define G2 FP16_C (39.8)
+#define H2 FP16_C (49)
+
+extern void abort ();
+
+float16_t src1[8] = { A0, B0, C0, D0, E0, F0, G0, H0 };
+float16_t src2[8] = { A1, B1, C1, D1, E1, F1, G1, H1 };
+VECT_VAR_DECL (src3, float, 16, 4) [] = { A2, B2, C2, D2 };
+VECT_VAR_DECL (src3, float, 16, 8) [] = { A2, B2, C2, D2, E2, F2, G2, H2 };
+
+/* Expected results for vfmah_lane_f16.  */
+uint16_t expected[4] = { 0x5E76 /* A0 + A1 * A2.  */,
+			 0x4EF6 /* B0 + B1 * B2.  */,
+			 0x6249 /* C0 + C1 * C2.  */,
+			 0x56A0 /* D0 + D1 * D2.  */ };
+
+/* Expected results for vfmah_laneq_f16.  */
+uint16_t expected_laneq[8] = { 0x5E76 /* A0 + A1 * A2.  */,
+			       0x4EF6 /* B0 + B1 * B2.  */,
+			       0x6249 /* C0 + C1 * C2.  */,
+			       0x56A0 /* D0 + D1 * D2.  */,
+			       0x60BF /* E0 + E1 * E2.  */,
+			       0x507A /* F0 + F1 * F2.  */,
+			       0xD9B9 /* G0 + G1 * G2.  */,
+			       0x6CE2 /* H0 + H1 * H2.  */ };
+
+/* Expected results for vfmsh_lane_f16.  */
+uint16_t expected_fms[4] = { 0xD937 /* A0 + -A1 * A2.  */,
+			     0xD0EE /* B0 + -B1 * B2.  */,
+			     0xE258 /* C0 + -C1 * C2.  */,
+			     0xD560 /* D0 + -D1 * D2.  */ };
+
+/* Expected results for vfmsh_laneq_f16.  */
+uint16_t expected_fms_laneq[8] = { 0xD937 /* A0 + -A1 * A2.  */,
+				   0xD0EE /* B0 + -B1 * B2.  */,
+				   0xE258 /* C0 + -C1 * C2.  */,
+				   0xD560 /* D0 + -D1 * D2.  */,
+				   0xE0B2 /* E0 + -E1 * E2.  */,
+				   0xD89C /* F0 + -F1 * F2.  */,
+				   0x5778 /* G0 + -G1 * G2.  */,
+				   0xECE1 /* H0 + -H1 * H2.  */ };
+
+void exec_vfmash_lane_f16 (void)
+{
+#define CHECK_LANE(N) \
+  ret = vfmah_lane_f16 (src1[N], src2[N], VECT_VAR (vsrc3, float, 16, 4), N);\
+  if (*(uint16_t *) &ret != expected[N])\
+    abort ();
+
+  DECL_VARIABLE(vsrc3, float, 16, 4);
+  VLOAD (vsrc3, src3, , float, f, 16, 4);
+  float16_t ret;
+  CHECK_LANE(0)
+  CHECK_LANE(1)
+  CHECK_LANE(2)
+  CHECK_LANE(3)
+
+#undef CHECK_LANE
+#define CHECK_LANE(N) \
+  ret = vfmah_laneq_f16 (src1[N], src2[N], VECT_VAR (vsrc3, float, 16, 8), N);\
+  if (*(uint16_t *) &ret != expected_laneq[N]) \
+	  abort ();
+
+  DECL_VARIABLE(vsrc3, float, 16, 8);
+  VLOAD (vsrc3, src3, q, float, f, 16, 8);
+  CHECK_LANE(0)
+  CHECK_LANE(1)
+  CHECK_LANE(2)
+  CHECK_LANE(3)
+  CHECK_LANE(4)
+  CHECK_LANE(5)
+  CHECK_LANE(6)
+  CHECK_LANE(7)
+
+#undef CHECK_LANE
+#define CHECK_LANE(N) \
+  ret = vfmsh_lane_f16 (src1[N], src2[N], VECT_VAR (vsrc3, float, 16, 4), N);\
+  if (*(uint16_t *) &ret != expected_fms[N])\
+    abort ();
+
+  CHECK_LANE(0)
+  CHECK_LANE(1)
+  CHECK_LANE(2)
+  CHECK_LANE(3)
+
+#undef CHECK_LANE
+#define CHECK_LANE(N) \
+  ret = vfmsh_laneq_f16 (src1[N], src2[N], VECT_VAR (vsrc3, float, 16, 8), N);\
+  if (*(uint16_t *) &ret != expected_fms_laneq[N]) \
+	  abort ();
+
+  CHECK_LANE(0)
+  CHECK_LANE(1)
+  CHECK_LANE(2)
+  CHECK_LANE(3)
+  CHECK_LANE(4)
+  CHECK_LANE(5)
+  CHECK_LANE(6)
+  CHECK_LANE(7)
+}
+
+int
+main (void)
+{
+  exec_vfmash_lane_f16 ();
+  return 0;
+}
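
The expected[] bit patterns above can be reproduced with a small host-side
sketch (lane 0 shown; this assumes _Float16 support and that rounding the
float fma result once to half matches the half-precision FMA for these
inputs):

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int
main (void)
{
  _Float16 a0 = (_Float16) 123.4f;	/* A0 */
  _Float16 a1 = (_Float16) 12.4f;	/* A1 */
  _Float16 a2 = (_Float16) 23.4f;	/* A2 */
  _Float16 fma = (_Float16) fmaf ((float) a1, (float) a2, (float) a0);
  _Float16 fms = (_Float16) fmaf (-(float) a1, (float) a2, (float) a0);
  uint16_t bits;
  memcpy (&bits, &fma, sizeof bits);
  printf ("vfmah lane 0: 0x%04X\n", bits);	/* 0x5E76 expected.  */
  memcpy (&bits, &fms, sizeof bits);
  printf ("vfmsh lane 0: 0x%04X\n", bits);	/* 0xD937 expected.  */
  return 0;
}
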
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxh_f16_1.c
new file mode 100644
index 0000000..182463e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxh_f16_1.c
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+#define A 123.4
+#define B -567.8
+#define C -34.8
+#define D 1024
+#define E 663.1
+#define F 169.1
+#define G -4.8
+#define H 77
+
+float16_t input_1[] = { A, B, C, D };
+float16_t input_2[] = { E, F, G, H };
+float16_t expected[] = { E, F, G, D };
+
+#define TEST_MSG "VMAXH_F16"
+#define INSN_NAME vmaxh_f16
+
+#define INPUT_1 input_1
+#define INPUT_2 input_2
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for binary scalar operations.  */
+#include "binary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminh_f16_1.c
new file mode 100644
index 0000000..d8efbca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminh_f16_1.c
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+#define A 123.4
+#define B -567.8
+#define C -34.8
+#define D 1024
+#define E 663.1
+#define F 169.1
+#define G -4.8
+#define H 77
+
+float16_t input_1[] = { A, B, C, D };
+float16_t input_2[] = { E, F, G, H };
+float16_t expected[] = { A, B, C, H };
+
+#define TEST_MSG "VMINH_F16"
+#define INSN_NAME vminh_f16
+
+#define INPUT_1 input_1
+#define INPUT_2 input_2
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for binary scalar operations.  */
+#include "binary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulh_lane_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulh_lane_f16_1.c
new file mode 100644
index 0000000..4cd5c37
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulh_lane_f16_1.c
@@ -0,0 +1,90 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon } */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (13.4)
+#define B FP16_C (-56.8)
+#define C FP16_C (-34.8)
+#define D FP16_C (12)
+#define E FP16_C (63.1)
+#define F FP16_C (19.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (77)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (11.23)
+#define L FP16_C (98)
+#define M FP16_C (87.1)
+#define N FP16_C (-8)
+#define O FP16_C (-1.1)
+#define P FP16_C (-9.7)
+
+extern void abort ();
+
+float16_t src1[8] = { A, B, C, D, I, J, K, L };
+VECT_VAR_DECL (src2, float, 16, 4) [] = { E, F, G, H };
+VECT_VAR_DECL (src2, float, 16, 8) [] = { E, F, G, H, M, N, O, P };
+
+/* Expected results for vmulh_lane.  */
+uint16_t expected[4] = { 0x629B /* A * E.  */, 0xE43D /* B * F.  */,
+			 0x5939 /* C * G.  */, 0x6338 /* D * H.  */ };
+
+
+/* Expected results for vmulh_laneq.  */
+uint16_t expected_laneq[8] = { 0x629B /* A * E.  */,
+			       0xE43D /* B * F.  */,
+			       0x5939 /* C * G.  */,
+			       0x6338 /* D * H.  */,
+			       0x53A0 /* I * M.  */,
+			       0x60E0 /* J * N.  */,
+			       0xCA2C /* K * O.  */,
+			       0xE36E /* L * P.  */ };
+
+void exec_vmulh_lane_f16 (void)
+{
+#define CHECK_LANE(N)\
+  ret = vmulh_lane_f16 (src1[N], VECT_VAR (vsrc2, float, 16, 4), N);\
+  if (*(uint16_t *) &ret != expected[N])\
+    abort ();
+
+  DECL_VARIABLE(vsrc2, float, 16, 4);
+  VLOAD (vsrc2, src2, , float, f, 16, 4);
+  float16_t ret;
+
+  CHECK_LANE(0)
+  CHECK_LANE(1)
+  CHECK_LANE(2)
+  CHECK_LANE(3)
+
+#undef CHECK_LANE
+#define CHECK_LANE(N)\
+  ret = vmulh_laneq_f16 (src1[N], VECT_VAR (vsrc2, float, 16, 8), N);\
+  if (*(uint16_t *) &ret != expected_laneq[N])\
+    abort ();
+
+  DECL_VARIABLE(vsrc2, float, 16, 8);
+  VLOAD (vsrc2, src2, q, float, f, 16, 8);
+
+  CHECK_LANE(0)
+  CHECK_LANE(1)
+  CHECK_LANE(2)
+  CHECK_LANE(3)
+  CHECK_LANE(4)
+  CHECK_LANE(5)
+  CHECK_LANE(6)
+  CHECK_LANE(7)
+}
+
+int
+main (void)
+{
+  exec_vmulh_lane_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulxh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulxh_f16_1.c
new file mode 100644
index 0000000..66c744c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulxh_f16_1.c
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+#define A 13.4
+#define B __builtin_inff ()
+#define C -34.8
+#define D -__builtin_inff ()
+#define E 63.1
+#define F 0.0
+#define G -4.8
+#define H 0.0
+
+#define I 0.7
+#define J -__builtin_inff ()
+#define K 11.23
+#define L 98
+#define M 87.1
+#define N -0.0
+#define O -1.1
+#define P 7
+
+float16_t input_1[] = { A, B, C, D, I, J, K, L };
+float16_t input_2[] = { E, F, G, H, M, N, O, P };
+uint16_t expected[] = { 0x629B /* A * E.  */,
+			0x4000 /* FP16_C (2.0f).  */,
+			0x5939 /* C * G.  */,
+			0xC000 /* FP16_C (-2.0f).  */,
+			0x53A0 /* I * M.  */,
+			0x4000 /* FP16_C (2.0f).  */,
+			0xCA2C /* K * O.  */,
+			0x615C /* L * P.  */ };
+
+#define TEST_MSG "VMULXH_F16"
+#define INSN_NAME vmulxh_f16
+
+#define INPUT_1 input_1
+#define INPUT_2 input_2
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for binary scalar operations.  */
+#include "binary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulxh_lane_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulxh_lane_f16_1.c
new file mode 100644
index 0000000..90a5be8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulxh_lane_f16_1.c
@@ -0,0 +1,91 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon } */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (13.4)
+#define B FP16_C (__builtin_inff ())
+#define C FP16_C (-34.8)
+#define D FP16_C (-__builtin_inff ())
+#define E FP16_C (63.1)
+#define F FP16_C (0.0)
+#define G FP16_C (-4.8)
+#define H FP16_C (0.0)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-__builtin_inff ())
+#define K FP16_C (11.23)
+#define L FP16_C (98)
+#define M FP16_C (87.1)
+#define N FP16_C (-0.0)
+#define O FP16_C (-1.1)
+#define P FP16_C (7)
+
+extern void abort ();
+
+float16_t src1[8] = { A, B, C, D, I, J, K, L };
+VECT_VAR_DECL (src2, float, 16, 4) [] = { E, F, G, H };
+VECT_VAR_DECL (src2, float, 16, 8) [] = { E, F, G, H, M, N, O, P };
+
+/* Expected results for vmulxh_lane.  */
+uint16_t expected[4] = { 0x629B /* A * E.  */,
+			 0x4000 /* FP16_C (2.0f).  */,
+			 0x5939 /* C * G.  */,
+			 0xC000 /* FP16_C (-2.0f).  */ };
+
+/* Expected results for vmulxh_laneq.  */
+uint16_t expected_laneq[8] = { 0x629B /* A * E.  */,
+			       0x4000 /* FP16_C (2.0f).  */,
+			       0x5939 /* C * G.  */,
+			       0xC000 /* FP16_C (-2.0f).  */,
+			       0x53A0 /* I * M.  */,
+			       0x4000 /* FP16_C (2.0f).  */,
+			       0xCA2C /* K * O.  */,
+			       0x615C /* L * P.  */ };
+
+void exec_vmulxh_lane_f16 (void)
+{
+#define CHECK_LANE(N)\
+  ret = vmulxh_lane_f16 (src1[N], VECT_VAR (vsrc2, float, 16, 4), N);\
+  if (*(uint16_t *) &ret != expected[N])\
+    abort ();
+
+  DECL_VARIABLE(vsrc2, float, 16, 4);
+  VLOAD (vsrc2, src2, , float, f, 16, 4);
+  float16_t ret;
+
+  CHECK_LANE(0)
+  CHECK_LANE(1)
+  CHECK_LANE(2)
+  CHECK_LANE(3)
+
+#undef CHECK_LANE
+#define CHECK_LANE(N)\
+  ret = vmulxh_laneq_f16 (src1[N], VECT_VAR (vsrc2, float, 16, 8), N);\
+  if (*(uint16_t *) &ret != expected_laneq[N])\
+    abort ();
+
+  DECL_VARIABLE(vsrc2, float, 16, 8);
+  VLOAD (vsrc2, src2, q, float, f, 16, 8);
+
+  CHECK_LANE(0)
+  CHECK_LANE(1)
+  CHECK_LANE(2)
+  CHECK_LANE(3)
+  CHECK_LANE(4)
+  CHECK_LANE(5)
+  CHECK_LANE(6)
+  CHECK_LANE(7)
+}
+
+int
+main (void)
+{
+  exec_vmulxh_lane_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecpeh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecpeh_f16_1.c
new file mode 100644
index 0000000..3740d6a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecpeh_f16_1.c
@@ -0,0 +1,42 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+#define A 123.4
+#define B 567.8
+#define C 34.8
+#define D 1024
+#define E 663.1
+#define F 144.0
+#define G 4.8
+#define H 77
+
+#define RECP_A 0x2028 /* 1/A.  */
+#define RECP_B 0x1734 /* 1/B.  */
+#define RECP_C 0x275C /* 1/C.  */
+#define RECP_D 0x13FC /* 1/D.  */
+#define RECP_E 0x162C /* 1/E.  */
+#define RECP_F 0x1F18 /* 1/F.  */
+#define RECP_G 0x32A8 /* 1/G.  */
+#define RECP_H 0x22A4 /* 1/H.  */
+
+float16_t input[] = { A, B, C, D, E, F, G, H };
+uint16_t expected[] = { RECP_A, RECP_B, RECP_C, RECP_D,
+		        RECP_E, RECP_F, RECP_G, RECP_H };
+
+#define TEST_MSG "VRECPEH_F16"
+#define INSN_NAME vrecpeh_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecpsh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecpsh_f16_1.c
new file mode 100644
index 0000000..3e6b24e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecpsh_f16_1.c
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+#define A 12.4
+#define B -5.8
+#define C -3.8
+#define D 10
+#define E 66.1
+#define F 16.1
+#define G -4.8
+#define H -77
+
+#define I 0.7
+#define J -78
+#define K 10.23
+#define L 98
+#define M 87
+#define N -87.81
+#define O -1.1
+#define P 47.8
+
+float16_t input_1[] = { A, B, C, D, I, J, K, L };
+float16_t input_2[] = { E, F, G, H, M, N, O, P };
+uint16_t expected[] = { 0xE264 /* 2.0f - A * E.  */,
+			0x55F6 /* 2.0f - B * F.  */,
+			0xCC10 /* 2.0f - C * G.  */,
+			0x6208 /* 2.0f - D * H.  */,
+			0xD35D /* 2.0f - I * M.  */,
+			0xEEB0 /* 2.0f - J * N.  */,
+			0x4A9F /* 2.0f - K * O.  */,
+			0xEC93 /* 2.0f - L * P.  */ };
+
+#define TEST_MSG "VRECPSH_F16"
+#define INSN_NAME vrecpsh_f16
+
+#define INPUT_1 input_1
+#define INPUT_2 input_2
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for binary scalar operations.  */
+#include "binary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecpxh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecpxh_f16_1.c
new file mode 100644
index 0000000..fc02b6b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecpxh_f16_1.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+
+float16_t input[] = { 123.4, 567.8, 34.8, 1024, 663.1, 144.0, 4.8, 77 };
+/* Expected results are calculated by:
+  for (index = 0; index < 8; index++)
+    {
+      uint16_t src_cast = * (uint16_t *) &input[index];
+      * (uint16_t *) &expected[index] =
+	(src_cast & 0x8000) | (~src_cast & 0x7C00);
+    }  */
+uint16_t expected[8] = { 0x2800, 0x1C00, 0x2C00, 0x1800,
+			 0x1C00, 0x2400, 0x3800, 0x2800 };
+
+#define TEST_MSG "VRECPXH_F16"
+#define INSN_NAME vrecpxh_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrsqrteh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrsqrteh_f16_1.c
new file mode 100644
index 0000000..7c0e619
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrsqrteh_f16_1.c
@@ -0,0 +1,30 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] = { 123.4, 67.8, 34.8, 24.0, 66.1, 144.0, 4.8, 77.0 };
+uint16_t expected[] = { 0x2DC4 /* FP16_C (1/__builtin_sqrtf (123.4)).  */,
+			0x2FC8 /* FP16_C (1/__builtin_sqrtf (67.8)).  */,
+			0x316C /* FP16_C (1/__builtin_sqrtf (34.8)).  */,
+			0x3288 /* FP16_C (1/__builtin_sqrtf (24.0)).  */,
+			0x2FDC /* FP16_C (1/__builtin_sqrtf (66.1)).  */,
+			0x2D54 /* FP16_C (1/__builtin_sqrtf (144.0)).  */,
+			0x3750 /* FP16_C (1/__builtin_sqrtf (4.8)).  */,
+			0x2F48 /* FP16_C (1/__builtin_sqrtf (77.0)).  */ };
+
+#define TEST_MSG "VRSQRTEH_F16"
+#define INSN_NAME vrsqrteh_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrsqrtsh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrsqrtsh_f16_1.c
new file mode 100644
index 0000000..a9753a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrsqrtsh_f16_1.c
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+#define A 12.4
+#define B -5.8
+#define C -3.8
+#define D 10
+#define E 66.1
+#define F 16.1
+#define G -4.8
+#define H -77
+
+#define I 0.7
+#define J -78
+#define K 10.23
+#define L 98
+#define M 87
+#define N -87.81
+#define O -1.1
+#define P 47.8
+
+float16_t input_1[] = { A, B, C, D, I, J, K, L };
+float16_t input_2[] = { E, F, G, H, M, N, O, P };
+uint16_t expected[] = { 0xDE62 /* (3.0f + (-A) * E) / 2.0f.  */,
+			0x5206 /* (3.0f + (-B) * F) / 2.0f.  */,
+			0xC7A0 /* (3.0f + (-C) * G) / 2.0f.  */,
+			0x5E0A /* (3.0f + (-D) * H) / 2.0f.  */,
+			0xCF3D /* (3.0f + (-I) * M) / 2.0f.  */,
+			0xEAB0 /* (3.0f + (-J) * N) / 2.0f.  */,
+			0x471F /* (3.0f + (-K) * O) / 2.0f.  */,
+			0xE893 /* (3.0f + (-L) * P) / 2.0f.  */ };
+
+#define TEST_MSG "VRSQRTSH_F16"
+#define INSN_NAME vrsqrtsh_f16
+
+#define INPUT_1 input_1
+#define INPUT_2 input_2
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for binary scalar operations.  */
+#include "binary_scalar_op.inc"
-- 
2.5.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [AArch64][12/14] ARMv8.2-A testsuite for new data movement intrinsics
       [not found]                     ` <135287e5-6fc1-4957-d320-16f38260fa28@foss.arm.com>
       [not found]                       ` <cdb3640f-134a-f2be-c728-b1467fb7aaf9@foss.arm.com>
@ 2016-07-07 16:19                       ` Jiong Wang
  2016-10-10  9:55                         ` James Greenhalgh
  1 sibling, 1 reply; 32+ messages in thread
From: Jiong Wang @ 2016-07-07 16:19 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 640 bytes --]

This patch contains testcases for the new data movement intrinsics; some of
them are only available on AArch64.
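
For reference, a minimal (hypothetical) use of one of the scalar
lane-extraction intrinsics exercised by the new vduph_lane.c test; it needs
an AArch64 target with FP16 support:

#include <arm_neon.h>

/* Duplicate lane 2 of a 64-bit FP16 vector into a scalar.  */
float16_t
lane2 (float16x4_t v)
{
  return vduph_lane_f16 (v, 2);
}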

gcc/testsuite/
2016-07-07  Jiong Wang <jiong.wang@arm.com>

         * gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
         (FP16_SUPPORTED): Enable AArch64.
         * gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c: Add support
         for vdup*_laneq.
         * gcc.target/aarch64/advsimd-intrinsics/vduph_lane.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vtrn_half.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vuzp_half.c: New.
         * gcc.target/aarch64/advsimd-intrinsics/vzip_half.c: New.


[-- Attachment #2: 0012-12-14-TESTSUITE-for-new-data-movement-intrinsics.patch --]
[-- Type: text/x-patch, Size: 41926 bytes --]

From 7bf705fa1bacf7a0275b28e6bfa33397f7037415 Mon Sep 17 00:00:00 2001
From: Jiong Wang <jiong.wang@arm.com>
Date: Wed, 6 Jul 2016 14:51:35 +0100
Subject: [PATCH 12/14] [12/14] TESTSUITE for new data movement intrinsics

---
 .../aarch64/advsimd-intrinsics/arm-neon-ref.h      |  16 +-
 .../aarch64/advsimd-intrinsics/vdup_lane.c         | 119 +++++++++-
 .../aarch64/advsimd-intrinsics/vduph_lane.c        | 137 +++++++++++
 .../aarch64/advsimd-intrinsics/vtrn_half.c         | 263 +++++++++++++++++++++
 .../aarch64/advsimd-intrinsics/vuzp_half.c         | 259 ++++++++++++++++++++
 .../aarch64/advsimd-intrinsics/vzip_half.c         | 263 +++++++++++++++++++++
 6 files changed, 1042 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vduph_lane.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtrn_half.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vuzp_half.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vzip_half.c

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
index 1297137..4621415 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
@@ -17,9 +17,8 @@ extern void *memcpy(void *, const void *, size_t);
 extern size_t strlen(const char *);
 
 /* Helper macro to select FP16 tests.  */
-#if (!defined (__aarch64__)						\
-     && (defined (__ARM_FP16_FORMAT_IEEE)				\
-	 || defined (__ARM_FP16_FORMAT_ALTERNATIVE)))
+#if (defined (__ARM_FP16_FORMAT_IEEE) \
+     || defined (__ARM_FP16_FORMAT_ALTERNATIVE))
 #define FP16_SUPPORTED (1)
 #else
 #undef FP16_SUPPORTED
@@ -520,17 +519,6 @@ static void clean_results (void)
 /* Helpers to initialize vectors.  */
 #define VDUP(VAR, Q, T1, T2, W, N, V)			\
   VECT_VAR(VAR, T1, W, N) = vdup##Q##_n_##T2##W(V)
-#if (defined (__aarch64__)						\
-     && (defined (__ARM_FP16_FORMAT_IEEE)				\
-	 || defined (__ARM_FP16_FORMAT_ALTERNATIVE)))
-/* Work around that there is no vdup_n_f16 intrinsic.  */
-#define vdup_n_f16(VAL)		\
-  __extension__			\
-    ({				\
-      float16_t f = VAL;	\
-      vld1_dup_f16(&f);		\
-    })
-#endif
 
 #define VSET_LANE(VAR, Q, T1, T2, W, N, L, V)				\
   VECT_VAR(VAR, T1, W, N) = vset##Q##_lane_##T2##W(V,			\
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c
index c4b8f14..5d0dba3 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c
@@ -56,7 +56,7 @@ VECT_VAR_DECL (expected, hfloat, 16, 8) [] = { 0xca80, 0xca80,
 VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc1700000, 0xc1700000,
 					   0xc1700000, 0xc1700000 };
 
-#define TEST_MSG "VDUP_LANE/VDUP_LANEQ"
+#define TEST_MSG "VDUP_LANE/VDUPQ_LANE"
 void exec_vdup_lane (void)
 {
   /* Basic test: vec1=vdup_lane(vec2, lane), then store the result.  */
@@ -114,6 +114,123 @@ void exec_vdup_lane (void)
 #else
   CHECK_RESULTS_NO_FP16 (TEST_MSG, "");
 #endif
+
+#if defined (__aarch64__)
+
+#undef TEST_MSG
+#define TEST_MSG "VDUP_LANEQ/VDUPQ_LANEQ"
+
+  /* Expected results for vdup*_laneq tests.  */
+VECT_VAR_DECL(expected2,int,8,8) [] = { 0xfd, 0xfd, 0xfd, 0xfd,
+					0xfd, 0xfd, 0xfd, 0xfd };
+VECT_VAR_DECL(expected2,int,16,4) [] = { 0xfff2, 0xfff2, 0xfff2, 0xfff2 };
+VECT_VAR_DECL(expected2,int,32,2) [] = { 0xfffffff1, 0xfffffff1 };
+VECT_VAR_DECL(expected2,int,64,1) [] = { 0xfffffffffffffff0 };
+VECT_VAR_DECL(expected2,uint,8,8) [] = { 0xff, 0xff, 0xff, 0xff,
+					 0xff, 0xff, 0xff, 0xff };
+VECT_VAR_DECL(expected2,uint,16,4) [] = { 0xfff3, 0xfff3, 0xfff3, 0xfff3 };
+VECT_VAR_DECL(expected2,uint,32,2) [] = { 0xfffffff1, 0xfffffff1 };
+VECT_VAR_DECL(expected2,uint,64,1) [] = { 0xfffffffffffffff0 };
+VECT_VAR_DECL(expected2,poly,8,8) [] = { 0xf7, 0xf7, 0xf7, 0xf7,
+					 0xf7, 0xf7, 0xf7, 0xf7 };
+VECT_VAR_DECL(expected2,poly,16,4) [] = { 0xfff3, 0xfff3, 0xfff3, 0xfff3 };
+VECT_VAR_DECL(expected2,hfloat,32,2) [] = { 0xc1700000, 0xc1700000 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected2, hfloat, 16, 4) [] = { 0xca80, 0xca80,
+						0xca80, 0xca80 };
+#endif
+VECT_VAR_DECL(expected2,int,8,16) [] = { 0xfb, 0xfb, 0xfb, 0xfb,
+					 0xfb, 0xfb, 0xfb, 0xfb,
+					 0xfb, 0xfb, 0xfb, 0xfb,
+					 0xfb, 0xfb, 0xfb, 0xfb };
+VECT_VAR_DECL(expected2,int,16,8) [] = { 0xfff7, 0xfff7, 0xfff7, 0xfff7,
+					 0xfff7, 0xfff7, 0xfff7, 0xfff7 };
+VECT_VAR_DECL(expected2,int,32,4) [] = { 0xfffffff1, 0xfffffff1,
+					 0xfffffff1, 0xfffffff1 };
+VECT_VAR_DECL(expected2,int,64,2) [] = { 0xfffffffffffffff0,
+					 0xfffffffffffffff0 };
+VECT_VAR_DECL(expected2,uint,8,16) [] = { 0xf5, 0xf5, 0xf5, 0xf5,
+					  0xf5, 0xf5, 0xf5, 0xf5,
+					  0xf5, 0xf5, 0xf5, 0xf5,
+					  0xf5, 0xf5, 0xf5, 0xf5 };
+VECT_VAR_DECL(expected2,uint,16,8) [] = { 0xfff1, 0xfff1, 0xfff1, 0xfff1,
+					  0xfff1, 0xfff1, 0xfff1, 0xfff1 };
+VECT_VAR_DECL(expected2,uint,32,4) [] = { 0xfffffff0, 0xfffffff0,
+					  0xfffffff0, 0xfffffff0 };
+VECT_VAR_DECL(expected2,uint,64,2) [] = { 0xfffffffffffffff0,
+					  0xfffffffffffffff0 };
+VECT_VAR_DECL(expected2,poly,8,16) [] = { 0xf5, 0xf5, 0xf5, 0xf5,
+					  0xf5, 0xf5, 0xf5, 0xf5,
+					  0xf5, 0xf5, 0xf5, 0xf5,
+					  0xf5, 0xf5, 0xf5, 0xf5 };
+VECT_VAR_DECL(expected2,poly,16,8) [] = { 0xfff1, 0xfff1, 0xfff1, 0xfff1,
+					  0xfff1, 0xfff1, 0xfff1, 0xfff1 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected2, hfloat, 16, 8) [] = { 0xc880, 0xc880,
+						0xc880, 0xc880,
+						0xc880, 0xc880,
+						0xc880, 0xc880 };
+#endif
+VECT_VAR_DECL(expected2,hfloat,32,4) [] = { 0xc1700000, 0xc1700000,
+					    0xc1700000, 0xc1700000 };
+
+  /* Clean all results for vdup*_laneq tests.  */
+  clean_results ();
+  /* Basic test: vec1=vdup_lane(vec2, lane), then store the result.  */
+#define TEST_VDUP_LANEQ(Q, T1, T2, W, N, N2, L)				\
+  VECT_VAR(vector_res, T1, W, N) =					\
+    vdup##Q##_laneq_##T2##W(VECT_VAR(vector, T1, W, N2), L);		\
+  vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N))
+
+  /* Input vector can only have 64 bits.  */
+  DECL_VARIABLE_128BITS_VARIANTS(vector);
+
+  clean_results ();
+
+  TEST_MACRO_128BITS_VARIANTS_2_5(VLOAD, vector, buffer);
+#if defined (FP16_SUPPORTED)
+  VLOAD(vector, buffer, q, float, f, 16, 8);
+#endif
+  VLOAD(vector, buffer, q, float, f, 32, 4);
+
+  /* Choose lane arbitrarily.  */
+  TEST_VDUP_LANEQ(, int, s, 8, 8, 16, 13);
+  TEST_VDUP_LANEQ(, int, s, 16, 4, 8, 2);
+  TEST_VDUP_LANEQ(, int, s, 32, 2, 4, 1);
+  TEST_VDUP_LANEQ(, int, s, 64, 1, 2, 0);
+  TEST_VDUP_LANEQ(, uint, u, 8, 8, 16, 15);
+  TEST_VDUP_LANEQ(, uint, u, 16, 4, 8, 3);
+  TEST_VDUP_LANEQ(, uint, u, 32, 2, 4, 1);
+  TEST_VDUP_LANEQ(, uint, u, 64, 1, 2, 0);
+  TEST_VDUP_LANEQ(, poly, p, 8, 8, 16, 7);
+  TEST_VDUP_LANEQ(, poly, p, 16, 4, 8, 3);
+#if defined (FP16_SUPPORTED)
+  TEST_VDUP_LANEQ(, float, f, 16, 4, 8, 3);
+#endif
+  TEST_VDUP_LANEQ(, float, f, 32, 2, 4, 1);
+
+  TEST_VDUP_LANEQ(q, int, s, 8, 16, 16, 11);
+  TEST_VDUP_LANEQ(q, int, s, 16, 8, 8, 7);
+  TEST_VDUP_LANEQ(q, int, s, 32, 4, 4, 1);
+  TEST_VDUP_LANEQ(q, int, s, 64, 2, 2, 0);
+  TEST_VDUP_LANEQ(q, uint, u, 8, 16, 16, 5);
+  TEST_VDUP_LANEQ(q, uint, u, 16, 8, 8, 1);
+  TEST_VDUP_LANEQ(q, uint, u, 32, 4, 4, 0);
+  TEST_VDUP_LANEQ(q, uint, u, 64, 2, 2, 0);
+  TEST_VDUP_LANEQ(q, poly, p, 8, 16, 16, 5);
+  TEST_VDUP_LANEQ(q, poly, p, 16, 8, 8, 1);
+#if defined (FP16_SUPPORTED)
+  TEST_VDUP_LANEQ(q, float, f, 16, 8, 8, 7);
+#endif
+  TEST_VDUP_LANEQ(q, float, f, 32, 4, 4, 1);
+
+  CHECK_RESULTS_NAMED (TEST_MSG, expected2, "");
+#if defined (FP16_SUPPORTED)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected2, "");
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected2, "");
+#endif
+
+#endif /* __aarch64__.  */
 }
 
 int main (void)
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vduph_lane.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vduph_lane.c
new file mode 100644
index 0000000..c9d553a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vduph_lane.c
@@ -0,0 +1,137 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define A -16
+#define B -15
+#define C -14
+#define D -13
+#define E -12
+#define F -11
+#define G -10
+#define H -9
+
+#define F16_C(a) ((__fp16) a)
+#define AF F16_C (A)
+#define BF F16_C (B)
+#define CF F16_C (C)
+#define DF F16_C (D)
+#define EF F16_C (E)
+#define FF F16_C (F)
+#define GF F16_C (G)
+#define HF F16_C (H)
+
+#define S16_C(a) ((int16_t) a)
+#define AS S16_C (A)
+#define BS S16_C (B)
+#define CS S16_C (C)
+#define DS S16_C (D)
+#define ES S16_C (E)
+#define FS S16_C (F)
+#define GS S16_C (G)
+#define HS S16_C (H)
+
+#define U16_C(a) ((uint16_t) a)
+#define AU U16_C (A)
+#define BU U16_C (B)
+#define CU U16_C (C)
+#define DU U16_C (D)
+#define EU U16_C (E)
+#define FU U16_C (F)
+#define GU U16_C (G)
+#define HU U16_C (H)
+
+#define P16_C(a) ((poly16_t) a)
+#define AP P16_C (A)
+#define BP P16_C (B)
+#define CP P16_C (C)
+#define DP P16_C (D)
+#define EP P16_C (E)
+#define FP P16_C (F)
+#define GP P16_C (G)
+#define HP P16_C (H)
+
+/* Expected results for vduph_lane.  */
+float16_t expected_f16 = AF;
+int16_t expected_s16 = DS;
+uint16_t expected_u16 = BU;
+poly16_t expected_p16 = CP;
+
+/* Expected results for vduph_laneq.  */
+float16_t expected_q_f16 = EF;
+int16_t expected_q_s16 = BS;
+uint16_t expected_q_u16 = GU;
+poly16_t expected_q_p16 = FP;
+
+void exec_vduph_lane_f16 (void)
+{
+  /* vduph_lane.  */
+  DECL_VARIABLE(vsrc, float, 16, 4);
+  DECL_VARIABLE(vsrc, int, 16, 4);
+  DECL_VARIABLE(vsrc, uint, 16, 4);
+  DECL_VARIABLE(vsrc, poly, 16, 4);
+  VECT_VAR_DECL (buf_src, float, 16, 4) [] = {AF, BF, CF, DF};
+  VECT_VAR_DECL (buf_src, int, 16, 4) [] = {AS, BS, CS, DS};
+  VECT_VAR_DECL (buf_src, uint, 16, 4) [] = {AU, BU, CU, DU};
+  VECT_VAR_DECL (buf_src, poly, 16, 4) [] = {AP, BP, CP, DP};
+  VLOAD (vsrc, buf_src, , int, s, 16, 4);
+  VLOAD (vsrc, buf_src, , float, f, 16, 4);
+  VLOAD (vsrc, buf_src, , uint, u, 16, 4);
+  VLOAD (vsrc, buf_src, , poly, p, 16, 4);
+
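+  /* Compare bit patterns rather than floating-point values, so the
+     checks below are exact for the float16_t result as well.  */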
+  float16_t res_f = vduph_lane_f16 (VECT_VAR (vsrc, float, 16, 4), 0);
+  if (* (unsigned short *) &res_f != * (unsigned short *) &expected_f16)
+    abort ();
+
+  int16_t res_s = vduph_lane_s16 (VECT_VAR (vsrc, int, 16, 4), 3);
+  if (* (unsigned short *) &res_s != * (unsigned short *) &expected_s16)
+    abort ();
+
+  uint16_t res_u = vduph_lane_u16 (VECT_VAR (vsrc, uint, 16, 4), 1);
+  if (* (unsigned short *) &res_u != * (unsigned short *) &expected_u16)
+    abort ();
+
+  poly16_t res_p = vduph_lane_p16 (VECT_VAR (vsrc, poly, 16, 4), 2);
+  if (* (unsigned short *) &res_p != * (unsigned short *) &expected_p16)
+    abort ();
+
+  /* vduph_laneq.  */
+  DECL_VARIABLE(vsrc, float, 16, 8);
+  DECL_VARIABLE(vsrc, int, 16, 8);
+  DECL_VARIABLE(vsrc, uint, 16, 8);
+  DECL_VARIABLE(vsrc, poly, 16, 8);
+  VECT_VAR_DECL (buf_src, float, 16, 8) [] = {AF, BF, CF, DF, EF, FF, GF, HF};
+  VECT_VAR_DECL (buf_src, int, 16, 8) [] = {AS, BS, CS, DS, ES, FS, GS, HS};
+  VECT_VAR_DECL (buf_src, uint, 16, 8) [] = {AU, BU, CU, DU, EU, FU, GU, HU};
+  VECT_VAR_DECL (buf_src, poly, 16, 8) [] = {AP, BP, CP, DP, EP, FP, GP, HP};
+  VLOAD (vsrc, buf_src, q, int, s, 16, 8);
+  VLOAD (vsrc, buf_src, q, float, f, 16, 8);
+  VLOAD (vsrc, buf_src, q, uint, u, 16, 8);
+  VLOAD (vsrc, buf_src, q, poly, p, 16, 8);
+
+  res_f = vduph_laneq_f16 (VECT_VAR (vsrc, float, 16, 8), 4);
+  if (* (unsigned short *) &res_f != * (unsigned short *) &expected_q_f16)
+    abort ();
+
+  res_s = vduph_laneq_s16 (VECT_VAR (vsrc, int, 16, 8), 1);
+  if (* (unsigned short *) &res_s != * (unsigned short *) &expected_q_s16)
+    abort ();
+
+  res_u = vduph_laneq_u16 (VECT_VAR (vsrc, uint, 16, 8), 6);
+  if (* (unsigned short *) &res_u != * (unsigned short *) &expected_q_u16)
+    abort ();
+
+  res_p = vduph_laneq_p16 (VECT_VAR (vsrc, poly, 16, 8), 5);
+  if (* (unsigned short *) &res_p != * (unsigned short *) &expected_q_p16)
+    abort ();
+}
+
+int
+main (void)
+{
+  exec_vduph_lane_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtrn_half.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtrn_half.c
new file mode 100644
index 0000000..63f820f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtrn_half.c
@@ -0,0 +1,263 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,8,8) [] = { 0xf0, 0x11, 0xf2, 0x11,
+				       0xf4, 0x11, 0xf6, 0x11 };
+VECT_VAR_DECL(expected,int,16,4) [] = { 0xfff0, 0x22, 0xfff2, 0x22 };
+VECT_VAR_DECL(expected,int,32,2) [] = { 0xfffffff0, 0x33 };
+VECT_VAR_DECL(expected,int,64,1) [] = { 0xfffffffffffffff0 };
+VECT_VAR_DECL(expected,uint,8,8) [] = { 0xf0, 0x55, 0xf2, 0x55,
+					0xf4, 0x55, 0xf6, 0x55 };
+VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfff0, 0x66, 0xfff2, 0x66 };
+VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfffffff0, 0x77 };
+VECT_VAR_DECL(expected,uint,64,1) [] = { 0xfffffffffffffff0 };
+VECT_VAR_DECL(expected,poly,8,8) [] = { 0xf0, 0x55, 0xf2, 0x55,
+					0xf4, 0x55, 0xf6, 0x55 };
+VECT_VAR_DECL(expected,poly,16,4) [] = { 0xfff0, 0x66, 0xfff2, 0x66 };
+VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1800000, 0x42066666 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected, hfloat, 16, 4) [] = { 0xcc00, 0x4b4d,
+					       0xcb00, 0x4b4d };
+#endif
+VECT_VAR_DECL(expected,int,8,16) [] = { 0xf0, 0x11, 0xf2, 0x11,
+					0xf4, 0x11, 0xf6, 0x11,
+					0xf8, 0x11, 0xfa, 0x11,
+					0xfc, 0x11, 0xfe, 0x11 };
+VECT_VAR_DECL(expected,int,16,8) [] = { 0xfff0, 0x22, 0xfff2, 0x22,
+					0xfff4, 0x22, 0xfff6, 0x22 };
+VECT_VAR_DECL(expected,int,32,4) [] = { 0xfffffff0, 0x33,
+					0xfffffff2, 0x33 };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0xfffffffffffffff0,
+					0x44 };
+VECT_VAR_DECL(expected,uint,8,16) [] = { 0xf0, 0x55, 0xf2, 0x55,
+					 0xf4, 0x55, 0xf6, 0x55,
+					 0xf8, 0x55, 0xfa, 0x55,
+					 0xfc, 0x55, 0xfe, 0x55 };
+VECT_VAR_DECL(expected,uint,16,8) [] = { 0xfff0, 0x66, 0xfff2, 0x66,
+					 0xfff4, 0x66, 0xfff6, 0x66 };
+VECT_VAR_DECL(expected,uint,32,4) [] = { 0xfffffff0, 0x77,
+					 0xfffffff2, 0x77 };
+VECT_VAR_DECL(expected,uint,64,2) [] = { 0xfffffffffffffff0,
+					 0x88 };
+VECT_VAR_DECL(expected,poly,8,16) [] = { 0xf0, 0x55, 0xf2, 0x55,
+					 0xf4, 0x55, 0xf6, 0x55,
+					 0xf8, 0x55, 0xfa, 0x55,
+					 0xfc, 0x55, 0xfe, 0x55 };
+VECT_VAR_DECL(expected,poly,16,8) [] = { 0xfff0, 0x66, 0xfff2, 0x66,
+					 0xfff4, 0x66, 0xfff6, 0x66 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected, hfloat, 16, 8) [] = { 0xcc00, 0x4b4d,
+					       0xcb00, 0x4b4d,
+					       0xca00, 0x4b4d,
+					       0xc900, 0x4b4d };
+#endif
+VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc1800000, 0x42073333,
+					   0xc1600000, 0x42073333 };
+
+#define TEST_MSG "VTRN1"
+void exec_vtrn_half (void)
+{
+#define TEST_VTRN(PART, Q, T1, T2, W, N)		\
+  VECT_VAR(vector_res, T1, W, N) =			\
+    vtrn##PART##Q##_##T2##W(VECT_VAR(vector, T1, W, N),	\
+		       VECT_VAR(vector2, T1, W, N));	\
+  vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N))
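+
+  /* vtrn1 picks the even-indexed elements of each input, vtrn2 the
+     odd-indexed ones; e.g. vtrn1_s8 (a, b) gives
+     { a[0], b[0], a[2], b[2], a[4], b[4], a[6], b[6] }.  */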
+
+#define TEST_VTRN1(Q, T1, T2, W, N) TEST_VTRN(1, Q, T1, T2, W, N)
+
+  /* Declare all the standard variants; float64x2 is not covered by
+     DECL_VARIABLE_ALL_VARIANTS, so declare it separately.  */
+  DECL_VARIABLE_ALL_VARIANTS(vector);
+  DECL_VARIABLE_ALL_VARIANTS(vector2);
+  DECL_VARIABLE(vector, float, 64, 2);
+  DECL_VARIABLE(vector2, float, 64, 2);
+
+  DECL_VARIABLE_ALL_VARIANTS(vector_res);
+  DECL_VARIABLE(vector_res, float, 64, 2);
+
+  clean_results ();
+  /* We don't have vtrn1_T64x1, so set expected to the clean value.  */
+  CLEAN(expected, int, 64, 1);
+  CLEAN(expected, uint, 64, 1);
+
+  TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer);
+#if defined (FP16_SUPPORTED)
+  VLOAD(vector, buffer, , float, f, 16, 4);
+  VLOAD(vector, buffer, q, float, f, 16, 8);
+#endif
+  VLOAD(vector, buffer, , float, f, 32, 2);
+  VLOAD(vector, buffer, q, float, f, 32, 4);
+  VLOAD(vector, buffer, q, float, f, 64, 2);
+
+  /* Choose arbitrary initialization values.  */
+  VDUP(vector2, , int, s, 8, 8, 0x11);
+  VDUP(vector2, , int, s, 16, 4, 0x22);
+  VDUP(vector2, , int, s, 32, 2, 0x33);
+  VDUP(vector2, , uint, u, 8, 8, 0x55);
+  VDUP(vector2, , uint, u, 16, 4, 0x66);
+  VDUP(vector2, , uint, u, 32, 2, 0x77);
+  VDUP(vector2, , poly, p, 8, 8, 0x55);
+  VDUP(vector2, , poly, p, 16, 4, 0x66);
+#if defined (FP16_SUPPORTED)
+  VDUP (vector2, , float, f, 16, 4, 14.6f);   /* 14.6f is 0x4b4d.  */
+#endif
+  VDUP(vector2, , float, f, 32, 2, 33.6f);
+
+  VDUP(vector2, q, int, s, 8, 16, 0x11);
+  VDUP(vector2, q, int, s, 16, 8, 0x22);
+  VDUP(vector2, q, int, s, 32, 4, 0x33);
+  VDUP(vector2, q, int, s, 64, 2, 0x44);
+  VDUP(vector2, q, uint, u, 8, 16, 0x55);
+  VDUP(vector2, q, uint, u, 16, 8, 0x66);
+  VDUP(vector2, q, uint, u, 32, 4, 0x77);
+  VDUP(vector2, q, uint, u, 64, 2, 0x88);
+  VDUP(vector2, q, poly, p, 8, 16, 0x55);
+  VDUP(vector2, q, poly, p, 16, 8, 0x66);
+#if defined (FP16_SUPPORTED)
+  VDUP (vector2, q, float, f, 16, 8, 14.6f);
+#endif
+  VDUP(vector2, q, float, f, 32, 4, 33.8f);
+  VDUP(vector2, q, float, f, 64, 2, 33.8f);
+
+  TEST_VTRN1(, int, s, 8, 8);
+  TEST_VTRN1(, int, s, 16, 4);
+  TEST_VTRN1(, int, s, 32, 2);
+  TEST_VTRN1(, uint, u, 8, 8);
+  TEST_VTRN1(, uint, u, 16, 4);
+  TEST_VTRN1(, uint, u, 32, 2);
+  TEST_VTRN1(, poly, p, 8, 8);
+  TEST_VTRN1(, poly, p, 16, 4);
+#if defined (FP16_SUPPORTED)
+  TEST_VTRN1(, float, f, 16, 4);
+#endif
+  TEST_VTRN1(, float, f, 32, 2);
+
+  TEST_VTRN1(q, int, s, 8, 16);
+  TEST_VTRN1(q, int, s, 16, 8);
+  TEST_VTRN1(q, int, s, 32, 4);
+  TEST_VTRN1(q, int, s, 64, 2);
+  TEST_VTRN1(q, uint, u, 8, 16);
+  TEST_VTRN1(q, uint, u, 16, 8);
+  TEST_VTRN1(q, uint, u, 32, 4);
+  TEST_VTRN1(q, uint, u, 64, 2);
+  TEST_VTRN1(q, poly, p, 8, 16);
+  TEST_VTRN1(q, poly, p, 16, 8);
+#if defined (FP16_SUPPORTED)
+  TEST_VTRN1(q, float, f, 16, 8);
+#endif
+  TEST_VTRN1(q, float, f, 32, 4);
+  TEST_VTRN1(q, float, f, 64, 2);
+
+#if defined (FP16_SUPPORTED)
+  CHECK_RESULTS (TEST_MSG, "");
+#else
+  CHECK_RESULTS_NO_FP16 (TEST_MSG, "");
+#endif
+
+#undef TEST_MSG
+#define TEST_MSG "VTRN2"
+
+#define TEST_VTRN2(Q, T1, T2, W, N) TEST_VTRN(2, Q, T1, T2, W, N)
+
+/* Expected results.  */
+VECT_VAR_DECL(expected2,int,8,8) [] = { 0xf1, 0x11, 0xf3, 0x11,
+					0xf5, 0x11, 0xf7, 0x11 };
+VECT_VAR_DECL(expected2,int,16,4) [] = { 0xfff1, 0x22, 0xfff3, 0x22 };
+VECT_VAR_DECL(expected2,int,32,2) [] = { 0xfffffff1, 0x33 };
+VECT_VAR_DECL(expected2,int,64,1) [] = { 0xfffffffffffffff1 };
+VECT_VAR_DECL(expected2,uint,8,8) [] = { 0xf1, 0x55, 0xf3, 0x55,
+					 0xf5, 0x55, 0xf7, 0x55 };
+VECT_VAR_DECL(expected2,uint,16,4) [] = { 0xfff1, 0x66, 0xfff3, 0x66 };
+VECT_VAR_DECL(expected2,uint,32,2) [] = { 0xfffffff1, 0x77 };
+VECT_VAR_DECL(expected2,uint,64,1) [] = { 0xfffffffffffffff1 };
+VECT_VAR_DECL(expected2,poly,8,8) [] = { 0xf1, 0x55, 0xf3, 0x55,
+					 0xf5, 0x55, 0xf7, 0x55 };
+VECT_VAR_DECL(expected2,poly,16,4) [] = { 0xfff1, 0x66, 0xfff3, 0x66 };
+VECT_VAR_DECL(expected2,hfloat,32,2) [] = { 0xc1700000, 0x42066666 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected2, hfloat, 16, 4) [] = { 0xcb80, 0x4b4d,
+						0xca80, 0x4b4d };
+#endif
+VECT_VAR_DECL(expected2,int,8,16) [] = { 0xf1, 0x11, 0xf3, 0x11,
+					 0xf5, 0x11, 0xf7, 0x11,
+					 0xf9, 0x11, 0xfb, 0x11,
+					 0xfd, 0x11, 0xff, 0x11 };
+VECT_VAR_DECL(expected2,int,16,8) [] = { 0xfff1, 0x22, 0xfff3, 0x22,
+					 0xfff5, 0x22, 0xfff7, 0x22 };
+VECT_VAR_DECL(expected2,int,32,4) [] = { 0xfffffff1, 0x33,
+					 0xfffffff3, 0x33 };
+VECT_VAR_DECL(expected2,int,64,2) [] = { 0xfffffffffffffff1,
+					 0x44 };
+VECT_VAR_DECL(expected2,uint,8,16) [] = { 0xf1, 0x55, 0xf3, 0x55,
+					  0xf5, 0x55, 0xf7, 0x55,
+					  0xf9, 0x55, 0xfb, 0x55,
+					  0xfd, 0x55, 0xff, 0x55 };
+VECT_VAR_DECL(expected2,uint,16,8) [] = { 0xfff1, 0x66, 0xfff3, 0x66,
+					  0xfff5, 0x66, 0xfff7, 0x66 };
+VECT_VAR_DECL(expected2,uint,32,4) [] = { 0xfffffff1, 0x77,
+					  0xfffffff3, 0x77 };
+VECT_VAR_DECL(expected2,uint,64,2) [] = { 0xfffffffffffffff1,
+					  0x88 };
+VECT_VAR_DECL(expected2,poly,8,16) [] = { 0xf1, 0x55, 0xf3, 0x55,
+					  0xf5, 0x55, 0xf7, 0x55,
+					  0xf9, 0x55, 0xfb, 0x55,
+					  0xfd, 0x55, 0xff, 0x55 };
+VECT_VAR_DECL(expected2,poly,16,8) [] = { 0xfff1, 0x66, 0xfff3, 0x66,
+					  0xfff5, 0x66, 0xfff7, 0x66 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected2, hfloat, 16, 8) [] = { 0xcb80, 0x4b4d,
+						0xca80, 0x4b4d,
+						0xc980, 0x4b4d,
+						0xc880, 0x4b4d };
+#endif
+VECT_VAR_DECL(expected2,hfloat,32,4) [] = { 0xc1700000, 0x42073333,
+					    0xc1500000, 0x42073333 };
+  clean_results ();
+  CLEAN(expected2, int, 64, 1);
+  CLEAN(expected2, uint, 64, 1);
+
+  TEST_VTRN2(, int, s, 8, 8);
+  TEST_VTRN2(, int, s, 16, 4);
+  TEST_VTRN2(, int, s, 32, 2);
+  TEST_VTRN2(, uint, u, 8, 8);
+  TEST_VTRN2(, uint, u, 16, 4);
+  TEST_VTRN2(, uint, u, 32, 2);
+  TEST_VTRN2(, poly, p, 8, 8);
+  TEST_VTRN2(, poly, p, 16, 4);
+#if defined (FP16_SUPPORTED)
+  TEST_VTRN2(, float, f, 16, 4);
+#endif
+  TEST_VTRN2(, float, f, 32, 2);
+
+  TEST_VTRN2(q, int, s, 8, 16);
+  TEST_VTRN2(q, int, s, 16, 8);
+  TEST_VTRN2(q, int, s, 32, 4);
+  TEST_VTRN2(q, int, s, 64, 2);
+  TEST_VTRN2(q, uint, u, 8, 16);
+  TEST_VTRN2(q, uint, u, 16, 8);
+  TEST_VTRN2(q, uint, u, 32, 4);
+  TEST_VTRN2(q, uint, u, 64, 2);
+  TEST_VTRN2(q, poly, p, 8, 16);
+  TEST_VTRN2(q, poly, p, 16, 8);
+#if defined (FP16_SUPPORTED)
+  TEST_VTRN2(q, float, f, 16, 8);
+#endif
+  TEST_VTRN2(q, float, f, 32, 4);
+  TEST_VTRN2(q, float, f, 64, 2);
+
+  CHECK_RESULTS_NAMED (TEST_MSG, expected2, "");
+#if defined (FP16_SUPPORTED)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected2, "");
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected2, "");
+#endif
+}
+
+int main (void)
+{
+  exec_vtrn_half ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vuzp_half.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vuzp_half.c
new file mode 100644
index 0000000..8706f24
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vuzp_half.c
@@ -0,0 +1,259 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,8,8) [] = { 0xf0, 0xf2, 0xf4, 0xf6,
+				       0x11, 0x11, 0x11, 0x11 };
+VECT_VAR_DECL(expected,int,16,4) [] = { 0xfff0, 0xfff2, 0x22, 0x22 };
+VECT_VAR_DECL(expected,int,32,2) [] = { 0xfffffff0, 0x33 };
+VECT_VAR_DECL(expected,int,64,1) [] = { 0xfffffffffffffff0 };
+VECT_VAR_DECL(expected,uint,8,8) [] = { 0xf0, 0xf2, 0xf4, 0xf6,
+					0x55, 0x55, 0x55, 0x55 };
+VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfff0, 0xfff2, 0x66, 0x66 };
+VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfffffff0, 0x77 };
+VECT_VAR_DECL(expected,uint,64,1) [] = { 0xfffffffffffffff0 };
+VECT_VAR_DECL(expected,poly,8,8) [] = { 0xf0, 0xf2, 0xf4, 0xf6,
+					0x55, 0x55, 0x55, 0x55 };
+VECT_VAR_DECL(expected,poly,16,4) [] = { 0xfff0, 0xfff2, 0x66, 0x66 };
+VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1800000, 0x42066666 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected, hfloat, 16, 4) [] = { 0xcc00, 0xcb00,
+					       0x4b4d, 0x4b4d };
+#endif
+VECT_VAR_DECL(expected,int,8,16) [] = { 0xf0, 0xf2, 0xf4, 0xf6,
+					0xf8, 0xfa, 0xfc, 0xfe,
+					0x11, 0x11, 0x11, 0x11,
+					0x11, 0x11, 0x11, 0x11 };
+VECT_VAR_DECL(expected,int,16,8) [] = { 0xfff0, 0xfff2, 0xfff4, 0xfff6,
+					0x22, 0x22, 0x22, 0x22 };
+VECT_VAR_DECL(expected,int,32,4) [] = { 0xfffffff0, 0xfffffff2,
+					0x33, 0x33 };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0xfffffffffffffff0,
+					0x44 };
+VECT_VAR_DECL(expected,uint,8,16) [] = { 0xf0, 0xf2, 0xf4, 0xf6,
+					 0xf8, 0xfa, 0xfc, 0xfe,
+					 0x55, 0x55, 0x55, 0x55,
+					 0x55, 0x55, 0x55, 0x55 };
+VECT_VAR_DECL(expected,uint,16,8) [] = { 0xfff0, 0xfff2, 0xfff4, 0xfff6,
+					 0x66, 0x66, 0x66, 0x66 };
+VECT_VAR_DECL(expected,uint,32,4) [] = { 0xfffffff0, 0xfffffff2, 0x77, 0x77 };
+VECT_VAR_DECL(expected,uint,64,2) [] = { 0xfffffffffffffff0,
+					 0x88 };
+VECT_VAR_DECL(expected,poly,8,16) [] = { 0xf0, 0xf2, 0xf4, 0xf6,
+					 0xf8, 0xfa, 0xfc, 0xfe,
+					 0x55, 0x55, 0x55, 0x55,
+					 0x55, 0x55, 0x55, 0x55 };
+VECT_VAR_DECL(expected,poly,16,8) [] = { 0xfff0, 0xfff2, 0xfff4, 0xfff6,
+					 0x66, 0x66, 0x66, 0x66 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected, hfloat, 16, 8) [] = { 0xcc00, 0xcb00, 0xca00, 0xc900,
+					       0x4b4d, 0x4b4d, 0x4b4d, 0x4b4d };
+#endif
+VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc1800000, 0xc1600000,
+					   0x42073333, 0x42073333 };
+
+#define TEST_MSG "VUZP1"
+void exec_vuzp_half (void)
+{
+#define TEST_VUZP(PART, Q, T1, T2, W, N)		\
+  VECT_VAR(vector_res, T1, W, N) =			\
+    vuzp##PART##Q##_##T2##W(VECT_VAR(vector, T1, W, N),	\
+		       VECT_VAR(vector2, T1, W, N));	\
+  vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N))
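+
+  /* vuzp1 concatenates the even-indexed elements of both inputs,
+     vuzp2 the odd-indexed ones; e.g. vuzp1_s8 (a, b) gives
+     { a[0], a[2], a[4], a[6], b[0], b[2], b[4], b[6] }.  */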
+
+#define TEST_VUZP1(Q, T1, T2, W, N) TEST_VUZP(1, Q, T1, T2, W, N)
+
+  /* Declare all the standard variants; float64x2 is not covered by
+     DECL_VARIABLE_ALL_VARIANTS, so declare it separately.  */
+  DECL_VARIABLE_ALL_VARIANTS(vector);
+  DECL_VARIABLE_ALL_VARIANTS(vector2);
+  DECL_VARIABLE(vector, float, 64, 2);
+  DECL_VARIABLE(vector2, float, 64, 2);
+
+  DECL_VARIABLE_ALL_VARIANTS(vector_res);
+  DECL_VARIABLE(vector_res, float, 64, 2);
+
+  clean_results ();
+  /* We don't have vuzp1_T64x1, so set expected to the clean value.  */
+  CLEAN(expected, int, 64, 1);
+  CLEAN(expected, uint, 64, 1);
+
+  TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer);
+#if defined (FP16_SUPPORTED)
+  VLOAD(vector, buffer, , float, f, 16, 4);
+  VLOAD(vector, buffer, q, float, f, 16, 8);
+#endif
+  VLOAD(vector, buffer, , float, f, 32, 2);
+  VLOAD(vector, buffer, q, float, f, 32, 4);
+  VLOAD(vector, buffer, q, float, f, 64, 2);
+
+  /* Choose arbitrary initialization values.  */
+  VDUP(vector2, , int, s, 8, 8, 0x11);
+  VDUP(vector2, , int, s, 16, 4, 0x22);
+  VDUP(vector2, , int, s, 32, 2, 0x33);
+  VDUP(vector2, , uint, u, 8, 8, 0x55);
+  VDUP(vector2, , uint, u, 16, 4, 0x66);
+  VDUP(vector2, , uint, u, 32, 2, 0x77);
+  VDUP(vector2, , poly, p, 8, 8, 0x55);
+  VDUP(vector2, , poly, p, 16, 4, 0x66);
+#if defined (FP16_SUPPORTED)
+  VDUP (vector2, , float, f, 16, 4, 14.6f);   /* 14.6f is 0x4b4d.  */
+#endif
+  VDUP(vector2, , float, f, 32, 2, 33.6f);
+
+  VDUP(vector2, q, int, s, 8, 16, 0x11);
+  VDUP(vector2, q, int, s, 16, 8, 0x22);
+  VDUP(vector2, q, int, s, 32, 4, 0x33);
+  VDUP(vector2, q, int, s, 64, 2, 0x44);
+  VDUP(vector2, q, uint, u, 8, 16, 0x55);
+  VDUP(vector2, q, uint, u, 16, 8, 0x66);
+  VDUP(vector2, q, uint, u, 32, 4, 0x77);
+  VDUP(vector2, q, uint, u, 64, 2, 0x88);
+  VDUP(vector2, q, poly, p, 8, 16, 0x55);
+  VDUP(vector2, q, poly, p, 16, 8, 0x66);
+#if defined (FP16_SUPPORTED)
+  VDUP (vector2, q, float, f, 16, 8, 14.6f);
+#endif
+  VDUP(vector2, q, float, f, 32, 4, 33.8f);
+  VDUP(vector2, q, float, f, 64, 2, 33.8f);
+
+  TEST_VUZP1(, int, s, 8, 8);
+  TEST_VUZP1(, int, s, 16, 4);
+  TEST_VUZP1(, int, s, 32, 2);
+  TEST_VUZP1(, uint, u, 8, 8);
+  TEST_VUZP1(, uint, u, 16, 4);
+  TEST_VUZP1(, uint, u, 32, 2);
+  TEST_VUZP1(, poly, p, 8, 8);
+  TEST_VUZP1(, poly, p, 16, 4);
+#if defined (FP16_SUPPORTED)
+  TEST_VUZP1(, float, f, 16, 4);
+#endif
+  TEST_VUZP1(, float, f, 32, 2);
+
+  TEST_VUZP1(q, int, s, 8, 16);
+  TEST_VUZP1(q, int, s, 16, 8);
+  TEST_VUZP1(q, int, s, 32, 4);
+  TEST_VUZP1(q, int, s, 64, 2);
+  TEST_VUZP1(q, uint, u, 8, 16);
+  TEST_VUZP1(q, uint, u, 16, 8);
+  TEST_VUZP1(q, uint, u, 32, 4);
+  TEST_VUZP1(q, uint, u, 64, 2);
+  TEST_VUZP1(q, poly, p, 8, 16);
+  TEST_VUZP1(q, poly, p, 16, 8);
+#if defined (FP16_SUPPORTED)
+  TEST_VUZP1(q, float, f, 16, 8);
+#endif
+  TEST_VUZP1(q, float, f, 32, 4);
+  TEST_VUZP1(q, float, f, 64, 2);
+
+#if defined (FP16_SUPPORTED)
+  CHECK_RESULTS (TEST_MSG, "");
+#else
+  CHECK_RESULTS_NO_FP16 (TEST_MSG, "");
+#endif
+
+#undef TEST_MSG
+#define TEST_MSG "VUZP2"
+
+#define TEST_VUZP2(Q, T1, T2, W, N) TEST_VUZP(2, Q, T1, T2, W, N)
+
+/* Expected results.  */
+VECT_VAR_DECL(expected2,int,8,8) [] = { 0xf1, 0xf3, 0xf5, 0xf7,
+					0x11, 0x11, 0x11, 0x11 };
+VECT_VAR_DECL(expected2,int,16,4) [] = { 0xfff1, 0xfff3, 0x22, 0x22 };
+VECT_VAR_DECL(expected2,int,32,2) [] = { 0xfffffff1, 0x33 };
+VECT_VAR_DECL(expected2,int,64,1) [] = { 0xfffffffffffffff1 };
+VECT_VAR_DECL(expected2,uint,8,8) [] = { 0xf1, 0xf3, 0xf5, 0xf7,
+					 0x55, 0x55, 0x55, 0x55 };
+VECT_VAR_DECL(expected2,uint,16,4) [] = { 0xfff1, 0xfff3, 0x66, 0x66 };
+VECT_VAR_DECL(expected2,uint,32,2) [] = { 0xfffffff1, 0x77 };
+VECT_VAR_DECL(expected2,uint,64,1) [] = { 0xfffffffffffffff1 };
+VECT_VAR_DECL(expected2,poly,8,8) [] = { 0xf1, 0xf3, 0xf5, 0xf7,
+					 0x55, 0x55, 0x55, 0x55 };
+VECT_VAR_DECL(expected2,poly,16,4) [] = { 0xfff1, 0xfff3, 0x66, 0x66 };
+VECT_VAR_DECL(expected2,hfloat,32,2) [] = { 0xc1700000, 0x42066666 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected2, hfloat, 16, 4) [] = { 0xcb80, 0xca80,
+						0x4b4d, 0x4b4d };
+#endif
+VECT_VAR_DECL(expected2,int,8,16) [] = { 0xf1, 0xf3, 0xf5, 0xf7,
+					 0xf9, 0xfb, 0xfd, 0xff,
+					 0x11, 0x11, 0x11, 0x11,
+					 0x11, 0x11, 0x11, 0x11 };
+VECT_VAR_DECL(expected2,int,16,8) [] = { 0xfff1, 0xfff3, 0xfff5, 0xfff7,
+					 0x22, 0x22, 0x22, 0x22 };
+VECT_VAR_DECL(expected2,int,32,4) [] = { 0xfffffff1, 0xfffffff3,
+					 0x33, 0x33 };
+VECT_VAR_DECL(expected2,int,64,2) [] = { 0xfffffffffffffff1,
+					 0x44 };
+VECT_VAR_DECL(expected2,uint,8,16) [] = { 0xf1, 0xf3, 0xf5, 0xf7,
+					  0xf9, 0xfb, 0xfd, 0xff,
+					  0x55, 0x55, 0x55, 0x55,
+					  0x55, 0x55, 0x55, 0x55 };
+VECT_VAR_DECL(expected2,uint,16,8) [] = { 0xfff1, 0xfff3, 0xfff5, 0xfff7,
+					  0x66, 0x66, 0x66, 0x66 };
+VECT_VAR_DECL(expected2,uint,32,4) [] = { 0xfffffff1, 0xfffffff3, 0x77, 0x77 };
+VECT_VAR_DECL(expected2,uint,64,2) [] = { 0xfffffffffffffff1,
+					  0x88 };
+VECT_VAR_DECL(expected2,poly,8,16) [] = { 0xf1, 0xf3, 0xf5, 0xf7,
+					  0xf9, 0xfb, 0xfd, 0xff,
+					  0x55, 0x55, 0x55, 0x55,
+					  0x55, 0x55, 0x55, 0x55 };
+VECT_VAR_DECL(expected2,poly,16,8) [] = { 0xfff1, 0xfff3, 0xfff5, 0xfff7,
+					  0x66, 0x66, 0x66, 0x66 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected2, hfloat, 16, 8) [] = { 0xcb80, 0xca80, 0xc980, 0xc880,
+						0x4b4d, 0x4b4d, 0x4b4d, 0x4b4d };
+#endif
+VECT_VAR_DECL(expected2,hfloat,32,4) [] = { 0xc1700000, 0xc1500000,
+					    0x42073333, 0x42073333 };
+
+  clean_results ();
+  CLEAN(expected2, int, 64, 1);
+  CLEAN(expected2, uint, 64, 1);
+
+  TEST_VUZP2(, int, s, 8, 8);
+  TEST_VUZP2(, int, s, 16, 4);
+  TEST_VUZP2(, int, s, 32, 2);
+  TEST_VUZP2(, uint, u, 8, 8);
+  TEST_VUZP2(, uint, u, 16, 4);
+  TEST_VUZP2(, uint, u, 32, 2);
+  TEST_VUZP2(, poly, p, 8, 8);
+  TEST_VUZP2(, poly, p, 16, 4);
+#if defined (FP16_SUPPORTED)
+  TEST_VUZP2(, float, f, 16, 4);
+#endif
+  TEST_VUZP2(, float, f, 32, 2);
+
+  TEST_VUZP2(q, int, s, 8, 16);
+  TEST_VUZP2(q, int, s, 16, 8);
+  TEST_VUZP2(q, int, s, 32, 4);
+  TEST_VUZP2(q, int, s, 64, 2);
+  TEST_VUZP2(q, uint, u, 8, 16);
+  TEST_VUZP2(q, uint, u, 16, 8);
+  TEST_VUZP2(q, uint, u, 32, 4);
+  TEST_VUZP2(q, uint, u, 64, 2);
+  TEST_VUZP2(q, poly, p, 8, 16);
+  TEST_VUZP2(q, poly, p, 16, 8);
+#if defined (FP16_SUPPORTED)
+  TEST_VUZP2(q, float, f, 16, 8);
+#endif
+  TEST_VUZP2(q, float, f, 32, 4);
+  TEST_VUZP2(q, float, f, 64, 2);
+
+  CHECK_RESULTS_NAMED (TEST_MSG, expected2, "");
+#if defined (FP16_SUPPORTED)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected2, "");
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected2, "");
+#endif
+}
+
+int main (void)
+{
+  exec_vuzp_half ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vzip_half.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vzip_half.c
new file mode 100644
index 0000000..619d6b2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vzip_half.c
@@ -0,0 +1,263 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,8,8) [] = { 0xf0, 0x11, 0xf1, 0x11,
+				       0xf2, 0x11, 0xf3, 0x11 };
+VECT_VAR_DECL(expected,int,16,4) [] = { 0xfff0, 0x22, 0xfff1, 0x22 };
+VECT_VAR_DECL(expected,int,32,2) [] = { 0xfffffff0, 0x33 };
+VECT_VAR_DECL(expected,int,64,1) [] = { 0xfffffffffffffff0 };
+VECT_VAR_DECL(expected,uint,8,8) [] = { 0xf0, 0x55, 0xf1, 0x55,
+					0xf2, 0x55, 0xf3, 0x55 };
+VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfff0, 0x66, 0xfff1, 0x66 };
+VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfffffff0, 0x77 };
+VECT_VAR_DECL(expected,uint,64,1) [] = { 0xfffffffffffffff0 };
+VECT_VAR_DECL(expected,poly,8,8) [] = { 0xf0, 0x55, 0xf1, 0x55,
+					0xf2, 0x55, 0xf3, 0x55 };
+VECT_VAR_DECL(expected,poly,16,4) [] = { 0xfff0, 0x66, 0xfff1, 0x66 };
+VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1800000, 0x42066666 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected, hfloat, 16, 4) [] = { 0xcc00, 0x4b4d,
+					       0xcb80, 0x4b4d };
+#endif
+VECT_VAR_DECL(expected,int,8,16) [] = { 0xf0, 0x11, 0xf1, 0x11,
+					0xf2, 0x11, 0xf3, 0x11,
+					0xf4, 0x11, 0xf5, 0x11,
+					0xf6, 0x11, 0xf7, 0x11 };
+VECT_VAR_DECL(expected,int,16,8) [] = { 0xfff0, 0x22, 0xfff1, 0x22,
+					0xfff2, 0x22, 0xfff3, 0x22 };
+VECT_VAR_DECL(expected,int,32,4) [] = { 0xfffffff0, 0x33,
+					0xfffffff1, 0x33 };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0xfffffffffffffff0,
+					0x44 };
+VECT_VAR_DECL(expected,uint,8,16) [] = { 0xf0, 0x55, 0xf1, 0x55,
+					 0xf2, 0x55, 0xf3, 0x55,
+					 0xf4, 0x55, 0xf5, 0x55,
+					 0xf6, 0x55, 0xf7, 0x55 };
+VECT_VAR_DECL(expected,uint,16,8) [] = { 0xfff0, 0x66, 0xfff1, 0x66,
+					 0xfff2, 0x66, 0xfff3, 0x66 };
+VECT_VAR_DECL(expected,uint,32,4) [] = { 0xfffffff0, 0x77,
+					 0xfffffff1, 0x77 };
+VECT_VAR_DECL(expected,uint,64,2) [] = { 0xfffffffffffffff0,
+					 0x88 };
+VECT_VAR_DECL(expected,poly,8,16) [] = { 0xf0, 0x55, 0xf1, 0x55,
+					 0xf2, 0x55, 0xf3, 0x55,
+					 0xf4, 0x55, 0xf5, 0x55,
+					 0xf6, 0x55, 0xf7, 0x55 };
+VECT_VAR_DECL(expected,poly,16,8) [] = { 0xfff0, 0x66, 0xfff1, 0x66,
+					 0xfff2, 0x66, 0xfff3, 0x66 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected, hfloat, 16, 8) [] = { 0xcc00, 0x4b4d,
+					       0xcb80, 0x4b4d,
+					       0xcb00, 0x4b4d,
+					       0xca80, 0x4b4d };
+#endif
+VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc1800000, 0x42073333,
+					   0xc1700000, 0x42073333 };
+
+#define TEST_MSG "VZIP1"
+void exec_vzip_half (void)
+{
+#define TEST_VZIP(PART, Q, T1, T2, W, N)		\
+  VECT_VAR(vector_res, T1, W, N) =			\
+    vzip##PART##Q##_##T2##W(VECT_VAR(vector, T1, W, N),	\
+		       VECT_VAR(vector2, T1, W, N));	\
+  vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N))
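+
+  /* vzip1 interleaves the low halves of the two inputs, vzip2 the
+     high halves; e.g. vzip1_s8 (a, b) gives
+     { a[0], b[0], a[1], b[1], a[2], b[2], a[3], b[3] }.  */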
+
+#define TEST_VZIP1(Q, T1, T2, W, N) TEST_VZIP(1, Q, T1, T2, W, N)
+
+  /* Declare all the standard variants; float64x2 is not covered by
+     DECL_VARIABLE_ALL_VARIANTS, so declare it separately.  */
+  DECL_VARIABLE_ALL_VARIANTS(vector);
+  DECL_VARIABLE_ALL_VARIANTS(vector2);
+  DECL_VARIABLE(vector, float, 64, 2);
+  DECL_VARIABLE(vector2, float, 64, 2);
+
+  DECL_VARIABLE_ALL_VARIANTS(vector_res);
+  DECL_VARIABLE(vector_res, float, 64, 2);
+
+  clean_results ();
+  /* We don't have vzip1_T64x1, so set expected to the clean value.  */
+  CLEAN(expected, int, 64, 1);
+  CLEAN(expected, uint, 64, 1);
+
+  TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer);
+#if defined (FP16_SUPPORTED)
+  VLOAD(vector, buffer, , float, f, 16, 4);
+  VLOAD(vector, buffer, q, float, f, 16, 8);
+#endif
+  VLOAD(vector, buffer, , float, f, 32, 2);
+  VLOAD(vector, buffer, q, float, f, 32, 4);
+  VLOAD(vector, buffer, q, float, f, 64, 2);
+
+  /* Choose arbitrary initialization values.  */
+  VDUP(vector2, , int, s, 8, 8, 0x11);
+  VDUP(vector2, , int, s, 16, 4, 0x22);
+  VDUP(vector2, , int, s, 32, 2, 0x33);
+  VDUP(vector2, , uint, u, 8, 8, 0x55);
+  VDUP(vector2, , uint, u, 16, 4, 0x66);
+  VDUP(vector2, , uint, u, 32, 2, 0x77);
+  VDUP(vector2, , poly, p, 8, 8, 0x55);
+  VDUP(vector2, , poly, p, 16, 4, 0x66);
+#if defined (FP16_SUPPORTED)
+  VDUP (vector2, , float, f, 16, 4, 14.6f);   /* 14.6f is 0x4b4d.  */
+#endif
+  VDUP(vector2, , float, f, 32, 2, 33.6f);
+
+  VDUP(vector2, q, int, s, 8, 16, 0x11);
+  VDUP(vector2, q, int, s, 16, 8, 0x22);
+  VDUP(vector2, q, int, s, 32, 4, 0x33);
+  VDUP(vector2, q, int, s, 64, 2, 0x44);
+  VDUP(vector2, q, uint, u, 8, 16, 0x55);
+  VDUP(vector2, q, uint, u, 16, 8, 0x66);
+  VDUP(vector2, q, uint, u, 32, 4, 0x77);
+  VDUP(vector2, q, uint, u, 64, 2, 0x88);
+  VDUP(vector2, q, poly, p, 8, 16, 0x55);
+  VDUP(vector2, q, poly, p, 16, 8, 0x66);
+#if defined (FP16_SUPPORTED)
+  VDUP (vector2, q, float, f, 16, 8, 14.6f);
+#endif
+  VDUP(vector2, q, float, f, 32, 4, 33.8f);
+  VDUP(vector2, q, float, f, 64, 2, 33.8f);
+
+  TEST_VZIP1(, int, s, 8, 8);
+  TEST_VZIP1(, int, s, 16, 4);
+  TEST_VZIP1(, int, s, 32, 2);
+  TEST_VZIP1(, uint, u, 8, 8);
+  TEST_VZIP1(, uint, u, 16, 4);
+  TEST_VZIP1(, uint, u, 32, 2);
+  TEST_VZIP1(, poly, p, 8, 8);
+  TEST_VZIP1(, poly, p, 16, 4);
+#if defined (FP16_SUPPORTED)
+  TEST_VZIP1(, float, f, 16, 4);
+#endif
+  TEST_VZIP1(, float, f, 32, 2);
+
+  TEST_VZIP1(q, int, s, 8, 16);
+  TEST_VZIP1(q, int, s, 16, 8);
+  TEST_VZIP1(q, int, s, 32, 4);
+  TEST_VZIP1(q, int, s, 64, 2);
+  TEST_VZIP1(q, uint, u, 8, 16);
+  TEST_VZIP1(q, uint, u, 16, 8);
+  TEST_VZIP1(q, uint, u, 32, 4);
+  TEST_VZIP1(q, uint, u, 64, 2);
+  TEST_VZIP1(q, poly, p, 8, 16);
+  TEST_VZIP1(q, poly, p, 16, 8);
+#if defined (FP16_SUPPORTED)
+  TEST_VZIP1(q, float, f, 16, 8);
+#endif
+  TEST_VZIP1(q, float, f, 32, 4);
+  TEST_VZIP1(q, float, f, 64, 2);
+
+#if defined (FP16_SUPPORTED)
+  CHECK_RESULTS (TEST_MSG, "");
+#else
+  CHECK_RESULTS_NO_FP16 (TEST_MSG, "");
+#endif
+
+#undef TEST_MSG
+#define TEST_MSG "VZIP2"
+
+#define TEST_VZIP2(Q, T1, T2, W, N) TEST_VZIP(2, Q, T1, T2, W, N)
+
+/* Expected results.  */
+VECT_VAR_DECL(expected2,int,8,8) [] = { 0xf4, 0x11, 0xf5, 0x11,
+					0xf6, 0x11, 0xf7, 0x11 };
+VECT_VAR_DECL(expected2,int,16,4) [] = { 0xfff2, 0x22, 0xfff3, 0x22 };
+VECT_VAR_DECL(expected2,int,32,2) [] = { 0xfffffff1, 0x33 };
+VECT_VAR_DECL(expected2,int,64,1) [] = { 0xfffffffffffffff1 };
+VECT_VAR_DECL(expected2,uint,8,8) [] = { 0xf4, 0x55, 0xf5, 0x55,
+					 0xf6, 0x55, 0xf7, 0x55 };
+VECT_VAR_DECL(expected2,uint,16,4) [] = { 0xfff2, 0x66, 0xfff3, 0x66 };
+VECT_VAR_DECL(expected2,uint,32,2) [] = { 0xfffffff1, 0x77 };
+VECT_VAR_DECL(expected2,uint,64,1) [] = { 0xfffffffffffffff1 };
+VECT_VAR_DECL(expected2,poly,8,8) [] = { 0xf4, 0x55, 0xf5, 0x55,
+					 0xf6, 0x55, 0xf7, 0x55 };
+VECT_VAR_DECL(expected2,poly,16,4) [] = { 0xfff2, 0x66, 0xfff3, 0x66 };
+VECT_VAR_DECL(expected2,hfloat,32,2) [] = { 0xc1700000, 0x42066666 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected2, hfloat, 16, 4) [] = { 0xcb00, 0x4b4d,
+						0xca80, 0x4b4d };
+#endif
+VECT_VAR_DECL(expected2,int,8,16) [] = { 0xf8, 0x11, 0xf9, 0x11,
+					 0xfa, 0x11, 0xfb, 0x11,
+					 0xfc, 0x11, 0xfd, 0x11,
+					 0xfe, 0x11, 0xff, 0x11 };
+VECT_VAR_DECL(expected2,int,16,8) [] = { 0xfff4, 0x22, 0xfff5, 0x22,
+					 0xfff6, 0x22, 0xfff7, 0x22 };
+VECT_VAR_DECL(expected2,int,32,4) [] = { 0xfffffff2, 0x33,
+					 0xfffffff3, 0x33 };
+VECT_VAR_DECL(expected2,int,64,2) [] = { 0xfffffffffffffff1,
+					 0x44 };
+VECT_VAR_DECL(expected2,uint,8,16) [] = { 0xf8, 0x55, 0xf9, 0x55,
+					  0xfa, 0x55, 0xfb, 0x55,
+					  0xfc, 0x55, 0xfd, 0x55,
+					  0xfe, 0x55, 0xff, 0x55 };
+VECT_VAR_DECL(expected2,uint,16,8) [] = { 0xfff4, 0x66, 0xfff5, 0x66,
+					  0xfff6, 0x66, 0xfff7, 0x66 };
+VECT_VAR_DECL(expected2,uint,32,4) [] = { 0xfffffff2, 0x77,
+					  0xfffffff3, 0x77 };
+VECT_VAR_DECL(expected2,uint,64,2) [] = { 0xfffffffffffffff1,
+					  0x88 };
+VECT_VAR_DECL(expected2,poly,8,16) [] = { 0xf8, 0x55, 0xf9, 0x55,
+					  0xfa, 0x55, 0xfb, 0x55,
+					  0xfc, 0x55, 0xfd, 0x55,
+					  0xfe, 0x55, 0xff, 0x55 };
+VECT_VAR_DECL(expected2,poly,16,8) [] = { 0xfff4, 0x66, 0xfff5, 0x66,
+					  0xfff6, 0x66, 0xfff7, 0x66 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected2, hfloat, 16, 8) [] = { 0xca00, 0x4b4d,
+						0xc980, 0x4b4d,
+						0xc900, 0x4b4d,
+						0xc880, 0x4b4d };
+#endif
+VECT_VAR_DECL(expected2,hfloat,32,4) [] = { 0xc1600000, 0x42073333,
+					    0xc1500000, 0x42073333 };
+  clean_results ();
+  CLEAN(expected2, int, 64, 1);
+  CLEAN(expected2, uint, 64, 1);
+
+  TEST_VZIP2(, int, s, 8, 8);
+  TEST_VZIP2(, int, s, 16, 4);
+  TEST_VZIP2(, int, s, 32, 2);
+  TEST_VZIP2(, uint, u, 8, 8);
+  TEST_VZIP2(, uint, u, 16, 4);
+  TEST_VZIP2(, uint, u, 32, 2);
+  TEST_VZIP2(, poly, p, 8, 8);
+  TEST_VZIP2(, poly, p, 16, 4);
+#if defined (FP16_SUPPORTED)
+  TEST_VZIP2(, float, f, 16, 4);
+#endif
+  TEST_VZIP2(, float, f, 32, 2);
+
+  TEST_VZIP2(q, int, s, 8, 16);
+  TEST_VZIP2(q, int, s, 16, 8);
+  TEST_VZIP2(q, int, s, 32, 4);
+  TEST_VZIP2(q, int, s, 64, 2);
+  TEST_VZIP2(q, uint, u, 8, 16);
+  TEST_VZIP2(q, uint, u, 16, 8);
+  TEST_VZIP2(q, uint, u, 32, 4);
+  TEST_VZIP2(q, uint, u, 64, 2);
+  TEST_VZIP2(q, poly, p, 8, 16);
+  TEST_VZIP2(q, poly, p, 16, 8);
+#if defined (FP16_SUPPORTED)
+  TEST_VZIP2(q, float, f, 16, 8);
+#endif
+  TEST_VZIP2(q, float, f, 32, 4);
+  TEST_VZIP2(q, float, f, 64, 2);
+
+  CHECK_RESULTS_NAMED (TEST_MSG, expected2, "");
+#if defined (FP16_SUPPORTED)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected2, "");
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected2, "");
+#endif
+}
+
+int main (void)
+{
+  exec_vzip_half ();
+  return 0;
+}
-- 
2.5.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [AArch64][1/14] ARMv8.2-A FP16 data processing intrinsics
  2016-07-07 16:14 ` [AArch64][1/14] ARMv8.2-A FP16 data processing intrinsics Jiong Wang
@ 2016-07-08 14:07   ` James Greenhalgh
  0 siblings, 0 replies; 32+ messages in thread
From: James Greenhalgh @ 2016-07-08 14:07 UTC (permalink / raw)
  To: Jiong Wang; +Cc: GCC Patches, nd

On Thu, Jul 07, 2016 at 05:13:56PM +0100, Jiong Wang wrote:
> Several data-processing instructions are agnostic to the type of their
> operands. This patch adds the mapping between them and those bit- and
> lane-manipulation instructions.
> 
> No ARMv8.2-A FP16 extension hardware support is required for these
> intrinsics.
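
For illustration, a minimal sketch (assuming an arm_neon.h that provides
the new float16 variants): the permute is agnostic to what the 16-bit
lanes hold, so the same trn1 instruction serves both element types.

  #include <arm_neon.h>

  float16x4_t use_f16 (float16x4_t a, float16x4_t b)
  {
    return vtrn1_f16 (a, b);	/* trn1 v0.4h, v0.4h, v1.4h  */
  }

  int16x4_t use_s16 (int16x4_t a, int16x4_t b)
  {
    return vtrn1_s16 (a, b);	/* same trn1, same .4h arrangement  */
  }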

These intrinsics are independent of the ARMv8.2-A implementation,
and are proposed to be added in a future ACLE specification. I've
checked that the intrinsics added here match those proposed.

OK for trunk.

Thanks,
James

> gcc/
> 2016-07-07  Jiong Wang <jiong.wang@arm.com>
> 
>         * config/aarch64/aarch64-simd.md
> (aarch64_<PERMUTE:perm_insn><PERMUTE:perm_hilo><mode>): Use VALL_F16.
>         (aarch64_ext<mode>): Likewise.
>         (aarch64_rev<REVERSE:rev_op><mode>): Likewise.
>         * config/aarch64/aarch64.c (aarch64_evpc_trn): Support
> V4HFmode and V8HFmode.
>         (aarch64_evpc_uzp): Likewise.
>         (aarch64_evpc_zip): Likewise.
>         (aarch64_evpc_ext): Likewise.
>         (aarch64_evpc_rev): Likewise.
>         * config/aarch64/arm_neon.h (__aarch64_vdup_lane_f16): New.
>         (__aarch64_vdup_laneq_f16): New.
>         (__aarch64_vdupq_lane_f16): New.
>         (__aarch64_vdupq_laneq_f16): New.
>         (vbsl_f16): New.
>         (vbslq_f16): New.
>         (vdup_n_f16): New.
>         (vdupq_n_f16): New.
>         (vdup_lane_f16): New.
>         (vdup_laneq_f16): New.
>         (vdupq_lane_f16): New.
>         (vdupq_laneq_f16): New.
>         (vduph_lane_f16): New.
>         (vduph_laneq_f16): New.
>         (vext_f16): New.
>         (vextq_f16): New.
>         (vmov_n_f16): New.
>         (vmovq_n_f16): New.
>         (vrev64_f16): New.
>         (vrev64q_f16): New.
>         (vtrn1_f16): New.
>         (vtrn1q_f16): New.
>         (vtrn2_f16): New.
>         (vtrn2q_f16): New.
>         (vtrn_f16): New.
>         (vtrnq_f16): New.
>         (__INTERLEAVE_LIST): Support float16x4_t, float16x8_t.
>         (vuzp1_f16): New.
>         (vuzp1q_f16): New.
>         (vuzp2_f16): New.
>         (vuzp2q_f16): New.
>         (vzip1_f16): New.
>         (vzip2q_f16): New.
>         (vmov_n_f16): Reimplement using vdup_n_f16.
>         (vmovq_n_f16): Reimplement using vdupq_n_f16.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [AArch64][2/14] ARMv8.2-A FP16 one operand vector intrinsics
  2016-07-07 16:14   ` [AArch64][2/14] ARMv8.2-A FP16 one operand vector intrinsics Jiong Wang
@ 2016-07-20 17:00     ` Jiong Wang
  2016-07-25 11:01       ` James Greenhalgh
  0 siblings, 1 reply; 32+ messages in thread
From: Jiong Wang @ 2016-07-20 17:00 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 4119 bytes --]

On 07/07/16 17:14, Jiong Wang wrote:
> This patch adds ARMv8.2-A FP16 one operand vector intrinsics.
>
> We introduced new mode iterators to cover HF modes; qualified patterns
> which were using the old mode iterators are switched to the new ones.
>
> We can't simply extend an old iterator like VDQF to cover HF modes,
> because not all patterns using VDQF have the new FP16 support.  Thus we
> introduced new, temporary iterators, and only apply the new iterators to
> those patterns which do have FP16 support.

I noticed the patchset at

   https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00308.html

has some modifications to the standard names "div" and "sqrt", so there
are minor conflicts, as this patch touches "sqrt" as well.

This patch resolves the conflict; the change is to let
aarch64_emit_approx_sqrt simply return false for V4HFmode and V8HFmode.
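
At the source level, a minimal sketch of the effect (assuming
-march=armv8.2-a+fp16; the function name is only for illustration):

  #include <arm_neon.h>

  float16x4_t
  sqrt_v4hf (float16x4_t a)
  {
    /* Goes through the sqrt<mode>2 expander; with the approximate
       sequence declined for HF modes, this emits fsqrt v0.4h, v0.4h.  */
    return vsqrt_f16 (a);
  }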

gcc/
2016-07-20  Jiong Wang  <jiong.wang@arm.com>

         * config/aarch64/aarch64-builtins.c (TYPES_BINOP_USS): New.
         * config/aarch64/aarch64-simd-builtins.def: Register new builtins.
         * config/aarch64/aarch64-simd.md (aarch64_rsqrte<mode>): Extend to HF modes.
         (neg<mode>2): Likewise.
         (abs<mode>2): Likewise.
         (<frint_pattern><mode>2): Likewise.
         (l<fcvt_pattern><su_optab><VDQF:mode><fcvt_target>2): Likewise.
         (<optab><VDQF:mode><fcvt_target>2): Likewise.
         (<fix_trunc_optab><VDQF:mode><fcvt_target>2): Likewise.
         (ftrunc<VDQF:mode>2): Likewise.
         (<optab><fcvt_target><VDQF:mode>2): Likewise.
         (sqrt<mode>2): Likewise.
         (*sqrt<mode>2): Likewise.
         (aarch64_frecpe<mode>): Likewise.
         (aarch64_cm<optab><mode>): Likewise.
         * config/aarch64/aarch64.c (aarch64_emit_approx_sqrt): Return
         false for V4HF and V8HF.
         * config/aarch64/iterators.md (VHSDF, VHSDF_DF, VHSDF_SDF): New.
         (VDQF_COND, fcvt_target, FCVT_TARGET, hcon): Extend mode attributes to HF modes.
         (stype): New.
         * config/aarch64/arm_neon.h (vdup_n_f16): New.
         (vdupq_n_f16): Likewise.
         (vld1_dup_f16): Use vdup_n_f16.
         (vld1q_dup_f16): Use vdupq_n_f16.
         (vabs_f16): New.
         (vabsq_f16): Likewise.
         (vceqz_f16): Likewise.
         (vceqzq_f16): Likewise.
         (vcgez_f16): Likewise.
         (vcgezq_f16): Likewise.
         (vcgtz_f16): Likewise.
         (vcgtzq_f16): Likewise.
         (vclez_f16): Likewise.
         (vclezq_f16): Likewise.
         (vcltz_f16): Likewise.
         (vcltzq_f16): Likewise.
         (vcvt_f16_s16): Likewise.
         (vcvtq_f16_s16): Likewise.
         (vcvt_f16_u16): Likewise.
         (vcvtq_f16_u16): Likewise.
         (vcvt_s16_f16): Likewise.
         (vcvtq_s16_f16): Likewise.
         (vcvt_u16_f16): Likewise.
         (vcvtq_u16_f16): Likewise.
         (vcvta_s16_f16): Likewise.
         (vcvtaq_s16_f16): Likewise.
         (vcvta_u16_f16): Likewise.
         (vcvtaq_u16_f16): Likewise.
         (vcvtm_s16_f16): Likewise.
         (vcvtmq_s16_f16): Likewise.
         (vcvtm_u16_f16): Likewise.
         (vcvtmq_u16_f16): Likewise.
         (vcvtn_s16_f16): Likewise.
         (vcvtnq_s16_f16): Likewise.
         (vcvtn_u16_f16): Likewise.
         (vcvtnq_u16_f16): Likewise.
         (vcvtp_s16_f16): Likewise.
         (vcvtpq_s16_f16): Likewise.
         (vcvtp_u16_f16): Likewise.
         (vcvtpq_u16_f16): Likewise.
         (vneg_f16): Likewise.
         (vnegq_f16): Likewise.
         (vrecpe_f16): Likewise.
         (vrecpeq_f16): Likewise.
         (vrnd_f16): Likewise.
         (vrndq_f16): Likewise.
         (vrnda_f16): Likewise.
         (vrndaq_f16): Likewise.
         (vrndi_f16): Likewise.
         (vrndiq_f16): Likewise.
         (vrndm_f16): Likewise.
         (vrndmq_f16): Likewise.
         (vrndn_f16): Likewise.
         (vrndnq_f16): Likewise.
         (vrndp_f16): Likewise.
         (vrndpq_f16): Likewise.
         (vrndx_f16): Likewise.
         (vrndxq_f16): Likewise.
         (vrsqrte_f16): Likewise.
         (vrsqrteq_f16): Likewise.
         (vsqrt_f16): Likewise.
         (vsqrtq_f16): Likewise.


[-- Attachment #2: upate-2.patch --]
[-- Type: text/x-patch, Size: 27312 bytes --]

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 6b90b2af5e9d2b5e7f48569ec1ebcb0ef16314ee..af5fac5b29cf5373561d9bf9a69c401d2bec5cec 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -139,6 +139,10 @@ aarch64_types_binop_ssu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_unsigned };
 #define TYPES_BINOP_SSU (aarch64_types_binop_ssu_qualifiers)
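+/* An unsigned result from two signed (here floating-point) operands,
+   as used by the vector compare builtins, which return a mask.  */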
 static enum aarch64_type_qualifiers
+aarch64_types_binop_uss_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_none, qualifier_none };
+#define TYPES_BINOP_USS (aarch64_types_binop_uss_qualifiers)
+static enum aarch64_type_qualifiers
 aarch64_types_binopp_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_poly, qualifier_poly, qualifier_poly };
 #define TYPES_BINOPP (aarch64_types_binopp_qualifiers)
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index f1ad325f464f89c981cbdee8a8f6afafa938639a..22c87be429ba1aac2bbe77f1119d16b6b8bd6e80 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -42,7 +42,7 @@
   BUILTIN_VDC (COMBINE, combine, 0)
   BUILTIN_VB (BINOP, pmul, 0)
   BUILTIN_VALLF (BINOP, fmulx, 0)
-  BUILTIN_VDQF_DF (UNOP, sqrt, 2)
+  BUILTIN_VHSDF_DF (UNOP, sqrt, 2)
   BUILTIN_VD_BHSI (BINOP, addp, 0)
   VAR1 (UNOP, addp, 0, di)
   BUILTIN_VDQ_BHSI (UNOP, clrsb, 2)
@@ -266,23 +266,29 @@
   BUILTIN_VDQF (BINOP, smin_nanp, 0)
 
   /* Implemented by <frint_pattern><mode>2.  */
-  BUILTIN_VDQF (UNOP, btrunc, 2)
-  BUILTIN_VDQF (UNOP, ceil, 2)
-  BUILTIN_VDQF (UNOP, floor, 2)
-  BUILTIN_VDQF (UNOP, nearbyint, 2)
-  BUILTIN_VDQF (UNOP, rint, 2)
-  BUILTIN_VDQF (UNOP, round, 2)
-  BUILTIN_VDQF_DF (UNOP, frintn, 2)
+  BUILTIN_VHSDF (UNOP, btrunc, 2)
+  BUILTIN_VHSDF (UNOP, ceil, 2)
+  BUILTIN_VHSDF (UNOP, floor, 2)
+  BUILTIN_VHSDF (UNOP, nearbyint, 2)
+  BUILTIN_VHSDF (UNOP, rint, 2)
+  BUILTIN_VHSDF (UNOP, round, 2)
+  BUILTIN_VHSDF_DF (UNOP, frintn, 2)
 
   /* Implemented by l<fcvt_pattern><su_optab><VQDF:mode><vcvt_target>2.  */
+  VAR1 (UNOP, lbtruncv4hf, 2, v4hi)
+  VAR1 (UNOP, lbtruncv8hf, 2, v8hi)
   VAR1 (UNOP, lbtruncv2sf, 2, v2si)
   VAR1 (UNOP, lbtruncv4sf, 2, v4si)
   VAR1 (UNOP, lbtruncv2df, 2, v2di)
 
+  VAR1 (UNOPUS, lbtruncuv4hf, 2, v4hi)
+  VAR1 (UNOPUS, lbtruncuv8hf, 2, v8hi)
   VAR1 (UNOPUS, lbtruncuv2sf, 2, v2si)
   VAR1 (UNOPUS, lbtruncuv4sf, 2, v4si)
   VAR1 (UNOPUS, lbtruncuv2df, 2, v2di)
 
+  VAR1 (UNOP, lroundv4hf, 2, v4hi)
+  VAR1 (UNOP, lroundv8hf, 2, v8hi)
   VAR1 (UNOP, lroundv2sf, 2, v2si)
   VAR1 (UNOP, lroundv4sf, 2, v4si)
   VAR1 (UNOP, lroundv2df, 2, v2di)
@@ -290,38 +296,52 @@
   VAR1 (UNOP, lroundsf, 2, si)
   VAR1 (UNOP, lrounddf, 2, di)
 
+  VAR1 (UNOPUS, lrounduv4hf, 2, v4hi)
+  VAR1 (UNOPUS, lrounduv8hf, 2, v8hi)
   VAR1 (UNOPUS, lrounduv2sf, 2, v2si)
   VAR1 (UNOPUS, lrounduv4sf, 2, v4si)
   VAR1 (UNOPUS, lrounduv2df, 2, v2di)
   VAR1 (UNOPUS, lroundusf, 2, si)
   VAR1 (UNOPUS, lroundudf, 2, di)
 
+  VAR1 (UNOP, lceilv4hf, 2, v4hi)
+  VAR1 (UNOP, lceilv8hf, 2, v8hi)
   VAR1 (UNOP, lceilv2sf, 2, v2si)
   VAR1 (UNOP, lceilv4sf, 2, v4si)
   VAR1 (UNOP, lceilv2df, 2, v2di)
 
+  VAR1 (UNOPUS, lceiluv4hf, 2, v4hi)
+  VAR1 (UNOPUS, lceiluv8hf, 2, v8hi)
   VAR1 (UNOPUS, lceiluv2sf, 2, v2si)
   VAR1 (UNOPUS, lceiluv4sf, 2, v4si)
   VAR1 (UNOPUS, lceiluv2df, 2, v2di)
   VAR1 (UNOPUS, lceilusf, 2, si)
   VAR1 (UNOPUS, lceiludf, 2, di)
 
+  VAR1 (UNOP, lfloorv4hf, 2, v4hi)
+  VAR1 (UNOP, lfloorv8hf, 2, v8hi)
   VAR1 (UNOP, lfloorv2sf, 2, v2si)
   VAR1 (UNOP, lfloorv4sf, 2, v4si)
   VAR1 (UNOP, lfloorv2df, 2, v2di)
 
+  VAR1 (UNOPUS, lflooruv4hf, 2, v4hi)
+  VAR1 (UNOPUS, lflooruv8hf, 2, v8hi)
   VAR1 (UNOPUS, lflooruv2sf, 2, v2si)
   VAR1 (UNOPUS, lflooruv4sf, 2, v4si)
   VAR1 (UNOPUS, lflooruv2df, 2, v2di)
   VAR1 (UNOPUS, lfloorusf, 2, si)
   VAR1 (UNOPUS, lfloorudf, 2, di)
 
+  VAR1 (UNOP, lfrintnv4hf, 2, v4hi)
+  VAR1 (UNOP, lfrintnv8hf, 2, v8hi)
   VAR1 (UNOP, lfrintnv2sf, 2, v2si)
   VAR1 (UNOP, lfrintnv4sf, 2, v4si)
   VAR1 (UNOP, lfrintnv2df, 2, v2di)
   VAR1 (UNOP, lfrintnsf, 2, si)
   VAR1 (UNOP, lfrintndf, 2, di)
 
+  VAR1 (UNOPUS, lfrintnuv4hf, 2, v4hi)
+  VAR1 (UNOPUS, lfrintnuv8hf, 2, v8hi)
   VAR1 (UNOPUS, lfrintnuv2sf, 2, v2si)
   VAR1 (UNOPUS, lfrintnuv4sf, 2, v4si)
   VAR1 (UNOPUS, lfrintnuv2df, 2, v2di)
@@ -329,10 +349,14 @@
   VAR1 (UNOPUS, lfrintnudf, 2, di)
 
   /* Implemented by <optab><fcvt_target><VDQF:mode>2.  */
+  VAR1 (UNOP, floatv4hi, 2, v4hf)
+  VAR1 (UNOP, floatv8hi, 2, v8hf)
   VAR1 (UNOP, floatv2si, 2, v2sf)
   VAR1 (UNOP, floatv4si, 2, v4sf)
   VAR1 (UNOP, floatv2di, 2, v2df)
 
+  VAR1 (UNOP, floatunsv4hi, 2, v4hf)
+  VAR1 (UNOP, floatunsv8hi, 2, v8hf)
   VAR1 (UNOP, floatunsv2si, 2, v2sf)
   VAR1 (UNOP, floatunsv4si, 2, v4sf)
   VAR1 (UNOP, floatunsv2di, 2, v2df)
@@ -358,13 +382,13 @@
 
   BUILTIN_VDQ_SI (UNOP, urecpe, 0)
 
-  BUILTIN_VDQF (UNOP, frecpe, 0)
+  BUILTIN_VHSDF (UNOP, frecpe, 0)
   BUILTIN_VDQF (BINOP, frecps, 0)
 
   /* Implemented by a mixture of abs2 patterns.  Note the DImode builtin is
      only ever used for the int64x1_t intrinsic, there is no scalar version.  */
   BUILTIN_VSDQ_I_DI (UNOP, abs, 0)
-  BUILTIN_VDQF (UNOP, abs, 2)
+  BUILTIN_VHSDF (UNOP, abs, 2)
 
   BUILTIN_VQ_HSF (UNOP, vec_unpacks_hi_, 10)
   VAR1 (BINOP, float_truncate_hi_, 0, v4sf)
@@ -457,7 +481,7 @@
   BUILTIN_VALLF (SHIFTIMM_USS, fcvtzu, 3)
 
   /* Implemented by aarch64_rsqrte<mode>.  */
-  BUILTIN_VALLF (UNOP, rsqrte, 0)
+  BUILTIN_VHSDF_SDF (UNOP, rsqrte, 0)
 
   /* Implemented by aarch64_rsqrts<mode>.  */
   BUILTIN_VALLF (BINOP, rsqrts, 0)
@@ -467,3 +491,13 @@
 
   /* Implemented by aarch64_faddp<mode>.  */
   BUILTIN_VDQF (BINOP, faddp, 0)
+
+  /* Implemented by aarch64_cm<optab><mode>.  */
+  BUILTIN_VHSDF_SDF (BINOP_USS, cmeq, 0)
+  BUILTIN_VHSDF_SDF (BINOP_USS, cmge, 0)
+  BUILTIN_VHSDF_SDF (BINOP_USS, cmgt, 0)
+  BUILTIN_VHSDF_SDF (BINOP_USS, cmle, 0)
+  BUILTIN_VHSDF_SDF (BINOP_USS, cmlt, 0)
+
+  /* Implemented by neg<mode>2.  */
+  BUILTIN_VHSDF (UNOP, neg, 2)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 251ad972a4bed027f8c77946fb21ce8d94dc3035..8e922e697d2b1a5ab2e09974429a788731a4dcc5 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -383,12 +383,12 @@
 )
 
 (define_insn "aarch64_rsqrte<mode>"
-  [(set (match_operand:VALLF 0 "register_operand" "=w")
-	(unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w")]
+  [(set (match_operand:VHSDF_SDF 0 "register_operand" "=w")
+	(unspec:VHSDF_SDF [(match_operand:VHSDF_SDF 1 "register_operand" "w")]
 		     UNSPEC_RSQRTE))]
   "TARGET_SIMD"
   "frsqrte\\t%<v>0<Vmtype>, %<v>1<Vmtype>"
-  [(set_attr "type" "neon_fp_rsqrte_<Vetype><q>")])
+  [(set_attr "type" "neon_fp_rsqrte_<stype><q>")])
 
 (define_insn "aarch64_rsqrts<mode>"
   [(set (match_operand:VALLF 0 "register_operand" "=w")
@@ -1565,19 +1565,19 @@
 )
 
 (define_insn "neg<mode>2"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
-       (neg:VDQF (match_operand:VDQF 1 "register_operand" "w")))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+       (neg:VHSDF (match_operand:VHSDF 1 "register_operand" "w")))]
  "TARGET_SIMD"
  "fneg\\t%0.<Vtype>, %1.<Vtype>"
-  [(set_attr "type" "neon_fp_neg_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_neg_<stype><q>")]
 )
 
 (define_insn "abs<mode>2"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
-       (abs:VDQF (match_operand:VDQF 1 "register_operand" "w")))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+       (abs:VHSDF (match_operand:VHSDF 1 "register_operand" "w")))]
  "TARGET_SIMD"
  "fabs\\t%0.<Vtype>, %1.<Vtype>"
-  [(set_attr "type" "neon_fp_abs_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_abs_<stype><q>")]
 )
 
 (define_insn "fma<mode>4"
@@ -1735,24 +1735,24 @@
 ;; Vector versions of the floating-point frint patterns.
 ;; Expands to btrunc, ceil, floor, nearbyint, rint, round, frintn.
 (define_insn "<frint_pattern><mode>2"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-	(unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")]
-		      FRINT))]
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+	(unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")]
+		       FRINT))]
   "TARGET_SIMD"
   "frint<frint_suffix>\\t%0.<Vtype>, %1.<Vtype>"
-  [(set_attr "type" "neon_fp_round_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_round_<stype><q>")]
 )
 
 ;; Vector versions of the fcvt standard patterns.
 ;; Expands to lbtrunc, lround, lceil, lfloor
-(define_insn "l<fcvt_pattern><su_optab><VDQF:mode><fcvt_target>2"
+(define_insn "l<fcvt_pattern><su_optab><VHSDF:mode><fcvt_target>2"
   [(set (match_operand:<FCVT_TARGET> 0 "register_operand" "=w")
 	(FIXUORS:<FCVT_TARGET> (unspec:<FCVT_TARGET>
-			       [(match_operand:VDQF 1 "register_operand" "w")]
+			       [(match_operand:VHSDF 1 "register_operand" "w")]
 			       FCVT)))]
   "TARGET_SIMD"
   "fcvt<frint_suffix><su>\\t%0.<Vtype>, %1.<Vtype>"
-  [(set_attr "type" "neon_fp_to_int_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_to_int_<stype><q>")]
 )
 
 (define_insn "*aarch64_fcvt<su_optab><VDQF:mode><fcvt_target>2_mult"
@@ -1775,36 +1775,36 @@
   [(set_attr "type" "neon_fp_to_int_<Vetype><q>")]
 )
 
-(define_expand "<optab><VDQF:mode><fcvt_target>2"
+(define_expand "<optab><VHSDF:mode><fcvt_target>2"
   [(set (match_operand:<FCVT_TARGET> 0 "register_operand")
 	(FIXUORS:<FCVT_TARGET> (unspec:<FCVT_TARGET>
-			       [(match_operand:VDQF 1 "register_operand")]
-			       UNSPEC_FRINTZ)))]
+			       [(match_operand:VHSDF 1 "register_operand")]
+				UNSPEC_FRINTZ)))]
   "TARGET_SIMD"
   {})
 
-(define_expand "<fix_trunc_optab><VDQF:mode><fcvt_target>2"
+(define_expand "<fix_trunc_optab><VHSDF:mode><fcvt_target>2"
   [(set (match_operand:<FCVT_TARGET> 0 "register_operand")
 	(FIXUORS:<FCVT_TARGET> (unspec:<FCVT_TARGET>
-			       [(match_operand:VDQF 1 "register_operand")]
-			       UNSPEC_FRINTZ)))]
+			       [(match_operand:VHSDF 1 "register_operand")]
+				UNSPEC_FRINTZ)))]
   "TARGET_SIMD"
   {})
 
-(define_expand "ftrunc<VDQF:mode>2"
-  [(set (match_operand:VDQF 0 "register_operand")
-	(unspec:VDQF [(match_operand:VDQF 1 "register_operand")]
-		      UNSPEC_FRINTZ))]
+(define_expand "ftrunc<VHSDF:mode>2"
+  [(set (match_operand:VHSDF 0 "register_operand")
+	(unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")]
+		       UNSPEC_FRINTZ))]
   "TARGET_SIMD"
   {})
 
-(define_insn "<optab><fcvt_target><VDQF:mode>2"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-	(FLOATUORS:VDQF
+(define_insn "<optab><fcvt_target><VHSDF:mode>2"
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+	(FLOATUORS:VHSDF
 	  (match_operand:<FCVT_TARGET> 1 "register_operand" "w")))]
   "TARGET_SIMD"
   "<su_optab>cvtf\\t%0.<Vtype>, %1.<Vtype>"
-  [(set_attr "type" "neon_int_to_fp_<Vetype><q>")]
+  [(set_attr "type" "neon_int_to_fp_<stype><q>")]
 )
 
 ;; Conversions between vectors of floats and doubles.
@@ -4296,14 +4296,14 @@
   [(set (match_operand:<V_cmp_result> 0 "register_operand" "=w,w")
 	(neg:<V_cmp_result>
 	  (COMPARISONS:<V_cmp_result>
-	    (match_operand:VALLF 1 "register_operand" "w,w")
-	    (match_operand:VALLF 2 "aarch64_simd_reg_or_zero" "w,YDz")
+	    (match_operand:VHSDF_SDF 1 "register_operand" "w,w")
+	    (match_operand:VHSDF_SDF 2 "aarch64_simd_reg_or_zero" "w,YDz")
 	  )))]
   "TARGET_SIMD"
   "@
   fcm<n_optab>\t%<v>0<Vmtype>, %<v><cmp_1><Vmtype>, %<v><cmp_2><Vmtype>
   fcm<optab>\t%<v>0<Vmtype>, %<v>1<Vmtype>, 0"
-  [(set_attr "type" "neon_fp_compare_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_compare_<stype><q>")]
 )
 
 ;; fac(ge|gt)
@@ -4348,8 +4348,8 @@
 ;; sqrt
 
 (define_expand "sqrt<mode>2"
-  [(set (match_operand:VDQF 0 "register_operand")
-	(sqrt:VDQF (match_operand:VDQF 1 "register_operand")))]
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+	(sqrt:VHSDF (match_operand:VHSDF 1 "register_operand" "w")))]
   "TARGET_SIMD"
 {
   if (aarch64_emit_approx_sqrt (operands[0], operands[1], false))
@@ -4357,11 +4357,11 @@
 })
 
 (define_insn "*sqrt<mode>2"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-        (sqrt:VDQF (match_operand:VDQF 1 "register_operand" "w")))]
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+	(sqrt:VHSDF (match_operand:VHSDF 1 "register_operand" "w")))]
   "TARGET_SIMD"
   "fsqrt\\t%0.<Vtype>, %1.<Vtype>"
-  [(set_attr "type" "neon_fp_sqrt_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_sqrt_<stype><q>")]
 )
 
 ;; Patterns for vector struct loads and stores.
@@ -5413,12 +5413,12 @@
 )
 
 (define_insn "aarch64_frecpe<mode>"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-	(unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")]
-		    UNSPEC_FRECPE))]
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+	(unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")]
+	 UNSPEC_FRECPE))]
   "TARGET_SIMD"
   "frecpe\\t%0.<Vtype>, %1.<Vtype>"
-  [(set_attr "type" "neon_fp_recpe_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_recpe_<stype><q>")]
 )
 
 (define_insn "aarch64_frecp<FRECP:frecp_suffix><mode>"
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 58a9d695c0ef9e6e1d67030580428699aba05be4..5ed633542efe58763d68fd9bfbb478ae6ef569c3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7598,6 +7598,10 @@ bool
 aarch64_emit_approx_sqrt (rtx dst, rtx src, bool recp)
 {
   machine_mode mode = GET_MODE (dst);
+
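+  /* The approximate sequence is not implemented for the FP16 vector
+     modes; returning false makes the expander fall back to the plain
+     fsqrt pattern.  */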
+  if (mode == V4HFmode || mode == V8HFmode)
+    return false;
+
   machine_mode mmsk = mode_for_vector
 		        (int_mode_for_mode (GET_MODE_INNER (mode)),
 			 GET_MODE_NUNITS (mode));
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index fd5f094de6a058065e2b1377f5ffc4c1aba01f97..b4310f27aac08ab6ff5e89d58512dafc389b2c37 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -26028,6 +26028,365 @@ __INTERLEAVE_LIST (zip)
 
 /* End of optimal implementations in approved order.  */
 
+#pragma GCC pop_options
+
+/* ARMv8.2-A FP16 intrinsics.  */
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+fp16")
+
+/* ARMv8.2-A FP16 one operand vector intrinsics.  */
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vabs_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_absv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vabsq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_absv8hf (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vceqz_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_cmeqv4hf_uss (__a, vdup_n_f16 (0.0f));
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vceqzq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_cmeqv8hf_uss (__a, vdupq_n_f16 (0.0f));
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcgez_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_cmgev4hf_uss (__a, vdup_n_f16 (0.0f));
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcgezq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_cmgev8hf_uss (__a, vdupq_n_f16 (0.0f));
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcgtz_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_cmgtv4hf_uss (__a, vdup_n_f16 (0.0f));
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcgtzq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_cmgtv8hf_uss (__a, vdupq_n_f16 (0.0f));
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vclez_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_cmlev4hf_uss (__a, vdup_n_f16 (0.0f));
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vclezq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_cmlev8hf_uss (__a, vdupq_n_f16 (0.0f));
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcltz_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_cmltv4hf_uss (__a, vdup_n_f16 (0.0f));
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcltzq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_cmltv8hf_uss (__a, vdupq_n_f16 (0.0f));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcvt_f16_s16 (int16x4_t __a)
+{
+  return __builtin_aarch64_floatv4hiv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vcvtq_f16_s16 (int16x8_t __a)
+{
+  return __builtin_aarch64_floatv8hiv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcvt_f16_u16 (uint16x4_t __a)
+{
+  return __builtin_aarch64_floatunsv4hiv4hf ((int16x4_t) __a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vcvtq_f16_u16 (uint16x8_t __a)
+{
+  return __builtin_aarch64_floatunsv8hiv8hf ((int16x8_t) __a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcvt_s16_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_lbtruncv4hfv4hi (__a);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcvtq_s16_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_lbtruncv8hfv8hi (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcvt_u16_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_lbtruncuv4hfv4hi_us (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcvtq_u16_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_lbtruncuv8hfv8hi_us (__a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcvta_s16_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_lroundv4hfv4hi (__a);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcvtaq_s16_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_lroundv8hfv8hi (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcvta_u16_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_lrounduv4hfv4hi_us (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcvtaq_u16_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_lrounduv8hfv8hi_us (__a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcvtm_s16_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_lfloorv4hfv4hi (__a);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcvtmq_s16_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_lfloorv8hfv8hi (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcvtm_u16_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_lflooruv4hfv4hi_us (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcvtmq_u16_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_lflooruv8hfv8hi_us (__a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcvtn_s16_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_lfrintnv4hfv4hi (__a);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcvtnq_s16_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_lfrintnv8hfv8hi (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcvtn_u16_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_lfrintnuv4hfv4hi_us (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcvtnq_u16_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_lfrintnuv8hfv8hi_us (__a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcvtp_s16_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_lceilv4hfv4hi (__a);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcvtpq_s16_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_lceilv8hfv8hi (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcvtp_u16_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_lceiluv4hfv4hi_us (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcvtpq_u16_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_lceiluv8hfv8hi_us (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vneg_f16 (float16x4_t __a)
+{
+  return -__a;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vnegq_f16 (float16x8_t __a)
+{
+  return -__a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrecpe_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_frecpev4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrecpeq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_frecpev8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrnd_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_btruncv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_btruncv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrnda_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_roundv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndaq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_roundv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrndi_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_nearbyintv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndiq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_nearbyintv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrndm_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_floorv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndmq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_floorv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrndn_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_frintnv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndnq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_frintnv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrndp_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_ceilv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndpq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_ceilv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrndx_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_rintv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndxq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_rintv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrsqrte_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_rsqrtev4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrsqrteq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_rsqrtev8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vsqrt_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_sqrtv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vsqrtq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_sqrtv8hf (__a);
+}
+
+#pragma GCC pop_options
+
 #undef __aarch64_vget_lane_any
 
 #undef __aarch64_vdup_lane_any
@@ -26084,6 +26443,4 @@ __INTERLEAVE_LIST (zip)
 #undef __aarch64_vdupq_laneq_u32
 #undef __aarch64_vdupq_laneq_u64
 
-#pragma GCC pop_options
-
 #endif
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index e8fbb1281dec2e8f37f58ef2ced792dd62e3b5aa..af5eda9b9f4a80e1309655dcd7798337e1d818eb 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -88,11 +88,20 @@
 ;; Vector Float modes suitable for moving, loading and storing.
 (define_mode_iterator VDQF_F16 [V4HF V8HF V2SF V4SF V2DF])
 
-;; Vector Float modes, barring HF modes.
+;; Vector Float modes.
 (define_mode_iterator VDQF [V2SF V4SF V2DF])
+(define_mode_iterator VHSDF [(V4HF "TARGET_SIMD_F16INST")
+			     (V8HF "TARGET_SIMD_F16INST")
+			     V2SF V4SF V2DF])
 
 ;; Vector Float modes, and DF.
 (define_mode_iterator VDQF_DF [V2SF V4SF V2DF DF])
+(define_mode_iterator VHSDF_DF [(V4HF "TARGET_SIMD_F16INST")
+				(V8HF "TARGET_SIMD_F16INST")
+				V2SF V4SF V2DF DF])
+(define_mode_iterator VHSDF_SDF [(V4HF "TARGET_SIMD_F16INST")
+				 (V8HF "TARGET_SIMD_F16INST")
+				 V2SF V4SF V2DF SF DF])
 
 ;; Vector single Float modes.
 (define_mode_iterator VDQSF [V2SF V4SF])
@@ -366,7 +375,8 @@
 		    (V4HI "") (V8HI "")
 		    (V2SI "") (V4SI  "")
 		    (V2DI "") (V2SF "")
-		    (V4SF "") (V2DF "")])
+		    (V4SF "") (V4HF "")
+		    (V8HF "") (V2DF "")])
 
 ;; For scalar usage of vector/FP registers, narrowing
 (define_mode_attr vn2 [(QI "") (HI "b") (SI "h") (DI "s")
@@ -447,6 +457,16 @@
 			  (QI "b")   (HI "h")
 			  (SI "s")   (DI "d")])
 
+;; Vetype is used everywhere in scheduling type and assembly output,
+;; but the two are not always the same; for example, they differ for
+;; HF modes on some instructions.  stype is defined to represent the
+;; scheduling type more accurately.
+(define_mode_attr stype [(V8QI "b") (V16QI "b") (V4HI "s") (V8HI "s")
+			 (V2SI "s") (V4SI "s") (V2DI "d") (V4HF "s")
+			 (V8HF "s") (V2SF "s") (V4SF "s") (V2DF "d")
+			 (HF "s") (SF "s") (DF "d") (QI "b") (HI "s")
+			 (SI "s") (DI "d")])
+
 ;; Mode-to-bitwise operation type mapping.
 (define_mode_attr Vbtype [(V8QI "8b")  (V16QI "16b")
 			  (V4HI "8b") (V8HI  "16b")
@@ -656,10 +676,14 @@
 
 (define_mode_attr fcvt_target [(V2DF "v2di") (V4SF "v4si") (V2SF "v2si")
 			       (V2DI "v2df") (V4SI "v4sf") (V2SI "v2sf")
-			       (SF "si") (DF "di") (SI "sf") (DI "df")])
+			       (SF "si") (DF "di") (SI "sf") (DI "df")
+			       (V4HF "v4hi") (V8HF "v8hi") (V4HI "v4hf")
+			       (V8HI "v8hf")])
 (define_mode_attr FCVT_TARGET [(V2DF "V2DI") (V4SF "V4SI") (V2SF "V2SI")
 			       (V2DI "V2DF") (V4SI "V4SF") (V2SI "V2SF")
-			       (SF "SI") (DF "DI") (SI "SF") (DI "DF")])
+			       (SF "SI") (DF "DI") (SI "SF") (DI "DF")
+			       (V4HF "V4HI") (V8HF "V8HI") (V4HI "V4HF")
+			       (V8HI "V8HF")])
 
 
 ;; for the inequal width integer to fp conversions
@@ -687,6 +711,7 @@
 ;; the 'x' constraint.  All other modes may use the 'w' constraint.
 (define_mode_attr h_con [(V2SI "w") (V4SI "w")
 			 (V4HI "x") (V8HI "x")
+			 (V4HF "w") (V8HF "w")
 			 (V2SF "w") (V4SF "w")
 			 (V2DF "w") (DF "w")])
 

* Re: [AArch64][7/14] ARMv8.2-A FP16 one operand scalar intrinsics
       [not found]               ` <b6150268-1e2d-3fc6-17c9-7bde47e2534e@foss.arm.com>
@ 2016-07-20 17:01                 ` Jiong Wang
  2016-07-25 11:14                   ` James Greenhalgh
  0 siblings, 1 reply; 32+ messages in thread
From: Jiong Wang @ 2016-07-20 17:01 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 3658 bytes --]

On 07/07/16 17:17, Jiong Wang wrote:
> This patch adds ARMv8.2-A FP16 one operand scalar intrinsics.
>
> Scalar intrinsics are kept in arm_fp16.h instead of arm_neon.h.

The updated patch resolves the conflict with

    https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00308.html

The change is to let aarch64_emit_approx_sqrt return false for HFmode.
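
A minimal usage sketch of the new scalar intrinsics (illustrative only,
not part of the patch; the function name is hypothetical and a toolchain
with this series plus -march=armv8.2-a+fp16 is assumed):

    #include <arm_fp16.h>

    float16_t
    demo_sqrt_round (float16_t x)
    {
      /* vsqrth_f16 always expands to a real fsqrt now that
         aarch64_emit_approx_sqrt declines HFmode; vrndnh_f16
         maps to frintn.  */
      return vrndnh_f16 (vsqrth_f16 (x));
    }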

gcc/
2016-07-20  Jiong Wang<jiong.wang@arm.com>

         * config.gcc (aarch64*-*-*): Install arm_fp16.h.
         * config/aarch64/aarch64-builtins.c (hi_UP): New.
         * config/aarch64/aarch64-simd-builtins.def: Register new builtins.
         * config/aarch64/aarch64-simd.md (aarch64_rsqrte<mode>): Extend to HF mode.
         (aarch64_frecp<FRECP:frecp_suffix><mode>): Likewise.
         (aarch64_cm<optab><mode>): Likewise.
         * config/aarch64/aarch64.md (<frint_pattern><mode>2): Likewise.
         (l<fcvt_pattern><su_optab><GPF:mode><GPI:mode>2): Likewise.
         (fix_trunc<GPF:mode><GPI:mode>2): Likewise.
         (sqrt<mode>2): Likewise.
         (*sqrt<mode>2): Likewise.
         (abs<mode>2): Likewise.
         (<optab><mode>hf2): New pattern for HF mode.
         (<optab>hihf2): Likewise.
         * config/aarch64/aarch64.c (aarch64_emit_approx_sqrt): Return
         false for HF mode.
         * config/aarch64/arm_neon.h: Include arm_fp16.h.
         * config/aarch64/iterators.md (GPF_F16): New.
         (GPI_I16): Likewise.
         (VHSDF_HSDF): Likewise.
         (w1): Support HF mode.
         (w2): Likewise.
         (v): Likewise.
         (s): Likewise.
         (q): Likewise.
         (Vmtype): Likewise.
         (V_cmp_result): Likewise.
         (fcvt_iesize): Likewise.
         (FCVT_IESIZE): Likewise.
         * config/aarch64/arm_fp16.h: New file.
         (vabsh_f16): New.
         (vceqzh_f16): Likewise.
         (vcgezh_f16): Likewise.
         (vcgtzh_f16): Likewise.
         (vclezh_f16): Likewise.
         (vcltzh_f16): Likewise.
         (vcvth_f16_s16): Likewise.
         (vcvth_f16_s32): Likewise.
         (vcvth_f16_s64): Likewise.
         (vcvth_f16_u16): Likewise.
         (vcvth_f16_u32): Likewise.
         (vcvth_f16_u64): Likewise.
         (vcvth_s16_f16): Likewise.
         (vcvth_s32_f16): Likewise.
         (vcvth_s64_f16): Likewise.
         (vcvth_u16_f16): Likewise.
         (vcvth_u32_f16): Likewise.
         (vcvth_u64_f16): Likewise.
         (vcvtah_s16_f16): Likewise.
         (vcvtah_s32_f16): Likewise.
         (vcvtah_s64_f16): Likewise.
         (vcvtah_u16_f16): Likewise.
         (vcvtah_u32_f16): Likewise.
         (vcvtah_u64_f16): Likewise.
         (vcvtmh_s16_f16): Likewise.
         (vcvtmh_s32_f16): Likewise.
         (vcvtmh_s64_f16): Likewise.
         (vcvtmh_u16_f16): Likewise.
         (vcvtmh_u32_f16): Likewise.
         (vcvtmh_u64_f16): Likewise.
         (vcvtnh_s16_f16): Likewise.
         (vcvtnh_s32_f16): Likewise.
         (vcvtnh_s64_f16): Likewise.
         (vcvtnh_u16_f16): Likewise.
         (vcvtnh_u32_f16): Likewise.
         (vcvtnh_u64_f16): Likewise.
         (vcvtph_s16_f16): Likewise.
         (vcvtph_s32_f16): Likewise.
         (vcvtph_s64_f16): Likewise.
         (vcvtph_u16_f16): Likewise.
         (vcvtph_u32_f16): Likewise.
         (vcvtph_u64_f16): Likewise.
         (vnegh_f16): Likewise.
         (vrecpeh_f16): Likewise.
         (vrecpxh_f16): Likewise.
         (vrndh_f16): Likewise.
         (vrndah_f16): Likewise.
         (vrndih_f16): Likewise.
         (vrndmh_f16): Likewise.
         (vrndnh_f16): Likewise.
         (vrndph_f16): Likewise.
         (vrndxh_f16): Likewise.
         (vrsqrteh_f16): Likewise.
         (vsqrth_f16): Likewise.
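
As a quick illustration of the conversion intrinsics listed above (again
a sketch, not from the patch; vcvtah_* rounds to nearest with ties away,
vcvtmh_* rounds toward minus infinity):

    #include <arm_fp16.h>

    int32_t
    demo_convert (float16_t x)
    {
      /* Expected to expand to one fcvtas and one fcvtms.  */
      return vcvtah_s32_f16 (x) + vcvtmh_s32_f16 (x);
    }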


[-- Attachment #2: upate-7.patch --]
[-- Type: text/x-patch, Size: 28465 bytes --]

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 1f75f17877334c2bb61cd16b69539ec7514db8ae..8827dc830d374c2512be5713d6dd143913f53c7d 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -300,7 +300,7 @@ m32c*-*-*)
         ;;
 aarch64*-*-*)
 	cpu_type=aarch64
-	extra_headers="arm_neon.h arm_acle.h"
+	extra_headers="arm_fp16.h arm_neon.h arm_acle.h"
 	c_target_objs="aarch64-c.o"
 	cxx_target_objs="aarch64-c.o"
 	extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o"
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index af5fac5b29cf5373561d9bf9a69c401d2bec5cec..ca91d9108ead3eb83c21ee86d9e6ed44c8f4ad2d 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -62,6 +62,7 @@
 #define si_UP    SImode
 #define sf_UP    SFmode
 #define hi_UP    HImode
+#define hf_UP    HFmode
 #define qi_UP    QImode
 #define UP(X) X##_UP
 
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 363e131327d6be04dd94e664ef839e46f26940e4..6f50d8405d3ee8c4823037bb2022a4f2f08b72fe 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -274,6 +274,14 @@
   BUILTIN_VHSDF (UNOP, round, 2)
   BUILTIN_VHSDF_DF (UNOP, frintn, 2)
 
+  VAR1 (UNOP, btrunc, 2, hf)
+  VAR1 (UNOP, ceil, 2, hf)
+  VAR1 (UNOP, floor, 2, hf)
+  VAR1 (UNOP, frintn, 2, hf)
+  VAR1 (UNOP, nearbyint, 2, hf)
+  VAR1 (UNOP, rint, 2, hf)
+  VAR1 (UNOP, round, 2, hf)
+
   /* Implemented by l<fcvt_pattern><su_optab><VQDF:mode><vcvt_target>2.  */
   VAR1 (UNOP, lbtruncv4hf, 2, v4hi)
   VAR1 (UNOP, lbtruncv8hf, 2, v8hi)
@@ -292,7 +300,8 @@
   VAR1 (UNOP, lroundv2sf, 2, v2si)
   VAR1 (UNOP, lroundv4sf, 2, v4si)
   VAR1 (UNOP, lroundv2df, 2, v2di)
-  /* Implemented by l<fcvt_pattern><su_optab><GPF:mode><GPI:mode>2.  */
+  /* Implemented by l<fcvt_pattern><su_optab><GPF_F16:mode><GPI:mode>2.  */
+  BUILTIN_GPI_I16 (UNOP, lroundhf, 2)
   VAR1 (UNOP, lroundsf, 2, si)
   VAR1 (UNOP, lrounddf, 2, di)
 
@@ -301,6 +310,7 @@
   VAR1 (UNOPUS, lrounduv2sf, 2, v2si)
   VAR1 (UNOPUS, lrounduv4sf, 2, v4si)
   VAR1 (UNOPUS, lrounduv2df, 2, v2di)
+  BUILTIN_GPI_I16 (UNOPUS, lrounduhf, 2)
   VAR1 (UNOPUS, lroundusf, 2, si)
   VAR1 (UNOPUS, lroundudf, 2, di)
 
@@ -309,12 +319,14 @@
   VAR1 (UNOP, lceilv2sf, 2, v2si)
   VAR1 (UNOP, lceilv4sf, 2, v4si)
   VAR1 (UNOP, lceilv2df, 2, v2di)
+  BUILTIN_GPI_I16 (UNOP, lceilhf, 2)
 
   VAR1 (UNOPUS, lceiluv4hf, 2, v4hi)
   VAR1 (UNOPUS, lceiluv8hf, 2, v8hi)
   VAR1 (UNOPUS, lceiluv2sf, 2, v2si)
   VAR1 (UNOPUS, lceiluv4sf, 2, v4si)
   VAR1 (UNOPUS, lceiluv2df, 2, v2di)
+  BUILTIN_GPI_I16 (UNOPUS, lceiluhf, 2)
   VAR1 (UNOPUS, lceilusf, 2, si)
   VAR1 (UNOPUS, lceiludf, 2, di)
 
@@ -323,12 +335,14 @@
   VAR1 (UNOP, lfloorv2sf, 2, v2si)
   VAR1 (UNOP, lfloorv4sf, 2, v4si)
   VAR1 (UNOP, lfloorv2df, 2, v2di)
+  BUILTIN_GPI_I16 (UNOP, lfloorhf, 2)
 
   VAR1 (UNOPUS, lflooruv4hf, 2, v4hi)
   VAR1 (UNOPUS, lflooruv8hf, 2, v8hi)
   VAR1 (UNOPUS, lflooruv2sf, 2, v2si)
   VAR1 (UNOPUS, lflooruv4sf, 2, v4si)
   VAR1 (UNOPUS, lflooruv2df, 2, v2di)
+  BUILTIN_GPI_I16 (UNOPUS, lflooruhf, 2)
   VAR1 (UNOPUS, lfloorusf, 2, si)
   VAR1 (UNOPUS, lfloorudf, 2, di)
 
@@ -337,6 +351,7 @@
   VAR1 (UNOP, lfrintnv2sf, 2, v2si)
   VAR1 (UNOP, lfrintnv4sf, 2, v4si)
   VAR1 (UNOP, lfrintnv2df, 2, v2di)
+  BUILTIN_GPI_I16 (UNOP, lfrintnhf, 2)
   VAR1 (UNOP, lfrintnsf, 2, si)
   VAR1 (UNOP, lfrintndf, 2, di)
 
@@ -345,6 +360,7 @@
   VAR1 (UNOPUS, lfrintnuv2sf, 2, v2si)
   VAR1 (UNOPUS, lfrintnuv4sf, 2, v4si)
   VAR1 (UNOPUS, lfrintnuv2df, 2, v2di)
+  BUILTIN_GPI_I16 (UNOPUS, lfrintnuhf, 2)
   VAR1 (UNOPUS, lfrintnusf, 2, si)
   VAR1 (UNOPUS, lfrintnudf, 2, di)
 
@@ -376,9 +392,9 @@
 
   /* Implemented by
      aarch64_frecp<FRECP:frecp_suffix><mode>.  */
-  BUILTIN_GPF (UNOP, frecpe, 0)
+  BUILTIN_GPF_F16 (UNOP, frecpe, 0)
   BUILTIN_GPF (BINOP, frecps, 0)
-  BUILTIN_GPF (UNOP, frecpx, 0)
+  BUILTIN_GPF_F16 (UNOP, frecpx, 0)
 
   BUILTIN_VDQ_SI (UNOP, urecpe, 0)
 
@@ -389,6 +405,7 @@
      only ever used for the int64x1_t intrinsic, there is no scalar version.  */
   BUILTIN_VSDQ_I_DI (UNOP, abs, 0)
   BUILTIN_VHSDF (UNOP, abs, 2)
+  VAR1 (UNOP, abs, 2, hf)
 
   BUILTIN_VQ_HSF (UNOP, vec_unpacks_hi_, 10)
   VAR1 (BINOP, float_truncate_hi_, 0, v4sf)
@@ -483,7 +500,7 @@
   BUILTIN_VHSDF_SDF (SHIFTIMM_USS, fcvtzu, 3)
 
   /* Implemented by aarch64_rsqrte<mode>.  */
-  BUILTIN_VHSDF_SDF (UNOP, rsqrte, 0)
+  BUILTIN_VHSDF_HSDF (UNOP, rsqrte, 0)
 
   /* Implemented by aarch64_rsqrts<mode>.  */
   BUILTIN_VHSDF_SDF (BINOP, rsqrts, 0)
@@ -495,17 +512,34 @@
   BUILTIN_VHSDF (BINOP, faddp, 0)
 
   /* Implemented by aarch64_cm<optab><mode>.  */
-  BUILTIN_VHSDF_SDF (BINOP_USS, cmeq, 0)
-  BUILTIN_VHSDF_SDF (BINOP_USS, cmge, 0)
-  BUILTIN_VHSDF_SDF (BINOP_USS, cmgt, 0)
-  BUILTIN_VHSDF_SDF (BINOP_USS, cmle, 0)
-  BUILTIN_VHSDF_SDF (BINOP_USS, cmlt, 0)
+  BUILTIN_VHSDF_HSDF (BINOP_USS, cmeq, 0)
+  BUILTIN_VHSDF_HSDF (BINOP_USS, cmge, 0)
+  BUILTIN_VHSDF_HSDF (BINOP_USS, cmgt, 0)
+  BUILTIN_VHSDF_HSDF (BINOP_USS, cmle, 0)
+  BUILTIN_VHSDF_HSDF (BINOP_USS, cmlt, 0)
 
   /* Implemented by neg<mode>2.  */
-  BUILTIN_VHSDF (UNOP, neg, 2)
+  BUILTIN_VHSDF_HSDF (UNOP, neg, 2)
 
   /* Implemented by aarch64_fac<optab><mode>.  */
   BUILTIN_VHSDF_SDF (BINOP_USS, faclt, 0)
   BUILTIN_VHSDF_SDF (BINOP_USS, facle, 0)
   BUILTIN_VHSDF_SDF (BINOP_USS, facgt, 0)
   BUILTIN_VHSDF_SDF (BINOP_USS, facge, 0)
+
+  /* Implemented by sqrt<mode>2.  */
+  VAR1 (UNOP, sqrt, 2, hf)
+
+  /* Implemented by <optab><mode>hf2.  */
+  VAR1 (UNOP, floatdi, 2, hf)
+  VAR1 (UNOP, floatsi, 2, hf)
+  VAR1 (UNOP, floathi, 2, hf)
+  VAR1 (UNOPUS, floatunsdi, 2, hf)
+  VAR1 (UNOPUS, floatunssi, 2, hf)
+  VAR1 (UNOPUS, floatunshi, 2, hf)
+  BUILTIN_GPI_I16 (UNOP, fix_trunchf, 2)
+  BUILTIN_GPI (UNOP, fix_truncsf, 2)
+  BUILTIN_GPI (UNOP, fix_truncdf, 2)
+  BUILTIN_GPI_I16 (UNOPUS, fixuns_trunchf, 2)
+  BUILTIN_GPI (UNOPUS, fixuns_truncsf, 2)
+  BUILTIN_GPI (UNOPUS, fixuns_truncdf, 2)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 3270dd8321062f211470e3067860d70470efa8f1..d2a274495204c23d40b6522fe33599e407c71bc4 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -383,8 +383,8 @@
 )
 
 (define_insn "aarch64_rsqrte<mode>"
-  [(set (match_operand:VHSDF_SDF 0 "register_operand" "=w")
-	(unspec:VHSDF_SDF [(match_operand:VHSDF_SDF 1 "register_operand" "w")]
+  [(set (match_operand:VHSDF_HSDF 0 "register_operand" "=w")
+	(unspec:VHSDF_HSDF [(match_operand:VHSDF_HSDF 1 "register_operand" "w")]
 		     UNSPEC_RSQRTE))]
   "TARGET_SIMD"
   "frsqrte\\t%<v>0<Vmtype>, %<v>1<Vmtype>"
@@ -1755,6 +1755,32 @@
   [(set_attr "type" "neon_fp_to_int_<stype><q>")]
 )
 
+;; HF scalar variants of related SIMD instructions.
+(define_insn "l<fcvt_pattern><su_optab>hfhi2"
+  [(set (match_operand:HI 0 "register_operand" "=w")
+	(FIXUORS:HI (unspec:HF [(match_operand:HF 1 "register_operand" "w")]
+		      FCVT)))]
+  "TARGET_SIMD_F16INST"
+  "fcvt<frint_suffix><su>\t%h0, %h1"
+  [(set_attr "type" "neon_fp_to_int_s")]
+)
+
+(define_insn "<optab>_trunchfhi2"
+  [(set (match_operand:HI 0 "register_operand" "=w")
+	(FIXUORS:HI (match_operand:HF 1 "register_operand" "w")))]
+  "TARGET_SIMD_F16INST"
+  "fcvtz<su>\t%h0, %h1"
+  [(set_attr "type" "neon_fp_to_int_s")]
+)
+
+(define_insn "<optab>hihf2"
+  [(set (match_operand:HF 0 "register_operand" "=w")
+	(FLOATUORS:HF (match_operand:HI 1 "register_operand" "w")))]
+  "TARGET_SIMD_F16INST"
+  "<su_optab>cvtf\t%h0, %h1"
+  [(set_attr "type" "neon_int_to_fp_s")]
+)
+
 (define_insn "*aarch64_fcvt<su_optab><VDQF:mode><fcvt_target>2_mult"
   [(set (match_operand:<FCVT_TARGET> 0 "register_operand" "=w")
 	(FIXUORS:<FCVT_TARGET> (unspec:<FCVT_TARGET>
@@ -4297,8 +4323,8 @@
   [(set (match_operand:<V_cmp_result> 0 "register_operand" "=w,w")
 	(neg:<V_cmp_result>
 	  (COMPARISONS:<V_cmp_result>
-	    (match_operand:VHSDF_SDF 1 "register_operand" "w,w")
-	    (match_operand:VHSDF_SDF 2 "aarch64_simd_reg_or_zero" "w,YDz")
+	    (match_operand:VHSDF_HSDF 1 "register_operand" "w,w")
+	    (match_operand:VHSDF_HSDF 2 "aarch64_simd_reg_or_zero" "w,YDz")
 	  )))]
   "TARGET_SIMD"
   "@
@@ -5425,12 +5451,12 @@
 )
 
 (define_insn "aarch64_frecp<FRECP:frecp_suffix><mode>"
-  [(set (match_operand:GPF 0 "register_operand" "=w")
-	(unspec:GPF [(match_operand:GPF 1 "register_operand" "w")]
-		    FRECP))]
+  [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+	(unspec:GPF_F16 [(match_operand:GPF_F16 1 "register_operand" "w")]
+	 FRECP))]
   "TARGET_SIMD"
   "frecp<FRECP:frecp_suffix>\\t%<s>0, %<s>1"
-  [(set_attr "type" "neon_fp_recp<FRECP:frecp_suffix>_<GPF:Vetype><GPF:q>")]
+  [(set_attr "type" "neon_fp_recp<FRECP:frecp_suffix>_<GPF_F16:stype>")]
 )
 
 (define_insn "aarch64_frecps<mode>"
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a7437c04eb936a5e3ebd0bc77eb4afd8c052df28..27866ccd605abec6ea7c9110022f329c9b172ee0 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7599,7 +7599,7 @@ aarch64_emit_approx_sqrt (rtx dst, rtx src, bool recp)
 {
   machine_mode mode = GET_MODE (dst);
 
-  if (mode == V4HFmode || mode == V8HFmode)
+  if (mode == HFmode || mode == V4HFmode || mode == V8HFmode)
     return false;
 
   machine_mode mmsk = mode_for_vector
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index bcb7db086574683597d58a69a0fff9f2723d569f..56ad581da6c85716256f22eafbe432cde486154c 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4450,22 +4450,23 @@
 ;; Expands to btrunc, ceil, floor, nearbyint, rint, round, frintn.
 
 (define_insn "<frint_pattern><mode>2"
-  [(set (match_operand:GPF 0 "register_operand" "=w")
-	(unspec:GPF [(match_operand:GPF 1 "register_operand" "w")]
+  [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+	(unspec:GPF_F16 [(match_operand:GPF_F16 1 "register_operand" "w")]
 	 FRINT))]
   "TARGET_FLOAT"
   "frint<frint_suffix>\\t%<s>0, %<s>1"
-  [(set_attr "type" "f_rint<s>")]
+  [(set_attr "type" "f_rint<stype>")]
 )
 
 ;; frcvt floating-point round to integer and convert standard patterns.
 ;; Expands to lbtrunc, lceil, lfloor, lround.
-(define_insn "l<fcvt_pattern><su_optab><GPF:mode><GPI:mode>2"
+(define_insn "l<fcvt_pattern><su_optab><GPF_F16:mode><GPI:mode>2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
-	(FIXUORS:GPI (unspec:GPF [(match_operand:GPF 1 "register_operand" "w")]
-		      FCVT)))]
+	(FIXUORS:GPI
+	  (unspec:GPF_F16 [(match_operand:GPF_F16 1 "register_operand" "w")]
+	   FCVT)))]
   "TARGET_FLOAT"
-  "fcvt<frint_suffix><su>\\t%<GPI:w>0, %<GPF:s>1"
+  "fcvt<frint_suffix><su>\\t%<GPI:w>0, %<GPF_F16:s>1"
   [(set_attr "type" "f_cvtf2i")]
 )
 
@@ -4593,19 +4594,11 @@
   [(set_attr "type" "f_cvt")]
 )
 
-(define_insn "fix_trunc<GPF:mode><GPI:mode>2"
-  [(set (match_operand:GPI 0 "register_operand" "=r")
-        (fix:GPI (match_operand:GPF 1 "register_operand" "w")))]
-  "TARGET_FLOAT"
-  "fcvtzs\\t%<GPI:w>0, %<GPF:s>1"
-  [(set_attr "type" "f_cvtf2i")]
-)
-
-(define_insn "fixuns_trunc<GPF:mode><GPI:mode>2"
+(define_insn "<optab>_trunc<GPF_F16:mode><GPI:mode>2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
-        (unsigned_fix:GPI (match_operand:GPF 1 "register_operand" "w")))]
+	(FIXUORS:GPI (match_operand:GPF_F16 1 "register_operand" "w")))]
   "TARGET_FLOAT"
-  "fcvtzu\\t%<GPI:w>0, %<GPF:s>1"
+  "fcvtz<su>\t%<GPI:w>0, %<GPF_F16:s>1"
   [(set_attr "type" "f_cvtf2i")]
 )
 
@@ -4629,6 +4622,14 @@
   [(set_attr "type" "f_cvti2f")]
 )
 
+(define_insn "<optab><mode>hf2"
+  [(set (match_operand:HF 0 "register_operand" "=w")
+	(FLOATUORS:HF (match_operand:GPI 1 "register_operand" "r")))]
+  "TARGET_FP_F16INST"
+  "<su_optab>cvtf\t%h0, %<w>1"
+  [(set_attr "type" "f_cvti2f")]
+)
+
 ;; Convert between fixed-point and floating-point (scalar modes)
 
 (define_insn "<FCVT_F2FIXED:fcvt_fixed_insn><GPF:mode>3"
@@ -4735,16 +4736,16 @@
 )
 
 (define_insn "neg<mode>2"
-  [(set (match_operand:GPF 0 "register_operand" "=w")
-        (neg:GPF (match_operand:GPF 1 "register_operand" "w")))]
+  [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+	(neg:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")))]
   "TARGET_FLOAT"
   "fneg\\t%<s>0, %<s>1"
-  [(set_attr "type" "ffarith<s>")]
+  [(set_attr "type" "ffarith<stype>")]
 )
 
 (define_expand "sqrt<mode>2"
-  [(set (match_operand:GPF 0 "register_operand")
-        (sqrt:GPF (match_operand:GPF 1 "register_operand")))]
+  [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+	(sqrt:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")))]
   "TARGET_FLOAT"
 {
   if (aarch64_emit_approx_sqrt (operands[0], operands[1], false))
@@ -4752,19 +4753,19 @@
 })
 
 (define_insn "*sqrt<mode>2"
-  [(set (match_operand:GPF 0 "register_operand" "=w")
-        (sqrt:GPF (match_operand:GPF 1 "register_operand" "w")))]
+  [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+	(sqrt:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")))]
   "TARGET_FLOAT"
   "fsqrt\\t%<s>0, %<s>1"
-  [(set_attr "type" "fsqrt<s>")]
+  [(set_attr "type" "fsqrt<stype>")]
 )
 
 (define_insn "abs<mode>2"
-  [(set (match_operand:GPF 0 "register_operand" "=w")
-        (abs:GPF (match_operand:GPF 1 "register_operand" "w")))]
+  [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+	(abs:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")))]
   "TARGET_FLOAT"
   "fabs\\t%<s>0, %<s>1"
-  [(set_attr "type" "ffarith<s>")]
+  [(set_attr "type" "ffarith<stype>")]
 )
 
 ;; Given that smax/smin do not specify the result when either input is NaN,
diff --git a/gcc/config/aarch64/arm_fp16.h b/gcc/config/aarch64/arm_fp16.h
new file mode 100644
index 0000000000000000000000000000000000000000..818aa61925b6c78ec93149b391a562bd1aea0b50
--- /dev/null
+++ b/gcc/config/aarch64/arm_fp16.h
@@ -0,0 +1,365 @@
+/* ARM FP16 scalar intrinsics include file.
+
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   Contributed by ARM Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _AARCH64_FP16_H_
+#define _AARCH64_FP16_H_
+
+#include <stdint.h>
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+fp16")
+
+typedef __fp16 float16_t;
+
+/* ARMv8.2-A FP16 one operand scalar intrinsics.  */
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vabsh_f16 (float16_t __a)
+{
+  return __builtin_aarch64_abshf (__a);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vceqzh_f16 (float16_t __a)
+{
+  return __builtin_aarch64_cmeqhf_uss (__a, 0.0f);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcgezh_f16 (float16_t __a)
+{
+  return __builtin_aarch64_cmgehf_uss (__a, 0.0f);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcgtzh_f16 (float16_t __a)
+{
+  return __builtin_aarch64_cmgthf_uss (__a, 0.0f);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vclezh_f16 (float16_t __a)
+{
+  return __builtin_aarch64_cmlehf_uss (__a, 0.0f);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcltzh_f16 (float16_t __a)
+{
+  return __builtin_aarch64_cmlthf_uss (__a, 0.0f);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_f16_s16 (int16_t __a)
+{
+  return __builtin_aarch64_floathihf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_f16_s32 (int32_t __a)
+{
+  return __builtin_aarch64_floatsihf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_f16_s64 (int64_t __a)
+{
+  return __builtin_aarch64_floatdihf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_f16_u16 (uint16_t __a)
+{
+  return __builtin_aarch64_floatunshihf_us (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_f16_u32 (uint32_t __a)
+{
+  return __builtin_aarch64_floatunssihf_us (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_f16_u64 (uint64_t __a)
+{
+  return __builtin_aarch64_floatunsdihf_us (__a);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vcvth_s16_f16 (float16_t __a)
+{
+  return __builtin_aarch64_fix_trunchfhi (__a);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvth_s32_f16 (float16_t __a)
+{
+  return __builtin_aarch64_fix_trunchfsi (__a);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+vcvth_s64_f16 (float16_t __a)
+{
+  return __builtin_aarch64_fix_trunchfdi (__a);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcvth_u16_f16 (float16_t __a)
+{
+  return __builtin_aarch64_fixuns_trunchfhi_us (__a);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcvth_u32_f16 (float16_t __a)
+{
+  return __builtin_aarch64_fixuns_trunchfsi_us (__a);
+}
+
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+vcvth_u64_f16 (float16_t __a)
+{
+  return __builtin_aarch64_fixuns_trunchfdi_us (__a);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vcvtah_s16_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lroundhfhi (__a);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvtah_s32_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lroundhfsi (__a);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+vcvtah_s64_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lroundhfdi (__a);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcvtah_u16_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lrounduhfhi_us (__a);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcvtah_u32_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lrounduhfsi_us (__a);
+}
+
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+vcvtah_u64_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lrounduhfdi_us (__a);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vcvtmh_s16_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lfloorhfhi (__a);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvtmh_s32_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lfloorhfsi (__a);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+vcvtmh_s64_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lfloorhfdi (__a);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcvtmh_u16_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lflooruhfhi_us (__a);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcvtmh_u32_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lflooruhfsi_us (__a);
+}
+
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+vcvtmh_u64_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lflooruhfdi_us (__a);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vcvtnh_s16_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lfrintnhfhi (__a);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvtnh_s32_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lfrintnhfsi (__a);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+vcvtnh_s64_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lfrintnhfdi (__a);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcvtnh_u16_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lfrintnuhfhi_us (__a);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcvtnh_u32_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lfrintnuhfsi_us (__a);
+}
+
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+vcvtnh_u64_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lfrintnuhfdi_us (__a);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vcvtph_s16_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lceilhfhi (__a);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvtph_s32_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lceilhfsi (__a);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+vcvtph_s64_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lceilhfdi (__a);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcvtph_u16_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lceiluhfhi_us (__a);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcvtph_u32_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lceiluhfsi_us (__a);
+}
+
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+vcvtph_u64_f16 (float16_t __a)
+{
+  return __builtin_aarch64_lceiluhfdi_us (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vnegh_f16 (float16_t __a)
+{
+  return __builtin_aarch64_neghf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrecpeh_f16 (float16_t __a)
+{
+  return __builtin_aarch64_frecpehf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrecpxh_f16 (float16_t __a)
+{
+  return __builtin_aarch64_frecpxhf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndh_f16 (float16_t __a)
+{
+  return __builtin_aarch64_btrunchf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndah_f16 (float16_t __a)
+{
+  return __builtin_aarch64_roundhf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndih_f16 (float16_t __a)
+{
+  return __builtin_aarch64_nearbyinthf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndmh_f16 (float16_t __a)
+{
+  return __builtin_aarch64_floorhf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndnh_f16 (float16_t __a)
+{
+  return __builtin_aarch64_frintnhf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndph_f16 (float16_t __a)
+{
+  return __builtin_aarch64_ceilhf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndxh_f16 (float16_t __a)
+{
+  return __builtin_aarch64_rinthf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrsqrteh_f16 (float16_t __a)
+{
+  return __builtin_aarch64_rsqrtehf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vsqrth_f16 (float16_t __a)
+{
+  return __builtin_aarch64_sqrthf (__a);
+}
+
+#pragma GCC pop_options
+
+#endif
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 4382efda8c3f2c4f29781c0ad9beb49b94501c47..fd555583b46e5899772ba4a9a2d80ea973895bc5 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -26032,6 +26032,8 @@ __INTERLEAVE_LIST (zip)
 
 /* ARMv8.2-A FP16 intrinsics.  */
 
+#include "arm_fp16.h"
+
 #pragma GCC push_options
 #pragma GCC target ("arch=armv8.2-a+fp16")
 
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 011b937105e477c0044bfb1c549179058bfbea31..20d0f1bf615396e0d662a51e9c5c9895046cd090 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -26,6 +26,9 @@
 ;; Iterator for General Purpose Integer registers (32- and 64-bit modes)
 (define_mode_iterator GPI [SI DI])
 
+;; Iterator for HI, SI and DI; some instructions can only work on these modes.
+(define_mode_iterator GPI_I16 [(HI "AARCH64_ISA_F16") SI DI])
+
 ;; Iterator for QI and HI modes
 (define_mode_iterator SHORT [QI HI])
 
@@ -38,6 +41,9 @@
 ;; Iterator for General Purpose Floating-point registers (32- and 64-bit modes)
 (define_mode_iterator GPF [SF DF])
 
+;; Iterator for all scalar floating point modes (HF, SF, DF)
+(define_mode_iterator GPF_F16 [(HF "AARCH64_ISA_F16") SF DF])
+
 ;; Iterator for all scalar floating point modes (HF, SF, DF and TF)
 (define_mode_iterator GPF_TF_F16 [HF SF DF TF])
 
@@ -102,6 +108,11 @@
 (define_mode_iterator VHSDF_SDF [(V4HF "TARGET_SIMD_F16INST")
 				 (V8HF "TARGET_SIMD_F16INST")
 				 V2SF V4SF V2DF SF DF])
+(define_mode_iterator VHSDF_HSDF [(V4HF "TARGET_SIMD_F16INST")
+				  (V8HF "TARGET_SIMD_F16INST")
+				  V2SF V4SF V2DF
+				  (HF "TARGET_SIMD_F16INST")
+				  SF DF])
 
 ;; Vector single Float modes.
 (define_mode_iterator VDQSF [V2SF V4SF])
@@ -372,8 +383,8 @@
 (define_mode_attr w [(QI "w") (HI "w") (SI "w") (DI "x") (SF "s") (DF "d")])
 
 ;; For inequal width int to float conversion
-(define_mode_attr w1 [(SF "w") (DF "x")])
-(define_mode_attr w2 [(SF "x") (DF "w")])
+(define_mode_attr w1 [(HF "w") (SF "w") (DF "x")])
+(define_mode_attr w2 [(HF "x") (SF "x") (DF "w")])
 
 (define_mode_attr short_mask [(HI "65535") (QI "255")])
 
@@ -385,7 +396,7 @@
 
 ;; For scalar usage of vector/FP registers
 (define_mode_attr v [(QI "b") (HI "h") (SI "s") (DI "d")
-		    (SF "s") (DF "d")
+		    (HF  "h") (SF "s") (DF "d")
 		    (V8QI "") (V16QI "")
 		    (V4HI "") (V8HI "")
 		    (V2SI "") (V4SI  "")
@@ -416,7 +427,7 @@
 (define_mode_attr vas [(DI "") (SI ".2s")])
 
 ;; Map a floating point mode to the appropriate register name prefix
-(define_mode_attr s [(SF "s") (DF "d")])
+(define_mode_attr s [(HF "h") (SF "s") (DF "d")])
 
 ;; Give the length suffix letter for a sign- or zero-extension.
 (define_mode_attr size [(QI "b") (HI "h") (SI "w")])
@@ -452,8 +463,8 @@
 			 (V4SF ".4s") (V2DF ".2d")
 			 (DI   "")    (SI   "")
 			 (HI   "")    (QI   "")
-			 (TI   "")    (SF   "")
-			 (DF   "")])
+			 (TI   "")    (HF   "")
+			 (SF   "")    (DF   "")])
 
 ;; Register suffix narrowed modes for VQN.
 (define_mode_attr Vmntype [(V8HI ".8b") (V4SI ".4h")
@@ -468,6 +479,7 @@
 			  (V2DI "d") (V4HF "h")
 			  (V8HF "h") (V2SF  "s")
 			  (V4SF "s") (V2DF  "d")
+			  (HF   "h")
 			  (SF   "s") (DF  "d")
 			  (QI "b")   (HI "h")
 			  (SI "s")   (DI "d")])
@@ -639,7 +651,7 @@
 				(V4HF "V4HI") (V8HF  "V8HI")
 				(V2SF "V2SI") (V4SF  "V4SI")
 				(V2DF "V2DI") (DF    "DI")
-				(SF   "SI")])
+				(SF   "SI")   (HF    "HI")])
 
 ;; Lower case mode of results of comparison operations.
 (define_mode_attr v_cmp_result [(V8QI "v8qi") (V16QI "v16qi")
@@ -702,8 +714,8 @@
 
 
 ;; for the inequal width integer to fp conversions
-(define_mode_attr fcvt_iesize [(SF "di") (DF "si")])
-(define_mode_attr FCVT_IESIZE [(SF "DI") (DF "SI")])
+(define_mode_attr fcvt_iesize [(HF "di") (SF "di") (DF "si")])
+(define_mode_attr FCVT_IESIZE [(HF "DI") (SF "DI") (DF "SI")])
 
 (define_mode_attr VSWAP_WIDTH [(V8QI "V16QI") (V16QI "V8QI")
 				(V4HI "V8HI") (V8HI  "V4HI")
@@ -757,7 +769,7 @@
 		     (V4HF "") (V8HF "_q")
 		     (V2SF "") (V4SF  "_q")
 			       (V2DF  "_q")
-		     (QI "") (HI "") (SI "") (DI "") (SF "") (DF "")])
+		     (QI "") (HI "") (SI "") (DI "") (HF "") (SF "") (DF "")])
 
 (define_mode_attr vp [(V8QI "v") (V16QI "v")
 		      (V4HI "v") (V8HI  "v")

* Re: [AArch64][3/14] ARMv8.2-A FP16 two operands vector intrinsics
  2016-07-07 16:15     ` [AArch64][3/14] ARMv8.2-A FP16 two operands " Jiong Wang
@ 2016-07-20 17:01       ` Jiong Wang
  2016-07-25 11:03         ` James Greenhalgh
  0 siblings, 1 reply; 32+ messages in thread
From: Jiong Wang @ 2016-07-20 17:01 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 3669 bytes --]

On 07/07/16 17:15, Jiong Wang wrote:
> This patch adds ARMv8.2-A FP16 two operands vector intrinsics.

The updated patch resolves the conflict with

    https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00309.html

The change is to let aarch64_emit_approx_div return false for
V4HFmode and V8HFmode.
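
A minimal sketch of the two-operand vector forms this patch enables
(illustrative only, function name hypothetical; assumes
-march=armv8.2-a+fp16):

    #include <arm_neon.h>

    float16x4_t
    demo_add_max (float16x4_t a, float16x4_t b)
    {
      /* Expected to expand to fadd v.4h followed by fmax v.4h.  */
      return vmax_f16 (vadd_f16 (a, b), b);
    }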

gcc/
2016-07-20  Jiong Wang <jiong.wang@arm.com>

         * config/aarch64/aarch64-simd-builtins.def: Register new builtins.
         * config/aarch64/aarch64-simd.md
         (aarch64_rsqrts<mode>): Extend to HF modes.
         (fabd<mode>3): Likewise.
         (<FCVT_F2FIXED:fcvt_fixed_insn><VHSDF:mode>3): Likewise.
         (<FCVT_FIXED2F:fcvt_fixed_insn><VDQ_HSDI:mode>3): Likewise.
         (aarch64_<maxmin_uns>p<mode>): Likewise.
         (<su><maxmin><mode>3): Likewise.
         (<maxmin_uns><mode>3): Likewise.
         (<fmaxmin><mode>3): Likewise.
         (aarch64_faddp<mode>): Likewise.
         (aarch64_fmulx<mode>): Likewise.
         (aarch64_frecps<mode>): Likewise.
         (*aarch64_fac<optab><mode>): Rename to aarch64_fac<optab><mode>.
         (add<mode>3): Extend to HF modes.
         (sub<mode>3): Likewise.
         (mul<mode>3): Likewise.
         (div<mode>3): Likewise.
         (*div<mode>3): Likewise.
         * config/aarch64/aarch64.c (aarch64_emit_approx_div): Return
         false for V4HF and V8HF.
         * config/aarch64/iterators.md (VDQ_HSDI, VSDQ_HSDI): New mode
         iterators.
         * config/aarch64/arm_neon.h (vadd_f16): New.
         (vaddq_f16): Likewise.
         (vabd_f16): Likewise.
         (vabdq_f16): Likewise.
         (vcage_f16): Likewise.
         (vcageq_f16): Likewise.
         (vcagt_f16): Likewise.
         (vcagtq_f16): Likewise.
         (vcale_f16): Likewise.
         (vcaleq_f16): Likewise.
         (vcalt_f16): Likewise.
         (vcaltq_f16): Likewise.
         (vceq_f16): Likewise.
         (vceqq_f16): Likewise.
         (vcge_f16): Likewise.
         (vcgeq_f16): Likewise.
         (vcgt_f16): Likewise.
         (vcgtq_f16): Likewise.
         (vcle_f16): Likewise.
         (vcleq_f16): Likewise.
         (vclt_f16): Likewise.
         (vcltq_f16): Likewise.
         (vcvt_n_f16_s16): Likewise.
         (vcvtq_n_f16_s16): Likewise.
         (vcvt_n_f16_u16): Likewise.
         (vcvtq_n_f16_u16): Likewise.
         (vcvt_n_s16_f16): Likewise.
         (vcvtq_n_s16_f16): Likewise.
         (vcvt_n_u16_f16): Likewise.
         (vcvtq_n_u16_f16): Likewise.
         (vdiv_f16): Likewise.
         (vdivq_f16): Likewise.
         (vdup_lane_f16): Likewise.
         (vdup_laneq_f16): Likewise.
         (vdupq_lane_f16): Likewise.
         (vdupq_laneq_f16): Likewise.
         (vdups_lane_f16): Likewise.
         (vdups_laneq_f16): Likewise.
         (vmax_f16): Likewise.
         (vmaxq_f16): Likewise.
         (vmaxnm_f16): Likewise.
         (vmaxnmq_f16): Likewise.
         (vmin_f16): Likewise.
         (vminq_f16): Likewise.
         (vminnm_f16): Likewise.
         (vminnmq_f16): Likewise.
         (vmul_f16): Likewise.
         (vmulq_f16): Likewise.
         (vmulx_f16): Likewise.
         (vmulxq_f16): Likewise.
         (vpadd_f16): Likewise.
         (vpaddq_f16): Likewise.
         (vpmax_f16): Likewise.
         (vpmaxq_f16): Likewise.
         (vpmaxnm_f16): Likewise.
         (vpmaxnmq_f16): Likewise.
         (vpmin_f16): Likewise.
         (vpminq_f16): Likewise.
         (vpminnm_f16): Likewise.
         (vpminnmq_f16): Likewise.
         (vrecps_f16): Likewise.
         (vrecpsq_f16): Likewise.
         (vrsqrts_f16): Likewise.
         (vrsqrtsq_f16): Likewise.
         (vsub_f16): Likewise.
         (vsubq_f16): Likewise.
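
The fixed-point conversions take a constant fractional-bit count; an
illustrative sketch (not from the patch; for 16-bit elements the
immediate should be in the range 1-16):

    #include <arm_neon.h>

    int16x4_t
    demo_fixed (float16x4_t v)
    {
      /* Expected to expand to fcvtzs v.4h, #4; the second argument
         must be a compile-time constant.  */
      return vcvt_n_s16_f16 (v, 4);
    }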


[-- Attachment #2: upate-3.patch --]
[-- Type: text/x-patch, Size: 27640 bytes --]

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 22c87be429ba1aac2bbe77f1119d16b6b8bd6e80..007dad60b6999158a1c9c1cf2a501a9f0712af54 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -41,7 +41,7 @@
 
   BUILTIN_VDC (COMBINE, combine, 0)
   BUILTIN_VB (BINOP, pmul, 0)
-  BUILTIN_VALLF (BINOP, fmulx, 0)
+  BUILTIN_VHSDF_SDF (BINOP, fmulx, 0)
   BUILTIN_VHSDF_DF (UNOP, sqrt, 2)
   BUILTIN_VD_BHSI (BINOP, addp, 0)
   VAR1 (UNOP, addp, 0, di)
@@ -248,22 +248,22 @@
   BUILTIN_VDQ_BHSI (BINOP, smin, 3)
   BUILTIN_VDQ_BHSI (BINOP, umax, 3)
   BUILTIN_VDQ_BHSI (BINOP, umin, 3)
-  BUILTIN_VDQF (BINOP, smax_nan, 3)
-  BUILTIN_VDQF (BINOP, smin_nan, 3)
+  BUILTIN_VHSDF (BINOP, smax_nan, 3)
+  BUILTIN_VHSDF (BINOP, smin_nan, 3)
 
   /* Implemented by <fmaxmin><mode>3.  */
-  BUILTIN_VDQF (BINOP, fmax, 3)
-  BUILTIN_VDQF (BINOP, fmin, 3)
+  BUILTIN_VHSDF (BINOP, fmax, 3)
+  BUILTIN_VHSDF (BINOP, fmin, 3)
 
   /* Implemented by aarch64_<maxmin_uns>p<mode>.  */
   BUILTIN_VDQ_BHSI (BINOP, smaxp, 0)
   BUILTIN_VDQ_BHSI (BINOP, sminp, 0)
   BUILTIN_VDQ_BHSI (BINOP, umaxp, 0)
   BUILTIN_VDQ_BHSI (BINOP, uminp, 0)
-  BUILTIN_VDQF (BINOP, smaxp, 0)
-  BUILTIN_VDQF (BINOP, sminp, 0)
-  BUILTIN_VDQF (BINOP, smax_nanp, 0)
-  BUILTIN_VDQF (BINOP, smin_nanp, 0)
+  BUILTIN_VHSDF (BINOP, smaxp, 0)
+  BUILTIN_VHSDF (BINOP, sminp, 0)
+  BUILTIN_VHSDF (BINOP, smax_nanp, 0)
+  BUILTIN_VHSDF (BINOP, smin_nanp, 0)
 
   /* Implemented by <frint_pattern><mode>2.  */
   BUILTIN_VHSDF (UNOP, btrunc, 2)
@@ -383,7 +383,7 @@
   BUILTIN_VDQ_SI (UNOP, urecpe, 0)
 
   BUILTIN_VHSDF (UNOP, frecpe, 0)
-  BUILTIN_VDQF (BINOP, frecps, 0)
+  BUILTIN_VHSDF (BINOP, frecps, 0)
 
   /* Implemented by a mixture of abs2 patterns.  Note the DImode builtin is
      only ever used for the int64x1_t intrinsic, there is no scalar version.  */
@@ -475,22 +475,22 @@
   BUILTIN_VSDQ_HSI (QUADOP_LANE, sqrdmlsh_laneq, 0)
 
   /* Implemented by <FCVT_F2FIXED/FIXED2F:fcvt_fixed_insn><*><*>3.  */
-  BUILTIN_VSDQ_SDI (SHIFTIMM, scvtf, 3)
-  BUILTIN_VSDQ_SDI (FCVTIMM_SUS, ucvtf, 3)
-  BUILTIN_VALLF (SHIFTIMM, fcvtzs, 3)
-  BUILTIN_VALLF (SHIFTIMM_USS, fcvtzu, 3)
+  BUILTIN_VSDQ_HSDI (SHIFTIMM, scvtf, 3)
+  BUILTIN_VSDQ_HSDI (FCVTIMM_SUS, ucvtf, 3)
+  BUILTIN_VHSDF_SDF (SHIFTIMM, fcvtzs, 3)
+  BUILTIN_VHSDF_SDF (SHIFTIMM_USS, fcvtzu, 3)
 
   /* Implemented by aarch64_rsqrte<mode>.  */
   BUILTIN_VHSDF_SDF (UNOP, rsqrte, 0)
 
   /* Implemented by aarch64_rsqrts<mode>.  */
-  BUILTIN_VALLF (BINOP, rsqrts, 0)
+  BUILTIN_VHSDF_SDF (BINOP, rsqrts, 0)
 
   /* Implemented by fabd<mode>3.  */
-  BUILTIN_VALLF (BINOP, fabd, 3)
+  BUILTIN_VHSDF_SDF (BINOP, fabd, 3)
 
   /* Implemented by aarch64_faddp<mode>.  */
-  BUILTIN_VDQF (BINOP, faddp, 0)
+  BUILTIN_VHSDF (BINOP, faddp, 0)
 
   /* Implemented by aarch64_cm<optab><mode>.  */
   BUILTIN_VHSDF_SDF (BINOP_USS, cmeq, 0)
@@ -501,3 +501,9 @@
 
   /* Implemented by neg<mode>2.  */
   BUILTIN_VHSDF (UNOP, neg, 2)
+
+  /* Implemented by aarch64_fac<optab><mode>.  */
+  BUILTIN_VHSDF_SDF (BINOP_USS, faclt, 0)
+  BUILTIN_VHSDF_SDF (BINOP_USS, facle, 0)
+  BUILTIN_VHSDF_SDF (BINOP_USS, facgt, 0)
+  BUILTIN_VHSDF_SDF (BINOP_USS, facge, 0)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 8d895a545672a255da6234d6fafeea51dc92ae3b..ec7ab8669cec217e196e9b3d341119bb5988346c 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -391,13 +391,13 @@
   [(set_attr "type" "neon_fp_rsqrte_<stype><q>")])
 
 (define_insn "aarch64_rsqrts<mode>"
-  [(set (match_operand:VALLF 0 "register_operand" "=w")
-	(unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w")
-	       (match_operand:VALLF 2 "register_operand" "w")]
-		     UNSPEC_RSQRTS))]
+  [(set (match_operand:VHSDF_SDF 0 "register_operand" "=w")
+	(unspec:VHSDF_SDF [(match_operand:VHSDF_SDF 1 "register_operand" "w")
+			   (match_operand:VHSDF_SDF 2 "register_operand" "w")]
+	 UNSPEC_RSQRTS))]
   "TARGET_SIMD"
   "frsqrts\\t%<v>0<Vmtype>, %<v>1<Vmtype>, %<v>2<Vmtype>"
-  [(set_attr "type" "neon_fp_rsqrts_<Vetype><q>")])
+  [(set_attr "type" "neon_fp_rsqrts_<stype><q>")])
 
 (define_expand "rsqrt<mode>2"
   [(set (match_operand:VALLF 0 "register_operand" "=w")
@@ -475,14 +475,14 @@
 )
 
 (define_insn "fabd<mode>3"
-  [(set (match_operand:VALLF 0 "register_operand" "=w")
-	(abs:VALLF
-	  (minus:VALLF
-	    (match_operand:VALLF 1 "register_operand" "w")
-	    (match_operand:VALLF 2 "register_operand" "w"))))]
+  [(set (match_operand:VHSDF_SDF 0 "register_operand" "=w")
+	(abs:VHSDF_SDF
+	  (minus:VHSDF_SDF
+	    (match_operand:VHSDF_SDF 1 "register_operand" "w")
+	    (match_operand:VHSDF_SDF 2 "register_operand" "w"))))]
   "TARGET_SIMD"
   "fabd\t%<v>0<Vmtype>, %<v>1<Vmtype>, %<v>2<Vmtype>"
-  [(set_attr "type" "neon_fp_abd_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_abd_<stype><q>")]
 )
 
 (define_insn "and<mode>3"
@@ -1105,10 +1105,10 @@
 
 ;; Pairwise FP Max/Min operations.
 (define_insn "aarch64_<maxmin_uns>p<mode>"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
-       (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")
-		     (match_operand:VDQF 2 "register_operand" "w")]
-		    FMAXMINV))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+       (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")
+		      (match_operand:VHSDF 2 "register_operand" "w")]
+		      FMAXMINV))]
  "TARGET_SIMD"
  "<maxmin_uns_op>p\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
   [(set_attr "type" "neon_minmax<q>")]
@@ -1517,36 +1517,36 @@
 ;; FP arithmetic operations.
 
 (define_insn "add<mode>3"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
-       (plus:VDQF (match_operand:VDQF 1 "register_operand" "w")
-		  (match_operand:VDQF 2 "register_operand" "w")))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+       (plus:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
+		   (match_operand:VHSDF 2 "register_operand" "w")))]
  "TARGET_SIMD"
  "fadd\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
-  [(set_attr "type" "neon_fp_addsub_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_addsub_<stype><q>")]
 )
 
 (define_insn "sub<mode>3"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
-       (minus:VDQF (match_operand:VDQF 1 "register_operand" "w")
-		   (match_operand:VDQF 2 "register_operand" "w")))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+       (minus:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
+		    (match_operand:VHSDF 2 "register_operand" "w")))]
  "TARGET_SIMD"
  "fsub\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
-  [(set_attr "type" "neon_fp_addsub_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_addsub_<stype><q>")]
 )
 
 (define_insn "mul<mode>3"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
-       (mult:VDQF (match_operand:VDQF 1 "register_operand" "w")
-		  (match_operand:VDQF 2 "register_operand" "w")))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+       (mult:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
+		   (match_operand:VHSDF 2 "register_operand" "w")))]
  "TARGET_SIMD"
  "fmul\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
-  [(set_attr "type" "neon_fp_mul_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_mul_<stype><q>")]
 )
 
 (define_expand "div<mode>3"
- [(set (match_operand:VDQF 0 "register_operand")
-       (div:VDQF (match_operand:VDQF 1 "general_operand")
-		 (match_operand:VDQF 2 "register_operand")))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+       (div:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
+		  (match_operand:VHSDF 2 "register_operand" "w")))]
  "TARGET_SIMD"
 {
   if (aarch64_emit_approx_div (operands[0], operands[1], operands[2]))
@@ -1556,12 +1556,12 @@
 })
 
 (define_insn "*div<mode>3"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
-       (div:VDQF (match_operand:VDQF 1 "register_operand" "w")
-		 (match_operand:VDQF 2 "register_operand" "w")))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+       (div:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
+		 (match_operand:VHSDF 2 "register_operand" "w")))]
  "TARGET_SIMD"
  "fdiv\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
-  [(set_attr "type" "neon_fp_div_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_div_<stype><q>")]
 )
 
 (define_insn "neg<mode>2"
@@ -1826,24 +1826,26 @@
 
 ;; Convert between fixed-point and floating-point (vector modes)
 
-(define_insn "<FCVT_F2FIXED:fcvt_fixed_insn><VDQF:mode>3"
-  [(set (match_operand:<VDQF:FCVT_TARGET> 0 "register_operand" "=w")
-	(unspec:<VDQF:FCVT_TARGET> [(match_operand:VDQF 1 "register_operand" "w")
-				    (match_operand:SI 2 "immediate_operand" "i")]
+(define_insn "<FCVT_F2FIXED:fcvt_fixed_insn><VHSDF:mode>3"
+  [(set (match_operand:<VHSDF:FCVT_TARGET> 0 "register_operand" "=w")
+	(unspec:<VHSDF:FCVT_TARGET>
+	  [(match_operand:VHSDF 1 "register_operand" "w")
+	   (match_operand:SI 2 "immediate_operand" "i")]
 	 FCVT_F2FIXED))]
   "TARGET_SIMD"
   "<FCVT_F2FIXED:fcvt_fixed_insn>\t%<v>0<Vmtype>, %<v>1<Vmtype>, #%2"
-  [(set_attr "type" "neon_fp_to_int_<VDQF:Vetype><q>")]
+  [(set_attr "type" "neon_fp_to_int_<VHSDF:stype><q>")]
 )
 
-(define_insn "<FCVT_FIXED2F:fcvt_fixed_insn><VDQ_SDI:mode>3"
-  [(set (match_operand:<VDQ_SDI:FCVT_TARGET> 0 "register_operand" "=w")
-	(unspec:<VDQ_SDI:FCVT_TARGET> [(match_operand:VDQ_SDI 1 "register_operand" "w")
-				       (match_operand:SI 2 "immediate_operand" "i")]
+(define_insn "<FCVT_FIXED2F:fcvt_fixed_insn><VDQ_HSDI:mode>3"
+  [(set (match_operand:<VDQ_HSDI:FCVT_TARGET> 0 "register_operand" "=w")
+	(unspec:<VDQ_HSDI:FCVT_TARGET>
+	  [(match_operand:VDQ_HSDI 1 "register_operand" "w")
+	   (match_operand:SI 2 "immediate_operand" "i")]
 	 FCVT_FIXED2F))]
   "TARGET_SIMD"
   "<FCVT_FIXED2F:fcvt_fixed_insn>\t%<v>0<Vmtype>, %<v>1<Vmtype>, #%2"
-  [(set_attr "type" "neon_int_to_fp_<VDQ_SDI:Vetype><q>")]
+  [(set_attr "type" "neon_int_to_fp_<VDQ_HSDI:stype><q>")]
 )
 
 ;; ??? Note that the vectorizer usage of the vec_unpacks_[lo/hi] patterns
@@ -2002,33 +2004,33 @@
 ;; NaNs.
 
 (define_insn "<su><maxmin><mode>3"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-        (FMAXMIN:VDQF (match_operand:VDQF 1 "register_operand" "w")
-		   (match_operand:VDQF 2 "register_operand" "w")))]
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+	(FMAXMIN:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
+		       (match_operand:VHSDF 2 "register_operand" "w")))]
   "TARGET_SIMD"
   "f<maxmin>nm\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
-  [(set_attr "type" "neon_fp_minmax_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_minmax_<stype><q>")]
 )
 
 (define_insn "<maxmin_uns><mode>3"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-       (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")
-		     (match_operand:VDQF 2 "register_operand" "w")]
-		    FMAXMIN_UNS))]
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+       (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")
+		      (match_operand:VHSDF 2 "register_operand" "w")]
+		      FMAXMIN_UNS))]
   "TARGET_SIMD"
   "<maxmin_uns_op>\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
-  [(set_attr "type" "neon_fp_minmax_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_minmax_<stype><q>")]
 )
 
 ;; Auto-vectorized forms for the IEEE-754 fmax()/fmin() functions
 (define_insn "<fmaxmin><mode>3"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-	(unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")
-		      (match_operand:VDQF 2 "register_operand" "w")]
-		      FMAXMIN))]
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+	(unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")
+		       (match_operand:VHSDF 2 "register_operand" "w")]
+		       FMAXMIN))]
   "TARGET_SIMD"
   "<fmaxmin_op>\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
-  [(set_attr "type" "neon_fp_minmax_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_minmax_<stype><q>")]
 )
 
 ;; 'across lanes' add.
@@ -2048,13 +2050,13 @@
 )
 
 (define_insn "aarch64_faddp<mode>"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
-       (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")
-		     (match_operand:VDQF 2 "register_operand" "w")]
-		     UNSPEC_FADDV))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+       (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")
+		      (match_operand:VHSDF 2 "register_operand" "w")]
+	UNSPEC_FADDV))]
  "TARGET_SIMD"
  "faddp\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
-  [(set_attr "type" "neon_fp_reduc_add_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_reduc_add_<stype><q>")]
 )
 
 (define_insn "aarch64_reduc_plus_internal<mode>"
@@ -3050,13 +3052,14 @@
 ;; fmulx.
 
 (define_insn "aarch64_fmulx<mode>"
-  [(set (match_operand:VALLF 0 "register_operand" "=w")
-	(unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w")
-		       (match_operand:VALLF 2 "register_operand" "w")]
-		      UNSPEC_FMULX))]
+  [(set (match_operand:VHSDF_SDF 0 "register_operand" "=w")
+	(unspec:VHSDF_SDF
+	  [(match_operand:VHSDF_SDF 1 "register_operand" "w")
+	   (match_operand:VHSDF_SDF 2 "register_operand" "w")]
+	   UNSPEC_FMULX))]
  "TARGET_SIMD"
  "fmulx\t%<v>0<Vmtype>, %<v>1<Vmtype>, %<v>2<Vmtype>"
- [(set_attr "type" "neon_fp_mul_<Vetype>")]
+ [(set_attr "type" "neon_fp_mul_<stype>")]
 )
 
 ;; vmulxq_lane_f32, and vmulx_laneq_f32
@@ -4310,16 +4313,18 @@
 ;; Note we can also handle what would be fac(le|lt) by
 ;; generating fac(ge|gt).
 
-(define_insn "*aarch64_fac<optab><mode>"
+(define_insn "aarch64_fac<optab><mode>"
   [(set (match_operand:<V_cmp_result> 0 "register_operand" "=w")
 	(neg:<V_cmp_result>
 	  (FAC_COMPARISONS:<V_cmp_result>
-	    (abs:VALLF (match_operand:VALLF 1 "register_operand" "w"))
-	    (abs:VALLF (match_operand:VALLF 2 "register_operand" "w"))
+	    (abs:VHSDF_SDF
+	      (match_operand:VHSDF_SDF 1 "register_operand" "w"))
+	    (abs:VHSDF_SDF
+	      (match_operand:VHSDF_SDF 2 "register_operand" "w"))
   )))]
   "TARGET_SIMD"
   "fac<n_optab>\t%<v>0<Vmtype>, %<v><cmp_1><Vmtype>, %<v><cmp_2><Vmtype>"
-  [(set_attr "type" "neon_fp_compare_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_compare_<stype><q>")]
 )
 
 ;; addp
@@ -5431,13 +5436,14 @@
 )
 
 (define_insn "aarch64_frecps<mode>"
-  [(set (match_operand:VALLF 0 "register_operand" "=w")
-	(unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w")
-		     (match_operand:VALLF 2 "register_operand" "w")]
-		    UNSPEC_FRECPS))]
+  [(set (match_operand:VHSDF_SDF 0 "register_operand" "=w")
+	(unspec:VHSDF_SDF
+	  [(match_operand:VHSDF_SDF 1 "register_operand" "w")
+	  (match_operand:VHSDF_SDF 2 "register_operand" "w")]
+	  UNSPEC_FRECPS))]
   "TARGET_SIMD"
   "frecps\\t%<v>0<Vmtype>, %<v>1<Vmtype>, %<v>2<Vmtype>"
-  [(set_attr "type" "neon_fp_recps_<Vetype><q>")]
+  [(set_attr "type" "neon_fp_recps_<stype><q>")]
 )
 
 (define_insn "aarch64_urecpe<mode>"
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5ed633542efe58763d68fd9bfbb478ae6ef569c3..a7437c04eb936a5e3ebd0bc77eb4afd8c052df28 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7717,6 +7717,10 @@ bool
 aarch64_emit_approx_div (rtx quo, rtx num, rtx den)
 {
   machine_mode mode = GET_MODE (quo);
+
+  if (mode == V4HFmode || mode == V8HFmode)
+    return false;
+
   bool use_approx_division_p = (flag_mlow_precision_div
 			        || (aarch64_tune_params.approx_modes->division
 				    & AARCH64_APPROX_MODE (mode)));
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index b4310f27aac08ab6ff5e89d58512dafc389b2c37..baae27619a6a1c34c0ad338f2afec4932b51cbeb 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -26385,6 +26385,368 @@ vsqrtq_f16 (float16x8_t a)
   return __builtin_aarch64_sqrtv8hf (a);
 }
 
+/* ARMv8.2-A FP16 two operands vector intrinsics.  */
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vadd_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __a + __b;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vaddq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __a + __b;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vabd_f16 (float16x4_t a, float16x4_t b)
+{
+  return __builtin_aarch64_fabdv4hf (a, b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vabdq_f16 (float16x8_t a, float16x8_t b)
+{
+  return __builtin_aarch64_fabdv8hf (a, b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcage_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_facgev4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcageq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_facgev8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcagt_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_facgtv4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcagtq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_facgtv8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcale_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_faclev4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcaleq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_faclev8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcalt_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_facltv4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcaltq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_facltv8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vceq_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_cmeqv4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vceqq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_cmeqv8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcge_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_cmgev4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcgeq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_cmgev8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcgt_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_cmgtv4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcgtq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_cmgtv8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcle_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_cmlev4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcleq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_cmlev8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vclt_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_cmltv4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcltq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_cmltv8hf_uss (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcvt_n_f16_s16 (int16x4_t __a, const int __b)
+{
+  return __builtin_aarch64_scvtfv4hi (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vcvtq_n_f16_s16 (int16x8_t __a, const int __b)
+{
+  return __builtin_aarch64_scvtfv8hi (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcvt_n_f16_u16 (uint16x4_t __a, const int __b)
+{
+  return __builtin_aarch64_ucvtfv4hi_sus (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vcvtq_n_f16_u16 (uint16x8_t __a, const int __b)
+{
+  return __builtin_aarch64_ucvtfv8hi_sus (__a, __b);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcvt_n_s16_f16 (float16x4_t __a, const int __b)
+{
+  return __builtin_aarch64_fcvtzsv4hf (__a, __b);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcvtq_n_s16_f16 (float16x8_t __a, const int __b)
+{
+  return __builtin_aarch64_fcvtzsv8hf (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcvt_n_u16_f16 (float16x4_t __a, const int __b)
+{
+  return __builtin_aarch64_fcvtzuv4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcvtq_n_u16_f16 (float16x8_t __a, const int __b)
+{
+  return __builtin_aarch64_fcvtzuv8hf_uss (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vdiv_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __a / __b;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vdivq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __a / __b;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmax_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_smax_nanv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmaxq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_smax_nanv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmaxnm_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_fmaxv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmaxnmq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_fmaxv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmin_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_smin_nanv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vminq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_smin_nanv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vminnm_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_fminv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vminnmq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_fminv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmul_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __a * __b;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __a * __b;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmulx_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_fmulxv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulxq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_fmulxv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vpadd_f16 (float16x4_t a, float16x4_t b)
+{
+  return __builtin_aarch64_faddpv4hf (a, b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vpaddq_f16 (float16x8_t a, float16x8_t b)
+{
+  return __builtin_aarch64_faddpv8hf (a, b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vpmax_f16 (float16x4_t a, float16x4_t b)
+{
+  return __builtin_aarch64_smax_nanpv4hf (a, b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vpmaxq_f16 (float16x8_t a, float16x8_t b)
+{
+  return __builtin_aarch64_smax_nanpv8hf (a, b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vpmaxnm_f16 (float16x4_t a, float16x4_t b)
+{
+  return __builtin_aarch64_smaxpv4hf (a, b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vpmaxnmq_f16 (float16x8_t a, float16x8_t b)
+{
+  return __builtin_aarch64_smaxpv8hf (a, b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vpmin_f16 (float16x4_t a, float16x4_t b)
+{
+  return __builtin_aarch64_smin_nanpv4hf (a, b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vpminq_f16 (float16x8_t a, float16x8_t b)
+{
+  return __builtin_aarch64_smin_nanpv8hf (a, b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vpminnm_f16 (float16x4_t a, float16x4_t b)
+{
+  return __builtin_aarch64_sminpv4hf (a, b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vpminnmq_f16 (float16x8_t a, float16x8_t b)
+{
+  return __builtin_aarch64_sminpv8hf (a, b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrecps_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_frecpsv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrecpsq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_frecpsv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrsqrts_f16 (float16x4_t a, float16x4_t b)
+{
+  return __builtin_aarch64_rsqrtsv4hf (a, b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrsqrtsq_f16 (float16x8_t a, float16x8_t b)
+{
+  return __builtin_aarch64_rsqrtsv8hf (a, b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vsub_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __a - __b;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vsubq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __a - __b;
+}
+
 #pragma GCC pop_options
 
 #undef __aarch64_vget_lane_any
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index af5eda9b9f4a80e1309655dcd7798337e1d818eb..35190b4343bd6dfb3a77a58bd1697426962cedc7 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -166,9 +166,19 @@
 ;; Vector modes for S and D
 (define_mode_iterator VDQ_SDI [V2SI V4SI V2DI])
 
+;; Vector modes for H, S and D
+(define_mode_iterator VDQ_HSDI [(V4HI "TARGET_SIMD_F16INST")
+				(V8HI "TARGET_SIMD_F16INST")
+				V2SI V4SI V2DI])
+
 ;; Scalar and Vector modes for S and D
 (define_mode_iterator VSDQ_SDI [V2SI V4SI V2DI SI DI])
 
+;; Scalar and Vector modes for S and D, Vector modes for H.
+(define_mode_iterator VSDQ_HSDI [(V4HI "TARGET_SIMD_F16INST")
+				 (V8HI "TARGET_SIMD_F16INST")
+				 V2SI V4SI V2DI SI DI])
+
 ;; Vector modes for Q and H types.
 (define_mode_iterator VDQQH [V8QI V16QI V4HI V8HI])
 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [AArch64][8/14] ARMv8.2-A FP16 two operands scalar intrinsics
  2016-07-07 16:18               ` [AArch64][8/14] ARMv8.2-A FP16 two operands " Jiong Wang
@ 2016-07-20 17:01                 ` Jiong Wang
  2016-07-25 11:15                   ` James Greenhalgh
  0 siblings, 1 reply; 32+ messages in thread
From: Jiong Wang @ 2016-07-20 17:01 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2563 bytes --]

On 07/07/16 17:17, Jiong Wang wrote:
> This patch adds ARMv8.2-A FP16 two operands scalar intrinsics.

The updated patch resolves the conflict with

    https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00309.html

The change is to let aarch64_emit_approx_div return false for HFmode.

gcc/
2016-07-20  Jiong Wang<jiong.wang@arm.com>

         * config/aarch64/aarch64-simd-builtins.def: Register new builtins.
         * config/aarch64/aarch64.md (<FCVT_F2FIXED:fcvt_fixed_insn>hf<mode>3): New.
         (<FCVT_FIXED2F:fcvt_fixed_insn><mode>hf3): Likewise.
         (add<mode>3): Likewise.
         (sub<mode>3): Likewise.
         (mul<mode>3): Likewise.
         (div<mode>3): Likewise.
         (*div<mode>3): Likewise.
         (<fmaxmin><mode>3): Extend to HF.
         * config/aarch64/aarch64.c (aarch64_emit_approx_div): Return
         false for HFmode.
         * config/aarch64/aarch64-simd.md (aarch64_rsqrts<mode>): Extend to HF modes.
         (fabd<mode>3): Likewise.
         (<FCVT_F2FIXED:fcvt_fixed_insn><VHSDF_HSDF:mode>3): Likewise.
         (<FCVT_FIXED2F:fcvt_fixed_insn><VHSDI_HSDI:mode>3): Likewise.
         (aarch64_fmulx<mode>): Likewise.
         (aarch64_fac<optab><mode>): Likewise.
         (aarch64_frecps<mode>): Likewise.
         (<FCVT_F2FIXED:fcvt_fixed_insn>hfhi3): New.
         (<FCVT_FIXED2F:fcvt_fixed_insn>hihf3): Likewise.
         * config/aarch64/iterators.md (VHSDF_SDF): Delete.
         (VSDQ_HSDI): Support HI.
         (fcvt_target, FCVT_TARGET): Likewise.
         * config/aarch64/arm_fp16.h (vaddh_f16): New.
         (vsubh_f16): Likewise.
         (vabdh_f16): Likewise.
         (vcageh_f16): Likewise.
         (vcagth_f16): Likewise.
         (vcaleh_f16): Likewise.
         (vcalth_f16): Likewise.
         (vceqh_f16): Likewise.
         (vcgeh_f16): Likewise.
         (vcgth_f16): Likewise.
         (vcleh_f16): Likewise.
         (vclth_f16): Likewise.
         (vcvth_n_f16_s16): Likewise.
         (vcvth_n_f16_s32): Likewise.
         (vcvth_n_f16_s64): Likewise.
         (vcvth_n_f16_u16): Likewise.
         (vcvth_n_f16_u32): Likewise.
         (vcvth_n_f16_u64): Likewise.
         (vcvth_n_s16_f16): Likewise.
         (vcvth_n_s32_f16): Likewise.
         (vcvth_n_s64_f16): Likewise.
         (vcvth_n_u16_f16): Likewise.
         (vcvth_n_u32_f16): Likewise.
         (vcvth_n_u64_f16): Likewise.
         (vdivh_f16): Likewise.
         (vmaxh_f16): Likewise.
         (vmaxnmh_f16): Likewise.
         (vminh_f16): Likewise.
         (vminnmh_f16): Likewise.
         (vmulh_f16): Likewise.
         (vmulxh_f16): Likewise.
         (vrecpsh_f16): Likewise.
         (vrsqrtsh_f16): Likewise.

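For reference, a hypothetical usage sketch of some of the new scalar
intrinsics (illustrative only, not part of the patch; the function names
and values are invented, and building it assumes -march=armv8.2-a+fp16):

  #include <arm_fp16.h>

  /* Scale the absolute difference of two half-precision values,
     staying in FP16 throughout (FABD then FMUL on H registers).  */
  float16_t
  scaled_abs_diff (float16_t a, float16_t b, float16_t scale)
  {
    return vmulh_f16 (vabdh_f16 (a, b), scale);
  }

  /* Convert to signed fixed point with 8 fraction bits (FCVTZS #8).  */
  int32_t
  to_q8 (float16_t x)
  {
    return vcvth_n_s32_f16 (x, 8);
  }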

[-- Attachment #2: upate-8.patch --]
[-- Type: text/x-patch, Size: 19649 bytes --]

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 6f50d8405d3ee8c4823037bb2022a4f2f08b72fe..31abc077859254e3696adacb3f8f2b9b2da0647f 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -41,7 +41,7 @@
 
   BUILTIN_VDC (COMBINE, combine, 0)
   BUILTIN_VB (BINOP, pmul, 0)
-  BUILTIN_VHSDF_SDF (BINOP, fmulx, 0)
+  BUILTIN_VHSDF_HSDF (BINOP, fmulx, 0)
   BUILTIN_VHSDF_DF (UNOP, sqrt, 2)
   BUILTIN_VD_BHSI (BINOP, addp, 0)
   VAR1 (UNOP, addp, 0, di)
@@ -393,13 +393,12 @@
   /* Implemented by
      aarch64_frecp<FRECP:frecp_suffix><mode>.  */
   BUILTIN_GPF_F16 (UNOP, frecpe, 0)
-  BUILTIN_GPF (BINOP, frecps, 0)
   BUILTIN_GPF_F16 (UNOP, frecpx, 0)
 
   BUILTIN_VDQ_SI (UNOP, urecpe, 0)
 
   BUILTIN_VHSDF (UNOP, frecpe, 0)
-  BUILTIN_VHSDF (BINOP, frecps, 0)
+  BUILTIN_VHSDF_HSDF (BINOP, frecps, 0)
 
   /* Implemented by a mixture of abs2 patterns.  Note the DImode builtin is
      only ever used for the int64x1_t intrinsic, there is no scalar version.  */
@@ -496,17 +495,23 @@
   /* Implemented by <FCVT_F2FIXED/FIXED2F:fcvt_fixed_insn><*><*>3.  */
   BUILTIN_VSDQ_HSDI (SHIFTIMM, scvtf, 3)
   BUILTIN_VSDQ_HSDI (FCVTIMM_SUS, ucvtf, 3)
-  BUILTIN_VHSDF_SDF (SHIFTIMM, fcvtzs, 3)
-  BUILTIN_VHSDF_SDF (SHIFTIMM_USS, fcvtzu, 3)
+  BUILTIN_VHSDF_HSDF (SHIFTIMM, fcvtzs, 3)
+  BUILTIN_VHSDF_HSDF (SHIFTIMM_USS, fcvtzu, 3)
+  VAR1 (SHIFTIMM, scvtfsi, 3, hf)
+  VAR1 (SHIFTIMM, scvtfdi, 3, hf)
+  VAR1 (FCVTIMM_SUS, ucvtfsi, 3, hf)
+  VAR1 (FCVTIMM_SUS, ucvtfdi, 3, hf)
+  BUILTIN_GPI (SHIFTIMM, fcvtzshf, 3)
+  BUILTIN_GPI (SHIFTIMM_USS, fcvtzuhf, 3)
 
   /* Implemented by aarch64_rsqrte<mode>.  */
   BUILTIN_VHSDF_HSDF (UNOP, rsqrte, 0)
 
   /* Implemented by aarch64_rsqrts<mode>.  */
-  BUILTIN_VHSDF_SDF (BINOP, rsqrts, 0)
+  BUILTIN_VHSDF_HSDF (BINOP, rsqrts, 0)
 
   /* Implemented by fabd<mode>3.  */
-  BUILTIN_VHSDF_SDF (BINOP, fabd, 3)
+  BUILTIN_VHSDF_HSDF (BINOP, fabd, 3)
 
   /* Implemented by aarch64_faddp<mode>.  */
   BUILTIN_VHSDF (BINOP, faddp, 0)
@@ -522,10 +527,10 @@
   BUILTIN_VHSDF_HSDF (UNOP, neg, 2)
 
   /* Implemented by aarch64_fac<optab><mode>.  */
-  BUILTIN_VHSDF_SDF (BINOP_USS, faclt, 0)
-  BUILTIN_VHSDF_SDF (BINOP_USS, facle, 0)
-  BUILTIN_VHSDF_SDF (BINOP_USS, facgt, 0)
-  BUILTIN_VHSDF_SDF (BINOP_USS, facge, 0)
+  BUILTIN_VHSDF_HSDF (BINOP_USS, faclt, 0)
+  BUILTIN_VHSDF_HSDF (BINOP_USS, facle, 0)
+  BUILTIN_VHSDF_HSDF (BINOP_USS, facgt, 0)
+  BUILTIN_VHSDF_HSDF (BINOP_USS, facge, 0)
 
   /* Implemented by sqrt<mode>2.  */
   VAR1 (UNOP, sqrt, 2, hf)
@@ -543,3 +548,7 @@
   BUILTIN_GPI_I16 (UNOPUS, fixuns_trunchf, 2)
   BUILTIN_GPI (UNOPUS, fixuns_truncsf, 2)
   BUILTIN_GPI (UNOPUS, fixuns_truncdf, 2)
+
+  /* Implemented by <fmaxmin><mode>3.  */
+  VAR1 (BINOP, fmax, 3, hf)
+  VAR1 (BINOP, fmin, 3, hf)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index d2a274495204c23d40b6522fe33599e407c71bc4..009997620ab86545d00ea8658d3787f18fa655e1 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -391,9 +391,9 @@
   [(set_attr "type" "neon_fp_rsqrte_<stype><q>")])
 
 (define_insn "aarch64_rsqrts<mode>"
-  [(set (match_operand:VHSDF_SDF 0 "register_operand" "=w")
-	(unspec:VHSDF_SDF [(match_operand:VHSDF_SDF 1 "register_operand" "w")
-			   (match_operand:VHSDF_SDF 2 "register_operand" "w")]
+  [(set (match_operand:VHSDF_HSDF 0 "register_operand" "=w")
+	(unspec:VHSDF_HSDF [(match_operand:VHSDF_HSDF 1 "register_operand" "w")
+			    (match_operand:VHSDF_HSDF 2 "register_operand" "w")]
 	 UNSPEC_RSQRTS))]
   "TARGET_SIMD"
   "frsqrts\\t%<v>0<Vmtype>, %<v>1<Vmtype>, %<v>2<Vmtype>"
@@ -475,11 +475,11 @@
 )
 
 (define_insn "fabd<mode>3"
-  [(set (match_operand:VHSDF_SDF 0 "register_operand" "=w")
-	(abs:VHSDF_SDF
-	  (minus:VHSDF_SDF
-	    (match_operand:VHSDF_SDF 1 "register_operand" "w")
-	    (match_operand:VHSDF_SDF 2 "register_operand" "w"))))]
+  [(set (match_operand:VHSDF_HSDF 0 "register_operand" "=w")
+	(abs:VHSDF_HSDF
+	  (minus:VHSDF_HSDF
+	    (match_operand:VHSDF_HSDF 1 "register_operand" "w")
+	    (match_operand:VHSDF_HSDF 2 "register_operand" "w"))))]
   "TARGET_SIMD"
   "fabd\t%<v>0<Vmtype>, %<v>1<Vmtype>, %<v>2<Vmtype>"
   [(set_attr "type" "neon_fp_abd_<stype><q>")]
@@ -3078,10 +3078,10 @@
 ;; fmulx.
 
 (define_insn "aarch64_fmulx<mode>"
-  [(set (match_operand:VHSDF_SDF 0 "register_operand" "=w")
-	(unspec:VHSDF_SDF
-	  [(match_operand:VHSDF_SDF 1 "register_operand" "w")
-	   (match_operand:VHSDF_SDF 2 "register_operand" "w")]
+  [(set (match_operand:VHSDF_HSDF 0 "register_operand" "=w")
+	(unspec:VHSDF_HSDF
+	  [(match_operand:VHSDF_HSDF 1 "register_operand" "w")
+	   (match_operand:VHSDF_HSDF 2 "register_operand" "w")]
 	   UNSPEC_FMULX))]
  "TARGET_SIMD"
  "fmulx\t%<v>0<Vmtype>, %<v>1<Vmtype>, %<v>2<Vmtype>"
@@ -4341,10 +4341,10 @@
   [(set (match_operand:<V_cmp_result> 0 "register_operand" "=w")
 	(neg:<V_cmp_result>
 	  (FAC_COMPARISONS:<V_cmp_result>
-	    (abs:VHSDF_SDF
-	      (match_operand:VHSDF_SDF 1 "register_operand" "w"))
-	    (abs:VHSDF_SDF
-	      (match_operand:VHSDF_SDF 2 "register_operand" "w"))
+	    (abs:VHSDF_HSDF
+	      (match_operand:VHSDF_HSDF 1 "register_operand" "w"))
+	    (abs:VHSDF_HSDF
+	      (match_operand:VHSDF_HSDF 2 "register_operand" "w"))
   )))]
   "TARGET_SIMD"
   "fac<n_optab>\t%<v>0<Vmtype>, %<v><cmp_1><Vmtype>, %<v><cmp_2><Vmtype>"
@@ -5460,10 +5460,10 @@
 )
 
 (define_insn "aarch64_frecps<mode>"
-  [(set (match_operand:VHSDF_SDF 0 "register_operand" "=w")
-	(unspec:VHSDF_SDF
-	  [(match_operand:VHSDF_SDF 1 "register_operand" "w")
-	  (match_operand:VHSDF_SDF 2 "register_operand" "w")]
+  [(set (match_operand:VHSDF_HSDF 0 "register_operand" "=w")
+	(unspec:VHSDF_HSDF
+	  [(match_operand:VHSDF_HSDF 1 "register_operand" "w")
+	  (match_operand:VHSDF_HSDF 2 "register_operand" "w")]
 	  UNSPEC_FRECPS))]
   "TARGET_SIMD"
   "frecps\\t%<v>0<Vmtype>, %<v>1<Vmtype>, %<v>2<Vmtype>"
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 27866ccd605abec6ea7c9110022f329c9b172ee0..dfcb92fddcb23b4b55aee0dd112d3017f2b2bfe4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7718,7 +7718,7 @@ aarch64_emit_approx_div (rtx quo, rtx num, rtx den)
 {
   machine_mode mode = GET_MODE (quo);
 
-  if (mode == V4HFmode || mode == V8HFmode)
+  if (mode == HFmode || mode == V4HFmode || mode == V8HFmode)
     return false;
 
   bool use_approx_division_p = (flag_mlow_precision_div
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 56ad581da6c85716256f22eafbe432cde486154c..6d0a9dcf90c9c63721ebe53e7e43b19db1f2d7b3 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4660,38 +4660,78 @@
    (set_attr "simd" "*, yes")]
 )
 
+(define_insn "<FCVT_F2FIXED:fcvt_fixed_insn>hf<mode>3"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(unspec:GPI [(match_operand:HF 1 "register_operand" "w")
+		     (match_operand:SI 2 "immediate_operand" "i")]
+	 FCVT_F2FIXED))]
+  "TARGET_FP_F16INST"
+   "<FCVT_F2FIXED:fcvt_fixed_insn>\t%<GPI:w>0, %h1, #%2"
+  [(set_attr "type" "f_cvtf2i")]
+)
+
+(define_insn "<FCVT_FIXED2F:fcvt_fixed_insn><mode>hf3"
+  [(set (match_operand:HF 0 "register_operand" "=w")
+	(unspec:HF [(match_operand:GPI 1 "register_operand" "r")
+		    (match_operand:SI 2 "immediate_operand" "i")]
+	 FCVT_FIXED2F))]
+  "TARGET_FP_F16INST"
+  "<FCVT_FIXED2F:fcvt_fixed_insn>\t%h0, %<GPI:w>1, #%2"
+  [(set_attr "type" "f_cvti2f")]
+)
+
+(define_insn "<FCVT_F2FIXED:fcvt_fixed_insn>hf3"
+  [(set (match_operand:HI 0 "register_operand" "=w")
+	(unspec:HI [(match_operand:HF 1 "register_operand" "w")
+		    (match_operand:SI 2 "immediate_operand" "i")]
+	 FCVT_F2FIXED))]
+  "TARGET_SIMD"
+  "<FCVT_F2FIXED:fcvt_fixed_insn>\t%h0, %h1, #%2"
+  [(set_attr "type" "neon_fp_to_int_s")]
+)
+
+(define_insn "<FCVT_FIXED2F:fcvt_fixed_insn>hi3"
+  [(set (match_operand:HF 0 "register_operand" "=w")
+	(unspec:HF [(match_operand:HI 1 "register_operand" "w")
+		    (match_operand:SI 2 "immediate_operand" "i")]
+	 FCVT_FIXED2F))]
+  "TARGET_SIMD"
+  "<FCVT_FIXED2F:fcvt_fixed_insn>\t%h0, %h1, #%2"
+  [(set_attr "type" "neon_int_to_fp_s")]
+)
+
 ;; -------------------------------------------------------------------
 ;; Floating-point arithmetic
 ;; -------------------------------------------------------------------
 
 (define_insn "add<mode>3"
-  [(set (match_operand:GPF 0 "register_operand" "=w")
-        (plus:GPF
-         (match_operand:GPF 1 "register_operand" "w")
-         (match_operand:GPF 2 "register_operand" "w")))]
+  [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+	(plus:GPF_F16
+	 (match_operand:GPF_F16 1 "register_operand" "w")
+	 (match_operand:GPF_F16 2 "register_operand" "w")))]
   "TARGET_FLOAT"
   "fadd\\t%<s>0, %<s>1, %<s>2"
-  [(set_attr "type" "fadd<s>")]
+  [(set_attr "type" "fadd<stype>")]
 )
 
 (define_insn "sub<mode>3"
-  [(set (match_operand:GPF 0 "register_operand" "=w")
-        (minus:GPF
-         (match_operand:GPF 1 "register_operand" "w")
-         (match_operand:GPF 2 "register_operand" "w")))]
+  [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+	(minus:GPF_F16
+	 (match_operand:GPF_F16 1 "register_operand" "w")
+	 (match_operand:GPF_F16 2 "register_operand" "w")))]
   "TARGET_FLOAT"
   "fsub\\t%<s>0, %<s>1, %<s>2"
-  [(set_attr "type" "fadd<s>")]
+  [(set_attr "type" "fadd<stype>")]
 )
 
 (define_insn "mul<mode>3"
-  [(set (match_operand:GPF 0 "register_operand" "=w")
-        (mult:GPF
-         (match_operand:GPF 1 "register_operand" "w")
-         (match_operand:GPF 2 "register_operand" "w")))]
+  [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+	(mult:GPF_F16
+	 (match_operand:GPF_F16 1 "register_operand" "w")
+	 (match_operand:GPF_F16 2 "register_operand" "w")))]
   "TARGET_FLOAT"
   "fmul\\t%<s>0, %<s>1, %<s>2"
-  [(set_attr "type" "fmul<s>")]
+  [(set_attr "type" "fmul<stype>")]
 )
 
 (define_insn "*fnmul<mode>3"
@@ -4715,9 +4755,9 @@
 )
 
 (define_expand "div<mode>3"
- [(set (match_operand:GPF 0 "register_operand")
-       (div:GPF (match_operand:GPF 1 "general_operand")
-		(match_operand:GPF 2 "register_operand")))]
+ [(set (match_operand:GPF_F16 0 "register_operand")
+       (div:GPF_F16 (match_operand:GPF_F16 1 "general_operand")
+		    (match_operand:GPF_F16 2 "register_operand")))]
  "TARGET_SIMD"
 {
   if (aarch64_emit_approx_div (operands[0], operands[1], operands[2]))
@@ -4727,12 +4767,12 @@
 })
 
 (define_insn "*div<mode>3"
-  [(set (match_operand:GPF 0 "register_operand" "=w")
-        (div:GPF (match_operand:GPF 1 "register_operand" "w")
-	         (match_operand:GPF 2 "register_operand" "w")))]
+  [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+	(div:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")
+		     (match_operand:GPF_F16 2 "register_operand" "w")))]
   "TARGET_FLOAT"
   "fdiv\\t%<s>0, %<s>1, %<s>2"
-  [(set_attr "type" "fdiv<s>")]
+  [(set_attr "type" "fdiv<stype>")]
 )
 
 (define_insn "neg<mode>2"
@@ -4792,13 +4832,13 @@
 
 ;; Scalar forms for the IEEE-754 fmax()/fmin() functions
 (define_insn "<fmaxmin><mode>3"
-  [(set (match_operand:GPF 0 "register_operand" "=w")
-	(unspec:GPF [(match_operand:GPF 1 "register_operand" "w")
-		     (match_operand:GPF 2 "register_operand" "w")]
+  [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+	(unspec:GPF_F16 [(match_operand:GPF_F16 1 "register_operand" "w")
+		     (match_operand:GPF_F16 2 "register_operand" "w")]
 		     FMAXMIN))]
   "TARGET_FLOAT"
   "<fmaxmin_op>\\t%<s>0, %<s>1, %<s>2"
-  [(set_attr "type" "f_minmax<s>")]
+  [(set_attr "type" "f_minmax<stype>")]
 )
 
 ;; For copysign (x, y), we want to generate:
diff --git a/gcc/config/aarch64/arm_fp16.h b/gcc/config/aarch64/arm_fp16.h
index 818aa61925b6c78ec93149b391a562bd1aea0b50..21edc65695365d269c8e0d1ae4cd01459f9fdcfb 100644
--- a/gcc/config/aarch64/arm_fp16.h
+++ b/gcc/config/aarch64/arm_fp16.h
@@ -360,6 +360,206 @@ vsqrth_f16 (float16_t __a)
   return __builtin_aarch64_sqrthf (__a);
 }
 
+/* ARMv8.2-A FP16 two operands scalar intrinsics.  */
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vaddh_f16 (float16_t __a, float16_t __b)
+{
+  return __a + __b;
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vabdh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_fabdhf (__a, __b);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcageh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_facgehf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcagth_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_facgthf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcaleh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_faclehf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcalth_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_faclthf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vceqh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_cmeqhf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcgeh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_cmgehf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcgth_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_cmgthf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcleh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_cmlehf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vclth_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_cmlthf_uss (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_n_f16_s16 (int16_t __a, const int __b)
+{
+  return __builtin_aarch64_scvtfhi (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_n_f16_s32 (int32_t __a, const int __b)
+{
+  return __builtin_aarch64_scvtfsihf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_n_f16_s64 (int64_t __a, const int __b)
+{
+  return __builtin_aarch64_scvtfdihf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_n_f16_u16 (uint16_t __a, const int __b)
+{
+  return __builtin_aarch64_ucvtfhi_sus (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_n_f16_u32 (uint32_t __a, const int __b)
+{
+  return __builtin_aarch64_ucvtfsihf_sus (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_n_f16_u64 (uint64_t __a, const int __b)
+{
+  return __builtin_aarch64_ucvtfdihf_sus (__a, __b);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vcvth_n_s16_f16 (float16_t __a, const int __b)
+{
+  return __builtin_aarch64_fcvtzshf (__a, __b);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvth_n_s32_f16 (float16_t __a, const int __b)
+{
+  return __builtin_aarch64_fcvtzshfsi (__a, __b);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+vcvth_n_s64_f16 (float16_t __a, const int __b)
+{
+  return __builtin_aarch64_fcvtzshfdi (__a, __b);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcvth_n_u16_f16 (float16_t __a, const int __b)
+{
+  return __builtin_aarch64_fcvtzuhf_uss (__a, __b);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcvth_n_u32_f16 (float16_t __a, const int __b)
+{
+  return __builtin_aarch64_fcvtzuhfsi_uss (__a, __b);
+}
+
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+vcvth_n_u64_f16 (float16_t __a, const int __b)
+{
+  return __builtin_aarch64_fcvtzuhfdi_uss (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vdivh_f16 (float16_t __a, float16_t __b)
+{
+  return __a / __b;
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmaxh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_fmaxhf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmaxnmh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_fmaxhf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vminh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_fminhf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vminnmh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_fminhf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmulh_f16 (float16_t __a, float16_t __b)
+{
+  return __a * __b;
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmulxh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_fmulxhf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrecpsh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_frecpshf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrsqrtsh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_aarch64_rsqrtshf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vsubh_f16 (float16_t __a, float16_t __b)
+{
+  return __a - __b;
+}
+
 #pragma GCC pop_options
 
 #endif
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 20d0f1bf615396e0d662a51e9c5c9895046cd090..91e2e6467b8de6408265f2095cfb4aaf80840559 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -105,9 +105,6 @@
 (define_mode_iterator VHSDF_DF [(V4HF "TARGET_SIMD_F16INST")
 				(V8HF "TARGET_SIMD_F16INST")
 				V2SF V4SF V2DF DF])
-(define_mode_iterator VHSDF_SDF [(V4HF "TARGET_SIMD_F16INST")
-				 (V8HF "TARGET_SIMD_F16INST")
-				 V2SF V4SF V2DF SF DF])
 (define_mode_iterator VHSDF_HSDF [(V4HF "TARGET_SIMD_F16INST")
 				  (V8HF "TARGET_SIMD_F16INST")
 				  V2SF V4SF V2DF
@@ -190,7 +187,9 @@
 ;; Scalar and Vector modes for S and D, Vector modes for H.
 (define_mode_iterator VSDQ_HSDI [(V4HI "TARGET_SIMD_F16INST")
 				 (V8HI "TARGET_SIMD_F16INST")
-				 V2SI V4SI V2DI SI DI])
+				 V2SI V4SI V2DI
+				 (HI "TARGET_SIMD_F16INST")
+				 SI DI])
 
 ;; Vector modes for Q and H types.
 (define_mode_iterator VDQQH [V8QI V16QI V4HI V8HI])
@@ -705,12 +704,12 @@
 			       (V2DI "v2df") (V4SI "v4sf") (V2SI "v2sf")
 			       (SF "si") (DF "di") (SI "sf") (DI "df")
 			       (V4HF "v4hi") (V8HF "v8hi") (V4HI "v4hf")
-			       (V8HI "v8hf")])
+			       (V8HI "v8hf") (HF "hi") (HI "hf")])
 (define_mode_attr FCVT_TARGET [(V2DF "V2DI") (V4SF "V4SI") (V2SF "V2SI")
 			       (V2DI "V2DF") (V4SI "V4SF") (V2SI "V2SF")
 			       (SF "SI") (DF "DI") (SI "SF") (DI "DF")
 			       (V4HF "V4HI") (V8HF "V8HI") (V4HI "V4HF")
-			       (V8HI "V8HF")])
+			       (V8HI "V8HF") (HF "HI") (HI "HF")])
 
 
 ;; for the inequal width integer to fp conversions

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [AArch64][2/14] ARMv8.2-A FP16 one operand vector intrinsics
  2016-07-20 17:00     ` Jiong Wang
@ 2016-07-25 11:01       ` James Greenhalgh
  0 siblings, 0 replies; 32+ messages in thread
From: James Greenhalgh @ 2016-07-25 11:01 UTC (permalink / raw)
  To: Jiong Wang; +Cc: GCC Patches, nd

On Wed, Jul 20, 2016 at 06:00:34PM +0100, Jiong Wang wrote:
> On 07/07/16 17:14, Jiong Wang wrote:
> >This patch adds ARMv8.2-A FP16 one operand vector intrinsics.
> >
> >We introduced new mode iterators to cover the HF modes; qualified
> >patterns which were using the old mode iterators are switched to the
> >new ones.
> >
> >We can't simply extend an old iterator like VDQF to cover the HF modes,
> >because not all patterns using VDQF have FP16 support.  We therefore
> >introduced new, temporary iterators, and only apply them to the
> >patterns which do have FP16 support.
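> >
> >As an illustration of the shape these take (a sketch only; see the
> >patch itself for the actual definitions), such an iterator enables
> >the HF modes only when the FP16 extension is available:
> >
> >  (define_mode_iterator VHSDF [(V4HF "TARGET_SIMD_F16INST")
> >                               (V8HF "TARGET_SIMD_F16INST")
> >                               V2SF V4SF V2DF])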
> 
> I noticed the patchset at
> 
>   https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00308.html
> 
> has some modifications to the standard names "div" and "sqrt", so there
> are minor conflicts, as this patch touches "sqrt" as well.
> 
> This patch resolves the conflict; the change is to let
> aarch64_emit_approx_sqrt simply return false for V4HFmode and V8HFmode.

This is OK for trunk, with one modification...

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 58a9d695c0ef9e6e1d67030580428699aba05be4..5ed633542efe58763d68fd9bfbb478ae6ef569c3 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -7598,6 +7598,10 @@ bool
>  aarch64_emit_approx_sqrt (rtx dst, rtx src, bool recp)
>  {
>    machine_mode mode = GET_MODE (dst);
> +
> +  if (mode == V4HFmode || mode == V8HFmode)
> +    return false;
> +

Given that you don't plan to handle any HFmode modes here, I'd prefer:

  if (GET_MODE_INNER (mode) == HFmode)
    return false;

That'll save you from updating this again in patch 7/14.

Otherwise, this is OK.

Thanks,
James


> gcc/
> 2016-07-20  Jiong Wang<jiong.wang@arm.com>
> 
>         * config/aarch64/aarch64-builtins.c (TYPES_BINOP_USS): New.
>         * config/aarch64/aarch64-simd-builtins.def: Register new builtins.
>         * config/aarch64/aarch64-simd.md (aarch64_rsqrte<mode>): Extend to HF modes.
>         (neg<mode>2): Likewise.
>         (abs<mode>2): Likewise.
>         (<frint_pattern><mode>2): Likewise.
>         (l<fcvt_pattern><su_optab><VDQF:mode><fcvt_target>2): Likewise.
>         (<optab><VDQF:mode><fcvt_target>2): Likewise.
>         (<fix_trunc_optab><VDQF:mode><fcvt_target>2): Likewise.
>         (ftrunc<VDQF:mode>2): Likewise.
>         (<optab><fcvt_target><VDQF:mode>2): Likewise.
>         (sqrt<mode>2): Likewise.
>         (*sqrt<mode>2): Likewise.
>         (aarch64_frecpe<mode>): Likewise.
>         (aarch64_cm<optab><mode>): Likewise.
>         * config/aarch64/aarch64.c (aarch64_emit_approx_sqrt): Return
>         false for V4HF and V8HF.
>         * config/aarch64/iterators.md (VHSDF, VHSDF_DF, VHSDF_SDF): New.
>         (VDQF_COND, fcvt_target, FCVT_TARGET, hcon): Extend mode attribute to HF modes.
>         (stype): New.
>         * config/aarch64/arm_neon.h (vdup_n_f16): New.
>         (vdupq_n_f16): Likewise.
>         (vld1_dup_f16): Use vdup_n_f16.
>         (vld1q_dup_f16): Use vdupq_n_f16.
>         (vabs_f16): New.
>         (vabsq_f16): Likewise.
>         (vceqz_f16): Likewise.
>         (vceqzq_f16): Likewise.
>         (vcgez_f16): Likewise.
>         (vcgezq_f16): Likewise.
>         (vcgtz_f16): Likewise.
>         (vcgtzq_f16): Likewise.
>         (vclez_f16): Likewise.
>         (vclezq_f16): Likewise.
>         (vcltz_f16): Likewise.
>         (vcltzq_f16): Likewise.
>         (vcvt_f16_s16): Likewise.
>         (vcvtq_f16_s16): Likewise.
>         (vcvt_f16_u16): Likewise.
>         (vcvtq_f16_u16): Likewise.
>         (vcvt_s16_f16): Likewise.
>         (vcvtq_s16_f16): Likewise.
>         (vcvt_u16_f16): Likewise.
>         (vcvtq_u16_f16): Likewise.
>         (vcvta_s16_f16): Likewise.
>         (vcvtaq_s16_f16): Likewise.
>         (vcvta_u16_f16): Likewise.
>         (vcvtaq_u16_f16): Likewise.
>         (vcvtm_s16_f16): Likewise.
>         (vcvtmq_s16_f16): Likewise.
>         (vcvtm_u16_f16): Likewise.
>         (vcvtmq_u16_f16): Likewise.
>         (vcvtn_s16_f16): Likewise.
>         (vcvtnq_s16_f16): Likewise.
>         (vcvtn_u16_f16): Likewise.
>         (vcvtnq_u16_f16): Likewise.
>         (vcvtp_s16_f16): Likewise.
>         (vcvtpq_s16_f16): Likewise.
>         (vcvtp_u16_f16): Likewise.
>         (vcvtpq_u16_f16): Likewise.
>         (vneg_f16): Likewise.
>         (vnegq_f16): Likewise.
>         (vrecpe_f16): Likewise.
>         (vrecpeq_f16): Likewise.
>         (vrnd_f16): Likewise.
>         (vrndq_f16): Likewise.
>         (vrnda_f16): Likewise.
>         (vrndaq_f16): Likewise.
>         (vrndi_f16): Likewise.
>         (vrndiq_f16): Likewise.
>         (vrndm_f16): Likewise.
>         (vrndmq_f16): Likewise.
>         (vrndn_f16): Likewise.
>         (vrndnq_f16): Likewise.
>         (vrndp_f16): Likewise.
>         (vrndpq_f16): Likewise.
>         (vrndx_f16): Likewise.
>         (vrndxq_f16): Likewise.
>         (vrsqrte_f16): Likewise.
>         (vrsqrteq_f16): Likewise.
>         (vsqrt_f16): Likewise.
>         (vsqrtq_f16): Likewise.
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [AArch64][3/14] ARMv8.2-A FP16 two operands vector intrinsics
  2016-07-20 17:01       ` Jiong Wang
@ 2016-07-25 11:03         ` James Greenhalgh
  0 siblings, 0 replies; 32+ messages in thread
From: James Greenhalgh @ 2016-07-25 11:03 UTC (permalink / raw)
  To: Jiong Wang; +Cc: GCC Patches, nd

On Wed, Jul 20, 2016 at 06:00:46PM +0100, Jiong Wang wrote:
> On 07/07/16 17:15, Jiong Wang wrote:
> >This patch adds ARMv8.2-A FP16 two operands vector intrinsics.
> 
> The updated patch resolves the conflict with
> 
>    https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00309.html
> 
> The change is to let aarch64_emit_approx_div return false for
> V4HFmode and V8HFmode.

As with patch 2/14, please rewrite this hunk:

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 5ed633542efe58763d68fd9bfbb478ae6ef569c3..a7437c04eb936a5e3ebd0bc77eb4afd8c052df28 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -7717,6 +7717,10 @@ bool
>  aarch64_emit_approx_div (rtx quo, rtx num, rtx den)
>  {
>    machine_mode mode = GET_MODE (quo);
> +
> +  if (mode == V4HFmode || mode == V8HFmode)
> +    return false;
> +

 To:

  if (GET_MODE_INNER (mode) == HFmode)
    return false;

Otherwise, this patch is OK for trunk.

Thanks,
James

> gcc/
> 2016-07-20  Jiong Wang<jiong.wang@arm.com>
> 
>         * config/aarch64/aarch64-simd-builtins.def: Register new builtins.
>         * config/aarch64/aarch64-simd.md
>         (aarch64_rsqrts<mode>): Extend to HF modes.
>         (fabd<mode>3): Likewise.
>         (<FCVT_F2FIXED:fcvt_fixed_insn><VHSDF:mode>3): Likewise.
>         (<FCVT_FIXED2F:fcvt_fixed_insn><VDQ_HSDI:mode>3): Likewise.
>         (aarch64_<maxmin_uns>p<mode>): Likewise.
>         (<su><maxmin><mode>3): Likewise.
>         (<maxmin_uns><mode>3): Likewise.
>         (<fmaxmin><mode>3): Likewise.
>         (aarch64_faddp<mode>): Likewise.
>         (aarch64_fmulx<mode>): Likewise.
>         (aarch64_frecps<mode>): Likewise.
>         (*aarch64_fac<optab><mode>): Rename to aarch64_fac<optab><mode>.
>         (add<mode>3): Extend to HF modes.
>         (sub<mode>3): Likewise.
>         (mul<mode>3): Likewise.
>         (div<mode>3): Likewise.
>         (*div<mode>3): Likewise.
>         * config/aarch64/aarch64.c (aarch64_emit_approx_div): Return
>         false for V4HF and V8HF.
>         * config/aarch64/iterators.md (VDQ_HSDI, VSDQ_HSDI): New mode
>         iterators.
>         * config/aarch64/arm_neon.h (vadd_f16): New.
>         (vaddq_f16): Likewise.
>         (vabd_f16): Likewise.
>         (vabdq_f16): Likewise.
>         (vcage_f16): Likewise.
>         (vcageq_f16): Likewise.
>         (vcagt_f16): Likewise.
>         (vcagtq_f16): Likewise.
>         (vcale_f16): Likewise.
>         (vcaleq_f16): Likewise.
>         (vcalt_f16): Likewise.
>         (vcaltq_f16): Likewise.
>         (vceq_f16): Likewise.
>         (vceqq_f16): Likewise.
>         (vcge_f16): Likewise.
>         (vcgeq_f16): Likewise.
>         (vcgt_f16): Likewise.
>         (vcgtq_f16): Likewise.
>         (vcle_f16): Likewise.
>         (vcleq_f16): Likewise.
>         (vclt_f16): Likewise.
>         (vcltq_f16): Likewise.
>         (vcvt_n_f16_s16): Likewise.
>         (vcvtq_n_f16_s16): Likewise.
>         (vcvt_n_f16_u16): Likewise.
>         (vcvtq_n_f16_u16): Likewise.
>         (vcvt_n_s16_f16): Likewise.
>         (vcvtq_n_s16_f16): Likewise.
>         (vcvt_n_u16_f16): Likewise.
>         (vcvtq_n_u16_f16): Likewise.
>         (vdiv_f16): Likewise.
>         (vdivq_f16): Likewise.
>         (vdup_lane_f16): Likewise.
>         (vdup_laneq_f16): Likewise.
>         (vdupq_lane_f16): Likewise.
>         (vdupq_laneq_f16): Likewise.
>         (vdups_lane_f16): Likewise.
>         (vdups_laneq_f16): Likewise.
>         (vmax_f16): Likewise.
>         (vmaxq_f16): Likewise.
>         (vmaxnm_f16): Likewise.
>         (vmaxnmq_f16): Likewise.
>         (vmin_f16): Likewise.
>         (vminq_f16): Likewise.
>         (vminnm_f16): Likewise.
>         (vminnmq_f16): Likewise.
>         (vmul_f16): Likewise.
>         (vmulq_f16): Likewise.
>         (vmulx_f16): Likewise.
>         (vmulxq_f16): Likewise.
>         (vpadd_f16): Likewise.
>         (vpaddq_f16): Likewise.
>         (vpmax_f16): Likewise.
>         (vpmaxq_f16): Likewise.
>         (vpmaxnm_f16): Likewise.
>         (vpmaxnmq_f16): Likewise.
>         (vpmin_f16): Likewise.
>         (vpminq_f16): Likewise.
>         (vpminnm_f16): Likewise.
>         (vpminnmq_f16): Likewise.
>         (vrecps_f16): Likewise.
>         (vrecpsq_f16): Likewise.
>         (vrsqrts_f16): Likewise.
>         (vrsqrtsq_f16): Likewise.
>         (vsub_f16): Likewise.
>         (vsubq_f16): Likewise.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [AArch64][4/14] ARMv8.2-A FP16 three operands vector intrinsics
  2016-07-07 16:16       ` [AArch64][4/14] ARMv8.2-A FP16 three " Jiong Wang
@ 2016-07-25 11:05         ` James Greenhalgh
  0 siblings, 0 replies; 32+ messages in thread
From: James Greenhalgh @ 2016-07-25 11:05 UTC (permalink / raw)
  To: Jiong Wang; +Cc: GCC Patches, nd

On Thu, Jul 07, 2016 at 05:16:01PM +0100, Jiong Wang wrote:
> This patch adds ARMv8.2-A FP16 three operands vector intrinsics.
> 
> The three-operand intrinsics comprise only fma and fms.
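> 
> A hypothetical usage sketch (illustrative only, not part of the patch;
> it assumes arm_neon.h and -march=armv8.2-a+fp16):
> 
>   #include <arm_neon.h>
> 
>   /* Per-lane fused multiply-accumulate: returns acc + x * y (FMLA).  */
>   float16x4_t
>   fma_sketch (float16x4_t acc, float16x4_t x, float16x4_t y)
>   {
>     return vfma_f16 (acc, x, y);
>   }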

OK.

Thanks,
James

> 
> 2016-07-07  Jiong Wang <jiong.wang@arm.com>
> 
> gcc/
>         * config/aarch64/aarch64-simd-builtins.def: Register new builtins.
>         * config/aarch64/aarch64-simd.md (fma<mode>4): Extend to HF modes.
>         (fnma<mode>4): Likewise.
>         * config/aarch64/arm_neon.h (vfma_f16): New.
>         (vfmaq_f16): Likewise.
>         (vfms_f16): Likewise.
>         (vfmsq_f16): Likewise.
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [AArch64][6/14] ARMv8.2-A FP16 reduction vector intrinsics
  2016-07-07 16:17           ` [AArch64][6/14] ARMv8.2-A FP16 reduction " Jiong Wang
@ 2016-07-25 11:06             ` James Greenhalgh
  0 siblings, 0 replies; 32+ messages in thread
From: James Greenhalgh @ 2016-07-25 11:06 UTC (permalink / raw)
  To: Jiong Wang; +Cc: GCC Patches, nd

On Thu, Jul 07, 2016 at 05:16:58PM +0100, Jiong Wang wrote:
> This patch adds ARMv8.2-A FP16 reduction vector intrinsics.
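> 
> A hypothetical usage sketch (illustrative only, not part of the patch):
> 
>   #include <arm_neon.h>
> 
>   /* Across-lanes maximum of an 8-lane FP16 vector (FMAXV Hd, Vn.8H).  */
>   float16_t
>   max_lane (float16x8_t v)
>   {
>     return vmaxvq_f16 (v);
>   }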

OK.

Thanks,
James

> gcc/
> 2016-07-07  Jiong Wang <jiong.wang@arm.com>
> 
>         * config/aarch64/arm_neon.h (vmaxv_f16): New.
>         (vmaxvq_f16): Likewise.
>         (vminv_f16): Likewise.
>         (vminvq_f16): Likewise.
>         (vmaxnmv_f16): Likewise.
>         (vmaxnmvq_f16): Likewise.
>         (vminnmv_f16): Likewise.
>         (vminnmvq_f16): Likewise.
>         * config/aarch64/iterators.md (vp): Support HF modes.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [AArch64][5/14] ARMv8.2-A FP16 lane vector intrinsics
  2016-07-07 16:16         ` [AArch64][5/14] ARMv8.2-A FP16 lane " Jiong Wang
@ 2016-07-25 11:06           ` James Greenhalgh
  0 siblings, 0 replies; 32+ messages in thread
From: James Greenhalgh @ 2016-07-25 11:06 UTC (permalink / raw)
  To: Jiong Wang; +Cc: GCC Patches, nd

On Thu, Jul 07, 2016 at 05:16:28PM +0100, Jiong Wang wrote:
> This patch adds ARMv8.2-A FP16 lane vector intrinsics.
> 
> Lane intrinsics are generally derivatives of the multiply intrinsics,
> including multiply-accumulate.  All the necessary backend support for
> them is already there except for fmulx; the implementations are largely
> a combination of existing multiply intrinsics with vdup intrinsics.
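> 
> For illustration, a sketch of that composition (hypothetical, not the
> patch's exact code):
> 
>   #include <arm_neon.h>
> 
>   /* A vmulx-by-lane operation built from the two-operand vmulx plus a
>      lane broadcast: multiply each lane of A by lane 2 of B.  */
>   float16x4_t
>   mulx_by_lane2 (float16x4_t a, float16x4_t b)
>   {
>     return vmulx_f16 (a, vdup_lane_f16 (b, 2));
>   }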

OK.

Thanks,
James

> 2016-07-07  Jiong Wang <jiong.wang@arm.com>
> 
> gcc/
>         * config/aarch64/aarch64-simd.md
>         (*aarch64_mulx_elt_to_64v2df): Rename to
>         "*aarch64_mulx_elt_from_dup<mode>".
>         (*aarch64_mul3_elt<mode>): Update schedule type.
>         (*aarch64_mul3_elt_from_dup<mode>): Likewise.
>         (*aarch64_fma4_elt_from_dup<mode>): Likewise.
>         (*aarch64_fnma4_elt_from_dup<mode>): Likewise.
>         * config/aarch64/iterators.md (VMUL): Support half-precision
>         float modes.
>         (f, fp): Support HF modes.
>         * config/aarch64/arm_neon.h (vfma_lane_f16): New.
>         (vfmaq_lane_f16): Likewise.
>         (vfma_laneq_f16): Likewise.
>         (vfmaq_laneq_f16): Likewise.
>         (vfma_n_f16): Likewise.
>         (vfmaq_n_f16): Likewise.
>         (vfms_lane_f16): Likewise.
>         (vfmsq_lane_f16): Likewise.
>         (vfms_laneq_f16): Likewise.
>         (vfmsq_laneq_f16): Likewise.
>         (vfms_n_f16): Likewise.
>         (vfmsq_n_f16): Likewise.
>         (vmul_lane_f16): Likewise.
>         (vmulq_lane_f16): Likewise.
>         (vmul_laneq_f16): Likewise.
>         (vmulq_laneq_f16): Likewise.
>         (vmul_n_f16): Likewise.
>         (vmulq_n_f16): Likewise.
>         (vmulx_lane_f16): Likewise.
>         (vmulxq_lane_f16): Likewise.
>         (vmulx_laneq_f16): Likewise.
>         (vmulxq_laneq_f16): Likewise.
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [AArch64][7/14] ARMv8.2-A FP16 one operand scalar intrinsics
  2016-07-20 17:01                 ` Jiong Wang
@ 2016-07-25 11:14                   ` James Greenhalgh
  0 siblings, 0 replies; 32+ messages in thread
From: James Greenhalgh @ 2016-07-25 11:14 UTC (permalink / raw)
  To: Jiong Wang; +Cc: GCC Patches, nd

On Wed, Jul 20, 2016 at 06:00:53PM +0100, Jiong Wang wrote:
> On 07/07/16 17:17, Jiong Wang wrote:
> >This patch adds ARMv8.2-A FP16 one-operand scalar intrinsics.
> >
> >Scalar intrinsics are kept in arm_fp16.h instead of arm_neon.h.
> 
> The updated patch resolves the conflict with
> 
>    https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00308.html
> 
> The change is to let aarch64_emit_approx_sqrt return false for HFmode.

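For reference, the new header is used along these lines (a hand-written
sketch, not code from the patch; assumes -march=armv8.2-a+fp16):

  #include <arm_fp16.h>

  /* Scalar half-precision square root of the absolute value.  */
  float16_t
  sqrt_abs (float16_t x)
  {
    return vsqrth_f16 (vabsh_f16 (x));
  }
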
OK, but...

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index a7437c04eb936a5e3ebd0bc77eb4afd8c052df28..27866ccd605abec6ea7c9110022f329c9b172ee0 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -7599,7 +7599,7 @@ aarch64_emit_approx_sqrt (rtx dst, rtx src, bool recp)
>  {
>    machine_mode mode = GET_MODE (dst);
>  
> -  if (mode == V4HFmode || mode == V8HFmode)
> +  if (mode == HFmode || mode == V4HFmode || mode == V8HFmode)
>      return false;

...if you take my advice on patch 2/14, you won't need this change.

Otherwise, OK.

Thanks,
James

> gcc/
> 2016-07-20  Jiong Wang<jiong.wang@arm.com>
> 
>         * config.gcc (aarch64*-*-*): Install arm_fp16.h.
>         * config/aarch64/aarch64-builtins.c (hi_UP): New.
>         * config/aarch64/aarch64-simd-builtins.def: Register new builtins.
>         * config/aarch64/aarch64-simd.md (aarch64_frsqrte<mode>): Extend to HF mode.
>         (aarch64_frecp<FRECP:frecp_suffix><mode>): Likewise.
>         (aarch64_cm<optab><mode>): Likewise.
>         * config/aarch64/aarch64.md (<frint_pattern><mode>2): Likewise.
>         (l<fcvt_pattern><su_optab><GPF:mode><GPI:mode>2): Likewise.
>         (fix_trunc<GPF:mode><GPI:mode>2): Likewise.
>         (sqrt<mode>2): Likewise.
>         (*sqrt<mode>2): Likewise.
>         (abs<mode>2): Likewise.
>         (<optab><mode>hf2): New pattern for HF mode.
>         (<optab>hihf2): Likewise.
>         * config/aarch64/aarch64.c (aarch64_emit_approx_sqrt): Return
>         false for HF mode.
>         * config/aarch64/arm_neon.h: Include arm_fp16.h.
>         * config/aarch64/iterators.md (GPF_F16): New.
>         (GPI_F16): Likewise.
>         (VHSDF_HSDF): Likewise.
>         (w1): Support HF mode.
>         (w2): Likewise.
>         (v): Likewise.
>         (s): Likewise.
>         (q): Likewise.
>         (Vmtype): Likewise.
>         (V_cmp_result): Likewise.
>         (fcvt_iesize): Likewise.
>         (FCVT_IESIZE): Likewise.
>         * config/aarch64/arm_fp16.h: New file.
>         (vabsh_f16): New.
>         (vceqzh_f16): Likewise.
>         (vcgezh_f16): Likewise.
>         (vcgtzh_f16): Likewise.
>         (vclezh_f16): Likewise.
>         (vcltzh_f16): Likewise.
>         (vcvth_f16_s16): Likewise.
>         (vcvth_f16_s32): Likewise.
>         (vcvth_f16_s64): Likewise.
>         (vcvth_f16_u16): Likewise.
>         (vcvth_f16_u32): Likewise.
>         (vcvth_f16_u64): Likewise.
>         (vcvth_s16_f16): Likewise.
>         (vcvth_s32_f16): Likewise.
>         (vcvth_s64_f16): Likewise.
>         (vcvth_u16_f16): Likewise.
>         (vcvth_u32_f16): Likewise.
>         (vcvth_u64_f16): Likewise.
>         (vcvtah_s16_f16): Likewise.
>         (vcvtah_s32_f16): Likewise.
>         (vcvtah_s64_f16): Likewise.
>         (vcvtah_u16_f16): Likewise.
>         (vcvtah_u32_f16): Likewise.
>         (vcvtah_u64_f16): Likewise.
>         (vcvtmh_s16_f16): Likewise.
>         (vcvtmh_s32_f16): Likewise.
>         (vcvtmh_s64_f16): Likewise.
>         (vcvtmh_u16_f16): Likewise.
>         (vcvtmh_u32_f16): Likewise.
>         (vcvtmh_u64_f16): Likewise.
>         (vcvtnh_s16_f16): Likewise.
>         (vcvtnh_s32_f16): Likewise.
>         (vcvtnh_s64_f16): Likewise.
>         (vcvtnh_u16_f16): Likewise.
>         (vcvtnh_u32_f16): Likewise.
>         (vcvtnh_u64_f16): Likewise.
>         (vcvtph_s16_f16): Likewise.
>         (vcvtph_s32_f16): Likewise.
>         (vcvtph_s64_f16): Likewise.
>         (vcvtph_u16_f16): Likewise.
>         (vcvtph_u32_f16): Likewise.
>         (vcvtph_u64_f16): Likewise.
>         (vnegh_f16): Likewise.
>         (vrecpeh_f16): Likewise.
>         (vrecpxh_f16): Likewise.
>         (vrndh_f16): Likewise.
>         (vrndah_f16): Likewise.
>         (vrndih_f16): Likewise.
>         (vrndmh_f16): Likewise.
>         (vrndnh_f16): Likewise.
>         (vrndph_f16): Likewise.
>         (vrndxh_f16): Likewise.
>         (vrsqrteh_f16): Likewise.
>         (vsqrth_f16): Likewise.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [AArch64][8/14] ARMv8.2-A FP16 two operands scalar intrinsics
  2016-07-20 17:01                 ` Jiong Wang
@ 2016-07-25 11:15                   ` James Greenhalgh
  0 siblings, 0 replies; 32+ messages in thread
From: James Greenhalgh @ 2016-07-25 11:15 UTC (permalink / raw)
  To: Jiong Wang; +Cc: GCC Patches, nd

On Wed, Jul 20, 2016 at 06:00:58PM +0100, Jiong Wang wrote:
> On 07/07/16 17:17, Jiong Wang wrote:
> >This patch adds ARMv8.2-A FP16 two-operand scalar intrinsics.
> 
> The updated patch resolves the conflict with
> 
>    https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00309.html
> 
> The change is to let aarch64_emit_approx_div return false for HFmode.

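For reference, the two-operand scalar intrinsics compose naturally (a
hand-written sketch, not code from the patch; assumes
-march=armv8.2-a+fp16):

  #include <arm_fp16.h>

  /* Average of two half-precision scalars: (a + b) / 2.  */
  float16_t
  average (float16_t a, float16_t b)
  {
    return vdivh_f16 (vaddh_f16 (a, b), (float16_t) 2.0);
  }
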
If you take the change I proposed on patch 3/14 you won't need this.

OK otherwise.

Thanks,
James

> gcc/
> 2016-07-20  Jiong Wang<jiong.wang@arm.com>
> 
>         * config/aarch64/aarch64-simd-builtins.def: Register new builtins.
>         * config/aarch64/aarch64.md (<FCVT_F2FIXED:fcvt_fixed_insn>hf<mode>3): New.
>         (<FCVT_FIXED2F:fcvt_fixed_insn><mode>hf3): Likewise.
>         (add<mode>3): Likewise.
>         (sub<mode>3): Likewise.
>         (mul<mode>3): Likewise.
>         (div<mode>3): Likewise.
>         (*div<mode>3): Likewise.
>         (<fmaxmin><mode>3): Extend to HF.
>         * config/aarch64/aarch64.c (aarch64_emit_approx_div): Return
>         false for HFmode.
>         * config/aarch64/aarch64-simd.md (aarch64_rsqrts<mode>): Likewise.
>         (fabd<mode>3): Likewise.
>         (<FCVT_F2FIXED:fcvt_fixed_insn><VHSDF_HSDF:mode>3): Likewise.
>         (<FCVT_FIXED2F:fcvt_fixed_insn><VHSDI_HSDI:mode>3): Likewise.
>         (aarch64_fmulx<mode>): Likewise.
>         (aarch64_fac<optab><mode>): Likewise.
>         (aarch64_frecps<mode>): Likewise.
>         (<FCVT_F2FIXED:fcvt_fixed_insn>hfhi3): New.
>         (<FCVT_FIXED2F:fcvt_fixed_insn>hihf3): Likewise.
>         * config/aarch64/iterators.md (VHSDF_SDF): Delete.
>         (VSDQ_HSDI): Support HI.
>         (fcvt_target, FCVT_TARGET): Likewise.
>         * config/aarch64/arm_fp16.h: (vaddh_f16): New.
>         (vsubh_f16): Likewise.
>         (vabdh_f16): Likewise.
>         (vcageh_f16): Likewise.
>         (vcagth_f16): Likewise.
>         (vcaleh_f16): Likewise.
>         (vcalth_f16): Likewise.
>         (vcleh_f16): Likewise.
>         (vclth_f16): Likewise.
>         (vcvth_n_f16_s16): Likewise.
>         (vcvth_n_f16_s32): Likewise.
>         (vcvth_n_f16_s64): Likewise.
>         (vcvth_n_f16_u16): Likewise.
>         (vcvth_n_f16_u32): Likewise.
>         (vcvth_n_f16_u64): Likewise.
>         (vcvth_n_s16_f16): Likewise.
>         (vcvth_n_s32_f16): Likewise.
>         (vcvth_n_s64_f16): Likewise.
>         (vcvth_n_u16_f16): Likewise.
>         (vcvth_n_u32_f16): Likewise.
>         (vcvth_n_u64_f16): Likewise.
>         (vdivh_f16): Likewise.
>         (vmaxh_f16): Likewise.
>         (vmaxnmh_f16): Likewise.
>         (vminh_f16): Likewise.
>         (vminnmh_f16): Likewise.
>         (vmulh_f16): Likewise.
>         (vmulxh_f16): Likewise.
>         (vrecpsh_f16): Likewise.
>         (vrsqrtsh_f16): Likewise.
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [AArch64][9/14] ARMv8.2-A FP16 three operands scalar intrinsics
  2016-07-07 16:18                 ` [AArch64][9/14] ARMv8.2-A FP16 three operands " Jiong Wang
@ 2016-07-25 11:15                   ` James Greenhalgh
  0 siblings, 0 replies; 32+ messages in thread
From: James Greenhalgh @ 2016-07-25 11:15 UTC (permalink / raw)
  To: Jiong Wang; +Cc: GCC Patches, nd

On Thu, Jul 07, 2016 at 05:18:15PM +0100, Jiong Wang wrote:
> This patch adds ARMv8.2-A FP16 three-operand scalar intrinsics.

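For reference (a hand-written sketch, not code from the patch; assumes
-march=armv8.2-a+fp16):

  #include <arm_fp16.h>

  /* Fused scalar multiply-add: returns a + b * c with a single
     rounding step.  */
  float16_t
  fma_h (float16_t a, float16_t b, float16_t c)
  {
    return vfmah_f16 (a, b, c);
  }
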
OK.

Thanks,
James

> gcc/
> 2016-07-07  Jiong Wang <jiong.wang@arm.com>
> 
>         * config/aarch64/aarch64-simd-builtins.def: Register new builtins.
>         * config/aarch64/aarch64.md (fma): New for HF.
>         (fnma): Likewise.
>         * config/aarch64/arm_fp16.h (vfmah_f16): New.
>         (vfmsh_f16): Likewise.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [AArch64][10/14] ARMv8.2-A FP16 lane scalar intrinsics
  2016-07-07 16:18                   ` [AArch64][10/14] ARMv8.2-A FP16 lane scalar intrinsics Jiong Wang
@ 2016-07-25 11:16                     ` James Greenhalgh
  0 siblings, 0 replies; 32+ messages in thread
From: James Greenhalgh @ 2016-07-25 11:16 UTC (permalink / raw)
  To: Jiong Wang; +Cc: GCC Patches, nd

On Thu, Jul 07, 2016 at 05:18:29PM +0100, Jiong Wang wrote:
> This patch adds ARMv8.2-A FP16 lane scalar intrinsics.

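For reference, the lane scalar forms take a scalar and a vector operand
(a hand-written sketch, not code from the patch; assumes
-march=armv8.2-a+fp16):

  #include <arm_neon.h>

  /* Multiply the scalar a by lane 3 of vector v.  */
  float16_t
  mul_lane3 (float16_t a, float16x4_t v)
  {
    return vmulh_lane_f16 (a, v, 3);
  }
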
OK.

Thanks,
James

> 
> gcc/
> 2016-07-07  Jiong Wang <jiong.wang@arm.com>
> 
>         * config/aarch64/arm_neon.h (vfmah_lane_f16): New.
>         (vfmah_laneq_f16): Likewise.
>         (vfmsh_lane_f16): Likewise.
>         (vfmsh_laneq_f16): Likewise.
>         (vmulh_lane_f16): Likewise.
>         (vmulh_laneq_f16): Likewise.
>         (vmulxh_lane_f16): Likewise.
>         (vmulxh_laneq_f16): Likewise.
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [AArch64][11/14] ARMv8.2-A FP16 testsuite selector
  2016-07-07 16:18                     ` [AArch64][11/14] ARMv8.2-A FP16 testsuite selector Jiong Wang
@ 2016-10-10  8:57                       ` James Greenhalgh
  0 siblings, 0 replies; 32+ messages in thread
From: James Greenhalgh @ 2016-10-10  8:57 UTC (permalink / raw)
  To: Jiong Wang; +Cc: GCC Patches, nd

On Thu, Jul 07, 2016 at 05:18:41PM +0100, Jiong Wang wrote:
> ARMv8.2-A adds support for scalar and vector FP16 instructions to ARM and
> AArch64. This patch adds support for testing code for AArch64 targets
> using the new instructions. It is based on the target-support code for
> ARMv8.2-A added for ARM (AArch32).

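For reference, a test would opt in through the new selectors roughly as
follows (a hand-written sketch, not a test from the patch; the selector
and option names below match the ChangeLog entries):

  /* { dg-do compile } */
  /* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok } */
  /* { dg-add-options arm_v8_2a_fp16_scalar } */

  #include <arm_fp16.h>

  float16_t
  f (float16_t x)
  {
    return vnegh_f16 (x);
  }
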
OK.

Thanks,
James

> gcc/testsuite/
> 2016-07-07  Matthew Wahab <matthew.wahab@arm.com>
>             Jiong Wang <jiong.wang@arm.com>
> 
>         * target-supports.exp (add_options_for_arm_v8_2a_fp16_scalar):
>         Mention AArch64 support.
>         (add_options_for_arm_v8_2a_fp16_neon): Likewise.
>         (check_effective_target_arm_v8_2a_fp16_scalar_ok_nocache): Support
>         AArch64 targets.
>         (check_effective_target_arm_v8_2a_fp16_neon_ok_nocache): Support
>         AArch64 targets.
>         (check_effective_target_arm_v8_2a_fp16_scalar_hw): Support AArch64
>         targets.
>         (check_effective_target_arm_v8_2a_fp16_neon_hw): Likewise.
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [AArch64][13/14] ARMv8.2-A testsuite for new vector intrinsics
  2016-07-07 16:19                         ` [AArch64][13/14] ARMv8.2-A testsuite for new vector intrinsics Jiong Wang
@ 2016-10-10  9:55                           ` James Greenhalgh
  0 siblings, 0 replies; 32+ messages in thread
From: James Greenhalgh @ 2016-10-10  9:55 UTC (permalink / raw)
  To: Jiong Wang; +Cc: GCC Patches, nd

On Thu, Jul 07, 2016 at 05:19:25PM +0100, Jiong Wang wrote:
> This patch contains testcases for the new vector intrinsics that are
> only available on AArch64.

OK.

Thanks,
James

> gcc/testsuite/
> 2016-07-07  Jiong Wang <jiong.wang@arm.com>
> 
>         * gcc.target/aarch64/advsimd-intrinsics/vdiv_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vfmas_lane_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vfmas_n_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vmaxnmv_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vmaxv_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vminnmv_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vminv_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vmul_lane_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vmulx_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vmulx_n_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vpminmaxnm_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vrndi_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vsqrt_f16_1.c: New.
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [AArch64][12/14] ARMv8.2-A testsuite for new data movement intrinsics
  2016-07-07 16:19                       ` [AArch64][12/14] ARMv8.2-A testsuite for new data movement intrinsics Jiong Wang
@ 2016-10-10  9:55                         ` James Greenhalgh
  0 siblings, 0 replies; 32+ messages in thread
From: James Greenhalgh @ 2016-10-10  9:55 UTC (permalink / raw)
  To: Jiong Wang; +Cc: GCC Patches, nd

On Thu, Jul 07, 2016 at 05:19:09PM +0100, Jiong Wang wrote:
> This patch contains testcases for the new data movement intrinsics
> that are only available on AArch64.

OK.

Thanks,
James

> 
> gcc/testsuite/
> 2016-07-07  Jiong Wang <jiong.wang@arm.com>
> 
>         * gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
>         (FP16_SUPPORTED): Enable AArch64.
>         * gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c: Add
>         support for vdup*_laneq.
>         * gcc.target/aarch64/advsimd-intrinsics/vduph_lane.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vtrn_half.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vuzp_half.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vzip_half.c: New.
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [AArch64][14/14] ARMv8.2-A testsuite for new scalar intrinsics
  2016-07-07 16:19                           ` [AArch64][14/14] ARMv8.2-A testsuite for new scalar intrinsics Jiong Wang
@ 2016-10-10  9:56                             ` James Greenhalgh
  0 siblings, 0 replies; 32+ messages in thread
From: James Greenhalgh @ 2016-10-10  9:56 UTC (permalink / raw)
  To: Jiong Wang; +Cc: GCC Patches, nd

On Thu, Jul 07, 2016 at 05:19:37PM +0100, Jiong Wang wrote:
> This patch contains testcases for the new scalar intrinsics that are
> only available on AArch64.

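For reference, such an execution test looks roughly like this (a
hand-written sketch, not one of the listed files; the selector and
option names match the testsuite ChangeLog entries):

  /* { dg-do run } */
  /* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
  /* { dg-add-options arm_v8_2a_fp16_scalar } */

  #include <arm_fp16.h>

  int
  main (void)
  {
    float16_t a = 2.0, b = 3.0;
    if (vaddh_f16 (a, b) != (float16_t) 5.0)
      __builtin_abort ();
    return 0;
  }
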
OK.

Thanks,
James

> gcc/testsuite/
> 2016-07-07  Jiong Wang <jiong.wang@arm.com>
> 
>         * gcc.target/aarch64/advsimd-intrinsics/unary_scalar_op.inc:
>         Support FMT64.
>         * gcc.target/aarch64/advsimd-intrinsics/vabdh_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcageh_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcagth_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcaleh_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcalth_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vceqh_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vceqzh_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcgeh_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcgezh_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcgth_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcgtzh_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcleh_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vclezh_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vclth_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcltzh_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvtah_s16_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvtah_s64_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvtah_u16_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvtah_u64_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s64_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u64_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s64_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u64_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s16_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s64_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u16_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u64_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvth_s16_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvth_s64_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvth_u16_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvth_u64_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s16_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s64_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u16_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u64_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s16_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s64_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u16_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u64_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvtph_s16_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvtph_s64_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvtph_u16_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvtph_u64_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vfmash_lane_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vmaxh_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vminh_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vmulh_lane_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vmulxh_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vmulxh_lane_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vrecpeh_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vrecpsh_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vrecpxh_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vrsqrteh_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vrsqrtsh_f16_1.c: New.


^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2016-10-10  9:56 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <67f7b93f-0a92-de8f-8c50-5b4b573fed3a@foss.arm.com>
     [not found] ` <99eb95e3-5e9c-c6c9-b85f-e67d15f4859a@foss.arm.com>
2016-07-07 16:14   ` [AArch64][2/14] ARMv8.2-A FP16 one operand vector intrinsics Jiong Wang
2016-07-20 17:00     ` Jiong Wang
2016-07-25 11:01       ` James Greenhalgh
     [not found]   ` <21c3c64f-95ad-c127-3f8a-4afd236aae33@foss.arm.com>
2016-07-07 16:15     ` [AArch64][3/14] ARMv8.2-A FP16 two operands " Jiong Wang
2016-07-20 17:01       ` Jiong Wang
2016-07-25 11:03         ` James Greenhalgh
     [not found]     ` <938d13c1-39be-5fe3-9997-e55942bbd163@foss.arm.com>
2016-07-07 16:16       ` [AArch64][4/14] ARMv8.2-A FP16 three " Jiong Wang
2016-07-25 11:05         ` James Greenhalgh
     [not found]       ` <a12ecde7-2ac1-0539-334e-9a33395dd3eb@foss.arm.com>
2016-07-07 16:16         ` [AArch64][5/14] ARMv8.2-A FP16 lane " Jiong Wang
2016-07-25 11:06           ` James Greenhalgh
     [not found]         ` <a3eeda81-cb1c-6d9e-706d-c5c067a90d74@foss.arm.com>
2016-07-07 16:17           ` [AArch64][6/14] ARMv8.2-A FP16 reduction " Jiong Wang
2016-07-25 11:06             ` James Greenhalgh
     [not found]           ` <cf21a824-01c3-0969-d12b-884c4e70e7f1@foss.arm.com>
2016-07-07 16:17             ` [AArch64][7/14] ARMv8.2-A FP16 one operand scalar intrinsics Jiong Wang
     [not found]               ` <b6150268-1e2d-3fc6-17c9-7bde47e2534e@foss.arm.com>
2016-07-20 17:01                 ` Jiong Wang
2016-07-25 11:14                   ` James Greenhalgh
     [not found]             ` <c9ed296a-1105-6bda-1927-e72be567c590@foss.arm.com>
     [not found]               ` <d91fc169-1317-55ed-c36c-6dc5dac088cc@foss.arm.com>
2016-07-07 16:18                 ` [AArch64][9/14] ARMv8.2-A FP16 three operands " Jiong Wang
2016-07-25 11:15                   ` James Greenhalgh
     [not found]                 ` <94dcb98c-81c6-a1d5-bb1a-ff8278f0a07b@foss.arm.com>
     [not found]                   ` <82155ca9-a506-b1fc-bdd4-6a637dc66a1e@foss.arm.com>
2016-07-07 16:18                     ` [AArch64][11/14] ARMv8.2-A FP16 testsuite selector Jiong Wang
2016-10-10  8:57                       ` James Greenhalgh
     [not found]                     ` <135287e5-6fc1-4957-d320-16f38260fa28@foss.arm.com>
     [not found]                       ` <cdb3640f-134a-f2be-c728-b1467fb7aaf9@foss.arm.com>
2016-07-07 16:19                         ` [AArch64][13/14] ARMv8.2-A testsuite for new vector intrinsics Jiong Wang
2016-10-10  9:55                           ` James Greenhalgh
     [not found]                         ` <c5443f0d-577b-776b-4c97-7b16b06f8264@foss.arm.com>
2016-07-07 16:19                           ` [AArch64][14/14] ARMv8.2-A testsuite for new scalar intrinsics Jiong Wang
2016-10-10  9:56                             ` James Greenhalgh
2016-07-07 16:19                       ` [AArch64][12/14] ARMv8.2-A testsuite for new data movement intrinsics Jiong Wang
2016-10-10  9:55                         ` James Greenhalgh
2016-07-07 16:18                   ` [AArch64][10/14] ARMv8.2-A FP16 lane scalar intrinsics Jiong Wang
2016-07-25 11:16                     ` James Greenhalgh
2016-07-07 16:18               ` [AArch64][8/14] ARMv8.2-A FP16 two operands " Jiong Wang
2016-07-20 17:01                 ` Jiong Wang
2016-07-25 11:15                   ` James Greenhalgh
2016-07-07 16:14 ` [AArch64][1/14] ARMv8.2-A FP16 data processing intrinsics Jiong Wang
2016-07-08 14:07   ` James Greenhalgh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).