public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/17][ARM] ARMv8.2-A and FP16 extension support.
From: Matthew Wahab @ 2016-05-17 14:20 UTC (permalink / raw)
  To: gcc-patches

Hello,

The ARMv8.2-A architecture builds on ARMv8.1-A and includes an optional
extension supporting half-precision floating point (FP16)
arithmetic. This extension adds instructions to the VFP and NEON
instruction sets to provide operations on IEEE 754-2008 formatted FP16
values.

This patch set adds support to GCC for the ARMv8.2-A architecture and
for the FP16 extension. The FP16 VFP and NEON instructions are exposed
as new ACLE intrinsics and support is added to the compiler to make use
of data movement and other precision-preserving instructions.

The new half-precision operations are treated as complementary to the
existing FP16 support. To preserve compatibility with existing code, the
ARM __fp16 data type continues to be treated as a storage-only format
and operations on it are still carried out by promotion to single
precision floating point. Half-precision operations are only supported through the
use of the intrinsics added by this patch set.
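
The difference, sketched in C (this assumes the arm_fp16.h header and
the vaddh_f16 scalar intrinsic that later patches in this series add;
the function names are illustrative only):

  #include <arm_fp16.h>   /* Scalar FP16 intrinsics, added by this series.  */

  /* Existing behaviour: __fp16 arithmetic promotes to float.  */
  __fp16 add_promoted (__fp16 a, __fp16 b)
  {
    return a + b;   /* Widen to float, add, narrow back to __fp16.  */
  }

  /* New behaviour, opted into via intrinsics: a native FP16 add.  */
  float16_t add_native (float16_t a, float16_t b)
  {
    return vaddh_f16 (a, b);   /* Maps to a single vadd.f16.  */
  }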

This series also includes a number of patches to improve the handling
of 16-bit integer and floating point values. These support code
generation for the ARMv8.2-A FP16 extension but are also made available
independently of it. Among these changes are a number of new ACLE
data-processing intrinsics for half-precision (f16) data.
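
For example, one of the new float16_t data-processing intrinsics
(vrev64_f16 is among the shuffle intrinsics added in this series; the
snippet is a sketch, not taken from the patches):

  #include <arm_neon.h>

  /* Reverse the lanes of a 64-bit float16 vector.  This needs only
     data-movement support, not the new FP16 arithmetic instructions,
     so it is available independently of the ARMv8.2-A extension.  */
  float16x4_t
  reverse_lanes (float16x4_t x)
  {
    return vrev64_f16 (x);
  }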

The patches in this series:

- Add the command line and profile for the new architecture.
- Add selectors to the testsuite target-support to distinguish targets
   using the IEEE FP16 format from those using the ARM Alternative format.
- Add support (selectors and directives) to the testsuite target-support
   for ARMv8.2-A and the FP16 extension.
- Add feature macros for the new features.
- Improve the handling of 16-bit integers when VFP units are available.
- Add vector shuffle intrinsics for float16_t.
- Add data movement instructions introduced by the new extension.
- Add the VFP FP16 arithmetic instructions introduced by the extension.
- Add the NEON FP16 arithmetic instructions introduced by the extension.
- Refactor the code for initializing and expanding the NEON intrinsics.
- Add builtins to support intrinsics for the VFP FP16 instructions.
- Add builtins to support intrinsics for the NEON FP16 instructions.
- Add intrinsics for the VFP FP16 instructions.
- Add intrinsics for the NEON FP16 instructions.
- Add tests for ARMv8.2-A and the new FP16 support.
- Add tests for the VFP FP16 intrinsics.
- Add tests for the NEON FP16 intrinsics.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator. Also tested aarch64-none-elf with the
advsimd-intrinsics testsuite using an ARMv8.2-A emulator.

Matthew


* [PATCH 1/17][ARM] Add ARMv8.2-A command line option and profile.
From: Matthew Wahab @ 2016-05-17 14:23 UTC (permalink / raw)
  To: gcc-patches


This patch adds the command-line options for the ARMv8.2-A architecture
and the half-precision extension. The architecture is selected by
-march=armv8.2-a and has all the properties of -march=armv8.1-a.

This patch also enables the CRC extension (+crc), which is required
by both the ARMv8.1-A and ARMv8.2-A architectures but was not
previously enabled by default for -march=armv8.1-a.

The half-precision extension is selected using the extension +fp16. This
enables the VFP FP16 instructions if an ARMv8 VFP unit is also
specified, e.g. by -mfpu=fp-armv8. It also enables the FP16 NEON
instructions if an ARMv8 NEON unit is specified, e.g. by
-mfpu=neon-fp-armv8. Note that if the NEON FP16 instructions are enabled
then so are the VFP FP16 instructions.
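
As an illustration of the intended behaviour, a hypothetical
compile-only test (the option spellings are the ones added by this
patch; the feature macros checked here are only introduced in patch
4/17 of the series):

  /* { dg-do compile } */
  /* { dg-options "-march=armv8.2-a+fp16 -mfpu=neon-fp-armv8" } */

  /* With an ARMv8 NEON unit, +fp16 enables the NEON FP16 instructions
     and, by implication, the VFP FP16 instructions.  */
  #ifndef __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
  #error NEON FP16 instructions expected
  #endif
  #ifndef __ARM_FEATURE_FP16_SCALAR_ARITHMETIC
  #error VFP FP16 instructions implied by the NEON ones
  #endif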

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on an
ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/arm-arches.def ("armv8.1-a"): Add FL_CRC32.
	("armv8.2-a"): New.
	("armv8.2-a+fp16"): New.
	* config/arm/arm-protos.h (FL2_ARCH8_2): New.
	(FL2_FP16INST): New.
	(FL2_FOR_ARCH8_2A): New.
	* config/arm/arm-tables.opt: Regenerate.
	* config/arm/arm.c (arm_arch8_2): New.
	(arm_fp16_inst): New.
	(arm_option_override): Set arm_arch8_2 and arm_fp16_inst.  Check
	for incompatible fp16-format settings.
	* config/arm/arm.h (TARGET_VFP_FP16INST): New.
	(TARGET_NEON_FP16INST): New.
	(arm_arch8_2): Declare.
	(arm_fp16_inst): Declare.
	* config/arm/bpabi.h (BE8_LINK_SPEC): Add entries for
	march=armv8.2-a and march=armv8.2-a+fp16.
	* config/arm/t-aprofile (Arch Matches): Add entries for armv8.2-a
	and armv8.2-a+fp16.
	* doc/invoke.texi (ARM Options): Add "-march=armv8.1-a",
	"-march=armv8.2-a" and "-march=armv8.2-a+fp16".



[-- Attachment #2: 0001-PATCH-1-17-ARM-Add-ARMv8.2-A-command-line-option-and.patch --]
[-- Type: text/x-patch, Size: 9707 bytes --]

From 7df41b0a5d248d842fd4c89082dc1a1055dc4604 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 13:31:24 +0100
Subject: [PATCH 01/17] [PATCH 1/17][ARM] Add ARMv8.2-A command line option and
 profile.

2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/arm-arches.def ("armv8.1-a"): Add FL_CRC32.
	("armv8.2-a"): New.
	("armv8.2-a+fp16"): New.
	* config/arm/arm-protos.h (FL2_ARCH8_2): New.
	(FL2_FP16INST): New.
	(FL2_FOR_ARCH8_2A): New.
	* config/arm/arm-tables.opt: Regenerate.
	* config/arm/arm.c (arm_arch8_2): New.
	(arm_fp16_inst): New.
	(arm_option_override): Set arm_arch8_2 and arm_fp16_inst.  Check
	for incompatible fp16-format settings.
	* config/arm/arm.h (TARGET_VFP_FP16INST): New.
	(TARGET_NEON_FP16INST): New.
	(arm_arch8_2): Declare.
	(arm_fp16_inst): Declare.
	* config/arm/bpabi.h (BE8_LINK_SPEC): Add entries for
	march=armv8.2-a and march=armv8.2-a+fp16.
	* config/arm/t-aprofile (Arch Matches): Add entries for armv8.2-a
	and armv8.2-a+fp16.
	* doc/invoke.texi (ARM Options): Add "-march=armv8.1-a",
	"-march=armv8.2-a" and "-march=armv8.2-a+fp16".
---
 gcc/config/arm/arm-arches.def | 10 ++++++++--
 gcc/config/arm/arm-protos.h   |  4 ++++
 gcc/config/arm/arm-tables.opt | 10 ++++++++--
 gcc/config/arm/arm.c          | 15 +++++++++++++++
 gcc/config/arm/arm.h          | 14 ++++++++++++++
 gcc/config/arm/bpabi.h        |  4 ++++
 gcc/config/arm/t-aprofile     |  2 ++
 gcc/doc/invoke.texi           | 13 +++++++++++++
 8 files changed, 68 insertions(+), 4 deletions(-)

diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index fd02b18..2b4a80e 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -58,10 +58,16 @@ ARM_ARCH("armv7e-m", cortexm4,  7EM,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC |	      FL_F
 ARM_ARCH("armv8-a", cortexa53,  8A,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC |             FL_FOR_ARCH8A))
 ARM_ARCH("armv8-a+crc",cortexa53, 8A,   ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_CRC32  | FL_FOR_ARCH8A))
 ARM_ARCH("armv8.1-a", cortexa53,  8A,
-	  ARM_FSET_MAKE (FL_CO_PROC | FL_FOR_ARCH8A,  FL2_FOR_ARCH8_1A))
+	  ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
+			 FL2_FOR_ARCH8_1A))
 ARM_ARCH("armv8.1-a+crc",cortexa53, 8A,
 	  ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
 			 FL2_FOR_ARCH8_1A))
+ARM_ARCH ("armv8.2-a", cortexa53,  8A,
+	  ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
+			 FL2_FOR_ARCH8_2A))
+ARM_ARCH ("armv8.2-a+fp16", cortexa53,  8A,
+	  ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
+			 FL2_FOR_ARCH8_2A | FL2_FP16INST))
 ARM_ARCH("iwmmxt",  iwmmxt,     5TE,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT))
 ARM_ARCH("iwmmxt2", iwmmxt2,    5TE,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2))
-
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index d8179c4..c1a1eb8 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -390,6 +390,9 @@ extern bool arm_is_constant_pool_ref (rtx);
 #define FL_ARCH6KZ    (1 << 31)       /* ARMv6KZ architecture.  */
 
 #define FL2_ARCH8_1   (1 << 0)	      /* Architecture 8.1.  */
+#define FL2_ARCH8_2   (1 << 1)	      /* Architecture 8.2.  */
+#define FL2_FP16INST  (1 << 2)	      /* FP16 Instructions for ARMv8.2 and
+					 later.  */
 
 /* Flags that only effect tuning, not available instructions.  */
 #define FL_TUNE		(FL_WBUF | FL_VFPV2 | FL_STRONG | FL_LDSCHED \
@@ -420,6 +423,7 @@ extern bool arm_is_constant_pool_ref (rtx);
 #define FL_FOR_ARCH7EM  (FL_FOR_ARCH7M | FL_ARCH7EM)
 #define FL_FOR_ARCH8A	(FL_FOR_ARCH7VE | FL_ARCH8)
 #define FL2_FOR_ARCH8_1A	FL2_ARCH8_1
+#define FL2_FOR_ARCH8_2A	(FL2_FOR_ARCH8_1A | FL2_ARCH8_2)
 
 /* There are too many feature bits to fit in a single word so the set of cpu and
    fpu capabilities is a structure.  A feature set is created and manipulated
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index adec6c9..fccd621 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -428,10 +428,16 @@ EnumValue
 Enum(arm_arch) String(armv8.1-a+crc) Value(28)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt) Value(29)
+Enum(arm_arch) String(armv8.2-a) Value(29)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt2) Value(30)
+Enum(arm_arch) String(armv8.2-a+fp16) Value(30)
+
+EnumValue
+Enum(arm_arch) String(iwmmxt) Value(31)
+
+EnumValue
+Enum(arm_arch) String(iwmmxt2) Value(32)
 
 Enum
 Name(arm_fpu) Type(int)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c3f74dc..f3914ef 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -815,6 +815,13 @@ int arm_arch8 = 0;
 /* Nonzero if this chip supports the ARMv8.1 extensions.  */
 int arm_arch8_1 = 0;
 
+/* Nonzero if this chip supports the ARM Architecture 8.2 extensions.  */
+int arm_arch8_2 = 0;
+
+/* Nonzero if this chip supports the FP16 instructions extension of ARM
+   Architecture 8.2.  */
+int arm_fp16_inst = 0;
+
 /* Nonzero if this chip can benefit from load scheduling.  */
 int arm_ld_sched = 0;
 
@@ -3165,6 +3172,8 @@ arm_option_override (void)
   arm_arch7em = ARM_FSET_HAS_CPU1 (insn_flags, FL_ARCH7EM);
   arm_arch8 = ARM_FSET_HAS_CPU1 (insn_flags, FL_ARCH8);
   arm_arch8_1 = ARM_FSET_HAS_CPU2 (insn_flags, FL2_ARCH8_1);
+  arm_arch8_2 = ARM_FSET_HAS_CPU2 (insn_flags, FL2_ARCH8_2);
+  arm_fp16_inst = ARM_FSET_HAS_CPU2 (insn_flags, FL2_FP16INST);
   arm_arch_thumb2 = ARM_FSET_HAS_CPU1 (insn_flags, FL_THUMB2);
   arm_arch_xscale = ARM_FSET_HAS_CPU1 (insn_flags, FL_XSCALE);
 
@@ -3180,6 +3189,12 @@ arm_option_override (void)
   arm_tune_cortex_a9 = (arm_tune == cortexa9) != 0;
   arm_arch_crc = ARM_FSET_HAS_CPU1 (insn_flags, FL_CRC32);
   arm_m_profile_small_mul = ARM_FSET_HAS_CPU1 (insn_flags, FL_SMALLMUL);
+  if (arm_fp16_inst)
+    {
+      if (arm_fp16_format == ARM_FP16_FORMAT_ALTERNATIVE)
+	error ("selected fp16 options are incompatible");
+      arm_fp16_format = ARM_FP16_FORMAT_IEEE;
+    }
 
   /* V5 code we generate is completely interworking capable, so we turn off
      TARGET_INTERWORK here to avoid many tests later on.  */
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 5b1a030..952cf08 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -222,6 +222,13 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
 /* FPU supports ARMv8.1 Adv.SIMD extensions.  */
 #define TARGET_NEON_RDMA (TARGET_NEON && arm_arch8_1)
 
+/* FPU supports the floating point FP16 instructions for ARMv8.2 and later.  */
+#define TARGET_VFP_FP16INST \
+  (TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_FPU_ARMV8 && arm_fp16_inst)
+
+/* FPU supports the AdvSIMD FP16 instructions for ARMv8.2 and later.  */
+#define TARGET_NEON_FP16INST (TARGET_VFP_FP16INST && TARGET_NEON_RDMA)
+
 /* Q-bit is present.  */
 #define TARGET_ARM_QBIT \
   (TARGET_32BIT && arm_arch5e && (arm_arch_notm || arm_arch7))
@@ -448,6 +455,13 @@ extern int arm_arch8;
 /* Nonzero if this chip supports the ARM Architecture 8.1 extensions.  */
 extern int arm_arch8_1;
 
+/* Nonzero if this chip supports the ARM Architecture 8.2 extensions.  */
+extern int arm_arch8_2;
+
+/* Nonzero if this chip supports the FP16 instructions extension of ARM
+   Architecture 8.2.  */
+extern int arm_fp16_inst;
+
 /* Nonzero if this chip can benefit from load scheduling.  */
 extern int arm_ld_sched;
 
diff --git a/gcc/config/arm/bpabi.h b/gcc/config/arm/bpabi.h
index 06488ba..d7f721a 100644
--- a/gcc/config/arm/bpabi.h
+++ b/gcc/config/arm/bpabi.h
@@ -90,6 +90,8 @@
    |march=armv8-a+crc					\
    |march=armv8.1-a					\
    |march=armv8.1-a+crc					\
+   |march=armv8.2-a					\
+   |march=armv8.2-a+fp16				\
    :%{!r:--be8}}}"
 #else
 #define BE8_LINK_SPEC \
@@ -121,6 +123,8 @@
    |march=armv8-a+crc					\
    |march=armv8.1-a					\
    |march=armv8.1-a+crc					\
+   |march=armv8.2-a					\
+   |march=armv8.2-a+fp16				\
    :%{!r:--be8}}}"
 #endif
 
diff --git a/gcc/config/arm/t-aprofile b/gcc/config/arm/t-aprofile
index b0ecc2f..2a53b81 100644
--- a/gcc/config/arm/t-aprofile
+++ b/gcc/config/arm/t-aprofile
@@ -101,6 +101,8 @@ MULTILIB_MATCHES       += march?armv8-a=mcpu?xgene1
 MULTILIB_MATCHES       += march?armv8-a=march?armv8-a+crc
 MULTILIB_MATCHES       += march?armv8-a=march?armv8.1-a
 MULTILIB_MATCHES       += march?armv8-a=march?armv8.1-a+crc
+MULTILIB_MATCHES       += march?armv8-a=march?armv8.2-a
+MULTILIB_MATCHES       += march?armv8-a=march?armv8.2-a+fp16
 
 # FPU matches
 MULTILIB_MATCHES       += mfpu?vfpv3-d16=mfpu?vfpv3
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 8f35f47..cbed378 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14030,6 +14030,19 @@ extensions.
 @option{-march=armv8-a+crc} enables code generation for the ARMv8-A
 architecture together with the optional CRC32 extensions.
 
+@option{-march=armv8.1-a} enables compiler support for the ARMv8.1-A
+architecture.  This also enables the features provided by
+@option{-march=armv8-a+crc}.
+
+@option{-march=armv8.2-a} enables compiler support for the ARMv8.2-A
+architecture.  This also enables the features provided by
+@option{-march=armv8.1-a}.
+
+@option{-march=armv8.2-a+fp16} enables compiler support for the
+ARMv8.2-A architecture with the optional FP16 instructions extension.
+This also enables the features provided by @option{-march=armv8.1-a}
+and implies @option{-mfp16-format=ieee}.
+
 @option{-march=native} causes the compiler to auto-detect the architecture
 of the build computer.  At present, this feature is only supported on
 GNU/Linux, and not all architectures are recognized.  If the auto-detect
-- 
2.1.4



* [PATCH 2/17][Testsuite] Add a selector for ARM FP16 alternative format support.
From: Matthew Wahab @ 2016-05-17 14:25 UTC (permalink / raw)
  To: gcc-patches


The ARMv8.2-A FP16 extension only supports the IEEE format for FP16
data.  It is not compatible with the ARM alternative FP16 format
selected by -mfp16-format=alternative: using that option together with
the FP16 extension triggers a compiler error, and an explicit
-mfp16-format=none is overridden to the IEEE format.

This patch adds the selector arm_fp16_alternative_ok to the testsuite's
target-support code to allow tests to require support for the
alternative format. It also adds the selector arm_fp16_none_ok, which
checks whether -mfp16-format=none is a valid option for the target. The
patch also updates existing tests to make use of the new selectors.
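
For example, a test that depends on the alternative format now starts
like this (the directives follow the pattern applied throughout the
attached patch):

  /* { dg-do compile } */
  /* { dg-require-effective-target arm_fp16_alternative_ok } */
  /* { dg-options "-mfp16-format=alternative" } */

  __fp16 x = 1.0;   /* Uses the ARM alternative FP16 encoding.  */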

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on an
ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* doc/sourcebuild.texi (ARM-specific attributes): Add entries for
	arm_fp16_alternative_ok and arm_fp16_none_ok.

testsuite/
2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* g++.dg/ext/arm-fp16/arm-fp16-ops-3.C: Use
	arm_fp16_alternative_ok.
	* g++.dg/ext/arm-fp16/arm-fp16-ops-4.C: Likewise.
	* gcc.dg/torture/arm-fp16-int-convert-alt.c: Likewise.
	* gcc.dg/torture/arm-fp16-ops-3.c: Likewise.
	* gcc.dg/torture/arm-fp16-ops-4.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-1.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-10.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-11.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-12.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-2.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-3.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-4.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-5.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-6.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-7.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-8.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-9.c: Likewise.
	* gcc.target/arm/fp16-compile-none-1.c: Use arm_fp16_none_ok.
	* gcc.target/arm/fp16-compile-none-2.c: Likewise.
	* gcc.target/arm/fp16-rounding-alt-1.c: Use
	arm_fp16_alternative_ok.
	* lib/target-supports.exp
	(check_effective_target_arm_fp16_alternative_ok_nocache): New.
	(check_effective_target_arm_fp16_alternative_ok): New.
	(check_effective_target_arm_fp16_none_ok_nocache): New.
	(check_effective_target_arm_fp16_none_ok): New.


[-- Attachment #2: 0002-PATCH-2-17-Testsuite-Add-a-selector-for-ARM-FP16-alt.patch --]
[-- Type: text/x-patch, Size: 16140 bytes --]

From 1901fdfbd2f8da9809a60e43284a1749b015dfba Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 13:33:51 +0100
Subject: [PATCH 02/17] [PATCH 2/17][Testsuite] Add a selector for ARM FP16
 alternative format support.

2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* doc/sourcebuild.texi (ARM-specific attributes): Add entries for
	arm_fp16_alternative_ok and arm_fp16_none_ok.

testsuite/
2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* g++.dg/ext/arm-fp16/arm-fp16-ops-3.C: Use
	arm_fp16_alternative_ok.
	* g++.dg/ext/arm-fp16/arm-fp16-ops-4.C: Likewise.
	* gcc.dg/torture/arm-fp16-int-convert-alt.c: Likewise.
	* gcc.dg/torture/arm-fp16-ops-3.c: Likewise.
	* gcc.dg/torture/arm-fp16-ops-4.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-1.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-10.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-11.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-12.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-2.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-3.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-4.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-5.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-6.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-7.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-8.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-9.c: Likewise.
	* gcc.target/arm/fp16-compile-none-1.c: Use arm_fp16_none_ok.
	* gcc.target/arm/fp16-compile-none-2.c: Likewise.
	* gcc.target/arm/fp16-rounding-alt-1.c: Use
	arm_fp16_alternative_ok.
	* lib/target-supports.exp
	(check_effective_target_arm_fp16_alternative_ok_nocache): New.
	(check_effective_target_arm_fp16_alternative_ok): New.
	(check_effective_target_arm_fp16_none_ok_nocache): New.
	(check_effective_target_arm_fp16_none_ok): New.
---
 gcc/doc/sourcebuild.texi                           |  7 +++
 gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-3.C |  1 +
 gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-4.C |  1 +
 .../gcc.dg/torture/arm-fp16-int-convert-alt.c      |  1 +
 gcc/testsuite/gcc.dg/torture/arm-fp16-ops-3.c      |  1 +
 gcc/testsuite/gcc.dg/torture/arm-fp16-ops-4.c      |  1 +
 gcc/testsuite/gcc.target/arm/fp16-compile-alt-1.c  |  1 +
 gcc/testsuite/gcc.target/arm/fp16-compile-alt-10.c |  1 +
 gcc/testsuite/gcc.target/arm/fp16-compile-alt-11.c |  1 +
 gcc/testsuite/gcc.target/arm/fp16-compile-alt-12.c |  1 +
 gcc/testsuite/gcc.target/arm/fp16-compile-alt-2.c  |  1 +
 gcc/testsuite/gcc.target/arm/fp16-compile-alt-3.c  |  1 +
 gcc/testsuite/gcc.target/arm/fp16-compile-alt-4.c  |  1 +
 gcc/testsuite/gcc.target/arm/fp16-compile-alt-5.c  |  1 +
 gcc/testsuite/gcc.target/arm/fp16-compile-alt-6.c  |  1 +
 gcc/testsuite/gcc.target/arm/fp16-compile-alt-7.c  |  1 +
 gcc/testsuite/gcc.target/arm/fp16-compile-alt-8.c  |  1 +
 gcc/testsuite/gcc.target/arm/fp16-compile-alt-9.c  |  1 +
 gcc/testsuite/gcc.target/arm/fp16-compile-none-1.c |  1 +
 gcc/testsuite/gcc.target/arm/fp16-compile-none-2.c |  1 +
 gcc/testsuite/gcc.target/arm/fp16-rounding-alt-1.c |  1 +
 gcc/testsuite/lib/target-supports.exp              | 59 ++++++++++++++++++++++
 22 files changed, 86 insertions(+)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 3142cd5..dd6abda 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1565,6 +1565,13 @@ options, including @code{-mfp16-format=ieee} if necessary to obtain the
 Test system supports executing Neon half-precision float instructions.
 (Implies previous.)
 
+@item arm_fp16_alternative_ok
+ARM target supports the ARM FP16 alternative format.  Some multilibs
+may be incompatible with the options needed.
+
+@item arm_fp16_none_ok
+ARM target supports specifying none as the ARM FP16 format.
+
 @item arm_thumb1_ok
 ARM target generates Thumb-1 code for @code{-mthumb}.
 
diff --git a/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-3.C b/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-3.C
index 8f9ab64..29080c7 100644
--- a/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-3.C
+++ b/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-3.C
@@ -1,5 +1,6 @@
 /* Test various operators on __fp16 and mixed __fp16/float operands.  */
 /* { dg-do run { target arm*-*-* } } */
+/* { dg-require-effective-target arm_fp16_alternative_ok } */
 /* { dg-options "-mfp16-format=alternative" } */
 
 #include "arm-fp16-ops.h"
diff --git a/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-4.C b/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-4.C
index 4877f39..4be8883 100644
--- a/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-4.C
+++ b/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-4.C
@@ -1,5 +1,6 @@
 /* Test various operators on __fp16 and mixed __fp16/float operands.  */
 /* { dg-do run { target arm*-*-* } } */
+/* { dg-require-effective-target arm_fp16_alternative_ok } */
 /* { dg-options "-mfp16-format=alternative -ffast-math" } */
 
 #include "arm-fp16-ops.h"
diff --git a/gcc/testsuite/gcc.dg/torture/arm-fp16-int-convert-alt.c b/gcc/testsuite/gcc.dg/torture/arm-fp16-int-convert-alt.c
index bcd7aef..7eb73e6 100644
--- a/gcc/testsuite/gcc.dg/torture/arm-fp16-int-convert-alt.c
+++ b/gcc/testsuite/gcc.dg/torture/arm-fp16-int-convert-alt.c
@@ -1,5 +1,6 @@
 /* Test floating-point conversions.  Standard types and __fp16.  */
 /* { dg-do run { target arm*-*-* } } */
+/* { dg-require-effective-target arm_fp16_alternative_ok } */
 /* { dg-options "-mfp16-format=alternative" } */
 
 #include "fp-int-convert.h"
diff --git a/gcc/testsuite/gcc.dg/torture/arm-fp16-ops-3.c b/gcc/testsuite/gcc.dg/torture/arm-fp16-ops-3.c
index 8f9ab64..7716baf 100644
--- a/gcc/testsuite/gcc.dg/torture/arm-fp16-ops-3.c
+++ b/gcc/testsuite/gcc.dg/torture/arm-fp16-ops-3.c
@@ -1,5 +1,6 @@
 /* Test various operators on __fp16 and mixed __fp16/float operands.  */
 /* { dg-do run { target arm*-*-* } } */
+/* { dg-require-effective-target arm_fp16_alternative_ok } */
 /* { dg-options "-mfp16-format=alternative" } */
 
 #include "arm-fp16-ops.h"
diff --git a/gcc/testsuite/gcc.dg/torture/arm-fp16-ops-4.c b/gcc/testsuite/gcc.dg/torture/arm-fp16-ops-4.c
index 4877f39..1940f43 100644
--- a/gcc/testsuite/gcc.dg/torture/arm-fp16-ops-4.c
+++ b/gcc/testsuite/gcc.dg/torture/arm-fp16-ops-4.c
@@ -1,5 +1,6 @@
 /* Test various operators on __fp16 and mixed __fp16/float operands.  */
 /* { dg-do run { target arm*-*-* } } */
+/* { dg-require-effective-target arm_fp16_alternative_ok } */
 /* { dg-options "-mfp16-format=alternative -ffast-math" } */
 
 #include "arm-fp16-ops.h"
diff --git a/gcc/testsuite/gcc.target/arm/fp16-compile-alt-1.c b/gcc/testsuite/gcc.target/arm/fp16-compile-alt-1.c
index 3abcd94..0845e88 100644
--- a/gcc/testsuite/gcc.target/arm/fp16-compile-alt-1.c
+++ b/gcc/testsuite/gcc.target/arm/fp16-compile-alt-1.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target arm_fp16_alternative_ok } */
 /* { dg-options "-mfp16-format=alternative" } */
 
 __fp16 xx = 0.0;
diff --git a/gcc/testsuite/gcc.target/arm/fp16-compile-alt-10.c b/gcc/testsuite/gcc.target/arm/fp16-compile-alt-10.c
index 2e3d31f..a8772a1 100644
--- a/gcc/testsuite/gcc.target/arm/fp16-compile-alt-10.c
+++ b/gcc/testsuite/gcc.target/arm/fp16-compile-alt-10.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target arm_fp16_alternative_ok } */
 /* { dg-options "-mfp16-format=alternative -pedantic -std=gnu99" } */
 
 #include <math.h>
diff --git a/gcc/testsuite/gcc.target/arm/fp16-compile-alt-11.c b/gcc/testsuite/gcc.target/arm/fp16-compile-alt-11.c
index 62a7a3d..1cb3d2c 100644
--- a/gcc/testsuite/gcc.target/arm/fp16-compile-alt-11.c
+++ b/gcc/testsuite/gcc.target/arm/fp16-compile-alt-11.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target arm_fp16_alternative_ok } */
 /* { dg-options "-mfp16-format=alternative -pedantic -std=gnu99" } */
 
 #include <math.h>
diff --git a/gcc/testsuite/gcc.target/arm/fp16-compile-alt-12.c b/gcc/testsuite/gcc.target/arm/fp16-compile-alt-12.c
index 09586e9..3c3bd2f 100644
--- a/gcc/testsuite/gcc.target/arm/fp16-compile-alt-12.c
+++ b/gcc/testsuite/gcc.target/arm/fp16-compile-alt-12.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target arm_fp16_alternative_ok } */
 /* { dg-options "-mfp16-format=alternative" } */
 
 float xx __attribute__((mode(HF))) = 0.0;
diff --git a/gcc/testsuite/gcc.target/arm/fp16-compile-alt-2.c b/gcc/testsuite/gcc.target/arm/fp16-compile-alt-2.c
index b7fe99d..8a45f1f 100644
--- a/gcc/testsuite/gcc.target/arm/fp16-compile-alt-2.c
+++ b/gcc/testsuite/gcc.target/arm/fp16-compile-alt-2.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target arm_fp16_alternative_ok } */
 /* { dg-options "-mfp16-format=alternative" } */
 
 /* Encoding taken from:  http://en.wikipedia.org/wiki/Half_precision */
diff --git a/gcc/testsuite/gcc.target/arm/fp16-compile-alt-3.c b/gcc/testsuite/gcc.target/arm/fp16-compile-alt-3.c
index f325a84..e786a51 100644
--- a/gcc/testsuite/gcc.target/arm/fp16-compile-alt-3.c
+++ b/gcc/testsuite/gcc.target/arm/fp16-compile-alt-3.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target arm_fp16_alternative_ok } */
 /* { dg-options "-mfp16-format=alternative" } */
 
 /* Encoding taken from:  http://en.wikipedia.org/wiki/Half_precision */
diff --git a/gcc/testsuite/gcc.target/arm/fp16-compile-alt-4.c b/gcc/testsuite/gcc.target/arm/fp16-compile-alt-4.c
index 4b9b331..cfeb61a 100644
--- a/gcc/testsuite/gcc.target/arm/fp16-compile-alt-4.c
+++ b/gcc/testsuite/gcc.target/arm/fp16-compile-alt-4.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target arm_fp16_alternative_ok } */
 /* { dg-options "-mfp16-format=alternative" } */
 
 /* Encoding taken from:  http://en.wikipedia.org/wiki/Half_precision */
diff --git a/gcc/testsuite/gcc.target/arm/fp16-compile-alt-5.c b/gcc/testsuite/gcc.target/arm/fp16-compile-alt-5.c
index 458f507..3b741ae 100644
--- a/gcc/testsuite/gcc.target/arm/fp16-compile-alt-5.c
+++ b/gcc/testsuite/gcc.target/arm/fp16-compile-alt-5.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target arm_fp16_alternative_ok } */
 /* { dg-options "-mfp16-format=alternative" } */
 
 /* Encoding taken from:  http://en.wikipedia.org/wiki/Half_precision */
diff --git a/gcc/testsuite/gcc.target/arm/fp16-compile-alt-6.c b/gcc/testsuite/gcc.target/arm/fp16-compile-alt-6.c
index dbb4a99..abffff5 100644
--- a/gcc/testsuite/gcc.target/arm/fp16-compile-alt-6.c
+++ b/gcc/testsuite/gcc.target/arm/fp16-compile-alt-6.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target arm_fp16_alternative_ok } */
 /* { dg-options "-mfp16-format=alternative" } */
 
 /* This number is the maximum value representable in the alternative
diff --git a/gcc/testsuite/gcc.target/arm/fp16-compile-alt-7.c b/gcc/testsuite/gcc.target/arm/fp16-compile-alt-7.c
index 40940a6..c339f19 100644
--- a/gcc/testsuite/gcc.target/arm/fp16-compile-alt-7.c
+++ b/gcc/testsuite/gcc.target/arm/fp16-compile-alt-7.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target arm_fp16_alternative_ok } */
 /* { dg-options "-mfp16-format=alternative -pedantic" } */
 
 /* This number overflows the range of the alternative encoding.  Since this
diff --git a/gcc/testsuite/gcc.target/arm/fp16-compile-alt-8.c b/gcc/testsuite/gcc.target/arm/fp16-compile-alt-8.c
index cbc0a39..deeb5cd 100644
--- a/gcc/testsuite/gcc.target/arm/fp16-compile-alt-8.c
+++ b/gcc/testsuite/gcc.target/arm/fp16-compile-alt-8.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target arm_fp16_alternative_ok } */
 /* { dg-options "-mfp16-format=alternative" } */
 
 /* Encoding taken from:  http://en.wikipedia.org/wiki/Half_precision */
diff --git a/gcc/testsuite/gcc.target/arm/fp16-compile-alt-9.c b/gcc/testsuite/gcc.target/arm/fp16-compile-alt-9.c
index 6487c8d..f9f5654 100644
--- a/gcc/testsuite/gcc.target/arm/fp16-compile-alt-9.c
+++ b/gcc/testsuite/gcc.target/arm/fp16-compile-alt-9.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target arm_fp16_alternative_ok } */
 /* { dg-options "-mfp16-format=alternative" } */
 
 /* Encoding taken from:  http://en.wikipedia.org/wiki/Half_precision */
diff --git a/gcc/testsuite/gcc.target/arm/fp16-compile-none-1.c b/gcc/testsuite/gcc.target/arm/fp16-compile-none-1.c
index e912505..9472249 100644
--- a/gcc/testsuite/gcc.target/arm/fp16-compile-none-1.c
+++ b/gcc/testsuite/gcc.target/arm/fp16-compile-none-1.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target arm_fp16_none_ok } */
 /* { dg-options "-mfp16-format=none" } */
 
 /* __fp16 type name is not recognized unless you explicitly enable it
diff --git a/gcc/testsuite/gcc.target/arm/fp16-compile-none-2.c b/gcc/testsuite/gcc.target/arm/fp16-compile-none-2.c
index eb7eef5..9ec21e5 100644
--- a/gcc/testsuite/gcc.target/arm/fp16-compile-none-2.c
+++ b/gcc/testsuite/gcc.target/arm/fp16-compile-none-2.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target arm_fp16_none_ok } */
 /* { dg-options "-mfp16-format=none" } */
 
 /* mode(HF) attributes are not recognized unless you explicitly enable
diff --git a/gcc/testsuite/gcc.target/arm/fp16-rounding-alt-1.c b/gcc/testsuite/gcc.target/arm/fp16-rounding-alt-1.c
index f50b447..1c15b61 100644
--- a/gcc/testsuite/gcc.target/arm/fp16-rounding-alt-1.c
+++ b/gcc/testsuite/gcc.target/arm/fp16-rounding-alt-1.c
@@ -3,6 +3,7 @@
    from double to __fp16.  */
 
 /* { dg-do run } */
+/* { dg-require-effective-target arm_fp16_alternative_ok } */
 /* { dg-options "-mfp16-format=alternative" } */
 
 #include <stdlib.h>
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 04ca176..ed89a3b 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3043,6 +3043,65 @@ proc add_options_for_arm_neon_fp16 { flags } {
     return "$flags $et_arm_neon_fp16_flags"
 }
 
+# Return 1 if this is an ARM target supporting the FP16 alternative
+# format.  Some multilibs may be incompatible with the options needed.  Also
+# set et_arm_neon_fp16_flags to the best options to add.
+
+proc check_effective_target_arm_fp16_alternative_ok_nocache { } {
+    global et_arm_neon_fp16_flags
+    set et_arm_neon_fp16_flags ""
+    if { [check_effective_target_arm32] } {
+	foreach flags {"" "-mfloat-abi=softfp" "-mfpu=neon-fp16"
+		       "-mfpu=neon-fp16 -mfloat-abi=softfp"} {
+	    if { [check_no_compiler_messages_nocache \
+		      arm_fp16_alternative_ok object {
+		#if !defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+		#error __ARM_FP16_FORMAT_ALTERNATIVE not defined
+		#endif
+	    } "$flags -mfp16-format=alternative"] } {
+		set et_arm_neon_fp16_flags "$flags -mfp16-format=alternative"
+		return 1
+	    }
+	}
+    }
+
+    return 0
+}
+
+proc check_effective_target_arm_fp16_alternative_ok { } {
+    return [check_cached_effective_target arm_fp16_alternative_ok \
+		check_effective_target_arm_fp16_alternative_ok_nocache]
+}
+
+# Return 1 if this is an ARM target that supports specifying none as the
+# FP16 format.  Some multilibs may be incompatible with the options needed.
+
+proc check_effective_target_arm_fp16_none_ok_nocache { } {
+    if { [check_effective_target_arm32] } {
+	foreach flags {"" "-mfloat-abi=softfp" "-mfpu=neon-fp16"
+		       "-mfpu=neon-fp16 -mfloat-abi=softfp"} {
+	    if { [check_no_compiler_messages_nocache \
+		      arm_fp16_none_ok object {
+		#if defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+		#error __ARM_FP16_FORMAT_ALTERNATIVE defined
+		#endif
+		#if defined (__ARM_FP16_FORMAT_IEEE)
+		#error __ARM_FP16_FORMAT_IEEE defined
+		#endif
+	    } "$flags -mfp16-format=none"] } {
+		return 1
+	    }
+	}
+    }
+
+    return 0
+}
+
+proc check_effective_target_arm_fp16_none_ok { } {
+    return [check_cached_effective_target arm_fp16_none_ok \
+		check_effective_target_arm_fp16_none_ok_nocache]
+}
+
 # Return 1 if this is an ARM target supporting -mfpu=neon-fp-armv8
 # -mfloat-abi=softfp or equivalent options.  Some multilibs may be
 # incompatible with these options.  Also set et_arm_v8_neon_flags to the
-- 
2.1.4



* [PATCH 3/17][Testsuite] Add ARM support for ARMv8.2-A with FP16 arithmetic instructions.
From: Matthew Wahab @ 2016-05-17 14:26 UTC (permalink / raw)
  To: gcc-patches


The ARMv8.2-A FP16 extension adds to both the VFP and the NEON
instruction sets. This patch adds support to the testsuite to select
targets and set options for tests that make use of these
instructions. It also adds documentation for ARMv8.1-A selectors.
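
With these changes, a test that needs the new instructions can be
written along these lines (a sketch; the feature macro checked here is
added by patch 4/17):

  /* { dg-do compile } */
  /* { dg-require-effective-target arm_v8_2a_fp16_neon_ok } */
  /* { dg-add-options arm_v8_2a_fp16_neon } */

  #ifndef __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
  #error Adv.SIMD FP16 support expected
  #endif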

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on an
ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* doc/sourcebuild.texi (ARM-specific attributes): Add entries for
	arm_v8_1a_neon_ok, arm_v8_2a_fp16_scalar_ok, arm_v8_2a_fp16_scalar_hw,
	arm_v8_2a_fp16_neon_ok and arm_v8_2a_fp16_neon_hw.
	(Add options): Add entries for arm_v8_1a_neon,
	arm_v8_2a_fp16_scalar and arm_v8_2a_fp16_neon.

testsuite/
2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* lib/target-supports.exp
	(add_options_for_arm_v8_2a_fp16_scalar): New.
	(add_options_for_arm_v8_2a_fp16_neon): New.
	(check_effective_target_arm_arch_v8_2a_ok): Auto-generate.
	(add_options_for_arm_arch_v8_2a): Auto-generate.
	(check_effective_target_arm_arch_v8_2a_multilib): Auto-generate.
	(check_effective_target_arm_v8_2a_fp16_scalar_ok_nocache): New.
	(check_effective_target_arm_v8_2a_fp16_scalar_ok): New.
	(check_effective_target_arm_v8_2a_fp16_neon_ok_nocache): New.
	(check_effective_target_arm_v8_2a_fp16_neon_ok): New.
	(check_effective_target_arm_v8_2a_fp16_scalar_hw): New.
	(check_effective_target_arm_v8_2a_fp16_neon_hw): New.


[-- Attachment #2: 0003-PATCH-3-17-Testsuite-Add-ARM-support-for-ARMv8.2-A-w.patch --]
[-- Type: text/x-patch, Size: 10062 bytes --]

From ba9b4dcf774d0fdffae11ac59537255775e8f1b6 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 13:34:30 +0100
Subject: [PATCH 03/17] [PATCH 3/17][Testsuite] Add ARM support for ARMv8.2-A
 with FP16 arithmetic instructions.

2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* doc/sourcebuild.texi (ARM-specific attributes): Add entries for
	arm_v8_1a_neon_ok, arm_v8_2a_fp16_scalar_ok, arm_v8_2a_fp16_scalar_hw,
	arm_v8_2a_fp16_neon_ok and arm_v8_2a_fp16_neon_hw.
	(Add options): Add entries for arm_v8_1a_neon,
	arm_v8_2a_fp16_scalar and arm_v8_2a_fp16_neon.
	* lib/target-supports.exp
	(add_options_for_arm_v8_2a_fp16_scalar): New.
	(add_options_for_arm_v8_2a_fp16_neon): New.
	(check_effective_target_arm_arch_v8_2a_ok): Auto-generate.
	(add_options_for_arm_arch_v8_2a): Auto-generate.
	(check_effective_target_arm_arch_v8_2a_multilib): Auto-generate.
	(check_effective_target_arm_v8_2a_fp16_scalar_ok_nocache): New.
	(check_effective_target_arm_v8_2a_fp16_scalar_ok): New.
	(check_effective_target_arm_v8_2a_fp16_neon_ok_nocache): New.
	(check_effective_target_arm_v8_2a_fp16_neon_ok): New.
	(check_effective_target_arm_v8_2a_fp16_scalar_hw): New.
	(check_effective_target_arm_v8_2a_fp16_neon_hw): New.
---
 gcc/doc/sourcebuild.texi              |  40 ++++++++++
 gcc/testsuite/lib/target-supports.exp | 145 +++++++++++++++++++++++++++++++++-
 2 files changed, 184 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index dd6abda..66904a7 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1596,6 +1596,7 @@ ARM target supports @code{-mfpu=neon-fp-armv8 -mfloat-abi=softfp}.
 Some multilibs may be incompatible with these options.
 
 @item arm_v8_1a_neon_ok
+@anchor{arm_v8_1a_neon_ok}
 ARM target supports options to generate ARMv8.1 Adv.SIMD instructions.
 Some multilibs may be incompatible with these options.
 
@@ -1604,6 +1605,28 @@ ARM target supports executing ARMv8.1 Adv.SIMD instructions.  Some
 multilibs may be incompatible with the options needed.  Implies
 arm_v8_1a_neon_ok.
 
+@item arm_v8_2a_fp16_scalar_ok
+@anchor{arm_v8_2a_fp16_scalar_ok}
+ARM target supports options to generate instructions for ARMv8.2 and
+scalar instructions from the FP16 extension.  Some multilibs may be
+incompatible with these options.
+
+@item arm_v8_2a_fp16_scalar_hw
+ARM target supports executing instructions for ARMv8.2 and scalar
+instructions from the FP16 extension.  Some multilibs may be
+incompatible with these options.  Implies arm_v8_2a_fp16_scalar_ok.
+
+@item arm_v8_2a_fp16_neon_ok
+@anchor{arm_v8_2a_fp16_neon_ok}
+ARM target supports options to generate instructions from ARMv8.2 with
+the FP16 extension.  Some multilibs may be incompatible with these
+options.  Implies arm_v8_2a_fp16_scalar_ok.
+
+@item arm_v8_2a_fp16_neon_hw
+ARM target supports executing instructions from ARMv8.2 with the FP16
+extension.  Some multilibs may be incompatible with these options.
+Implies arm_v8_2a_fp16_neon_ok and arm_v8_2a_fp16_scalar_hw.
+
 @item arm_prefer_ldrd_strd
 ARM target prefers @code{LDRD} and @code{STRD} instructions over
 @code{LDM} and @code{STM} instructions.
@@ -2088,6 +2111,23 @@ the @ref{arm_neon_fp16_ok,,arm_neon_fp16_ok effective target keyword}.
 arm vfp3 floating point support; see
 the @ref{arm_vfp3_ok,,arm_vfp3_ok effective target keyword}.
 
+@item arm_v8_1a_neon
+Add options for ARMv8.1 with Adv.SIMD support, if this is supported
+by the target; see the @ref{arm_v8_1a_neon_ok,,arm_v8_1a_neon_ok}
+effective target keyword.
+
+@item arm_v8_2a_fp16_scalar
+Add options for ARMv8.2 with scalar FP16 support, if this is
+supported by the target; see the
+@ref{arm_v8_2a_fp16_scalar_ok,,arm_v8_2a_fp16_scalar_ok} effective
+target keyword.
+
+@item arm_v8_2a_fp16_neon
+Add options for ARMv8.2 with Adv.SIMD FP16 support, if this is
+supported by the target; see the
+@ref{arm_v8_2a_fp16_neon_ok,,arm_v8_2a_fp16_neon_ok} effective target
+keyword.
+
 @item bind_pic_locally
 Add the target-specific flags needed to enable functions to bind
 locally when using pic/PIC passes in the testsuite.
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index ed89a3b..7354acc 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2904,6 +2904,28 @@ proc add_options_for_arm_v8_1a_neon { flags } {
     return "$flags $et_arm_v8_1a_neon_flags -march=armv8.1-a"
 }
 
+# Add the options needed for ARMv8.2 with the scalar FP16 extension.
+# Also adds the ARMv8 FP options for ARM.
+
+proc add_options_for_arm_v8_2a_fp16_scalar { flags } {
+    if { ! [check_effective_target_arm_v8_2a_fp16_scalar_ok] } {
+	return "$flags"
+    }
+    global et_arm_v8_2a_fp16_scalar_flags
+    return "$flags $et_arm_v8_2a_fp16_scalar_flags"
+}
+
+# Add the options needed for ARMv8.2 with the FP16 extension.  Also adds
+# the ARMv8 NEON options for ARM.
+
+proc add_options_for_arm_v8_2a_fp16_neon { flags } {
+    if { ! [check_effective_target_arm_v8_2a_fp16_neon_ok] } {
+	return "$flags"
+    }
+    global et_arm_v8_2a_fp16_neon_flags
+    return "$flags $et_arm_v8_2a_fp16_neon_flags"
+}
+
 proc add_options_for_arm_crc { flags } {
     if { ! [check_effective_target_arm_crc_ok] } {
         return "$flags"
@@ -3251,7 +3273,8 @@ foreach { armfunc armflag armdef } { v4 "-march=armv4 -marm" __ARM_ARCH_4__
 				     v7m "-march=armv7-m -mthumb" __ARM_ARCH_7M__
 				     v7em "-march=armv7e-m -mthumb" __ARM_ARCH_7EM__
 				     v8a "-march=armv8-a" __ARM_ARCH_8A__
-				     v8_1a "-march=armv8.1a" __ARM_ARCH_8A__ } {
+				     v8_1a "-march=armv8.1a" __ARM_ARCH_8A__
+				     v8_2a "-march=armv8.2a" __ARM_ARCH_8A__ } {
     eval [string map [list FUNC $armfunc FLAG $armflag DEF $armdef ] {
 	proc check_effective_target_arm_arch_FUNC_ok { } {
 	    if { [ string match "*-marm*" "FLAG" ] &&
@@ -3463,6 +3486,76 @@ proc check_effective_target_arm_v8_1a_neon_ok { } {
 		check_effective_target_arm_v8_1a_neon_ok_nocache]
 }
 
+# Return 1 if the target supports ARMv8.2 scalar FP16 arithmetic
+# instructions, 0 otherwise.  The test is valid for ARM.  Record the
+# command line options needed.
+
+proc check_effective_target_arm_v8_2a_fp16_scalar_ok_nocache { } {
+    global et_arm_v8_2a_fp16_scalar_flags
+    set et_arm_v8_2a_fp16_scalar_flags ""
+
+    if { ![istarget arm*-*-*] } {
+	return 0;
+    }
+
+    # Iterate through sets of options to find the compiler flags that
+    # need to be added to the -march option.
+    foreach flags {"" "-mfpu=fp-armv8" "-mfloat-abi=softfp" \
+		       "-mfpu=fp-armv8 -mfloat-abi=softfp"} {
+	if { [check_no_compiler_messages_nocache \
+		  arm_v8_2a_fp16_scalar_ok object {
+	    #if !defined (__ARM_FEATURE_FP16_SCALAR_ARITHMETIC)
+	    #error "__ARM_FEATURE_FP16_SCALAR_ARITHMETIC not defined"
+	    #endif
+	} "$flags -march=armv8.2-a+fp16"] } {
+	    set et_arm_v8_2a_fp16_scalar_flags "$flags -march=armv8.2-a+fp16"
+	    return 1
+	}
+    }
+
+    return 0;
+}
+
+proc check_effective_target_arm_v8_2a_fp16_scalar_ok { } {
+    return [check_cached_effective_target arm_v8_2a_fp16_scalar_ok \
+		check_effective_target_arm_v8_2a_fp16_scalar_ok_nocache]
+}
+
+# Return 1 if the target supports ARMv8.2 Adv.SIMD FP16 arithmetic
+# instructions, 0 otherwise.  The test is valid for ARM.  Record the
+# command line options needed.
+
+proc check_effective_target_arm_v8_2a_fp16_neon_ok_nocache { } {
+    global et_arm_v8_2a_fp16_neon_flags
+    set et_arm_v8_2a_fp16_neon_flags ""
+
+    if { ![istarget arm*-*-*] } {
+	return 0;
+    }
+
+    # Iterate through sets of options to find the compiler flags that
+    # need to be added to the -march option.
+    foreach flags {"" "-mfpu=neon-fp-armv8" "-mfloat-abi=softfp" \
+		       "-mfpu=neon-fp-armv8 -mfloat-abi=softfp"} {
+	if { [check_no_compiler_messages_nocache \
+		  arm_v8_2a_fp16_neon_ok object {
+	    #if !defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+	    #error "__ARM_FEATURE_FP16_VECTOR_ARITHMETIC not defined"
+	    #endif
+	} "$flags -march=armv8.2-a+fp16"] } {
+	    set et_arm_v8_2a_fp16_neon_flags "$flags -march=armv8.2-a+fp16"
+	    return 1
+	}
+    }
+
+    return 0;
+}
+
+proc check_effective_target_arm_v8_2a_fp16_neon_ok { } {
+    return [check_cached_effective_target arm_v8_2a_fp16_neon_ok \
+		check_effective_target_arm_v8_2a_fp16_neon_ok_nocache]
+}
+
 # Return 1 if the target supports executing ARMv8 NEON instructions, 0
 # otherwise.
 
@@ -3519,6 +3612,56 @@ proc check_effective_target_arm_v8_1a_neon_hw { } {
     } [add_options_for_arm_v8_1a_neon ""]]
 }
 
+# Return 1 if the target supports executing floating point instructions
+# from ARMv8.2 with the FP16 extension, 0 otherwise.  The test is valid
+# for ARM.
+
+proc check_effective_target_arm_v8_2a_fp16_scalar_hw { } {
+    if { ![check_effective_target_arm_v8_2a_fp16_scalar_ok] } {
+	return 0;
+    }
+    return [check_runtime arm_v8_2a_fp16_scalar_hw_available {
+	int
+	main (void)
+	{
+	  __fp16 a = 1.0;
+	  __fp16 result;
+
+	  asm ("vabs.f16 %0, %1"
+	       : "=w"(result)
+	       : "w"(a)
+	       : /* No clobbers.  */);
+
+	  return (result == 1.0) ? 0 : 1;
+	}
+    } [add_options_for_arm_v8_2a_fp16_scalar ""]]
+}
+
+# Return 1 if the target supports executing Adv.SIMD instructions
+# from ARMv8.2 with the FP16 extension, 0 otherwise.  The test is valid
+# for ARM.
+
+proc check_effective_target_arm_v8_2a_fp16_neon_hw { } {
+    if { ![check_effective_target_arm_v8_2a_fp16_neon_ok] } {
+	return 0;
+    }
+    return [check_runtime arm_v8_2a_fp16_neon_hw_available {
+	int
+	main (void)
+	{
+	  __simd64_float16_t a = {1.0, -1.0, 1.0, -1.0};
+	  __simd64_float16_t result;
+
+	  asm ("vabs.f16 %P0, %P1"
+	       : "=w"(result)
+	       : "w"(a)
+	       : /* No clobbers.  */);
+
+	  return (result[0] == 1.0) ? 0 : 1;
+	}
+    } [add_options_for_arm_v8_2a_fp16_neon ""]]
+}
+
 # Return 1 if this is a ARM target with NEON enabled.
 
 proc check_effective_target_arm_neon { } {
-- 
2.1.4



* [PATCH 4/17][ARM] Define feature macros for FP16.
From: Matthew Wahab @ 2016-05-17 14:28 UTC (permalink / raw)
  To: gcc-patches


The FP16 extension introduced with the ARMv8.2-A architecture adds
instructions operating on FP16 values to the VFP and NEON instruction
sets.

The patch adds the feature macro __ARM_FEATURE_FP16_SCALAR_ARITHMETIC,
which is defined to be 1 if the VFP FP16 instructions are available; it
is otherwise undefined.

The patch also adds the feature macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC,
which is defined to be 1 if the NEON FP16 instructions are available; it
is otherwise undefined.

These two macros will appear in a future version of the ACLE.
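
A sketch of the intended use in application code, guarding on the new
macros (illustrative only):

  #if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
  /* NEON FP16 arithmetic is available; the VFP instructions are too,
     since enabling the NEON instructions implies the VFP ones.  */
  #elif defined (__ARM_FEATURE_FP16_SCALAR_ARITHMETIC)
  /* Only the VFP (scalar) FP16 instructions are available.  */
  #else
  /* No ARMv8.2-A FP16 arithmetic; operate on __fp16 by promotion.  */
  #endif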

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/arm-c.c (arm_cpu_builtins): Define
	"__ARM_FEATURE_FP16_SCALAR_ARITHMETIC" and
	"__ARM_FEATURE_FP16_VECTOR_ARITHMETIC".

testsuite/
2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/arm/attr-fp16-arith-1.c: New.


[-- Attachment #2: 0004-PATCH-4-17-ARM-Define-feature-macros-for-FP16.patch --]
[-- Type: text/x-patch, Size: 2891 bytes --]

From 688b4d34a64a40abd4705a9bdaea40929a7a1d26 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 13:32:15 +0100
Subject: [PATCH 04/17] [PATCH 4/17][ARM] Define feature macros for FP16.

2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/arm-c.c (arm_cpu_builtins): Define
	"__ARM_FEATURE_FP16_SCALAR_ARITHMETIC" and
	"__ARM_FEATURE_FP16_VECTOR_ARITHMETIC".

testsuite/
2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/arm/attr-fp16-arith-1.c: New.
---
 gcc/config/arm/arm-c.c                           |  5 +++
 gcc/testsuite/gcc.target/arm/attr-fp16-arith-1.c | 45 ++++++++++++++++++++++++
 2 files changed, 50 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/attr-fp16-arith-1.c

diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index b98470f..7283700 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -142,6 +142,11 @@ arm_cpu_builtins (struct cpp_reader* pfile)
   def_or_undef_macro (pfile, "__ARM_FP16_ARGS",
 		      arm_fp16_format != ARM_FP16_FORMAT_NONE);
 
+  def_or_undef_macro (pfile, "__ARM_FEATURE_FP16_SCALAR_ARITHMETIC",
+		      TARGET_VFP_FP16INST);
+  def_or_undef_macro (pfile, "__ARM_FEATURE_FP16_VECTOR_ARITHMETIC",
+		      TARGET_NEON_FP16INST);
+
   def_or_undef_macro (pfile, "__ARM_FEATURE_FMA", TARGET_FMA);
   def_or_undef_macro (pfile, "__ARM_NEON__", TARGET_NEON);
   def_or_undef_macro (pfile, "__ARM_NEON", TARGET_NEON);
diff --git a/gcc/testsuite/gcc.target/arm/attr-fp16-arith-1.c b/gcc/testsuite/gcc.target/arm/attr-fp16-arith-1.c
new file mode 100644
index 0000000..5011315
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/attr-fp16-arith-1.c
@@ -0,0 +1,45 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_ok } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_v8_2a_fp16_scalar } */
+
+/* Reset fpu to a value compatible with the next pragmas.  */
+#pragma GCC target ("fpu=vfp")
+
+#pragma GCC push_options
+#pragma GCC target ("fpu=fp-armv8")
+
+#ifndef __ARM_FEATURE_FP16_SCALAR_ARITHMETIC
+#error __ARM_FEATURE_FP16_SCALAR_ARITHMETIC not defined.
+#endif
+
+#pragma GCC push_options
+#pragma GCC target ("fpu=neon-fp-armv8")
+
+#ifndef __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
+#error __ARM_FEATURE_FP16_VECTOR_ARITHMETIC not defined.
+#endif
+
+#ifndef __ARM_NEON
+#error __ARM_NEON not defined.
+#endif
+
+#if !defined (__ARM_FP) || !(__ARM_FP & 0x2)
+#error Invalid value for __ARM_FP
+#endif
+
+#pragma GCC pop_options
+
+/* Check that the FP version is correctly reset to mfpu=fp-armv8.  */
+
+#if !defined (__ARM_FP) || !(__ARM_FP & 0x2)
+#error __ARM_FP should record FP16 support.
+#endif
+
+#pragma GCC pop_options
+
+/* Check that the FP version is correctly reset to mfpu=vfp.  */
+
+#if !defined (__ARM_FP) || (__ARM_FP & 0x2)
+#error Unexpected value for __ARM_FP.
+#endif
-- 
2.1.4



* [PATCH 5/17][ARM] Enable HI mode moves for floating point values.
From: Matthew Wahab @ 2016-05-17 14:29 UTC (permalink / raw)
  To: gcc-patches


The handling of 16-bit integer data-movement in the ARM backend doesn't
make full use of the VFP instructions when they are available, even when
the values are for use in VFP operations.

This patch adds support for using the VFP instructions and registers
when moving 16-bit integer and floating point data between registers and
between registers and memory.
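
For instance, in code like the following sketch (similar in spirit to
the short-vfp-1.c test the patch adds, which is not quoted in this
message), the 16-bit value can now be held in a VFP single register
and transferred with vmov:

  float
  short_to_float (short *p)
  {
    /* The HImode value can be kept in an s-register for the vcvt,
       which can avoid an extra core<->VFP round trip.  */
    return (float) *p;
  }

  short
  float_to_short (float x)
  {
    return (short) x;   /* The narrowing direction benefits likewise.  */
  }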

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator. Tested this patch for arm-none-linux-gnueabihf
with native bootstrap and make check and for arm-none-eabi with
check-gcc on an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Jiong Wang  <jiong.wang@arm.com>
	    Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/arm.c (output_move_vfp): Weaken assert to allow
	HImode.
	(arm_hard_regno_mode_ok): Allow HImode values in VFP registers.
	* config/arm/arm.md (*movhi_insn_arch4): Disable when VFP registers are
	available.
	(*movhi_bytes): Likewise.
	* config/arm/vfp.md (*arm_movhi_vfp): New.
	(*thumb2_movhi_vfp): New.

testsuite/
2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/arm/short-vfp-1.c: New.


[-- Attachment #2: 0005-PATCH-5-17-ARM-Enable-HI-mode-moves-for-floating-poi.patch --]
[-- Type: text/x-patch, Size: 7288 bytes --]

From 0b8bc5f2966924c523d6fd75cf73dd01341914e2 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 13:33:04 +0100
Subject: [PATCH 05/17] [PATCH 5/17][ARM] Enable HI mode moves for floating
 point values.

2016-05-17  Jiong Wang  <jiong.wang@arm.com>
	    Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/arm.c (output_move_vfp): Weaken assert to allow
	HImode.
	(arm_hard_regno_mode_ok): Allow HImode values in VFP registers.
	* config/arm/arm.md (*movhi_insn_arch4): Disable when VFP registers
	are available.
	(*movhi_bytes): Likewise.  Also fix some white-space.
	* config/arm/vfp.md (*arm_movhi_vfp): New.
	(*thumb2_movhi_vfp): New.

testsuite/
2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/arm/short-vfp-1.c: New.
---
 gcc/config/arm/arm.c                       |  5 ++
 gcc/config/arm/arm.md                      |  6 +-
 gcc/config/arm/vfp.md                      | 93 ++++++++++++++++++++++++++++++
 gcc/testsuite/gcc.target/arm/short-vfp-1.c | 45 +++++++++++++++
 4 files changed, 146 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/short-vfp-1.c

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f3914ef..26a8a48 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -18628,6 +18628,7 @@ output_move_vfp (rtx *operands)
   gcc_assert ((mode == HFmode && TARGET_HARD_FLOAT && TARGET_VFP)
 	      || mode == SFmode
 	      || mode == DFmode
+	      || mode == HImode
 	      || mode == SImode
 	      || mode == DImode
               || (TARGET_NEON && VALID_NEON_DREG_MODE (mode)));
@@ -23422,6 +23423,10 @@ arm_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
       if (mode == HFmode)
 	return VFP_REGNO_OK_FOR_SINGLE (regno);
 
+      /* VFP registers can hold HImode values.  */
+      if (mode == HImode)
+	return VFP_REGNO_OK_FOR_SINGLE (regno);
+
       if (TARGET_NEON)
         return (VALID_NEON_DREG_MODE (mode) && VFP_REGNO_OK_FOR_DOUBLE (regno))
                || (VALID_NEON_QREG_MODE (mode)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 4049f10..3e23178 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -6365,7 +6365,7 @@
   [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r,r,m,r")
 	(match_operand:HI 1 "general_operand"      "rIk,K,n,r,mi"))]
   "TARGET_ARM
-   && arm_arch4
+   && arm_arch4 && !(TARGET_HARD_FLOAT && TARGET_VFP)
    && (register_operand (operands[0], HImode)
        || register_operand (operands[1], HImode))"
   "@
@@ -6391,7 +6391,7 @@
 (define_insn "*movhi_bytes"
   [(set (match_operand:HI 0 "s_register_operand" "=r,r,r")
 	(match_operand:HI 1 "arm_rhs_operand"  "I,rk,K"))]
-  "TARGET_ARM"
+  "TARGET_ARM && !(TARGET_HARD_FLOAT && TARGET_VFP)"
   "@
    mov%?\\t%0, %1\\t%@ movhi
    mov%?\\t%0, %1\\t%@ movhi
@@ -6399,7 +6399,7 @@
   [(set_attr "predicable" "yes")
    (set_attr "type" "mov_imm,mov_reg,mvn_imm")]
 )
-	
+
 ;; We use a DImode scratch because we may occasionally need an additional
 ;; temporary if the address isn't offsettable -- push_reload doesn't seem
 ;; to take any notice of the "o" constraints on reload_memory_operand operand.
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 9750ba1..d7c874a 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -18,6 +18,99 @@
 ;; along with GCC; see the file COPYING3.  If not see
 ;; <http://www.gnu.org/licenses/>.  */
 
+;; Patterns for HI moves which provide more data transfer instructions when VFP
+;; support is enabled.
+(define_insn "*arm_movhi_vfp"
+ [(set
+   (match_operand:HI 0 "nonimmediate_operand"
+    "=rk,  r, r, m, r, *t,  r, *t")
+   (match_operand:HI 1 "general_operand"
+    "rIk, K, n, r, mi, r, *t, *t"))]
+ "TARGET_ARM && TARGET_HARD_FLOAT && TARGET_VFP
+  && (register_operand (operands[0], HImode)
+       || register_operand (operands[1], HImode))"
+{
+  switch (which_alternative)
+    {
+    case 0:
+      return "mov%?\t%0, %1\t%@ movhi";
+    case 1:
+      return "mvn%?\t%0, #%B1\t%@ movhi";
+    case 2:
+      return "movw%?\t%0, %L1\t%@ movhi";
+    case 3:
+      return "strh%?\t%1, %0\t%@ movhi";
+    case 4:
+      return "ldrh%?\t%0, %1\t%@ movhi";
+    case 5:
+    case 6:
+      return "vmov%?\t%0, %1\t%@ int";
+    case 7:
+      return "vmov%?.f32\t%0, %1\t%@ int";
+    default:
+      gcc_unreachable ();
+    }
+}
+ [(set_attr "predicable" "yes")
+  (set_attr_alternative "type"
+   [(if_then_else
+     (match_operand 1 "const_int_operand" "")
+     (const_string "mov_imm")
+     (const_string "mov_reg"))
+    (const_string "mvn_imm")
+    (const_string "mov_imm")
+    (const_string "store1")
+    (const_string "load1")
+    (const_string "f_mcr")
+    (const_string "f_mrc")
+    (const_string "fmov")])
+  (set_attr "pool_range" "*, *, *, *, 256, *, *, *")
+  (set_attr "neg_pool_range" "*, *, *, *, 244, *, *, *")
+  (set_attr "length" "4")]
+)
+
+(define_insn "*thumb2_movhi_vfp"
+ [(set
+   (match_operand:HI 0 "nonimmediate_operand"
+    "=rk, r, l, r, m, r, *t, r, *t")
+   (match_operand:HI 1 "general_operand"
+    "rk, I, Py, n, r, m, r, *t, *t"))]
+ "TARGET_THUMB2 && TARGET_HARD_FLOAT && TARGET_VFP
+  && (register_operand (operands[0], HImode)
+       || register_operand (operands[1], HImode))"
+{
+  switch (which_alternative)
+    {
+    case 0:
+    case 1:
+    case 2:
+      return "mov%?\t%0, %1\t%@ movhi";
+    case 3:
+      return "movw%?\t%0, %L1\t%@ movhi";
+    case 4:
+      return "strh%?\t%1, %0\t%@ movhi";
+    case 5:
+      return "ldrh%?\t%0, %1\t%@ movhi";
+    case 6:
+    case 7:
+      return "vmov%?\t%0, %1\t%@ int";
+    case 8:
+      return "vmov%?.f32\t%0, %1\t%@ int";
+    default:
+      gcc_unreachable ();
+    }
+}
+ [(set_attr "predicable" "yes")
+  (set_attr "predicable_short_it"
+   "yes, no, yes, no, no, no, no, no, no")
+  (set_attr "type"
+   "mov_reg, mov_imm, mov_imm, mov_imm, store1, load1,\
+    f_mcr, f_mrc, fmov")
+  (set_attr "pool_range" "*, *, *, *, *, 4094, *, *, *")
+  (set_attr "neg_pool_range" "*, *, *, *, *, 250, *, *, *")
+  (set_attr "length" "2, 4, 2, 4, 4, 4, 4, 4, 4")]
+)
+
 ;; SImode moves
 ;; ??? For now do not allow loading constants into vfp regs.  This causes
 ;; problems because small constants get converted into adds.
diff --git a/gcc/testsuite/gcc.target/arm/short-vfp-1.c b/gcc/testsuite/gcc.target/arm/short-vfp-1.c
new file mode 100644
index 0000000..d96c763
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/short-vfp-1.c
@@ -0,0 +1,45 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_vfp_ok } */
+/* { dg-options "-mfpu=vfp" } */
+
+int
+test_sisf (float x)
+{
+  return (int)x;
+}
+
+short
+test_hisf (float x)
+{
+  return (short)x;
+}
+
+float
+test_sfsi (int x)
+{
+  return (float)x;
+}
+
+float
+test_sfhi (short x)
+{
+  return (float)x;
+}
+
+short
+test_hisi (int x)
+{
+  return (short)x;
+}
+
+int
+test_sihi (short x)
+{
+  return (int)x;
+}
+
+/* { dg-final { scan-assembler-times {vcvt\.s32\.f32\ts[0-9]+,s[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vcvt\.f32\.s32\ts[0-9]+,s[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vmov\tr[0-9]+,s[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vmov\ts[0-9]+,r[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {sxth\tr[0-9]+,r[0-9]+} 2 } } */
-- 
2.1.4


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 6/17][ARM] Add data processing intrinsics for float16_t.
  2016-05-17 14:20 [PATCH 0/17][ARM] ARMv8.2-A and FP16 extension support Matthew Wahab
                   ` (4 preceding siblings ...)
  2016-05-17 14:29 ` [PATCH 5/17][ARM] Enable HI mode moves for floating point values Matthew Wahab
@ 2016-05-17 14:32 ` Matthew Wahab
  2016-07-27 13:59   ` Ramana Radhakrishnan
  2016-05-17 14:34 ` [PATCH 7/17][ARM] Add FP16 data movement instructions Matthew Wahab
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 73+ messages in thread
From: Matthew Wahab @ 2016-05-17 14:32 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3617 bytes --]

The ACLE specifies a number of intrinsics for manipulating vectors
holding values in most of the integer and floating point types. These
include the 16-bit integer types but not 16-bit floating point, even
though the same instructions are used for both.

A future version of the ACLE extends the data processing intrinsics to
the 16-bit floating point types, making the intrinsics available under
the same conditions as the ARM __fp16 type.

This patch adds the new intrinsics:
  vbsl_f16, vbslq_f16, vdup_n_f16, vdupq_n_f16, vdup_lane_f16,
  vdupq_lane_f16, vext_f16, vextq_f16, vmov_n_f16, vmovq_n_f16,
  vrev64_f16, vrev64q_f16, vtrn_f16, vtrnq_f16, vuzp_f16, vuzpq_f16,
  vzip_f16, vzipq_f16.
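
For illustration (not part of the patch), a minimal sketch combining
two of the new intrinsics; the function name is hypothetical and it
assumes a compiler with this series applied and options that define
__ARM_FP16_FORMAT_IEEE or __ARM_FP16_FORMAT_ALTERNATIVE (for example
-mfpu=neon-fp16 -mfp16-format=ieee):

#include <arm_neon.h>

float16x4_t
dup_and_ext (float16_t x, float16x4_t v)
{
  /* Broadcast x to all four lanes (VDUP.16), then form
     { d[2], d[3], v[0], v[1] } (VEXT.16 with offset 2).  */
  float16x4_t d = vdup_n_f16 (x);
  return vext_f16 (d, v, 2);
}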

This patch also updates the advsimd-intrinsics testsuite to test the f16
variants for ARM targets. These intrinsics are only implemented in the
ARM target, so the tests are disabled for AArch64 by an extra condition
in a new convenience macro, FP16_SUPPORTED. This patch also disables,
for the ARM target, the testsuite-defined macro vdup_n_f16, as it is no
longer needed.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator. Also tested for aarch64-none-elf with the
advsimd-intrinsics testsuite using an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/arm.c (arm_evpc_neon_vuzp): Add support for V8HF and
	V4HF modes.
	(arm_evpc_neon_vzip): Likewise.
	(arm_evpc_neon_vrev): Likewise.
	(arm_evpc_neon_vtrn): Likewise.
	(arm_evpc_neon_vext): Likewise.
	* config/arm/arm_neon.h (vbsl_f16): New.
	(vbslq_f16): New.
	(vdup_n_f16): New.
	(vdupq_n_f16): New.
	(vdup_lane_f16): New.
	(vdupq_lane_f16): New.
	(vext_f16): New.
	(vextq_f16): New.
	(vmov_n_f16): New.
	(vmovq_n_f16): New.
	(vrev64_f16): New.
	(vrev64q_f16): New.
	(vtrn_f16): New.
	(vtrnq_f16): New.
	(vuzp_f16): New.
	(vuzpq_f16): New.
	(vzip_f16): New.
	(vzipq_f16): New.
	* config/arm/arm_neon_builtins.def (vdup_n): New (v8hf, v4hf variants).
	(vdup_lane): New (v8hf, v4hf variants).
	(vext): New (v8hf, v4hf variants).
	(vbsl): New (v8hf, v4hf variants).
	* config/arm/iterators.md (VDQWH): New.
	(VH): New.
	(V_double_vector_mode): Add V8HF and V4HF.  Fix white-space.
	(Scalar_mul_8_16): Fix white-space.
	(Is_d_reg): Add V4HF and V8HF.
	* config/arm/neon.md (neon_vdup_lane<mode>_internal): New.
	(neon_vdup_lane<mode>): New.
	(neon_vtrn<mode>_internal): Replace VDQW with VDQWH.
	(*neon_vtrn<mode>_insn): Likewise.
	(neon_vzip<mode>_internal): Likewise.  Also fix white-space.
	(*neon_vzip<mode>_insn): Likewise.
	(neon_vuzp<mode>_internal): Likewise.
	(*neon_vuzp<mode>_insn): Likewise.
	* config/arm/vec-common.md (vec_perm_const<mode>): New.

testsuite/
2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
	(FP16_SUPPORTED): New.
	(vdup_n_f16): Disable for non-AArch64 targets.
	* gcc.target/aarch64/advsimd-intrinsics/vbsl.c: Add __fp16 tests,
	conditional on FP16_SUPPORTED.
	* gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vext.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vrev.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vshuffle.inc: Add support
	for testing __fp16.
	* gcc.target/aarch64/advsimd-intrinsics/vtrn.c: Add __fp16 tests,
	conditional on FP16_SUPPORTED.
	* gcc.target/aarch64/advsimd-intrinsics/vuzp.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vzip.c: Likewise.


[-- Attachment #2: 0006-PATCH-6-17-ARM-Add-data-processing-intrinsics-for-fl.patch --]
[-- Type: text/x-patch, Size: 52221 bytes --]

From 08c5cf4b5c6c846a4f62b6ad8776f2388b135e55 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 14:48:29 +0100
Subject: [PATCH 06/17] [PATCH 6/17][ARM] Add data processing intrinsics for
 float16_t.

2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/arm.c (arm_evpc_neon_vuzp): Add support for V8HF and
	V4HF modes.
	(arm_evpc_neon_vzip): Likewise.
	(arm_evpc_neon_vrev): Likewise.
	(arm_evpc_neon_vtrn): Likewise.
	(arm_evpc_neon_vext): Likewise.
	* config/arm/arm_neon.h (vbsl_f16): New.
	(vbslq_f16): New.
	(vdup_n_f16): New.
	(vdupq_n_f16): New.
	(vdup_lane_f16): New.
	(vdupq_lane_f16): New.
	(vext_f16): New.
	(vextq_f16): New.
	(vmov_n_f16): New.
	(vmovq_n_f16): New.
	(vrev64_f16): New.
	(vrev64q_f16): New.
	(vtrn_f16): New.
	(vtrnq_f16): New.
	(vuzp_f16): New.
	(vuzpq_f16): New.
	(vzip_f16): New.
	(vzipq_f16): New.
	* config/arm/arm_neon_builtins.def (vdup_n): New (v8hf, v4hf variants).
	(vdup_lane): New (v8hf, v4hf variants).
	(vext): New (v8hf, v4hf variants).
	(vbsl): New (v8hf, v4hf variants).
	* config/arm/iterators.md (VDQWH): New.
	(VH): New.
	(V_double_vector_mode): Add V8HF and V4HF.  Fix white-space.
	(Scalar_mul_8_16): Fix white-space.
	(Is_d_reg): Add V4HF and V8HF.
	* config/arm/neon.md (neon_vdup_lane<mode>_internal): New.
	(neon_vdup_lane<mode>): New.
	(neon_vtrn<mode>_internal): Replace VDQW with VDQWH.
	(*neon_vtrn<mode>_insn): Likewise.
	(neon_vzip<mode>_internal): Likewise.  Also fix white-space.
	(*neon_vzip<mode>_insn): Likewise.
	(neon_vuzp<mode>_internal): Likewise.
	(*neon_vuzp<mode>_insn): Likewise.
	* config/arm/vec-common.md (vec_perm_const<mode>): New.

testsuite/
2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
	(FP16_SUPPORTED): New.
	(expected-hfloat-16x4): Make conditional on __fp16 support.
	(expected-hfloat-16x8): Likewise.
	(vdup_n_f16): Disable for non-AArch64 targets.
	* gcc.target/aarch64/advsimd-intrinsics/vbsl.c: Add __fp16 tests,
	conditional on FP16_SUPPORTED.
	* gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vext.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vrev.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vshuffle.inc: Add support
	for testing __fp16.
	* gcc.target/aarch64/advsimd-intrinsics/vtrn.c: Add __fp16 tests,
	conditional on FP16_SUPPORTED.
	* gcc.target/aarch64/advsimd-intrinsics/vuzp.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vzip.c: Likewise.
---
 gcc/config/arm/arm.c                               |  10 ++
 gcc/config/arm/arm_neon.h                          | 175 +++++++++++++++++++++
 gcc/config/arm/arm_neon_builtins.def               |   4 +
 gcc/config/arm/iterators.md                        |  26 +--
 gcc/config/arm/neon.md                             | 115 +++++++++-----
 gcc/config/arm/vec-common.md                       |  14 ++
 .../aarch64/advsimd-intrinsics/arm-neon-ref.h      |  13 +-
 .../gcc.target/aarch64/advsimd-intrinsics/vbsl.c   |  28 ++++
 .../aarch64/advsimd-intrinsics/vdup-vmov.c         |  75 +++++++++
 .../aarch64/advsimd-intrinsics/vdup_lane.c         |  23 +++
 .../gcc.target/aarch64/advsimd-intrinsics/vext.c   |  30 ++++
 .../gcc.target/aarch64/advsimd-intrinsics/vrev.c   |  20 +++
 .../aarch64/advsimd-intrinsics/vshuffle.inc        |  42 ++++-
 .../gcc.target/aarch64/advsimd-intrinsics/vtrn.c   |  20 +++
 .../gcc.target/aarch64/advsimd-intrinsics/vuzp.c   |  20 +++
 .../gcc.target/aarch64/advsimd-intrinsics/vzip.c   |  20 +++
 16 files changed, 586 insertions(+), 49 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 26a8a48..6892040 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28420,6 +28420,8 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d *d)
     case V8QImode:  gen = gen_neon_vuzpv8qi_internal;  break;
     case V8HImode:  gen = gen_neon_vuzpv8hi_internal;  break;
     case V4HImode:  gen = gen_neon_vuzpv4hi_internal;  break;
+    case V8HFmode:  gen = gen_neon_vuzpv8hf_internal;  break;
+    case V4HFmode:  gen = gen_neon_vuzpv4hf_internal;  break;
     case V4SImode:  gen = gen_neon_vuzpv4si_internal;  break;
     case V2SImode:  gen = gen_neon_vuzpv2si_internal;  break;
     case V2SFmode:  gen = gen_neon_vuzpv2sf_internal;  break;
@@ -28493,6 +28495,8 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d)
     case V8QImode:  gen = gen_neon_vzipv8qi_internal;  break;
     case V8HImode:  gen = gen_neon_vzipv8hi_internal;  break;
     case V4HImode:  gen = gen_neon_vzipv4hi_internal;  break;
+    case V8HFmode:  gen = gen_neon_vzipv8hf_internal;  break;
+    case V4HFmode:  gen = gen_neon_vzipv4hf_internal;  break;
     case V4SImode:  gen = gen_neon_vzipv4si_internal;  break;
     case V2SImode:  gen = gen_neon_vzipv2si_internal;  break;
     case V2SFmode:  gen = gen_neon_vzipv2sf_internal;  break;
@@ -28545,6 +28549,8 @@ arm_evpc_neon_vrev (struct expand_vec_perm_d *d)
 	case V8QImode:  gen = gen_neon_vrev32v8qi;  break;
 	case V8HImode:  gen = gen_neon_vrev64v8hi;  break;
 	case V4HImode:  gen = gen_neon_vrev64v4hi;  break;
+	case V8HFmode:  gen = gen_neon_vrev64v8hf;  break;
+	case V4HFmode:  gen = gen_neon_vrev64v4hf;  break;
 	default:
 	  return false;
 	}
@@ -28628,6 +28634,8 @@ arm_evpc_neon_vtrn (struct expand_vec_perm_d *d)
     case V8QImode:  gen = gen_neon_vtrnv8qi_internal;  break;
     case V8HImode:  gen = gen_neon_vtrnv8hi_internal;  break;
     case V4HImode:  gen = gen_neon_vtrnv4hi_internal;  break;
+    case V8HFmode:  gen = gen_neon_vtrnv8hf_internal;  break;
+    case V4HFmode:  gen = gen_neon_vtrnv4hf_internal;  break;
     case V4SImode:  gen = gen_neon_vtrnv4si_internal;  break;
     case V2SImode:  gen = gen_neon_vtrnv2si_internal;  break;
     case V2SFmode:  gen = gen_neon_vtrnv2sf_internal;  break;
@@ -28703,6 +28711,8 @@ arm_evpc_neon_vext (struct expand_vec_perm_d *d)
     case V8HImode: gen = gen_neon_vextv8hi; break;
     case V2SImode: gen = gen_neon_vextv2si; break;
     case V4SImode: gen = gen_neon_vextv4si; break;
+    case V4HFmode: gen = gen_neon_vextv4hf; break;
+    case V8HFmode: gen = gen_neon_vextv8hf; break;
     case V2SFmode: gen = gen_neon_vextv2sf; break;
     case V4SFmode: gen = gen_neon_vextv4sf; break;
     case V2DImode: gen = gen_neon_vextv2di; break;
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 07503d7..5b433b4 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -14830,6 +14830,181 @@ vmull_high_p64 (poly64x2_t __a, poly64x2_t __b)
 
 #pragma GCC pop_options
 
+  /* Half-precision data processing intrinsics.  */
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vbsl_f16 (uint16x4_t __a, float16x4_t __b, float16x4_t __c)
+{
+  return __builtin_neon_vbslv4hf ((int16x4_t)__a, __b, __c);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vbslq_f16 (uint16x8_t __a, float16x8_t __b, float16x8_t __c)
+{
+  return __builtin_neon_vbslv8hf ((int16x8_t)__a, __b, __c);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vdup_n_f16 (float16_t __a)
+{
+  return __builtin_neon_vdup_nv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vdupq_n_f16 (float16_t __a)
+{
+  return __builtin_neon_vdup_nv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vdup_lane_f16 (float16x4_t __a, const int __b)
+{
+  return __builtin_neon_vdup_lanev4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vdupq_lane_f16 (float16x4_t __a, const int __b)
+{
+  return __builtin_neon_vdup_lanev8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vext_f16 (float16x4_t __a, float16x4_t __b, const int __c)
+{
+  return __builtin_neon_vextv4hf (__a, __b, __c);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vextq_f16 (float16x8_t __a, float16x8_t __b, const int __c)
+{
+  return __builtin_neon_vextv8hf (__a, __b, __c);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmov_n_f16 (float16_t __a)
+{
+  return __builtin_neon_vdup_nv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmovq_n_f16 (float16_t __a)
+{
+  return __builtin_neon_vdup_nv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrev64_f16 (float16x4_t __a)
+{
+  return (float16x4_t)__builtin_shuffle (__a, (uint16x4_t){ 3, 2, 1, 0 });
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrev64q_f16 (float16x8_t __a)
+{
+  return
+    (float16x8_t)__builtin_shuffle (__a,
+				    (uint16x8_t){ 3, 2, 1, 0, 7, 6, 5, 4 });
+}
+
+__extension__ static __inline float16x4x2_t __attribute__ ((__always_inline__))
+vtrn_f16 (float16x4_t __a, float16x4_t __b)
+{
+  float16x4x2_t __rv;
+#ifdef __ARM_BIG_ENDIAN
+  __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x4_t){ 5, 1, 7, 3 });
+  __rv.val[1] = __builtin_shuffle (__a, __b, (uint16x4_t){ 4, 0, 6, 2 });
+#else
+  __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x4_t){ 0, 4, 2, 6 });
+  __rv.val[1] = __builtin_shuffle (__a, __b, (uint16x4_t){ 1, 5, 3, 7 });
+#endif
+  return __rv;
+}
+
+__extension__ static __inline float16x8x2_t __attribute__ ((__always_inline__))
+vtrnq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  float16x8x2_t __rv;
+#ifdef __ARM_BIG_ENDIAN
+  __rv.val[0] = __builtin_shuffle (__a, __b,
+				   (uint16x8_t){ 9, 1, 11, 3, 13, 5, 15, 7 });
+  __rv.val[1] = __builtin_shuffle (__a, __b,
+				   (uint16x8_t){ 8, 0, 10, 2, 12, 4, 14, 6 });
+#else
+  __rv.val[0] = __builtin_shuffle (__a, __b,
+				   (uint16x8_t){ 0, 8, 2, 10, 4, 12, 6, 14 });
+  __rv.val[1] = __builtin_shuffle (__a, __b,
+				   (uint16x8_t){ 1, 9, 3, 11, 5, 13, 7, 15 });
+#endif
+  return __rv;
+}
+
+__extension__ static __inline float16x4x2_t __attribute__ ((__always_inline__))
+vuzp_f16 (float16x4_t __a, float16x4_t __b)
+{
+  float16x4x2_t __rv;
+#ifdef __ARM_BIG_ENDIAN
+  __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x4_t){ 5, 7, 1, 3 });
+  __rv.val[1] = __builtin_shuffle (__a, __b, (uint16x4_t){ 4, 6, 0, 2 });
+#else
+  __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x4_t){ 0, 2, 4, 6 });
+  __rv.val[1] = __builtin_shuffle (__a, __b, (uint16x4_t){ 1, 3, 5, 7 });
+#endif
+  return __rv;
+}
+
+__extension__ static __inline float16x8x2_t __attribute__ ((__always_inline__))
+vuzpq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  float16x8x2_t __rv;
+#ifdef __ARM_BIG_ENDIAN
+  __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t)
+				   { 5, 7, 1, 3, 13, 15, 9, 11 });
+  __rv.val[1] = __builtin_shuffle (__a, __b, (uint16x8_t)
+				   { 4, 6, 0, 2, 12, 14, 8, 10 });
+#else
+  __rv.val[0] = __builtin_shuffle (__a, __b,
+				   (uint16x8_t){ 0, 2, 4, 6, 8, 10, 12, 14 });
+  __rv.val[1] = __builtin_shuffle (__a, __b,
+				   (uint16x8_t){ 1, 3, 5, 7, 9, 11, 13, 15 });
+#endif
+  return __rv;
+}
+
+__extension__ static __inline float16x4x2_t __attribute__ ((__always_inline__))
+vzip_f16 (float16x4_t __a, float16x4_t __b)
+{
+  float16x4x2_t __rv;
+#ifdef __ARM_BIG_ENDIAN
+  __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x4_t){ 6, 2, 7, 3 });
+  __rv.val[1] = __builtin_shuffle (__a, __b, (uint16x4_t){ 4, 0, 5, 1 });
+#else
+  __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x4_t){ 0, 4, 1, 5 });
+  __rv.val[1] = __builtin_shuffle (__a, __b, (uint16x4_t){ 2, 6, 3, 7 });
+#endif
+  return __rv;
+}
+
+__extension__ static __inline float16x8x2_t __attribute__ ((__always_inline__))
+vzipq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  float16x8x2_t __rv;
+#ifdef __ARM_BIG_ENDIAN
+  __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t)
+				   { 10, 2, 11, 3, 8, 0, 9, 1 });
+  __rv.val[1] = __builtin_shuffle (__a, __b, (uint16x8_t)
+				   { 14, 6, 15, 7, 12, 4, 13, 5 });
+#else
+  __rv.val[0] = __builtin_shuffle (__a, __b,
+				   (uint16x8_t){ 0, 8, 1, 9, 2, 10, 3, 11 });
+  __rv.val[1] = __builtin_shuffle (__a, __b,
+				   (uint16x8_t){ 4, 12, 5, 13, 6, 14, 7, 15 });
+#endif
+  return __rv;
+}
+
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index d9fac78..a4ba516 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -166,8 +166,10 @@ VAR10 (SETLANE, vset_lane,
 VAR5 (UNOP, vcreate, v8qi, v4hi, v2si, v2sf, di)
 VAR10 (UNOP, vdup_n,
 	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
+VAR2 (UNOP, vdup_n, v8hf, v4hf)
 VAR10 (GETLANE, vdup_lane,
 	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
+VAR2 (GETLANE, vdup_lane, v8hf, v4hf)
 VAR6 (COMBINE, vcombine, v8qi, v4hi, v4hf, v2si, v2sf, di)
 VAR6 (UNOP, vget_high, v16qi, v8hi, v8hf, v4si, v4sf, v2di)
 VAR6 (UNOP, vget_low, v16qi, v8hi, v8hf, v4si, v4sf, v2di)
@@ -197,6 +199,7 @@ VAR2 (MAC_N, vmlslu_n, v4hi, v2si)
 VAR2 (MAC_N, vqdmlsl_n, v4hi, v2si)
 VAR10 (SETLANE, vext,
 	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
+VAR2 (SETLANE, vext, v8hf, v4hf)
 VAR8 (UNOP, vrev64, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf)
 VAR4 (UNOP, vrev32, v8qi, v4hi, v16qi, v8hi)
 VAR2 (UNOP, vrev16, v8qi, v16qi)
@@ -208,6 +211,7 @@ VAR1 (UNOP, vcvtv4sf, v4hf)
 VAR1 (UNOP, vcvtv4hf, v4sf)
 VAR10 (TERNOP, vbsl,
 	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
+VAR2 (TERNOP, vbsl, v8hf, v4hf)
 VAR2 (UNOP, copysignf, v2sf, v4sf)
 VAR2 (UNOP, vrintn, v2sf, v4sf)
 VAR2 (UNOP, vrinta, v2sf, v4sf)
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index aba1023..3f9d9e4 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -119,6 +119,10 @@
 ;; All supported vector modes (except those with 64-bit integer elements).
 (define_mode_iterator VDQW [V8QI V16QI V4HI V8HI V2SI V4SI V2SF V4SF])
 
+;; All supported vector modes including 16-bit float modes.
+(define_mode_iterator VDQWH [V8QI V16QI V4HI V8HI V2SI V4SI V2SF V4SF
+			     V8HF V4HF])
+
 ;; Supported integer vector modes (not 64 bit elements).
 (define_mode_iterator VDQIW [V8QI V16QI V4HI V8HI V2SI V4SI])
 
@@ -174,6 +178,9 @@
 ;; Modes with 8-bit, 16-bit and 32-bit elements.
 (define_mode_iterator VU [V16QI V8HI V4SI])
 
+;; Vector modes for 16-bit floating-point support.
+(define_mode_iterator VH [V8HF V4HF])
+
 ;; Iterators used for fixed-point support.
 (define_mode_iterator FIXED [QQ HQ SQ UQQ UHQ USQ HA SA UHA USA])
 
@@ -475,9 +482,10 @@
 ;; Used for neon_vdup_lane, where the second operand is double-sized
 ;; even when the first one is quad.
 (define_mode_attr V_double_vector_mode [(V16QI "V8QI") (V8HI "V4HI")
-                                        (V4SI "V2SI") (V4SF "V2SF")
-                                        (V8QI "V8QI") (V4HI "V4HI")
-                                        (V2SI "V2SI") (V2SF "V2SF")])
+					(V4SI "V2SI") (V4SF "V2SF")
+					(V8QI "V8QI") (V4HI "V4HI")
+					(V2SI "V2SI") (V2SF "V2SF")
+					(V8HF "V4HF") (V4HF "V4HF")])
 
 ;; Mode of result of comparison operations (and bit-select operand 1).
 (define_mode_attr V_cmp_result [(V8QI "V8QI") (V16QI "V16QI")
@@ -582,17 +590,17 @@
                  (DI "false") (V2DI "false")])
 
 (define_mode_attr Scalar_mul_8_16 [(V8QI "true") (V16QI "true")
-                   (V4HI "true") (V8HI "true")
-                   (V2SI "false") (V4SI "false")
-                   (V2SF "false") (V4SF "false")
-                   (DI "false") (V2DI "false")])
-
+				   (V4HI "true") (V8HI "true")
+				   (V2SI "false") (V4SI "false")
+				   (V2SF "false") (V4SF "false")
+				   (DI "false") (V2DI "false")])
 
 (define_mode_attr Is_d_reg [(V8QI "true") (V16QI "false")
                             (V4HI "true") (V8HI  "false")
                             (V2SI "true") (V4SI  "false")
                             (V2SF "true") (V4SF  "false")
-                            (DI   "true") (V2DI  "false")])
+                            (DI   "true") (V2DI  "false")
+			    (V4HF "true") (V8HF  "false")])
 
 (define_mode_attr V_mode_nunits [(V8QI "8") (V16QI "16")
 				 (V4HF "4") (V8HF "8")
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 6b4896d..5fcc991 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -3045,6 +3045,28 @@ if (BYTES_BIG_ENDIAN)
   [(set_attr "type" "neon_dup<q>")]
 )
 
+(define_insn "neon_vdup_lane<mode>_internal"
+ [(set (match_operand:VH 0 "s_register_operand" "=w")
+   (vec_duplicate:VH
+    (vec_select:<V_elem>
+     (match_operand:<V_double_vector_mode> 1 "s_register_operand" "w")
+     (parallel [(match_operand:SI 2 "immediate_operand" "i")]))))]
+ "TARGET_NEON && TARGET_FP16"
+{
+  if (BYTES_BIG_ENDIAN)
+    {
+      int elt = INTVAL (operands[2]);
+      elt = GET_MODE_NUNITS (<V_double_vector_mode>mode) - 1 - elt;
+      operands[2] = GEN_INT (elt);
+    }
+  if (<Is_d_reg>)
+    return "vdup.<V_sz_elem>\t%P0, %P1[%c2]";
+  else
+    return "vdup.<V_sz_elem>\t%q0, %P1[%c2]";
+}
+  [(set_attr "type" "neon_dup<q>")]
+)
+
 (define_expand "neon_vdup_lane<mode>"
   [(match_operand:VDQW 0 "s_register_operand" "=w")
    (match_operand:<V_double_vector_mode> 1 "s_register_operand" "w")
@@ -3064,6 +3086,25 @@ if (BYTES_BIG_ENDIAN)
     DONE;
 })
 
+(define_expand "neon_vdup_lane<mode>"
+  [(match_operand:VH 0 "s_register_operand")
+   (match_operand:<V_double_vector_mode> 1 "s_register_operand")
+   (match_operand:SI 2 "immediate_operand")]
+  "TARGET_NEON && TARGET_FP16"
+{
+  if (BYTES_BIG_ENDIAN)
+    {
+      unsigned int elt = INTVAL (operands[2]);
+      unsigned int reg_nelts
+	= 64 / GET_MODE_UNIT_BITSIZE (<V_double_vector_mode>mode);
+      elt ^= reg_nelts - 1;
+      operands[2] = GEN_INT (elt);
+    }
+  emit_insn (gen_neon_vdup_lane<mode>_internal (operands[0], operands[1],
+						operands[2]));
+  DONE;
+})
+
 ; Scalar index is ignored, since only zero is valid here.
 (define_expand "neon_vdup_lanedi"
   [(match_operand:DI 0 "s_register_operand" "=w")
@@ -4281,25 +4322,25 @@ if (BYTES_BIG_ENDIAN)
 
 (define_expand "neon_vtrn<mode>_internal"
   [(parallel
-    [(set (match_operand:VDQW 0 "s_register_operand" "")
-	  (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "")
-			(match_operand:VDQW 2 "s_register_operand" "")]
+    [(set (match_operand:VDQWH 0 "s_register_operand")
+	  (unspec:VDQWH [(match_operand:VDQWH 1 "s_register_operand")
+			 (match_operand:VDQWH 2 "s_register_operand")]
 	   UNSPEC_VTRN1))
-     (set (match_operand:VDQW 3 "s_register_operand" "")
-          (unspec:VDQW [(match_dup 1) (match_dup 2)] UNSPEC_VTRN2))])]
+     (set (match_operand:VDQWH 3 "s_register_operand")
+	  (unspec:VDQWH [(match_dup 1) (match_dup 2)] UNSPEC_VTRN2))])]
   "TARGET_NEON"
   ""
 )
 
 ;; Note: Different operand numbering to handle tied registers correctly.
 (define_insn "*neon_vtrn<mode>_insn"
-  [(set (match_operand:VDQW 0 "s_register_operand" "=&w")
-        (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")
-                      (match_operand:VDQW 3 "s_register_operand" "2")]
-                     UNSPEC_VTRN1))
-   (set (match_operand:VDQW 2 "s_register_operand" "=&w")
-         (unspec:VDQW [(match_dup 1) (match_dup 3)]
-                     UNSPEC_VTRN2))]
+  [(set (match_operand:VDQWH 0 "s_register_operand" "=&w")
+	(unspec:VDQWH [(match_operand:VDQWH 1 "s_register_operand" "0")
+		       (match_operand:VDQWH 3 "s_register_operand" "2")]
+	 UNSPEC_VTRN1))
+   (set (match_operand:VDQWH 2 "s_register_operand" "=&w")
+	(unspec:VDQWH [(match_dup 1) (match_dup 3)]
+	 UNSPEC_VTRN2))]
   "TARGET_NEON"
   "vtrn.<V_sz_elem>\t%<V_reg>0, %<V_reg>2"
   [(set_attr "type" "neon_permute<q>")]
@@ -4307,25 +4348,25 @@ if (BYTES_BIG_ENDIAN)
 
 (define_expand "neon_vzip<mode>_internal"
   [(parallel
-    [(set (match_operand:VDQW 0 "s_register_operand" "")
-	  (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "")
-	  	        (match_operand:VDQW 2 "s_register_operand" "")]
-		       UNSPEC_VZIP1))
-    (set (match_operand:VDQW 3 "s_register_operand" "")
-	 (unspec:VDQW [(match_dup 1) (match_dup 2)] UNSPEC_VZIP2))])]
+    [(set (match_operand:VDQWH 0 "s_register_operand")
+	  (unspec:VDQWH [(match_operand:VDQWH 1 "s_register_operand")
+			 (match_operand:VDQWH 2 "s_register_operand")]
+	   UNSPEC_VZIP1))
+    (set (match_operand:VDQWH 3 "s_register_operand")
+	 (unspec:VDQWH [(match_dup 1) (match_dup 2)] UNSPEC_VZIP2))])]
   "TARGET_NEON"
   ""
 )
 
 ;; Note: Different operand numbering to handle tied registers correctly.
 (define_insn "*neon_vzip<mode>_insn"
-  [(set (match_operand:VDQW 0 "s_register_operand" "=&w")
-        (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")
-                      (match_operand:VDQW 3 "s_register_operand" "2")]
-                     UNSPEC_VZIP1))
-   (set (match_operand:VDQW 2 "s_register_operand" "=&w")
-        (unspec:VDQW [(match_dup 1) (match_dup 3)]
-                     UNSPEC_VZIP2))]
+  [(set (match_operand:VDQWH 0 "s_register_operand" "=&w")
+	(unspec:VDQWH [(match_operand:VDQWH 1 "s_register_operand" "0")
+		       (match_operand:VDQWH 3 "s_register_operand" "2")]
+	 UNSPEC_VZIP1))
+   (set (match_operand:VDQWH 2 "s_register_operand" "=&w")
+	(unspec:VDQWH [(match_dup 1) (match_dup 3)]
+	 UNSPEC_VZIP2))]
   "TARGET_NEON"
   "vzip.<V_sz_elem>\t%<V_reg>0, %<V_reg>2"
   [(set_attr "type" "neon_zip<q>")]
@@ -4333,25 +4374,25 @@ if (BYTES_BIG_ENDIAN)
 
 (define_expand "neon_vuzp<mode>_internal"
   [(parallel
-    [(set (match_operand:VDQW 0 "s_register_operand" "")
-	  (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "")
-			(match_operand:VDQW 2 "s_register_operand" "")]
+    [(set (match_operand:VDQWH 0 "s_register_operand")
+	  (unspec:VDQWH [(match_operand:VDQWH 1 "s_register_operand")
+			(match_operand:VDQWH 2 "s_register_operand")]
 	   UNSPEC_VUZP1))
-     (set (match_operand:VDQW 3 "s_register_operand" "")
-	  (unspec:VDQW [(match_dup 1) (match_dup 2)] UNSPEC_VUZP2))])]
+     (set (match_operand:VDQWH 3 "s_register_operand" "")
+	  (unspec:VDQWH [(match_dup 1) (match_dup 2)] UNSPEC_VUZP2))])]
   "TARGET_NEON"
   ""
 )
 
 ;; Note: Different operand numbering to handle tied registers correctly.
 (define_insn "*neon_vuzp<mode>_insn"
-  [(set (match_operand:VDQW 0 "s_register_operand" "=&w")
-        (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")
-                      (match_operand:VDQW 3 "s_register_operand" "2")]
-                     UNSPEC_VUZP1))
-   (set (match_operand:VDQW 2 "s_register_operand" "=&w")
-        (unspec:VDQW [(match_dup 1) (match_dup 3)]
-                     UNSPEC_VUZP2))]
+  [(set (match_operand:VDQWH 0 "s_register_operand" "=&w")
+	(unspec:VDQWH [(match_operand:VDQWH 1 "s_register_operand" "0")
+		       (match_operand:VDQWH 3 "s_register_operand" "2")]
+	 UNSPEC_VUZP1))
+   (set (match_operand:VDQWH 2 "s_register_operand" "=&w")
+	(unspec:VDQWH [(match_dup 1) (match_dup 3)]
+	 UNSPEC_VUZP2))]
   "TARGET_NEON"
   "vuzp.<V_sz_elem>\t%<V_reg>0, %<V_reg>2"
   [(set_attr "type" "neon_zip<q>")]
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index ce98f71..645b01e 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -124,6 +124,20 @@
     FAIL;
 })
 
+(define_expand "vec_perm_const<mode>"
+  [(match_operand:VH 0 "s_register_operand")
+   (match_operand:VH 1 "s_register_operand")
+   (match_operand:VH 2 "s_register_operand")
+   (match_operand:<V_cmp_result> 3)]
+  "TARGET_NEON"
+{
+  if (arm_expand_vec_perm_const (operands[0], operands[1],
+				 operands[2], operands[3]))
+    DONE;
+  else
+    FAIL;
+})
+
 (define_expand "vec_perm<mode>"
   [(match_operand:VE 0 "s_register_operand" "")
    (match_operand:VE 1 "s_register_operand" "")
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
index 49fbd84..001e320 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
@@ -16,6 +16,15 @@ extern void *memset(void *, int, size_t);
 extern void *memcpy(void *, const void *, size_t);
 extern size_t strlen(const char *);
 
+/* Helper macro to select FP16 tests.  */
+#if (!defined (__aarch64__)						\
+     && (defined (__ARM_FP16_FORMAT_IEEE)				\
+	 || defined (__ARM_FP16_FORMAT_ALTERNATIVE)))
+#define FP16_SUPPORTED (1)
+#else
+#undef FP16_SUPPORTED
+#endif
+
 /* Various string construction helpers.  */
 
 /*
@@ -500,7 +509,9 @@ static void clean_results (void)
 /* Helpers to initialize vectors.  */
 #define VDUP(VAR, Q, T1, T2, W, N, V)			\
   VECT_VAR(VAR, T1, W, N) = vdup##Q##_n_##T2##W(V)
-#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+#if (defined (__aarch64__)						\
+     && (defined (__ARM_FP16_FORMAT_IEEE)				\
+	 || defined (__ARM_FP16_FORMAT_ALTERNATIVE)))
 /* Work around that there is no vdup_n_f16 intrinsic.  */
 #define vdup_n_f16(VAL)		\
   __extension__			\
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbsl.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbsl.c
index c4fdbb4..e9b3dfd 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbsl.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbsl.c
@@ -16,6 +16,10 @@ VECT_VAR_DECL(expected,uint,64,1) [] = { 0xfffffff1 };
 VECT_VAR_DECL(expected,poly,8,8) [] = { 0xf3, 0xf3, 0xf3, 0xf3,
 					0xf7, 0xf7, 0xf7, 0xf7 };
 VECT_VAR_DECL(expected,poly,16,4) [] = { 0xfff0, 0xfff0, 0xfff2, 0xfff2 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected, hfloat, 16, 4) [] = { 0xcc09, 0xcb89,
+					       0xcb09, 0xca89 };
+#endif
 VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1800004, 0xc1700004 };
 VECT_VAR_DECL(expected,int,8,16) [] = { 0xf2, 0xf2, 0xf2, 0xf2,
 					0xf6, 0xf6, 0xf6, 0xf6,
@@ -43,6 +47,12 @@ VECT_VAR_DECL(expected,poly,8,16) [] = { 0xf3, 0xf3, 0xf3, 0xf3,
 					 0xf7, 0xf7, 0xf7, 0xf7 };
 VECT_VAR_DECL(expected,poly,16,8) [] = { 0xfff0, 0xfff0, 0xfff2, 0xfff2,
 					 0xfff4, 0xfff4, 0xfff6, 0xfff6 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected, hfloat, 16, 8) [] = { 0xcc09, 0xcb89,
+					       0xcb09, 0xca89,
+					       0xca09, 0xc989,
+					       0xc909, 0xc889 };
+#endif
 VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc1800001, 0xc1700001,
 					   0xc1600001, 0xc1500001 };
 
@@ -66,6 +76,10 @@ void exec_vbsl (void)
   clean_results ();
 
   TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer);
+#if defined (FP16_SUPPORTED)
+  VLOAD(vector, buffer, , float, f, 16, 4);
+  VLOAD(vector, buffer, q, float, f, 16, 8);
+#endif
   VLOAD(vector, buffer, , float, f, 32, 2);
   VLOAD(vector, buffer, q, float, f, 32, 4);
 
@@ -80,6 +94,9 @@ void exec_vbsl (void)
   VDUP(vector2, , uint, u, 16, 4, 0xFFF2);
   VDUP(vector2, , uint, u, 32, 2, 0xFFFFFFF0);
   VDUP(vector2, , uint, u, 64, 1, 0xFFFFFFF3);
+#if defined (FP16_SUPPORTED)
+  VDUP(vector2, , float, f, 16, 4, -2.4f);   /* -2.4f is 0xC0CD.  */
+#endif
   VDUP(vector2, , float, f, 32, 2, -30.3f);
   VDUP(vector2, , poly, p, 8, 8, 0xF3);
   VDUP(vector2, , poly, p, 16, 4, 0xFFF2);
@@ -94,6 +111,9 @@ void exec_vbsl (void)
   VDUP(vector2, q, uint, u, 64, 2, 0xFFFFFFF3);
   VDUP(vector2, q, poly, p, 8, 16, 0xF3);
   VDUP(vector2, q, poly, p, 16, 8, 0xFFF2);
+#if defined (FP16_SUPPORTED)
+  VDUP(vector2, q, float, f, 16, 8, -2.4f);
+#endif
   VDUP(vector2, q, float, f, 32, 4, -30.4f);
 
   VDUP(vector_first, , uint, u, 8, 8, 0xF4);
@@ -111,10 +131,18 @@ void exec_vbsl (void)
   TEST_VBSL(uint, , poly, p, 16, 4);
   TEST_VBSL(uint, q, poly, p, 8, 16);
   TEST_VBSL(uint, q, poly, p, 16, 8);
+#if defined (FP16_SUPPORTED)
+  TEST_VBSL(uint, , float, f, 16, 4);
+  TEST_VBSL(uint, q, float, f, 16, 8);
+#endif
   TEST_VBSL(uint, , float, f, 32, 2);
   TEST_VBSL(uint, q, float, f, 32, 4);
 
+#if defined (FP16_SUPPORTED)
+  CHECK_RESULTS (TEST_MSG, "");
+#else
   CHECK_RESULTS_NO_FP16 (TEST_MSG, "");
+#endif
 }
 
 int main (void)
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c
index 22d45d5..aef4173 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c
@@ -19,6 +19,10 @@ VECT_VAR_DECL(expected0,uint,64,1) [] = { 0xfffffffffffffff0 };
 VECT_VAR_DECL(expected0,poly,8,8) [] = { 0xf0, 0xf0, 0xf0, 0xf0,
 					 0xf0, 0xf0, 0xf0, 0xf0 };
 VECT_VAR_DECL(expected0,poly,16,4) [] = { 0xfff0, 0xfff0, 0xfff0, 0xfff0 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected0, hfloat, 16, 4) [] = { 0xcc00, 0xcc00,
+						0xcc00, 0xcc00 };
+#endif
 VECT_VAR_DECL(expected0,hfloat,32,2) [] = { 0xc1800000, 0xc1800000 };
 VECT_VAR_DECL(expected0,int,8,16) [] = { 0xf0, 0xf0, 0xf0, 0xf0,
 					 0xf0, 0xf0, 0xf0, 0xf0,
@@ -46,6 +50,12 @@ VECT_VAR_DECL(expected0,poly,8,16) [] = { 0xf0, 0xf0, 0xf0, 0xf0,
 					  0xf0, 0xf0, 0xf0, 0xf0 };
 VECT_VAR_DECL(expected0,poly,16,8) [] = { 0xfff0, 0xfff0, 0xfff0, 0xfff0,
 					  0xfff0, 0xfff0, 0xfff0, 0xfff0 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected0, hfloat, 16, 8) [] = { 0xcc00, 0xcc00,
+						0xcc00, 0xcc00,
+						0xcc00, 0xcc00,
+						0xcc00, 0xcc00 };
+#endif
 VECT_VAR_DECL(expected0,hfloat,32,4) [] = { 0xc1800000, 0xc1800000,
 					    0xc1800000, 0xc1800000 };
 
@@ -63,6 +73,10 @@ VECT_VAR_DECL(expected1,uint,64,1) [] = { 0xfffffffffffffff1 };
 VECT_VAR_DECL(expected1,poly,8,8) [] = { 0xf1, 0xf1, 0xf1, 0xf1,
 					 0xf1, 0xf1, 0xf1, 0xf1 };
 VECT_VAR_DECL(expected1,poly,16,4) [] = { 0xfff1, 0xfff1, 0xfff1, 0xfff1 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected1, hfloat, 16, 4) [] = { 0xcb80, 0xcb80,
+						0xcb80, 0xcb80 };
+#endif
 VECT_VAR_DECL(expected1,hfloat,32,2) [] = { 0xc1700000, 0xc1700000 };
 VECT_VAR_DECL(expected1,int,8,16) [] = { 0xf1, 0xf1, 0xf1, 0xf1,
 					 0xf1, 0xf1, 0xf1, 0xf1,
@@ -90,6 +104,12 @@ VECT_VAR_DECL(expected1,poly,8,16) [] = { 0xf1, 0xf1, 0xf1, 0xf1,
 					  0xf1, 0xf1, 0xf1, 0xf1 };
 VECT_VAR_DECL(expected1,poly,16,8) [] = { 0xfff1, 0xfff1, 0xfff1, 0xfff1,
 					  0xfff1, 0xfff1, 0xfff1, 0xfff1 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected1, hfloat, 16, 8) [] = { 0xcb80, 0xcb80,
+						0xcb80, 0xcb80,
+						0xcb80, 0xcb80,
+						0xcb80, 0xcb80 };
+#endif
 VECT_VAR_DECL(expected1,hfloat,32,4) [] = { 0xc1700000, 0xc1700000,
 					    0xc1700000, 0xc1700000 };
 
@@ -107,6 +127,10 @@ VECT_VAR_DECL(expected2,uint,64,1) [] = { 0xfffffffffffffff2 };
 VECT_VAR_DECL(expected2,poly,8,8) [] = { 0xf2, 0xf2, 0xf2, 0xf2,
 					 0xf2, 0xf2, 0xf2, 0xf2 };
 VECT_VAR_DECL(expected2,poly,16,4) [] = { 0xfff2, 0xfff2, 0xfff2, 0xfff2 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected2, hfloat, 16, 4) [] = { 0xcb00, 0xcb00,
+						0xcb00, 0xcb00 };
+#endif
 VECT_VAR_DECL(expected2,hfloat,32,2) [] = { 0xc1600000, 0xc1600000 };
 VECT_VAR_DECL(expected2,int,8,16) [] = { 0xf2, 0xf2, 0xf2, 0xf2,
 					 0xf2, 0xf2, 0xf2, 0xf2,
@@ -134,6 +158,12 @@ VECT_VAR_DECL(expected2,poly,8,16) [] = { 0xf2, 0xf2, 0xf2, 0xf2,
 					  0xf2, 0xf2, 0xf2, 0xf2 };
 VECT_VAR_DECL(expected2,poly,16,8) [] = { 0xfff2, 0xfff2, 0xfff2, 0xfff2,
 					  0xfff2, 0xfff2, 0xfff2, 0xfff2 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected2, hfloat, 16, 8) [] = { 0xcb00, 0xcb00,
+						0xcb00, 0xcb00,
+						0xcb00, 0xcb00,
+						0xcb00, 0xcb00 };
+#endif
 VECT_VAR_DECL(expected2,hfloat,32,4) [] = { 0xc1600000, 0xc1600000,
 					    0xc1600000, 0xc1600000 };
 
@@ -171,6 +201,9 @@ void exec_vdup_vmov (void)
     TEST_VDUP(, uint, u, 64, 1);
     TEST_VDUP(, poly, p, 8, 8);
     TEST_VDUP(, poly, p, 16, 4);
+#if defined (FP16_SUPPORTED)
+    TEST_VDUP(, float, f, 16, 4);
+#endif
     TEST_VDUP(, float, f, 32, 2);
 
     TEST_VDUP(q, int, s, 8, 16);
@@ -183,8 +216,26 @@ void exec_vdup_vmov (void)
     TEST_VDUP(q, uint, u, 64, 2);
     TEST_VDUP(q, poly, p, 8, 16);
     TEST_VDUP(q, poly, p, 16, 8);
+#if defined (FP16_SUPPORTED)
+    TEST_VDUP(q, float, f, 16, 8);
+#endif
     TEST_VDUP(q, float, f, 32, 4);
 
+#if defined (FP16_SUPPORTED)
+    switch (i) {
+    case 0:
+      CHECK_RESULTS_NAMED (TEST_MSG, expected0, "");
+      break;
+    case 1:
+      CHECK_RESULTS_NAMED (TEST_MSG, expected1, "");
+      break;
+    case 2:
+      CHECK_RESULTS_NAMED (TEST_MSG, expected2, "");
+      break;
+    default:
+      abort();
+    }
+#else
     switch (i) {
     case 0:
       CHECK_RESULTS_NAMED_NO_FP16 (TEST_MSG, expected0, "");
@@ -198,6 +249,7 @@ void exec_vdup_vmov (void)
     default:
       abort();
     }
+#endif
   }
 
   /* Do the same tests with vmov. Use the same expected results.  */
@@ -216,6 +268,9 @@ void exec_vdup_vmov (void)
     TEST_VMOV(, uint, u, 64, 1);
     TEST_VMOV(, poly, p, 8, 8);
     TEST_VMOV(, poly, p, 16, 4);
+#if defined (FP16_SUPPORTED)
+    TEST_VMOV(, float, f, 16, 4);
+#endif
     TEST_VMOV(, float, f, 32, 2);
 
     TEST_VMOV(q, int, s, 8, 16);
@@ -228,8 +283,26 @@ void exec_vdup_vmov (void)
     TEST_VMOV(q, uint, u, 64, 2);
     TEST_VMOV(q, poly, p, 8, 16);
     TEST_VMOV(q, poly, p, 16, 8);
+#if defined (FP16_SUPPORTED)
+    TEST_VMOV(q, float, f, 16, 8);
+#endif
     TEST_VMOV(q, float, f, 32, 4);
 
+#if defined (FP16_SUPPORTED)
+    switch (i) {
+    case 0:
+      CHECK_RESULTS_NAMED (TEST_MSG, expected0, "");
+      break;
+    case 1:
+      CHECK_RESULTS_NAMED (TEST_MSG, expected1, "");
+      break;
+    case 2:
+      CHECK_RESULTS_NAMED (TEST_MSG, expected2, "");
+      break;
+    default:
+      abort();
+    }
+#else
     switch (i) {
     case 0:
       CHECK_RESULTS_NAMED_NO_FP16 (TEST_MSG, expected0, "");
@@ -243,6 +316,8 @@ void exec_vdup_vmov (void)
     default:
       abort();
     }
+#endif
+
   }
 }
 
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c
index ef708dc..c4b8f14 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c
@@ -17,6 +17,10 @@ VECT_VAR_DECL(expected,poly,8,8) [] = { 0xf7, 0xf7, 0xf7, 0xf7,
 					0xf7, 0xf7, 0xf7, 0xf7 };
 VECT_VAR_DECL(expected,poly,16,4) [] = { 0xfff3, 0xfff3, 0xfff3, 0xfff3 };
 VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1700000, 0xc1700000 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected, hfloat, 16, 4) [] = { 0xca80, 0xca80,
+					       0xca80, 0xca80 };
+#endif
 VECT_VAR_DECL(expected,int,8,16) [] = { 0xf2, 0xf2, 0xf2, 0xf2,
 					0xf2, 0xf2, 0xf2, 0xf2,
 					0xf2, 0xf2, 0xf2, 0xf2,
@@ -43,6 +47,12 @@ VECT_VAR_DECL(expected,poly,8,16) [] = { 0xf5, 0xf5, 0xf5, 0xf5,
 					 0xf5, 0xf5, 0xf5, 0xf5 };
 VECT_VAR_DECL(expected,poly,16,8) [] = { 0xfff1, 0xfff1, 0xfff1, 0xfff1,
 					 0xfff1, 0xfff1, 0xfff1, 0xfff1 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected, hfloat, 16, 8) [] = { 0xca80, 0xca80,
+					       0xca80, 0xca80,
+					       0xca80, 0xca80,
+					       0xca80, 0xca80 };
+#endif
 VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc1700000, 0xc1700000,
 					   0xc1700000, 0xc1700000 };
 
@@ -63,6 +73,9 @@ void exec_vdup_lane (void)
   clean_results ();
 
   TEST_MACRO_64BITS_VARIANTS_2_5(VLOAD, vector, buffer);
+#if defined (FP16_SUPPORTED)
+  VLOAD(vector, buffer, , float, f, 16, 4);
+#endif
   VLOAD(vector, buffer, , float, f, 32, 2);
 
   /* Choose lane arbitrarily.  */
@@ -76,6 +89,9 @@ void exec_vdup_lane (void)
   TEST_VDUP_LANE(, uint, u, 64, 1, 1, 0);
   TEST_VDUP_LANE(, poly, p, 8, 8, 8, 7);
   TEST_VDUP_LANE(, poly, p, 16, 4, 4, 3);
+#if defined (FP16_SUPPORTED)
+  TEST_VDUP_LANE(, float, f, 16, 4, 4, 3);
+#endif
   TEST_VDUP_LANE(, float, f, 32, 2, 2, 1);
 
   TEST_VDUP_LANE(q, int, s, 8, 16, 8, 2);
@@ -88,9 +104,16 @@ void exec_vdup_lane (void)
   TEST_VDUP_LANE(q, uint, u, 64, 2, 1, 0);
   TEST_VDUP_LANE(q, poly, p, 8, 16, 8, 5);
   TEST_VDUP_LANE(q, poly, p, 16, 8, 4, 1);
+#if defined (FP16_SUPPORTED)
+  TEST_VDUP_LANE(q, float, f, 16, 8, 4, 3);
+#endif
   TEST_VDUP_LANE(q, float, f, 32, 4, 2, 1);
 
+#if defined (FP16_SUPPORTED)
+  CHECK_RESULTS (TEST_MSG, "");
+#else
   CHECK_RESULTS_NO_FP16 (TEST_MSG, "");
+#endif
 }
 
 int main (void)
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vext.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vext.c
index 98f88a6..908294a 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vext.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vext.c
@@ -16,6 +16,10 @@ VECT_VAR_DECL(expected,uint,64,1) [] = { 0xfffffffffffffff0 };
 VECT_VAR_DECL(expected,poly,8,8) [] = { 0xf6, 0xf7, 0x55, 0x55,
 					0x55, 0x55, 0x55, 0x55 };
 VECT_VAR_DECL(expected,poly,16,4) [] = { 0xfff2, 0xfff3, 0x66, 0x66 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected, hfloat, 16, 4) [] = { 0xcb00, 0xca80,
+					       0x4b4d, 0x4b4d };
+#endif
 VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1700000, 0x42066666 };
 VECT_VAR_DECL(expected,int,8,16) [] = { 0xfe, 0xff, 0x11, 0x11,
 					0x11, 0x11, 0x11, 0x11,
@@ -39,6 +43,12 @@ VECT_VAR_DECL(expected,poly,8,16) [] = { 0xfc, 0xfd, 0xfe, 0xff,
 					 0x55, 0x55, 0x55, 0x55 };
 VECT_VAR_DECL(expected,poly,16,8) [] = { 0xfff6, 0xfff7, 0x66, 0x66,
 					 0x66, 0x66, 0x66, 0x66 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected, hfloat, 16, 8) [] = { 0xc880, 0x4b4d,
+					       0x4b4d, 0x4b4d,
+					       0x4b4d, 0x4b4d,
+					       0x4b4d, 0x4b4d };
+#endif
 VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc1500000, 0x4204cccd,
 					   0x4204cccd, 0x4204cccd };
 
@@ -60,6 +70,10 @@ void exec_vext (void)
   clean_results ();
 
   TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector1, buffer);
+#ifdef FP16_SUPPORTED
+  VLOAD(vector1, buffer, , float, f, 16, 4);
+  VLOAD(vector1, buffer, q, float, f, 16, 8);
+#endif
   VLOAD(vector1, buffer, , float, f, 32, 2);
   VLOAD(vector1, buffer, q, float, f, 32, 4);
 
@@ -74,6 +88,9 @@ void exec_vext (void)
   VDUP(vector2, , uint, u, 64, 1, 0x88);
   VDUP(vector2, , poly, p, 8, 8, 0x55);
   VDUP(vector2, , poly, p, 16, 4, 0x66);
+#if defined (FP16_SUPPORTED)
+  VDUP (vector2, , float, f, 16, 4, 14.6f);   /* 14.6f is 0x4b4d.  */
+#endif
   VDUP(vector2, , float, f, 32, 2, 33.6f);
 
   VDUP(vector2, q, int, s, 8, 16, 0x11);
@@ -86,6 +103,9 @@ void exec_vext (void)
   VDUP(vector2, q, uint, u, 64, 2, 0x88);
   VDUP(vector2, q, poly, p, 8, 16, 0x55);
   VDUP(vector2, q, poly, p, 16, 8, 0x66);
+#if defined (FP16_SUPPORTED)
+  VDUP (vector2, q, float, f, 16, 8, 14.6f);
+#endif
   VDUP(vector2, q, float, f, 32, 4, 33.2f);
 
   /* Choose arbitrary extract offsets.  */
@@ -99,6 +119,9 @@ void exec_vext (void)
   TEST_VEXT(, uint, u, 64, 1, 0);
   TEST_VEXT(, poly, p, 8, 8, 6);
   TEST_VEXT(, poly, p, 16, 4, 2);
+#if defined (FP16_SUPPORTED)
+  TEST_VEXT(, float, f, 16, 4, 2);
+#endif
   TEST_VEXT(, float, f, 32, 2, 1);
 
   TEST_VEXT(q, int, s, 8, 16, 14);
@@ -111,9 +134,16 @@ void exec_vext (void)
   TEST_VEXT(q, uint, u, 64, 2, 1);
   TEST_VEXT(q, poly, p, 8, 16, 12);
   TEST_VEXT(q, poly, p, 16, 8, 6);
+#if defined (FP16_SUPPORTED)
+  TEST_VEXT(q, float, f, 16, 8, 7);
+#endif
   TEST_VEXT(q, float, f, 32, 4, 3);
 
+#if defined (FP16_SUPPORTED)
+  CHECK_RESULTS (TEST_MSG, "");
+#else
   CHECK_RESULTS_NO_FP16 (TEST_MSG, "");
+#endif
 }
 
 int main (void)
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrev.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrev.c
index 3b574da..0c01318 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrev.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrev.c
@@ -63,6 +63,10 @@ VECT_VAR_DECL(expected_vrev64,uint,32,2) [] = { 0xfffffff1, 0xfffffff0 };
 VECT_VAR_DECL(expected_vrev64,poly,8,8) [] = { 0xf7, 0xf6, 0xf5, 0xf4,
 					       0xf3, 0xf2, 0xf1, 0xf0 };
 VECT_VAR_DECL(expected_vrev64,poly,16,4) [] = { 0xfff3, 0xfff2, 0xfff1, 0xfff0 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected_vrev64, hfloat, 16, 4) [] = { 0xca80, 0xcb00,
+						      0xcb80, 0xcc00 };
+#endif
 VECT_VAR_DECL(expected_vrev64,hfloat,32,2) [] = { 0xc1700000, 0xc1800000 };
 VECT_VAR_DECL(expected_vrev64,int,8,16) [] = { 0xf7, 0xf6, 0xf5, 0xf4,
 					       0xf3, 0xf2, 0xf1, 0xf0,
@@ -86,6 +90,12 @@ VECT_VAR_DECL(expected_vrev64,poly,8,16) [] = { 0xf7, 0xf6, 0xf5, 0xf4,
 						0xfb, 0xfa, 0xf9, 0xf8 };
 VECT_VAR_DECL(expected_vrev64,poly,16,8) [] = { 0xfff3, 0xfff2, 0xfff1, 0xfff0,
 						0xfff7, 0xfff6, 0xfff5, 0xfff4 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected_vrev64, hfloat, 16, 8) [] = { 0xca80, 0xcb00,
+						      0xcb80, 0xcc00,
+						      0xc880, 0xc900,
+						      0xc980, 0xca00 };
+#endif
 VECT_VAR_DECL(expected_vrev64,hfloat,32,4) [] = { 0xc1700000, 0xc1800000,
 						  0xc1500000, 0xc1600000 };
 
@@ -104,6 +114,10 @@ void exec_vrev (void)
 
   /* Initialize input "vector" from "buffer".  */
   TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer);
+#if defined (FP16_SUPPORTED)
+  VLOAD (vector, buffer, , float, f, 16, 4);
+  VLOAD (vector, buffer, q, float, f, 16, 8);
+#endif
   VLOAD(vector, buffer, , float, f, 32, 2);
   VLOAD(vector, buffer, q, float, f, 32, 4);
 
@@ -187,6 +201,12 @@ void exec_vrev (void)
   CHECK(TEST_MSG, poly, 8, 16, PRIx8, expected_vrev64, "");
   CHECK(TEST_MSG, poly, 16, 8, PRIx16, expected_vrev64, "");
 
+#if defined (FP16_SUPPORTED)
+  TEST_VREV (, float, f, 16, 4, 64);
+  TEST_VREV (q, float, f, 16, 8, 64);
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx32, expected_vrev64, "");
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx32, expected_vrev64, "");
+#endif
   TEST_VREV(, float, f, 32, 2, 64);
   TEST_VREV(q, float, f, 32, 4, 64);
   CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_vrev64, "");
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vshuffle.inc b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vshuffle.inc
index b55a205..ad5bf31 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vshuffle.inc
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vshuffle.inc
@@ -53,9 +53,17 @@ void FNNAME (INSN_NAME) (void)
   DECL_VSHUFFLE(float, 32, 4)
 
   DECL_ALL_VSHUFFLE();
+#if defined (FP16_SUPPORTED)
+  DECL_VSHUFFLE (float, 16, 4);
+  DECL_VSHUFFLE (float, 16, 8);
+#endif
 
   /* Initialize input "vector" from "buffer".  */
   TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector1, buffer);
+#if defined (FP16_SUPPORTED)
+  VLOAD (vector1, buffer, , float, f, 16, 4);
+  VLOAD (vector1, buffer, q, float, f, 16, 8);
+#endif
   VLOAD(vector1, buffer, , float, f, 32, 2);
   VLOAD(vector1, buffer, q, float, f, 32, 4);
 
@@ -68,6 +76,9 @@ void FNNAME (INSN_NAME) (void)
   VDUP(vector2, , uint, u, 32, 2, 0x77);
   VDUP(vector2, , poly, p, 8, 8, 0x55);
   VDUP(vector2, , poly, p, 16, 4, 0x66);
+#if defined (FP16_SUPPORTED)
+  VDUP (vector2, , float, f, 16, 4, 14.6f);   /* 14.6f is 0x4b4d.  */
+#endif
   VDUP(vector2, , float, f, 32, 2, 33.6f);
 
   VDUP(vector2, q, int, s, 8, 16, 0x11);
@@ -78,8 +89,11 @@ void FNNAME (INSN_NAME) (void)
   VDUP(vector2, q, uint, u, 32, 4, 0x77);
   VDUP(vector2, q, poly, p, 8, 16, 0x55);
   VDUP(vector2, q, poly, p, 16, 8, 0x66);
+#if defined (FP16_SUPPORTED)
+  VDUP (vector2, q, float, f, 16, 8, 14.6f);
+#endif
   VDUP(vector2, q, float, f, 32, 4, 33.8f);
-  
+
 #define TEST_ALL_VSHUFFLE(INSN)				\
   TEST_VSHUFFLE(INSN, , int, s, 8, 8);			\
   TEST_VSHUFFLE(INSN, , int, s, 16, 4);			\
@@ -100,6 +114,10 @@ void FNNAME (INSN_NAME) (void)
   TEST_VSHUFFLE(INSN, q, poly, p, 16, 8);		\
   TEST_VSHUFFLE(INSN, q, float, f, 32, 4)
 
+#define TEST_VSHUFFLE_FP16(INSN)		\
+  TEST_VSHUFFLE(INSN, , float, f, 16, 4);	\
+  TEST_VSHUFFLE(INSN, q, float, f, 16, 8);
+
 #define TEST_ALL_EXTRA_CHUNKS()			\
   TEST_EXTRA_CHUNK(int, 8, 8, 1);		\
   TEST_EXTRA_CHUNK(int, 16, 4, 1);		\
@@ -143,17 +161,37 @@ void FNNAME (INSN_NAME) (void)
     CHECK(test_name, poly, 8, 16, PRIx8, EXPECTED, comment);		\
     CHECK(test_name, poly, 16, 8, PRIx16, EXPECTED, comment);		\
     CHECK_FP(test_name, float, 32, 4, PRIx32, EXPECTED, comment);	\
-  }									\
+  }
+
+#define CHECK_RESULTS_VSHUFFLE_FP16(test_name,EXPECTED,comment)		\
+  {									\
+    CHECK_FP (test_name, float, 16, 4, PRIx16, EXPECTED, comment);	\
+    CHECK_FP (test_name, float, 16, 8, PRIx16, EXPECTED, comment);	\
+  }
 
   clean_results ();
 
   /* Execute the tests.  */
   TEST_ALL_VSHUFFLE(INSN_NAME);
+#if defined (FP16_SUPPORTED)
+  TEST_VSHUFFLE_FP16 (INSN_NAME);
+#endif
 
   CHECK_RESULTS_VSHUFFLE (TEST_MSG, expected0, "(chunk 0)");
+#if defined (FP16_SUPPORTED)
+  CHECK_RESULTS_VSHUFFLE_FP16 (TEST_MSG, expected0, "(chunk 0)");
+#endif
 
   TEST_ALL_EXTRA_CHUNKS();
+#if defined (FP16_SUPPORTED)
+  TEST_EXTRA_CHUNK (float, 16, 4, 1);
+  TEST_EXTRA_CHUNK (float, 16, 8, 1);
+#endif
+
   CHECK_RESULTS_VSHUFFLE (TEST_MSG, expected1, "(chunk 1)");
+#if defined (FP16_SUPPORTED)
+  CHECK_RESULTS_VSHUFFLE_FP16 (TEST_MSG, expected1, "(chunk 1)");
+#endif
 }
 
 int main (void)
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtrn.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtrn.c
index 2c4a09c..ea2d8d8 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtrn.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtrn.c
@@ -15,6 +15,10 @@ VECT_VAR_DECL(expected0,uint,32,2) [] = { 0xfffffff0, 0xfffffff1 };
 VECT_VAR_DECL(expected0,poly,8,8) [] = { 0xf0, 0xf1, 0x55, 0x55,
 					 0xf2, 0xf3, 0x55, 0x55 };
 VECT_VAR_DECL(expected0,poly,16,4) [] = { 0xfff0, 0xfff1, 0x66, 0x66 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected0, hfloat, 16, 4) [] = { 0xcc00, 0xcb80,
+						0x4b4d, 0x4b4d };
+#endif
 VECT_VAR_DECL(expected0,hfloat,32,2) [] = { 0xc1800000, 0xc1700000 };
 VECT_VAR_DECL(expected0,int,8,16) [] = { 0xf0, 0xf1, 0x11, 0x11,
 					 0xf2, 0xf3, 0x11, 0x11,
@@ -36,6 +40,12 @@ VECT_VAR_DECL(expected0,poly,8,16) [] = { 0xf0, 0xf1, 0x55, 0x55,
 					  0xf6, 0xf7, 0x55, 0x55 };
 VECT_VAR_DECL(expected0,poly,16,8) [] = { 0xfff0, 0xfff1, 0x66, 0x66,
 					  0xfff2, 0xfff3, 0x66, 0x66 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected0, hfloat, 16, 8) [] = { 0xcc00, 0xcb80,
+						0x4b4d, 0x4b4d,
+						0xcb00, 0xca80,
+						0x4b4d, 0x4b4d };
+#endif
 VECT_VAR_DECL(expected0,hfloat,32,4) [] = { 0xc1800000, 0xc1700000,
 					    0x42073333, 0x42073333 };
 
@@ -51,6 +61,10 @@ VECT_VAR_DECL(expected1,uint,32,2) [] = { 0x77, 0x77 };
 VECT_VAR_DECL(expected1,poly,8,8) [] = { 0xf4, 0xf5, 0x55, 0x55,
 					 0xf6, 0xf7, 0x55, 0x55 };
 VECT_VAR_DECL(expected1,poly,16,4) [] = { 0xfff2, 0xfff3, 0x66, 0x66 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected1, hfloat, 16, 4) [] = { 0xcb00, 0xca80,
+						0x4b4d, 0x4b4d };
+#endif
 VECT_VAR_DECL(expected1,hfloat,32,2) [] = { 0x42066666, 0x42066666 };
 VECT_VAR_DECL(expected1,int,8,16) [] = { 0xf8, 0xf9, 0x11, 0x11,
 					 0xfa, 0xfb, 0x11, 0x11,
@@ -72,6 +86,12 @@ VECT_VAR_DECL(expected1,poly,8,16) [] = { 0xf8, 0xf9, 0x55, 0x55,
 					  0xfe, 0xff, 0x55, 0x55 };
 VECT_VAR_DECL(expected1,poly,16,8) [] = { 0xfff4, 0xfff5, 0x66, 0x66,
 					  0xfff6, 0xfff7, 0x66, 0x66 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected1, hfloat, 16, 8) [] = { 0xca00, 0xc980,
+						0x4b4d, 0x4b4d,
+						0xc900, 0xc880,
+						0x4b4d, 0x4b4d };
+#endif
 VECT_VAR_DECL(expected1,hfloat,32,4) [] = { 0xc1600000, 0xc1500000,
 					    0x42073333, 0x42073333 };
 
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vuzp.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vuzp.c
index ab6e576..43b49ca 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vuzp.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vuzp.c
@@ -19,6 +19,10 @@ VECT_VAR_DECL(expected0,poly,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
 					 0xf4, 0xf5, 0xf6, 0xf7 };
 VECT_VAR_DECL(expected0,poly,16,4) [] = { 0xfff0, 0xfff1,
 					  0xfff2, 0xfff3 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected0, hfloat, 16, 4) [] = { 0xcc00, 0xcb80,
+						0xcb00, 0xca80 };
+#endif
 VECT_VAR_DECL(expected0,hfloat,32,2) [] = { 0xc1800000, 0xc1700000 };
 VECT_VAR_DECL(expected0,int,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
 					 0xf4, 0xf5, 0xf6, 0xf7,
@@ -48,6 +52,12 @@ VECT_VAR_DECL(expected0,poly,16,8) [] = { 0xfff0, 0xfff1,
 					  0xfff2, 0xfff3,
 					  0xfff4, 0xfff5,
 					  0xfff6, 0xfff7 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected0, hfloat, 16, 8) [] = { 0xcc00, 0xcb80,
+						0xcb00, 0xca80,
+						0xca00, 0xc980,
+						0xc900, 0xc880 };
+#endif
 VECT_VAR_DECL(expected0,hfloat,32,4) [] = { 0xc1800000, 0xc1700000,
 					    0xc1600000, 0xc1500000 };
 
@@ -63,6 +73,10 @@ VECT_VAR_DECL(expected1,uint,32,2) [] = { 0x77, 0x77 };
 VECT_VAR_DECL(expected1,poly,8,8) [] = { 0x55, 0x55, 0x55, 0x55,
 					 0x55, 0x55, 0x55, 0x55 };
 VECT_VAR_DECL(expected1,poly,16,4) [] = { 0x66, 0x66, 0x66, 0x66 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected1, hfloat, 16, 4) [] = { 0x4b4d, 0x4b4d,
+						0x4b4d, 0x4b4d };
+#endif
 VECT_VAR_DECL(expected1,hfloat,32,2) [] = { 0x42066666, 0x42066666 };
 VECT_VAR_DECL(expected1,int,8,16) [] = { 0x11, 0x11, 0x11, 0x11,
 					 0x11, 0x11, 0x11, 0x11,
@@ -84,6 +98,12 @@ VECT_VAR_DECL(expected1,poly,8,16) [] = { 0x55, 0x55, 0x55, 0x55,
 					  0x55, 0x55, 0x55, 0x55 };
 VECT_VAR_DECL(expected1,poly,16,8) [] = { 0x66, 0x66, 0x66, 0x66,
 					  0x66, 0x66, 0x66, 0x66 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected1, hfloat, 16, 8) [] = { 0x4b4d, 0x4b4d,
+						0x4b4d, 0x4b4d,
+						0x4b4d, 0x4b4d,
+						0x4b4d, 0x4b4d };
+#endif
 VECT_VAR_DECL(expected1,hfloat,32,4) [] = { 0x42073333, 0x42073333,
 					    0x42073333, 0x42073333 };
 
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vzip.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vzip.c
index b5fe516..20f4f5d 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vzip.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vzip.c
@@ -18,6 +18,10 @@ VECT_VAR_DECL(expected0,poly,8,8) [] = { 0xf0, 0xf4, 0x55, 0x55,
 					 0xf1, 0xf5, 0x55, 0x55 };
 VECT_VAR_DECL(expected0,poly,16,4) [] = { 0xfff0, 0xfff2,
 					  0x66, 0x66 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected0, hfloat, 16, 4) [] = { 0xcc00, 0xcb00,
+						0x4b4d, 0x4b4d };
+#endif
 VECT_VAR_DECL(expected0,hfloat,32,2) [] = { 0xc1800000, 0xc1700000 };
 VECT_VAR_DECL(expected0,int,8,16) [] = { 0xf0, 0xf8, 0x11, 0x11,
 					 0xf1, 0xf9, 0x11, 0x11,
@@ -41,6 +45,12 @@ VECT_VAR_DECL(expected0,poly,8,16) [] = { 0xf0, 0xf8, 0x55, 0x55,
 					  0xf3, 0xfb, 0x55, 0x55 };
 VECT_VAR_DECL(expected0,poly,16,8) [] = { 0xfff0, 0xfff4, 0x66, 0x66,
 					  0xfff1, 0xfff5, 0x66, 0x66 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected0, hfloat, 16, 8) [] = { 0xcc00, 0xca00,
+						0x4b4d, 0x4b4d,
+						0xcb80, 0xc980,
+						0x4b4d, 0x4b4d };
+#endif
 VECT_VAR_DECL(expected0,hfloat,32,4) [] = { 0xc1800000, 0xc1600000,
 					    0x42073333, 0x42073333 };
 
@@ -59,6 +69,10 @@ VECT_VAR_DECL(expected1,poly,8,8) [] = { 0xf2, 0xf6, 0x55, 0x55,
 					 0xf3, 0xf7, 0x55, 0x55 };
 VECT_VAR_DECL(expected1,poly,16,4) [] = { 0xfff1, 0xfff3,
 					  0x66, 0x66 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected1, hfloat, 16, 4) [] = { 0xcb80, 0xca80,
+						0x4b4d, 0x4b4d };
+#endif
 VECT_VAR_DECL(expected1,hfloat,32,2) [] = { 0x42066666, 0x42066666 };
 VECT_VAR_DECL(expected1,int,8,16) [] = { 0xf4, 0xfc, 0x11, 0x11,
 					 0xf5, 0xfd, 0x11, 0x11,
@@ -82,6 +96,12 @@ VECT_VAR_DECL(expected1,poly,8,16) [] = { 0xf4, 0xfc, 0x55, 0x55,
 					  0xf7, 0xff, 0x55, 0x55 };
 VECT_VAR_DECL(expected1,poly,16,8) [] = { 0xfff2, 0xfff6, 0x66, 0x66,
 					  0xfff3, 0xfff7, 0x66, 0x66 };
+#if defined (FP16_SUPPORTED)
+VECT_VAR_DECL (expected1, hfloat, 16, 8) [] = { 0xcb00, 0xc900,
+						0x4b4d, 0x4b4d,
+						0xca80, 0xc880,
+						0x4b4d, 0x4b4d };
+#endif
 VECT_VAR_DECL(expected1,hfloat,32,4) [] = { 0xc1700000, 0xc1500000,
 					    0x42073333, 0x42073333 };
 
-- 
2.1.4


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 7/17][ARM] Add FP16 data movement instructions.
  2016-05-17 14:20 [PATCH 0/17][ARM] ARMv8.2-A and FP16 extension support Matthew Wahab
                   ` (5 preceding siblings ...)
  2016-05-17 14:32 ` [PATCH 6/17][ARM] Add data processing intrinsics for float16_t Matthew Wahab
@ 2016-05-17 14:34 ` Matthew Wahab
  2016-07-04 13:57   ` Matthew Wahab
  2016-05-17 14:36 ` [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions Matthew Wahab
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 73+ messages in thread
From: Matthew Wahab @ 2016-05-17 14:34 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1926 bytes --]

The ARMv8.2-A FP16 extension adds a number of instructions to support
data movement for FP16 values. This patch adds these instructions to the
backend, making them available to the compiler code generator.
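
For illustration, simple __fp16 loads and stores can now be done
without going through the core registers (a sketch mirroring the new
armv8_2-fp16-move-1.c test; the instruction selection noted in the
comments is illustrative):

__fp16
load_f16 (__fp16 *p)
{
  /* Can be a single vld1.16 into a D register lane.  */
  return *p;
}

void
store_f16 (__fp16 *p, __fp16 v)
{
  /* Can be a single vst1.16 from a D register lane.  */
  *p = v;
}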

The new instructions include VSEL, which selects between two registers
depending on a condition. This is used to support conditional data
movement that can depend on the result of comparisons between
half-precision values. These comparisons are always done by conversion
to single precision.
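
As a sketch (not taken from the patch), a half-precision conditional
expression like the following can now be compiled to a compare/VSEL
sequence instead of a branch:

__fp16
pick_larger (__fp16 a, __fp16 b)
{
  /* The comparison is done in single precision (vcmpe.f32); the
     select keeps the values in half precision (vselge.f16).  */
  return (a >= b) ? a : b;
}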

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator. This patch was also tested on its own for
arm-none-linux-gnueabihf with native bootstrap and make check and for
arm-none-eabi with check-gcc on an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>
	    Jiong Wang  <jiong.wang@arm.com>

	* config/arm/arm.c (coproc_secondary_reload_class): Make HFmode
	available when FP16 instructions are available.
	(output_move_vfp): Add support for 16-bit data moves.
	(arm_validize_comparison): Fix some white-space.  Support HFmode
	by conversion to SFmode.
	* config/arm/arm.md (truncdfhf2): Fix a comment.
	(extendhfdf2): Likewise.
	(cstorehf4): New.
	(movsicc): Fix some white-space.
	(movhfcc): New.
	(movsfcc): Fix some white-space.
	(*cmovhf): New.
	* config/arm/vfp.md (*arm_movhi_vfp): Disable when VFP FP16
	instructions are available.
	(*thumb2_movhi_vfp): Likewise.
	(*arm_movhi_fp16): New.
	(*thumb2_movhi_fp16): New.
	(*movhf_vfp_fp16): New.
	(*movhf_vfp_neon): Disable when VFP FP16 instructions are
	available.
	(*movhf_vfp): Likewise.
	(extendhfsf2): Enable when VFP FP16 instructions are available.
	(truncsfhf2): Enable when VFP FP16 instructions are available.

testsuite/
2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/arm/armv8_2_fp16-move-1.c: New.


[-- Attachment #2: 0007-PATCH-7-17-ARM-Add-FP16-data-movement-instructions.patch --]
[-- Type: text/x-patch, Size: 18545 bytes --]

From 83268813cf9aa59940ed17d623606c9e485f6ecf Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 13:35:04 +0100
Subject: [PATCH 07/17] [PATCH 7/17][ARM] Add FP16 data movement instructions.

2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>
	    Jiong Wang  <jiong.wang@arm.com>

	* config/arm/arm.c (coproc_secondary_reload_class): Make HFmode
	available when FP16 instructions are available.
	(output_move_vfp): Add support for 16-bit data moves.
	(arm_validize_comparison): Fix some white-space.  Support HFmode
	by conversion to SFmode.
	* config/arm/arm.md (truncdfhf2): Fix a comment.
	(extendhfdf2): Likewise.
	(cstorehf4): New.
	(movsicc): Fix some white-space.
	(movhfcc): New.
	(movsfcc): Fix some white-space.
	(*cmovhf): New.
	* config/arm/vfp.md (*arm_movhi_vfp): Disable when VFP FP16
	instructions are available.
	(*thumb2_movhi_vfp): Likewise.
	(*arm_movhi_fp16): New.
	(*thumb2_movhi_fp16): New.
	(*movhf_vfp_fp16): New.
	(*movhf_vfp_neon): Disable when VFP FP16 instructions are
	available.
	(*movhf_vfp): Likewise.
	(extendhfsf2): Enable when VFP FP16 instructions are available.
	(truncsfhf2): Enable when VFP FP16 instructions are available.

testsuite/
2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/arm/armv8_2_fp16-move-1.c: New.
---
 gcc/config/arm/arm.c                               |  16 +-
 gcc/config/arm/arm.md                              |  81 ++++++++-
 gcc/config/arm/vfp.md                              | 182 ++++++++++++++++++++-
 gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c | 166 +++++++++++++++++++
 4 files changed, 433 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6892040..187ebda 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -13162,7 +13162,7 @@ coproc_secondary_reload_class (machine_mode mode, rtx x, bool wb)
 {
   if (mode == HFmode)
     {
-      if (!TARGET_NEON_FP16)
+      if (!TARGET_NEON_FP16 && !TARGET_VFP_FP16INST)
 	return GENERAL_REGS;
       if (s_register_operand (x, mode) || neon_vector_mem_operand (x, 2, true))
 	return NO_REGS;
@@ -18613,6 +18613,8 @@ output_move_vfp (rtx *operands)
   rtx reg, mem, addr, ops[2];
   int load = REG_P (operands[0]);
   int dp = GET_MODE_SIZE (GET_MODE (operands[0])) == 8;
+  int sp = (!TARGET_VFP_FP16INST
+	    || GET_MODE_SIZE (GET_MODE (operands[0])) == 4);
   int integer_p = GET_MODE_CLASS (GET_MODE (operands[0])) == MODE_INT;
   const char *templ;
   char buff[50];
@@ -18659,7 +18661,7 @@ output_move_vfp (rtx *operands)
 
   sprintf (buff, templ,
 	   load ? "ld" : "st",
-	   dp ? "64" : "32",
+	   dp ? "64" : sp ? "32" : "16",
 	   dp ? "P" : "",
 	   integer_p ? "\t%@ int" : "");
   output_asm_insn (buff, ops);
@@ -29238,7 +29240,7 @@ arm_validize_comparison (rtx *comparison, rtx * op1, rtx * op2)
 {
   enum rtx_code code = GET_CODE (*comparison);
   int code_int;
-  machine_mode mode = (GET_MODE (*op1) == VOIDmode) 
+  machine_mode mode = (GET_MODE (*op1) == VOIDmode)
     ? GET_MODE (*op2) : GET_MODE (*op1);
 
   gcc_assert (GET_MODE (*op1) != VOIDmode || GET_MODE (*op2) != VOIDmode);
@@ -29266,6 +29268,14 @@ arm_validize_comparison (rtx *comparison, rtx * op1, rtx * op2)
 	*op2 = force_reg (mode, *op2);
       return true;
 
+    case HFmode:
+      if (!TARGET_VFP_FP16INST)
+	break;
+      /* FP16 comparisons are done in SF mode.  */
+      mode = SFmode;
+      *op1 = convert_to_mode (mode, *op1, 1);
+      *op2 = convert_to_mode (mode, *op2, 1);
+      /* Fall through.  */
     case SFmode:
     case DFmode:
       if (!arm_float_compare_operand (*op1, mode))
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 3e23178..224a72f 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4857,7 +4857,7 @@
   ""
 )
 
-/* DFmode -> HFmode conversions have to go through SFmode.  */
+;; DFmode to HFmode conversions have to go through SFmode.
 (define_expand "truncdfhf2"
   [(set (match_operand:HF  0 "general_operand" "")
 	(float_truncate:HF
@@ -5364,7 +5364,7 @@
   ""
 )
 
-/* HFmode -> DFmode conversions have to go through SFmode.  */
+;; HFmode -> DFmode conversions have to go through SFmode.
 (define_expand "extendhfdf2"
   [(set (match_operand:DF                  0 "general_operand" "")
 	(float_extend:DF (match_operand:HF 1 "general_operand"  "")))]
@@ -7369,6 +7369,24 @@
   DONE;
 }")
 
+(define_expand "cstorehf4"
+  [(set (match_operand:SI 0 "s_register_operand")
+	(match_operator:SI 1 "expandable_comparison_operator"
+	 [(match_operand:HF 2 "s_register_operand")
+	  (match_operand:HF 3 "arm_float_compare_operand")]))]
+  "TARGET_VFP_FP16INST"
+  {
+    if (!arm_validize_comparison (&operands[1],
+				  &operands[2],
+				  &operands[3]))
+       FAIL;
+
+    emit_insn (gen_cstore_cc (operands[0], operands[1],
+			      operands[2], operands[3]));
+    DONE;
+  }
+)
+
 (define_expand "cstoresf4"
   [(set (match_operand:SI 0 "s_register_operand" "")
 	(match_operator:SI 1 "expandable_comparison_operator"
@@ -7421,9 +7439,31 @@
     rtx ccreg;
 
     if (!arm_validize_comparison (&operands[1], &XEXP (operands[1], 0), 
-       				  &XEXP (operands[1], 1)))
+				  &XEXP (operands[1], 1)))
       FAIL;
-    
+
+    code = GET_CODE (operands[1]);
+    ccreg = arm_gen_compare_reg (code, XEXP (operands[1], 0),
+				 XEXP (operands[1], 1), NULL_RTX);
+    operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx);
+  }"
+)
+
+(define_expand "movhfcc"
+  [(set (match_operand:HF 0 "s_register_operand")
+	(if_then_else:HF (match_operand 1 "arm_cond_move_operator")
+			 (match_operand:HF 2 "s_register_operand")
+			 (match_operand:HF 3 "s_register_operand")))]
+  "TARGET_VFP_FP16INST"
+  "
+  {
+    enum rtx_code code = GET_CODE (operands[1]);
+    rtx ccreg;
+
+    if (!arm_validize_comparison (&operands[1], &XEXP (operands[1], 0),
+				  &XEXP (operands[1], 1)))
+      FAIL;
+
     code = GET_CODE (operands[1]);
     ccreg = arm_gen_compare_reg (code, XEXP (operands[1], 0),
 				 XEXP (operands[1], 1), NULL_RTX);
@@ -7442,7 +7482,7 @@
     enum rtx_code code = GET_CODE (operands[1]);
     rtx ccreg;
 
-    if (!arm_validize_comparison (&operands[1], &XEXP (operands[1], 0), 
+    if (!arm_validize_comparison (&operands[1], &XEXP (operands[1], 0),
        				  &XEXP (operands[1], 1)))
        FAIL;
 
@@ -7507,6 +7547,37 @@
    (set_attr "type" "fcsel")]
 )
 
+(define_insn "*cmovhf"
+    [(set (match_operand:HF 0 "s_register_operand" "=t")
+	(if_then_else:HF (match_operator 1 "arm_vsel_comparison_operator"
+			 [(match_operand 2 "cc_register" "") (const_int 0)])
+			  (match_operand:HF 3 "s_register_operand" "t")
+			  (match_operand:HF 4 "s_register_operand" "t")))]
+  "TARGET_VFP_FP16INST"
+  "*
+  {
+    enum arm_cond_code code = maybe_get_arm_condition_code (operands[1]);
+    switch (code)
+      {
+      case ARM_GE:
+      case ARM_GT:
+      case ARM_EQ:
+      case ARM_VS:
+	return \"vsel%d1.f16\\t%0, %3, %4\";
+      case ARM_LT:
+      case ARM_LE:
+      case ARM_NE:
+      case ARM_VC:
+	return \"vsel%D1.f16\\t%0, %4, %3\";
+      default:
+	gcc_unreachable ();
+      }
+    return \"\";
+  }"
+  [(set_attr "conds" "use")
+   (set_attr "type" "fcsel")]
+)
+
 (define_insn_and_split "*movsicc_insn"
   [(set (match_operand:SI 0 "s_register_operand" "=r,r,r,r,r,r,r,r")
 	(if_then_else:SI
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index d7c874a..b1c13fa 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -27,6 +27,7 @@
    (match_operand:HI 1 "general_operand"
     "rIk, K, n, r, mi, r, *t, *t"))]
  "TARGET_ARM && TARGET_HARD_FLOAT && TARGET_VFP
+  && !TARGET_VFP_FP16INST
   && (register_operand (operands[0], HImode)
        || register_operand (operands[1], HImode))"
 {
@@ -76,6 +77,7 @@
    (match_operand:HI 1 "general_operand"
     "rk, I, Py, n, r, m, r, *t, *t"))]
  "TARGET_THUMB2 && TARGET_HARD_FLOAT && TARGET_VFP
+  && !TARGET_VFP_FP16INST
   && (register_operand (operands[0], HImode)
        || register_operand (operands[1], HImode))"
 {
@@ -111,6 +113,99 @@
   (set_attr "length" "2, 4, 2, 4, 4, 4, 4, 4, 4")]
 )
 
+;; Patterns for HI moves which provide more data transfer instructions when FP16
+;; instructions are available.
+(define_insn "*arm_movhi_fp16"
+ [(set
+   (match_operand:HI 0 "nonimmediate_operand"
+    "=r,  r, r, m, r, *t,  r, *t")
+   (match_operand:HI 1 "general_operand"
+    "rIk, K, n, r, mi, r, *t, *t"))]
+ "TARGET_ARM && TARGET_VFP_FP16INST
+  && (register_operand (operands[0], HImode)
+       || register_operand (operands[1], HImode))"
+{
+  switch (which_alternative)
+    {
+    case 0:
+      return "mov%?\t%0, %1\t%@ movhi";
+    case 1:
+      return "mvn%?\t%0, #%B1\t%@ movhi";
+    case 2:
+      return "movw%?\t%0, %L1\t%@ movhi";
+    case 3:
+      return "strh%?\t%1, %0\t%@ movhi";
+    case 4:
+      return "ldrh%?\t%0, %1\t%@ movhi";
+    case 5:
+    case 6:
+      return "vmov%?.f16\t%0, %1\t%@ int";
+    case 7:
+      return "vmov%?.f32\t%0, %1\t%@ int";
+    default:
+      gcc_unreachable ();
+    }
+}
+ [(set_attr "predicable" "yes")
+  (set_attr_alternative "type"
+   [(if_then_else
+     (match_operand 1 "const_int_operand" "")
+     (const_string "mov_imm")
+     (const_string "mov_reg"))
+    (const_string "mvn_imm")
+    (const_string "mov_imm")
+    (const_string "store1")
+    (const_string "load1")
+    (const_string "f_mcr")
+    (const_string "f_mrc")
+    (const_string "fmov")])
+  (set_attr "pool_range" "*, *, *, *, 256, *, *, *")
+  (set_attr "neg_pool_range" "*, *, *, *, 244, *, *, *")
+  (set_attr "length" "4")]
+)
+
+(define_insn "*thumb2_movhi_fp16"
+ [(set
+   (match_operand:HI 0 "nonimmediate_operand"
+    "=rk, r, l, r, m, r, *t, r, *t")
+   (match_operand:HI 1 "general_operand"
+    "rk, I, Py, n, r, m, r, *t, *t"))]
+ "TARGET_THUMB2 && TARGET_VFP_FP16INST
+  && (register_operand (operands[0], HImode)
+       || register_operand (operands[1], HImode))"
+{
+  switch (which_alternative)
+    {
+    case 0:
+    case 1:
+    case 2:
+      return "mov%?\t%0, %1\t%@ movhi";
+    case 3:
+      return "movw%?\t%0, %L1\t%@ movhi";
+    case 4:
+      return "strh%?\t%1, %0\t%@ movhi";
+    case 5:
+      return "ldrh%?\t%0, %1\t%@ movhi";
+    case 6:
+    case 7:
+      return "vmov%?.f16\t%0, %1\t%@ int";
+    case 8:
+      return "vmov%?.f32\t%0, %1\t%@ int";
+    default:
+      gcc_unreachable ();
+    }
+}
+ [(set_attr "predicable" "yes")
+  (set_attr "predicable_short_it"
+   "yes, no, yes, no, no, no, no, no, no")
+  (set_attr "type"
+   "mov_reg, mov_imm, mov_imm, mov_imm, store1, load1,\
+    f_mcr, f_mrc, fmov")
+  (set_attr "pool_range" "*, *, *, *, *, 4094, *, *, *")
+  (set_attr "neg_pool_range" "*, *, *, *, *, 250, *, *, *")
+  (set_attr "length" "2, 4, 2, 4, 4, 4, 4, 4, 4")]
+)
+
 ;; SImode moves
 ;; ??? For now do not allow loading constants into vfp regs.  This causes
 ;; problems because small constants get converted into adds.
@@ -304,10 +399,87 @@
  )
 
 ;; HFmode moves
+
+(define_insn "*movhf_vfp_fp16"
+  [(set (match_operand:HF 0 "nonimmediate_operand"
+			  "= r,m,t,r,t,r,t,t,Um,r")
+	(match_operand:HF 1 "general_operand"
+			  "  m,r,t,r,r,t,Dv,Um,t,F"))]
+  "TARGET_32BIT
+   && TARGET_VFP_FP16INST
+   && (s_register_operand (operands[0], HFmode)
+       || s_register_operand (operands[1], HFmode))"
+ {
+  switch (which_alternative)
+    {
+    case 0: /* ARM register from memory.  */
+      return \"ldrh%?\\t%0, %1\\t%@ __fp16\";
+    case 1: /* Memory from ARM register.  */
+      return \"strh%?\\t%1, %0\\t%@ __fp16\";
+    case 2: /* S register from S register.  */
+      return \"vmov\\t%0, %1\t%@ __fp16\";
+    case 3: /* ARM register from ARM register.  */
+      return \"mov%?\\t%0, %1\\t%@ __fp16\";
+    case 4: /* S register from ARM register.  */
+    case 5: /* ARM register from S register.  */
+    case 6: /* S register from immediate.  */
+      return \"vmov.f16\\t%0, %1\t%@ __fp16\";
+    case 7: /* S register from memory.  */
+      return \"vld1.16\\t{%z0}, %A1\";
+    case 8: /* Memory from S register.  */
+      return \"vst1.16\\t{%z1}, %A0\";
+    case 9: /* ARM register from constant.  */
+      {
+	long bits;
+	rtx ops[4];
+
+	bits = real_to_target (NULL, CONST_DOUBLE_REAL_VALUE (operands[1]),
+			       HFmode);
+	ops[0] = operands[0];
+	ops[1] = GEN_INT (bits);
+	ops[2] = GEN_INT (bits & 0xff00);
+	ops[3] = GEN_INT (bits & 0x00ff);
+
+	if (arm_arch_thumb2)
+	  output_asm_insn (\"movw\\t%0, %1\", ops);
+	else
+	  output_asm_insn (\"mov\\t%0, %2\;orr\\t%0, %0, %3\", ops);
+	return \"\";
+       }
+    default:
+      gcc_unreachable ();
+    }
+ }
+  [(set_attr "predicable" "yes, yes, no, yes, no, no, no, no, no, no")
+   (set_attr "predicable_short_it" "no, no, no, yes,\
+				    no, no, no, no,\
+				    no, no")
+   (set_attr_alternative "type"
+    [(const_string "load1") (const_string "store1")
+     (const_string "fmov") (const_string "mov_reg")
+     (const_string "f_mcr") (const_string "f_mrc")
+     (const_string "fconsts") (const_string "neon_load1_1reg")
+     (const_string "neon_store1_1reg")
+     (if_then_else (match_test "arm_arch_thumb2")
+      (const_string "mov_imm")
+      (const_string "multiple"))])
+   (set_attr_alternative "length"
+    [(const_int 4) (const_int 4)
+     (const_int 4) (const_int 4)
+     (const_int 4) (const_int 4)
+     (const_int 4) (const_int 4)
+     (const_int 4)
+     (if_then_else (match_test "arm_arch_thumb2")
+      (const_int 4)
+      (const_int 8))])]
+)
+
 (define_insn "*movhf_vfp_neon"
   [(set (match_operand:HF 0 "nonimmediate_operand" "= t,Um,r,m,t,r,t,r,r")
 	(match_operand:HF 1 "general_operand"	   " Um, t,m,r,t,r,r,t,F"))]
-  "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_NEON_FP16
+  "TARGET_32BIT
+   && TARGET_HARD_FLOAT && TARGET_NEON_FP16
+   && !TARGET_VFP_FP16INST
    && (   s_register_operand (operands[0], HFmode)
        || s_register_operand (operands[1], HFmode))"
   "*
@@ -361,8 +533,10 @@
 (define_insn "*movhf_vfp"
   [(set (match_operand:HF 0 "nonimmediate_operand" "=r,m,t,r,t,r,r")
 	(match_operand:HF 1 "general_operand"	   " m,r,t,r,r,t,F"))]
-  "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP
+  "TARGET_32BIT
+   && TARGET_HARD_FLOAT && TARGET_VFP
    && !TARGET_NEON_FP16
+   && !TARGET_VFP_FP16INST
    && (   s_register_operand (operands[0], HFmode)
        || s_register_operand (operands[1], HFmode))"
   "*
@@ -1095,7 +1269,7 @@
 (define_insn "extendhfsf2"
   [(set (match_operand:SF		   0 "s_register_operand" "=t")
 	(float_extend:SF (match_operand:HF 1 "s_register_operand" "t")))]
-  "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_FP16"
+  "TARGET_32BIT && TARGET_HARD_FLOAT && (TARGET_FP16 || TARGET_VFP_FP16INST)"
   "vcvtb%?.f32.f16\\t%0, %1"
   [(set_attr "predicable" "yes")
    (set_attr "predicable_short_it" "no")
@@ -1105,7 +1279,7 @@
 (define_insn "truncsfhf2"
   [(set (match_operand:HF		   0 "s_register_operand" "=t")
 	(float_truncate:HF (match_operand:SF 1 "s_register_operand" "t")))]
-  "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_FP16"
+  "TARGET_32BIT && TARGET_HARD_FLOAT && (TARGET_FP16 || TARGET_VFP_FP16INST)"
   "vcvtb%?.f16.f32\\t%0, %1"
   [(set_attr "predicable" "yes")
    (set_attr "predicable_short_it" "no")
diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
new file mode 100644
index 0000000..7108703
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
@@ -0,0 +1,166 @@
+/* { dg-do compile }  */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok }  */
+/* { dg-options "-O2" }  */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+__fp16
+test_load_1 (__fp16* a)
+{
+  return *a;
+}
+
+__fp16
+test_load_2 (__fp16* a, int i)
+{
+  return a[i];
+}
+
+/* { dg-final { scan-assembler-times {vld1\.16\t\{d[0-9]+\[[0-9]+\]\}, \[r[0-9]+\]} 2 } }  */
+
+void
+test_store_1 (__fp16* a, __fp16 b)
+{
+  *a = b;
+}
+
+void
+test_store_2 (__fp16* a, int i, __fp16 b)
+{
+  a[i] = b;
+}
+
+/* { dg-final { scan-assembler-times {vst1\.16\t\{d[0-9]+\[[0-9]+\]\}, \[r[0-9]+\]} 2 } }  */
+
+
+__fp16
+test_load_store_1 (__fp16* a, int i, __fp16* b)
+{
+  a[i] = b[i];
+}
+
+__fp16
+test_load_store_2 (__fp16* a, int i, __fp16* b)
+{
+  a[i] = b[i + 2];
+  return a[i];
+}
+/* { dg-final { scan-assembler-times {ldrh\tr[0-9]+} 2 } }  */
+/* { dg-final { scan-assembler-times {strh\tr[0-9]+} 2 } }  */
+
+__fp16
+test_select_1 (int sel, __fp16 a, __fp16 b)
+{
+  if (sel)
+    return a;
+  else
+    return b;
+}
+
+__fp16
+test_select_2 (int sel, __fp16 a, __fp16 b)
+{
+  return sel ? a : b;
+}
+
+__fp16
+test_select_3 (__fp16 a, __fp16 b, __fp16 c)
+{
+  return (a == b) ? b : c;
+}
+
+__fp16
+test_select_4 (__fp16 a, __fp16 b, __fp16 c)
+{
+  return (a != b) ? b : c;
+}
+
+__fp16
+test_select_5 (__fp16 a, __fp16 b, __fp16 c)
+{
+  return (a < b) ? b : c;
+}
+
+__fp16
+test_select_6 (__fp16 a, __fp16 b, __fp16 c)
+{
+  return (a <= b) ? b : c;
+}
+
+__fp16
+test_select_7 (__fp16 a, __fp16 b, __fp16 c)
+{
+  return (a > b) ? b : c;
+}
+
+__fp16
+test_select_8 (__fp16 a, __fp16 b, __fp16 c)
+{
+  return (a >= b) ? b : c;
+}
+
+/* { dg-final { scan-assembler-times {vseleq\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vselgt\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+/* { dg-final { scan-assembler-times {vselge\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+
+/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, r[0-9]+} 4 } }  */
+/* { dg-final { scan-assembler-times {vmov\.f16\tr[0-9]+, s[0-9]+} 4 } }  */
+
+int
+test_compare_1 (__fp16 a, __fp16 b)
+{
+  if (a == b)
+    return -1;
+  else
+    return 0;
+}
+
+int
+test_compare_ (__fp16 a, __fp16 b)
+{
+  if (a != b)
+    return -1;
+  else
+    return 0;
+}
+
+int
+test_compare_2 (__fp16 a, __fp16 b)
+{
+  if (a > b)
+    return -1;
+  else
+    return 0;
+}
+
+int
+test_compare_3 (__fp16 a, __fp16 b)
+{
+  if (a >= b)
+    return -1;
+  else
+    return 0;
+}
+
+int
+test_compare_4 (__fp16 a, __fp16 b)
+{
+  if (a < b)
+    return -1;
+  else
+    return 0;
+}
+
+int
+test_compare_5 (__fp16 a, __fp16 b)
+{
+  if (a <= b)
+    return -1;
+  else
+    return 0;
+}
+
+/* { dg-final { scan-assembler-not {vcmp\.f16} } }  */
+/* { dg-final { scan-assembler-not {vcmpe\.f16} } }  */
+
+/* { dg-final { scan-assembler-times {vcmp\.f32} 4 } }  */
+/* { dg-final { scan-assembler-times {vcmpe\.f32} 8 } }  */
-- 
2.1.4


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.
  2016-05-17 14:20 [PATCH 0/17][ARM] ARMv8.2-A and FP16 extension support Matthew Wahab
                   ` (6 preceding siblings ...)
  2016-05-17 14:34 ` [PATCH 7/17][ARM] Add FP16 data movement instructions Matthew Wahab
@ 2016-05-17 14:36 ` Matthew Wahab
  2016-05-18  0:52   ` Joseph Myers
  2016-07-04 14:02   ` Matthew Wahab
  2016-05-17 14:37 ` [PATCH 9/17][ARM] Add NEON " Matthew Wahab
                   ` (8 subsequent siblings)
  16 siblings, 2 replies; 73+ messages in thread
From: Matthew Wahab @ 2016-05-17 14:36 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3268 bytes --]

The ARMv8.2-A FP16 extension adds a number of arithmetic instructions
to the VFP instruction set. This patch adds support for these
instructions to the ARM backend.

In most cases the instructions are added using non-standard pattern
names. This forces operations on __fp16 values to be done, by
conversion, using the single-precision instructions. The exceptions are
the precision-preserving operations ABS and NEG.
The instruction patterns can be used by the compiler to optimize
half-precision operations. Since the pattern names are non-standard,
the only way to generate half-precision operations is through the
intrinsics added by this patch series, meaning that existing code will
not be affected.
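
As a sketch of the intended behaviour (vaddh_f16 and float16_t are the
ACLE names; the intrinsic itself is added later in this series and the
instructions noted in the comments are illustrative):

#include <arm_fp16.h>

__fp16
plain_add (__fp16 a, __fp16 b)
{
  /* Compiled as before, by promotion: vcvtb.f32.f16, vadd.f32,
     vcvtb.f16.f32.  */
  return a + b;
}

float16_t
intrinsic_add (float16_t a, float16_t b)
{
  /* Uses the new half-precision pattern directly: vadd.f16.  */
  return vaddh_f16 (a, b);
}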

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/iterators.md (Code iterators): Fix some white-space
	in the comments.
	(GLTE): New.
	(ABSNEG): New.
	(FCVT): Moved from vfp.md.
	(VCVT_HF_US_N): New.
	(VCVT_SI_US_N): New.
	(VCVT_HF_US): New.
	(VCVTH_US): New.
	(FP16_RND): New.
	(absneg_str): New.
	(FCVTI32typename): Moved from vfp.md.
	(sup): Add UNSPEC_VCVTA_S, UNSPEC_VCVTA_U, UNSPEC_VCVTM_S,
	UNSPEC_VCVTM_U, UNSPEC_VCVTN_S, UNSPEC_VCVTN_U, UNSPEC_VCVTP_S,
	UNSPEC_VCVTP_U, UNSPEC_VCVT_HF_S_N, UNSPEC_VCVT_HF_U_N,
	UNSPEC_VCVT_SI_S_N, UNSPEC_VCVT_SI_U_N, UNSPEC_VCVTH_S and
	UNSPEC_VCVTH_U.
	(vcvth_op): New.
	(fp16_rnd_str): New.
	(fp16_rnd_insn): New.
	* config/arm/unspecs.md (UNSPEC_VCVT_HF_S_N): New.
	(UNSPEC_VCVT_HF_U_N): New.
	(UNSPEC_VCVT_SI_S_N): New.
	(UNSPEC_VCVT_SI_U_N): New.
	(UNSPEC_VCVTH_S): New.
	(UNSPEC_VCVTH_U): New.
	(UNSPEC_VCVTA_S): New.
	(UNSPEC_VCVTA_U): New.
	(UNSPEC_VCVTM_S): New.
	(UNSPEC_VCVTM_U): New.
	(UNSPEC_VCVTN_S): New.
	(UNSPEC_VCVTN_U): New.
	(UNSPEC_VCVTP_S): New.
	(UNSPEC_VCVTP_U): New.
	(UNSPEC_VRND): New.
	(UNSPEC_VRNDA): New.
	(UNSPEC_VRNDI): New.
	(UNSPEC_VRNDM): New.
	(UNSPEC_VRNDN): New.
	(UNSPEC_VRNDP): New.
	(UNSPEC_VRNDX): New.
	* config/arm/vfp.md (<absneg_str>hf2): New.
	(neon_v<absneg_str>hf): New.
	(neon_v<fp16_rnd_str>hf): New.
	(neon_vrndihf): New.
	(addhf3_fp16): New.
	(neon_vaddhf): New.
	(subhf3_fp16): New.
	(neon_vsubhf): New.
	(divhf3_fp16): New.
	(neon_vdivhf): New.
	(mulhf3_fp16): New.
	(neon_vmulfhf): New.
	(*mulsf3neghf_vfp): New.
	(*negmulhf3_vfp): New.
	(*mulsf3addhf_vfp): New.
	(*mulhf3subhf_vfp): New.
	(*mulhf3neghfaddhf_vfp): New.
	(*mulhf3neghfsubhf_vfp): New.
	(fmahf4_fp16): New.
	(neon_vfmahf): New.
	(fmsubhf4_fp16): New.
	(neon_vfmshf): New.
	(*fnmsubhf4): New.
	(*fnmaddhf4): New.
	(neon_vsqrthf): New.
	(neon_vrsqrtshf): New.
	(FCVT): Move to iterators.md.
	(FCVTI32typename): Likewise.
	(neon_vcvth<sup>hf): New.
	(neon_vcvth<sup>si): New.
	(neon_vcvth<sup>_nhf_unspec): New.
	(neon_vcvth<sup>_nhf): New.
	(neon_vcvth<sup>_nsi_unspec): New.
	(neon_vcvth<sup>_nsi): New.
	(neon_vcvt<vcvth_op>h<sup>si): New.
	(neon_<fmaxmin_op>hf): New.

testsuite/
2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/arm/armv8_2-fp16-arith-1.c: New.
	* gcc.target/arm/armv8_2-fp16-conv-1.c: New.


[-- Attachment #2: 0008-PATCH-8-17-ARM-Add-VFP-FP16-arithmetic-instructions.patch --]
[-- Type: text/x-patch, Size: 29656 bytes --]

From 3e773f2ec85ea66d0be0e3a97ea52826156c00f2 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 14:49:17 +0100
Subject: [PATCH 08/17] [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.

2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/iterators.md (Code iterators): Fix some white-space
	in the comments.
	(GLTE): New.
	(ABSNEG): New.
	(FCVT): Moved from vfp.md.
	(VCVT_HF_US_N): New.
	(VCVT_SI_US_N): New.
	(VCVT_HF_US): New.
	(VCVTH_US): New.
	(FP16_RND): New.
	(absneg_str): New.
	(FCVTI32typename): Moved from vfp.md.
	(sup): Add UNSPEC_VCVTA_S, UNSPEC_VCVTA_U, UNSPEC_VCVTM_S,
	UNSPEC_VCVTM_U, UNSPEC_VCVTN_S, UNSPEC_VCVTN_U, UNSPEC_VCVTP_S,
	UNSPEC_VCVTP_U, UNSPEC_VCVT_HF_S_N, UNSPEC_VCVT_HF_U_N,
	UNSPEC_VCVT_SI_S_N, UNSPEC_VCVT_SI_U_N, UNSPEC_VCVTH_S and
	UNSPEC_VCVTH_U.
	(vcvth_op): New.
	(fp16_rnd_str): New.
	(fp16_rnd_insn): New.
	* config/arm/unspecs.md (UNSPEC_VCVT_HF_S_N): New.
	(UNSPEC_VCVT_HF_U_N): New.
	(UNSPEC_VCVT_SI_S_N): New.
	(UNSPEC_VCVT_SI_U_N): New.
	(UNSPEC_VCVTH_S): New.
	(UNSPEC_VCVTH_U): New.
	(UNSPEC_VCVTA_S): New.
	(UNSPEC_VCVTA_U): New.
	(UNSPEC_VCVTM_S): New.
	(UNSPEC_VCVTM_U): New.
	(UNSPEC_VCVTN_S): New.
	(UNSPEC_VCVTN_U): New.
	(UNSPEC_VCVTP_S): New.
	(UNSPEC_VCVTP_U): New.
	(UNSPEC_VRND): New.
	(UNSPEC_VRNDA): New.
	(UNSPEC_VRNDI): New.
	(UNSPEC_VRNDM): New.
	(UNSPEC_VRNDN): New.
	(UNSPEC_VRNDP): New.
	(UNSPEC_VRNDX): New.
	* config/arm/vfp.md (<absneg_str>hf2): New.
	(neon_v<absneg_str>hf): New.
	(neon_v<fp16_rnd_str>hf): New.
	(neon_vrndihf): New.
	(addhf3_fp16): New.
	(neon_vaddhf): New.
	(subhf3_fp16): New.
	(neon_vsubhf): New.
	(divhf3_fp16): New.
	(neon_vdivhf): New.
	(mulhf3_fp16): New.
	(neon_vmulfhf): New.
	(*mulsf3neghf_vfp): New.
	(*negmulhf3_vfp): New.
	(*mulsf3addhf_vfp): New.
	(*mulhf3subhf_vfp): New.
	(*mulhf3neghfaddhf_vfp): New.
	(*mulhf3neghfsubhf_vfp): New.
	(fmahf4_fp16): New.
	(neon_vfmahf): New.
	(fmsubhf4_fp16): New.
	(neon_vfmshf): New.
	(*fnmsubhf4): New.
	(*fnmaddhf4): New.
	(neon_vsqrthf): New.
	(neon_vrsqrtshf): New.
	(FCVT): Move to iterators.md.
	(FCVTI32typename): Likewise.
	(neon_vcvth<sup>hf): New.
	(neon_vcvth<sup>si): New.
	(neon_vcvth<sup>_nhf_unspec): New.
	(neon_vcvth<sup>_nhf): New.
	(neon_vcvth<sup>_nsi_unspec): New.
	(neon_vcvth<sup>_nsi): New.
	(neon_vcvt<vcvth_op>h<sup>si): New.
	(neon_<fmaxmin_op>hf): New.

testsuite/
2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/arm/armv8_2-fp16-arith-1.c: New.
	* gcc.target/arm/armv8_2-fp16-conv-1.c: New.
---
 gcc/config/arm/iterators.md                        |  59 ++-
 gcc/config/arm/unspecs.md                          |  21 +
 gcc/config/arm/vfp.md                              | 423 ++++++++++++++++++++-
 .../gcc.target/arm/armv8_2-fp16-arith-1.c          |  68 ++++
 gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c | 101 +++++
 5 files changed, 666 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 3f9d9e4..9371b6a 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -199,14 +199,17 @@
 ;; Code iterators
 ;;----------------------------------------------------------------------------
 
-;; A list of condition codes used in compare instructions where 
-;; the carry flag from the addition is used instead of doing the 
+;; A list of condition codes used in compare instructions where
+;; the carry flag from the addition is used instead of doing the
 ;; compare a second time.
 (define_code_iterator LTUGEU [ltu geu])
 
 ;; The signed gt, ge comparisons
 (define_code_iterator GTGE [gt ge])
 
+;; The signed gt, ge, lt, le comparisons
+(define_code_iterator GLTE [gt ge lt le])
+
 ;; The unsigned gt, ge comparisons
 (define_code_iterator GTUGEU [gtu geu])
 
@@ -235,6 +238,12 @@
 ;; Binary operators whose second operand can be shifted.
 (define_code_iterator SHIFTABLE_OPS [plus minus ior xor and])
 
+;; Operations on the sign of a number.
+(define_code_iterator ABSNEG [abs neg])
+
+;; Conversions.
+(define_code_iterator FCVT [unsigned_float float])
+
 ;; plus and minus are the only SHIFTABLE_OPS for which Thumb2 allows
 ;; a stack pointer opoerand.  The minus operation is a candidate for an rsub
 ;; and hence only plus is supported.
@@ -330,6 +339,22 @@
 
 (define_int_iterator VCVT_US_N [UNSPEC_VCVT_S_N UNSPEC_VCVT_U_N])
 
+(define_int_iterator VCVT_HF_US_N [UNSPEC_VCVT_HF_S_N UNSPEC_VCVT_HF_U_N])
+
+(define_int_iterator VCVT_SI_US_N [UNSPEC_VCVT_SI_S_N UNSPEC_VCVT_SI_U_N])
+
+(define_int_iterator VCVT_HF_US [UNSPEC_VCVTA_S UNSPEC_VCVTA_U
+				 UNSPEC_VCVTM_S UNSPEC_VCVTM_U
+				 UNSPEC_VCVTN_S UNSPEC_VCVTN_U
+				 UNSPEC_VCVTP_S UNSPEC_VCVTP_U])
+
+(define_int_iterator VCVTH_US [UNSPEC_VCVTH_S UNSPEC_VCVTH_U])
+
+;; Operators for FP16 instructions.
+(define_int_iterator FP16_RND [UNSPEC_VRND UNSPEC_VRNDA
+			       UNSPEC_VRNDM UNSPEC_VRNDN
+			       UNSPEC_VRNDP UNSPEC_VRNDX])
+
 (define_int_iterator VQMOVN [UNSPEC_VQMOVN_S UNSPEC_VQMOVN_U])
 
 (define_int_iterator VMOVL [UNSPEC_VMOVL_S UNSPEC_VMOVL_U])
@@ -687,6 +712,12 @@
 (define_code_attr shift [(ashiftrt "ashr") (lshiftrt "lshr")])
 (define_code_attr shifttype [(ashiftrt "signed") (lshiftrt "unsigned")])
 
+;; String representations of operations on the sign of a number.
+(define_code_attr absneg_str [(abs "abs") (neg "neg")])
+
+;; Conversions.
+(define_code_attr FCVTI32typename [(unsigned_float "u32") (float "s32")])
+
 ;;----------------------------------------------------------------------------
 ;; Int attributes
 ;;----------------------------------------------------------------------------
@@ -718,7 +749,13 @@
   (UNSPEC_VPMAX "s") (UNSPEC_VPMAX_U "u")
   (UNSPEC_VPMIN "s") (UNSPEC_VPMIN_U "u")
   (UNSPEC_VCVT_S "s") (UNSPEC_VCVT_U "u")
+  (UNSPEC_VCVTA_S "s") (UNSPEC_VCVTA_U "u")
+  (UNSPEC_VCVTM_S "s") (UNSPEC_VCVTM_U "u")
+  (UNSPEC_VCVTN_S "s") (UNSPEC_VCVTN_U "u")
+  (UNSPEC_VCVTP_S "s") (UNSPEC_VCVTP_U "u")
   (UNSPEC_VCVT_S_N "s") (UNSPEC_VCVT_U_N "u")
+  (UNSPEC_VCVT_HF_S_N "s") (UNSPEC_VCVT_HF_U_N "u")
+  (UNSPEC_VCVT_SI_S_N "s") (UNSPEC_VCVT_SI_U_N "u")
   (UNSPEC_VQMOVN_S "s") (UNSPEC_VQMOVN_U "u")
   (UNSPEC_VMOVL_S "s") (UNSPEC_VMOVL_U "u")
   (UNSPEC_VSHL_S "s") (UNSPEC_VSHL_U "u")
@@ -733,9 +770,25 @@
   (UNSPEC_VSHLL_S_N "s") (UNSPEC_VSHLL_U_N "u")
   (UNSPEC_VSRA_S_N "s") (UNSPEC_VSRA_U_N "u")
   (UNSPEC_VRSRA_S_N "s") (UNSPEC_VRSRA_U_N "u")
-
+  (UNSPEC_VCVTH_S "s") (UNSPEC_VCVTH_U "u")
 ])
 
+(define_int_attr vcvth_op
+ [(UNSPEC_VCVTA_S "a") (UNSPEC_VCVTA_U "a")
+  (UNSPEC_VCVTM_S "m") (UNSPEC_VCVTM_U "m")
+  (UNSPEC_VCVTN_S "n") (UNSPEC_VCVTN_U "n")
+  (UNSPEC_VCVTP_S "p") (UNSPEC_VCVTP_U "p")])
+
+(define_int_attr fp16_rnd_str
+  [(UNSPEC_VRND "rnd") (UNSPEC_VRNDA "rnda")
+   (UNSPEC_VRNDM "rndm") (UNSPEC_VRNDN "rndn")
+   (UNSPEC_VRNDP "rndp") (UNSPEC_VRNDX "rndx")])
+
+(define_int_attr fp16_rnd_insn
+  [(UNSPEC_VRND "vrintz") (UNSPEC_VRNDA "vrinta")
+   (UNSPEC_VRNDM "vrintm") (UNSPEC_VRNDN "vrintn")
+   (UNSPEC_VRNDP "vrintp") (UNSPEC_VRNDX "vrintx")])
+
 (define_int_attr cmp_op_unsp [(UNSPEC_VCEQ "eq") (UNSPEC_VCGT "gt")
                               (UNSPEC_VCGE "ge") (UNSPEC_VCLE "le")
                               (UNSPEC_VCLT "lt") (UNSPEC_VCAGE "ge")
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 5744c62..57a47ff 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -203,6 +203,20 @@
   UNSPEC_VCVT_U
   UNSPEC_VCVT_S_N
   UNSPEC_VCVT_U_N
+  UNSPEC_VCVT_HF_S_N
+  UNSPEC_VCVT_HF_U_N
+  UNSPEC_VCVT_SI_S_N
+  UNSPEC_VCVT_SI_U_N
+  UNSPEC_VCVTH_S
+  UNSPEC_VCVTH_U
+  UNSPEC_VCVTA_S
+  UNSPEC_VCVTA_U
+  UNSPEC_VCVTM_S
+  UNSPEC_VCVTM_U
+  UNSPEC_VCVTN_S
+  UNSPEC_VCVTN_U
+  UNSPEC_VCVTP_S
+  UNSPEC_VCVTP_U
   UNSPEC_VEXT
   UNSPEC_VHADD_S
   UNSPEC_VHADD_U
@@ -365,5 +379,12 @@
   UNSPEC_NVRINTN
   UNSPEC_VQRDMLAH
   UNSPEC_VQRDMLSH
+  UNSPEC_VRND
+  UNSPEC_VRNDA
+  UNSPEC_VRNDI
+  UNSPEC_VRNDM
+  UNSPEC_VRNDN
+  UNSPEC_VRNDP
+  UNSPEC_VRNDX
 ])
 
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index b1c13fa..6202fc3 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -937,9 +937,73 @@
    (set_attr "type" "ffarithd")]
 )
 
+;; ABS and NEG for FP16.
+(define_insn "<absneg_str>hf2"
+  [(set (match_operand:HF 0 "s_register_operand" "=w")
+    (ABSNEG:HF (match_operand:HF 1 "s_register_operand" "w")))]
+ "TARGET_VFP_FP16INST"
+ "v<absneg_str>.f16\t%0, %1"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "ffariths")]
+)
+
+(define_expand "neon_v<absneg_str>hf"
+ [(set
+   (match_operand:HF 0 "s_register_operand")
+   (ABSNEG:HF (match_operand:HF 1 "s_register_operand")))]
+ "TARGET_VFP_FP16INST"
+{
+  emit_insn (gen_<absneg_str>hf2 (operands[0], operands[1]));
+  DONE;
+})
+
+;; VRND for FP16.
+(define_insn "neon_v<fp16_rnd_str>hf"
+  [(set (match_operand:HF 0 "s_register_operand" "=w")
+    (unspec:HF
+     [(match_operand:HF 1 "s_register_operand" "w")]
+     FP16_RND))]
+ "TARGET_VFP_FP16INST"
+ "<fp16_rnd_insn>.f16\t%0, %1"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "neon_fp_round_s")]
+)
+
+(define_insn "neon_vrndihf"
+  [(set (match_operand:HF 0 "s_register_operand" "=w")
+    (unspec:HF
+     [(match_operand:HF 1 "s_register_operand" "w")]
+     UNSPEC_VRNDI))]
+  "TARGET_VFP_FP16INST"
+  "vrintr.f16\t%0, %1"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "neon_fp_round_s")]
+)
 
 ;; Arithmetic insns
 
+(define_insn "addhf3_fp16"
+  [(set
+    (match_operand:HF 0 "s_register_operand" "=w")
+    (plus:HF
+     (match_operand:HF 1 "s_register_operand" "w")
+     (match_operand:HF 2 "s_register_operand" "w")))]
+ "TARGET_VFP_FP16INST"
+ "vadd.f16\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fadds")]
+)
+
+(define_expand "neon_vaddhf"
+  [(match_operand:HF 0 "s_register_operand")
+   (match_operand:HF 1 "s_register_operand")
+   (match_operand:HF 2 "s_register_operand")]
+  "TARGET_VFP_FP16INST"
+{
+  emit_insn (gen_addhf3_fp16 (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 (define_insn "*addsf3_vfp"
   [(set (match_operand:SF	   0 "s_register_operand" "=t")
 	(plus:SF (match_operand:SF 1 "s_register_operand" "t")
@@ -962,6 +1026,28 @@
    (set_attr "type" "faddd")]
 )
 
+(define_insn "subhf3_fp16"
+ [(set
+   (match_operand:HF 0 "s_register_operand" "=w")
+   (minus:HF
+    (match_operand:HF 1 "s_register_operand" "w")
+    (match_operand:HF 2 "s_register_operand" "w")))]
+ "TARGET_VFP_FP16INST"
+ "vsub.f16\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fadds")]
+)
+
+(define_expand "neon_vsubhf"
+  [(match_operand:HF 0 "s_register_operand")
+   (match_operand:HF 1 "s_register_operand")
+   (match_operand:HF 2 "s_register_operand")]
+  "TARGET_VFP_FP16INST"
+{
+  emit_insn (gen_subhf3_fp16 (operands[0], operands[1],
+			      operands[2]));
+  DONE;
+})
 
 (define_insn "*subsf3_vfp"
   [(set (match_operand:SF	    0 "s_register_operand" "=t")
@@ -988,6 +1074,29 @@
 
 ;; Division insns
 
+;; FP16 Division.
+(define_insn "divhf3_fp16"
+  [(set
+    (match_operand:HF	   0 "s_register_operand" "=w")
+    (div:HF
+     (match_operand:HF 1 "s_register_operand" "w")
+     (match_operand:HF 2 "s_register_operand" "w")))]
+  "TARGET_VFP_FP16INST"
+  "vdiv.f16\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fdivs")]
+)
+
+(define_expand "neon_vdivhf"
+  [(match_operand:HF 0 "s_register_operand")
+   (match_operand:HF 1 "s_register_operand")
+   (match_operand:HF 2 "s_register_operand")]
+ "TARGET_VFP_FP16INST"
+{
+  emit_insn (gen_divhf3_fp16 (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 ; VFP9 Erratum 760019: It's potentially unsafe to overwrite the input
 ; operands, so mark the output as early clobber for VFPv2 on ARMv5 or
 ; earlier.
@@ -1018,6 +1127,27 @@
 
 ;; Multiplication insns
 
+(define_insn "mulhf3_fp16"
+ [(set
+   (match_operand:HF 0 "s_register_operand" "=w")
+   (mult:HF (match_operand:HF 1 "s_register_operand" "w")
+	    (match_operand:HF 2 "s_register_operand" "w")))]
+  "TARGET_VFP_FP16INST"
+  "vmul.f16\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fmuls")]
+)
+
+(define_expand "neon_vmulfhf"
+ [(match_operand:HF 0 "s_register_operand")
+  (match_operand:HF 1 "s_register_operand")
+  (match_operand:HF 2 "s_register_operand")]
+ "TARGET_VFP_FP16INST"
+{
+  emit_insn (gen_mulhf3_fp16 (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 (define_insn "*mulsf3_vfp"
   [(set (match_operand:SF	   0 "s_register_operand" "=t")
 	(mult:SF (match_operand:SF 1 "s_register_operand" "t")
@@ -1040,6 +1170,26 @@
    (set_attr "type" "fmuld")]
 )
 
+(define_insn "*mulsf3neghf_vfp"
+  [(set (match_operand:HF		   0 "s_register_operand" "=t")
+	(mult:HF (neg:HF (match_operand:HF 1 "s_register_operand" "t"))
+		 (match_operand:HF	   2 "s_register_operand" "t")))]
+  "TARGET_VFP_FP16INST && !flag_rounding_math"
+  "vnmul.f16\\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fmuls")]
+)
+
+(define_insn "*negmulhf3_vfp"
+  [(set (match_operand:HF		   0 "s_register_operand" "=t")
+	(neg:HF (mult:HF (match_operand:HF 1 "s_register_operand" "t")
+		 (match_operand:HF	   2 "s_register_operand" "t"))))]
+  "TARGET_VFP_FP16INST"
+  "vnmul.f16\\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fmuls")]
+)
+
 (define_insn "*mulsf3negsf_vfp"
   [(set (match_operand:SF		   0 "s_register_operand" "=t")
 	(mult:SF (neg:SF (match_operand:SF 1 "s_register_operand" "t"))
@@ -1089,6 +1239,18 @@
 ;; Multiply-accumulate insns
 
 ;; 0 = 1 * 2 + 0
+(define_insn "*mulsf3addhf_vfp"
+ [(set (match_operand:HF 0 "s_register_operand" "=t")
+       (plus:HF
+	(mult:HF (match_operand:HF 2 "s_register_operand" "t")
+		 (match_operand:HF 3 "s_register_operand" "t"))
+	(match_operand:HF 1 "s_register_operand" "0")))]
+  "TARGET_VFP_FP16INST"
+  "vmla.f16\\t%0, %2, %3"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fmacs")]
+)
+
 (define_insn "*mulsf3addsf_vfp"
   [(set (match_operand:SF		    0 "s_register_operand" "=t")
 	(plus:SF (mult:SF (match_operand:SF 2 "s_register_operand" "t")
@@ -1114,6 +1276,17 @@
 )
 
 ;; 0 = 1 * 2 - 0
+(define_insn "*mulhf3subhf_vfp"
+  [(set (match_operand:HF 0 "s_register_operand" "=t")
+	(minus:HF (mult:HF (match_operand:HF 2 "s_register_operand" "t")
+			   (match_operand:HF 3 "s_register_operand" "t"))
+		  (match_operand:HF 1 "s_register_operand" "0")))]
+  "TARGET_VFP_FP16INST"
+  "vnmls.f16\\t%0, %2, %3"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fmacs")]
+)
+
 (define_insn "*mulsf3subsf_vfp"
   [(set (match_operand:SF		     0 "s_register_operand" "=t")
 	(minus:SF (mult:SF (match_operand:SF 2 "s_register_operand" "t")
@@ -1139,6 +1312,17 @@
 )
 
 ;; 0 = -(1 * 2) + 0
+(define_insn "*mulhf3neghfaddhf_vfp"
+  [(set (match_operand:HF 0 "s_register_operand" "=t")
+	(minus:HF (match_operand:HF 1 "s_register_operand" "0")
+		  (mult:HF (match_operand:HF 2 "s_register_operand" "t")
+			   (match_operand:HF 3 "s_register_operand" "t"))))]
+  "TARGET_VFP_FP16INST"
+  "vmls.f16\\t%0, %2, %3"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fmacs")]
+)
+
 (define_insn "*mulsf3negsfaddsf_vfp"
   [(set (match_operand:SF		     0 "s_register_operand" "=t")
 	(minus:SF (match_operand:SF	     1 "s_register_operand" "0")
@@ -1165,6 +1349,18 @@
 
 
 ;; 0 = -(1 * 2) - 0
+(define_insn "*mulhf3neghfsubhf_vfp"
+  [(set (match_operand:HF 0 "s_register_operand" "=t")
+	(minus:HF (mult:HF
+		   (neg:HF (match_operand:HF 2 "s_register_operand" "t"))
+		   (match_operand:HF 3 "s_register_operand" "t"))
+		  (match_operand:HF 1 "s_register_operand" "0")))]
+  "TARGET_VFP_FP16INST"
+  "vnmla.f16\\t%0, %2, %3"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fmacs")]
+)
+
 (define_insn "*mulsf3negsfsubsf_vfp"
   [(set (match_operand:SF		      0 "s_register_operand" "=t")
 	(minus:SF (mult:SF
@@ -1193,6 +1389,30 @@
 
 ;; Fused-multiply-accumulate
 
+(define_insn "fmahf4_fp16"
+  [(set (match_operand:HF 0 "register_operand" "=w")
+    (fma:HF
+     (match_operand:HF 1 "register_operand" "w")
+     (match_operand:HF 2 "register_operand" "w")
+     (match_operand:HF 3 "register_operand" "0")))]
+ "TARGET_VFP_FP16INST"
+ "vfma.f16\\t%0, %1, %2"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "ffmas")]
+)
+
+(define_expand "neon_vfmahf"
+  [(match_operand:HF 0 "s_register_operand")
+   (match_operand:HF 1 "s_register_operand")
+   (match_operand:HF 2 "s_register_operand")
+   (match_operand:HF 3 "s_register_operand")]
+  "TARGET_VFP_FP16INST"
+{
+  emit_insn (gen_fmahf4_fp16 (operands[0], operands[2], operands[3],
+			      operands[1]));
+  DONE;
+})
+
 (define_insn "fma<SDF:mode>4"
   [(set (match_operand:SDF 0 "register_operand" "=<F_constraint>")
         (fma:SDF (match_operand:SDF 1 "register_operand" "<F_constraint>")
@@ -1205,6 +1425,30 @@
    (set_attr "type" "ffma<vfp_type>")]
 )
 
+(define_insn "fmsubhf4_fp16"
+ [(set (match_operand:HF 0 "register_operand" "=w")
+   (fma:HF
+    (neg:HF (match_operand:HF 1 "register_operand" "w"))
+    (match_operand:HF 2 "register_operand" "w")
+    (match_operand:HF 3 "register_operand" "0")))]
+ "TARGET_VFP_FP16INST"
+ "vfms.f16\\t%0, %1, %2"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "ffmas")]
+)
+
+(define_expand "neon_vfmshf"
+  [(match_operand:HF 0 "s_register_operand")
+   (match_operand:HF 1 "s_register_operand")
+   (match_operand:HF 2 "s_register_operand")
+   (match_operand:HF 3 "s_register_operand")]
+  "TARGET_VFP_FP16INST"
+{
+  emit_insn (gen_fmsubhf4_fp16 (operands[0], operands[2], operands[3],
+				operands[1]));
+  DONE;
+})
+
 (define_insn "*fmsub<SDF:mode>4"
   [(set (match_operand:SDF 0 "register_operand" "=<F_constraint>")
 	(fma:SDF (neg:SDF (match_operand:SDF 1 "register_operand"
@@ -1218,6 +1462,17 @@
    (set_attr "type" "ffma<vfp_type>")]
 )
 
+(define_insn "*fnmsubhf4"
+  [(set (match_operand:HF 0 "register_operand" "=w")
+	(fma:HF (match_operand:HF 1 "register_operand" "w")
+		 (match_operand:HF 2 "register_operand" "w")
+		 (neg:HF (match_operand:HF 3 "register_operand" "0"))))]
+  "TARGET_VFP_FP16INST"
+  "vfnms.f16\\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "ffmas")]
+)
+
 (define_insn "*fnmsub<SDF:mode>4"
   [(set (match_operand:SDF 0 "register_operand" "=<F_constraint>")
 	(fma:SDF (match_operand:SDF 1 "register_operand" "<F_constraint>")
@@ -1230,6 +1485,17 @@
    (set_attr "type" "ffma<vfp_type>")]
 )
 
+(define_insn "*fnmaddhf4"
+  [(set (match_operand:HF 0 "register_operand" "=w")
+	(fma:HF (neg:HF (match_operand:HF 1 "register_operand" "w"))
+		 (match_operand:HF 2 "register_operand" "w")
+		 (neg:HF (match_operand:HF 3 "register_operand" "0"))))]
+  "TARGET_VFP_FP16INST"
+  "vfnma.f16\\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "ffmas")]
+)
+
 (define_insn "*fnmadd<SDF:mode>4"
   [(set (match_operand:SDF 0 "register_operand" "=<F_constraint>")
 	(fma:SDF (neg:SDF (match_operand:SDF 1 "register_operand"
@@ -1372,6 +1638,27 @@
 
 ;; Sqrt insns.
 
+(define_insn "neon_vsqrthf"
+  [(set (match_operand:HF 0 "s_register_operand" "=w")
+	(sqrt:HF (match_operand:HF 1 "s_register_operand" "w")))]
+  "TARGET_VFP_FP16INST"
+  "vsqrt.f16\t%0, %1"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fsqrts")]
+)
+
+(define_insn "neon_vrsqrtshf"
+  [(set
+    (match_operand:HF 0 "s_register_operand" "=w")
+    (unspec:HF [(match_operand:HF 1 "s_register_operand" "w")
+		(match_operand:HF 2 "s_register_operand" "w")]
+     UNSPEC_VRSQRTS))]
+ "TARGET_VFP_FP16INST"
+ "vrsqrts.f16\t%0, %1, %2"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "fsqrts")]
+)
+
 ; VFP9 Erratum 760019: It's potentially unsafe to overwrite the input
 ; operands, so mark the output as early clobber for VFPv2 on ARMv5 or
 ; earlier.
@@ -1528,9 +1815,6 @@
 )
 
 ;; Fixed point to floating point conversions.
-(define_code_iterator FCVT [unsigned_float float])
-(define_code_attr FCVTI32typename [(unsigned_float "u32") (float "s32")])
-
 (define_insn "*combine_vcvt_f32_<FCVTI32typename>"
   [(set (match_operand:SF 0 "s_register_operand" "=t")
 	(mult:SF (FCVT:SF (match_operand:SI 1 "s_register_operand" "0"))
@@ -1575,6 +1859,125 @@
    (set_attr "type" "f_cvtf2i")]
  )
 
+;; FP16 conversions.
+(define_insn "neon_vcvth<sup>hf"
+ [(set (match_operand:HF 0 "s_register_operand" "=w")
+   (unspec:HF
+    [(match_operand:SI 1 "s_register_operand" "w")]
+    VCVTH_US))]
+ "TARGET_VFP_FP16INST"
+ "vcvt.f16.<sup>%#32\t%0, %1"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "f_cvti2f")]
+)
+
+(define_insn "neon_vcvth<sup>si"
+ [(set (match_operand:SI 0 "s_register_operand" "=w")
+   (unspec:SI
+    [(match_operand:HF 1 "s_register_operand" "w")]
+    VCVTH_US))]
+ "TARGET_VFP_FP16INST"
+ "vcvt.<sup>%#32.f16\t%0, %1"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "f_cvtf2i")]
+)
+
+;; The neon_vcvth<sup>_nhf patterns are used to generate the instruction for the
+;; vcvth_n_f16_<sup>32 arm_fp16 intrinsics.  They are complicated by the
+;; hardware requirement that the source and destination registers are the same
+;; despite having different machine modes.  The approach is to use a temporary
+;; register for the conversion and move that to the correct destination.
+
+;; Generate an unspec pattern for the intrinsic.
+(define_insn "neon_vcvth<sup>_nhf_unspec"
+ [(set
+   (match_operand:SI 0 "s_register_operand" "=w")
+   (unspec:SI
+    [(match_operand:SI 1 "s_register_operand" "0")
+     (match_operand:SI 2 "immediate_operand" "i")]
+    VCVT_HF_US_N))
+ (set
+  (match_operand:HF 3 "s_register_operand" "=w")
+  (float_truncate:HF (float:SF (match_dup 0))))]
+ "TARGET_VFP_FP16INST"
+{
+  neon_const_bounds (operands[2], 1, 33);
+  return "vcvt.f16.<sup>32\t%0, %0, %2\;vmov.f32\t%3, %0";
+}
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "f_cvti2f")]
+)
+
+;; Generate the instruction patterns needed for the vcvth_n_f16_<sup>32 arm_fp16 intrinsics.
+(define_expand "neon_vcvth<sup>_nhf"
+ [(match_operand:HF 0 "s_register_operand")
+  (unspec:HF [(match_operand:SI 1 "s_register_operand")
+	      (match_operand:SI 2 "immediate_operand")]
+   VCVT_HF_US_N)]
+"TARGET_VFP_FP16INST"
+{
+  rtx op1 = gen_reg_rtx (SImode);
+
+  neon_const_bounds (operands[2], 1, 33);
+
+  emit_move_insn (op1, operands[1]);
+  emit_insn (gen_neon_vcvth<sup>_nhf_unspec (op1, op1, operands[2],
+					     operands[0]));
+  DONE;
+})
+
+;; The neon_vcvth<sup>_nsi patterns are used to generate the instruction for the
+;; vcvth_n_<sup>32_f16 arm_fp16 intrinsics.  They have the same restrictions and
+;; are implemented in the same way as the neon_vcvth<sup>_nhf patterns.
+
+;; Generate an unspec pattern, constraining the registers.
+(define_insn "neon_vcvth<sup>_nsi_unspec"
+ [(set (match_operand:SI 0 "s_register_operand" "=w")
+   (unspec:SI
+    [(fix:SI
+      (fix:SF
+       (float_extend:SF
+	(match_operand:HF 1 "s_register_operand" "w"))))
+     (match_operand:SI 2 "immediate_operand" "i")]
+    VCVT_SI_US_N))]
+ "TARGET_VFP_FP16INST"
+{
+  neon_const_bounds (operands[2], 1, 33);
+  return "vmov.f32\t%0, %1\;vcvt.<sup>%#32.f16\t%0, %0, %2";
+}
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "f_cvtf2i")]
+)
+
+;; Generate the instruction patterns needed for the vcvth_n_<sup>32_f16 arm_fp16 intrinsics.
+(define_expand "neon_vcvth<sup>_nsi"
+ [(match_operand:SI 0 "s_register_operand")
+  (unspec:SI
+   [(match_operand:HF 1 "s_register_operand")
+    (match_operand:SI 2 "immediate_operand")]
+   VCVT_SI_US_N)]
+ "TARGET_VFP_FP16INST"
+{
+  rtx op1 = gen_reg_rtx (SImode);
+
+  neon_const_bounds (operands[2], 1, 33);
+  emit_insn (gen_neon_vcvth<sup>_nsi_unspec (op1, operands[1], operands[2]));
+  emit_move_insn (operands[0], op1);
+  DONE;
+})
+
+(define_insn "neon_vcvt<vcvth_op>h<sup>si"
+ [(set
+   (match_operand:SI 0 "s_register_operand" "=w")
+   (unspec:SI
+    [(match_operand:HF 1 "s_register_operand" "w")]
+    VCVT_HF_US))]
+ "TARGET_VFP_FP16INST"
+ "vcvt<vcvth_op>.<sup>%#32.f16\t%0, %1"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "f_cvtf2i")]
+)
+
 ;; Store multiple insn used in function prologue.
 (define_insn "*push_multi_vfp"
   [(match_parallel 2 "multi_register_push"
@@ -1644,6 +2047,20 @@
 )
 
 ;; Scalar forms for the IEEE-754 fmax()/fmin() functions
+
+(define_insn "neon_<fmaxmin_op>hf"
+ [(set
+   (match_operand:HF 0 "s_register_operand" "=w")
+   (unspec:HF
+    [(match_operand:HF 1 "s_register_operand" "w")
+     (match_operand:HF 2 "s_register_operand" "w")]
+    VMAXMINFNM))]
+ "TARGET_VFP_FP16INST"
+ "<fmaxmin_op>.f16\t%0, %1, %2"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "f_minmaxs")]
+)
+
 (define_insn "<fmaxmin><mode>3"
   [(set (match_operand:SDF 0 "s_register_operand" "=<F_constraint>")
 	(unspec:SDF [(match_operand:SDF 1 "s_register_operand" "<F_constraint>")
diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c
new file mode 100644
index 0000000..8399288
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c
@@ -0,0 +1,68 @@
+/* { dg-do compile }  */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok }  */
+/* { dg-options "-O2 -ffast-math" }  */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+/* Test instructions generated for half-precision arithmetic.  */
+
+typedef __fp16 float16_t;
+typedef __simd64_float16_t float16x4_t;
+typedef __simd128_float16_t float16x8_t;
+
+float16_t
+fp16_abs (float16_t a)
+{
+  return (a < 0) ? -a : a;
+}
+
+#define TEST_UNOP(NAME, OPERATOR, TY)		\
+  TY test_##NAME##_##TY (TY a)			\
+  {						\
+    return OPERATOR (a);			\
+  }
+
+#define TEST_BINOP(NAME, OPERATOR, TY)		\
+  TY test_##NAME##_##TY (TY a, TY b)		\
+  {						\
+    return a OPERATOR b;			\
+  }
+
+#define TEST_CMP(NAME, OPERATOR, RTY, TY)	\
+  RTY test_##NAME##_##TY (TY a, TY b)		\
+  {						\
+    return a OPERATOR b;			\
+  }
+
+/* Scalars.  */
+
+TEST_UNOP (neg, -, float16_t)
+TEST_UNOP (abs, fp16_abs, float16_t)
+
+TEST_BINOP (add, +, float16_t)
+TEST_BINOP (sub, -, float16_t)
+TEST_BINOP (mult, *, float16_t)
+TEST_BINOP (div, /, float16_t)
+
+TEST_CMP (equal, ==, int, float16_t)
+TEST_CMP (unequal, !=, int, float16_t)
+TEST_CMP (lessthan, <, int, float16_t)
+TEST_CMP (greaterthan, >, int, float16_t)
+TEST_CMP (lessthanequal, <=, int, float16_t)
+TEST_CMP (greaterthanequal, >=, int, float16_t)
+
+/* { dg-final { scan-assembler-times {vneg\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+/* { dg-final { scan-assembler-times {vabs\.f16\ts[0-9]+, s[0-9]+} 2 } }  */
+
+/* { dg-final { scan-assembler-times {vadd\.f32\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+/* { dg-final { scan-assembler-times {vsub\.f32\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+/* { dg-final { scan-assembler-times {vmul\.f32\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+/* { dg-final { scan-assembler-times {vdiv\.f32\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+/* { dg-final { scan-assembler-times {vcmp\.f32\ts[0-9]+, s[0-9]+} 2 } }  */
+/* { dg-final { scan-assembler-times {vcmpe\.f32\ts[0-9]+, s[0-9]+} 4 } }  */
+
+/* { dg-final { scan-assembler-not {vadd\.f16} } }  */
+/* { dg-final { scan-assembler-not {vsub\.f16} } }  */
+/* { dg-final { scan-assembler-not {vmul\.f16} } }  */
+/* { dg-final { scan-assembler-not {vdiv\.f16} } }  */
+/* { dg-final { scan-assembler-not {vcmp\.f16} } }  */
+/* { dg-final { scan-assembler-not {vcmpe\.f16} } }  */
diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c
new file mode 100644
index 0000000..c9639a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c
@@ -0,0 +1,101 @@
+/* { dg-do compile }  */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok }  */
+/* { dg-options "-O2" }  */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+/* Test ARMv8.2 FP16 conversions.  */
+#include <arm_fp16.h>
+
+float
+f16_to_f32 (__fp16 a)
+{
+  return (float)a;
+}
+
+float
+f16_to_pf32 (__fp16* a)
+{
+  return (float)*a;
+}
+
+short
+f16_to_s16 (__fp16 a)
+{
+  return (short)a;
+}
+
+short
+pf16_to_s16 (__fp16* a)
+{
+  return (short)*a;
+}
+
+/* { dg-final { scan-assembler-times {vcvtb\.f32\.f16\ts[0-9]+, s[0-9]+} 4 } }  */
+
+__fp16
+f32_to_f16 (float a)
+{
+  return (__fp16)a;
+}
+
+void
+f32_to_pf16 (__fp16* x, float a)
+{
+  *x = (__fp16)a;
+}
+
+__fp16
+s16_to_f16 (short a)
+{
+  return (__fp16)a;
+}
+
+void
+s16_to_pf16 (__fp16* x, short a)
+{
+  *x = (__fp16)a;
+}
+
+/* { dg-final { scan-assembler-times {vcvtb\.f16\.f32\ts[0-9]+, s[0-9]+} 4 } }  */
+
+float
+s16_to_f32 (short a)
+{
+  return (float)a;
+}
+
+/* { dg-final { scan-assembler-times {vcvt\.f32\.s32\ts[0-9]+, s[0-9]+} 3 } }  */
+
+short
+f32_to_s16 (float a)
+{
+  return (short)a;
+}
+
+/* { dg-final { scan-assembler-times {vcvt\.s32\.f32\ts[0-9]+, s[0-9]+} 3 } }  */
+
+unsigned short
+f32_to_u16 (float a)
+{
+  return (unsigned short)a;
+}
+
+/* { dg-final { scan-assembler-times {vcvt\.u32\.f32\ts[0-9]+, s[0-9]+} 1 } }  */
+
+short
+f64_to_s16 (double a)
+{
+  return (short)a;
+}
+
+/* { dg-final { scan-assembler-times {vcvt\.s32\.f64\ts[0-9]+, d[0-9]+} 1 } }  */
+
+unsigned short
+f64_to_u16 (double a)
+{
+  return (unsigned short)a;
+}
+
+/* { dg-final { scan-assembler-times {vcvt\.u32\.f64\ts[0-9]+, d[0-9]+} 1 } }  */
+
+
-- 
2.1.4


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 9/17][ARM] Add NEON FP16 arithmetic instructions.
  2016-05-17 14:20 [PATCH 0/17][ARM] ARMv8.2-A and FP16 extension support Matthew Wahab
                   ` (7 preceding siblings ...)
  2016-05-17 14:36 ` [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions Matthew Wahab
@ 2016-05-17 14:37 ` Matthew Wahab
  2016-05-18  0:58   ` Joseph Myers
  2016-05-17 14:39 ` [PATCH 10/17][ARM] Refactor support code for NEON builtins Matthew Wahab
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 73+ messages in thread
From: Matthew Wahab @ 2016-05-17 14:37 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3409 bytes --]

The ARMv8.2-A FP16 extension adds a number of arithmetic instructions
to the NEON instruction set. This patch adds support for these
instructions to the ARM backend.

As with the VFP FP16 arithmetic instructions, operations on __fp16
values are done by promotion to single precision. Any new optimizations
enabled by the instruction descriptions therefore apply only to code
that uses the intrinsics added in this patch series.
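
For example (a sketch, assuming the vadd_f16 intrinsic and the
float16x4_t type that later patches in this series expose through
arm_neon.h):

  #include <arm_neon.h>

  /* __fp16 arithmetic: both operands are promoted to single
     precision, added, and the result is narrowed back to __fp16.
     No vadd.f16 is generated.  */
  __fp16
  add_fp16 (__fp16 a, __fp16 b)
  {
    return a + b;
  }

  /* The new intrinsic maps directly to the half-precision vadd.f16
     instruction added by this patch.  */
  float16x4_t
  add_f16x4 (float16x4_t a, float16x4_t b)
  {
    return vadd_f16 (a, b);
  }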

A number of the instructions are modelled as two variants, one using
UNSPEC and the other using RTL operations, with the variant used
decided by the -funsafe-math-optimizations flag. This follows the
approach taken for the single-precision instructions: the
half-precision operations have the same conditions and restrictions on
their use in optimizations (when those optimizations are enabled).
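
Concretely (a sketch; vsub_f16 is one of the NEON intrinsics added
later in this series, and whether the subtraction is actually folded
depends on the other active flags):

  #include <arm_neon.h>

  float16x4_t
  sub_self (float16x4_t a)
  {
    /* Built with -funsafe-math-optimizations, this expands through
       the RTL (minus ...) pattern and the optimizers are free to
       simplify it.  Without the flag it expands through the UNSPEC
       pattern, which is opaque to the optimizers, so the vsub.f16
       instruction is kept.  */
    return vsub_f16 (a, a);
  }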

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/iterators.md (VCVTHI): New.
	(NEON_VCMP): Add UNSPEC_VCLT and UNSPEC_VCLE.  Fix a long line.
	(NEON_VAGLTE): New.
	(VFM_LANE_AS): New.
	(VH_CVTTO): New.
	(V_reg): Add HF, V4HF and V8HF.  Fix white-space.
	(V_HALF): Add V4HF.  Fix white-space.
	(V_if_elem): Add HF, V4HF and V8HF.  Fix white-space.
	(V_s_elem): Likewise.
	(V_sz_elem): Fix white-space.
	(V_elem_ch): Likewise.
	(VH_elem_ch): New.
	(scalar_mul_constraint): Add V8HF and V4HF.
	(Is_float_mode): Fix white-space.
	(Is_d_reg): Likewise.
	(q): Add HF.  Fix white-space.
	(float_sup): New.
	(float_SUP): New.
	(cmp_op_unsp): Add UNSPEC_VCALE and UNSPEC_VCALT.
	(neon_vfm_lane_as): New.
	* config/arm/neon.md (add<mode>3_fp16): New.
	(sub<mode>3_fp16): New.
	(mul<mode>3add<mode>_neon): New.
	(*fma<VH:mode>4): New.
	(fma<VH:mode>4_intrinsic): New.
	(fmsub<VCVTF:mode>4_intrinsic): Fix white-space.
	(*fmsub<VH:mode>4): New.
	(fmsub<VH:mode>4_intrinsic): New.
	(<absneg_str><mode>2_fp16): New.
	(neon_v<absneg_str><mode>): New.
	(neon_v<fp16_rnd_str><mode>): New.
	(neon_vsqrte<mode>): New.
	(neon_vpaddv4hf): New.
	(neon_vadd<mode>): New.
	(neon_vsub<mode>): New.
	(neon_vadd<mode>_unspec): New.
	(neon_vsub<mode>_unspec): New.
	(neon_vmulf<mode>): New.
	(neon_vfma<VH:mode>): New.
	(neon_vfms<VH:mode>): New.
	(neon_vc<cmp_op><mode>): New.
	(neon_vc<cmp_op><mode>_fp16insn): New.
	(neon_vc<cmp_op_unsp><mode>_fp16insn_unspec): New.
	(neon_vca<cmp_op><mode>): New.
	(neon_vca<cmp_op><mode>_fp16insn): New.
	(neon_vca<cmp_op_unsp><mode>_fp16insn_unspec): New.
	(neon_vc<cmp_op>z<mode>): New.
	(neon_vabd<mode>): New.
	(neon_v<maxmin>f<mode>): New.
	(neon_vp<maxmin>fv4hf): New.
	(neon_<fmaxmin_op><mode>): New.
	(neon_vrecps<mode>): New.
	(neon_vrsqrts<mode>): New.
	(neon_vrecpe<mode>): New (VH variant).
	(neon_vcvt<sup><mode>): New (VCVTHI variant).
	(neon_vcvt<sup><mode>): New (VH variant).
	(neon_vcvt<sup>_n<mode>): New (VH variant).
	(neon_vcvt<sup>_n<mode>): New (VCVTHI variant).
	(neon_vcvt<vcvth_op><sup><mode>): New (VH variant).
	(neon_vmul_lane<mode>): New.
	(neon_vmul_n<mode>): New.
	* config/arm/unspecs.md (UNSPEC_VCALE): New.
	(UNSPEC_VCALT): New.
	(UNSPEC_VFMA_LANE): New.
	(UNSPEC_VFMS_LANE): New.
	(UNSPEC_VSQRTE): New.

testsuite/
2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/arm/armv8_2-fp16-arith-1.c: Add tests for float16x4_t
	and float16x8_t.


[-- Attachment #2: 0009-PATCH-9-17-ARM-Add-NEON-FP16-arithmetic-instructions.patch --]
[-- Type: text/x-patch, Size: 38036 bytes --]

From 623f36632cc2848f16ba1c75f400198a72dc6ea4 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 16:19:57 +0100
Subject: [PATCH 09/17] [PATCH 9/17][ARM] Add NEON FP16 arithmetic
 instructions.

2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/iterators.md (VCVTHI): New.
	(NEON_VCMP): Add UNSPEC_VCLT and UNSPEC_VCLE.  Fix a long line.
	(NEON_VAGLTE): New.
	(VFM_LANE_AS): New.
	(VH_CVTTO): New.
	(V_reg): Add HF, V4HF and V8HF.  Fix white-space.
	(V_HALF): Add V4HF.  Fix white-space.
	(V_if_elem): Add HF, V4HF and V8HF.  Fix white-space.
	(V_s_elem): Likewise.
	(V_sz_elem): Fix white-space.
	(V_elem_ch): Likewise.
	(VH_elem_ch): New.
	(scalar_mul_constraint): Add V8HF and V4HF.
	(Is_float_mode): Fix white-space.
	(Is_d_reg): Likewise.
	(q): Add HF.  Fix white-space.
	(float_sup): New.
	(float_SUP): New.
	(cmp_op_unsp): Add UNSPEC_VCALE and UNSPEC_VCALT.
	(neon_vfm_lane_as): New.
	* config/arm/neon.md (add<mode>3_fp16): New.
	(sub<mode>3_fp16): New.
	(mul<mode>3add<mode>_neon): New.
	(*fma<VH:mode>4): New.
	(fma<VH:mode>4_intrinsic): New.
	(fmsub<VCVTF:mode>4_intrinsic): Fix white-space.
	(*fmsub<VH:mode>4): New.
	(fmsub<VH:mode>4_intrinsic): New.
	(<absneg_str><mode>2_fp16): New.
	(neon_v<absneg_str><mode>): New.
	(neon_v<fp16_rnd_str><mode>): New.
	(neon_vsqrte<mode>): New.
	(neon_vpaddv4hf): New.
	(neon_vadd<mode>): New.
	(neon_vsub<mode>): New.
	(neon_vadd<mode>_unspec): New.
	(neon_vsub<mode>_unspec): New.
	(neon_vmulf<mode>): New.
	(neon_vfma<VH:mode>): New.
	(neon_vfms<VH:mode>): New.
	(neon_vc<cmp_op><mode>): New.
	(neon_vc<cmp_op><mode>_fp16insn): New.
	(neon_vc<cmp_op_unsp><mode>_fp16insn_unspec): New.
	(neon_vca<cmp_op><mode>): New.
	(neon_vca<cmp_op><mode>_fp16insn): New.
	(neon_vca<cmp_op_unsp><mode>_fp16insn_unspec): New.
	(neon_vc<cmp_op>z<mode>): New.
	(neon_vabd<mode>): New.
	(neon_v<maxmin>f<mode>): New.
	(neon_vp<maxmin>fv4hf): New.
	(neon_<fmaxmin_op><mode>): New.
	(neon_vrecps<mode>): New.
	(neon_vrsqrts<mode>): New.
	(neon_vrecpe<mode>): New (VH variant).
	(neon_vdup_lane<mode>_internal): New.
	(neon_vdup_lane<mode>): New.
	(neon_vcvt<sup><mode>): New (VCVTHI variant).
	(neon_vcvt<sup><mode>): New (VH variant).
	(neon_vcvt<sup>_n<mode>): New (VH variant).
	(neon_vcvt<sup>_n<mode>): New (VCVTHI variant).
	(neon_vcvt<vcvth_op><sup><mode>): New (VH variant).
	(neon_vmul_lane<mode>): New.
	(neon_vmul_n<mode>): New.
	* config/arm/unspecs.md (UNSPEC_VCALE): New.
	(UNSPEC_VCALT): New.
	(UNSPEC_VFMA_LANE): New.
	(UNSPEC_VFMS_LANE): New.
	(UNSPEC_VSQRTE): New.

testsuite/
2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/arm/armv8_2-fp16-arith-1.c: Add tests for float16x4_t
	and float16x8_t.
---
 gcc/config/arm/iterators.md                        | 121 +++--
 gcc/config/arm/neon.md                             | 503 ++++++++++++++++++++-
 gcc/config/arm/unspecs.md                          |   6 +-
 .../gcc.target/arm/armv8_2-fp16-arith-1.c          |  49 +-
 4 files changed, 621 insertions(+), 58 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 9371b6a..be39e4a 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -145,6 +145,9 @@
 ;; Vector modes form int->float conversions.
 (define_mode_iterator VCVTI [V2SI V4SI])
 
+;; Vector modes for int->half conversions.
+(define_mode_iterator VCVTHI [V4HI V8HI])
+
 ;; Vector modes for doubleword multiply-accumulate, etc. insns.
 (define_mode_iterator VMD [V4HI V2SI V2SF])
 
@@ -267,10 +270,14 @@
 (define_int_iterator VRINT [UNSPEC_VRINTZ UNSPEC_VRINTP UNSPEC_VRINTM
                             UNSPEC_VRINTR UNSPEC_VRINTX UNSPEC_VRINTA])
 
-(define_int_iterator NEON_VCMP [UNSPEC_VCEQ UNSPEC_VCGT UNSPEC_VCGE UNSPEC_VCLT UNSPEC_VCLE])
+(define_int_iterator NEON_VCMP [UNSPEC_VCEQ UNSPEC_VCGT UNSPEC_VCGE
+				UNSPEC_VCLT UNSPEC_VCLE])
 
 (define_int_iterator NEON_VACMP [UNSPEC_VCAGE UNSPEC_VCAGT])
 
+(define_int_iterator NEON_VAGLTE [UNSPEC_VCAGE UNSPEC_VCAGT
+				  UNSPEC_VCALE UNSPEC_VCALT])
+
 (define_int_iterator VCVT [UNSPEC_VRINTP UNSPEC_VRINTM UNSPEC_VRINTA])
 
 (define_int_iterator NEON_VRINT [UNSPEC_NVRINTP UNSPEC_NVRINTZ UNSPEC_NVRINTM
@@ -398,6 +405,8 @@
 
 (define_int_iterator VQRDMLH_AS [UNSPEC_VQRDMLAH UNSPEC_VQRDMLSH])
 
+(define_int_iterator VFM_LANE_AS [UNSPEC_VFMA_LANE UNSPEC_VFMS_LANE])
+
 ;;----------------------------------------------------------------------------
 ;; Mode attributes
 ;;----------------------------------------------------------------------------
@@ -416,6 +425,10 @@
 (define_mode_attr V_cvtto [(V2SI "v2sf") (V2SF "v2si")
                            (V4SI "v4sf") (V4SF "v4si")])
 
+;; (Opposite) mode to convert to/from for vector-half mode conversions.
+(define_mode_attr VH_CVTTO [(V4HI "V4HF") (V4HF "V4HI")
+			    (V8HI "V8HF") (V8HF "V8HI")])
+
 ;; Define element mode for each vector mode.
 (define_mode_attr V_elem [(V8QI "QI") (V16QI "QI")
 			  (V4HI "HI") (V8HI "HI")
@@ -459,12 +472,13 @@
 
 ;; Register width from element mode
 (define_mode_attr V_reg [(V8QI "P") (V16QI "q")
-                         (V4HI "P") (V8HI  "q")
-                         (V4HF "P") (V8HF  "q")
-                         (V2SI "P") (V4SI  "q")
-                         (V2SF "P") (V4SF  "q")
-                         (DI   "P") (V2DI  "q")
-                         (SF   "")  (DF    "P")])
+			 (V4HI "P") (V8HI  "q")
+			 (V4HF "P") (V8HF  "q")
+			 (V2SI "P") (V4SI  "q")
+			 (V2SF "P") (V4SF  "q")
+			 (DI   "P") (V2DI  "q")
+			 (SF   "")  (DF    "P")
+			 (HF   "")])
 
 ;; Wider modes with the same number of elements.
 (define_mode_attr V_widen [(V8QI "V8HI") (V4HI "V4SI") (V2SI "V2DI")])
@@ -480,7 +494,7 @@
 (define_mode_attr V_HALF [(V16QI "V8QI") (V8HI "V4HI")
 			  (V8HF "V4HF") (V4SI  "V2SI")
 			  (V4SF "V2SF") (V2DF "DF")
-                          (V2DI "DI")])
+			  (V2DI "DI") (V4HF "HF")])
 
 ;; Same, but lower-case.
 (define_mode_attr V_half [(V16QI "v8qi") (V8HI "v4hi")
@@ -529,18 +543,22 @@
 ;; Get element type from double-width mode, for operations where we 
 ;; don't care about signedness.
 (define_mode_attr V_if_elem [(V8QI "i8")  (V16QI "i8")
-                 (V4HI "i16") (V8HI  "i16")
-                             (V2SI "i32") (V4SI  "i32")
-                             (DI   "i64") (V2DI  "i64")
-                 (V2SF "f32") (V4SF  "f32")
-                 (SF "f32") (DF "f64")])
+			     (V4HI "i16") (V8HI  "i16")
+			     (V2SI "i32") (V4SI  "i32")
+			     (DI   "i64") (V2DI  "i64")
+			     (V2SF "f32") (V4SF  "f32")
+			     (SF   "f32") (DF    "f64")
+			     (HF   "f16") (V4HF  "f16")
+			     (V8HF "f16")])
 
 ;; Same, but for operations which work on signed values.
 (define_mode_attr V_s_elem [(V8QI "s8")  (V16QI "s8")
-                (V4HI "s16") (V8HI  "s16")
-                            (V2SI "s32") (V4SI  "s32")
-                            (DI   "s64") (V2DI  "s64")
-                (V2SF "f32") (V4SF  "f32")])
+			    (V4HI "s16") (V8HI  "s16")
+			    (V2SI "s32") (V4SI  "s32")
+			    (DI   "s64") (V2DI  "s64")
+			    (V2SF "f32") (V4SF  "f32")
+			    (HF   "f16") (V4HF  "f16")
+			    (V8HF "f16")])
 
 ;; Same, but for operations which work on unsigned values.
 (define_mode_attr V_u_elem [(V8QI "u8")  (V16QI "u8")
@@ -557,17 +575,22 @@
                              (V2SF "32") (V4SF "32")])
 
 (define_mode_attr V_sz_elem [(V8QI "8")  (V16QI "8")
-                 (V4HI "16") (V8HI  "16")
-                             (V2SI "32") (V4SI  "32")
-                             (DI   "64") (V2DI  "64")
+			     (V4HI "16") (V8HI  "16")
+			     (V2SI "32") (V4SI  "32")
+			     (DI   "64") (V2DI  "64")
 			     (V4HF "16") (V8HF "16")
-                 (V2SF "32") (V4SF  "32")])
+			     (V2SF "32") (V4SF  "32")])
 
 (define_mode_attr V_elem_ch [(V8QI "b")  (V16QI "b")
-                             (V4HI "h") (V8HI  "h")
-                             (V2SI "s") (V4SI  "s")
-                             (DI   "d") (V2DI  "d")
-                             (V2SF "s") (V4SF  "s")])
+			     (V4HI "h") (V8HI  "h")
+			     (V2SI "s") (V4SI  "s")
+			     (DI   "d") (V2DI  "d")
+			     (V2SF "s") (V4SF  "s")])
+
+(define_mode_attr VH_elem_ch [(V4HI "s") (V8HI  "s")
+			      (V4HF "s") (V8HF  "s")
+			      (HF "s")])
 
 ;; Element sizes for duplicating ARM registers to all elements of a vector.
 (define_mode_attr VD_dup [(V8QI "8") (V4HI "16") (V2SI "32") (V2SF "32")])
@@ -603,16 +626,17 @@
 ;; This mode attribute is used to obtain the correct register constraints.
 
 (define_mode_attr scalar_mul_constraint [(V4HI "x") (V2SI "t") (V2SF "t")
-                                         (V8HI "x") (V4SI "t") (V4SF "t")])
+					 (V8HI "x") (V4SI "t") (V4SF "t")
+					 (V8HF "x") (V4HF "x")])
 
 ;; Predicates used for setting type for neon instructions
 
 (define_mode_attr Is_float_mode [(V8QI "false") (V16QI "false")
-                 (V4HI "false") (V8HI "false")
-                 (V2SI "false") (V4SI "false")
-                 (V4HF "true") (V8HF "true")
-                 (V2SF "true") (V4SF "true")
-                 (DI "false") (V2DI "false")])
+				 (V4HI "false") (V8HI "false")
+				 (V2SI "false") (V4SI "false")
+				 (V4HF "true") (V8HF "true")
+				 (V2SF "true") (V4SF "true")
+				 (DI "false") (V2DI "false")])
 
 (define_mode_attr Scalar_mul_8_16 [(V8QI "true") (V16QI "true")
 				   (V4HI "true") (V8HI "true")
@@ -621,10 +645,10 @@
 				   (DI "false") (V2DI "false")])
 
 (define_mode_attr Is_d_reg [(V8QI "true") (V16QI "false")
-                            (V4HI "true") (V8HI  "false")
-                            (V2SI "true") (V4SI  "false")
-                            (V2SF "true") (V4SF  "false")
-                            (DI   "true") (V2DI  "false")
+			    (V4HI "true") (V8HI  "false")
+			    (V2SI "true") (V4SI  "false")
+			    (V2SF "true") (V4SF  "false")
+			    (DI   "true") (V2DI  "false")
 			    (V4HF "true") (V8HF  "false")])
 
 (define_mode_attr V_mode_nunits [(V8QI "8") (V16QI "16")
@@ -670,12 +694,14 @@
 
 ;; Mode attribute used to build the "type" attribute.
 (define_mode_attr q [(V8QI "") (V16QI "_q")
-                     (V4HI "") (V8HI "_q")
-                     (V2SI "") (V4SI "_q")
+		     (V4HI "") (V8HI "_q")
+		     (V2SI "") (V4SI "_q")
 		     (V4HF "") (V8HF "_q")
-                     (V2SF "") (V4SF "_q")
-                     (DI "")   (V2DI "_q")
-                     (DF "")   (V2DF "_q")])
+		     (V2SF "") (V4SF "_q")
+		     (DI "")   (V2DI "_q")
+		     (DF "")   (V2DF "_q")
+		     (HF "")])
 
 (define_mode_attr pf [(V8QI "p") (V16QI "p") (V2SF "f") (V4SF "f")])
 
@@ -718,6 +744,10 @@
 ;; Conversions.
 (define_code_attr FCVTI32typename [(unsigned_float "u32") (float "s32")])
 
+(define_code_attr float_sup [(unsigned_float "u") (float "s")])
+
+(define_code_attr float_SUP [(unsigned_float "U") (float "S")])
+
 ;;----------------------------------------------------------------------------
 ;; Int attributes
 ;;----------------------------------------------------------------------------
@@ -790,9 +820,10 @@
    (UNSPEC_VRNDP "vrintp") (UNSPEC_VRNDX "vrintx")])
 
 (define_int_attr cmp_op_unsp [(UNSPEC_VCEQ "eq") (UNSPEC_VCGT "gt")
-                              (UNSPEC_VCGE "ge") (UNSPEC_VCLE "le")
-                              (UNSPEC_VCLT "lt") (UNSPEC_VCAGE "ge")
-                              (UNSPEC_VCAGT "gt")])
+			      (UNSPEC_VCGE "ge") (UNSPEC_VCLE "le")
+			      (UNSPEC_VCLT "lt") (UNSPEC_VCAGE "ge")
+			      (UNSPEC_VCAGT "gt") (UNSPEC_VCALE "le")
+			      (UNSPEC_VCALT "lt")])
 
 (define_int_attr r [
   (UNSPEC_VRHADD_S "r") (UNSPEC_VRHADD_U "r")
@@ -908,3 +939,7 @@
 
 ;; Attributes for VQRDMLAH/VQRDMLSH
 (define_int_attr neon_rdma_as [(UNSPEC_VQRDMLAH "a") (UNSPEC_VQRDMLSH "s")])
+
+;; Attributes for VFMA_LANE/ VFMS_LANE
+(define_int_attr neon_vfm_lane_as
+ [(UNSPEC_VFMA_LANE "a") (UNSPEC_VFMS_LANE "s")])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 5fcc991..7a44f5f 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -505,6 +505,20 @@
                     (const_string "neon_add<q>")))]
 )
 
+(define_insn "add<mode>3_fp16"
+  [(set
+    (match_operand:VH 0 "s_register_operand" "=w")
+    (plus:VH
+     (match_operand:VH 1 "s_register_operand" "w")
+     (match_operand:VH 2 "s_register_operand" "w")))]
+ "TARGET_NEON_FP16INST"
+ "vadd.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set (attr "type")
+   (if_then_else (match_test "<Is_float_mode>")
+    (const_string "neon_fp_addsub_s<q>")
+    (const_string "neon_add<q>")))]
+)
+
 (define_insn "adddi3_neon"
   [(set (match_operand:DI 0 "s_register_operand" "=w,?&r,?&r,?w,?&r,?&r,?&r")
         (plus:DI (match_operand:DI 1 "s_register_operand" "%w,0,0,w,r,0,r")
@@ -543,6 +557,17 @@
                     (const_string "neon_sub<q>")))]
 )
 
+(define_insn "sub<mode>3_fp16"
+ [(set
+   (match_operand:VH 0 "s_register_operand" "=w")
+   (minus:VH
+    (match_operand:VH 1 "s_register_operand" "w")
+    (match_operand:VH 2 "s_register_operand" "w")))]
+ "TARGET_NEON_FP16INST"
+ "vsub.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_sub<q>")]
+)
+
 (define_insn "subdi3_neon"
   [(set (match_operand:DI 0 "s_register_operand" "=w,?&r,?&r,?&r,?w")
         (minus:DI (match_operand:DI 1 "s_register_operand" "w,0,r,0,w")
@@ -591,6 +616,16 @@
 		    (const_string "neon_mla_<V_elem_ch><q>")))]
 )
 
+(define_insn "mul<mode>3add<mode>_neon"
+  [(set (match_operand:VH 0 "s_register_operand" "=w")
+	(plus:VH (mult:VH (match_operand:VH 2 "s_register_operand" "w")
+			  (match_operand:VH 3 "s_register_operand" "w"))
+		  (match_operand:VH 1 "s_register_operand" "0")))]
+  "TARGET_NEON_FP16INST && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+  "vmla.f16\t%<V_reg>0, %<V_reg>2, %<V_reg>3"
+  [(set_attr "type" "neon_fp_mla_s<q>")]
+)
+
 (define_insn "mul<mode>3neg<mode>add<mode>_neon"
   [(set (match_operand:VDQW 0 "s_register_operand" "=w")
         (minus:VDQW (match_operand:VDQW 1 "s_register_operand" "0")
@@ -629,6 +664,28 @@
   [(set_attr "type" "neon_fp_mla_s<q>")]
 )
 
+(define_insn "*fma<VH:mode>4"
+  [(set (match_operand:VH 0 "register_operand" "=w")
+    (fma:VH
+     (match_operand:VH 1 "register_operand" "w")
+     (match_operand:VH 2 "register_operand" "w")
+     (match_operand:VH 3 "register_operand" "0")))]
+ "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
+ "vfma.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_mla_s<q>")]
+)
+
+(define_insn "fma<VH:mode>4_intrinsic"
+ [(set (match_operand:VH 0 "register_operand" "=w")
+   (fma:VH
+    (match_operand:VH 1 "register_operand" "w")
+    (match_operand:VH 2 "register_operand" "w")
+    (match_operand:VH 3 "register_operand" "0")))]
+ "TARGET_NEON_FP16INST"
+ "vfma.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_mla_s<q>")]
+)
+
 (define_insn "*fmsub<VCVTF:mode>4"
   [(set (match_operand:VCVTF 0 "register_operand" "=w")
         (fma:VCVTF (neg:VCVTF (match_operand:VCVTF 1 "register_operand" "w"))
@@ -640,13 +697,36 @@
 )
 
 (define_insn "fmsub<VCVTF:mode>4_intrinsic"
-  [(set (match_operand:VCVTF 0 "register_operand" "=w")
-        (fma:VCVTF (neg:VCVTF (match_operand:VCVTF 1 "register_operand" "w"))
-		   (match_operand:VCVTF 2 "register_operand" "w")
-		   (match_operand:VCVTF 3 "register_operand" "0")))]
-  "TARGET_NEON && TARGET_FMA"
-  "vfms%?.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
-  [(set_attr "type" "neon_fp_mla_s<q>")]
+ [(set (match_operand:VCVTF 0 "register_operand" "=w")
+   (fma:VCVTF
+    (neg:VCVTF (match_operand:VCVTF 1 "register_operand" "w"))
+    (match_operand:VCVTF 2 "register_operand" "w")
+    (match_operand:VCVTF 3 "register_operand" "0")))]
+ "TARGET_NEON && TARGET_FMA"
+ "vfms%?.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_mla_s<q>")]
+)
+
+(define_insn "*fmsub<VH:mode>4"
+ [(set (match_operand:VH 0 "register_operand" "=w")
+   (fma:VH
+    (neg:VH (match_operand:VH 1 "register_operand" "w"))
+    (match_operand:VH 2 "register_operand" "w")
+    (match_operand:VH 3 "register_operand" "0")))]
+ "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
+ "vfms.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_mla_s<q>")]
+)
+
+(define_insn "fmsub<VH:mode>4_intrinsic"
+ [(set (match_operand:VH 0 "register_operand" "=w")
+   (fma:VH
+    (neg:VH (match_operand:VH 1 "register_operand" "w"))
+    (match_operand:VH 2 "register_operand" "w")
+    (match_operand:VH 3 "register_operand" "0")))]
+ "TARGET_NEON_FP16INST"
+ "vfms.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_mla_s<q>")]
 )
 
 (define_insn "neon_vrint<NEON_VRINT:nvrint_variant><VCVTF:mode>"
@@ -860,6 +940,44 @@
   ""
 )
 
+(define_insn "<absneg_str><mode>2_fp16"
+  [(set (match_operand:VH 0 "s_register_operand" "=w")
+    (ABSNEG:VH (match_operand:VH 1 "s_register_operand" "w")))]
+ "TARGET_NEON_FP16INST"
+ "v<absneg_str>.<V_s_elem>\t%<V_reg>0, %<V_reg>1"
+ [(set_attr "type" "neon_abs<q>")]
+)
+
+(define_expand "neon_v<absneg_str><mode>"
+ [(set
+   (match_operand:VH 0 "s_register_operand")
+   (ABSNEG:VH (match_operand:VH 1 "s_register_operand")))]
+ "TARGET_NEON_FP16INST"
+{
+  emit_insn (gen_<absneg_str><mode>2_fp16 (operands[0], operands[1]));
+  DONE;
+})
+
+(define_insn "neon_v<fp16_rnd_str><mode>"
+  [(set (match_operand:VH 0 "s_register_operand" "=w")
+    (unspec:VH
+     [(match_operand:VH 1 "s_register_operand" "w")]
+     FP16_RND))]
+ "TARGET_NEON_FP16INST"
+ "<fp16_rnd_insn>.<V_s_elem>\t%<V_reg>0, %<V_reg>1"
+ [(set_attr "type" "neon_fp_round_s<q>")]
+)
+
+(define_insn "neon_vsqrte<mode>"
+  [(set (match_operand:VH 0 "s_register_operand" "=w")
+    (unspec:VH
+     [(match_operand:VH 1 "s_register_operand" "w")]
+     UNSPEC_VSQRTE))]
+  "TARGET_NEON_FP16INST"
+  "vsqrte.f16\t%<V_reg>0, %<V_reg>1"
+ [(set_attr "type" "neon_fp_rsqrte_s<q>")]
+)
+
 (define_insn "*umin<mode>3_neon"
   [(set (match_operand:VDQIW 0 "s_register_operand" "=w")
 	(umin:VDQIW (match_operand:VDQIW 1 "s_register_operand" "w")
@@ -1601,6 +1719,17 @@
                     (const_string "neon_reduc_add<q>")))]
 )
 
+(define_insn "neon_vpaddv4hf"
+ [(set
+   (match_operand:V4HF 0 "s_register_operand" "=w")
+   (unspec:V4HF [(match_operand:V4HF 1 "s_register_operand" "w")
+		 (match_operand:V4HF 2 "s_register_operand" "w")]
+    UNSPEC_VPADD))]
+ "TARGET_NEON_FP16INST"
+ "vpadd.f16\t%P0, %P1, %P2"
+ [(set_attr "type" "neon_reduc_add")]
+)
+
 (define_insn "neon_vpsmin<mode>"
   [(set (match_operand:VD 0 "s_register_operand" "=w")
 	(unspec:VD [(match_operand:VD 1 "s_register_operand" "w")
@@ -1949,6 +2078,26 @@
   DONE;
 })
 
+(define_expand "neon_vadd<mode>"
+  [(match_operand:VH 0 "s_register_operand")
+   (match_operand:VH 1 "s_register_operand")
+   (match_operand:VH 2 "s_register_operand")]
+  "TARGET_NEON_FP16INST"
+{
+  emit_insn (gen_add<mode>3_fp16 (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "neon_vsub<mode>"
+  [(match_operand:VH 0 "s_register_operand")
+   (match_operand:VH 1 "s_register_operand")
+   (match_operand:VH 2 "s_register_operand")]
+  "TARGET_NEON_FP16INST"
+{
+  emit_insn (gen_sub<mode>3_fp16 (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 ; Note that NEON operations don't support the full IEEE 754 standard: in
 ; particular, denormal values are flushed to zero.  This means that GCC cannot
 ; use those instructions for autovectorization, etc. unless
@@ -1974,6 +2123,30 @@
                     (const_string "neon_add<q>")))]
 )
 
+(define_insn "neon_vadd<mode>_unspec"
+  [(set
+    (match_operand:VH 0 "s_register_operand" "=w")
+    (unspec:VH
+     [(match_operand:VH 1 "s_register_operand" "w")
+      (match_operand:VH 2 "s_register_operand" "w")]
+     UNSPEC_VADD))]
+ "TARGET_NEON_FP16INST"
+ "vadd.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_add<q>")]
+)
+
+(define_insn "neon_vsub<mode>_unspec"
+  [(set
+    (match_operand:VH 0 "s_register_operand" "=w")
+    (unspec:VH
+     [(match_operand:VH 1 "s_register_operand" "w")
+      (match_operand:VH 2 "s_register_operand" "w")]
+     UNSPEC_VSUB))]
+ "TARGET_NEON_FP16INST"
+ "vsub.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_sub<q>")]
+)
+
 (define_insn "neon_vaddl<sup><mode>"
   [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
         (unspec:<V_widen> [(match_operand:VDI 1 "s_register_operand" "w")
@@ -2040,6 +2213,17 @@
                     (const_string "neon_mul_<V_elem_ch><q>")))]
 )
 
+(define_insn "neon_vmulf<mode>"
+ [(set
+   (match_operand:VH 0 "s_register_operand" "=w")
+   (mult:VH
+    (match_operand:VH 1 "s_register_operand" "w")
+    (match_operand:VH 2 "s_register_operand" "w")))]
+  "TARGET_NEON_FP16INST"
+  "vmul.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_mul_<VH_elem_ch><q>")]
+)
+
 (define_expand "neon_vmla<mode>"
   [(match_operand:VDQW 0 "s_register_operand" "=w")
    (match_operand:VDQW 1 "s_register_operand" "0")
@@ -2068,6 +2252,18 @@
   DONE;
 })
 
+(define_expand "neon_vfma<VH:mode>"
+  [(match_operand:VH 0 "s_register_operand")
+   (match_operand:VH 1 "s_register_operand")
+   (match_operand:VH 2 "s_register_operand")
+   (match_operand:VH 3 "s_register_operand")]
+  "TARGET_NEON_FP16INST"
+{
+  emit_insn (gen_fma<mode>4_intrinsic (operands[0], operands[2], operands[3],
+				       operands[1]));
+  DONE;
+})
+
 (define_expand "neon_vfms<VCVTF:mode>"
   [(match_operand:VCVTF 0 "s_register_operand")
    (match_operand:VCVTF 1 "s_register_operand")
@@ -2080,6 +2276,18 @@
   DONE;
 })
 
+(define_expand "neon_vfms<VH:mode>"
+  [(match_operand:VH 0 "s_register_operand")
+   (match_operand:VH 1 "s_register_operand")
+   (match_operand:VH 2 "s_register_operand")
+   (match_operand:VH 3 "s_register_operand")]
+  "TARGET_NEON_FP16INST"
+{
+  emit_insn (gen_fmsub<mode>4_intrinsic (operands[0], operands[2], operands[3],
+					 operands[1]));
+  DONE;
+})
+
 ; Used for intrinsics when flag_unsafe_math_optimizations is false.
 
 (define_insn "neon_vmla<mode>_unspec"
@@ -2380,6 +2588,72 @@
   [(set_attr "type" "neon_fp_compare_s<q>")]
 )
 
+(define_expand "neon_vc<cmp_op><mode>"
+ [(match_operand:<V_cmp_result> 0 "s_register_operand")
+  (neg:<V_cmp_result>
+   (COMPARISONS:VH
+    (match_operand:VH 1 "s_register_operand")
+    (match_operand:VH 2 "reg_or_zero_operand")))]
+ "TARGET_NEON_FP16INST"
+{
+  /* For FP comparisons use UNSPECs unless -funsafe-math-optimizations
+     is enabled.  */
+  if (GET_MODE_CLASS (<MODE>mode) == MODE_VECTOR_FLOAT
+      && !flag_unsafe_math_optimizations)
+    emit_insn
+      (gen_neon_vc<cmp_op><mode>_fp16insn_unspec
+       (operands[0], operands[1], operands[2]));
+  else
+    emit_insn
+      (gen_neon_vc<cmp_op><mode>_fp16insn
+       (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_insn "neon_vc<cmp_op><mode>_fp16insn"
+ [(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w,w")
+   (neg:<V_cmp_result>
+    (COMPARISONS:<V_cmp_result>
+     (match_operand:VH 1 "s_register_operand" "w,w")
+     (match_operand:VH 2 "reg_or_zero_operand" "w,Dz"))))]
+ "TARGET_NEON_FP16INST
+  && !(GET_MODE_CLASS (<MODE>mode) == MODE_VECTOR_FLOAT
+  && !flag_unsafe_math_optimizations)"
+{
+  char pattern[100];
+  sprintf (pattern, "vc<cmp_op>.%s%%#<V_sz_elem>\t%%<V_reg>0,"
+	   " %%<V_reg>1, %s",
+	   GET_MODE_CLASS (<MODE>mode) == MODE_VECTOR_FLOAT
+	   ? "f" : "<cmp_type>",
+	   which_alternative == 0
+	   ? "%<V_reg>2" : "#0");
+  output_asm_insn (pattern, operands);
+  return "";
+}
+ [(set (attr "type")
+   (if_then_else (match_operand 2 "zero_operand")
+    (const_string "neon_compare_zero<q>")
+    (const_string "neon_compare<q>")))])
+
+(define_insn "neon_vc<cmp_op_unsp><mode>_fp16insn_unspec"
+ [(set
+   (match_operand:<V_cmp_result> 0 "s_register_operand" "=w,w")
+   (unspec:<V_cmp_result>
+    [(match_operand:VH 1 "s_register_operand" "w,w")
+     (match_operand:VH 2 "reg_or_zero_operand" "w,Dz")]
+    NEON_VCMP))]
+ "TARGET_NEON_FP16INST"
+{
+  char pattern[100];
+  sprintf (pattern, "vc<cmp_op_unsp>.f%%#<V_sz_elem>\t%%<V_reg>0,"
+	   " %%<V_reg>1, %s",
+	   which_alternative == 0
+	   ? "%<V_reg>2" : "#0");
+  output_asm_insn (pattern, operands);
+  return "";
+}
+ [(set_attr "type" "neon_fp_compare_s<q>")])
+
 (define_insn "neon_vc<cmp_op>u<mode>"
   [(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w")
         (neg:<V_cmp_result>
@@ -2431,6 +2705,60 @@
   [(set_attr "type" "neon_fp_compare_s<q>")]
 )
 
+(define_expand "neon_vca<cmp_op><mode>"
+  [(set
+    (match_operand:<V_cmp_result> 0 "s_register_operand")
+    (neg:<V_cmp_result>
+     (GLTE:<V_cmp_result>
+      (abs:VH (match_operand:VH 1 "s_register_operand"))
+      (abs:VH (match_operand:VH 2 "s_register_operand")))))]
+ "TARGET_NEON_FP16INST"
+{
+  if (flag_unsafe_math_optimizations)
+    emit_insn (gen_neon_vca<cmp_op><mode>_fp16insn
+	       (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_neon_vca<cmp_op><mode>_fp16insn_unspec
+	       (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_insn "neon_vca<cmp_op><mode>_fp16insn"
+  [(set
+    (match_operand:<V_cmp_result> 0 "s_register_operand" "=w")
+    (neg:<V_cmp_result>
+     (GLTE:<V_cmp_result>
+      (abs:VH (match_operand:VH 1 "s_register_operand" "w"))
+      (abs:VH (match_operand:VH 2 "s_register_operand" "w")))))]
+ "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
+ "vac<cmp_op>.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_compare_s<q>")]
+)
+
+(define_insn "neon_vca<cmp_op_unsp><mode>_fp16insn_unspec"
+ [(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w")
+   (unspec:<V_cmp_result>
+    [(match_operand:VH 1 "s_register_operand" "w")
+     (match_operand:VH 2 "s_register_operand" "w")]
+    NEON_VAGLTE))]
+ "TARGET_NEON"
+ "vac<cmp_op_unsp>.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_compare_s<q>")]
+)
+
+(define_expand "neon_vc<cmp_op>z<mode>"
+ [(set
+   (match_operand:<V_cmp_result> 0 "s_register_operand")
+   (COMPARISONS:<V_cmp_result>
+    (match_operand:VH 1 "s_register_operand")
+    (const_int 0)))]
+ "TARGET_NEON_FP16INST"
+ {
+  emit_insn (gen_neon_vc<cmp_op><mode> (operands[0], operands[1],
+					CONST0_RTX (<MODE>mode)));
+  DONE;
+})
+
 (define_insn "neon_vtst<mode>"
   [(set (match_operand:VDQIW 0 "s_register_operand" "=w")
         (unspec:VDQIW [(match_operand:VDQIW 1 "s_register_operand" "w")
@@ -2451,6 +2779,16 @@
   [(set_attr "type" "neon_abd<q>")]
 )
 
+(define_insn "neon_vabd<mode>"
+  [(set (match_operand:VH 0 "s_register_operand" "=w")
+    (unspec:VH [(match_operand:VH 1 "s_register_operand" "w")
+		(match_operand:VH 2 "s_register_operand" "w")]
+     UNSPEC_VABD_F))]
+ "TARGET_NEON_FP16INST"
+ "vabd.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+  [(set_attr "type" "neon_abd<q>")]
+)
+
 (define_insn "neon_vabdf<mode>"
   [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
         (unspec:VCVTF [(match_operand:VCVTF 1 "s_register_operand" "w")
@@ -2513,6 +2851,40 @@
   [(set_attr "type" "neon_fp_minmax_s<q>")]
 )
 
+(define_insn "neon_v<maxmin>f<mode>"
+ [(set (match_operand:VH 0 "s_register_operand" "=w")
+   (unspec:VH
+    [(match_operand:VH 1 "s_register_operand" "w")
+     (match_operand:VH 2 "s_register_operand" "w")]
+    VMAXMINF))]
+ "TARGET_NEON_FP16INST"
+ "v<maxmin>.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_minmax_s<q>")]
+)
+
+(define_insn "neon_vp<maxmin>fv4hf"
+ [(set (match_operand:V4HF 0 "s_register_operand" "=w")
+   (unspec:V4HF
+    [(match_operand:V4HF 1 "s_register_operand" "w")
+     (match_operand:V4HF 2 "s_register_operand" "w")]
+    VPMAXMINF))]
+ "TARGET_NEON_FP16INST"
+ "vp<maxmin>.f16\t%P0, %P1, %P2"
+  [(set_attr "type" "neon_reduc_minmax")]
+)
+
+(define_insn "neon_<fmaxmin_op><mode>"
+ [(set
+   (match_operand:VH 0 "s_register_operand" "=w")
+   (unspec:VH
+    [(match_operand:VH 1 "s_register_operand" "w")
+     (match_operand:VH 2 "s_register_operand" "w")]
+    VMAXMINFNM))]
+ "TARGET_NEON_FP16INST"
+ "<fmaxmin_op>.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_minmax_s<q>")]
+)
+
 ;; Vector forms for the IEEE-754 fmax()/fmin() functions
 (define_insn "<fmaxmin><mode>3"
   [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
@@ -2584,6 +2956,17 @@
   [(set_attr "type" "neon_fp_recps_s<q>")]
 )
 
+(define_insn "neon_vrecps<mode>"
+  [(set
+    (match_operand:VH 0 "s_register_operand" "=w")
+    (unspec:VH [(match_operand:VH 1 "s_register_operand" "w")
+		(match_operand:VH 2 "s_register_operand" "w")]
+     UNSPEC_VRECPS))]
+  "TARGET_NEON_FP16INST"
+  "vrecps.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+  [(set_attr "type" "neon_fp_recps_s<q>")]
+)
+
 (define_insn "neon_vrsqrts<mode>"
   [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
         (unspec:VCVTF [(match_operand:VCVTF 1 "s_register_operand" "w")
@@ -2594,6 +2977,17 @@
   [(set_attr "type" "neon_fp_rsqrts_s<q>")]
 )
 
+(define_insn "neon_vrsqrts<mode>"
+  [(set
+    (match_operand:VH 0 "s_register_operand" "=w")
+    (unspec:VH [(match_operand:VH 1 "s_register_operand" "w")
+		 (match_operand:VH 2 "s_register_operand" "w")]
+     UNSPEC_VRSQRTS))]
+ "TARGET_NEON_FP16INST"
+ "vrsqrts.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_rsqrts_s<q>")]
+)
+
 (define_expand "neon_vabs<mode>"
   [(match_operand:VDQW 0 "s_register_operand" "")
    (match_operand:VDQW 1 "s_register_operand" "")]
@@ -2709,6 +3103,15 @@
 })
 
 (define_insn "neon_vrecpe<mode>"
+  [(set (match_operand:VH 0 "s_register_operand" "=w")
+	(unspec:VH [(match_operand:VH 1 "s_register_operand" "w")]
+		   UNSPEC_VRECPE))]
+  "TARGET_NEON_FP16INST"
+  "vrecpe.f16\t%<V_reg>0, %<V_reg>1"
+  [(set_attr "type" "neon_fp_recpe_s<q>")]
+)
+
+(define_insn "neon_vrecpe<mode>"
   [(set (match_operand:V32 0 "s_register_operand" "=w")
 	(unspec:V32 [(match_operand:V32 1 "s_register_operand" "w")]
                     UNSPEC_VRECPE))]
@@ -3251,6 +3654,28 @@ if (BYTES_BIG_ENDIAN)
   [(set_attr "type" "neon_fp_cvt_narrow_s_q")]
 )
 
+(define_insn "neon_vcvt<sup><mode>"
+ [(set
+   (match_operand:<VH_CVTTO> 0 "s_register_operand" "=w")
+   (unspec:<VH_CVTTO>
+    [(match_operand:VCVTHI 1 "s_register_operand" "w")]
+    VCVT_US))]
+ "TARGET_NEON_FP16INST"
+ "vcvt.f16.<sup>%#16\t%<V_reg>0, %<V_reg>1"
+  [(set_attr "type" "neon_int_to_fp_<VH_elem_ch><q>")]
+)
+
+(define_insn "neon_vcvt<sup><mode>"
+ [(set
+   (match_operand:<VH_CVTTO> 0 "s_register_operand" "=w")
+   (unspec:<VH_CVTTO>
+    [(match_operand:VH 1 "s_register_operand" "w")]
+    VCVT_US))]
+ "TARGET_NEON_FP16INST"
+ "vcvt.<sup>%#16.f16\t%<V_reg>0, %<V_reg>1"
+  [(set_attr "type" "neon_fp_to_int_<VH_elem_ch><q>")]
+)
+
 (define_insn "neon_vcvt<sup>_n<mode>"
   [(set (match_operand:<V_CVTTO> 0 "s_register_operand" "=w")
 	(unspec:<V_CVTTO> [(match_operand:VCVTF 1 "s_register_operand" "w")
@@ -3265,6 +3690,20 @@ if (BYTES_BIG_ENDIAN)
 )
 
 (define_insn "neon_vcvt<sup>_n<mode>"
+ [(set (match_operand:<VH_CVTTO> 0 "s_register_operand" "=w")
+   (unspec:<VH_CVTTO>
+    [(match_operand:VH 1 "s_register_operand" "w")
+     (match_operand:SI 2 "immediate_operand" "i")]
+    VCVT_US_N))]
+  "TARGET_NEON_FP16INST"
+{
+  neon_const_bounds (operands[2], 1, 33);
+  return "vcvt.<sup>%#16.f16\t%<V_reg>0, %<V_reg>1, %2";
+}
+ [(set_attr "type" "neon_fp_to_int_<VH_elem_ch><q>")]
+)
+
+(define_insn "neon_vcvt<sup>_n<mode>"
   [(set (match_operand:<V_CVTTO> 0 "s_register_operand" "=w")
 	(unspec:<V_CVTTO> [(match_operand:VCVTI 1 "s_register_operand" "w")
 			   (match_operand:SI 2 "immediate_operand" "i")]
@@ -3277,6 +3716,31 @@ if (BYTES_BIG_ENDIAN)
   [(set_attr "type" "neon_int_to_fp_<V_elem_ch><q>")]
 )
 
+(define_insn "neon_vcvt<sup>_n<mode>"
+ [(set (match_operand:<VH_CVTTO> 0 "s_register_operand" "=w")
+   (unspec:<VH_CVTTO>
+    [(match_operand:VCVTHI 1 "s_register_operand" "w")
+     (match_operand:SI 2 "immediate_operand" "i")]
+    VCVT_US_N))]
+ "TARGET_NEON_FP16INST"
+{
+  neon_const_bounds (operands[2], 1, 33);
+  return "vcvt.f16.<sup>%#16\t%<V_reg>0, %<V_reg>1, %2";
+}
+ [(set_attr "type" "neon_int_to_fp_<VH_elem_ch><q>")]
+)
+
+(define_insn "neon_vcvt<vcvth_op><sup><mode>"
+ [(set
+   (match_operand:<VH_CVTTO> 0 "s_register_operand" "=w")
+   (unspec:<VH_CVTTO>
+    [(match_operand:VH 1 "s_register_operand" "w")]
+    VCVT_HF_US))]
+ "TARGET_NEON_FP16INST"
+ "vcvt<vcvth_op>.<sup>%#16.f16\t%<V_reg>0, %<V_reg>1"
+  [(set_attr "type" "neon_fp_to_int_<VH_elem_ch><q>")]
+)
+
 (define_insn "neon_vmovn<mode>"
   [(set (match_operand:<V_narrow> 0 "s_register_operand" "=w")
 	(unspec:<V_narrow> [(match_operand:VN 1 "s_register_operand" "w")]
@@ -3347,6 +3811,18 @@ if (BYTES_BIG_ENDIAN)
                    (const_string "neon_mul_<V_elem_ch>_scalar<q>")))]
 )
 
+(define_insn "neon_vmul_lane<mode>"
+  [(set (match_operand:VH 0 "s_register_operand" "=w")
+	(unspec:VH [(match_operand:VH 1 "s_register_operand" "w")
+		    (match_operand:V4HF 2 "s_register_operand"
+		     "<scalar_mul_constraint>")
+		     (match_operand:SI 3 "immediate_operand" "i")]
+		     UNSPEC_VMUL_LANE))]
+  "TARGET_NEON_FP16INST"
+  "vmul.f16\t%<V_reg>0, %<V_reg>1, %P2[%c3]"
+  [(set_attr "type" "neon_fp_mul_s_scalar<q>")]
+)
+
 (define_insn "neon_vmull<sup>_lane<mode>"
   [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
 	(unspec:<V_widen> [(match_operand:VMDI 1 "s_register_operand" "w")
@@ -3601,6 +4077,19 @@ if (BYTES_BIG_ENDIAN)
   DONE;
 })
 
+(define_expand "neon_vmul_n<mode>"
+  [(match_operand:VH 0 "s_register_operand")
+   (match_operand:VH 1 "s_register_operand")
+   (match_operand:<V_elem> 2 "s_register_operand")]
+  "TARGET_NEON_FP16INST"
+{
+  rtx tmp = gen_reg_rtx (V4HFmode);
+  emit_insn (gen_neon_vset_lanev4hf (tmp, operands[2], tmp, const0_rtx));
+  emit_insn (gen_neon_vmul_lane<mode> (operands[0], operands[1], tmp,
+				       const0_rtx));
+  DONE;
+})
+
 (define_expand "neon_vmulls_n<mode>"
   [(match_operand:<V_widen> 0 "s_register_operand" "")
    (match_operand:VMDI 1 "s_register_operand" "")
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 57a47ff..cc5a16a 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -191,6 +191,8 @@
   UNSPEC_VBSL
   UNSPEC_VCAGE
   UNSPEC_VCAGT
+  UNSPEC_VCALE
+  UNSPEC_VCALT
   UNSPEC_VCEQ
   UNSPEC_VCGE
   UNSPEC_VCGEU
@@ -258,6 +260,8 @@
   UNSPEC_VMLSL_S_LANE
   UNSPEC_VMLSL_U_LANE
   UNSPEC_VMLSL_LANE
+  UNSPEC_VFMA_LANE
+  UNSPEC_VFMS_LANE
   UNSPEC_VMOVL_S
   UNSPEC_VMOVL_U
   UNSPEC_VMOVN
@@ -386,5 +390,5 @@
   UNSPEC_VRNDN
   UNSPEC_VRNDP
   UNSPEC_VRNDX
+  UNSPEC_VSQRTE
 ])
-
diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c
index 8399288..029d13c 100644
--- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c
@@ -9,6 +9,9 @@ typedef __fp16 float16_t;
 typedef __simd64_float16_t float16x4_t;
 typedef __simd128_float16_t float16x8_t;
 
+typedef short int16x4_t __attribute__ ((vector_size (8)));
+typedef short int int16x8_t  __attribute__ ((vector_size (16)));
+
 float16_t
 fp16_abs (float16_t a)
 {
@@ -50,15 +53,47 @@ TEST_CMP (greaterthan, >, int, float16_t)
 TEST_CMP (lessthanequal, <=, int, float16_t)
 TEST_CMP (greaterthanequal, >=, int, float16_t)
 
-/* { dg-final { scan-assembler-times {vneg\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+/* Vectors of size 4.  */
+
+TEST_UNOP (neg, -, float16x4_t)
+
+TEST_BINOP (add, +, float16x4_t)
+TEST_BINOP (sub, -, float16x4_t)
+TEST_BINOP (mult, *, float16x4_t)
+TEST_BINOP (div, /, float16x4_t)
+
+TEST_CMP (equal, ==, int16x4_t, float16x4_t)
+TEST_CMP (unequal, !=, int16x4_t, float16x4_t)
+TEST_CMP (lessthan, <, int16x4_t, float16x4_t)
+TEST_CMP (greaterthan, >, int16x4_t, float16x4_t)
+TEST_CMP (lessthanequal, <=, int16x4_t, float16x4_t)
+TEST_CMP (greaterthanequal, >=, int16x4_t, float16x4_t)
+
+/* Vectors of size 8.  */
+
+TEST_UNOP (neg, -, float16x8_t)
+
+TEST_BINOP (add, +, float16x8_t)
+TEST_BINOP (sub, -, float16x8_t)
+TEST_BINOP (mult, *, float16x8_t)
+TEST_BINOP (div, /, float16x8_t)
+
+TEST_CMP (equal, ==, int16x8_t, float16x8_t)
+TEST_CMP (unequal, !=, int16x8_t, float16x8_t)
+TEST_CMP (lessthan, <, int16x8_t, float16x8_t)
+TEST_CMP (greaterthan, >, int16x8_t, float16x8_t)
+TEST_CMP (lessthanequal, <=, int16x8_t, float16x8_t)
+TEST_CMP (greaterthanequal, >=, int16x8_t, float16x8_t)
+
+/* { dg-final { scan-assembler-times {vneg\.f16\ts[0-9]+, s[0-9]+} 13 } }  */
 /* { dg-final { scan-assembler-times {vabs\.f16\ts[0-9]+, s[0-9]+} 2 } }  */
 
-/* { dg-final { scan-assembler-times {vadd\.f32\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
-/* { dg-final { scan-assembler-times {vsub\.f32\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
-/* { dg-final { scan-assembler-times {vmul\.f32\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
-/* { dg-final { scan-assembler-times {vdiv\.f32\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
-/* { dg-final { scan-assembler-times {vcmp\.f32\ts[0-9]+, s[0-9]+} 2 } }  */
-/* { dg-final { scan-assembler-times {vcmpe\.f32\ts[0-9]+, s[0-9]+} 4 } }  */
+/* { dg-final { scan-assembler-times {vadd\.f32\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } }  */
+/* { dg-final { scan-assembler-times {vsub\.f32\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } }  */
+/* { dg-final { scan-assembler-times {vmul\.f32\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } }  */
+/* { dg-final { scan-assembler-times {vdiv\.f32\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } }  */
+/* { dg-final { scan-assembler-times {vcmp\.f32\ts[0-9]+, s[0-9]+} 26 } }  */
+/* { dg-final { scan-assembler-times {vcmpe\.f32\ts[0-9]+, s[0-9]+} 52 } }  */
 
 /* { dg-final { scan-assembler-not {vadd\.f16} } }  */
 /* { dg-final { scan-assembler-not {vsub\.f16} } }  */
-- 
2.1.4


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 10/17][ARM] Refactor support code for NEON builtins.
  2016-05-17 14:20 [PATCH 0/17][ARM] ARMv8.2-A and FP16 extension support Matthew Wahab
                   ` (8 preceding siblings ...)
  2016-05-17 14:37 ` [PATCH 9/17][ARM] Add NEON " Matthew Wahab
@ 2016-05-17 14:39 ` Matthew Wahab
  2016-07-28 11:54   ` Ramana Radhakrishnan
  2016-05-17 14:41 ` [PATCH 11/17][ARM] Add builtins for VFP FP16 intrinsics Matthew Wahab
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 73+ messages in thread
From: Matthew Wahab @ 2016-05-17 14:39 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1201 bytes --]

The ACLE intrinsics introduced to support the ARMv8.2-A FP16 extension
require that intrinsics for scalar (VFP) instructions are available
under different conditions from those for the NEON intrinsics. To
support this, changes to the builtins support code are needed to enable
the scalar intrinsics to be initialized and expanded independently of
the NEON intrinsics.
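
For example (a sketch; vaddh_f16 and vadd_f16 are intrinsics added
later in this series, exposed through arm_fp16.h and arm_neon.h
respectively):

  #include <arm_fp16.h>
  #include <arm_neon.h>

  /* Requires only the scalar (VFP) FP16 instructions; NEON is not
     needed.  */
  float16_t
  scalar_add (float16_t a, float16_t b)
  {
    return vaddh_f16 (a, b);
  }

  /* Requires NEON with the FP16 extension.  */
  float16x4_t
  vector_add (float16x4_t a, float16x4_t b)
  {
    return vadd_f16 (a, b);
  }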

This patch prepares for this by refactoring some of the builtin support
code so that it can be used for both the scalar and the NEON intrinsics.
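
A sketch of the intended reuse (vfp_builtin_data is a placeholder name
here; the real scalar builtin table arrives in a later patch):

  /* Placeholder table of scalar (VFP) builtins.  */
  static neon_builtin_datum vfp_builtin_data[] = { /* ...  */ };

  /* With the loop body factored out into arm_init_neon_builtin, the
     scalar builtins can be initialized from their own table,
     independently of the NEON table.  */
  for (unsigned int i = 0; i < ARRAY_SIZE (vfp_builtin_data); i++, fcode++)
    arm_init_neon_builtin (fcode, &vfp_builtin_data[i]);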

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/arm-builtins.c (ARM_BUILTIN_NEON_PATTERN_START):
	Change offset calculation.
	(arm_init_neon_builtin): New.
	(arm_init_builtins): Move body of a loop to the standalone
	function arm_init_neon_builtin.
	(arm_expand_neon_builtin_1): New.  Update comment.  Function body
	moved from arm_expand_neon_builtin with some white-space fixes.
	(arm_expand_neon_builtin): Move code into the standalone function
	arm_expand_neon_builtin_1.


[-- Attachment #2: 0010-PATCH-10-17-ARM-Refactor-support-code-for-NEON-built.patch --]
[-- Type: text/x-patch, Size: 13208 bytes --]

From 01aee04d2dc6d2d089407ab14892164417f8407e Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 13:36:09 +0100
Subject: [PATCH 10/17] [PATCH 10/17][ARM] Refactor support code for NEON
 builtins.

2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/arm-builtins.c (ARM_BUILTIN_NEON_PATTERN_START):
	Change offset calculation.
	(arm_init_neon_builtin): New.
	(arm_init_builtins): Move body of a loop to the standalone
	function arm_init_neon_builtin.
	(arm_expand_neon_builtin_1): New.  Update comment.  Function body
	moved from arm_expand_neon_builtin with some white-space fixes.
	(arm_expand_neon_builtin): Move code into the standalone function
	arm_expand_neon_builtin_1.
---
 gcc/config/arm/arm-builtins.c | 292 +++++++++++++++++++++++-------------------
 1 file changed, 158 insertions(+), 134 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 90fb40f..5a22b91 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -543,7 +543,7 @@ enum arm_builtins
 };
 
 #define ARM_BUILTIN_NEON_PATTERN_START \
-    (ARM_BUILTIN_MAX - ARRAY_SIZE (neon_builtin_data))
+  (ARM_BUILTIN_NEON_BASE + 1)
 
 #undef CF
 #undef VAR1
@@ -895,6 +895,110 @@ arm_init_simd_builtin_scalar_types (void)
 					     "__builtin_neon_uti");
 }
 
+/* Set up a NEON builtin.  */
+
+static void
+arm_init_neon_builtin (unsigned int fcode,
+		       neon_builtin_datum *d)
+{
+  bool print_type_signature_p = false;
+  char type_signature[SIMD_MAX_BUILTIN_ARGS] = { 0 };
+  char namebuf[60];
+  tree ftype = NULL;
+  tree fndecl = NULL;
+
+  d->fcode = fcode;
+
+  /* We must track two variables here.  op_num is
+     the operand number as in the RTL pattern.  This is
+     required to access the mode (e.g. V4SF mode) of the
+     argument, from which the base type can be derived.
+     arg_num is an index in to the qualifiers data, which
+     gives qualifiers to the type (e.g. const unsigned).
+     The reason these two variables may differ by one is the
+     void return type.  While all return types take the 0th entry
+     in the qualifiers array, there is no operand for them in the
+     RTL pattern.  */
+  int op_num = insn_data[d->code].n_operands - 1;
+  int arg_num = d->qualifiers[0] & qualifier_void
+    ? op_num + 1
+    : op_num;
+  tree return_type = void_type_node, args = void_list_node;
+  tree eltype;
+
+  /* Build a function type directly from the insn_data for this
+     builtin.  The build_function_type () function takes care of
+     removing duplicates for us.  */
+  for (; op_num >= 0; arg_num--, op_num--)
+    {
+      machine_mode op_mode = insn_data[d->code].operand[op_num].mode;
+      enum arm_type_qualifiers qualifiers = d->qualifiers[arg_num];
+
+      if (qualifiers & qualifier_unsigned)
+	{
+	  type_signature[arg_num] = 'u';
+	  print_type_signature_p = true;
+	}
+      else if (qualifiers & qualifier_poly)
+	{
+	  type_signature[arg_num] = 'p';
+	  print_type_signature_p = true;
+	}
+      else
+	type_signature[arg_num] = 's';
+
+      /* Skip an internal operand for vget_{low, high}.  */
+      if (qualifiers & qualifier_internal)
+	continue;
+
+      /* Some builtins have different user-facing types
+	 for certain arguments, encoded in d->mode.  */
+      if (qualifiers & qualifier_map_mode)
+	op_mode = d->mode;
+
+      /* For pointers, we want a pointer to the basic type
+	 of the vector.  */
+      if (qualifiers & qualifier_pointer && VECTOR_MODE_P (op_mode))
+	op_mode = GET_MODE_INNER (op_mode);
+
+      eltype = arm_simd_builtin_type
+	(op_mode,
+	 (qualifiers & qualifier_unsigned) != 0,
+	 (qualifiers & qualifier_poly) != 0);
+      gcc_assert (eltype != NULL);
+
+      /* Add qualifiers.  */
+      if (qualifiers & qualifier_const)
+	eltype = build_qualified_type (eltype, TYPE_QUAL_CONST);
+
+      if (qualifiers & qualifier_pointer)
+	eltype = build_pointer_type (eltype);
+
+      /* If we have reached arg_num == 0, we are at a non-void
+	 return type.  Otherwise, we are still processing
+	 arguments.  */
+      if (arg_num == 0)
+	return_type = eltype;
+      else
+	args = tree_cons (NULL_TREE, eltype, args);
+    }
+
+  ftype = build_function_type (return_type, args);
+
+  gcc_assert (ftype != NULL);
+
+  if (print_type_signature_p)
+    snprintf (namebuf, sizeof (namebuf), "__builtin_neon_%s_%s",
+	      d->name, type_signature);
+  else
+    snprintf (namebuf, sizeof (namebuf), "__builtin_neon_%s",
+	      d->name);
+
+  fndecl = add_builtin_function (namebuf, ftype, fcode, BUILT_IN_MD,
+				 NULL, NULL_TREE);
+  arm_builtin_decls[fcode] = fndecl;
+}
+
 /* Set up all the NEON builtins, even builtins for instructions that are not
    in the current target ISA to allow the user to compile particular modules
    with different target specific options that differ from the command line
@@ -924,103 +1028,8 @@ arm_init_neon_builtins (void)
 
   for (i = 0; i < ARRAY_SIZE (neon_builtin_data); i++, fcode++)
     {
-      bool print_type_signature_p = false;
-      char type_signature[SIMD_MAX_BUILTIN_ARGS] = { 0 };
       neon_builtin_datum *d = &neon_builtin_data[i];
-      char namebuf[60];
-      tree ftype = NULL;
-      tree fndecl = NULL;
-
-      d->fcode = fcode;
-
-      /* We must track two variables here.  op_num is
-	 the operand number as in the RTL pattern.  This is
-	 required to access the mode (e.g. V4SF mode) of the
-	 argument, from which the base type can be derived.
-	 arg_num is an index in to the qualifiers data, which
-	 gives qualifiers to the type (e.g. const unsigned).
-	 The reason these two variables may differ by one is the
-	 void return type.  While all return types take the 0th entry
-	 in the qualifiers array, there is no operand for them in the
-	 RTL pattern.  */
-      int op_num = insn_data[d->code].n_operands - 1;
-      int arg_num = d->qualifiers[0] & qualifier_void
-		      ? op_num + 1
-		      : op_num;
-      tree return_type = void_type_node, args = void_list_node;
-      tree eltype;
-
-      /* Build a function type directly from the insn_data for this
-	 builtin.  The build_function_type () function takes care of
-	 removing duplicates for us.  */
-      for (; op_num >= 0; arg_num--, op_num--)
-	{
-	  machine_mode op_mode = insn_data[d->code].operand[op_num].mode;
-	  enum arm_type_qualifiers qualifiers = d->qualifiers[arg_num];
-
-	  if (qualifiers & qualifier_unsigned)
-	    {
-	      type_signature[arg_num] = 'u';
-	      print_type_signature_p = true;
-	    }
-	  else if (qualifiers & qualifier_poly)
-	    {
-	      type_signature[arg_num] = 'p';
-	      print_type_signature_p = true;
-	    }
-	  else
-	    type_signature[arg_num] = 's';
-
-	  /* Skip an internal operand for vget_{low, high}.  */
-	  if (qualifiers & qualifier_internal)
-	    continue;
-
-	  /* Some builtins have different user-facing types
-	     for certain arguments, encoded in d->mode.  */
-	  if (qualifiers & qualifier_map_mode)
-	      op_mode = d->mode;
-
-	  /* For pointers, we want a pointer to the basic type
-	     of the vector.  */
-	  if (qualifiers & qualifier_pointer && VECTOR_MODE_P (op_mode))
-	    op_mode = GET_MODE_INNER (op_mode);
-
-	  eltype = arm_simd_builtin_type
-		     (op_mode,
-		      (qualifiers & qualifier_unsigned) != 0,
-		      (qualifiers & qualifier_poly) != 0);
-	  gcc_assert (eltype != NULL);
-
-	  /* Add qualifiers.  */
-	  if (qualifiers & qualifier_const)
-	    eltype = build_qualified_type (eltype, TYPE_QUAL_CONST);
-
-	  if (qualifiers & qualifier_pointer)
-	      eltype = build_pointer_type (eltype);
-
-	  /* If we have reached arg_num == 0, we are at a non-void
-	     return type.  Otherwise, we are still processing
-	     arguments.  */
-	  if (arg_num == 0)
-	    return_type = eltype;
-	  else
-	    args = tree_cons (NULL_TREE, eltype, args);
-	}
-
-      ftype = build_function_type (return_type, args);
-
-      gcc_assert (ftype != NULL);
-
-      if (print_type_signature_p)
-	snprintf (namebuf, sizeof (namebuf), "__builtin_neon_%s_%s",
-		  d->name, type_signature);
-      else
-	snprintf (namebuf, sizeof (namebuf), "__builtin_neon_%s",
-		  d->name);
-
-      fndecl = add_builtin_function (namebuf, ftype, fcode, BUILT_IN_MD,
-				     NULL, NULL_TREE);
-      arm_builtin_decls[fcode] = fndecl;
+      arm_init_neon_builtin (fcode, d);
     }
 }
 
@@ -2211,40 +2220,16 @@ constant_arg:
   return target;
 }
 
-/* Expand a Neon builtin, i.e. those registered only if TARGET_NEON holds.
-   Most of these are "special" because they don't have symbolic
-   constants defined per-instruction or per instruction-variant. Instead, the
-   required info is looked up in the table neon_builtin_data.  */
+/* Expand a neon builtin.  This is also used for vfp builtins, which behave in
+   the same way.  These builtins are "special" because they don't have symbolic
+   constants defined per-instruction or per instruction-variant.  Instead, the
+   required info is looked up in the NEON_BUILTIN_DATA record that is passed
+   into the function.  */
+
 static rtx
-arm_expand_neon_builtin (int fcode, tree exp, rtx target)
+arm_expand_neon_builtin_1 (int fcode, tree exp, rtx target,
+			   neon_builtin_datum *d)
 {
-  /* Check in the context of the function making the call whether the
-     builtin is supported.  */
-  if (! TARGET_NEON)
-    {
-      fatal_error (input_location,
-		   "You must enable NEON instructions (e.g. -mfloat-abi=softfp -mfpu=neon) to use these intrinsics.");
-      return const0_rtx;
-    }
-
-  if (fcode == ARM_BUILTIN_NEON_LANE_CHECK)
-    {
-      /* Builtin is only to check bounds of the lane passed to some intrinsics
-	 that are implemented with gcc vector extensions in arm_neon.h.  */
-
-      tree nlanes = CALL_EXPR_ARG (exp, 0);
-      gcc_assert (TREE_CODE (nlanes) == INTEGER_CST);
-      rtx lane_idx = expand_normal (CALL_EXPR_ARG (exp, 1));
-      if (CONST_INT_P (lane_idx))
-	neon_lane_bounds (lane_idx, 0, TREE_INT_CST_LOW (nlanes), exp);
-      else
-	error ("%Klane index must be a constant immediate", exp);
-      /* Don't generate any RTL.  */
-      return const0_rtx;
-    }
-
-  neon_builtin_datum *d =
-		&neon_builtin_data[fcode - ARM_BUILTIN_NEON_PATTERN_START];
   enum insn_code icode = d->code;
   builtin_arg args[SIMD_MAX_BUILTIN_ARGS + 1];
   int num_args = insn_data[d->code].n_operands;
@@ -2260,8 +2245,8 @@ arm_expand_neon_builtin (int fcode, tree exp, rtx target)
       /* We have four arrays of data, each indexed in a different fashion.
 	 qualifiers - element 0 always describes the function return type.
 	 operands - element 0 is either the operand for return value (if
-	   the function has a non-void return type) or the operand for the
-	   first argument.
+	 the function has a non-void return type) or the operand for the
+	 first argument.
 	 expr_args - element 0 always holds the first argument.
 	 args - element 0 is always used for the return type.  */
       int qualifiers_k = k;
@@ -2283,7 +2268,7 @@ arm_expand_neon_builtin (int fcode, tree exp, rtx target)
 	  bool op_const_int_p =
 	    (CONST_INT_P (arg)
 	     && (*insn_data[icode].operand[operands_k].predicate)
-		(arg, insn_data[icode].operand[operands_k].mode));
+	     (arg, insn_data[icode].operand[operands_k].mode));
 	  args[k] = op_const_int_p ? NEON_ARG_CONSTANT : NEON_ARG_COPY_TO_REG;
 	}
       else if (d->qualifiers[qualifiers_k] & qualifier_pointer)
@@ -2296,8 +2281,47 @@ arm_expand_neon_builtin (int fcode, tree exp, rtx target)
   /* The interface to arm_expand_neon_args expects a 0 if
      the function is void, and a 1 if it is not.  */
   return arm_expand_neon_args
-	  (target, d->mode, fcode, icode, !is_void, exp,
-	   &args[1]);
+    (target, d->mode, fcode, icode, !is_void, exp,
+     &args[1]);
+}
+
+/* Expand a Neon builtin, i.e. those registered only if TARGET_NEON holds.
+   Most of these are "special" because they don't have symbolic
+   constants defined per-instruction or per instruction-variant.  Instead, the
+   required info is looked up in the table neon_builtin_data.  */
+
+static rtx
+arm_expand_neon_builtin (int fcode, tree exp, rtx target)
+{
+  if (fcode >= ARM_BUILTIN_NEON_BASE && ! TARGET_NEON)
+    {
+      fatal_error (input_location,
+		   "You must enable NEON instructions"
+		   " (e.g. -mfloat-abi=softfp -mfpu=neon)"
+		   " to use these intrinsics.");
+      return const0_rtx;
+    }
+
+  if (fcode == ARM_BUILTIN_NEON_LANE_CHECK)
+    {
+      /* Builtin is only to check bounds of the lane passed to some intrinsics
+	 that are implemented with gcc vector extensions in arm_neon.h.  */
+
+      tree nlanes = CALL_EXPR_ARG (exp, 0);
+      gcc_assert (TREE_CODE (nlanes) == INTEGER_CST);
+      rtx lane_idx = expand_normal (CALL_EXPR_ARG (exp, 1));
+      if (CONST_INT_P (lane_idx))
+	neon_lane_bounds (lane_idx, 0, TREE_INT_CST_LOW (nlanes), exp);
+      else
+	error ("%Klane index must be a constant immediate", exp);
+      /* Don't generate any RTL.  */
+      return const0_rtx;
+    }
+
+  neon_builtin_datum *d
+    = &neon_builtin_data[fcode - ARM_BUILTIN_NEON_PATTERN_START];
+
+  return arm_expand_neon_builtin_1 (fcode, exp, target, d);
 }
 
 /* Expand an expression EXP that calls a built-in function,
-- 
2.1.4


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 11/17][ARM] Add builtins for VFP FP16 intrinsics.
  2016-05-17 14:20 [PATCH 0/17][ARM] ARMv8.2-A and FP16 extension support Matthew Wahab
                   ` (9 preceding siblings ...)
  2016-05-17 14:39 ` [PATCH 10/17][ARM] Refactor support code for NEON builtins Matthew Wahab
@ 2016-05-17 14:41 ` Matthew Wahab
  2016-07-04 14:12   ` Matthew Wahab
  2016-05-17 14:43 ` [PATCH 12/17][ARM] Add builtins for NEON " Matthew Wahab
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 73+ messages in thread
From: Matthew Wahab @ 2016-05-17 14:41 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1318 bytes --]

The ACLE intrinsics introduced to support the ARMv8.2-A FP16 extension
require that the intrinsics for scalar floating point (VFP) instructions
are available under different conditions from those for the NEON
intrinsics.

This patch adds the support code and builtins data for the new VFP
intrinsics. Because of the similarities between the scalar and NEON
builtins, the support code for the scalar builtins follows the code for
the NEON builtins. The declarations for the VFP builtins are also added
in this patch since the support code expects non-empty tables.
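
To illustrate the mechanism (commentary, not part of the patch): an
entry in arm_vfp_builtins.def such as

----
VAR1 (UNOP, vabs, hf)
----

describes a unary builtin keyed on HFmode, and the support code
registers it as __builtin_neon_vabshf; the arm_fp16.h intrinsics added
later in this series wrap it as vabsh_f16.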

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/arm-builtins.c (hf_UP): New.
	(si_UP): New.
	(vfp_builtin_data): New.  Update comment.
	(enum arm_builtins): Include arm_vfp_builtins.def.
	(ARM_BUILTIN_VFP_PATTERN_START): New.
	(arm_init_vfp_builtins): New.
	(arm_init_builtins): Add arm_init_vfp_builtins.
	(arm_expand_vfp_builtin): New.
	(arm_expand_builtins): Update for arm_expand_vfp_builtin.  Fix
	long line.
	* config/arm/arm_vfp_builtins.def: New file.
	* config/arm/t-arm (arm.o): Add arm_vfp_builtins.def.
	(arm-builtins.o): Likewise.


[-- Attachment #2: 0011-PATCH-11-17-ARM-Add-builtins-for-VFP-FP16-intrinsics.patch --]
[-- Type: text/x-patch, Size: 8655 bytes --]

From d1f2b10a2e672b1dc886d8d1efb136d970f967f1 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 15:33:14 +0100
Subject: [PATCH 11/17] [PATCH 11/17][ARM] Add builtins for VFP FP16
 intrinsics.

2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/arm-builtins.c (hf_UP): New.
	(si_UP): New.
	(vfp_builtin_data): New.  Update comment.
	(arm_init_vfp_builtins): New.
	(arm_init_builtins): Add arm_init_vfp_builtins.
	(arm_expand_vfp_builtin): New.
	(arm_expand_builtins): Update for arm_expand_vfp_builtin.  Fix
	long line.
	* config/arm/arm_vfp_builtins.def: New file.
	* config/arm/t-arm (arm.o): Add arm_vfp_builtins.def.
	(arm-builtins.o): Likewise.
---
 gcc/config/arm/arm-builtins.c       | 75 +++++++++++++++++++++++++++++++++----
 gcc/config/arm/arm_vfp_builtins.def | 56 +++++++++++++++++++++++++++
 gcc/config/arm/t-arm                |  4 +-
 3 files changed, 126 insertions(+), 9 deletions(-)
 create mode 100644 gcc/config/arm/arm_vfp_builtins.def

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 5a22b91..58c68a6 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -190,6 +190,8 @@ arm_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define ti_UP	 TImode
 #define ei_UP	 EImode
 #define oi_UP	 OImode
+#define hf_UP	 HFmode
+#define si_UP	 SImode
 
 #define UP(X) X##_UP
 
@@ -239,12 +241,22 @@ typedef struct {
   VAR11 (T, N, A, B, C, D, E, F, G, H, I, J, K) \
   VAR1 (T, N, L)
 
-/* The NEON builtin data can be found in arm_neon_builtins.def.
-   The mode entries in the following table correspond to the "key" type of the
-   instruction variant, i.e. equivalent to that which would be specified after
-   the assembler mnemonic, which usually refers to the last vector operand.
-   The modes listed per instruction should be the same as those defined for
-   that instruction's pattern in neon.md.  */
+/* The NEON builtin data can be found in arm_neon_builtins.def and
+   arm_vfp_builtins.def.  The entries in arm_neon_builtins.def require
+   TARGET_NEON to be true.  The entries in arm_vfp_builtins.def require
+   TARGET_VFP to be true.  The feature tests are checked when the builtins are
+   expanded.
+
+   The mode entries in the following table correspond to
+   the "key" type of the instruction variant, i.e. equivalent to that which
+   would be specified after the assembler mnemonic, which usually refers to the
+   last vector operand.  The modes listed per instruction should be the same as
+   those defined for that instruction's pattern in neon.md.  */
+
+static neon_builtin_datum vfp_builtin_data[] =
+{
+#include "arm_vfp_builtins.def"
+};
 
 static neon_builtin_datum neon_builtin_data[] =
 {
@@ -534,6 +546,10 @@ enum arm_builtins
 #undef CRYPTO2
 #undef CRYPTO3
 
+  ARM_BUILTIN_VFP_BASE,
+
+#include "arm_vfp_builtins.def"
+
   ARM_BUILTIN_NEON_BASE,
   ARM_BUILTIN_NEON_LANE_CHECK = ARM_BUILTIN_NEON_BASE,
 
@@ -542,6 +558,9 @@ enum arm_builtins
   ARM_BUILTIN_MAX
 };
 
+#define ARM_BUILTIN_VFP_PATTERN_START \
+  (ARM_BUILTIN_VFP_BASE + 1)
+
 #define ARM_BUILTIN_NEON_PATTERN_START \
   (ARM_BUILTIN_NEON_BASE + 1)
 
@@ -1033,6 +1052,20 @@ arm_init_neon_builtins (void)
     }
 }
 
+/* Set up all the scalar floating point builtins.  */
+
+static void
+arm_init_vfp_builtins (void)
+{
+  unsigned int i, fcode = ARM_BUILTIN_VFP_PATTERN_START;
+
+  for (i = 0; i < ARRAY_SIZE (vfp_builtin_data); i++, fcode++)
+    {
+      neon_builtin_datum *d = &vfp_builtin_data[i];
+      arm_init_neon_builtin (fcode, d);
+    }
+}
+
 static void
 arm_init_crypto_builtins (void)
 {
@@ -1777,7 +1810,7 @@ arm_init_builtins (void)
   if (TARGET_HARD_FLOAT)
     {
       arm_init_neon_builtins ();
-
+      arm_init_vfp_builtins ();
       arm_init_crypto_builtins ();
     }
 
@@ -2324,6 +2357,27 @@ arm_expand_neon_builtin (int fcode, tree exp, rtx target)
   return arm_expand_neon_builtin_1 (fcode, exp, target, d);
 }
 
+/* Expand a VFP builtin.  These builtins require TARGET_VFP and are treated
+   like NEON builtins except that the data is looked up in the table
+   VFP_BUILTIN_DATA.  */
+
+static rtx
+arm_expand_vfp_builtin (int fcode, tree exp, rtx target)
+{
+  if (fcode >= ARM_BUILTIN_VFP_BASE && ! TARGET_VFP)
+    {
+      fatal_error (input_location,
+		   "You must enable VFP instructions"
+		   " to use these intrinsics.");
+      return const0_rtx;
+    }
+
+  neon_builtin_datum *d
+    = &vfp_builtin_data[fcode - ARM_BUILTIN_VFP_PATTERN_START];
+
+  return arm_expand_neon_builtin_1 (fcode, exp, target, d);
+}
+
 /* Expand an expression EXP that calls a built-in function,
    with result going to TARGET if that's convenient
    (and in mode MODE if that's convenient).
@@ -2361,13 +2415,18 @@ arm_expand_builtin (tree exp,
   if (fcode >= ARM_BUILTIN_NEON_BASE)
     return arm_expand_neon_builtin (fcode, exp, target);
 
+  if (fcode >= ARM_BUILTIN_VFP_BASE)
+    return arm_expand_vfp_builtin (fcode, exp, target);
+
   /* Check in the context of the function making the call whether the
      builtin is supported.  */
   if (fcode >= ARM_BUILTIN_CRYPTO_BASE
       && (!TARGET_CRYPTO || !TARGET_HARD_FLOAT))
     {
       fatal_error (input_location,
-		   "You must enable crypto intrinsics (e.g. include -mfloat-abi=softfp -mfpu=crypto-neon...) to use these intrinsics.");
+		   "You must enable crypto instructions"
+		   " (e.g. include -mfloat-abi=softfp -mfpu=crypto-neon...)"
+		   " to use these intrinsics.");
       return const0_rtx;
     }
 
diff --git a/gcc/config/arm/arm_vfp_builtins.def b/gcc/config/arm/arm_vfp_builtins.def
new file mode 100644
index 0000000..35014ce
--- /dev/null
+++ b/gcc/config/arm/arm_vfp_builtins.def
@@ -0,0 +1,56 @@
+/* VFP instruction builtin definitions.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   Contributed by ARM Ltd.
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file lists the builtins that may be available when VFP is enabled
+   but NEON is not.  The entries otherwise have the same requirements and
+   generate the same structures as those in arm_neon_builtins.def.  */
+
+/* FP16 Arithmetic instructions.  */
+VAR1 (UNOP, vabs, hf)
+VAR2 (UNOP, vcvths, hf, si)
+VAR2 (UNOP, vcvthu, hf, si)
+VAR1 (UNOP, vcvtahs, si)
+VAR1 (UNOP, vcvtahu, si)
+VAR1 (UNOP, vcvtmhs, si)
+VAR1 (UNOP, vcvtmhu, si)
+VAR1 (UNOP, vcvtnhs, si)
+VAR1 (UNOP, vcvtnhu, si)
+VAR1 (UNOP, vcvtphs, si)
+VAR1 (UNOP, vcvtphu, si)
+VAR1 (UNOP, vneg, hf)
+VAR1 (UNOP, vrnd, hf)
+VAR1 (UNOP, vrnda, hf)
+VAR1 (UNOP, vrndi, hf)
+VAR1 (UNOP, vrndm, hf)
+VAR1 (UNOP, vrndn, hf)
+VAR1 (UNOP, vrndp, hf)
+VAR1 (UNOP, vrndx, hf)
+VAR1 (UNOP, vsqrt, hf)
+
+VAR1 (BINOP, vadd, hf)
+VAR2 (BINOP, vcvths_n, hf, si)
+VAR2 (BINOP, vcvthu_n, hf, si)
+VAR1 (BINOP, vdiv, hf)
+VAR1 (BINOP, vmaxnm, hf)
+VAR1 (BINOP, vminnm, hf)
+VAR1 (BINOP, vmulf, hf)
+VAR1 (BINOP, vsub, hf)
+
+VAR1 (TERNOP, vfma, hf)
+VAR1 (TERNOP, vfms, hf)
diff --git a/gcc/config/arm/t-arm b/gcc/config/arm/t-arm
index 749a58d..803baa2 100644
--- a/gcc/config/arm/t-arm
+++ b/gcc/config/arm/t-arm
@@ -95,7 +95,8 @@ arm.o: $(srcdir)/config/arm/arm.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
   $(srcdir)/config/arm/arm-cores.def \
   $(srcdir)/config/arm/arm-arches.def $(srcdir)/config/arm/arm-fpus.def \
   $(srcdir)/config/arm/arm-protos.h \
-  $(srcdir)/config/arm/arm_neon_builtins.def
+  $(srcdir)/config/arm/arm_neon_builtins.def \
+  $(srcdir)/config/arm/arm_vfp_builtins.def
 
 arm-builtins.o: $(srcdir)/config/arm/arm-builtins.c $(CONFIG_H) \
   $(SYSTEM_H) coretypes.h $(TM_H) \
@@ -103,6 +104,7 @@ arm-builtins.o: $(srcdir)/config/arm/arm-builtins.c $(CONFIG_H) \
   $(DIAGNOSTIC_CORE_H) $(OPTABS_H) \
   $(srcdir)/config/arm/arm-protos.h \
   $(srcdir)/config/arm/arm_neon_builtins.def \
+  $(srcdir)/config/arm/arm_vfp_builtins.def \
   $(srcdir)/config/arm/arm-simd-builtin-types.def
 	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
 		$(srcdir)/config/arm/arm-builtins.c
-- 
2.1.4


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 12/17][ARM] Add builtins for NEON FP16 intrinsics.
  2016-05-17 14:20 [PATCH 0/17][ARM] ARMv8.2-A and FP16 extension support Matthew Wahab
                   ` (10 preceding siblings ...)
  2016-05-17 14:41 ` [PATCH 11/17][ARM] Add builtins for VFP FP16 intrinsics Matthew Wahab
@ 2016-05-17 14:43 ` Matthew Wahab
  2016-07-04 14:13   ` Matthew Wahab
  2016-05-17 14:44 ` [PATCH 13/17][ARM] Add VFP FP16 intrinsics Matthew Wahab
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 73+ messages in thread
From: Matthew Wahab @ 2016-05-17 14:43 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2714 bytes --]

This patch adds the builtins data for the ACLE intrinsics introduced to
support the NEON instructions of the ARMv8.2-A FP16 extension.
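
As an illustration (commentary, not part of the patch): a two-variant
entry such as

----
VAR2 (BINOP, vadd, v8hf, v4hf)
----

registers the binary builtins __builtin_neon_vaddv8hf and
__builtin_neon_vaddv4hf for the eight- and four-lane float16 vector
modes; the arm_neon.h intrinsics added later in this series (vaddq_f16
and vadd_f16) are thin wrappers around these.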

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/arm_neon_builtins.def (vadd): New (v8hf, v4hf
	variants).
	(vmulf): New (v8hf, v4hf variants).
	(vfma): New (v8hf, v4hf variants).
	(vfms): New (v8hf, v4hf variants).
	(vsub): New (v8hf, v4hf variants).
	(vcage): New (v8hf, v4hf variants).
	(vcagt): New (v8hf, v4hf variants).
	(vcale): New (v8hf, v4hf variants).
	(vcalt): New (v8hf, v4hf variants).
	(vceq): New (v8hf, v4hf variants).
	(vcgt): New (v8hf, v4hf variants).
	(vcge): New (v8hf, v4hf variants).
	(vcle): New (v8hf, v4hf variants).
	(vclt): New (v8hf, v4hf variants).
	(vceqz): New (v8hf, v4hf variants).
	(vcgez): New (v8hf, v4hf variants).
	(vcgtz): New (v8hf, v4hf variants).
	(vcltz): New (v8hf, v4hf variants).
	(vclez): New (v8hf, v4hf variants).
	(vabd): New (v8hf, v4hf variants).
	(vmaxf): New (v8hf, v4hf variants).
	(vmaxnm): New (v8hf, v4hf variants).
	(vminf): New (v8hf, v4hf variants).
	(vminnm): New (v8hf, v4hf variants).
	(vpmaxf): New (v4hf variant).
	(vpminf): New (v4hf variant).
	(vpadd): New (v4hf variant).
	(vrecps): New (v8hf, v4hf variants).
	(vrsqrts): New (v8hf, v4hf variants).
	(vabs): New (v8hf, v4hf variants).
	(vneg): New (v8hf, v4hf variants).
	(vrecpe): New (v8hf, v4hf variants).
	(vrnd): New (v8hf, v4hf variants).
	(vrnda): New (v8hf, v4hf variants).
	(vrndm): New (v8hf, v4hf variants).
	(vrndn): New (v8hf, v4hf variants).
	(vrndp): New (v8hf, v4hf variants).
	(vrndx): New (v8hf, v4hf variants).
	(vsqrte): New (v8hf, v4hf variants).
	(vdup_n): New (v8hf, v4hf variants).
	(vdup_lane): New (v8hf, v4hf variants).
	(vmul_lane): Add v4hf and v8hf variants.
	(vmul_n): Add v4hf and v8hf variants.
	(vext): New (v8hf, v4hf variants).
	(vcvts): New (v8hi, v4hi variants).
	(vcvts): New (v8hf, v4hf variants).
	(vcvtu): New (v8hi, v4hi variants).
	(vcvtu): New (v8hf, v4hf variants).
	(vcvts_n): New (v8hf, v4hf variants).
	(vcvtu_n): New (v8hi, v4hi variants).
	(vcvts_n): New (v8hi, v4hi variants).
	(vcvtu_n): New (v8hf, v4hf variants).
	(vbsl): New (v8hf, v4hf variants).
	(vcvtas): New (v8hf, v4hf variants).
	(vcvtau): New (v8hf, v4hf variants).
	(vcvtms): New (v8hf, v4hf variants).
	(vcvtmu): New (v8hf, v4hf variants).
	(vcvtns): New (v8hf, v4hf variants).
	(vcvtnu): New (v8hf, v4hf variants).
	(vcvtps): New (v8hf, v4hf variants).
	(vcvtpu): New (v8hf, v4hf variants).


[-- Attachment #2: 0012-PATCH-12-17-ARM-Add-builtins-for-NEON-FP16-intrinsic.patch --]
[-- Type: text/x-patch, Size: 9972 bytes --]

From ca740dee578be4c67afeec106feaa1633daff63b Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 13:36:41 +0100
Subject: [PATCH 12/17] [PATCH 12/17][ARM] Add builtins for NEON FP16
 intrinsics.

2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/arm_neon_builtins.def (vadd): New (v8hf, v4hf
	variants).
	(vmulf): New (v8hf, v4hf variants).
	(vfma): New (v8hf, v4hf variants).
	(vfms): New (v8hf, v4hf variants).
	(vsub): New (v8hf, v4hf variants).
	(vcage): New (v8hf, v4hf variants).
	(vcagt): New (v8hf, v4hf variants).
	(vcale): New (v8hf, v4hf variants).
	(vcalt): New (v8hf, v4hf variants).
	(vceq): New (v8hf, v4hf variants).
	(vcgt): New (v8hf, v4hf variants).
	(vcge): New (v8hf, v4hf variants).
	(vcle): New (v8hf, v4hf variants).
	(vclt): New (v8hf, v4hf variants).
	(vceqz): New (v8hf, v4hf variants).
	(vcgez): New (v8hf, v4hf variants).
	(vcgtz): New (v8hf, v4hf variants).
	(vcltz): New (v8hf, v4hf variants).
	(vclez): New (v8hf, v4hf variants).
	(vabd): New (v8hf, v4hf variants).
	(vmaxf): New (v8hf, v4hf variants).
	(vmaxnm): New (v8hf, v4hf variants).
	(vminf): New (v8hf, v4hf variants).
	(vminnm): New (v8hf, v4hf variants).
	(vpmaxf): New (v4hf variant).
	(vpminf): New (v4hf variant).
	(vpadd): New (v4hf variant).
	(vrecps): New (v8hf, v4hf variants).
	(vrsqrts): New (v8hf, v4hf variants).
	(vabs): New (v8hf, v4hf variants).
	(vneg): New (v8hf, v4hf variants).
	(vrecpe): New (v8hf, v4hf variants).
	(vrnd): New (v8hf, v4hf variants).
	(vrnda): New (v8hf, v4hf variants).
	(vrndm): New (v8hf, v4hf variants).
	(vrndn): New (v8hf, v4hf variants).
	(vrndp): New (v8hf, v4hf variants).
	(vrndx): New (v8hf, v4hf variants).
	(vsqrte): New (v8hf, v4hf variants).
	(vdup_n): New (v8hf, v4hf variants).
	(vdup_lane): New (v8hf, v4hf variants).
	(vmul_lane): Add v4hf and v8hf variants.
	(vmul_n): Add v4hf and v8hf variants.
	(vext): New (v8hf, v4hf variants).
	(vcvts): New (v8hi, v4hi variants).
	(vcvts): New (v8hf, v4hf variants).
	(vcvtu): New (v8hi, v4hi variants).
	(vcvtu): New (v8hf, v4hf variants).
	(vcvts_n): New (v8hf, v4hf variants).
	(vcvtu_n): New (v8hi, v4hi variants).
	(vcvts_n): New (v8hi, v4hi variants).
	(vcvtu_n): New (v8hf, v4hf variants).
	(vbsl): New (v8hf, v4hf variants).
	(vcvtas): New (v8hf, v4hf variants).
	(vcvtau): New (v8hf, v4hf variants).
	(vcvtms): New (v8hf, v4hf variants).
	(vcvtmu): New (v8hf, v4hf variants).
	(vcvtns): New (v8hf, v4hf variants).
	(vcvtnu): New (v8hf, v4hf variants).
	(vcvtps): New (v8hf, v4hf variants).
	(vcvtpu): New (v8hf, v4hf variants).
---
 gcc/config/arm/arm_neon_builtins.def | 59 ++++++++++++++++++++++++++++++++++--
 1 file changed, 57 insertions(+), 2 deletions(-)

diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index a4ba516..4a54a29 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -19,6 +19,7 @@
    <http://www.gnu.org/licenses/>.  */
 
 VAR2 (BINOP, vadd, v2sf, v4sf)
+VAR2 (BINOP, vadd, v8hf, v4hf)
 VAR3 (BINOP, vaddls, v8qi, v4hi, v2si)
 VAR3 (BINOP, vaddlu, v8qi, v4hi, v2si)
 VAR3 (BINOP, vaddws, v8qi, v4hi, v2si)
@@ -32,12 +33,15 @@ VAR8 (BINOP, vqaddu, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
 VAR3 (BINOP, vaddhn, v8hi, v4si, v2di)
 VAR3 (BINOP, vraddhn, v8hi, v4si, v2di)
 VAR2 (BINOP, vmulf, v2sf, v4sf)
+VAR2 (BINOP, vmulf, v8hf, v4hf)
 VAR2 (BINOP, vmulp, v8qi, v16qi)
 VAR8 (TERNOP, vmla, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf)
 VAR3 (TERNOP, vmlals, v8qi, v4hi, v2si)
 VAR3 (TERNOP, vmlalu, v8qi, v4hi, v2si)
 VAR2 (TERNOP, vfma, v2sf, v4sf)
+VAR2 (TERNOP, vfma, v4hf, v8hf)
 VAR2 (TERNOP, vfms, v2sf, v4sf)
+VAR2 (TERNOP, vfms, v4hf, v8hf)
 VAR8 (TERNOP, vmls, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf)
 VAR3 (TERNOP, vmlsls, v8qi, v4hi, v2si)
 VAR3 (TERNOP, vmlslu, v8qi, v4hi, v2si)
@@ -94,6 +98,7 @@ VAR8 (TERNOP_IMM, vsrau_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
 VAR8 (TERNOP_IMM, vrsras_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
 VAR8 (TERNOP_IMM, vrsrau_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
 VAR2 (BINOP, vsub, v2sf, v4sf)
+VAR2 (BINOP, vsub, v8hf, v4hf)
 VAR3 (BINOP, vsubls, v8qi, v4hi, v2si)
 VAR3 (BINOP, vsublu, v8qi, v4hi, v2si)
 VAR3 (BINOP, vsubws, v8qi, v4hi, v2si)
@@ -111,12 +116,27 @@ VAR8 (BINOP, vcgt, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf)
 VAR6 (BINOP, vcgtu, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR2 (BINOP, vcage, v2sf, v4sf)
 VAR2 (BINOP, vcagt, v2sf, v4sf)
+VAR2 (BINOP, vcage, v4hf, v8hf)
+VAR2 (BINOP, vcagt, v4hf, v8hf)
+VAR2 (BINOP, vcale, v4hf, v8hf)
+VAR2 (BINOP, vcalt, v4hf, v8hf)
+VAR2 (BINOP, vceq, v4hf, v8hf)
+VAR2 (BINOP, vcge, v4hf, v8hf)
+VAR2 (BINOP, vcgt, v4hf, v8hf)
+VAR2 (BINOP, vcle, v4hf, v8hf)
+VAR2 (BINOP, vclt, v4hf, v8hf)
+VAR2 (UNOP, vceqz, v4hf, v8hf)
+VAR2 (UNOP, vcgez, v4hf, v8hf)
+VAR2 (UNOP, vcgtz, v4hf, v8hf)
+VAR2 (UNOP, vclez, v4hf, v8hf)
+VAR2 (UNOP, vcltz, v4hf, v8hf)
 VAR6 (BINOP, vtst, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR6 (BINOP, vabds, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR6 (BINOP, vabdu, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR2 (BINOP, vabdf, v2sf, v4sf)
 VAR3 (BINOP, vabdls, v8qi, v4hi, v2si)
 VAR3 (BINOP, vabdlu, v8qi, v4hi, v2si)
+VAR2 (BINOP, vabd, v8hf, v4hf)
 
 VAR6 (TERNOP, vabas, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR6 (TERNOP, vabau, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
@@ -126,27 +146,38 @@ VAR3 (TERNOP, vabalu, v8qi, v4hi, v2si)
 VAR6 (BINOP, vmaxs, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR6 (BINOP, vmaxu, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR2 (BINOP, vmaxf, v2sf, v4sf)
+VAR2 (BINOP, vmaxf, v8hf, v4hf)
+VAR2 (BINOP, vmaxnm, v4hf, v8hf)
 VAR6 (BINOP, vmins, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR6 (BINOP, vminu, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR2 (BINOP, vminf, v2sf, v4sf)
+VAR2 (BINOP, vminf, v4hf, v8hf)
+VAR2 (BINOP, vminnm, v8hf, v4hf)
 
 VAR3 (BINOP, vpmaxs, v8qi, v4hi, v2si)
 VAR3 (BINOP, vpmaxu, v8qi, v4hi, v2si)
 VAR1 (BINOP, vpmaxf, v2sf)
+VAR1 (BINOP, vpmaxf, v4hf)
 VAR3 (BINOP, vpmins, v8qi, v4hi, v2si)
 VAR3 (BINOP, vpminu, v8qi, v4hi, v2si)
 VAR1 (BINOP, vpminf, v2sf)
+VAR1 (BINOP, vpminf, v4hf)
 
 VAR4 (BINOP, vpadd, v8qi, v4hi, v2si, v2sf)
+VAR1 (BINOP, vpadd, v4hf)
 VAR6 (UNOP, vpaddls, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR6 (UNOP, vpaddlu, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR6 (BINOP, vpadals, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR6 (BINOP, vpadalu, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR2 (BINOP, vrecps, v2sf, v4sf)
 VAR2 (BINOP, vrsqrts, v2sf, v4sf)
+VAR2 (BINOP, vrecps, v4hf, v8hf)
+VAR2 (BINOP, vrsqrts, v4hf, v8hf)
 VAR8 (TERNOP_IMM, vsri_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
 VAR8 (TERNOP_IMM, vsli_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
 VAR8 (UNOP, vabs, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf)
+VAR2 (UNOP, vabs, v8hf, v4hf)
+VAR2 (UNOP, vneg, v8hf, v4hf)
 VAR6 (UNOP, vqabs, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR8 (UNOP, vneg, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf)
 VAR6 (UNOP, vqneg, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
@@ -155,8 +186,16 @@ VAR6 (UNOP, vclz, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR5 (BSWAP, bswap, v4hi, v8hi, v2si, v4si, v2di)
 VAR2 (UNOP, vcnt, v8qi, v16qi)
 VAR4 (UNOP, vrecpe, v2si, v2sf, v4si, v4sf)
+VAR2 (UNOP, vrecpe, v8hf, v4hf)
 VAR4 (UNOP, vrsqrte, v2si, v2sf, v4si, v4sf)
 VAR6 (UNOP, vmvn, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+VAR2 (UNOP, vrnd, v8hf, v4hf)
+VAR2 (UNOP, vrnda, v8hf, v4hf)
+VAR2 (UNOP, vrndm, v8hf, v4hf)
+VAR2 (UNOP, vrndn, v8hf, v4hf)
+VAR2 (UNOP, vrndp, v8hf, v4hf)
+VAR2 (UNOP, vrndx, v8hf, v4hf)
+VAR2 (UNOP, vsqrte, v8hf, v4hf)
   /* FIXME: vget_lane supports more variants than this!  */
 VAR10 (GETLANE, vget_lane,
 	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
@@ -179,7 +218,7 @@ VAR3 (UNOP, vqmovnu, v8hi, v4si, v2di)
 VAR3 (UNOP, vqmovun, v8hi, v4si, v2di)
 VAR3 (UNOP, vmovls, v8qi, v4hi, v2si)
 VAR3 (UNOP, vmovlu, v8qi, v4hi, v2si)
-VAR6 (SETLANE, vmul_lane, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
+VAR8 (SETLANE, vmul_lane, v4hi, v2si, v2sf, v8hi, v4si, v4sf, v4hf, v8hf)
 VAR6 (MAC_LANE, vmla_lane, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
 VAR2 (MAC_LANE, vmlals_lane, v4hi, v2si)
 VAR2 (MAC_LANE, vmlalu_lane, v4hi, v2si)
@@ -188,7 +227,7 @@ VAR6 (MAC_LANE, vmls_lane, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
 VAR2 (MAC_LANE, vmlsls_lane, v4hi, v2si)
 VAR2 (MAC_LANE, vmlslu_lane, v4hi, v2si)
 VAR2 (MAC_LANE, vqdmlsl_lane, v4hi, v2si)
-VAR6 (BINOP, vmul_n, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
+VAR8 (BINOP, vmul_n, v4hi, v2si, v2sf, v8hi, v4si, v4sf, v4hf, v8hf)
 VAR6 (MAC_N, vmla_n, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
 VAR2 (MAC_N, vmlals_n, v4hi, v2si)
 VAR2 (MAC_N, vmlalu_n, v4hi, v2si)
@@ -204,9 +243,17 @@ VAR8 (UNOP, vrev64, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf)
 VAR4 (UNOP, vrev32, v8qi, v4hi, v16qi, v8hi)
 VAR2 (UNOP, vrev16, v8qi, v16qi)
 VAR4 (UNOP, vcvts, v2si, v2sf, v4si, v4sf)
+VAR2 (UNOP, vcvts, v4hi, v8hi)
+VAR2 (UNOP, vcvts, v4hf, v8hf)
+VAR2 (UNOP, vcvtu, v4hi, v8hi)
+VAR2 (UNOP, vcvtu, v4hf, v8hf)
 VAR4 (UNOP, vcvtu, v2si, v2sf, v4si, v4sf)
 VAR4 (BINOP, vcvts_n, v2si, v2sf, v4si, v4sf)
 VAR4 (BINOP, vcvtu_n, v2si, v2sf, v4si, v4sf)
+VAR2 (BINOP, vcvts_n, v4hf, v8hf)
+VAR2 (BINOP, vcvtu_n, v4hi, v8hi)
+VAR2 (BINOP, vcvts_n, v4hi, v8hi)
+VAR2 (BINOP, vcvtu_n, v4hf, v8hf)
 VAR1 (UNOP, vcvtv4sf, v4hf)
 VAR1 (UNOP, vcvtv4hf, v4sf)
 VAR10 (TERNOP, vbsl,
@@ -223,6 +270,14 @@ VAR1 (UNOP, vcvtav2sf, v2si)
 VAR1 (UNOP, vcvtav4sf, v4si)
 VAR1 (UNOP, vcvtauv2sf, v2si)
 VAR1 (UNOP, vcvtauv4sf, v4si)
+VAR2 (UNOP, vcvtas, v4hf, v8hf)
+VAR2 (UNOP, vcvtau, v4hf, v8hf)
+VAR2 (UNOP, vcvtms, v4hf, v8hf)
+VAR2 (UNOP, vcvtmu, v4hf, v8hf)
+VAR2 (UNOP, vcvtns, v4hf, v8hf)
+VAR2 (UNOP, vcvtnu, v4hf, v8hf)
+VAR2 (UNOP, vcvtps, v4hf, v8hf)
+VAR2 (UNOP, vcvtpu, v4hf, v8hf)
 VAR1 (UNOP, vcvtpv2sf, v2si)
 VAR1 (UNOP, vcvtpv4sf, v4si)
 VAR1 (UNOP, vcvtpuv2sf, v2si)
-- 
2.1.4


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 13/17][ARM] Add VFP FP16 intrinsics.
  2016-05-17 14:20 [PATCH 0/17][ARM] ARMv8.2-A and FP16 extension support Matthew Wahab
                   ` (11 preceding siblings ...)
  2016-05-17 14:43 ` [PATCH 12/17][ARM] Add builtins for NEON " Matthew Wahab
@ 2016-05-17 14:44 ` Matthew Wahab
  2016-07-04 14:14   ` Matthew Wahab
  2016-05-17 14:47 ` [PATCH 14/17][ARM] Add NEON " Matthew Wahab
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 73+ messages in thread
From: Matthew Wahab @ 2016-05-17 14:44 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2568 bytes --]

The ARMv8.2-A architecture introduces an optional FP16 extension adding
half-precision floating point data processing instructions to the
existing scalar (floating point) support. A future version of the ACLE
will add support for these instructions and this patch implements that
support.

The ACLE will introduce new intrinsics for the scalar (floating-point)
instructions together with a new header file arm_fp16.h. The ACLE will
require that the intrinsics are available when both the header file is
included and the ACLE feature macro __ARM_FEATURE_FP16_SCALAR_ARITHMETIC
is defined. (The new ACLE feature macros are dealt with in an earlier
patch.)

The patch adds the arm_fp16.h header file with the following new
intrinsics:
----
float16_t vabsh_f16 (float16_t __a)
int32_t vcvtah_s32_f16 (float16_t __a)
uint32_t vcvtah_u32_f16 (float16_t __a)
float16_t vcvth_f16_s32 (int32_t __a)
float16_t vcvth_f16_u32 (uint32_t __a)
int32_t vcvth_s32_f16 (float16_t __a)
uint32_t vcvth_u32_f16 (float16_t __a)
int32_t vcvtmh_s32_f16 (float16_t __a)
uint32_t vcvtmh_u32_f16 (float16_t __a)
int32_t vcvtnh_s32_f16 (float16_t __a)
uint32_t vcvtnh_u32_f16 (float16_t __a)
int32_t vcvtph_s32_f16 (float16_t __a)
uint32_t vcvtph_u32_f16 (float16_t __a)
float16_t vnegh_f16 (float16_t __a)
float16_t vrndah_f16 (float16_t __a)
float16_t vrndh_f16 (float16_t __a)
float16_t vrndih_f16 (float16_t __a)
float16_t vrndmh_f16 (float16_t __a)
float16_t vrndnh_f16 (float16_t __a)
float16_t vrndph_f16 (float16_t __a)
float16_t vrndxh_f16 (float16_t __a)
float16_t vsqrth_f16 (float16_t __a)

float16_t vaddh_f16 (float16_t __a, float16_t __b)
float16_t vcvth_n_f16_s32 (int32_t __a, const int __b)
float16_t vcvth_n_f16_u32 (uint32_t __a, const int __b)
int32_t vcvth_n_s32_f16 (float16_t __a, const int __b)
uint32_t vcvth_n_u32_f16 (float16_t __a, const int __b)
float16_t vdivh_f16 (float16_t __a, float16_t __b)
float16_t vmaxnmh_f16 (float16_t __a, float16_t __b)
float16_t vminnmh_f16 (float16_t __a, float16_t __b)
float16_t vmulh_f16 (float16_t __a, float16_t __b)
float16_t vsubh_f16 (float16_t __a, float16_t __b)

float16_t vfmah_f16 (float16_t __a, float16_t __b, float16_t __c)
float16_t vfmsh_f16 (float16_t __a, float16_t __b, float16_t __c)
----
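
For reference, a minimal usage sketch (not part of the patch; it
assumes the ARMv8.2-A command line support added earlier in this
series, e.g. -march=armv8.2-a+fp16 with a hard-float ABI):

----
#include <arm_fp16.h>

/* Multiply two half-precision values and round the product to the
   nearest integral value, entirely in FP16 arithmetic.  */
float16_t
scale_round (float16_t x, float16_t y)
{
  float16_t p = vmulh_f16 (x, y);
  return vrndnh_f16 (p);
}
----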

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* config.gcc (extra_headers): Add arm_fp16.h
	* config/arm/arm_fp16.h: New.


[-- Attachment #2: 0013-PATCH-13-17-ARM-Add-VFP-FP16-instrinsics.patch --]
[-- Type: text/x-patch, Size: 8251 bytes --]

From 0c7d4da5a7c8ca9cf3ce2f23072668c4155b35d9 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 15:36:23 +0100
Subject: [PATCH 13/17] [PATCH 13/17][ARM] Add VFP FP16 intrinsics.

2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* config.gcc (extra_headers): Add arm_fp16.h
	* config/arm/arm_fp16.h: New.
---
 gcc/config.gcc            |   2 +-
 gcc/config/arm/arm_fp16.h | 255 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 256 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/arm/arm_fp16.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 51af122a..e22ff9e 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -327,7 +327,7 @@ arc*-*-*)
 arm*-*-*)
 	cpu_type=arm
 	extra_objs="arm-builtins.o aarch-common.o"
-	extra_headers="mmintrin.h arm_neon.h arm_acle.h"
+	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h"
 	target_type_format_char='%'
 	c_target_objs="arm-c.o"
 	cxx_target_objs="arm-c.o"
diff --git a/gcc/config/arm/arm_fp16.h b/gcc/config/arm/arm_fp16.h
new file mode 100644
index 0000000..702090a
--- /dev/null
+++ b/gcc/config/arm/arm_fp16.h
@@ -0,0 +1,255 @@
+/* ARM FP16 intrinsics include file.
+
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   Contributed by ARM Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _GCC_ARM_FP16_H
+#define _GCC_ARM_FP16_H 1
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+
+/* Intrinsics for FP16 instructions.  */
+#pragma GCC push_options
+#pragma GCC target ("fpu=fp-armv8")
+
+#if defined (__ARM_FEATURE_FP16_SCALAR_ARITHMETIC)
+
+typedef __fp16 float16_t;
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vabsh_f16 (float16_t __a)
+{
+  return __builtin_neon_vabshf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vaddh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_neon_vaddhf (__a, __b);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvtah_s32_f16 (float16_t __a)
+{
+  return __builtin_neon_vcvtahssi (__a);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcvtah_u32_f16 (float16_t __a)
+{
+  return __builtin_neon_vcvtahusi (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_f16_s32 (int32_t __a)
+{
+  return __builtin_neon_vcvthshf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_f16_u32 (uint32_t __a)
+{
+  return __builtin_neon_vcvthuhf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_n_f16_s32 (int32_t __a, const int __b)
+{
+  return __builtin_neon_vcvths_nhf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_n_f16_u32 (uint32_t __a, const int __b)
+{
+  return __builtin_neon_vcvthu_nhf ((int32_t)__a, __b);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvth_n_s32_f16 (float16_t __a, const int __b)
+{
+  return __builtin_neon_vcvths_nsi (__a, __b);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcvth_n_u32_f16 (float16_t __a, const int __b)
+{
+  return (uint32_t)__builtin_neon_vcvthu_nsi (__a, __b);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvth_s32_f16 (float16_t __a)
+{
+  return __builtin_neon_vcvthssi (__a);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcvth_u32_f16 (float16_t __a)
+{
+  return __builtin_neon_vcvthusi (__a);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvtmh_s32_f16 (float16_t __a)
+{
+  return __builtin_neon_vcvtmhssi (__a);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcvtmh_u32_f16 (float16_t __a)
+{
+  return __builtin_neon_vcvtmhusi (__a);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvtnh_s32_f16 (float16_t __a)
+{
+  return __builtin_neon_vcvtnhssi (__a);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcvtnh_u32_f16 (float16_t __a)
+{
+  return __builtin_neon_vcvtnhusi (__a);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvtph_s32_f16 (float16_t __a)
+{
+  return __builtin_neon_vcvtphssi (__a);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcvtph_u32_f16 (float16_t __a)
+{
+  return __builtin_neon_vcvtphusi (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vdivh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_neon_vdivhf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vfmah_f16 (float16_t __a, float16_t __b, float16_t __c)
+{
+  return __builtin_neon_vfmahf (__a, __b, __c);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vfmsh_f16 (float16_t __a, float16_t __b, float16_t __c)
+{
+  return __builtin_neon_vfmshf (__a, __b, __c);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmaxnmh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_neon_vmaxnmhf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vminnmh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_neon_vminnmhf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmulh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_neon_vmulfhf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vnegh_f16 (float16_t __a)
+{
+  return __builtin_neon_vneghf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndah_f16 (float16_t __a)
+{
+  return __builtin_neon_vrndahf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndh_f16 (float16_t __a)
+{
+  return __builtin_neon_vrndhf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndih_f16 (float16_t __a)
+{
+  return __builtin_neon_vrndihf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndmh_f16 (float16_t __a)
+{
+  return __builtin_neon_vrndmhf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndnh_f16 (float16_t __a)
+{
+  return __builtin_neon_vrndnhf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndph_f16 (float16_t __a)
+{
+  return __builtin_neon_vrndphf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndxh_f16 (float16_t __a)
+{
+  return __builtin_neon_vrndxhf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vsqrth_f16 (float16_t __a)
+{
+  return __builtin_neon_vsqrthf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vsubh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_neon_vsubhf (__a, __b);
+}
+
+#endif /* __ARM_FEATURE_FP16_SCALAR_ARITHMETIC  */
+#pragma GCC pop_options
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
-- 
2.1.4


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 14/17][ARM] Add NEON FP16 intrinsics.
  2016-05-17 14:20 [PATCH 0/17][ARM] ARMv8.2-A and FP16 extension support Matthew Wahab
                   ` (12 preceding siblings ...)
  2016-05-17 14:44 ` [PATCH 13/17][ARM] Add VFP FP16 intrinsics Matthew Wahab
@ 2016-05-17 14:47 ` Matthew Wahab
  2016-07-04 14:16   ` Matthew Wahab
  2016-05-17 14:49 ` [PATCH 15/17][ARM] Add tests for ARMv8.2-A FP16 support Matthew Wahab
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 73+ messages in thread
From: Matthew Wahab @ 2016-05-17 14:47 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 8834 bytes --]

The ARMv8.2-A architecture introduces an optional FP16 extension adding
half-precision floating point data processing instructions to the
existing Adv.SIMD (NEON) support. A future version of the ACLE will add
support for these instructions and this patch implements that support.

The ACLE will introduce new intrinsics for the Adv.SIMD instructions
and will require that these intrinsics are available when both
the header file arm_neon.h is included and the ACLE feature macro
__ARM_FEATURE_FP16_VECTOR_ARITHMETIC is defined. (The new ACLE feature
macro is dealt with in an earlier patch.)

The patch adds the following new intrinsics to arm_neon.h:
----
float16x4_t vabs_f16 (float16x4_t __a)
float16x8_t vabsq_f16 (float16x8_t __a)
uint16x4_t vceqz_f16 (float16x4_t __a)
uint16x8_t vceqzq_f16 (float16x8_t __a)
uint16x4_t vcgez_f16 (float16x4_t __a)
uint16x8_t vcgezq_f16 (float16x8_t __a)
uint16x4_t vcgtz_f16 (float16x4_t __a)
uint16x8_t vcgtzq_f16 (float16x8_t __a)
uint16x4_t vclez_f16 (float16x4_t __a)
uint16x8_t vclezq_f16 (float16x8_t __a)
uint16x4_t vcltz_f16 (float16x4_t __a)
uint16x8_t vcltzq_f16 (float16x8_t __a)
float16x4_t vcvt_f16_s16 (int16x4_t __a)
float16x4_t vcvt_f16_u16 (uint16x4_t __a)
int16x4_t vcvt_s16_f16 (float16x4_t __a)
uint16x4_t vcvt_u16_f16 (float16x4_t __a)
float16x8_t vcvtq_f16_s16 (int16x8_t __a)
float16x8_t vcvtq_f16_u16 (uint16x8_t __a)
int16x8_t vcvtq_s16_f16 (float16x8_t __a)
uint16x8_t vcvtq_u16_f16 (float16x8_t __a)
int16x4_t vcvta_s16_f16 (float16x4_t __a)
uint16x4_t vcvta_u16_f16 (float16x4_t __a)
int16x8_t vcvtaq_s16_f16 (float16x8_t __a)
uint16x8_t vcvtaq_u16_f16 (float16x8_t __a)
int16x4_t vcvtm_s16_f16 (float16x4_t __a)
uint16x4_t vcvtm_u16_f16 (float16x4_t __a)
int16x8_t vcvtmq_s16_f16 (float16x8_t __a)
uint16x8_t vcvtmq_u16_f16 (float16x8_t __a)
int16x4_t vcvtn_s16_f16 (float16x4_t __a)
uint16x4_t vcvtn_u16_f16 (float16x4_t __a)
int16x8_t vcvtnq_s16_f16 (float16x8_t __a)
uint16x8_t vcvtnq_u16_f16 (float16x8_t __a)
int16x4_t vcvtp_s16_f16 (float16x4_t __a)
uint16x4_t vcvtp_u16_f16 (float16x4_t __a)
int16x8_t vcvtpq_s16_f16 (float16x8_t __a)
uint16x8_t vcvtpq_u16_f16 (float16x8_t __a)
float16x4_t vneg_f16 (float16x4_t __a)
float16x8_t vnegq_f16 (float16x8_t __a)
float16x4_t vrecpe_f16 (float16x4_t __a)
float16x8_t vrecpeq_f16 (float16x8_t __a)
float16x4_t vrnd_f16 (float16x4_t __a)
float16x8_t vrndq_f16 (float16x8_t __a)
float16x4_t vrnda_f16 (float16x4_t __a)
float16x8_t vrndaq_f16 (float16x8_t __a)
float16x4_t vrndm_f16 (float16x4_t __a)
float16x8_t vrndmq_f16 (float16x8_t __a)
float16x4_t vrndn_f16 (float16x4_t __a)
float16x8_t vrndnq_f16 (float16x8_t __a)
float16x4_t vrndp_f16 (float16x4_t __a)
float16x8_t vrndpq_f16 (float16x8_t __a)
float16x4_t vrndx_f16 (float16x4_t __a)
float16x8_t vrndxq_f16 (float16x8_t __a)
float16x4_t vsqrte_f16 (float16x4_t __a)
float16x8_t vsqrteq_f16 (float16x8_t __a)

float16x4_t vabd_f16 (float16x4_t __a, float16x4_t __b)
float16x8_t vabdq_f16 (float16x8_t __a, float16x8_t __b)
float16x4_t vadd_f16 (float16x4_t __a, float16x4_t __b)
float16x8_t vaddq_f16 (float16x8_t __a, float16x8_t __b)
uint16x4_t vcage_f16 (float16x4_t __a, float16x4_t __b)
uint16x8_t vcageq_f16 (float16x8_t __a, float16x8_t __b)
uint16x4_t vcagt_f16 (float16x4_t __a, float16x4_t __b)
uint16x8_t vcagtq_f16 (float16x8_t __a, float16x8_t __b)
uint16x4_t vcale_f16 (float16x4_t __a, float16x4_t __b)
uint16x8_t vcaleq_f16 (float16x8_t __a, float16x8_t __b)
uint16x4_t vcalt_f16 (float16x4_t __a, float16x4_t __b)
uint16x8_t vcaltq_f16 (float16x8_t __a, float16x8_t __b)
uint16x4_t vceq_f16 (float16x4_t __a, float16x4_t __b)
uint16x8_t vceqq_f16 (float16x8_t __a, float16x8_t __b)
uint16x4_t vcge_f16 (float16x4_t __a, float16x4_t __b)
uint16x8_t vcgeq_f16 (float16x8_t __a, float16x8_t __b)
uint16x4_t vcgt_f16 (float16x4_t __a, float16x4_t __b)
uint16x8_t vcgtq_f16 (float16x8_t __a, float16x8_t __b)
uint16x4_t vcle_f16 (float16x4_t __a, float16x4_t __b)
uint16x8_t vcleq_f16 (float16x8_t __a, float16x8_t __b)
uint16x4_t vclt_f16 (float16x4_t __a, float16x4_t __b)
uint16x8_t vcltq_f16 (float16x8_t __a, float16x8_t __b)
float16x4_t vcvt_n_f16_s16 (int16x4_t __a, const int __b)
float16x4_t vcvt_n_f16_u16 (uint16x4_t __a, const int __b)
float16x8_t vcvtq_n_f16_s16 (int16x8_t __a, const int __b)
float16x8_t vcvtq_n_f16_u16 (uint16x8_t __a, const int __b)
int16x4_t vcvt_n_s16_f16 (float16x4_t __a, const int __b)
uint16x4_t vcvt_n_u16_f16 (float16x4_t __a, const int __b)
int16x8_t vcvtq_n_s16_f16 (float16x8_t __a, const int __b)
uint16x8_t vcvtq_n_u16_f16 (float16x8_t __a, const int __b)
float16x4_t vmax_f16 (float16x4_t __a, float16x4_t __b)
float16x8_t vmaxq_f16 (float16x8_t __a, float16x8_t __b)
float16x4_t vmaxnm_f16 (float16x4_t __a, float16x4_t __b)
float16x8_t vmaxnmq_f16 (float16x8_t __a, float16x8_t __b)
float16x4_t vmin_f16 (float16x4_t __a, float16x4_t __b)
float16x8_t vminq_f16 (float16x8_t __a, float16x8_t __b)
float16x4_t vminnm_f16 (float16x4_t __a, float16x4_t __b)
float16x8_t vminnmq_f16 (float16x8_t __a, float16x8_t __b)
float16x4_t vmul_f16 (float16x4_t __a, float16x4_t __b)
float16x4_t vmul_n_f16 (float16x4_t __a, float16_t __b)
float16x8_t vmulq_f16 (float16x8_t __a, float16x8_t __b)
float16x8_t vmulq_n_f16 (float16x8_t __a, float16_t __b)
float16x4_t vpadd_f16 (float16x4_t __a, float16x4_t __b)
float16x4_t vpmax_f16 (float16x4_t __a, float16x4_t __b)
float16x4_t vpmin_f16 (float16x4_t __a, float16x4_t __b)
float16x4_t vrecps_f16 (float16x4_t __a, float16x4_t __b)
float16x8_t vrecpsq_f16 (float16x8_t __a, float16x8_t __b)
float16x4_t vrsqrts_f16 (float16x4_t __a, float16x4_t __b)
float16x8_t vrsqrtsq_f16 (float16x8_t __a, float16x8_t __b)
float16x4_t vsub_f16 (float16x4_t __a, float16x4_t __b)
float16x8_t vsubq_f16 (float16x8_t __a, float16x8_t __b)

float16x4_t vfma_f16 (float16x4_t __a, float16x4_t __b, float16x4_t __c)
float16x8_t vfmaq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
float16x4_t vfms_f16 (float16x4_t __a, float16x4_t __b, float16x4_t __c)
float16x8_t vfmsq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
float16x4_t vmul_lane_f16 (float16x4_t __a, float16x4_t __b, const int __c)
float16x8_t vmulq_lane_f16 (float16x8_t __a, float16x4_t __b, const int __c)
----
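
For reference, a minimal usage sketch (not part of the patch; as with
the scalar intrinsics, it assumes the ARMv8.2-A FP16 command line
support added earlier in this series):

----
#include <arm_neon.h>

/* Clamp each lane of a float16x8_t to [lo, hi] using the IEEE-style
   maxNum/minNum operations.  */
float16x8_t
clamp_f16 (float16x8_t v, float16x8_t lo, float16x8_t hi)
{
  return vminnmq_f16 (vmaxnmq_f16 (v, lo), hi);
}
----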

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/arm_neon.h: Include arm_fp16.h.
	(vabd_f16): New.
	(vabdq_f16): New.
	(vabs_f16): New.
	(vabsq_f16): New.
	(vadd_f16): New.
	(vaddq_f16): New.
	(vcage_f16): New.
	(vcageq_f16): New.
	(vcagt_f16): New.
	(vcagtq_f16): New.
	(vcale_f16): New.
	(vcaleq_f16): New.
	(vcalt_f16): New.
	(vcaltq_f16): New.
	(vceq_f16): New.
	(vceqq_f16): New.
	(vceqz_f16): New.
	(vceqzq_f16): New.
	(vcge_f16): New.
	(vcgeq_f16): New.
	(vcgez_f16): New.
	(vcgezq_f16): New.
	(vcgt_f16): New.
	(vcgtq_f16): New.
	(vcgtz_f16): New.
	(vcgtzq_f16): New.
	(vcle_f16): New.
	(vcleq_f16): New.
	(vclez_f16): New.
	(vclezq_f16): New.
	(vclt_f16): New.
	(vcltq_f16): New.
	(vcltz_f16): New.
	(vcltzq_f16): New.
	(vcvt_f16_s16): New.
	(vcvt_f16_u16): New.
	(vcvt_s16_f16): New.
	(vcvt_u16_f16): New.
	(vcvtq_f16_s16): New.
	(vcvtq_f16_u16): New.
	(vcvtq_s16_f16): New.
	(vcvtq_u16_f16): New.
	(vcvta_s16_f16): New.
	(vcvta_u16_f16): New.
	(vcvtaq_s16_f16): New.
	(vcvtaq_u16_f16): New.
	(vcvtm_s16_f16): New.
	(vcvtm_u16_f16): New.
	(vcvtmq_s16_f16): New.
	(vcvtmq_u16_f16): New.
	(vcvtn_s16_f16): New.
	(vcvtn_u16_f16): New.
	(vcvtnq_s16_f16): New.
	(vcvtnq_u16_f16): New.
	(vcvtp_s16_f16): New.
	(vcvtp_u16_f16): New.
	(vcvtpq_s16_f16): New.
	(vcvtpq_u16_f16): New.
	(vcvt_n_f16_s16): New.
	(vcvt_n_f16_u16): New.
	(vcvtq_n_f16_s16): New.
	(vcvtq_n_f16_u16): New.
	(vcvt_n_s16_f16): New.
	(vcvt_n_u16_f16): New.
	(vcvtq_n_s16_f16): New.
	(vcvtq_n_u16_f16): New.
	(vfma_f16): New.
	(vfmaq_f16): New.
	(vfms_f16): New.
	(vfmsq_f16): New.
	(vmax_f16): New.
	(vmaxq_f16): New.
	(vmaxnm_f16): New.
	(vmaxnmq_f16): New.
	(vmin_f16): New.
	(vminq_f16): New.
	(vminnm_f16): New.
	(vminnmq_f16): New.
	(vmul_f16): New.
	(vmul_lane_f16): New.
	(vmul_n_f16): New.
	(vmulq_f16): New.
	(vmulq_lane_f16): New.
	(vmulq_n_f16): New.
	(vneg_f16): New.
	(vnegq_f16): New.
	(vpadd_f16): New.
	(vpmax_f16): New.
	(vpmin_f16): New.
	(vrecpe_f16): New.
	(vrecpeq_f16): New.
	(vrnd_f16): New.
	(vrndq_f16): New.
	(vrnda_f16): New.
	(vrndaq_f16): New.
	(vrndm_f16): New.
	(vrndmq_f16): New.
	(vrndn_f16): New.
	(vrndnq_f16): New.
	(vrndp_f16): New.
	(vrndpq_f16): New.
	(vrndx_f16): New.
	(vrndxq_f16): New.
	(vsqrte_f16): New.
	(vsqrteq_f16): New.
	(vrecps_f16): New.
	(vrecpsq_f16): New.
	(vrsqrts_f16): New.
	(vrsqrtsq_f16): New.
	(vsub_f16): New.
	(vsubq_f16): New.


[-- Attachment #2: 0014-PATCH-14-17-ARM-Add-NEON-FP16-instrinsics.patch --]
[-- Type: text/x-patch, Size: 23109 bytes --]

From 3f8692f5849049af0db05d1cc3b4cda80ae131e0 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 15:36:34 +0100
Subject: [PATCH 14/17] [PATCH 14/17][ARM] Add NEON FP16 intrinsics.

2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/arm_neon.h (vabd_f16): New.
	(vabdq_f16): New.
	(vabs_f16): New.
	(vabsq_f16): New.
	(vadd_f16): New.
	(vaddq_f16): New.
	(vcage_f16): New.
	(vcageq_f16): New.
	(vcagt_f16): New.
	(vcagtq_f16): New.
	(vcale_f16): New.
	(vcaleq_f16): New.
	(vcalt_f16): New.
	(vcaltq_f16): New.
	(vceq_f16): New.
	(vceqq_f16): New.
	(vceqz_f16): New.
	(vceqzq_f16): New.
	(vcge_f16): New.
	(vcgeq_f16): New.
	(vcgez_f16): New.
	(vcgezq_f16): New.
	(vcgt_f16): New.
	(vcgtq_f16): New.
	(vcgtz_f16): New.
	(vcgtzq_f16): New.
	(vcle_f16): New.
	(vcleq_f16): New.
	(vclez_f16): New.
	(vclezq_f16): New.
	(vclt_f16): New.
	(vcltq_f16): New.
	(vcltz_f16): New.
	(vcltzq_f16): New.
	(vcvt_f16_s16): New.
	(vcvt_f16_u16): New.
	(vcvt_s16_f16): New.
	(vcvt_u16_f16): New.
	(vcvtq_f16_s16): New.
	(vcvtq_f16_u16): New.
	(vcvtq_s16_f16): New.
	(vcvtq_u16_f16): New.
	(vcvta_s16_f16): New.
	(vcvta_u16_f16): New.
	(vcvtaq_s16_f16): New.
	(vcvtaq_u16_f16): New.
	(vcvtm_s16_f16): New.
	(vcvtm_u16_f16): New.
	(vcvtmq_s16_f16): New.
	(vcvtmq_u16_f16): New.
	(vcvtn_s16_f16): New.
	(vcvtn_u16_f16): New.
	(vcvtnq_s16_f16): New.
	(vcvtnq_u16_f16): New.
	(vcvtp_s16_f16): New.
	(vcvtp_u16_f16): New.
	(vcvtpq_s16_f16): New.
	(vcvtpq_u16_f16): New.
	(vcvt_n_f16_s16): New.
	(vcvt_n_f16_u16): New.
	(vcvtq_n_f16_s16): New.
	(vcvtq_n_f16_u16): New.
	(vcvt_n_s16_f16): New.
	(vcvt_n_u16_f16): New.
	(vcvtq_n_s16_f16): New.
	(vcvtq_n_u16_f16): New.
	(vfma_f16): New.
	(vfmaq_f16): New.
	(vfms_f16): New.
	(vfmsq_f16): New.
	(vmax_f16): New.
	(vmaxq_f16): New.
	(vmaxnm_f16): New.
	(vmaxnmq_f16): New.
	(vmin_f16): New.
	(vminq_f16): New.
	(vminnm_f16): New.
	(vminnmq_f16): New.
	(vmul_f16): New.
	(vmul_lane_f16): New.
	(vmul_n_f16): New.
	(vmulq_f16): New.
	(vmulq_lane_f16): New.
	(vmulq_n_f16): New.
	(vneg_f16): New.
	(vnegq_f16): New.
	(vpadd_f16): New.
	(vpmax_f16): New.
	(vpmin_f16): New.
	(vrecpe_f16): New.
	(vrecpeq_f16): New.
	(vrnd_f16): New.
	(vrndq_f16): New.
	(vrnda_f16): New.
	(vrndaq_f16): New.
	(vrndm_f16): New.
	(vrndmq_f16): New.
	(vrndn_f16): New.
	(vrndnq_f16): New.
	(vrndp_f16): New.
	(vrndpq_f16): New.
	(vrndx_f16): New.
	(vrndxq_f16): New.
	(vsqrte_f16): New.
	(vsqrteq_f16): New.
	(vrecps_f16): New.
	(vrecpsq_f16): New.
	(vrsqrts_f16): New.
	(vrsqrtsq_f16): New.
	(vsub_f16): New.
	(vsubq_f16): New.
---
 gcc/config/arm/arm_neon.h | 675 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 675 insertions(+)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 5b433b4..4075ff8 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -38,6 +38,7 @@
 extern "C" {
 #endif
 
+#include <arm_fp16.h>
 #include <stdint.h>
 
 typedef __simd64_int8_t int8x8_t;
@@ -14830,6 +14831,680 @@ vmull_high_p64 (poly64x2_t __a, poly64x2_t __b)
 
 #pragma GCC pop_options
 
+  /* Intrinsics for FP16 instructions.  */
+#pragma GCC push_options
+#pragma GCC target ("fpu=neon-fp-armv8")
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vabd_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vabdv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vabdq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vabdv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vabs_f16 (float16x4_t __a)
+{
+  return __builtin_neon_vabsv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vabsq_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vabsv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vadd_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vaddv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vaddq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vaddv8hf (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcage_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return (uint16x4_t)__builtin_neon_vcagev4hf (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcageq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return (uint16x8_t)__builtin_neon_vcagev8hf (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcagt_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return (uint16x4_t)__builtin_neon_vcagtv4hf (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcagtq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return (uint16x8_t)__builtin_neon_vcagtv8hf (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcale_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return (uint16x4_t)__builtin_neon_vcalev4hf (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcaleq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return (uint16x8_t)__builtin_neon_vcalev8hf (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcalt_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return (uint16x4_t)__builtin_neon_vcaltv4hf (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcaltq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return (uint16x8_t)__builtin_neon_vcaltv8hf (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vceq_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return (uint16x4_t)__builtin_neon_vceqv4hf (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vceqq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return (uint16x8_t)__builtin_neon_vceqv8hf (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vceqz_f16 (float16x4_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vceqzv4hf (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vceqzq_f16 (float16x8_t __a)
+{
+  return (uint16x8_t)__builtin_neon_vceqzv8hf (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcge_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return (uint16x4_t)__builtin_neon_vcgev4hf (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcgeq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return (uint16x8_t)__builtin_neon_vcgev8hf (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcgez_f16 (float16x4_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vcgezv4hf (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcgezq_f16 (float16x8_t __a)
+{
+  return (uint16x8_t)__builtin_neon_vcgezv8hf (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcgt_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return (uint16x4_t)__builtin_neon_vcgtv4hf (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcgtq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return (uint16x8_t)__builtin_neon_vcgtv8hf (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcgtz_f16 (float16x4_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vcgtzv4hf (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcgtzq_f16 (float16x8_t __a)
+{
+  return (uint16x8_t)__builtin_neon_vcgtzv8hf (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcle_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return (uint16x4_t)__builtin_neon_vclev4hf (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcleq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return (uint16x8_t)__builtin_neon_vclev8hf (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vclez_f16 (float16x4_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vclezv4hf (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vclezq_f16 (float16x8_t __a)
+{
+  return (uint16x8_t)__builtin_neon_vclezv8hf (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vclt_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return (uint16x4_t)__builtin_neon_vcltv4hf (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcltq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return (uint16x8_t)__builtin_neon_vcltv8hf (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcltz_f16 (float16x4_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vcltzv4hf (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcltzq_f16 (float16x8_t __a)
+{
+  return (uint16x8_t)__builtin_neon_vcltzv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcvt_f16_s16 (int16x4_t __a)
+{
+  return (float16x4_t)__builtin_neon_vcvtsv4hi (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcvt_f16_u16 (uint16x4_t __a)
+{
+  return (float16x4_t)__builtin_neon_vcvtuv4hi ((int16x4_t)__a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcvt_s16_f16 (float16x4_t __a)
+{
+  return (int16x4_t)__builtin_neon_vcvtsv4hf (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcvt_u16_f16 (float16x4_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vcvtuv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vcvtq_f16_s16 (int16x8_t __a)
+{
+  return (float16x8_t)__builtin_neon_vcvtsv8hi (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vcvtq_f16_u16 (uint16x8_t __a)
+{
+  return (float16x8_t)__builtin_neon_vcvtuv8hi ((int16x8_t)__a);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcvtq_s16_f16 (float16x8_t __a)
+{
+  return (int16x8_t)__builtin_neon_vcvtsv8hf (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcvtq_u16_f16 (float16x8_t __a)
+{
+  return (uint16x8_t)__builtin_neon_vcvtuv8hf (__a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcvta_s16_f16 (float16x4_t __a)
+{
+  return __builtin_neon_vcvtasv4hf (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcvta_u16_f16 (float16x4_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vcvtauv4hf (__a);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcvtaq_s16_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vcvtasv8hf (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcvtaq_u16_f16 (float16x8_t __a)
+{
+  return (uint16x8_t)__builtin_neon_vcvtauv8hf (__a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcvtm_s16_f16 (float16x4_t __a)
+{
+  return __builtin_neon_vcvtmsv4hf (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcvtm_u16_f16 (float16x4_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vcvtmuv4hf (__a);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcvtmq_s16_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vcvtmsv8hf (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcvtmq_u16_f16 (float16x8_t __a)
+{
+  return (uint16x8_t)__builtin_neon_vcvtmuv8hf (__a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcvtn_s16_f16 (float16x4_t __a)
+{
+  return __builtin_neon_vcvtnsv4hf (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcvtn_u16_f16 (float16x4_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vcvtnuv4hf (__a);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcvtnq_s16_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vcvtnsv8hf (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcvtnq_u16_f16 (float16x8_t __a)
+{
+  return (uint16x8_t)__builtin_neon_vcvtnuv8hf (__a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcvtp_s16_f16 (float16x4_t __a)
+{
+  return __builtin_neon_vcvtpsv4hf (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcvtp_u16_f16 (float16x4_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vcvtpuv4hf (__a);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcvtpq_s16_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vcvtpsv8hf (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcvtpq_u16_f16 (float16x8_t __a)
+{
+  return (uint16x8_t)__builtin_neon_vcvtpuv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcvt_n_f16_s16 (int16x4_t __a, const int __b)
+{
+  return __builtin_neon_vcvts_nv4hi (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcvt_n_f16_u16 (uint16x4_t __a, const int __b)
+{
+  return __builtin_neon_vcvtu_nv4hi ((int16x4_t)__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vcvtq_n_f16_s16 (int16x8_t __a, const int __b)
+{
+  return __builtin_neon_vcvts_nv8hi (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vcvtq_n_f16_u16 (uint16x8_t __a, const int __b)
+{
+  return __builtin_neon_vcvtu_nv8hi ((int16x8_t)__a, __b);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcvt_n_s16_f16 (float16x4_t __a, const int __b)
+{
+  return __builtin_neon_vcvts_nv4hf (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcvt_n_u16_f16 (float16x4_t __a, const int __b)
+{
+  return (uint16x4_t)__builtin_neon_vcvtu_nv4hf (__a, __b);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcvtq_n_s16_f16 (float16x8_t __a, const int __b)
+{
+  return __builtin_neon_vcvts_nv8hf (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcvtq_n_u16_f16 (float16x8_t __a, const int __b)
+{
+  return (uint16x8_t)__builtin_neon_vcvtu_nv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vfma_f16 (float16x4_t __a, float16x4_t __b, float16x4_t __c)
+{
+  return __builtin_neon_vfmav4hf (__a, __b, __c);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vfmaq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
+{
+  return __builtin_neon_vfmav8hf (__a, __b, __c);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vfms_f16 (float16x4_t __a, float16x4_t __b, float16x4_t __c)
+{
+  return __builtin_neon_vfmsv4hf (__a, __b, __c);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vfmsq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
+{
+  return __builtin_neon_vfmsv8hf (__a, __b, __c);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmax_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vmaxfv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmaxq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vmaxfv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmaxnm_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vmaxnmv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmaxnmq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vmaxnmv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmin_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vminfv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vminq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vminfv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vminnm_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vminnmv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vminnmq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vminnmv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmul_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vmulfv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmul_lane_f16 (float16x4_t __a, float16x4_t __b, const int __c)
+{
+  return __builtin_neon_vmul_lanev4hf (__a, __b, __c);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmul_n_f16 (float16x4_t __a, float16_t __b)
+{
+  return __builtin_neon_vmul_nv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vmulfv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulq_lane_f16 (float16x8_t __a, float16x4_t __b, const int __c)
+{
+  return __builtin_neon_vmul_lanev8hf (__a, __b, __c);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulq_n_f16 (float16x8_t __a, float16_t __b)
+{
+  return __builtin_neon_vmul_nv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vneg_f16 (float16x4_t __a)
+{
+  return __builtin_neon_vnegv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vnegq_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vnegv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vpadd_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vpaddv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vpmax_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vpmaxfv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vpmin_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vpminfv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrecpe_f16 (float16x4_t __a)
+{
+  return __builtin_neon_vrecpev4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrecpeq_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vrecpev8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrnd_f16 (float16x4_t __a)
+{
+  return __builtin_neon_vrndv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndq_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vrndv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrnda_f16 (float16x4_t __a)
+{
+  return __builtin_neon_vrndav4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndaq_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vrndav8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrndm_f16 (float16x4_t __a)
+{
+  return __builtin_neon_vrndmv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndmq_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vrndmv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrndn_f16 (float16x4_t __a)
+{
+  return __builtin_neon_vrndnv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndnq_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vrndnv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrndp_f16 (float16x4_t __a)
+{
+  return __builtin_neon_vrndpv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndpq_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vrndpv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrndx_f16 (float16x4_t __a)
+{
+  return __builtin_neon_vrndxv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndxq_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vrndxv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vsqrte_f16 (float16x4_t __a)
+{
+  return __builtin_neon_vsqrtev4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vsqrteq_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vsqrtev8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrecps_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vrecpsv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrecpsq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vrecpsv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrsqrts_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vrsqrtsv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrsqrtsq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vrsqrtsv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vsub_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vsubv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vsubq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vsubv8hf (__a, __b);
+}
+
+#endif /* __ARM_FEATURE_VECTOR_FP16_ARITHMETIC.  */
+#pragma GCC pop_options
+
   /* Half-precision data processing intrinsics.  */
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 
-- 
2.1.4


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 15/17][ARM] Add tests for ARMv8.2-A FP16 support.
  2016-05-17 14:20 [PATCH 0/17][ARM] ARMv8.2-A and FP16 extension support Matthew Wahab
                   ` (13 preceding siblings ...)
  2016-05-17 14:47 ` [PATCH 14/17][ARM] Add NEON " Matthew Wahab
@ 2016-05-17 14:49 ` Matthew Wahab
  2016-07-04 14:17   ` Matthew Wahab
  2016-05-17 14:51 ` [PATCH 16/17][ARM] Add tests for VFP FP16 ACLE intrinsics Matthew Wahab
  2016-05-17 14:52 ` [PATCH 17/17][ARM] Add tests for NEON FP16 ACLE intrinsics Matthew Wahab
  16 siblings, 1 reply; 73+ messages in thread
From: Matthew Wahab @ 2016-05-17 14:49 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 912 bytes --]

Support for using the half-precision floating point operations added by
the ARMv8.2-A FP16 extension is based on the macros and intrinsics added
to the ACLE for the extension.

This patch adds tests to check the compiler's treatment of the ACLE
macros and the code generated for the new intrinsics. It does not
include the executable tests for the
gcc.target/aarch64/advsimd-intrinsics testsuite; those are added later
in the patch series.
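
As a guide to the code generation tests below: each check wraps a
single intrinsic in a small function and matches the instruction it
should produce in the assembler output. A representative instance,
taken verbatim from the new NEON test file:

  float16x4_t
  test_vpadd_16x4 (float16x4_t a, float16x4_t b)
  {
    return vpadd_f16 (a, b);
  }
  /* { dg-final { scan-assembler-times {vpadd\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } } */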

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

testsuite/
2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/arm/armv8_2-fp16-neon-1.c: New.
	* gcc.target/arm/armv8_2-fp16-scalar-1.c: New.
	* gcc.target/arm/armv8_2-fp16-scalar-2.c: New.
	* gcc.target/arm/attr-fp16-arith-1.c: Add a test of intrinsics
	support.


[-- Attachment #2: 0015-PATCH-15-17-ARM-Add-tests-for-ARMv8.2-A-FP16-support.patch --]
[-- Type: text/x-patch, Size: 27354 bytes --]

From fe0cac871efe08d491a3b4ac027c29db1a72d15c Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 13:38:02 +0100
Subject: [PATCH 15/17] [PATCH 15/17][ARM] Add tests for ARMv8.2-A FP16
 support.

testsuite/
2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/arm/armv8_2-fp16-neon-1.c: New.
	* gcc.target/arm/armv8_2-fp16-scalar-1.c: New.
	* gcc.target/arm/armv8_2-fp16-scalar-2.c: New.
	* gcc.target/arm/attr-fp16-arith-1.c: Add a test of intrinsics
	support.
---
 gcc/testsuite/gcc.target/arm/armv8_2-fp16-neon-1.c | 490 +++++++++++++++++++++
 .../gcc.target/arm/armv8_2-fp16-scalar-1.c         | 203 +++++++++
 .../gcc.target/arm/armv8_2-fp16-scalar-2.c         |  71 +++
 gcc/testsuite/gcc.target/arm/attr-fp16-arith-1.c   |  13 +
 4 files changed, 777 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_2-fp16-neon-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-2.c

diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-neon-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-neon-1.c
new file mode 100644
index 0000000..576031e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-neon-1.c
@@ -0,0 +1,490 @@
+/* { dg-do compile }  */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_ok }  */
+/* { dg-options "-O2" }  */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+/* Test instructions generated for the FP16 vector intrinsics.  */
+
+#include <arm_neon.h>
+
+#define MSTRCAT(L, str)	L##str
+
+#define UNOP_TEST(insn)				\
+  float16x4_t					\
+  MSTRCAT (test_##insn, _16x4) (float16x4_t a)	\
+  {						\
+    return MSTRCAT (insn, _f16) (a);		\
+  }						\
+  float16x8_t					\
+  MSTRCAT (test_##insn, _16x8) (float16x8_t a)	\
+  {						\
+    return MSTRCAT (insn, q_f16) (a);		\
+  }
+
+#define BINOP_TEST(insn)					\
+  float16x4_t							\
+  MSTRCAT (test_##insn, _16x4) (float16x4_t a, float16x4_t b)	\
+  {								\
+    return MSTRCAT (insn, _f16) (a, b);				\
+  }								\
+  float16x8_t							\
+  MSTRCAT (test_##insn, _16x8) (float16x8_t a, float16x8_t b)	\
+  {								\
+    return MSTRCAT (insn, q_f16) (a, b);			\
+  }
+
+#define BINOP_LANE_TEST(insn, I)					\
+  float16x4_t								\
+  MSTRCAT (test_##insn##_lane, _16x4) (float16x4_t a, float16x4_t b)	\
+  {									\
+    return MSTRCAT (insn, _lane_f16) (a, b, I);				\
+  }									\
+  float16x8_t								\
+  MSTRCAT (test_##insn##_lane, _16x8) (float16x8_t a, float16x4_t b)	\
+  {									\
+    return MSTRCAT (insn, q_lane_f16) (a, b, I);			\
+  }
+
+#define BINOP_LANEQ_TEST(insn, I)					\
+  float16x4_t								\
+  MSTRCAT (test_##insn##_laneq, _16x4) (float16x4_t a, float16x8_t b)	\
+  {									\
+    return MSTRCAT (insn, _laneq_f16) (a, b, I);			\
+  }									\
+  float16x8_t								\
+  MSTRCAT (test_##insn##_laneq, _16x8) (float16x8_t a, float16x8_t b)	\
+  {									\
+    return MSTRCAT (insn, q_laneq_f16) (a, b, I);			\
+  }									\
+
+#define BINOP_N_TEST(insn)					\
+  float16x4_t							\
+  MSTRCAT (test_##insn##_n, _16x4) (float16x4_t a, float16_t b)	\
+  {								\
+    return MSTRCAT (insn, _n_f16) (a, b);			\
+  }								\
+  float16x8_t							\
+  MSTRCAT (test_##insn##_n, _16x8) (float16x8_t a, float16_t b)	\
+  {								\
+    return MSTRCAT (insn, q_n_f16) (a, b);			\
+  }
+
+#define TERNOP_TEST(insn)						\
+  float16_t								\
+  MSTRCAT (test_##insn, _16) (float16_t a, float16_t b, float16_t c)	\
+  {									\
+    return MSTRCAT (insn, h_f16) (a, b, c);				\
+  }									\
+  float16x4_t								\
+  MSTRCAT (test_##insn, _16x4) (float16x4_t a, float16x4_t b,		\
+			       float16x4_t c)				\
+  {									\
+    return MSTRCAT (insn, _f16) (a, b, c);				\
+  }									\
+  float16x8_t								\
+  MSTRCAT (test_##insn, _16x8) (float16x8_t a, float16x8_t b,		\
+			       float16x8_t c)				\
+  {									\
+    return MSTRCAT (insn, q_f16) (a, b, c);				\
+  }
+
+#define VCMP1_TEST(insn)			\
+  uint16x4_t					\
+  MSTRCAT (test_##insn, _16x4) (float16x4_t a)	\
+  {						\
+    return MSTRCAT (insn, _f16) (a);		\
+  }						\
+  uint16x8_t					\
+  MSTRCAT (test_##insn, _16x8) (float16x8_t a)	\
+  {						\
+    return MSTRCAT (insn, q_f16) (a);		\
+  }
+
+#define VCMP2_TEST(insn)					\
+  uint16x4_t							\
+  MSTRCAT (test_##insn, _16x4) (float16x4_t a, float16x4_t b)	\
+  {								\
+    return MSTRCAT (insn, _f16) (a, b);				\
+  }								\
+  uint16x8_t							\
+  MSTRCAT (test_##insn, _16x8) (float16x8_t a, float16x8_t b)	\
+  {								\
+    return MSTRCAT (insn, q_f16) (a, b);			\
+  }
+
+#define VCVT_TEST(insn, TY, TO, FR)			\
+  MSTRCAT (TO, 16x4_t)					\
+  MSTRCAT (test_##insn, TY) (MSTRCAT (FR, 16x4_t) a)	\
+  {							\
+    return MSTRCAT (insn, TY) (a);			\
+  }							\
+  MSTRCAT (TO, 16x8_t)					\
+  MSTRCAT (test_##insn##_q, TY) (MSTRCAT (FR, 16x8_t) a)	\
+  {							\
+    return MSTRCAT (insn, q##TY) (a);			\
+  }
+
+#define VCVT_N_TEST(insn, TY, TO, FR)			\
+  MSTRCAT (TO, 16x4_t)					\
+  MSTRCAT (test_##insn##_n, TY) (MSTRCAT (FR, 16x4_t) a)	\
+  {							\
+    return MSTRCAT (insn, _n##TY) (a, 1);		\
+  }							\
+  MSTRCAT (TO, 16x8_t)					\
+  MSTRCAT (test_##insn##_n_q, TY) (MSTRCAT (FR, 16x8_t) a)	\
+  {							\
+    return MSTRCAT (insn, q_n##TY) (a, 1);		\
+  }
+
+VCMP1_TEST (vceqz)
+/* { dg-final { scan-assembler-times {vceq\.f16\td[0-9]+, d[0-9]+, #0} 1 } }  */
+/* { dg-final { scan-assembler-times {vceq\.f16\tq[0-9]+, q[0-9]+, #0} 1 } }  */
+
+VCMP1_TEST (vcgtz)
+/* { dg-final { scan-assembler-times {vcgt\.f16\td[0-9]+, d[0-9]+, #0} 1 } }  */
+/* { dg-final { scan-assembler-times {vcgt\.f16\tq[0-9]+, q[0-9]+, #0} 1 } }  */
+
+VCMP1_TEST (vcgez)
+/* { dg-final { scan-assembler-times {vcge\.f16\td[0-9]+, d[0-9]+, #0} 1 } }  */
+/* { dg-final { scan-assembler-times {vcge\.f16\tq[0-9]+, q[0-9]+, #0} 1 } }  */
+
+VCMP1_TEST (vcltz)
+/* { dg-final { scan-assembler-times {vclt\.f16\td[0-9]+, d[0-9]+, #0} 1 } }  */
+/* { dg-final { scan-assembler-times {vclt\.f16\tq[0-9]+, q[0-9]+, #0} 1 } }  */
+
+VCMP1_TEST (vclez)
+/* { dg-final { scan-assembler-times {vcle\.f16\td[0-9]+, d[0-9]+, #0} 1 } }  */
+/* { dg-final { scan-assembler-times {vcle\.f16\tq[0-9]+, q[0-9]+, #0} 1 } }  */
+
+VCVT_TEST (vcvt, _f16_s16, float, int)
+VCVT_N_TEST (vcvt, _f16_s16, float, int)
+/* { dg-final { scan-assembler-times {vcvt\.f16\.s16\td[0-9]+, d[0-9]+} 2 } }
+   { dg-final { scan-assembler-times {vcvt\.f16\.s16\tq[0-9]+, q[0-9]+} 2 } }
+   { dg-final { scan-assembler-times {vcvt\.f16\.s16\td[0-9]+, d[0-9]+, #1} 1 } }
+   { dg-final { scan-assembler-times {vcvt\.f16\.s16\tq[0-9]+, q[0-9]+, #1} 1 } }  */
+
+VCVT_TEST (vcvt, _f16_u16, float, uint)
+VCVT_N_TEST (vcvt, _f16_u16, float, uint)
+/* { dg-final { scan-assembler-times {vcvt\.f16\.u16\td[0-9]+, d[0-9]+} 2 } }
+   { dg-final { scan-assembler-times {vcvt\.f16\.u16\tq[0-9]+, q[0-9]+} 2 } }
+   { dg-final { scan-assembler-times {vcvt\.f16\.u16\td[0-9]+, d[0-9]+, #1} 1 } }
+   { dg-final { scan-assembler-times {vcvt\.f16\.u16\tq[0-9]+, q[0-9]+, #1} 1 } }  */
+
+VCVT_TEST (vcvt, _s16_f16, int, float)
+VCVT_N_TEST (vcvt, _s16_f16, int, float)
+/* { dg-final { scan-assembler-times {vcvt\.s16\.f16\td[0-9]+, d[0-9]+} 2 } }
+   { dg-final { scan-assembler-times {vcvt\.s16\.f16\tq[0-9]+, q[0-9]+} 2 } }
+   { dg-final { scan-assembler-times {vcvt\.s16\.f16\td[0-9]+, d[0-9]+, #1} 1 } }
+   { dg-final { scan-assembler-times {vcvt\.s16\.f16\tq[0-9]+, q[0-9]+, #1} 1 } }  */
+
+VCVT_TEST (vcvt, _u16_f16, uint, float)
+VCVT_N_TEST (vcvt, _u16_f16, uint, float)
+/* { dg-final { scan-assembler-times {vcvt\.u16\.f16\td[0-9]+, d[0-9]+} 2 } }
+   { dg-final { scan-assembler-times {vcvt\.u16\.f16\tq[0-9]+, q[0-9]+} 2 } }
+   { dg-final { scan-assembler-times {vcvt\.u16\.f16\td[0-9]+, d[0-9]+, #1} 1 } }
+   { dg-final { scan-assembler-times {vcvt\.u16\.f16\tq[0-9]+, q[0-9]+, #1} 1 } }  */
+
+VCVT_TEST (vcvta, _s16_f16, int, float)
+/* { dg-final { scan-assembler-times {vcvta\.s16\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vcvta\.s16\.f16\tq[0-9]+, q[0-9]+} 1 } }
+*/
+
+VCVT_TEST (vcvta, _u16_f16, uint, float)
+/* { dg-final { scan-assembler-times {vcvta\.u16\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vcvta\.u16\.f16\tq[0-9]+, q[0-9]+} 1 } }
+*/
+
+VCVT_TEST (vcvtm, _s16_f16, int, float)
+/* { dg-final { scan-assembler-times {vcvtm\.s16\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vcvtm\.s16\.f16\tq[0-9]+, q[0-9]+} 1 } }
+*/
+
+VCVT_TEST (vcvtm, _u16_f16, uint, float)
+/* { dg-final { scan-assembler-times {vcvtm\.u16\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vcvtm\.u16\.f16\tq[0-9]+, q[0-9]+} 1 } }
+*/
+
+VCVT_TEST (vcvtn, _s16_f16, int, float)
+/* { dg-final { scan-assembler-times {vcvtn\.s16\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vcvtn\.s16\.f16\tq[0-9]+, q[0-9]+} 1 } }
+*/
+
+VCVT_TEST (vcvtn, _u16_f16, uint, float)
+/* { dg-final { scan-assembler-times {vcvtn\.u16\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vcvtn\.u16\.f16\tq[0-9]+, q[0-9]+} 1 } }
+*/
+
+VCVT_TEST (vcvtp, _s16_f16, int, float)
+/* { dg-final { scan-assembler-times {vcvtp\.s16\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vcvtp\.s16\.f16\tq[0-9]+, q[0-9]+} 1 } }
+*/
+
+VCVT_TEST (vcvtp, _u16_f16, uint, float)
+/* { dg-final { scan-assembler-times {vcvtp\.u16\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vcvtp\.u16\.f16\tq[0-9]+, q[0-9]+} 1 } }
+*/
+
+UNOP_TEST (vabs)
+/* { dg-final { scan-assembler-times {vabs\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vabs\.f16\tq[0-9]+, q[0-9]+} 1 } }  */
+
+UNOP_TEST (vneg)
+/* { dg-final { scan-assembler-times {vneg\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vneg\.f16\tq[0-9]+, q[0-9]+} 1 } }  */
+
+UNOP_TEST (vrecpe)
+/* { dg-final { scan-assembler-times {vrecpe\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vrecpe\.f16\tq[0-9]+, q[0-9]+} 1 } }  */
+
+UNOP_TEST (vrnd)
+/* { dg-final { scan-assembler-times {vrintz\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vrintz\.f16\tq[0-9]+, q[0-9]+} 1 } }  */
+
+UNOP_TEST (vrnda)
+/* { dg-final { scan-assembler-times {vrinta\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vrinta\.f16\tq[0-9]+, q[0-9]+} 1 } }  */
+
+UNOP_TEST (vrndm)
+/* { dg-final { scan-assembler-times {vrintm\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vrintm\.f16\tq[0-9]+, q[0-9]+} 1 } }  */
+
+UNOP_TEST (vrndn)
+/* { dg-final { scan-assembler-times {vrintn\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vrintn\.f16\tq[0-9]+, q[0-9]+} 1 } }  */
+
+UNOP_TEST (vrndp)
+/* { dg-final { scan-assembler-times {vrintp\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vrintp\.f16\tq[0-9]+, q[0-9]+} 1 } }  */
+
+UNOP_TEST (vrndx)
+/* { dg-final { scan-assembler-times {vrintx\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vrintx\.f16\tq[0-9]+, q[0-9]+} 1 } }  */
+
+UNOP_TEST (vsqrte)
+/* { dg-final { scan-assembler-times {vsqrte\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vsqrte\.f16\tq[0-9]+, q[0-9]+} 1 } }  */
+
+BINOP_TEST (vadd)
+/* { dg-final { scan-assembler-times {vadd\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vadd\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+BINOP_TEST (vabd)
+/* { dg-final { scan-assembler-times {vabd\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vabd\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+VCMP2_TEST (vcage)
+/* { dg-final { scan-assembler-times {vacge\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vacge\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+VCMP2_TEST (vcagt)
+/* { dg-final { scan-assembler-times {vacgt\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vacgt\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+VCMP2_TEST (vcale)
+/* { dg-final { scan-assembler-times {vacle\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vacle\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+VCMP2_TEST (vcalt)
+/* { dg-final { scan-assembler-times {vaclt\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vaclt\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+VCMP2_TEST (vceq)
+/* { dg-final { scan-assembler-times {vceq\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vceq\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+VCMP2_TEST (vcge)
+/* { dg-final { scan-assembler-times {vcge\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vcge\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+VCMP2_TEST (vcgt)
+/* { dg-final { scan-assembler-times {vcgt\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vcgt\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+VCMP2_TEST (vcle)
+/* { dg-final { scan-assembler-times {vcle\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vcle\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+VCMP2_TEST (vclt)
+/* { dg-final { scan-assembler-times {vclt\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vclt\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+BINOP_TEST (vmax)
+/* { dg-final { scan-assembler-times {vmax\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vmax\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+BINOP_TEST (vmin)
+/* { dg-final { scan-assembler-times {vmin\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vmin\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+BINOP_TEST (vmaxnm)
+/* { dg-final { scan-assembler-times {vmaxnm\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+  { dg-final { scan-assembler-times {vmaxnm\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+BINOP_TEST (vminnm)
+/* { dg-final { scan-assembler-times {vminnm\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+  { dg-final { scan-assembler-times {vminnm\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+BINOP_TEST (vmul)
+/* { dg-final { scan-assembler-times {vmul\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 3 } }
+   { dg-final { scan-assembler-times {vmul\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+BINOP_LANE_TEST (vmul, 2)
+/* { dg-final { scan-assembler-times {vmul\.f16\td[0-9]+, d[0-9]+, d[0-9]+\[2\]} 1 } }
+   { dg-final { scan-assembler-times {vmul\.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[2\]} 1 } }  */
+BINOP_N_TEST (vmul)
+/* { dg-final { scan-assembler-times {vmul\.f16\td[0-9]+, d[0-9]+, d[0-9]+\[0\]} 1 } }
+   { dg-final { scan-assembler-times {vmul\.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[0\]} 1 } }  */
+
+float16x4_t
+test_vpadd_16x4 (float16x4_t a, float16x4_t b)
+{
+  return vpadd_f16 (a, b);
+}
+/* { dg-final { scan-assembler-times {vpadd\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } } */
+
+float16x4_t
+test_vpmax_16x4 (float16x4_t a, float16x4_t b)
+{
+  return vpmax_f16 (a, b);
+}
+/* { dg-final { scan-assembler-times {vpmax\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } } */
+
+float16x4_t
+test_vpmin_16x4 (float16x4_t a, float16x4_t b)
+{
+  return vpmin_f16 (a, b);
+}
+/* { dg-final { scan-assembler-times {vpmin\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } } */
+
+BINOP_TEST (vsub)
+/* { dg-final { scan-assembler-times {vsub\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vsub\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+BINOP_TEST (vrecps)
+/* { dg-final { scan-assembler-times {vrecps\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+  { dg-final { scan-assembler-times {vrecps\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+BINOP_TEST (vrsqrts)
+/* { dg-final { scan-assembler-times {vrsqrts\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+  { dg-final { scan-assembler-times {vrsqrts\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+TERNOP_TEST (vfma)
+/* { dg-final { scan-assembler-times {vfma\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+  { dg-final { scan-assembler-times {vfma\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+TERNOP_TEST (vfms)
+/* { dg-final { scan-assembler-times {vfms\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+  { dg-final { scan-assembler-times {vfms\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+float16x4_t
+test_vmov_n_f16 (float16_t a)
+{
+  return vmov_n_f16 (a);
+}
+
+float16x4_t
+test_vdup_n_f16 (float16_t a)
+{
+  return vdup_n_f16 (a);
+}
+/* { dg-final { scan-assembler-times {vdup\.16\td[0-9]+, r[0-9]+} 2 } }  */
+
+float16x8_t
+test_vmovq_n_f16 (float16_t a)
+{
+  return vmovq_n_f16 (a);
+}
+
+float16x8_t
+test_vdupq_n_f16 (float16_t a)
+{
+  return vdupq_n_f16 (a);
+}
+/* { dg-final { scan-assembler-times {vdup\.16\tq[0-9]+, r[0-9]+} 2 } }  */
+
+float16x4_t
+test_vdup_lane_f16 (float16x4_t a)
+{
+  return vdup_lane_f16 (a, 1);
+}
+/* { dg-final { scan-assembler-times {vdup\.16\td[0-9]+, d[0-9]+\[1\]} 1 } }  */
+
+float16x8_t
+test_vdupq_lane_f16 (float16x4_t a)
+{
+  return vdupq_lane_f16 (a, 1);
+}
+/* { dg-final { scan-assembler-times {vdup\.16\tq[0-9]+, d[0-9]+\[1\]} 1 } }  */
+
+float16x4_t
+test_vext_f16 (float16x4_t a, float16x4_t b)
+{
+  return vext_f16 (a, b, 1);
+}
+/* { dg-final { scan-assembler-times {vext\.16\td[0-9]+, d[0-9]+, d[0-9]+, #1} 1 } } */
+
+float16x8_t
+test_vextq_f16 (float16x8_t a, float16x8_t b)
+{
+  return vextq_f16 (a, b, 1);
+}
+/* { dg-final { scan-assembler-times {vext\.16\tq[0-9]+, q[0-9]+, q[0-9]+, #1} 1 } }  */
+
+UNOP_TEST (vrev64)
+/* { dg-final { scan-assembler-times {vrev64\.16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vrev64\.16\tq[0-9]+, q[0-9]+} 1 } }  */
+
+float16x4_t
+test_vbsl16x4 (uint16x4_t a, float16x4_t b, float16x4_t c)
+{
+  return vbsl_f16 (a, b, c);
+}
+/* { dg-final { scan-assembler-times {vbsl\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }  */
+
+float16x8_t
+test_vbslq16x8 (uint16x8_t a, float16x8_t b, float16x8_t c)
+{
+  return vbslq_f16 (a, b, c);
+}
+/* { dg-final { scan-assembler-times {vbsl\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+float16x4x2_t
+test_vzip16x4 (float16x4_t a, float16x4_t b)
+{
+  return vzip_f16 (a, b);
+}
+/* { dg-final { scan-assembler-times {vzip\.16\td[0-9]+, d[0-9]+} 1 } }  */
+
+float16x8x2_t
+test_vzipq16x8 (float16x8_t a, float16x8_t b)
+{
+  return vzipq_f16 (a, b);
+}
+/* { dg-final { scan-assembler-times {vzip\.16\tq[0-9]+, q[0-9]+} 1 } }  */
+
+float16x4x2_t
+test_vuzp16x4 (float16x4_t a, float16x4_t b)
+{
+  return vuzp_f16 (a, b);
+}
+/* { dg-final { scan-assembler-times {vuzp\.16\td[0-9]+, d[0-9]+} 1 } }  */
+
+float16x8x2_t
+test_vuzpq16x8 (float16x8_t a, float16x8_t b)
+{
+  return vuzpq_f16 (a, b);
+}
+/* { dg-final { scan-assembler-times {vuzp\.16\tq[0-9]+, q[0-9]+} 1 } }  */
+
+float16x4x2_t
+test_vtrn16x4 (float16x4_t a, float16x4_t b)
+{
+  return vtrn_f16 (a, b);
+}
+/* { dg-final { scan-assembler-times {vtrn\.16\td[0-9]+, d[0-9]+} 1 } }  */
+
+float16x8x2_t
+test_vtrnq16x8 (float16x8_t a, float16x8_t b)
+{
+  return vtrnq_f16 (a, b);
+}
+/* { dg-final { scan-assembler-times {vtrn\.16\tq[0-9]+, q[0-9]+} 1 } }  */
diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-1.c
new file mode 100644
index 0000000..2eddb76
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-1.c
@@ -0,0 +1,203 @@
+/* { dg-do compile }  */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok }  */
+/* { dg-options "-O2" }  */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+/* Test instructions generated for the FP16 scalar intrinsics.  */
+#include <arm_fp16.h>
+
+#define MSTRCAT(L, str)	L##str
+
+#define UNOP_TEST(insn)				\
+  float16_t					\
+  MSTRCAT (test_##insn, 16) (float16_t a)	\
+  {						\
+    return MSTRCAT (insn, h_f16) (a);		\
+  }
+
+#define BINOP_TEST(insn)				\
+  float16_t						\
+  MSTRCAT (test_##insn, 16) (float16_t a, float16_t b)	\
+  {							\
+    return MSTRCAT (insn, h_f16) (a, b);		\
+  }
+
+#define TERNOP_TEST(insn)						\
+  float16_t								\
+  MSTRCAT (test_##insn, 16) (float16_t a, float16_t b, float16_t c)	\
+  {									\
+    return MSTRCAT (insn, h_f16) (a, b, c);				\
+  }
+
+float16_t
+test_vcvth_f16_s32 (int32_t a)
+{
+  return vcvth_f16_s32 (a);
+}
+
+float16_t
+test_vcvth_n_f16_s32 (int32_t a)
+{
+  return vcvth_n_f16_s32 (a, 1);
+}
+/* { dg-final { scan-assembler-times {vcvt\.f16\.s32\ts[0-9]+, s[0-9]+} 2 } }  */
+/* { dg-final { scan-assembler-times {vcvt\.f16\.s32\ts[0-9]+, s[0-9]+, #1} 1 } }  */
+
+float16_t
+test_vcvth_f16_u32 (uint32_t a)
+{
+  return vcvth_f16_u32 (a);
+}
+
+float16_t
+test_vcvth_n_f16_u32 (uint32_t a)
+{
+  return vcvth_n_f16_u32 (a, 1);
+}
+
+/* { dg-final { scan-assembler-times {vcvt\.f16\.u32\ts[0-9]+, s[0-9]+} 2 } }  */
+/* { dg-final { scan-assembler-times {vcvt\.f16\.u32\ts[0-9]+, s[0-9]+, #1} 1 } }  */
+
+uint32_t
+test_vcvth_u32_f16 (float16_t a)
+{
+  return vcvth_u32_f16 (a);
+}
+/* { dg-final { scan-assembler-times {vcvt\.u32\.f16\ts[0-9]+, s[0-9]+} 2 } }  */
+
+uint32_t
+test_vcvth_n_u32_f16 (float16_t a)
+{
+  return vcvth_n_u32_f16 (a, 1);
+}
+/* { dg-final { scan-assembler-times {vcvt\.u32\.f16\ts[0-9]+, s[0-9]+, #1} 1 } }  */
+
+int32_t
+test_vcvth_s32_f16 (float16_t a)
+{
+  return vcvth_s32_f16 (a);
+}
+
+int32_t
+test_vcvth_n_s32_f16 (float16_t a)
+{
+  return vcvth_n_s32_f16 (a, 1);
+}
+
+/* { dg-final { scan-assembler-times {vcvt\.s32\.f16\ts[0-9]+, s[0-9]+} 2 } }  */
+/* { dg-final { scan-assembler-times {vcvt\.s32\.f16\ts[0-9]+, s[0-9]+, #1} 1 } }  */
+
+int32_t
+test_vcvtah_s32_f16 (float16_t a)
+{
+  return vcvtah_s32_f16 (a);
+}
+/* { dg-final { scan-assembler-times {vcvta\.s32\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+
+uint32_t
+test_vcvtah_u32_f16 (float16_t a)
+{
+  return vcvtah_u32_f16 (a);
+}
+/* { dg-final { scan-assembler-times {vcvta\.u32\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+
+int32_t
+test_vcvtmh_s32_f16 (float16_t a)
+{
+  return vcvtmh_s32_f16 (a);
+}
+/* { dg-final { scan-assembler-times {vcvtm\.s32\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+
+uint32_t
+test_vcvtmh_u32_f16 (float16_t a)
+{
+  return vcvtmh_u32_f16 (a);
+}
+/* { dg-final { scan-assembler-times {vcvtm\.u32\.f16\ts[0-9]+, s[0-9]+} 1 } }
+ */
+
+int32_t
+test_vcvtnh_s32_f16 (float16_t a)
+{
+  return vcvtnh_s32_f16 (a);
+}
+/* { dg-final { scan-assembler-times {vcvtn\.s32\.f16\ts[0-9]+, s[0-9]+} 1 } }
+ */
+
+uint32_t
+test_vcvtnh_u32_f16 (float16_t a)
+{
+  return vcvtnh_u32_f16 (a);
+}
+/* { dg-final { scan-assembler-times {vcvtn\.u32\.f16\ts[0-9]+, s[0-9]+} 1 } }
+ */
+
+int32_t
+test_vcvtph_s32_f16 (float16_t a)
+{
+  return vcvtph_s32_f16 (a);
+}
+/* { dg-final { scan-assembler-times {vcvtp\.s32\.f16\ts[0-9]+, s[0-9]+} 1 } }
+ */
+
+uint32_t
+test_vcvtph_u32_f16 (float16_t a)
+{
+  return vcvtph_u32_f16 (a);
+}
+/* { dg-final { scan-assembler-times {vcvtp\.u32\.f16\ts[0-9]+, s[0-9]+} 1 } }
+ */
+
+UNOP_TEST (vabs)
+/* { dg-final { scan-assembler-times {vabs\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+
+UNOP_TEST (vneg)
+/* { dg-final { scan-assembler-times {vneg\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+
+UNOP_TEST (vrnd)
+/* { dg-final { scan-assembler-times {vrintz\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+
+UNOP_TEST (vrndi)
+/* { dg-final { scan-assembler-times {vrintr\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+
+UNOP_TEST (vrnda)
+/* { dg-final { scan-assembler-times {vrinta\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+
+UNOP_TEST (vrndm)
+/* { dg-final { scan-assembler-times {vrintm\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+
+UNOP_TEST (vrndn)
+/* { dg-final { scan-assembler-times {vrintn\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+
+UNOP_TEST (vrndp)
+/* { dg-final { scan-assembler-times {vrintp\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+
+UNOP_TEST (vrndx)
+/* { dg-final { scan-assembler-times {vrintx\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+
+UNOP_TEST (vsqrt)
+/* { dg-final { scan-assembler-times {vsqrt\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+
+BINOP_TEST (vadd)
+/* { dg-final { scan-assembler-times {vadd\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+
+BINOP_TEST (vdiv)
+/* { dg-final { scan-assembler-times {vdiv\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+
+BINOP_TEST (vmaxnm)
+/* { dg-final { scan-assembler-times {vmaxnm\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+
+BINOP_TEST (vminnm)
+/* { dg-final { scan-assembler-times {vminnm\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+
+BINOP_TEST (vmul)
+/* { dg-final { scan-assembler-times {vmul\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+
+BINOP_TEST (vsub)
+/* { dg-final { scan-assembler-times {vsub\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+
+TERNOP_TEST (vfma)
+/* { dg-final { scan-assembler-times {vfma\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+
+TERNOP_TEST (vfms)
+/* { dg-final { scan-assembler-times {vfms\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-2.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-2.c
new file mode 100644
index 0000000..99e6902
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-2.c
@@ -0,0 +1,71 @@
+/* { dg-do compile }  */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok }  */
+/* { dg-options "-O2" }  */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+/* Test compiler use of FP16 instructions.  */
+#include <arm_fp16.h>
+
+float16_t
+test_mov_imm_1 (float16_t a)
+{
+  return 1.0;
+}
+
+float16_t
+test_mov_imm_2 (float16_t a)
+{
+  float16_t b = 1.0;
+  return b;
+}
+
+float16_t
+test_vmov_imm_3 (float16_t a)
+{
+  float16_t b = 1.0;
+  return vaddh_f16 (a, b);
+}
+
+float16_t
+test_vmov_imm_4 (float16_t a)
+{
+  return vaddh_f16 (a, 1.0);
+}
+
+/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, #1\.0e\+0} 4 } }
+   { dg-final { scan-assembler-times {vadd\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 2 } }  */
+
+float16_t
+test_vmla_1 (float16_t a, float16_t b, float16_t c)
+{
+  return vaddh_f16 (vmulh_f16 (a, b), c);
+}
+/* { dg-final { scan-assembler-times {vmla\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+
+float16_t
+test_vmla_2 (float16_t a, float16_t b, float16_t c)
+{
+  return vsubh_f16 (vmulh_f16 (vnegh_f16 (a), b), c);
+}
+/* { dg-final { scan-assembler-times {vnmla\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */
+
+float16_t
+test_vmls_1 (float16_t a, float16_t b, float16_t c)
+{
+  return vsubh_f16 (c, vmulh_f16 (a, b));
+}
+
+float16_t
+test_vmls_2 (float16_t a, float16_t b, float16_t c)
+{
+  return vsubh_f16 (a, vmulh_f16 (b, c));
+}
+/* { dg-final { scan-assembler-times {vmls\.f16} 2 } } */
+
+float16_t
+test_vnmls_1 (float16_t a, float16_t b, float16_t c)
+{
+  return vsubh_f16 (vmulh_f16 (a, b), c);
+}
+/* { dg-final { scan-assembler-times {vnmls\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */
+
diff --git a/gcc/testsuite/gcc.target/arm/attr-fp16-arith-1.c b/gcc/testsuite/gcc.target/arm/attr-fp16-arith-1.c
index 5011315..a93d30f 100644
--- a/gcc/testsuite/gcc.target/arm/attr-fp16-arith-1.c
+++ b/gcc/testsuite/gcc.target/arm/attr-fp16-arith-1.c
@@ -28,6 +28,19 @@
 #error Invalid value for __ARM_FP
 #endif
 
+#include "arm_neon.h"
+
+float16_t
+foo (float16x4_t b)
+{
+  float16x4_t a = {2.0, 3.0, 4.0, 5.0};
+  float16x4_t res = vadd_f16 (a, b);
+
+  return res[0];
+}
+
+/* { dg-final { scan-assembler "vadd\\.f16\td\[0-9\]+, d\[0-9\]+" } } */
+
 #pragma GCC pop_options
 
 /* Check that the FP version is correctly reset to mfpu=fp-armv8.  */
-- 
2.1.4


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 16/17][ARM] Add tests for VFP FP16 ACLE intrinsics.
  2016-05-17 14:20 [PATCH 0/17][ARM] ARMv8.2-A and FP16 extension support Matthew Wahab
                   ` (14 preceding siblings ...)
  2016-05-17 14:49 ` [PATCH 15/17][ARM] Add tests for ARMv8.2-A FP16 support Matthew Wahab
@ 2016-05-17 14:51 ` Matthew Wahab
  2016-05-18  1:07   ` Joseph Myers
  2016-05-17 14:52 ` [PATCH 17/17][ARM] Add tests for NEON FP16 ACLE intrinsics Matthew Wahab
  16 siblings, 1 reply; 73+ messages in thread
From: Matthew Wahab @ 2016-05-17 14:51 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3357 bytes --]

Support for using the half-precision floating point operations added by
the ARMv8.2-A FP16 extension is based on the macros and intrinsics added
to the ACLE for the extension.

This patch adds executable tests for the ACLE scalar (floating point)
intrinsics to the advsimd-intrinsics testsuite. The tests were written
by Jiong Wang.

In some tests, there are unavoidable differences in precision when
calculating the actual and the expected results of an FP16 operation. A
new support macro CHECK_FP_BIAS is used so that these tests can check
for an acceptable margin of error. In these tests, the tolerance is
given as the absolute integer difference between the bit patterns of
the expected and the actual results.
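
For illustration only (this sketch is not part of the patch, and
`expected', `actual' and BIAS stand in for the values a test supplies),
the check reinterprets the two FP16 values as 16-bit integers and
requires their absolute difference to stay within the given bias:

  union { uint16_t i; float16_t f; } tmp_exp, tmp_res;
  tmp_exp.f = expected;		/* FP16 value from the expected-results table.  */
  tmp_res.f = actual;		/* FP16 value computed by the intrinsic.  */
  uint16_t diff = tmp_exp.i > tmp_res.i
		  ? tmp_exp.i - tmp_res.i : tmp_res.i - tmp_exp.i;
  if (diff > BIAS)		/* Bit patterns differ by more than the margin.  */
    abort ();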

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator. Also tested for aarch64-none-elf with the
advsimd-intrinsics testsuite using an ARMv8.2-A emulator.

Ok for trunk?
Matthew

testsuite/
2016-05-17  Jiong Wang  <jiong.wang@arm.com>
	    Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
	(CHECK_FP_BIAS): New.
	* gcc.target/aarch64/advsimd-intrinsics/vabsh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vaddh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtah_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtah_u32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s32_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u32_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s32_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u32_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_u32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtph_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtph_u32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vdivh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vfmah_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vfmsh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmaxnmh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vminnmh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmulh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vnegh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndah_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndih_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndmh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndnh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndph_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndxh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vsqrth_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vsubh_f16_1.c: New.


[-- Attachment #2: 0016-PATCH-16-17-ARM-Add-tests-for-VFP-FP16-ACLE-instrins.patch --]
[-- Type: text/x-patch, Size: 57343 bytes --]

From fe243d41337fcce0c93a8ce1df68921c680bcfe8 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 15:40:52 +0100
Subject: [PATCH 16/17] [PATCH 16/17][ARM] Add tests for VFP FP16 ACLE
 intrinsics.

testsuite/
2016-05-17  Jiong Wang  <jiong.wang@arm.com>
	    Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
	(CHECK_FP_BIAS): New.
	* gcc.target/aarch64/advsimd-intrinsics/vabsh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vaddh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtah_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtah_u32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s32_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u32_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s32_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u32_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_u32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtph_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtph_u32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vdivh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vfmah_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vfmsh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmaxnmh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vminnmh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmulh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vnegh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndah_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndih_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndmh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndnh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndph_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndxh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vsqrth_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vsubh_f16_1.c: New.
---
 .../aarch64/advsimd-intrinsics/arm-neon-ref.h      | 40 +++++++++++++
 .../aarch64/advsimd-intrinsics/vabsh_f16_1.c       | 41 +++++++++++++
 .../aarch64/advsimd-intrinsics/vaddh_f16_1.c       | 42 +++++++++++++
 .../aarch64/advsimd-intrinsics/vcvtah_s32_f16_1.c  | 50 ++++++++++++++++
 .../aarch64/advsimd-intrinsics/vcvtah_u32_f16_1.c  | 50 ++++++++++++++++
 .../aarch64/advsimd-intrinsics/vcvth_f16_s32_1.c   | 49 +++++++++++++++
 .../aarch64/advsimd-intrinsics/vcvth_f16_u32_1.c   | 49 +++++++++++++++
 .../aarch64/advsimd-intrinsics/vcvth_n_f16_s32_1.c | 57 ++++++++++++++++++
 .../aarch64/advsimd-intrinsics/vcvth_n_f16_u32_1.c | 56 ++++++++++++++++++
 .../aarch64/advsimd-intrinsics/vcvth_n_s32_f16_1.c | 54 +++++++++++++++++
 .../aarch64/advsimd-intrinsics/vcvth_n_u32_f16_1.c | 54 +++++++++++++++++
 .../aarch64/advsimd-intrinsics/vcvth_s32_f16_1.c   | 50 ++++++++++++++++
 .../aarch64/advsimd-intrinsics/vcvth_u32_f16_1.c   | 50 ++++++++++++++++
 .../aarch64/advsimd-intrinsics/vcvtmh_s32_f16_1.c  | 50 ++++++++++++++++
 .../aarch64/advsimd-intrinsics/vcvtmh_u32_f16_1.c  | 50 ++++++++++++++++
 .../aarch64/advsimd-intrinsics/vcvtnh_s32_f16_1.c  | 50 ++++++++++++++++
 .../aarch64/advsimd-intrinsics/vcvtnh_u32_f16_1.c  | 50 ++++++++++++++++
 .../aarch64/advsimd-intrinsics/vcvtph_s32_f16_1.c  | 50 ++++++++++++++++
 .../aarch64/advsimd-intrinsics/vcvtph_u32_f16_1.c  | 50 ++++++++++++++++
 .../aarch64/advsimd-intrinsics/vdivh_f16_1.c       | 52 ++++++++++++++++
 .../aarch64/advsimd-intrinsics/vfmah_f16_1.c       | 69 ++++++++++++++++++++++
 .../aarch64/advsimd-intrinsics/vfmsh_f16_1.c       | 69 ++++++++++++++++++++++
 .../aarch64/advsimd-intrinsics/vmaxnmh_f16_1.c     | 51 ++++++++++++++++
 .../aarch64/advsimd-intrinsics/vminnmh_f16_1.c     | 51 ++++++++++++++++
 .../aarch64/advsimd-intrinsics/vmulh_f16_1.c       | 51 ++++++++++++++++
 .../aarch64/advsimd-intrinsics/vnegh_f16_1.c       | 63 ++++++++++++++++++++
 .../aarch64/advsimd-intrinsics/vrndah_f16_1.c      | 51 ++++++++++++++++
 .../aarch64/advsimd-intrinsics/vrndh_f16_1.c       | 51 ++++++++++++++++
 .../aarch64/advsimd-intrinsics/vrndih_f16_1.c      | 51 ++++++++++++++++
 .../aarch64/advsimd-intrinsics/vrndmh_f16_1.c      | 51 ++++++++++++++++
 .../aarch64/advsimd-intrinsics/vrndnh_f16_1.c      | 51 ++++++++++++++++
 .../aarch64/advsimd-intrinsics/vrndph_f16_1.c      | 51 ++++++++++++++++
 .../aarch64/advsimd-intrinsics/vrndxh_f16_1.c      | 51 ++++++++++++++++
 .../aarch64/advsimd-intrinsics/vsqrth_f16_1.c      | 58 ++++++++++++++++++
 .../aarch64/advsimd-intrinsics/vsubh_f16_1.c       | 42 +++++++++++++
 35 files changed, 1805 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabsh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_s32_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_u32_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s32_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u32_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s32_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u32_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s32_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u32_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_s32_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_u32_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s32_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u32_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s32_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u32_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_s32_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_u32_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdivh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmah_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmsh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxnmh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminnmh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vnegh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndah_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndih_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndmh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndnh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndph_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndxh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsqrth_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsubh_f16_1.c

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
index 001e320..0585b7e 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
@@ -122,6 +122,46 @@ extern size_t strlen(const char *);
     fprintf(stderr, "CHECKED %s\n", MSG);				\
   }
 
+/* Floating-point variant which tolerates a bit-pattern diff up to BIAS.  */
+#define CHECK_FP_BIAS(MSG, T, W, N, FMT, EXPECTED, COMMENT, BIAS)	\
+  {									\
+    int i;								\
+    uint##W##_t op_max;							\
+    uint##W##_t op_min;							\
+    for (i=0; i<N ; i++)						\
+      {									\
+	union fp_operand						\
+	{								\
+	  uint##W##_t i;						\
+	  float##W##_t f;						\
+	} tmp_res, tmp_exp;						\
+	tmp_res.f = VECT_VAR (result, T, W, N)[i];			\
+	tmp_exp.i = VECT_VAR (EXPECTED, h##T, W, N)[i];			\
+	op_max = tmp_exp.i;						\
+	op_min = tmp_res.i;						\
+	if (tmp_res.i > tmp_exp.i)					\
+	  {								\
+	    op_max = tmp_res.i;						\
+	    op_min = tmp_exp.i;						\
+	  }								\
+	if ((op_max - op_min) > BIAS)					\
+	  {								\
+	    fprintf (stderr,						\
+		     "ERROR in %s (%s line %d in buffer '%s') at type %s " \
+		     "index %d: got 0x%" FMT " != 0x%" FMT " %s\n",	\
+		     MSG, __FILE__, __LINE__,				\
+		     STR (EXPECTED),					\
+		     STR (VECT_NAME (T, W, N)),				\
+		     i,							\
+		     tmp_res.i,						\
+		     tmp_exp.i,						\
+		     strlen (COMMENT) > 0 ? COMMENT : "");		\
+	    abort ();							\
+	  }								\
+      }									\
+    fprintf (stderr, "CHECKED %s\n", MSG);				\
+  }
+
 /* Clean buffer with a non-zero pattern to help diagnose buffer
    overflows.  */
 #define CLEAN_PATTERN_8  0x33
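
As an illustrative sketch, not part of the patch: a test built on the
usual arm-neon-ref.h conventions (a TEST_MSG string, result and expected
buffers declared with VECT_VAR_DECL, the PRIx16 format from
<inttypes.h>) might invoke the new macro as

  /* Accept results whose bit pattern is within 2 of the expected one.  */
  CHECK_FP_BIAS (TEST_MSG, float, 16, 4, PRIx16, expected, "", 2);

with the final argument giving the permitted bias.
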
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabsh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabsh_f16_1.c
new file mode 100644
index 0000000..a2213a3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabsh_f16_1.c
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (-567.8)
+#define C FP16_C (-34.8)
+#define D FP16_C (1024)
+#define E FP16_C (663.1)
+#define F FP16_C (169.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (77)
+
+extern void abort ();
+
+/* Expected results for vabsh.  */
+float16_t src[8] = {A, B, C, D, E, F, G, H};
+float16_t expected[8] = {A, -B, -C, D, E, F, -G, H};
+
+void
+exec_vabsh_f16 (void)
+{
+  int index;
+
+  for (index = 0; index < 8; index++)
+    {
+      float16_t ret = vabsh_f16 (src[index]);
+      if (* (uint16_t *) &ret != * (uint16_t *) &expected[index])
+	abort ();
+    }
+}
+
+int
+main (void)
+{
+  exec_vabsh_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddh_f16_1.c
new file mode 100644
index 0000000..2741f84
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddh_f16_1.c
@@ -0,0 +1,42 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (-567.8)
+#define C FP16_C (-34.8)
+#define D FP16_C (1024)
+#define E FP16_C (663.1)
+#define F FP16_C (169.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (77)
+
+extern void abort ();
+
+/* Expected results for vaddh_f16.  */
+float16_t src1[4] = {A, B, C, D};
+float16_t src2[4] = {E, F, G, H};
+float16_t expected[4] = {A + E, B + F, C + G, D + H};
+
+void
+exec_vaddh_f16 (void)
+{
+  int index;
+
+  for (index = 0; index < 4; index++)
+    {
+      float16_t ret = vaddh_f16 (src1[index], src2[index]);
+      if (* (uint16_t *) &ret != * (uint16_t *) &expected[index])
+	abort ();
+    }
+}
+
+int
+main (void)
+{
+  exec_vaddh_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_s32_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_s32_f16_1.c
new file mode 100644
index 0000000..b9abbe9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_s32_f16_1.c
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define FP16_C(a) ((__fp16) a)
+#define AF FP16_C (123.9)
+#define BF FP16_C (-56.8)
+#define CF FP16_C (0.7)
+#define DF FP16_C (24.6)
+#define EF FP16_C (-63.5)
+#define FF FP16_C (169.4)
+#define GF FP16_C (-4.3)
+#define HF FP16_C (77.0)
+
+#define A (124)
+#define B (-57)
+#define C (1)
+#define D (25)
+#define E (-64)
+#define F (169)
+#define G (-4)
+#define H (77)
+
+extern void abort ();
+
+/* Expected results for vcvtah_s32_f16.  */
+float16_t src[8] = {AF, BF, CF, DF, EF, FF, GF, HF};
+int32_t expected[8] = {A, B, C, D, E, F, G, H};
+
+void
+exec_vcvtah_s32_f16 (void)
+{
+  int index;
+
+  for (index = 0; index < 8; index++)
+    {
+      int32_t ret = vcvtah_s32_f16 (src[index]);
+      if (ret != expected[index])
+	abort ();
+    }
+}
+
+int
+main (void)
+{
+  exec_vcvtah_s32_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_u32_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_u32_f16_1.c
new file mode 100644
index 0000000..e51109c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_u32_f16_1.c
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define FP16_C(a) ((__fp16) a)
+#define AF FP16_C (123.9)
+#define BF FP16_C (56.8)
+#define CF FP16_C (0.7)
+#define DF FP16_C (24.6)
+#define EF FP16_C (63.5)
+#define FF FP16_C (169.4)
+#define GF FP16_C (4.3)
+#define HF FP16_C (77.0)
+
+#define A (124)
+#define B (57)
+#define C (1)
+#define D (25)
+#define E (64)
+#define F (169)
+#define G (4)
+#define H (77)
+
+extern void abort ();
+
+/* Expected results for vcvtah_u32_f16.  */
+float16_t src[8] = {AF, BF, CF, DF, EF, FF, GF, HF};
+uint32_t expected[8] = {A, B, C, D, E, F, G, H};
+
+void
+exec_vcvtah_u32_f16 (void)
+{
+  int index;
+
+  for (index = 0; index < 8; index++)
+    {
+      uint32_t ret = vcvtah_u32_f16 (src[index]);
+      if (ret != expected[index])
+	abort ();
+    }
+}
+
+int
+main (void)
+{
+  exec_vcvtah_u32_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s32_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s32_1.c
new file mode 100644
index 0000000..7304f57
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s32_1.c
@@ -0,0 +1,49 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define FP16_C(a) ((__fp16) a)
+#define A (123)
+#define B (-567)
+#define C (0)
+#define D (1024)
+#define E (-63)
+#define F (169)
+#define G (-4)
+#define H (77)
+#define AF FP16_C (A)
+#define BF FP16_C (B)
+#define CF FP16_C (C)
+#define DF FP16_C (D)
+#define EF FP16_C (E)
+#define FF FP16_C (F)
+#define GF FP16_C (G)
+#define HF FP16_C (H)
+
+extern void abort ();
+
+/* Expected results for vcvth_f16_s32.  */
+int32_t src[8] = {A, B, C, D, E, F, G, H};
+float16_t expected[8] = {AF, BF, CF, DF, EF, FF, GF, HF};
+
+void
+exec_vcvth_f16_s32 (void)
+{
+  int index;
+
+  for (index = 0; index < 8; index++)
+    {
+      float16_t ret = vcvth_f16_s32 (src[index]);
+      if (* (uint16_t *) &ret != * (uint16_t *) &expected[index])
+	abort ();
+    }
+}
+
+int
+main (void)
+{
+  exec_vcvth_f16_s32 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u32_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u32_1.c
new file mode 100644
index 0000000..228663d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u32_1.c
@@ -0,0 +1,49 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define FP16_C(a) ((__fp16) a)
+#define A (123)
+#define B (567)
+#define C (0)
+#define D (1024)
+#define E (63)
+#define F (169)
+#define G (4)
+#define H (77)
+#define AF FP16_C (A)
+#define BF FP16_C (B)
+#define CF FP16_C (C)
+#define DF FP16_C (D)
+#define EF FP16_C (E)
+#define FF FP16_C (F)
+#define GF FP16_C (G)
+#define HF FP16_C (H)
+
+extern void abort ();
+
+/* Expected results for vcvth_f16_u32.  */
+uint32_t src[8] = {A, B, C, D, E, F, G, H};
+float16_t expected[8] = {AF, BF, CF, DF, EF, FF, GF, HF};
+
+void
+exec_vcvth_f16_u32 (void)
+{
+  int index;
+
+  for (index = 0; index < 8; index++)
+    {
+      float16_t ret = vcvth_f16_u32 (src[index]);
+      if (* (uint16_t *) &ret != * (uint16_t *) &expected[index])
+	abort ();
+    }
+}
+
+int
+main (void)
+{
+  exec_vcvth_f16_u32 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s32_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s32_1.c
new file mode 100644
index 0000000..395a15a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s32_1.c
@@ -0,0 +1,57 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+#include <stdio.h>
+
+#define FRAC_1 1
+#define FRAC_2 2
+
+#define FP16(a) ((__fp16) (a))
+#define A (1)
+#define B (10)
+#define C (48)
+#define D (100)
+#define E (-1)
+#define F (-10)
+#define G (7)
+#define H (-7)
+
+extern void abort ();
+
+/* Expected results for vcvth_n_f16_s32.  */
+int32_t src[8] = {A, B, C, D, E, F, G, H};
+float16_t expected[8] = {FP16 (0.5), FP16 (5), FP16 (24), FP16 (50),
+			 FP16 (-0.5), FP16 (-5), FP16 (3.5), FP16 (-3.5)};
+
+float16_t expected2[8] = {FP16 (0.25), FP16 (2.5), FP16 (12), FP16 (25),
+			  FP16 (-0.25), FP16 (-2.5), FP16 (1.75), FP16 (-1.75)};
+
+void
+exec_vcvth_n_f16_s32 (void)
+{
+  int index;
+
+  for (index = 0; index < 8; index++)
+    {
+      float16_t ret = vcvth_n_f16_s32 (src[index], FRAC_1);
+      if (* (uint16_t *) &ret != * (uint16_t *) &expected[index])
+	abort ();
+    }
+
+  for (index = 0; index < 8; index++)
+    {
+      float16_t ret = vcvth_n_f16_s32 (src[index], FRAC_2);
+      if (* (uint16_t *) &ret != * (uint16_t *) &expected2[index])
+	abort ();
+    }
+
+}
+
+int
+main (void)
+{
+  exec_vcvth_n_f16_s32 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u32_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u32_1.c
new file mode 100644
index 0000000..41f2fda
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u32_1.c
@@ -0,0 +1,56 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+#include <stdio.h>
+
+#define FRAC_1 1
+#define FRAC_2 2
+
+#define FP16(a) ((__fp16) (a))
+#define A (1)
+#define B (10)
+#define C (48)
+#define D (100)
+#define E (1000)
+#define F (0)
+#define G (500)
+#define H (9)
+
+extern void abort ();
+
+/* Expected results for vcvth_n_f16_u32.  */
+uint32_t src[8] = {A, B, C, D, E, F, G, H};
+float16_t expected[8] = {FP16 (0.5), FP16 (5), FP16 (24), FP16 (50),
+			 FP16 (500), FP16 (0.0), FP16 (250), FP16 (4.5)};
+float16_t expected2[8] = {FP16 (0.25), FP16 (2.5), FP16 (12), FP16 (25),
+			  FP16 (250), FP16 (0.0), FP16 (125), FP16 (2.25)};
+
+void
+exec_vcvth_n_f16_u32 (void)
+{
+  int index;
+
+  for (index = 0; index < 8; index++)
+    {
+      float16_t ret = vcvth_n_f16_u32 (src[index], FRAC_1);
+      if (* (uint16_t *) &ret != * (uint16_t *) &expected[index])
+	abort ();
+    }
+
+  for (index = 0; index < 8; index++)
+    {
+      float16_t ret = vcvth_n_f16_u32 (src[index], FRAC_2);
+      if (* (uint16_t *) &ret != * (uint16_t *) &expected2[index])
+	abort ();
+    }
+
+}
+
+int
+main (void)
+{
+  exec_vcvth_n_f16_u32 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s32_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s32_f16_1.c
new file mode 100644
index 0000000..7c22298
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s32_f16_1.c
@@ -0,0 +1,54 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+#include <stdio.h>
+
+#define FRAC_1 1
+#define FRAC_2 2
+
+#define FP16_C(a) ((__fp16) (a))
+#define A FP16_C (2.5)
+#define B FP16_C (100)
+#define C FP16_C (7.1)
+#define D FP16_C (-9.9)
+#define E FP16_C (-5.0)
+#define F FP16_C (9.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (77)
+
+extern void abort ();
+
+/* Expected results for vcvth_n_s32_f16.  */
+float16_t src[8] = {A, B, C, D, E, F, G, H};
+int32_t expected[8] = {5, 200, 14, -19, -10, 18, -9, 154};
+int32_t expected2[8] = {10, 400, 28, -39, -20, 36, -19, 308};
+
+void
+exec_vcvth_n_s32_f16 (void)
+{
+  int index;
+
+  for (index = 0; index < 8; index++)
+    {
+      int32_t ret = vcvth_n_s32_f16 (src[index], FRAC_1);
+      if (ret != expected[index])
+	abort ();
+    }
+
+  for (index = 0; index < 8; index++)
+    {
+      int32_t ret = vcvth_n_s32_f16 (src[index], FRAC_2);
+      if (ret != expected2[index])
+	abort ();
+    }
+
+}
+
+int
+main (void)
+{
+  exec_vcvth_n_s32_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u32_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u32_f16_1.c
new file mode 100644
index 0000000..75eef3e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u32_f16_1.c
@@ -0,0 +1,54 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+#include <stdio.h>
+
+#define FRAC_1 1
+#define FRAC_2 2
+
+#define FP16_C(a) ((__fp16) (a))
+#define A FP16_C (2.5)
+#define B FP16_C (100)
+#define C FP16_C (7.1)
+#define D FP16_C (9.9)
+#define E FP16_C (5.0)
+#define F FP16_C (9.1)
+#define G FP16_C (4.8)
+#define H FP16_C (77)
+
+extern void abort ();
+
+/* Expected results for vcvth_n_u32_f16.  */
+float16_t src[8] = {A, B, C, D, E, F, G, H};
+uint32_t expected[8] = {5, 200, 14, 19, 10, 18, 9, 154};
+uint32_t expected2[8] = {10, 400, 28, 39, 20, 36, 19, 308};
+
+void
+exec_vcvth_n_u32_f16 (void)
+{
+  int index;
+
+  for (index = 0; index < 8; index++)
+    {
+      uint32_t ret = vcvth_n_u32_f16 (src[index], FRAC_1);
+      if (ret != expected[index])
+	abort ();
+    }
+
+  for (index = 0; index < 8; index++)
+    {
+      uint32_t ret = vcvth_n_u32_f16 (src[index], FRAC_2);
+      if (ret != expected2[index])
+	abort ();
+    }
+
+}
+
+int
+main (void)
+{
+  exec_vcvth_n_u32_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_s32_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_s32_f16_1.c
new file mode 100644
index 0000000..aba0c93
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_s32_f16_1.c
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define FP16_C(a) ((__fp16) a)
+#define AF FP16_C (123.9)
+#define BF FP16_C (-56.8)
+#define CF FP16_C (0.7)
+#define DF FP16_C (24.6)
+#define EF FP16_C (-63.5)
+#define FF FP16_C (169.4)
+#define GF FP16_C (-4.3)
+#define HF FP16_C (77.0)
+
+#define A (123)
+#define B (-56)
+#define C (0)
+#define D (24)
+#define E (-63)
+#define F (169)
+#define G (-4)
+#define H (77)
+
+extern void abort ();
+
+/* Expected results for vcvth_s32_f16.  */
+float16_t src[8] = {AF, BF, CF, DF, EF, FF, GF, HF};
+int32_t expected[8] = {A, B, C, D, E, F, G, H};
+
+void
+exec_vcvth_s32_f16 (void)
+{
+  int index;
+
+  for (index = 0; index < 8; index++)
+    {
+      int32_t ret = vcvth_s32_f16 (src[index]);
+      if (ret != expected[index])
+	abort ();
+    }
+}
+
+int
+main (void)
+{
+  exec_vcvth_s32_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_u32_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_u32_f16_1.c
new file mode 100644
index 0000000..378103f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_u32_f16_1.c
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define FP16_C(a) ((__fp16) a)
+#define AF FP16_C (123.9)
+#define BF FP16_C (56.8)
+#define CF FP16_C (0.7)
+#define DF FP16_C (24.6)
+#define EF FP16_C (63.5)
+#define FF FP16_C (169.4)
+#define GF FP16_C (4.3)
+#define HF FP16_C (77.0)
+
+#define A (123)
+#define B (56)
+#define C (0)
+#define D (24)
+#define E (63)
+#define F (169)
+#define G (4)
+#define H (77)
+
+extern void abort ();
+
+/* Expected results for vcvth_u32_f16.  */
+float16_t src[8] = {AF, BF, CF, DF, EF, FF, GF, HF};
+uint32_t expected[8] = {A, B, C, D, E, F, G, H};
+
+void
+exec_vcvth_u32_f16 (void)
+{
+  int index;
+
+  for (index = 0; index < 8; index++)
+    {
+      uint32_t ret = vcvth_u32_f16 (src[index]);
+      if (ret != expected[index])
+	abort ();
+    }
+}
+
+int
+main (void)
+{
+  exec_vcvth_u32_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s32_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s32_f16_1.c
new file mode 100644
index 0000000..286450c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s32_f16_1.c
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define FP16_C(a) ((__fp16) a)
+#define AF FP16_C (123.9)
+#define BF FP16_C (-56.8)
+#define CF FP16_C (0.7)
+#define DF FP16_C (24.6)
+#define EF FP16_C (-63.5)
+#define FF FP16_C (169.4)
+#define GF FP16_C (-4.3)
+#define HF FP16_C (77.0)
+
+#define A (123)
+#define B (-57)
+#define C (0)
+#define D (24)
+#define E (-64)
+#define F (169)
+#define G (-5)
+#define H (77)
+
+extern void abort ();
+
+/* Expected results for vcvtmh_s32_f16.  */
+float16_t src[8] = {AF, BF, CF, DF, EF, FF, GF, HF};
+int32_t expected[8] = {A, B, C, D, E, F, G, H};
+
+void
+exec_vcvtmh_s32_f16 (void)
+{
+  int index;
+
+  for (index = 0; index < 8; index++)
+    {
+      int32_t ret = vcvtmh_s32_f16 (src[index]);
+      if (ret != expected[index])
+	abort ();
+    }
+}
+
+int
+main (void)
+{
+  exec_vcvtmh_s32_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u32_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u32_f16_1.c
new file mode 100644
index 0000000..9b75827
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u32_f16_1.c
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define FP16_C(a) ((__fp16) a)
+#define AF FP16_C (123.9)
+#define BF FP16_C (56.8)
+#define CF FP16_C (0.7)
+#define DF FP16_C (24.6)
+#define EF FP16_C (63.5)
+#define FF FP16_C (169.4)
+#define GF FP16_C (4.3)
+#define HF FP16_C (77.0)
+
+#define A (123)
+#define B (56)
+#define C (0)
+#define D (24)
+#define E (63)
+#define F (169)
+#define G (4)
+#define H (77)
+
+extern void abort ();
+
+/* Expected results for vcvtmh_u32_f16.  */
+float16_t src[8] = {AF, BF, CF, DF, EF, FF, GF, HF};
+uint32_t expected[8] = {A, B, C, D, E, F, G, H};
+
+void
+exec_vcvtmh_u32_f16 (void)
+{
+  int index;
+
+  for (index = 0; index < 8; index++)
+    {
+      uint32_t ret = vcvtmh_u32_f16 (src[index]);
+      if (ret != expected[index])
+	abort ();
+    }
+}
+
+int
+main (void)
+{
+  exec_vcvtmh_u32_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s32_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s32_f16_1.c
new file mode 100644
index 0000000..d6dc16a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s32_f16_1.c
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define FP16_C(a) ((__fp16) a)
+#define AF FP16_C (123.9)
+#define BF FP16_C (-56.5)
+#define CF FP16_C (0.7)
+#define DF FP16_C (24.6)
+#define EF FP16_C (-63.5)
+#define FF FP16_C (169.4)
+#define GF FP16_C (-4.3)
+#define HF FP16_C (77.0)
+
+#define A (124)
+#define B (-56)
+#define C (1)
+#define D (25)
+#define E (-64)
+#define F (169)
+#define G (-4)
+#define H (77)
+
+extern void abort ();
+
+/* Expected results for vcvtnh_s32_f16.  */
+float16_t src[8] = {AF, BF, CF, DF, EF, FF, GF, HF};
+int32_t expected[8] = {A, B, C, D, E, F, G, H};
+
+void
+exec_vcvtnh_s32_f16 (void)
+{
+  int index;
+
+  for (index = 0; index < 8; index++)
+    {
+      int32_t ret = vcvtnh_s32_f16 (src[index]);
+      if (ret != expected[index])
+	abort ();
+    }
+}
+
+int
+main (void)
+{
+  exec_vcvtnh_s32_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u32_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u32_f16_1.c
new file mode 100644
index 0000000..a2d5089
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u32_f16_1.c
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define FP16_C(a) ((__fp16) a)
+#define AF FP16_C (123.9)
+#define BF FP16_C (56.5)
+#define CF FP16_C (0.7)
+#define DF FP16_C (24.6)
+#define EF FP16_C (63.5)
+#define FF FP16_C (169.4)
+#define GF FP16_C (4.3)
+#define HF FP16_C (77.0)
+
+#define A (124)
+#define B (56)
+#define C (1)
+#define D (25)
+#define E (64)
+#define F (169)
+#define G (4)
+#define H (77)
+
+extern void abort ();
+
+/* Expected results for vcvtnh_u32_f16.  */
+float16_t src[8] = {AF, BF, CF, DF, EF, FF, GF, HF};
+uint32_t expected[8] = {A, B, C, D, E, F, G, H};
+
+void
+exec_vcvtnh_u32_f16 (void)
+{
+  int index;
+
+  for (index = 0; index < 8; index++)
+    {
+      uint32_t ret = vcvtnh_u32_f16 (src[index]);
+      if (ret != expected[index])
+	abort ();
+    }
+}
+
+int
+main (void)
+{
+  exec_vcvtnh_u32_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_s32_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_s32_f16_1.c
new file mode 100644
index 0000000..be7af56
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_s32_f16_1.c
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define FP16_C(a) ((__fp16) a)
+#define AF FP16_C (123.9)
+#define BF FP16_C (-56.5)
+#define CF FP16_C (0.7)
+#define DF FP16_C (24.6)
+#define EF FP16_C (-63.5)
+#define FF FP16_C (169.4)
+#define GF FP16_C (-4.3)
+#define HF FP16_C (77.0)
+
+#define A (124)
+#define B (-56)
+#define C (1)
+#define D (25)
+#define E (-63)
+#define F (170)
+#define G (-4)
+#define H (77)
+
+extern void abort ();
+
+/* Expected results for vcvtph_s32_f16.  */
+float16_t src[8] = {AF, BF, CF, DF, EF, FF, GF, HF};
+int32_t expected[8] = {A, B, C, D, E, F, G, H};
+
+void
+exec_vcvtph_s32_f16 (void)
+{
+  int index;
+
+  for (index = 0; index < 8; index++)
+    {
+      int32_t ret = vcvtph_s32_f16 (src[index]);
+      if (ret != expected[index])
+	abort ();
+    }
+}
+
+int
+main (void)
+{
+  exec_vcvtph_s32_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_u32_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_u32_f16_1.c
new file mode 100644
index 0000000..ec6dba1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_u32_f16_1.c
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define FP16_C(a) ((__fp16) a)
+#define AF FP16_C (123.9)
+#define BF FP16_C (56.5)
+#define CF FP16_C (0.7)
+#define DF FP16_C (24.6)
+#define EF FP16_C (63.5)
+#define FF FP16_C (169.4)
+#define GF FP16_C (4.3)
+#define HF FP16_C (77.0)
+
+#define A (124)
+#define B (57)
+#define C (1)
+#define D (25)
+#define E (64)
+#define F (170)
+#define G (5)
+#define H (77)
+
+extern void abort ();
+
+/* Expected results for vcvtph_u32_f16.  */
+float16_t src[8] = {AF, BF, CF, DF, EF, FF, GF, HF};
+uint32_t expected[8] = {A, B, C, D, E, F, G, H};
+
+void
+exec_vcvtph_u32_f16 (void)
+{
+  int index;
+
+  for (index = 0; index < 8; index++)
+    {
+      uint32_t ret = vcvtph_u32_f16 (src[index]);
+      if (ret != expected[index])
+	abort ();
+    }
+}
+
+int
+main (void)
+{
+  exec_vcvtph_u32_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdivh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdivh_f16_1.c
new file mode 100644
index 0000000..6fd7f5e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdivh_f16_1.c
@@ -0,0 +1,52 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (13.4)
+#define B FP16_C (-56.8)
+#define C FP16_C (-34.8)
+#define D FP16_C (12)
+#define E FP16_C (63.1)
+#define F FP16_C (19.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (77)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (11.23)
+#define L FP16_C (98)
+#define M FP16_C (87.1)
+#define N FP16_C (-8)
+#define O FP16_C (-1.1)
+#define P FP16_C (-9.7)
+
+extern void abort ();
+
+/* Expected results for vdivh.  */
+float16_t src1[8] = {A, B, C, D, I, J, K, L};
+float16_t src2[8] = {E, F, G, H, M, N, O, P};
+float16_t expected[8] = {A / E, B / F, C / G, D / H,
+			 I / M, J / N, K / O, L / P};
+
+void
+exec_vdivh_f16 (void)
+{
+  int index;
+
+  for (index = 0; index < 8; index++)
+    {
+      float16_t ret = vdivh_f16 (src1[index], src2[index]);
+      if (* (uint16_t *) &ret != * (uint16_t *) &expected[index])
+	abort ();
+    }
+}
+
+int
+main (void)
+{
+  exec_vdivh_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmah_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmah_f16_1.c
new file mode 100644
index 0000000..ba8f9c5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmah_f16_1.c
@@ -0,0 +1,69 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define FP16_C(a) ((__fp16) a)
+#define A0 FP16_C (123.4)
+#define B0 FP16_C (-5.8)
+#define C0 FP16_C (-3.8)
+#define D0 FP16_C (10)
+
+#define A1 FP16_C (12.4)
+#define B1 FP16_C (-5.8)
+#define C1 FP16_C (90.8)
+#define D1 FP16_C (24)
+
+#define A2 FP16_C (23.4)
+#define B2 FP16_C (-5.8)
+#define C2 FP16_C (8.9)
+#define D2 FP16_C (4)
+
+#define E0 FP16_C (3.4)
+#define F0 FP16_C (-55.8)
+#define G0 FP16_C (-31.8)
+#define H0 FP16_C (2)
+
+#define E1 FP16_C (123.4)
+#define F1 FP16_C (-5.8)
+#define G1 FP16_C (-3.8)
+#define H1 FP16_C (102)
+
+#define E2 FP16_C (4.9)
+#define F2 FP16_C (-15.8)
+#define G2 FP16_C (39.8)
+#define H2 FP16_C (49)
+
+extern void abort ();
+
+/* Expected results for vfmah_f16.  */
+
+float16_t src1[8] = {A0, B0, C0, D0, E0, F0, G0, H0};
+float16_t src2[8] = {A1, B1, C1, D1, E1, F1, G1, H1};
+float16_t src3[8] = {A2, B2, C2, D2, E2, F2, G2, H2};
+float16_t expected[8] = {A0 + A1 * A2, B0 + B1 * B2,
+			 C0 + C1 * C2, D0 + D1 * D2,
+			 E0 + E1 * E2, F0 + F1 * F2,
+			 G0 + G1 * G2, H0 + H1 * H2};
+
+void
+exec_vfmah_f16 (void)
+{
+  int index;
+
+  for (index = 0; index < 8; index++)
+    {
+      float16_t ret = vfmah_f16 (src1[index], src2[index], src3[index]);
+      if (* (uint16_t *) &ret != * (uint16_t *) &expected[index])
+	abort ();
+    }
+
+}
+
+int
+main (void)
+{
+  exec_vfmah_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmsh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmsh_f16_1.c
new file mode 100644
index 0000000..c132295
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmsh_f16_1.c
@@ -0,0 +1,69 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define FP16_C(a) ((__fp16) a)
+#define A0 FP16_C (123.4)
+#define B0 FP16_C (-5.8)
+#define C0 FP16_C (-3.8)
+#define D0 FP16_C (10)
+
+#define A1 FP16_C (12.4)
+#define B1 FP16_C (-5.8)
+#define C1 FP16_C (90.8)
+#define D1 FP16_C (24)
+
+#define A2 FP16_C (23.4)
+#define B2 FP16_C (-5.8)
+#define C2 FP16_C (8.9)
+#define D2 FP16_C (4)
+
+#define E0 FP16_C (3.4)
+#define F0 FP16_C (-55.8)
+#define G0 FP16_C (-31.8)
+#define H0 FP16_C (2)
+
+#define E1 FP16_C (123.4)
+#define F1 FP16_C (-5.8)
+#define G1 FP16_C (-3.8)
+#define H1 FP16_C (102)
+
+#define E2 FP16_C (4.9)
+#define F2 FP16_C (-15.8)
+#define G2 FP16_C (39.8)
+#define H2 FP16_C (49)
+
+extern void abort ();
+
+/* Expected results for vfmsh_f16.  */
+
+float16_t src1[8] = {A0, B0, C0, D0, E0, F0, G0, H0};
+float16_t src2[8] = {A1, B1, C1, D1, E1, F1, G1, H1};
+float16_t src3[8] = {A2, B2, C2, D2, E2, F2, G2, H2};
+float16_t expected[8] = {A0 + -A1 * A2, B0 + -B1 * B2,
+			 C0 + -C1 * C2, D0 + -D1 * D2,
+			 E0 + -E1 * E2, F0 + -F1 * F2,
+			 G0 + -G1 * G2, H0 + -H1 * H2};
+
+void
+exec_vfmsh_f16 (void)
+{
+  int index;
+
+  for (index = 0; index < 8; index++)
+    {
+      float16_t ret = vfmsh_f16 (src1[index], src2[index], src3[index]);
+      if (* (uint16_t *) &ret != * (uint16_t *) &expected[index])
+	abort ();
+    }
+
+}
+
+int
+main (void)
+{
+  exec_vfmsh_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxnmh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxnmh_f16_1.c
new file mode 100644
index 0000000..7474cda
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxnmh_f16_1.c
@@ -0,0 +1,51 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (-567.8)
+#define C FP16_C (-34.8)
+#define D FP16_C (1024)
+#define E FP16_C (663.1)
+#define F FP16_C (169.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (77)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (101.23)
+#define L FP16_C (98)
+#define M FP16_C (870.1)
+#define N FP16_C (-8781)
+#define O FP16_C (__builtin_inff ()) /* +Inf */
+#define P FP16_C (-__builtin_inff ()) /* -Inf */
+
+extern void abort ();
+
+/* Expected results for vmaxnmh.  */
+float16_t src1[8] = {A, B, C, D, I, J, K, L};
+float16_t src2[8] = {E, F, G, H, M, N, O, P};
+float16_t expected[8] = {E, F, G, D, M, J, O, L};
+
+void
+exec_vmaxnmh_f16 (void)
+{
+  int index;
+
+  for (index = 0; index < 8; index++)
+    {
+      float16_t ret = vmaxnmh_f16 (src1[index], src2[index]);
+      if (* (uint16_t *) &ret != * (uint16_t *) &expected[index])
+	abort ();
+    }
+}
+
+int
+main (void)
+{
+  exec_vmaxnmh_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminnmh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminnmh_f16_1.c
new file mode 100644
index 0000000..7e01549
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminnmh_f16_1.c
@@ -0,0 +1,51 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (-567.8)
+#define C FP16_C (-34.8)
+#define D FP16_C (1024)
+#define E FP16_C (663.1)
+#define F FP16_C (169.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (77)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (101.23)
+#define L FP16_C (98)
+#define M FP16_C (870.1)
+#define N FP16_C (-8781)
+#define O FP16_C (__builtin_inff ()) /* +Inf */
+#define P FP16_C (-__builtin_inff ()) /* -Inf */
+
+extern void abort ();
+
+/* Expected results for vminnmh.  */
+float16_t src1[8] = {A, B, C, D, I, J, K, L};
+float16_t src2[8] = {E, F, G, H, M, N, O, P};
+float16_t expected[8] = {A, B, C, H, I, N, K, P};
+
+void
+exec_vminnmh_f16 (void)
+{
+  int index;
+
+  for (index = 0; index < 8; index++)
+    {
+      float16_t ret = vminnmh_f16 (src1[index], src2[index]);
+      if (* (uint16_t *) &ret != * (uint16_t *) &expected[index])
+	abort ();
+    }
+}
+
+int
+main (void)
+{
+  exec_vminnmh_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulh_f16_1.c
new file mode 100644
index 0000000..80244a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulh_f16_1.c
@@ -0,0 +1,51 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (13.4)
+#define B FP16_C (-56.8)
+#define C FP16_C (-34.8)
+#define D FP16_C (12)
+#define E FP16_C (63.1)
+#define F FP16_C (19.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (77)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (11.23)
+#define L FP16_C (98)
+#define M FP16_C (87.1)
+#define N FP16_C (-8)
+#define O FP16_C (-1.1)
+#define P FP16_C (-9.7)
+
+extern void abort ();
+
+/* Expected results for vmulh.  */
+float16_t src1[8] = {A, B, C, D, I, J, K, L};
+float16_t src2[8] = {E, F, G, H, M, N, O, P};
+float16_t expected[8] = {A * E, B * F, C * G, D * H,
+			 I * M, J * N, K * O, L * P};
+void
+exec_vmulh_f16 (void)
+{
+  int index;
+
+  for (index = 0; index < 8; index++)
+    {
+      float16_t ret = vmulh_f16 (src1[index], src2[index]);
+      if (* (uint16_t *) &ret != * (uint16_t *) &expected[index])
+	abort ();
+    }
+}
+
+int
+main (void)
+{
+  exec_vmulh_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vnegh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vnegh_f16_1.c
new file mode 100644
index 0000000..6fd6c11
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vnegh_f16_1.c
@@ -0,0 +1,62 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define FP16_C(a) ((__fp16) a)
+
+#define AS 123.9
+#define BS -56.8
+#define CS 0.7
+#define DS 24.6
+#define ES -63.5
+#define FS 169.4
+#define GS -4.3
+#define HS 77.0
+
+#define A FP16_C (AS)
+#define B FP16_C (BS)
+#define C FP16_C (CS)
+#define D FP16_C (DS)
+#define E FP16_C (ES)
+#define F FP16_C (FS)
+#define G FP16_C (GS)
+#define H FP16_C (HS)
+
+#define AF FP16_C (-AS)
+#define BF FP16_C (-BS)
+#define CF FP16_C (-CS)
+#define DF FP16_C (-DS)
+#define EF FP16_C (-ES)
+#define FF FP16_C (-FS)
+#define GF FP16_C (-GS)
+#define HF FP16_C (-HS)
+
+
+
+extern void abort ();
+
+/* Expected results for vnegh_f16.  */
+float16_t src[8] = {A, B, C, D, E, F, G, H};
+float16_t expected[8] = {AF, BF, CF, DF, EF, FF, GF, HF};
+
+void
+exec_vnegh_f16 (void)
+{
+  int index;
+
+  for (index = 0; index < 8; index++)
+    {
+      float16_t ret = vnegh_f16 (src[index]);
+      if (* (uint16_t *) &ret != * (uint16_t *) &expected[index])
+	abort ();
+    }
+}
+
+int
+main (void)
+{
+  exec_vnegh_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndah_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndah_f16_1.c
new file mode 100644
index 0000000..e8dc8c6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndah_f16_1.c
@@ -0,0 +1,51 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (-56.8)
+#define C FP16_C (34.8)
+#define D FP16_C (24.5)
+#define E FP16_C (-663.1)
+#define F FP16_C (-144.5)
+#define G FP16_C (4.8)
+#define H FP16_C (77.0)
+
+#define AF FP16_C (123.0)
+#define BF FP16_C (-57.0)
+#define CF FP16_C (35.0)
+#define DF FP16_C (25.0)
+#define EF FP16_C (-663.0)
+#define FF FP16_C (-145.0)
+#define GF FP16_C (5.0)
+#define HF FP16_C (77.0)
+
+
+extern void abort ();
+
+/* Expected results for vrndah_f16.  */
+float16_t src[8] = {A, B, C, D, E, F, G, H};
+float16_t expected[8] = {AF, BF, CF, DF, EF, FF, GF, HF};
+
+void
+exec_vrndah_f16 (void)
+{
+  int index;
+
+  for (index = 0; index < 8; index++)
+    {
+      float16_t ret = vrndah_f16 (src[index]);
+      if (* (short *) &ret != * (short *) &expected[index])
+	abort ();
+    }
+}
+
+int
+main (void)
+{
+  exec_vrndah_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndh_f16_1.c
new file mode 100644
index 0000000..c2c3db3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndh_f16_1.c
@@ -0,0 +1,51 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (-56.8)
+#define C FP16_C (34.8)
+#define D FP16_C (1024.5)
+#define E FP16_C (-663.1)
+#define F FP16_C (-144.5)
+#define G FP16_C (4.8)
+#define H FP16_C (77.0)
+
+#define AF FP16_C (123.0)
+#define BF FP16_C (-56.0)
+#define CF FP16_C (34.0)
+#define DF FP16_C (1024.0)
+#define EF FP16_C (-663.0)
+#define FF FP16_C (-144.0)
+#define GF FP16_C (4.0)
+#define HF FP16_C (77.0)
+
+
+extern void abort ();
+
+/* Expected results for vrndh_f16.  */
+float16_t src[8] = {A, B, C, D, E, F, G, H};
+float16_t expected[8] = {AF, BF, CF, DF, EF, FF, GF, HF};
+
+void
+exec_vrndh_f16 (void)
+{
+  int index;
+
+  for (index = 0; index < 8; index++)
+    {
+      float16_t ret = vrndh_f16 (src[index]);
+      if (* (short *) &ret != * (short *) &expected[index])
+	abort ();
+    }
+}
+
+int
+main (void)
+{
+  exec_vrndh_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndih_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndih_f16_1.c
new file mode 100644
index 0000000..0250965
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndih_f16_1.c
@@ -0,0 +1,51 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (-56.8)
+#define C FP16_C (34.8)
+#define D FP16_C (24.5)
+#define E FP16_C (-663.1)
+#define F FP16_C (-144.5)
+#define G FP16_C (4.8)
+#define H FP16_C (77.0)
+
+#define AF FP16_C (123.0)
+#define BF FP16_C (-57.0)
+#define CF FP16_C (35.0)
+#define DF FP16_C (24.0)
+#define EF FP16_C (-663.0)
+#define FF FP16_C (-144.0)
+#define GF FP16_C (5.0)
+#define HF FP16_C (77.0)
+
+
+extern void abort ();
+
+/* Expected results for vrndih_f16.  */
+float16_t src[8] = {A, B, C, D, E, F, G, H};
+float16_t expected[8] = {AF, BF, CF, DF, EF, FF, GF, HF};
+
+void
+exec_vrndih_f16 (void)
+{
+  int index;
+
+  for (index = 0; index < 8; index++)
+    {
+      float16_t ret = vrndih_f16 (src[index]);
+      if (* (short *) &ret != * (short *) &expected[index])
+	abort ();
+    }
+}
+
+int
+main (void)
+{
+  exec_vrndih_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndmh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndmh_f16_1.c
new file mode 100644
index 0000000..c098281
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndmh_f16_1.c
@@ -0,0 +1,51 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (-56.8)
+#define C FP16_C (34.8)
+#define D FP16_C (24.5)
+#define E FP16_C (-63.1)
+#define F FP16_C (-144.5)
+#define G FP16_C (4.8)
+#define H FP16_C (77.0)
+
+#define AF FP16_C (123.0)
+#define BF FP16_C (-57.0)
+#define CF FP16_C (34.0)
+#define DF FP16_C (24.0)
+#define EF FP16_C (-64.0)
+#define FF FP16_C (-145.0)
+#define GF FP16_C (4.0)
+#define HF FP16_C (77.0)
+
+
+extern void abort ();
+
+/* Expected results for vrndmh_f16.  */
+float16_t src[8] = {A, B, C, D, E, F, G, H};
+float16_t expected[8] = {AF, BF, CF, DF, EF, FF, GF, HF};
+
+void
+exec_vrndmh_f16 (void)
+{
+  int index;
+
+  for (index = 0; index < 8; index++)
+    {
+      float16_t ret = vrndmh_f16 (src[index]);
+      if (* (short *) &ret != * (short *) &expected[index])
+	abort ();
+    }
+}
+
+int
+main (void)
+{
+  exec_vrndmh_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndnh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndnh_f16_1.c
new file mode 100644
index 0000000..227d9cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndnh_f16_1.c
@@ -0,0 +1,51 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (-56.8)
+#define C FP16_C (34.8)
+#define D FP16_C (24.5)
+#define E FP16_C (-63.1)
+#define F FP16_C (-143.5)
+#define G FP16_C (4.8)
+#define H FP16_C (77.0)
+
+#define AF FP16_C (123.0)
+#define BF FP16_C (-57.0)
+#define CF FP16_C (35.0)
+#define DF FP16_C (24.0)
+#define EF FP16_C (-63.0)
+#define FF FP16_C (-144.0)
+#define GF FP16_C (5.0)
+#define HF FP16_C (77.0)
+
+
+extern void abort ();
+
+/* Expected results for vrndnh_f16.  */
+float16_t src[8] = {A, B, C, D, E, F, G, H};
+float16_t expected[8] = {AF, BF, CF, DF, EF, FF, GF, HF};
+
+void
+exec_vrndnh_f16 (void)
+{
+  int index;
+
+  for (index = 0; index < 8; index++)
+    {
+      float16_t ret = vrndnh_f16 (src[index]);
+      if (* (short *) &ret != * (short *) &expected[index])
+	abort ();
+    }
+}
+
+int
+main (void)
+{
+  exec_vrndnh_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndph_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndph_f16_1.c
new file mode 100644
index 0000000..14af2ea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndph_f16_1.c
@@ -0,0 +1,51 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (-56.8)
+#define C FP16_C (34.8)
+#define D FP16_C (24.5)
+#define E FP16_C (-63.1)
+#define F FP16_C (-143.5)
+#define G FP16_C (4.8)
+#define H FP16_C (77.0)
+
+#define AF FP16_C (124.0)
+#define BF FP16_C (-56.0)
+#define CF FP16_C (35.0)
+#define DF FP16_C (25.0)
+#define EF FP16_C (-63.0)
+#define FF FP16_C (-143.0)
+#define GF FP16_C (5.0)
+#define HF FP16_C (77.0)
+
+
+extern void abort ();
+
+/* Expected results for vrndph_f16.  */
+float16_t src[8] = {A, B, C, D, E, F, G, H};
+float16_t expected[8] = {AF, BF, CF, DF, EF, FF, GF, HF};
+
+void
+exec_vrndph_f16 (void)
+{
+  int index;
+
+  for (index = 0; index < 8; index++)
+    {
+      float16_t ret = vrndph_f16 (src[index]);
+      if (* (short *) &ret != * (short *) &expected[index])
+	abort ();
+    }
+}
+
+int
+main (void)
+{
+  exec_vrndph_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndxh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndxh_f16_1.c
new file mode 100644
index 0000000..90f6bd6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndxh_f16_1.c
@@ -0,0 +1,51 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (-56.8)
+#define C FP16_C (34.8)
+#define D FP16_C (24.5)
+#define E FP16_C (-663.1)
+#define F FP16_C (-144.5)
+#define G FP16_C (4.8)
+#define H FP16_C (77.0)
+
+#define AF FP16_C (123.0)
+#define BF FP16_C (-57.0)
+#define CF FP16_C (35.0)
+#define DF FP16_C (24.0)
+#define EF FP16_C (-663.0)
+#define FF FP16_C (-144.0)
+#define GF FP16_C (5.0)
+#define HF FP16_C (77.0)
+
+
+extern void abort ();
+
+/* Expected results for vrndxh_f16.  */
+float16_t src[8] = {A, B, C, D, E, F, G, H};
+float16_t expected[8] = {AF, BF, CF, DF, EF, FF, GF, HF};
+
+void
+exec_vrndxh_f16 (void)
+{
+  int index;
+
+  for (index = 0; index < 8; index++)
+    {
+      float16_t ret = vrndxh_f16 (src[index]);
+      if (* (short *) &ret != * (short *) &expected[index])
+	abort ();
+    }
+}
+
+int
+main (void)
+{
+  exec_vrndxh_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsqrth_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsqrth_f16_1.c
new file mode 100644
index 0000000..e482318
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsqrth_f16_1.c
@@ -0,0 +1,58 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+extern void abort (void);
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (12.4)
+#define B FP16_C (5.8)
+#define C FP16_C (3.8)
+#define D FP16_C (10)
+#define E FP16_C (66.1)
+#define F FP16_C (16.1)
+#define G FP16_C (4.8)
+#define H FP16_C (77)
+
+#define SQRT_A FP16_C (3.5213)
+#define SQRT_B FP16_C (2.4083)
+#define SQRT_C FP16_C (1.9493)
+#define SQRT_D FP16_C (3.1622)
+#define SQRT_E FP16_C (8.1301)
+#define SQRT_F FP16_C (4.0124)
+#define SQRT_G FP16_C (2.1908)
+#define SQRT_H FP16_C (8.7749)
+
+
+/* Expected results for vsqrth.  */
+float16_t src[8] = {A, B, C, D, E, F, G, H};
+float16_t expected[8] = {SQRT_A, SQRT_B, SQRT_C, SQRT_D,
+			 SQRT_E, SQRT_F, SQRT_G, SQRT_H};
+
+/* The acceptable difference between the bit-patterns for the expected and
+   actual results.  */
+const int bias = 0;
+
+void
+exec_vsqrth_f16 (void)
+{
+  int index;
+  for (index = 0; index < 8; index++)
+    {
+      float16_t ret = vsqrth_f16 (src[index]);
+      uint16_t got = * (uint16_t *) &ret;
+      uint16_t want = * (uint16_t *) &expected[index];
+      /* Compare the magnitude of the bit-pattern difference.  */
+      if ((got > want ? got - want : want - got) > bias)
+	abort ();
+    }
+}
+
+int
+main (void)
+{
+  exec_vsqrth_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsubh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsubh_f16_1.c
new file mode 100644
index 0000000..455f363
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsubh_f16_1.c
@@ -0,0 +1,42 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (-567.8)
+#define C FP16_C (-34.8)
+#define D FP16_C (1024)
+#define E FP16_C (663.1)
+#define F FP16_C (169.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (77)
+
+extern void abort ();
+
+/* Expected results for vsubh.  */
+float16_t src1[4] = {A, B, C, D};
+float16_t src2[4] = {E, F, G, H};
+float16_t expected[4] = {A - E, B - F, C - G, D - H};
+
+void
+exec_vsubh_f16 (void)
+{
+  int index;
+
+  for (index = 0; index < 4; index++)
+    {
+      float16_t ret = vsubh_f16 (src1[index], src2[index]);
+      if (* (uint16_t *) &ret != * (uint16_t *) &expected[index])
+	abort ();
+    }
+}
+
+int
+main (void)
+{
+  exec_vsubh_f16 ();
+  return 0;
+}
-- 
2.1.4


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 17/17][ARM] Add tests for NEON FP16 ACLE intrinsics.
  2016-05-17 14:20 [PATCH 0/17][ARM] ARMv8.2-A and FP16 extension support Matthew Wahab
                   ` (15 preceding siblings ...)
  2016-05-17 14:51 ` [PATCH 16/17][ARM] Add tests for VFP FP16 ACLE intrinsics Matthew Wahab
@ 2016-05-17 14:52 ` Matthew Wahab
  2016-07-04 14:22   ` Matthew Wahab
  16 siblings, 1 reply; 73+ messages in thread
From: Matthew Wahab @ 2016-05-17 14:52 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4065 bytes --]

Support for using the half-precision floating-point operations added by
the ARMv8.2-A FP16 extension is based on the macros and intrinsics added
to the ACLE for the extension.

This patch adds executable tests for the ACLE Adv.SIMD (NEON) intrinsics
to the advsimd-intrinsics testsuite. The tests were written by Jiong
Wang.
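
The vector tests follow the same basic pattern as the scalar tests in
the previous patch.  As a rough standalone sketch (not taken from the
patch, and comparing bit patterns directly rather than through the
arm-neon-ref.h CHECK machinery), a vadd_f16 test reduces to:

  #include <arm_neon.h>

  extern void abort ();

  int
  main (void)
  {
    /* Inputs and sums chosen to be exactly representable as FP16.  */
    float16_t a[4] = {1.5, -2.0, 4.0, 8.5};
    float16_t b[4] = {0.5, 2.0, -4.0, 1.5};
    float16_t e[4] = {2.0, 0.0, 0.0, 10.0};
    float16_t r[4];
    int i;

    vst1_f16 (r, vadd_f16 (vld1_f16 (a), vld1_f16 (b)));
    for (i = 0; i < 4; i++)
      if (* (uint16_t *) &r[i] != * (uint16_t *) &e[i])
        abort ();
    return 0;
  }

Such a file would still need the corresponding dg options and
effective-target checks (arm_v8_2a_fp16_neon, by analogy with the
scalar arm_v8_2a_fp16_scalar directives) to build and run.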

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator. Also tested for aarch64-none-elf with the
advsimd-intrinsics testsuite using an ARMv8.2-A emulator.

Ok for trunk?
Matthew

testsuite/
2016-05-17  Jiong Wang  <jiong.wang@arm.com>
	    Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/aarch64/advsimd-intrinsics/vabd_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vabs_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vadd_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcage_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcagt_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcale_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcalt_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vceq_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vceqz_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcge_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcgez_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcgt_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcgtz_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcle_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vclez_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vclt_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcltz_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvt_f16_s16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvt_f16_u16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvt_n_f16_s16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvt_n_f16_u16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvt_n_s16_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvt_n_u16_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvt_s16_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvt_u16_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvta_s16_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvta_u16_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtm_s16_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtm_u16_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtp_s16_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtp_u16_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vfma_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vfms_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmax_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmaxnm_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmin_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vminnm_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmul_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmul_lane_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmul_n_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vneg_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vpadd_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vpmax_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vpmin_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrecpe_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrecps_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrnd_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrnda_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndm_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndn_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndp_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndx_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrsqrts_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vsub_f16_1.c: New.


[-- Attachment #2: 0017-PATCH-17-17-ARM-Add-tests-for-NEON-FP16-ACLE-intrins.patch --]
[-- Type: text/x-patch, Size: 157568 bytes --]

From ed12d5911f5cb5634ca6c014a366f4ae7559ad22 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 15:41:45 +0100
Subject: [PATCH 17/17] [PATCH 17/17][ARM] Add tests for NEON FP16 ACLE
 intrinsics.

testsuite/
2016-05-17  Jiong Wang  <jiong.wang@arm.com>
	    Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/aarch64/advsimd-intrinsics/vabd_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vabs_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vadd_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcage_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcagt_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcale_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcalt_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vceq_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vceqz_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcge_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcgez_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcgt_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcgtz_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcle_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vclez_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vclt_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcltz_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvt_f16_s16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvt_f16_u16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvt_n_f16_s16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvt_n_f16_u16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvt_n_s16_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvt_n_u16_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvt_s16_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvt_u16_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvta_s16_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvta_u16_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtm_s16_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtm_u16_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtp_s16_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtp_u16_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vfma_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vfms_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmax_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmaxnm_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmin_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vminnm_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmul_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmul_lane_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmul_n_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vneg_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vpadd_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vpmax_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vpmin_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrecpe_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrecps_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrnd_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrnda_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndm_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndn_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndp_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndx_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrsqrts_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vsub_f16_1.c: New.
---
 .../aarch64/advsimd-intrinsics/vabd_f16_1.c        |  82 +++++++++
 .../aarch64/advsimd-intrinsics/vabs_f16_1.c        |  65 +++++++
 .../aarch64/advsimd-intrinsics/vadd_f16_1.c        |  82 +++++++++
 .../aarch64/advsimd-intrinsics/vcage_f16_1.c       |  77 +++++++++
 .../aarch64/advsimd-intrinsics/vcagt_f16_1.c       |  77 +++++++++
 .../aarch64/advsimd-intrinsics/vcale_f16_1.c       |  78 +++++++++
 .../aarch64/advsimd-intrinsics/vcalt_f16_1.c       |  78 +++++++++
 .../aarch64/advsimd-intrinsics/vceq_f16_1.c        |  76 ++++++++
 .../aarch64/advsimd-intrinsics/vceqz_f16_1.c       |  61 +++++++
 .../aarch64/advsimd-intrinsics/vcge_f16_1.c        |  76 ++++++++
 .../aarch64/advsimd-intrinsics/vcgez_f16_1.c       |  61 +++++++
 .../aarch64/advsimd-intrinsics/vcgt_f16_1.c        |  76 ++++++++
 .../aarch64/advsimd-intrinsics/vcgtz_f16_1.c       |  61 +++++++
 .../aarch64/advsimd-intrinsics/vcle_f16_1.c        |  76 ++++++++
 .../aarch64/advsimd-intrinsics/vclez_f16_1.c       |  61 +++++++
 .../aarch64/advsimd-intrinsics/vclt_f16_1.c        |  77 +++++++++
 .../aarch64/advsimd-intrinsics/vcltz_f16_1.c       |  61 +++++++
 .../aarch64/advsimd-intrinsics/vcvt_f16_s16_1.c    |  70 ++++++++
 .../aarch64/advsimd-intrinsics/vcvt_f16_u16_1.c    |  70 ++++++++
 .../aarch64/advsimd-intrinsics/vcvt_n_f16_s16_1.c  |  73 ++++++++
 .../aarch64/advsimd-intrinsics/vcvt_n_f16_u16_1.c  |  73 ++++++++
 .../aarch64/advsimd-intrinsics/vcvt_n_s16_f16_1.c  |  66 +++++++
 .../aarch64/advsimd-intrinsics/vcvt_n_u16_f16_1.c  |  67 +++++++
 .../aarch64/advsimd-intrinsics/vcvt_s16_f16_1.c    |  65 +++++++
 .../aarch64/advsimd-intrinsics/vcvt_u16_f16_1.c    |  65 +++++++
 .../aarch64/advsimd-intrinsics/vcvta_s16_f16_1.c   |  70 ++++++++
 .../aarch64/advsimd-intrinsics/vcvta_u16_f16_1.c   |  70 ++++++++
 .../aarch64/advsimd-intrinsics/vcvtm_s16_f16_1.c   |  70 ++++++++
 .../aarch64/advsimd-intrinsics/vcvtm_u16_f16_1.c   |  70 ++++++++
 .../aarch64/advsimd-intrinsics/vcvtp_s16_f16_1.c   |  70 ++++++++
 .../aarch64/advsimd-intrinsics/vcvtp_u16_f16_1.c   |  70 ++++++++
 .../aarch64/advsimd-intrinsics/vfma_f16_1.c        | 106 ++++++++++++
 .../aarch64/advsimd-intrinsics/vfms_f16_1.c        | 104 +++++++++++
 .../aarch64/advsimd-intrinsics/vmax_f16_1.c        |  81 +++++++++
 .../aarch64/advsimd-intrinsics/vmaxnm_f16_1.c      |  82 +++++++++
 .../aarch64/advsimd-intrinsics/vmin_f16_1.c        |  81 +++++++++
 .../aarch64/advsimd-intrinsics/vminnm_f16_1.c      |  83 +++++++++
 .../aarch64/advsimd-intrinsics/vmul_f16_1.c        |  82 +++++++++
 .../aarch64/advsimd-intrinsics/vmul_lane_f16_1.c   | 155 +++++++++++++++++
 .../aarch64/advsimd-intrinsics/vmul_n_f16_1.c      | 192 +++++++++++++++++++++
 .../aarch64/advsimd-intrinsics/vneg_f16_1.c        |  65 +++++++
 .../aarch64/advsimd-intrinsics/vpadd_f16_1.c       |  87 ++++++++++
 .../aarch64/advsimd-intrinsics/vpmax_f16_1.c       |  87 ++++++++++
 .../aarch64/advsimd-intrinsics/vpmin_f16_1.c       |  86 +++++++++
 .../aarch64/advsimd-intrinsics/vrecpe_f16_1.c      |  75 ++++++++
 .../aarch64/advsimd-intrinsics/vrecps_f16_1.c      |  86 +++++++++
 .../aarch64/advsimd-intrinsics/vrnd_f16_1.c        |  74 ++++++++
 .../aarch64/advsimd-intrinsics/vrnda_f16_1.c       |  74 ++++++++
 .../aarch64/advsimd-intrinsics/vrndm_f16_1.c       |  74 ++++++++
 .../aarch64/advsimd-intrinsics/vrndn_f16_1.c       |  74 ++++++++
 .../aarch64/advsimd-intrinsics/vrndp_f16_1.c       |  74 ++++++++
 .../aarch64/advsimd-intrinsics/vrndx_f16_1.c       |  74 ++++++++
 .../aarch64/advsimd-intrinsics/vrsqrts_f16_1.c     |  92 ++++++++++
 .../aarch64/advsimd-intrinsics/vsub_f16_1.c        |  82 +++++++++
 54 files changed, 4264 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabd_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabs_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vadd_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcage_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcagt_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcale_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcalt_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceq_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqz_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcge_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgez_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgt_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgtz_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcle_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclez_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclt_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcltz_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16_s16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16_u16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_n_f16_s16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_n_f16_u16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_n_s16_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_n_u16_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_s16_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_u16_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvta_s16_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvta_u16_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtm_s16_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtm_u16_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtp_s16_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtp_u16_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfma_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfms_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmax_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxnm_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmin_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminnm_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_lane_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_n_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vneg_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpadd_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpmax_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpmin_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecpe_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecps_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnda_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndm_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndn_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndp_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndx_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrsqrts_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsub_f16_1.c
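
All of these tests follow the same shape from the advsimd-intrinsics
harness: fixed input lanes are loaded with VLOAD, one intrinsic is run,
the result lanes are stored back with vst1/vst1q, and CHECK/CHECK_FP
compares the stored bit patterns against a precomputed expected table.
As a rough standalone sketch of that shape without the harness macros
(illustrative only: the buffer names are not from arm-neon-ref.h, and it
assumes a target compiled with the FP16 extension, e.g.
-march=armv8.2-a+fp16):

  #include <arm_neon.h>
  #include <stdio.h>
  #include <string.h>

  int
  main (void)
  {
    /* Inputs chosen to be exactly representable in FP16, so the
       expected lanes need no rounding analysis.  */
    float16_t in1[4] = {1, -2, -4, 8};
    float16_t in2[4] = {3, 5, -1, 6};
    float16_t expected[4] = {4, 3, -5, 14};  /* vadd_f16 lane results.  */
    float16_t out[4];

    float16x4_t res = vadd_f16 (vld1_f16 (in1), vld1_f16 (in2));
    vst1_f16 (out, res);

    /* Like CHECK_FP, compare raw bit patterns rather than values.  */
    if (memcmp (out, expected, sizeof out) != 0)
      {
        printf ("vadd_f16: unexpected lanes\n");
        return 1;
      }
    return 0;
  }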

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabd_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabd_f16_1.c
new file mode 100644
index 0000000..34dc784
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabd_f16_1.c
@@ -0,0 +1,82 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (-567.8)
+#define C FP16_C (-34.8)
+#define D FP16_C (1024)
+#define E FP16_C (663.1)
+#define F FP16_C (169.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (77)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (101.23)
+#define L FP16_C (98)
+#define M FP16_C (870.1)
+#define N FP16_C (-8781)
+#define O FP16_C (-1.1)
+#define P FP16_C (47823)
+
+/* Expected results for vabd.  */
+VECT_VAR_DECL (expected, float, 16, 4) [] = {E - A, F - B, -C + G, D - H};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 4);
+
+VECT_VAR_DECL (expected, float, 16, 8) [] = {E - A, F - B, -C + G, D - H,
+					     M - I, -N + J, K - O, P - L};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 8);
+
+void
+exec_vabd_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VABD (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 4);
+  DECL_VARIABLE (vsrc_2, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4) =
+    vabd_f16 (VECT_VAR (vsrc_1, float, 16, 4), VECT_VAR (vsrc_2, float, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VABDQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 8);
+  DECL_VARIABLE (vsrc_2, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
+  VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8) =
+    vabdq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+	       VECT_VAR (vsrc_2, float, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_static, "");
+}
+
+int
+main (void)
+{
+  exec_vabd_f16 ();
+  return 0;
+}
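
Note how the expected table writes each lane of |a - b| with the sign
already resolved (E - A, -C + G, ...), so every entry is computed from
the same FP16-cast operands that VABD sees; computing the lanes from the
unrounded decimal constants would be off by an ulp in some cases (lane 3
is 34.8125 - 4.80078125, which rounds to the FP16 value 30.015625 rather
than the real-valued 30.0).  A scalar model of one lane, assuming
round-to-nearest (f16_abd_lane is an illustrative helper, not an ACLE
intrinsic):

  /* One lane of VABD: subtract, then take the magnitude.  Arithmetic
     on __fp16 promotes to float and rounds once on the store back,
     which matches a single half-precision subtract.  */
  static __fp16
  f16_abd_lane (__fp16 a, __fp16 b)
  {
    __fp16 d = a - b;
    return d < (__fp16) 0.0 ? (__fp16) -d : d;
  }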
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabs_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabs_f16_1.c
new file mode 100644
index 0000000..8748726
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabs_f16_1.c
@@ -0,0 +1,65 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (-567.8)
+#define C FP16_C (-34.8)
+#define D FP16_C (1024)
+#define E FP16_C (663.1)
+#define F FP16_C (169.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (77)
+
+/* Expected results for vabs.  */
+VECT_VAR_DECL (expected, float, 16, 4) [] = {A, -B, -C, D};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 4);
+
+VECT_VAR_DECL (expected, float, 16, 8) [] = {A, -B, -C, D, E, F, -G, H};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 8);
+
+void
+exec_vabs_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VABS (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 4);
+  VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc, buf_src, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4) =
+    vabs_f16 (VECT_VAR (vsrc, float, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VABSQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 8);
+  VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A, B, C, D, E, F, G, H};
+  VLOAD (vsrc, buf_src, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8) =
+    vabsq_f16 (VECT_VAR (vsrc, float, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_static, "");
+}
+
+int
+main (void)
+{
+  exec_vabs_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vadd_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vadd_f16_1.c
new file mode 100644
index 0000000..c741bd4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vadd_f16_1.c
@@ -0,0 +1,82 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (-5.8)
+#define C FP16_C (-3.8)
+#define D FP16_C (1024)
+#define E FP16_C (663.1)
+#define F FP16_C (169.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (-77)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (101.23)
+#define L FP16_C (98)
+#define M FP16_C (870.1)
+#define N FP16_C (-8781)
+#define O FP16_C (-1.1)
+#define P FP16_C (47823)
+
+/* Expected results for vadd.  */
+VECT_VAR_DECL (expected, float, 16, 4) [] = {A + E, B + F, C + G, D + H};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 4);
+
+VECT_VAR_DECL (expected, float, 16, 8) [] = {A + E, B + F, C + G, D + H,
+					     I + M, J + N, K + O, L + P};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 8);
+
+void
+exec_vadd_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VADD (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 4);
+  DECL_VARIABLE (vsrc_2, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4) =
+    vadd_f16 (VECT_VAR (vsrc_1, float, 16, 4),VECT_VAR (vsrc_2, float, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VADDQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 8);
+  DECL_VARIABLE (vsrc_2, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
+  VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8) =
+    vaddq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+	       VECT_VAR (vsrc_2, float, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_static, "");
+}
+
+int
+main (void)
+{
+  exec_vadd_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcage_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcage_f16_1.c
new file mode 100644
index 0000000..1438fe3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcage_f16_1.c
@@ -0,0 +1,77 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (-5.8)
+#define C FP16_C (-3.8)
+#define D FP16_C (1024)
+#define E FP16_C (663.1)
+#define F FP16_C (169.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (-77)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (101.23)
+#define L FP16_C (98)
+#define M FP16_C (870.1)
+#define N FP16_C (-8781)
+#define O FP16_C (-1.1)
+#define P FP16_C (47823)
+
+/* Expected results for vcage.  */
+VECT_VAR_DECL (expected, uint, 16, 4) [] = {0x0, 0x0, 0x0, 0xFFFF};
+VECT_VAR_DECL (expected, uint, 16, 8) [] = {0x0, 0x0, 0x0, 0xFFFF,
+					    0x0, 0x0, 0xFFFF, 0x0};
+void
+exec_vcage_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VCAGE (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 4);
+  DECL_VARIABLE (vsrc_2, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, uint, 16, 4) =
+    vcage_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+	       VECT_VAR (vsrc_2, float, 16, 4));
+  vst1_u16 (VECT_VAR (result, uint, 16, 4),
+	    VECT_VAR (vector_res, uint, 16, 4));
+
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VCAGEQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 8);
+  DECL_VARIABLE (vsrc_2, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
+  VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, uint, 16, 8) =
+    vcageq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		VECT_VAR (vsrc_2, float, 16, 8));
+  vst1q_u16 (VECT_VAR (result, uint, 16, 8),
+	     VECT_VAR (vector_res, uint, 16, 8));
+
+  CHECK (TEST_MSG, uint, 16, 8, PRIx16, expected, "");
+}
+
+int
+main (void)
+{
+  exec_vcage_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcagt_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcagt_f16_1.c
new file mode 100644
index 0000000..a4e1134
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcagt_f16_1.c
@@ -0,0 +1,77 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (-5.8)
+#define C FP16_C (-3.8)
+#define D FP16_C (1024)
+#define E FP16_C (123.1)
+#define F FP16_C (169.1)
+#define G FP16_C (3.8)
+#define H FP16_C (-77)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (101.23)
+#define L FP16_C (98)
+#define M FP16_C (870.1)
+#define N FP16_C (-78.3)
+#define O FP16_C (-1.1)
+#define P FP16_C (47823)
+
+/* Expected results for vcagt.  */
+VECT_VAR_DECL (expected, uint, 16, 4) [] = {0xFFFF, 0x0, 0x0, 0xFFFF};
+VECT_VAR_DECL (expected, uint, 16, 8) [] = {0xFFFF, 0x0, 0x0, 0xFFFF,
+					    0x0, 0x0, 0xFFFF, 0x0};
+void
+exec_vcagt_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VCAGT (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 4);
+  DECL_VARIABLE (vsrc_2, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, uint, 16, 4) =
+    vcagt_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+	       VECT_VAR (vsrc_2, float, 16, 4));
+  vst1_u16 (VECT_VAR (result, uint, 16, 4),
+	    VECT_VAR (vector_res, uint, 16, 4));
+
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VCAGTQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 8);
+  DECL_VARIABLE (vsrc_2, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
+  VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, uint, 16, 8) =
+    vcagtq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		VECT_VAR (vsrc_2, float, 16, 8));
+  vst1q_u16 (VECT_VAR (result, uint, 16, 8),
+	     VECT_VAR (vector_res, uint, 16, 8));
+
+  CHECK (TEST_MSG, uint, 16, 8, PRIx16, expected, "");
+}
+
+int
+main (void)
+{
+  exec_vcagt_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcale_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcale_f16_1.c
new file mode 100644
index 0000000..08707d5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcale_f16_1.c
@@ -0,0 +1,78 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (-5.8)
+#define C FP16_C (-3.8)
+#define D FP16_C (1024)
+#define E FP16_C (-123.4)
+#define F FP16_C (169.1)
+#define G FP16_C (3.8)
+#define H FP16_C (-77)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (101.23)
+#define L FP16_C (98)
+#define M FP16_C (870.1)
+#define N FP16_C (78.3)
+#define O FP16_C (-1.1)
+#define P FP16_C (47823)
+
+/* Expected results for vcale.  */
+VECT_VAR_DECL (expected, uint, 16, 4) [] = {0xFFFF, 0xFFFF, 0xFFFF, 0x0};
+VECT_VAR_DECL (expected, uint, 16, 8) [] = {0xFFFF, 0xFFFF, 0xFFFF, 0x0,
+					    0xFFFF, 0xFFFF, 0x0, 0xFFFF};
+
+void
+exec_vcale_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VCALE (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 4);
+  DECL_VARIABLE (vsrc_2, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, uint, 16, 4) =
+    vcale_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+	       VECT_VAR (vsrc_2, float, 16, 4));
+  vst1_u16 (VECT_VAR (result, uint, 16, 4),
+	    VECT_VAR (vector_res, uint, 16, 4));
+
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VCALEQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 8);
+  DECL_VARIABLE (vsrc_2, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
+  VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, uint, 16, 8) =
+    vcaleq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		VECT_VAR (vsrc_2, float, 16, 8));
+  vst1q_u16 (VECT_VAR (result, uint, 16, 8),
+	     VECT_VAR (vector_res, uint, 16, 8));
+
+  CHECK (TEST_MSG, uint, 16, 8, PRIx16, expected, "");
+}
+
+int
+main (void)
+{
+  exec_vcale_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcalt_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcalt_f16_1.c
new file mode 100644
index 0000000..ade4db5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcalt_f16_1.c
@@ -0,0 +1,78 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (-5.8)
+#define C FP16_C (-3.8)
+#define D FP16_C (1024)
+#define E FP16_C (-123.4)
+#define F FP16_C (169.1)
+#define G FP16_C (3.8)
+#define H FP16_C (-77)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (101.23)
+#define L FP16_C (98)
+#define M FP16_C (870.1)
+#define N FP16_C (78.3)
+#define O FP16_C (-1.1)
+#define P FP16_C (47823)
+
+/* Expected results for vcalt.  */
+VECT_VAR_DECL (expected, uint, 16, 4) [] = {0x0, 0xFFFF, 0x0, 0x0};
+VECT_VAR_DECL (expected, uint, 16, 8) [] = {0x0, 0xFFFF, 0x0, 0x0,
+					    0xFFFF, 0xFFFF, 0x0, 0xFFFF};
+
+void
+exec_vcalt_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VCALT (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 4);
+  DECL_VARIABLE (vsrc_2, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, uint, 16, 4) =
+    vcalt_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+	       VECT_VAR (vsrc_2, float, 16, 4));
+  vst1_u16 (VECT_VAR (result, uint, 16, 4),
+	    VECT_VAR (vector_res, uint, 16, 4));
+
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VCALTQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 8);
+  DECL_VARIABLE (vsrc_2, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
+  VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, uint, 16, 8) =
+    vcaltq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		VECT_VAR (vsrc_2, float, 16, 8));
+  vst1q_u16 (VECT_VAR (result, uint, 16, 8),
+	     VECT_VAR (vector_res, uint, 16, 8));
+
+  CHECK (TEST_MSG, uint, 16, 8, PRIx16, expected, "");
+}
+
+int
+main (void)
+{
+  exec_vcalt_f16 ();
+  return 0;
+}
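
vcage, vcagt, vcale and vcalt compare the magnitudes of corresponding
lanes and set every bit of the 16-bit result lane, which is why the
expected tables contain only 0x0 and 0xFFFF and why these tests use the
integer CHECK rather than CHECK_FP.  A scalar model of the four
predicates (illustrative helpers, not ACLE intrinsics):

  #include <math.h>
  #include <stdint.h>

  /* Per-lane absolute comparisons; the NEON result lane is all-ones
     or all-zeros.  */
  static uint16_t cage (float a, float b) { return fabsf (a) >= fabsf (b) ? 0xFFFF : 0x0; }
  static uint16_t cagt (float a, float b) { return fabsf (a) >  fabsf (b) ? 0xFFFF : 0x0; }
  static uint16_t cale (float a, float b) { return fabsf (a) <= fabsf (b) ? 0xFFFF : 0x0; }
  static uint16_t calt (float a, float b) { return fabsf (a) <  fabsf (b) ? 0xFFFF : 0x0; }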
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceq_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceq_f16_1.c
new file mode 100644
index 0000000..1507299
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceq_f16_1.c
@@ -0,0 +1,76 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (-5.8)
+#define C FP16_C (3.8)
+#define D FP16_C (1024)
+#define E FP16_C (-123.4)
+#define F FP16_C (169.1)
+#define G FP16_C (3.8)
+#define H FP16_C (-77)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (101.23)
+#define L FP16_C (98)
+#define M FP16_C (870.1)
+#define N FP16_C (-78)
+#define O FP16_C (-1.1)
+#define P FP16_C (47823)
+
+/* Expected results for vceq.  */
+VECT_VAR_DECL (expected, uint, 16, 4) [] = {0x0, 0x0, 0xFFFF, 0x0};
+VECT_VAR_DECL (expected, uint, 16, 8) [] = {0x0, 0x0, 0xFFFF, 0x0,
+					    0x0, 0xFFFF, 0x0, 0x0};
+void
+exec_vceq_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VCEQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 4);
+  DECL_VARIABLE (vsrc_2, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, uint, 16, 4) =
+    vceq_f16 (VECT_VAR (vsrc_1, float, 16, 4), VECT_VAR (vsrc_2, float, 16, 4));
+  vst1_u16 (VECT_VAR (result, uint, 16, 4),
+	    VECT_VAR (vector_res, uint, 16, 4));
+
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VCEQQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 8);
+  DECL_VARIABLE (vsrc_2, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
+  VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, uint, 16, 8) =
+    vceqq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+	       VECT_VAR (vsrc_2, float, 16, 8));
+  vst1q_u16 (VECT_VAR (result, uint, 16, 8),
+	     VECT_VAR (vector_res, uint, 16, 8));
+
+  CHECK (TEST_MSG, uint, 16, 8, PRIx16, expected, "");
+}
+
+int
+main (void)
+{
+  exec_vceq_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqz_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqz_f16_1.c
new file mode 100644
index 0000000..9c43dd1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqz_f16_1.c
@@ -0,0 +1,61 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (0)
+#define C FP16_C (-34.8)
+#define D FP16_C (0)
+#define E FP16_C (0)
+#define F FP16_C (169.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (77)
+
+/* Expected results for vceqz_f16.  */
+VECT_VAR_DECL (expected, uint, 16, 4) [] = {0x0, 0xFFFF, 0x0, 0xFFFF};
+VECT_VAR_DECL (expected, uint, 16, 8) [] = {0x0, 0xFFFF, 0x0, 0xFFFF,
+					    0xFFFF, 0x0, 0x0, 0x0};
+
+void
+exec_vceqz_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VCEQZ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 4);
+  VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc, buf_src, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, uint, 16, 4) =
+    vceqz_f16 (VECT_VAR (vsrc, float, 16, 4));
+  vst1_u16 (VECT_VAR (result, uint, 16, 4),
+	    VECT_VAR (vector_res, uint, 16, 4));
+
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VCEQZQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 8);
+  VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A, B, C, D, E, F, G, H};
+  VLOAD (vsrc, buf_src, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, uint, 16, 8) =
+    vceqzq_f16 (VECT_VAR (vsrc, float, 16, 8));
+  vst1q_u16 (VECT_VAR (result, uint, 16, 8),
+	     VECT_VAR (vector_res, uint, 16, 8));
+
+  CHECK (TEST_MSG, uint, 16, 8, PRIx16, expected, "");
+}
+
+int
+main (void)
+{
+  exec_vceqz_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcge_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcge_f16_1.c
new file mode 100644
index 0000000..f59146a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcge_f16_1.c
@@ -0,0 +1,76 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (-5.8)
+#define C FP16_C (3.8)
+#define D FP16_C (1024)
+#define E FP16_C (-123.4)
+#define F FP16_C (169.1)
+#define G FP16_C (3.8)
+#define H FP16_C (-77)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (101.23)
+#define L FP16_C (98)
+#define M FP16_C (870.1)
+#define N FP16_C (-78)
+#define O FP16_C (-1.1)
+#define P FP16_C (47823)
+
+/* Expected results for vcge.  */
+VECT_VAR_DECL (expected, uint, 16, 4) [] = {0xFFFF, 0x0, 0xFFFF, 0xFFFF};
+VECT_VAR_DECL (expected, uint, 16, 8) [] = {0xFFFF, 0x0, 0xFFFF, 0xFFFF,
+					    0x0, 0xFFFF, 0xFFFF, 0x0};
+void
+exec_vcge_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VCGE (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 4);
+  DECL_VARIABLE (vsrc_2, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, uint, 16, 4) =
+    vcge_f16 (VECT_VAR (vsrc_1, float, 16, 4), VECT_VAR (vsrc_2, float, 16, 4));
+  vst1_u16 (VECT_VAR (result, uint, 16, 4),
+	    VECT_VAR (vector_res, uint, 16, 4));
+
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VCGEQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 8);
+  DECL_VARIABLE (vsrc_2, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
+  VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, uint, 16, 8) =
+    vcgeq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+	       VECT_VAR (vsrc_2, float, 16, 8));
+  vst1q_u16 (VECT_VAR (result, uint, 16, 8),
+	     VECT_VAR (vector_res, uint, 16, 8));
+
+  CHECK (TEST_MSG, uint, 16, 8, PRIx16, expected, "");
+}
+
+int
+main (void)
+{
+  exec_vcge_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgez_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgez_f16_1.c
new file mode 100644
index 0000000..725cb62
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgez_f16_1.c
@@ -0,0 +1,61 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (0)
+#define C FP16_C (-34.8)
+#define D FP16_C (0)
+#define E FP16_C (0)
+#define F FP16_C (169.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (77)
+
+/* Expected results for vcgez_f16.  */
+VECT_VAR_DECL (expected, uint, 16, 4) [] = {0xFFFF, 0xFFFF, 0x0, 0xFFFF};
+VECT_VAR_DECL (expected, uint, 16, 8) [] = {0xFFFF, 0xFFFF, 0x0, 0xFFFF,
+					    0xFFFF, 0xFFFF, 0x0, 0xFFFF};
+
+void
+exec_vcgez_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VCGEZ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 4);
+  VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc, buf_src, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, uint, 16, 4) =
+    vcgez_f16 (VECT_VAR (vsrc, float, 16, 4));
+  vst1_u16 (VECT_VAR (result, uint, 16, 4),
+	    VECT_VAR (vector_res, uint, 16, 4));
+
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VCGEZQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 8);
+  VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A, B, C, D, E, F, G, H};
+  VLOAD (vsrc, buf_src, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, uint, 16, 8) =
+    vcgezq_f16 (VECT_VAR (vsrc, float, 16, 8));
+  vst1q_u16 (VECT_VAR (result, uint, 16, 8),
+	     VECT_VAR (vector_res, uint, 16, 8));
+
+  CHECK (TEST_MSG, uint, 16, 8, PRIx16, expected, "");
+}
+
+int
+main (void)
+{
+  exec_vcgez_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgt_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgt_f16_1.c
new file mode 100644
index 0000000..7ec0fda
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgt_f16_1.c
@@ -0,0 +1,76 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (-5.8)
+#define C FP16_C (3.8)
+#define D FP16_C (1024)
+#define E FP16_C (-123.4)
+#define F FP16_C (169.1)
+#define G FP16_C (3.8)
+#define H FP16_C (-77)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (101.23)
+#define L FP16_C (98)
+#define M FP16_C (870.1)
+#define N FP16_C (-78)
+#define O FP16_C (-1.1)
+#define P FP16_C (47823)
+
+/* Expected results for vcgt.  */
+VECT_VAR_DECL (expected, uint, 16, 4) [] = {0xFFFF, 0x0, 0x0, 0xFFFF};
+VECT_VAR_DECL (expected, uint, 16, 8) [] = {0xFFFF, 0x0, 0x0, 0xFFFF,
+					    0x0, 0x0, 0xFFFF, 0x0};
+void
+exec_vcgt_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VCGT (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 4);
+  DECL_VARIABLE (vsrc_2, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, uint, 16, 4) =
+    vcgt_f16 (VECT_VAR (vsrc_1, float, 16, 4), VECT_VAR (vsrc_2, float, 16, 4));
+  vst1_u16 (VECT_VAR (result, uint, 16, 4),
+	    VECT_VAR (vector_res, uint, 16, 4));
+
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VCGTQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 8);
+  DECL_VARIABLE (vsrc_2, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
+  VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, uint, 16, 8) =
+    vcgtq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+	       VECT_VAR (vsrc_2, float, 16, 8));
+  vst1q_u16 (VECT_VAR (result, uint, 16, 8),
+	     VECT_VAR (vector_res, uint, 16, 8));
+
+  CHECK (TEST_MSG, uint, 16, 8, PRIx16, expected, "");
+}
+
+int
+main (void)
+{
+  exec_vcgt_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgtz_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgtz_f16_1.c
new file mode 100644
index 0000000..1864b70
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgtz_f16_1.c
@@ -0,0 +1,61 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (0)
+#define C FP16_C (-34.8)
+#define D FP16_C (0)
+#define E FP16_C (0)
+#define F FP16_C (169.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (77)
+
+/* Expected results for vcgtz_f16.  */
+VECT_VAR_DECL (expected, uint, 16, 4) [] = {0xFFFF, 0x0, 0x0, 0x0};
+VECT_VAR_DECL (expected, uint, 16, 8) [] = {0xFFFF, 0x0, 0x0, 0x0,
+					    0x0, 0xFFFF, 0x0, 0xFFFF};
+
+void
+exec_vcgtz_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VCGTZ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 4);
+  VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc, buf_src, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, uint, 16, 4) =
+    vcgtz_f16 (VECT_VAR (vsrc, float, 16, 4));
+  vst1_u16 (VECT_VAR (result, uint, 16, 4),
+	    VECT_VAR (vector_res, uint, 16, 4));
+
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VCGTZQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 8);
+  VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A, B, C, D, E, F, G, H};
+  VLOAD (vsrc, buf_src, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, uint, 16, 8) =
+    vcgtzq_f16 (VECT_VAR (vsrc, float, 16, 8));
+  vst1q_u16 (VECT_VAR (result, uint, 16, 8),
+	     VECT_VAR (vector_res, uint, 16, 8));
+
+  CHECK (TEST_MSG, uint, 16, 8, PRIx16, expected, "");
+}
+
+int
+main (void)
+{
+  exec_vcgtz_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcle_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcle_f16_1.c
new file mode 100644
index 0000000..92b1be4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcle_f16_1.c
@@ -0,0 +1,76 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (-5.8)
+#define C FP16_C (3.8)
+#define D FP16_C (1024)
+#define E FP16_C (-123.4)
+#define F FP16_C (169.1)
+#define G FP16_C (3.8)
+#define H FP16_C (-77)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (101.23)
+#define L FP16_C (98)
+#define M FP16_C (870.1)
+#define N FP16_C (-78)
+#define O FP16_C (-1.1)
+#define P FP16_C (47823)
+
+/* Expected results for vcle.  */
+VECT_VAR_DECL (expected, uint, 16, 4) [] = {0x0, 0xFFFF, 0xFFFF, 0x0};
+VECT_VAR_DECL (expected, uint, 16, 8) [] = {0x0, 0xFFFF, 0xFFFF, 0x0,
+					    0xFFFF, 0xFFFF, 0x0, 0xFFFF};
+void
+exec_vcle_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VCLE (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 4);
+  DECL_VARIABLE (vsrc_2, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, uint, 16, 4) =
+    vcle_f16 (VECT_VAR (vsrc_1, float, 16, 4), VECT_VAR (vsrc_2, float, 16, 4));
+  vst1_u16 (VECT_VAR (result, uint, 16, 4),
+	    VECT_VAR (vector_res, uint, 16, 4));
+
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VCLEQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 8);
+  DECL_VARIABLE (vsrc_2, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
+  VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, uint, 16, 8) =
+    vcleq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+	       VECT_VAR (vsrc_2, float, 16, 8));
+  vst1q_u16 (VECT_VAR (result, uint, 16, 8),
+	     VECT_VAR (vector_res, uint, 16, 8));
+
+  CHECK (TEST_MSG, uint, 16, 8, PRIx16, expected, "");
+}
+
+int
+main (void)
+{
+  exec_vcle_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclez_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclez_f16_1.c
new file mode 100644
index 0000000..c84f0a6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclez_f16_1.c
@@ -0,0 +1,61 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (0)
+#define C FP16_C (-34.8)
+#define D FP16_C (0)
+#define E FP16_C (0)
+#define F FP16_C (169.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (77)
+
+/* Expected results for vclez_f16.  */
+VECT_VAR_DECL (expected, uint, 16, 4) [] = {0x0, 0xFFFF, 0xFFFF, 0xFFFF};
+VECT_VAR_DECL (expected, uint, 16, 8) [] = {0x0, 0xFFFF, 0xFFFF, 0xFFFF,
+					    0xFFFF, 0x0, 0xFFFF, 0x0};
+
+void
+exec_vclez_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VCLEZ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 4);
+  VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc, buf_src, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, uint, 16, 4) =
+    vclez_f16 (VECT_VAR (vsrc, float, 16, 4));
+  vst1_u16 (VECT_VAR (result, uint, 16, 4),
+	    VECT_VAR (vector_res, uint, 16, 4));
+
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VCLEZQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 8);
+  VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A, B, C, D, E, F, G, H};
+  VLOAD (vsrc, buf_src, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, uint, 16, 8) =
+    vclezq_f16 (VECT_VAR (vsrc, float, 16, 8));
+  vst1q_u16 (VECT_VAR (result, uint, 16, 8),
+	     VECT_VAR (vector_res, uint, 16, 8));
+
+  CHECK (TEST_MSG, uint, 16, 8, PRIx16, expected, "");
+}
+
+int
+main (void)
+{
+  exec_vclez_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclt_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclt_f16_1.c
new file mode 100644
index 0000000..6a17c3d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclt_f16_1.c
@@ -0,0 +1,77 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (-5.8)
+#define C FP16_C (3.8)
+#define D FP16_C (1024)
+#define E FP16_C (-123.4)
+#define F FP16_C (169.1)
+#define G FP16_C (3.8)
+#define H FP16_C (-77)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (101.23)
+#define L FP16_C (98)
+#define M FP16_C (870.1)
+#define N FP16_C (-78)
+#define O FP16_C (-1.1)
+#define P FP16_C (47823)
+
+/* Expected results for vclt.  */
+VECT_VAR_DECL (expected, uint, 16, 4) [] = {0x0, 0xFFFF, 0x0, 0x0};
+VECT_VAR_DECL (expected, uint, 16, 8) [] = {0x0, 0xFFFF, 0x0, 0x0,
+					    0xFFFF, 0x0, 0x0, 0xFFFF};
+
+void
+exec_vclt_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VCLT (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 4);
+  DECL_VARIABLE (vsrc_2, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, uint, 16, 4) =
+    vclt_f16 (VECT_VAR (vsrc_1, float, 16, 4), VECT_VAR (vsrc_2, float, 16, 4));
+  vst1_u16 (VECT_VAR (result, uint, 16, 4),
+	    VECT_VAR (vector_res, uint, 16, 4));
+
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VCLTQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 8);
+  DECL_VARIABLE (vsrc_2, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
+  VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, uint, 16, 8) =
+    vcltq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+	       VECT_VAR (vsrc_2, float, 16, 8));
+  vst1q_u16 (VECT_VAR (result, uint, 16, 8),
+	     VECT_VAR (vector_res, uint, 16, 8));
+
+  CHECK (TEST_MSG, uint, 16, 8, PRIx16, expected, "");
+}
+
+int
+main (void)
+{
+  exec_vclt_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcltz_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcltz_f16_1.c
new file mode 100644
index 0000000..d9e414b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcltz_f16_1.c
@@ -0,0 +1,61 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (0)
+#define C FP16_C (-34.8)
+#define D FP16_C (0)
+#define E FP16_C (0)
+#define F FP16_C (169.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (77)
+
+/* Expected results for vcltz_f16.  */
+VECT_VAR_DECL (expected, uint, 16, 4) [] = {0x0, 0x0, 0xFFFF, 0x0};
+VECT_VAR_DECL (expected, uint, 16, 8) [] = {0x0, 0x0, 0xFFFF, 0x0,
+					    0x0, 0x0, 0xFFFF, 0x0};
+
+void
+exec_vcltz_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VCLTZ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 4);
+  VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc, buf_src, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, uint, 16, 4) =
+    vcltz_f16 (VECT_VAR (vsrc, float, 16, 4));
+  vst1_u16 (VECT_VAR (result, uint, 16, 4),
+	    VECT_VAR (vector_res, uint, 16, 4));
+
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VCLTZQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 8);
+  VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A, B, C, D, E, F, G, H};
+  VLOAD (vsrc, buf_src, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, uint, 16, 8) =
+    vcltzq_f16 (VECT_VAR (vsrc, float, 16, 8));
+  vst1q_u16 (VECT_VAR (result, uint, 16, 8),
+	     VECT_VAR (vector_res, uint, 16, 8));
+
+  CHECK (TEST_MSG, uint, 16, 8, PRIx16, expected, "");
+}
+
+int
+main (void)
+{
+  exec_vcltz_f16 ();
+  return 0;
+}
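
The plain comparisons (vceq, vcge, vcgt, vcle, vclt) and their
compare-against-zero forms (vceqz, vcgez, ...) follow the same mask
convention, with the zero forms taking a single operand.  A scalar
sketch of two representatives (illustrative helpers, not ACLE names):

  #include <stdint.h>

  static uint16_t ceq (__fp16 a, __fp16 b) { return a == b ? 0xFFFF : 0x0; }
  static uint16_t cltz (__fp16 a) { return a < (__fp16) 0.0 ? 0xFFFF : 0x0; }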
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16_s16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16_s16_1.c
new file mode 100644
index 0000000..ccdaa4e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16_s16_1.c
@@ -0,0 +1,70 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16(a) ((__fp16) (a))
+#define SHORT(a) ((short) (a))
+#define A SHORT(123)
+#define B SHORT(-567)
+#define C SHORT(-34)
+#define D SHORT(1024)
+#define E SHORT(663)
+#define F SHORT(169)
+#define G SHORT(-4)
+#define H SHORT(77)
+
+/* Expected results for vcvt.f16.s16.  */
+VECT_VAR_DECL (expected, float, 16, 4) [] = {FP16 (A), FP16 (B),
+					     FP16 (C), FP16 (D)};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 4);
+
+VECT_VAR_DECL (expected, float, 16, 8) [] = {FP16 (A), FP16 (B),
+					     FP16 (C), FP16 (D),
+					     FP16 (E), FP16 (F),
+					     FP16 (G), FP16 (H)};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 8);
+
+void
+exec_vcvtf16s16_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VCVT (F16 <- S16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, int, 16, 4);
+  VECT_VAR_DECL (buf_src, int, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc, buf_src, , int, s, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4) =
+    vcvt_f16_s16 (VECT_VAR (vsrc, int, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VCVTQ (F16 <- S16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, int, 16, 8);
+  VECT_VAR_DECL (buf_src, int, 16, 8) [] = {A, B, C, D, E, F, G, H};
+  VLOAD (vsrc, buf_src, q, int, s, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8) =
+    vcvtq_f16_s16 (VECT_VAR (vsrc, int, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_static, "");
+}
+
+int
+main (void)
+{
+  exec_vcvtf16s16_f16 ();
+  return 0;
+}
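
FP16 carries an 11-bit significand (1 implicit bit plus 10 stored), so
every integer of magnitude up to 2^11 = 2048 converts exactly; the
inputs here stay within +-1024, so the expected table needs no rounding
analysis.  A minimal check of that claim, assuming a target with __fp16:

  #include <assert.h>

  int
  main (void)
  {
    /* Integers up to 2^11 round-trip exactly through half precision.  */
    for (int i = -2048; i <= 2048; i++)
      assert ((int) (__fp16) i == i);
    return 0;
  }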
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16_u16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16_u16_1.c
new file mode 100644
index 0000000..5de5554
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16_u16_1.c
@@ -0,0 +1,70 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16(a) ((__fp16) (a))
+#define USHORT(a) ((unsigned short) (a))
+#define A USHORT(123)
+#define B USHORT(-567)
+#define C USHORT(-34)
+#define D USHORT(1024)
+#define E USHORT(663)
+#define F USHORT(169)
+#define G USHORT(-4)
+#define H USHORT(77)
+
+/* Expected results for vcvt.f16.u16.  */
+VECT_VAR_DECL (expected, float, 16, 4) [] = {FP16 (A), FP16 (B),
+					     FP16 (C), FP16 (D)};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 4);
+
+VECT_VAR_DECL (expected, float, 16, 8) [] = {FP16 (A), FP16 (B),
+					     FP16 (C), FP16 (D),
+					     FP16 (E), FP16 (F),
+					     FP16 (G), FP16 (H)};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 8);
+
+void
+exec_vcvtf16u16_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VCVT (F16 <- U16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, uint, 16, 4);
+  VECT_VAR_DECL (buf_src, uint, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc, buf_src, , uint, u, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4) =
+    vcvt_f16_u16 (VECT_VAR (vsrc, uint, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VCVTQ (F16 <- U16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, uint, 16, 8);
+  VECT_VAR_DECL (buf_src, uint, 16, 8) [] = {A, B, C, D, E, F, G, H};
+  VLOAD (vsrc, buf_src, q, uint, u, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8) =
+    vcvtq_f16_u16 (VECT_VAR (vsrc, uint, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_static, "");
+}
+
+int
+main (void)
+{
+  exec_vcvtf16u16_f16 ();
+  return 0;
+}
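
The deliberately negative inputs are worth a note: USHORT (-567) wraps
modulo 2^16 to 64969, which is below the FP16 maximum of 65504 but falls
between representable values (the lane spacing above 32768 is 32).  Both
the VCVT instruction and the FP16 () cast in the expected table round it
the same way, so the bit patterns still match.  A minimal check of that
reasoning (constants worked out by hand; assumes __fp16):

  #include <assert.h>

  int
  main (void)
  {
    unsigned short b = (unsigned short) -567;    /* wraps to 64969 */
    assert (b == 64969);
    /* 64969 = 2030 * 32 + 9, so it rounds down to 64960.  */
    assert ((float) (__fp16) b == 64960.0f);
    return 0;
  }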
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_n_f16_s16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_n_f16_s16_1.c
new file mode 100644
index 0000000..373d267
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_n_f16_s16_1.c
@@ -0,0 +1,73 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FRAC_1 1
+#define FRAC_2 2
+
+#define FP16(a) ((__fp16) (a))
+#define SHORT(a) ((short) (a))
+#define A SHORT(1)
+#define B SHORT(10)
+#define C SHORT(48)
+#define D SHORT(100)
+#define E SHORT(-1)
+#define F SHORT(-10)
+#define G SHORT(7)
+#define H SHORT(-7)
+
+/* Expected results for vcvt (fixed).f16.s16.  */
+VECT_VAR_DECL (expected, float, 16, 4) [] = {FP16 (0.5), FP16 (5),
+					     FP16 (24), FP16 (50)};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 4);
+
+VECT_VAR_DECL (expected, float, 16, 8) [] = {FP16 (0.25), FP16 (2.5),
+					     FP16 (12), FP16 (25),
+					     FP16 (-0.25), FP16 (-2.5),
+					     FP16 (1.75), FP16 (-1.75)};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 8);
+
+void
+exec_vcvt_n_f16s16_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VCVT Fixed (F16 <- S16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, int, 16, 4);
+  VECT_VAR_DECL (buf_src, int, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc, buf_src, , int, s, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4) =
+    vcvt_n_f16_s16 (VECT_VAR (vsrc, int, 16, 4), FRAC_1);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VCVTQ Fixed (F16 <- S16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, int, 16, 8);
+  VECT_VAR_DECL (buf_src, int, 16, 8) [] = {A, B, C, D, E, F, G, H};
+  VLOAD (vsrc, buf_src, q, int, s, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8) =
+    vcvtq_n_f16_s16 (VECT_VAR (vsrc, int, 16, 8), FRAC_2);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_static, "");
+}
+
+int
+main (void)
+{
+  exec_vcvt_n_f16s16_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_n_f16_u16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_n_f16_u16_1.c
new file mode 100644
index 0000000..fa15f4e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_n_f16_u16_1.c
@@ -0,0 +1,73 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FRAC_1 1
+#define FRAC_2 2
+
+#define FP16(a) ((__fp16) (a))
+#define SHORT(a) ((short) (a))
+#define A SHORT(1)
+#define B SHORT(3)
+#define C SHORT(48)
+#define D SHORT(100)
+#define E SHORT(1000)
+#define F SHORT(4)
+#define G SHORT(0)
+#define H SHORT(9)
+
+/* Expected results for vcvt (fixed).f16.u16.  */
+VECT_VAR_DECL (expected, float, 16, 4) [] = {FP16 (0.5), FP16 (1.5),
+					     FP16 (24), FP16 (50)};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 4);
+
+VECT_VAR_DECL (expected, float, 16, 8) [] = {FP16 (0.25), FP16 (0.75),
+					     FP16 (12), FP16 (25),
+					     FP16 (250), FP16 (1),
+					     FP16 (0), FP16 (2.25)};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 8);
+
+void
+exec_vcvt_n_f16u16_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VCVT Fixed (F16 <- U16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, uint, 16, 4);
+  VECT_VAR_DECL (buf_src, uint, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc, buf_src, , uint, u, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4) =
+    vcvt_n_f16_u16 (VECT_VAR (vsrc, uint, 16, 4), FRAC_1);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VCVTQ Fixed (F16 <- U16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, uint, 16, 8);
+  VECT_VAR_DECL (buf_src, uint, 16, 8) [] = {A, B, C, D, E, F, G, H};
+  VLOAD (vsrc, buf_src, q, uint, u, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8) =
+    vcvtq_n_f16_u16 (VECT_VAR (vsrc, uint, 16, 8), FRAC_2);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_static, "");
+}
+
+int
+main (void)
+{
+  exec_vcvt_n_f16u16_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_n_s16_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_n_s16_f16_1.c
new file mode 100644
index 0000000..cf45635
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_n_s16_f16_1.c
@@ -0,0 +1,68 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
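+/* vcvt_n_s16_f16 multiplies each element by 2^FRAC and rounds toward
+   zero, so with FRAC_2, 7.1 * 4 = 28.4 becomes 28.  */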
+#define FRAC_1 1
+#define FRAC_2 2
+
+#define SHORT(a) ((short) (a))
+#define FP16_C(a) ((__fp16) (a))
+#define A FP16_C (2.5)
+#define B FP16_C (100)
+#define C FP16_C (7.1)
+#define D FP16_C (-9.9)
+#define E FP16_C (-5.0)
+#define F FP16_C (9.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (77)
+
+/* Expected results for vcvt (fixed).s16.f16.  */
+VECT_VAR_DECL (expected, int, 16, 4) [] = {5, 200, 14, -19};
+
+VECT_VAR_DECL (expected, int, 16, 8) [] = {10, 400, 28, -39,
+					   -20, 36, -19, 308};
+
+void
+exec_vcvt_n_s16f16_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VCVT Fixed (S16 <- F16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 4);
+  VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc, buf_src, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, int, 16, 4) =
+    vcvt_n_s16_f16 (VECT_VAR (vsrc, float, 16, 4), FRAC_1);
+  vst1_s16 (VECT_VAR (result, int, 16, 4),
+	    VECT_VAR (vector_res, int, 16, 4));
+
+  CHECK (TEST_MSG, int, 16, 4, PRIx16, expected, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VCVTQ Fixed (S16 <- F16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 8);
+  VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A, B, C, D, E, F, G, H};
+  VLOAD (vsrc, buf_src, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, int, 16, 8) =
+    vcvtq_n_s16_f16 (VECT_VAR (vsrc, float, 16, 8), FRAC_2);
+  vst1q_s16 (VECT_VAR (result, int, 16, 8),
+	     VECT_VAR (vector_res, int, 16, 8));
+
+  CHECK (TEST_MSG, int, 16, 8, PRIx16, expected, "");
+}
+
+int
+main (void)
+{
+  exec_vcvt_n_s16f16_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_n_u16_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_n_u16_f16_1.c
new file mode 100644
index 0000000..e540f2a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_n_u16_f16_1.c
@@ -0,0 +1,66 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FRAC_1 1
+#define FRAC_2 2
+
+#define USHORT(a) ((unsigned short) (a))
+#define FP16_C(a) ((__fp16) (a))
+#define A FP16_C (2.5)
+#define B FP16_C (100)
+#define C FP16_C (7.1)
+#define D FP16_C (9.9)
+#define E FP16_C (5.0)
+#define F FP16_C (9.1)
+#define G FP16_C (4.8)
+#define H FP16_C (77)
+
+/* Expected results for vcvt (fixed).u16.f16.  */
+VECT_VAR_DECL (expected, uint, 16, 4) [] = {5, 200, 14, 19};
+
+VECT_VAR_DECL (expected, uint, 16, 8) [] = {10, 400, 28, 39,
+					    20, 36, 19, 308};
+
+void
+exec_vcvt_n_u16f16_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VCVT Fixed (U16 <- F16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 4);
+  VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc, buf_src, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, uint, 16, 4) =
+    vcvt_n_u16_f16 (VECT_VAR (vsrc, float, 16, 4), FRAC_1);
+  vst1_u16 (VECT_VAR (result, uint, 16, 4),
+	    VECT_VAR (vector_res, uint, 16, 4));
+
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VCVTQ Fixed (U16 <- F16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 8);
+  VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A, B, C, D, E, F, G, H};
+  VLOAD (vsrc, buf_src, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, uint, 16, 8) =
+    vcvtq_n_u16_f16 (VECT_VAR (vsrc, float, 16, 8), FRAC_2);
+  vst1q_u16 (VECT_VAR (result, uint, 16, 8),
+	     VECT_VAR (vector_res, uint, 16, 8));
+
+  CHECK (TEST_MSG, uint, 16, 8, PRIx16, expected, "");
+}
+
+int
+main (void)
+{
+  exec_vcvt_n_u16f16_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_s16_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_s16_f16_1.c
new file mode 100644
index 0000000..73ae138
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_s16_f16_1.c
@@ -0,0 +1,67 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
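+/* VCVT.S16.F16 rounds toward zero, matching the C cast used to build
+   the expected values below.  */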
+#define SHORT(a) ((short) (a))
+#define FP16_C(a) ((__fp16) (a))
+#define A FP16_C (123.4)
+#define B FP16_C (-567.8)
+#define C FP16_C (-34.8)
+#define D FP16_C (1024)
+#define E FP16_C (663.1)
+#define F FP16_C (169.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (77)
+
+/* Expected results for vcvt.s16.f16.  */
+VECT_VAR_DECL (expected, int, 16, 4) [] = {SHORT (A), SHORT (B),
+					   SHORT (C), SHORT (D)};
+
+VECT_VAR_DECL (expected, int, 16, 8) [] = {SHORT (A), SHORT (B),
+					   SHORT (C), SHORT (D),
+					   SHORT (E), SHORT (F),
+					   SHORT (G), SHORT (H)};
+void
+exec_vcvts16f16_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VCVT (S16 <- F16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 4);
+  VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc, buf_src, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, int, 16, 4) =
+    vcvt_s16_f16 (VECT_VAR (vsrc, float, 16, 4));
+  vst1_s16 (VECT_VAR (result, int, 16, 4),
+	    VECT_VAR (vector_res, int, 16, 4));
+
+  CHECK (TEST_MSG, int, 16, 4, PRIx16, expected, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VCVTQ (S16 <- F16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 8);
+  VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A, B, C, D, E, F, G, H};
+  VLOAD (vsrc, buf_src, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, int, 16, 8) =
+    vcvtq_s16_f16 (VECT_VAR (vsrc, float, 16, 8));
+  vst1q_s16 (VECT_VAR (result, int, 16, 8),
+	     VECT_VAR (vector_res, int, 16, 8));
+
+  CHECK (TEST_MSG, int, 16, 8, PRIx16, expected, "");
+}
+
+int
+main (void)
+{
+  exec_vcvts16f16_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_u16_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_u16_f16_1.c
new file mode 100644
index 0000000..5fa7e76
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_u16_f16_1.c
@@ -0,0 +1,67 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
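+/* VCVT.U16.F16 rounds toward zero and saturates negative inputs to
+   zero, so only non-negative values are used here.  */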
+#define USHORT(a) ((unsigned short) (a))
+#define FP16_C(a) ((__fp16) (a))
+#define A FP16_C (123.4)
+#define B FP16_C (567.8)
+#define C FP16_C (34.8)
+#define D FP16_C (1024)
+#define E FP16_C (663.1)
+#define F FP16_C (169.1)
+#define G FP16_C (4.8)
+#define H FP16_C (77)
+
+/* Expected results for vcvt.u16.f16.  */
+VECT_VAR_DECL (expected, uint, 16, 4) [] = {USHORT (A), USHORT (B),
+					    USHORT (C), USHORT (D)};
+
+VECT_VAR_DECL (expected, uint, 16, 8) [] = {USHORT (A), USHORT (B),
+					    USHORT (C), USHORT (D),
+					    USHORT (E), USHORT (F),
+					    USHORT (G), USHORT (H)};
+void
+exec_vcvtu16f16_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VCVT (U16 <- F16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 4);
+  VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc, buf_src, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, uint, 16, 4) =
+    vcvt_u16_f16 (VECT_VAR (vsrc, float, 16, 4));
+  vst1_u16 (VECT_VAR (result, uint, 16, 4),
+	    VECT_VAR (vector_res, uint, 16, 4));
+
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VCVTQ (U16 <- F16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 8);
+  VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A, B, C, D, E, F, G, H};
+  VLOAD (vsrc, buf_src, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, uint, 16, 8) =
+    vcvtq_u16_f16 (VECT_VAR (vsrc, float, 16, 8));
+  vst1q_u16 (VECT_VAR (result, uint, 16, 8),
+	     VECT_VAR (vector_res, uint, 16, 8));
+
+  CHECK (TEST_MSG, uint, 16, 8, PRIx16, expected, "");
+}
+
+int
+main (void)
+{
+  exec_vcvtu16f16_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvta_s16_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvta_s16_f16_1.c
new file mode 100644
index 0000000..771c4cd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvta_s16_f16_1.c
@@ -0,0 +1,72 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
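+/* VCVTA rounds to nearest with ties away from zero: -34.5 -> -35 and
+   169.5 -> 170.  */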
+#define SHORT(a) ((short) (a))
+#define FP16_C(a) ((__fp16) (a))
+#define A FP16_C (123.4)
+#define CVTA_A SHORT (123)
+#define B FP16_C (-567.8)
+#define CVTA_B SHORT (-568)
+#define C FP16_C (-34.5)
+#define CVTA_C SHORT (-35)
+#define D FP16_C (1024)
+#define CVTA_D SHORT (1024)
+#define E FP16_C (663.1)
+#define CVTA_E SHORT (663)
+#define F FP16_C (169.5)
+#define CVTA_F SHORT (170)
+#define G FP16_C (-4.8)
+#define CVTA_G SHORT (-5)
+#define H FP16_C (77)
+#define CVTA_H SHORT (77)
+
+/* Expected results for vcvta.s16.f16.  */
+VECT_VAR_DECL (expected, int, 16, 4) [] = {CVTA_A, CVTA_B, CVTA_C, CVTA_D};
+
+VECT_VAR_DECL (expected, int, 16, 8) [] = {CVTA_A, CVTA_B, CVTA_C, CVTA_D,
+					   CVTA_E, CVTA_F, CVTA_G, CVTA_H};
+void
+exec_vcvtas16f16_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VCVTA (S16 <- F16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 4);
+  VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc, buf_src, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, int, 16, 4) =
+    vcvta_s16_f16 (VECT_VAR (vsrc, float, 16, 4));
+  vst1_s16 (VECT_VAR (result, int, 16, 4),
+	    VECT_VAR (vector_res, int, 16, 4));
+
+  CHECK (TEST_MSG, int, 16, 4, PRIx16, expected, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VCVTAQ (S16 <- F16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 8);
+  VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A, B, C, D, E, F, G, H};
+  VLOAD (vsrc, buf_src, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, int, 16, 8) =
+    vcvtaq_s16_f16 (VECT_VAR (vsrc, float, 16, 8));
+  vst1q_s16 (VECT_VAR (result, int, 16, 8),
+	     VECT_VAR (vector_res, int, 16, 8));
+
+  CHECK (TEST_MSG, int, 16, 8, PRIx16, expected, "");
+}
+
+int
+main (void)
+{
+  exec_vcvtas16f16_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvta_u16_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvta_u16_f16_1.c
new file mode 100644
index 0000000..144223d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvta_u16_f16_1.c
@@ -0,0 +1,70 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define USHORT(a) ((unsigned short) (a))
+#define FP16_C(a) ((__fp16) (a))
+#define A FP16_C (123.4)
+#define CVTA_A USHORT (123)
+#define B FP16_C (567.8)
+#define CVTA_B USHORT (568)
+#define C FP16_C (34.5)
+#define CVTA_C USHORT (35)
+#define D FP16_C (1024)
+#define CVTA_D USHORT (1024)
+#define E FP16_C (663)
+#define CVTA_E USHORT (663)
+#define F FP16_C (169)
+#define CVTA_F USHORT (169)
+#define G FP16_C (4.8)
+#define CVTA_G USHORT (5)
+#define H FP16_C (77)
+#define CVTA_H USHORT (77)
+
+/* Expected results for vcvta.u16.f16.  */
+VECT_VAR_DECL (expected, uint, 16, 4) [] = {CVTA_A, CVTA_B, CVTA_C, CVTA_D};
+
+VECT_VAR_DECL (expected, uint, 16, 8) [] = {CVTA_A, CVTA_B, CVTA_C, CVTA_D,
+					    CVTA_E, CVTA_F, CVTA_G, CVTA_H};
+void
+exec_vcvtau16f16_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VCVTA (U16 <- F16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 4);
+  VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc, buf_src, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, uint, 16, 4) =
+    vcvta_u16_f16 (VECT_VAR (vsrc, float, 16, 4));
+  vst1_u16 (VECT_VAR (result, uint, 16, 4),
+	    VECT_VAR (vector_res, uint, 16, 4));
+
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VCVTAQ (U16 <- F16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 8);
+  VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A, B, C, D, E, F, G, H};
+  VLOAD (vsrc, buf_src, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, uint, 16, 8) =
+    vcvtaq_u16_f16 (VECT_VAR (vsrc, float, 16, 8));
+  vst1q_u16 (VECT_VAR (result, uint, 16, 8),
+	     VECT_VAR (vector_res, uint, 16, 8));
+
+  CHECK (TEST_MSG, uint, 16, 8, PRIx16, expected, "");
+}
+
+int
+main (void)
+{
+  exec_vcvtau16f16_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtm_s16_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtm_s16_f16_1.c
new file mode 100644
index 0000000..86f7706
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtm_s16_f16_1.c
@@ -0,0 +1,71 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
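+/* VCVTM rounds toward minus infinity: -4.8 -> -5 and 169.5 -> 169.  */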
+#define SHORT(a) ((short) (a))
+#define FP16_C(a) ((__fp16) (a))
+#define A FP16_C (123.4)
+#define CVTM_A SHORT (123)
+#define B FP16_C (-567.8)
+#define CVTM_B SHORT (-568)
+#define C FP16_C (-34.5)
+#define CVTM_C SHORT (-35)
+#define D FP16_C (1024)
+#define CVTM_D SHORT (1024)
+#define E FP16_C (663.1)
+#define CVTM_E SHORT (663)
+#define F FP16_C (169.5)
+#define CVTM_F SHORT (169)
+#define G FP16_C (-4.8)
+#define CVTM_G SHORT (-5)
+#define H FP16_C (77)
+#define CVTM_H SHORT (77)
+
+/* Expected results for vcvtm.s16.f16.  */
+VECT_VAR_DECL (expected, int, 16, 4) [] = {CVTM_A, CVTM_B, CVTM_C, CVTM_D};
+
+VECT_VAR_DECL (expected, int, 16, 8) [] = {CVTM_A, CVTM_B, CVTM_C, CVTM_D,
+					   CVTM_E, CVTM_F, CVTM_G, CVTM_H};
+void
+exec_vcvtms16f16_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VCVTM (S16 <- F16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 4);
+  VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc, buf_src, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, int, 16, 4) =
+    vcvtm_s16_f16 (VECT_VAR (vsrc, float, 16, 4));
+  vst1_s16 (VECT_VAR (result, int, 16, 4),
+	    VECT_VAR (vector_res, int, 16, 4));
+
+  CHECK (TEST_MSG, int, 16, 4, PRIx16, expected, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VCVTMQ (S16 <- F16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 8);
+  VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A, B, C, D, E, F, G, H};
+  VLOAD (vsrc, buf_src, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, int, 16, 8) =
+    vcvtmq_s16_f16 (VECT_VAR (vsrc, float, 16, 8));
+  vst1q_s16 (VECT_VAR (result, int, 16, 8),
+	     VECT_VAR (vector_res, int, 16, 8));
+
+  CHECK (TEST_MSG, int, 16, 8, PRIx16, expected, "");
+}
+
+int
+main (void)
+{
+  exec_vcvtms16f16_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtm_u16_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtm_u16_f16_1.c
new file mode 100644
index 0000000..7159e2a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtm_u16_f16_1.c
@@ -0,0 +1,70 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define USHORT(a) ((unsigned short) (a))
+#define FP16_C(a) ((__fp16) (a))
+#define A FP16_C (123.4)
+#define CVTM_A USHORT (123)
+#define B FP16_C (567.8)
+#define CVTM_B USHORT (568)
+#define C FP16_C (34.5)
+#define CVTM_C USHORT (34)
+#define D FP16_C (1024.5)
+#define CVTM_D USHORT (1024)
+#define E FP16_C (663)
+#define CVTM_E USHORT (663)
+#define F FP16_C (169.5)
+#define CVTM_F USHORT (169)
+#define G FP16_C (4.8)
+#define CVTM_G USHORT (4)
+#define H FP16_C (77)
+#define CVTM_H USHORT (77)
+
+/* Expected results for vcvtm.u16.f16.  */
+VECT_VAR_DECL (expected, uint, 16, 4) [] = {CVTM_A, CVTM_B, CVTM_C, CVTM_D};
+
+VECT_VAR_DECL (expected, uint, 16, 8) [] = {CVTM_A, CVTM_B, CVTM_C, CVTM_D,
+					    CVTM_E, CVTM_F, CVTM_G, CVTM_H};
+void
+exec_vcvtmu16f16_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VCVTM (U16 <- F16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 4);
+  VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc, buf_src, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, uint, 16, 4) =
+    vcvtm_u16_f16 (VECT_VAR (vsrc, float, 16, 4));
+  vst1_u16 (VECT_VAR (result, uint, 16, 4),
+	    VECT_VAR (vector_res, uint, 16, 4));
+
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VCVTMQ (U16 <- F16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 8);
+  VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A, B, C, D, E, F, G, H};
+  VLOAD (vsrc, buf_src, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, uint, 16, 8) =
+    vcvtmq_u16_f16 (VECT_VAR (vsrc, float, 16, 8));
+  vst1q_u16 (VECT_VAR (result, uint, 16, 8),
+	     VECT_VAR (vector_res, uint, 16, 8));
+
+  CHECK (TEST_MSG, uint, 16, 8, PRIx16, expected, "");
+}
+
+int
+main (void)
+{
+  exec_vcvtmu16f16_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtp_s16_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtp_s16_f16_1.c
new file mode 100644
index 0000000..fe79f23
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtp_s16_f16_1.c
@@ -0,0 +1,71 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
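+/* VCVTP rounds toward plus infinity: -34.5 -> -34 and -4.8 -> -4.  */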
+#define SHORT(a) ((short) (a))
+#define FP16_C(a) ((__fp16) (a))
+#define A FP16_C (123.4)
+#define CVTP_A SHORT (124)
+#define B FP16_C (-567.8)
+#define CVTP_B SHORT (-568)
+#define C FP16_C (-34.5)
+#define CVTP_C SHORT (-34)
+#define D FP16_C (1024)
+#define CVTP_D SHORT (1024)
+#define E FP16_C (663.1)
+#define CVTP_E SHORT (663)
+#define F FP16_C (169.5)
+#define CVTP_F SHORT (170)
+#define G FP16_C (-4.8)
+#define CVTP_G SHORT (-4)
+#define H FP16_C (77)
+#define CVTP_H SHORT (77)
+
+/* Expected results for vcvtp.s16.f16.  */
+VECT_VAR_DECL (expected, int, 16, 4) [] = {CVTP_A, CVTP_B, CVTP_C, CVTP_D};
+
+VECT_VAR_DECL (expected, int, 16, 8) [] = {CVTP_A, CVTP_B, CVTP_C, CVTP_D,
+					   CVTP_E, CVTP_F, CVTP_G, CVTP_H};
+void
+exec_vcvtps16f16_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VCVTP (S16 <- F16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 4);
+  VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc, buf_src, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, int, 16, 4) =
+    vcvtp_s16_f16 (VECT_VAR (vsrc, float, 16, 4));
+  vst1_s16 (VECT_VAR (result, int, 16, 4),
+	    VECT_VAR (vector_res, int, 16, 4));
+
+  CHECK (TEST_MSG, int, 16, 4, PRIx16, expected, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VCVTPQ (S16 <- F16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 8);
+  VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A, B, C, D, E, F, G, H};
+  VLOAD (vsrc, buf_src, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, int, 16, 8) =
+    vcvtpq_s16_f16 (VECT_VAR (vsrc, float, 16, 8));
+  vst1q_s16 (VECT_VAR (result, int, 16, 8),
+	     VECT_VAR (vector_res, int, 16, 8));
+
+  CHECK (TEST_MSG, int, 16, 8, PRIx16, expected, "");
+}
+
+int
+main (void)
+{
+  exec_vcvtps16f16_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtp_u16_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtp_u16_f16_1.c
new file mode 100644
index 0000000..4955c61
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtp_u16_f16_1.c
@@ -0,0 +1,70 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define USHORT(a) ((unsigned short) (a))
+#define FP16_C(a) ((__fp16) (a))
+#define A FP16_C (123.4)
+#define CVTP_A USHORT (124)
+#define B FP16_C (567.8)
+#define CVTP_B USHORT (568)
+#define C FP16_C (34.5)
+#define CVTP_C USHORT (35)
+#define D FP16_C (1024)
+#define CVTP_D USHORT (1024)
+#define E FP16_C (663)
+#define CVTP_E USHORT (663)
+#define F FP16_C (169)
+#define CVTP_F USHORT (169)
+#define G FP16_C (4.8)
+#define CVTP_G USHORT (5)
+#define H FP16_C (77)
+#define CVTP_H USHORT (77)
+
+/* Expected results for vcvtp.u16.f16.  */
+VECT_VAR_DECL (expected, uint, 16, 4) [] = {CVTP_A, CVTP_B, CVTP_C, CVTP_D};
+
+VECT_VAR_DECL (expected, uint, 16, 8) [] = {CVTP_A, CVTP_B, CVTP_C, CVTP_D,
+					    CVTP_E, CVTP_F, CVTP_G, CVTP_H};
+void
+exec_vcvtpu16f16_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VCVTP (U16 <- F16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 4);
+  VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc, buf_src, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, uint, 16, 4) =
+    vcvtp_u16_f16 (VECT_VAR (vsrc, float, 16, 4));
+  vst1_u16 (VECT_VAR (result, uint, 16, 4),
+	    VECT_VAR (vector_res, uint, 16, 4));
+
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VCVTPQ (U16 <- F16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 8);
+  VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A, B, C, D, E, F, G, H};
+  VLOAD (vsrc, buf_src, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, uint, 16, 8) =
+    vcvtpq_u16_f16 (VECT_VAR (vsrc, float, 16, 8));
+  vst1q_u16 (VECT_VAR (result, uint, 16, 8),
+	     VECT_VAR (vector_res, uint, 16, 8));
+
+  CHECK (TEST_MSG, uint, 16, 8, PRIx16, expected, "");
+}
+
+int
+main (void)
+{
+  exec_vcvtpu16f16_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfma_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfma_f16_1.c
new file mode 100644
index 0000000..51208fe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfma_f16_1.c
@@ -0,0 +1,108 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
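+/* vfma_f16 (a, b, c) computes a + b * c for each lane; the expected
+   arrays below mirror that computation.  */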
+#define FP16_C(a) ((__fp16) a)
+#define A0 FP16_C (123.4)
+#define B0 FP16_C (-5.8)
+#define C0 FP16_C (-3.8)
+#define D0 FP16_C (10)
+
+#define A1 FP16_C (12.4)
+#define B1 FP16_C (-5.8)
+#define C1 FP16_C (90.8)
+#define D1 FP16_C (24)
+
+#define A2 FP16_C (23.4)
+#define B2 FP16_C (-5.8)
+#define C2 FP16_C (8.9)
+#define D2 FP16_C (4)
+
+#define E0 FP16_C (3.4)
+#define F0 FP16_C (-55.8)
+#define G0 FP16_C (-31.8)
+#define H0 FP16_C (2)
+
+#define E1 FP16_C (123.4)
+#define F1 FP16_C (-5.8)
+#define G1 FP16_C (-3.8)
+#define H1 FP16_C (102)
+
+#define E2 FP16_C (4.9)
+#define F2 FP16_C (-15.8)
+#define G2 FP16_C (39.8)
+#define H2 FP16_C (49)
+
+/* Expected results for vfma.  */
+VECT_VAR_DECL (expected, float, 16, 4) [] = {A0 + A1 * A2, B0 + B1 * B2,
+					     C0 + C1 * C2, D0 + D1 * D2};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 4);
+
+VECT_VAR_DECL (expected, float, 16, 8) [] = {A0 + A1 * A2, B0 + B1 * B2,
+					     C0 + C1 * C2, D0 + D1 * D2,
+					     E0 + E1 * E2, F0 + F1 * F2,
+					     G0 + G1 * G2, H0 + H1 * H2};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 8);
+
+void
+exec_vfma_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VFMA (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 4);
+  DECL_VARIABLE (vsrc_2, float, 16, 4);
+  DECL_VARIABLE (vsrc_3, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A0, B0, C0, D0};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {A1, B1, C1, D1};
+  VECT_VAR_DECL (buf_src_3, float, 16, 4) [] = {A2, B2, C2, D2};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+  VLOAD (vsrc_3, buf_src_3, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4) =
+    vfma_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+	      VECT_VAR (vsrc_2, float, 16, 4),
+	      VECT_VAR (vsrc_3, float, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VFMAQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 8);
+  DECL_VARIABLE (vsrc_2, float, 16, 8);
+  DECL_VARIABLE (vsrc_3, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A0, B0, C0, D0, E0, F0, G0, H0};
+  VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {A1, B1, C1, D1, E1, F1, G1, H1};
+  VECT_VAR_DECL (buf_src_3, float, 16, 8) [] = {A2, B2, C2, D2, E2, F2, G2, H2};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+  VLOAD (vsrc_3, buf_src_3, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8) =
+    vfmaq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+	       VECT_VAR (vsrc_2, float, 16, 8),
+	       VECT_VAR (vsrc_3, float, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_static, "");
+}
+
+int
+main (void)
+{
+  exec_vfma_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfms_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfms_f16_1.c
new file mode 100644
index 0000000..3876e20
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfms_f16_1.c
@@ -0,0 +1,106 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
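+/* vfms_f16 (a, b, c) computes a - b * c for each lane, hence the
+   A0 + -A1 * A2 form of the expected values.  */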
+#define FP16_C(a) ((__fp16) a)
+#define A0 FP16_C (123.4)
+#define B0 FP16_C (-5.8)
+#define C0 FP16_C (-3.8)
+#define D0 FP16_C (10)
+
+#define A1 FP16_C (12.4)
+#define B1 FP16_C (-5.8)
+#define C1 FP16_C (90.8)
+#define D1 FP16_C (24)
+
+#define A2 FP16_C (23.4)
+#define B2 FP16_C (-5.8)
+#define C2 FP16_C (8.9)
+#define D2 FP16_C (4)
+
+#define E0 FP16_C (3.4)
+#define F0 FP16_C (-55.8)
+#define G0 FP16_C (-31.8)
+#define H0 FP16_C (2)
+
+#define E1 FP16_C (123.4)
+#define F1 FP16_C (-5.8)
+#define G1 FP16_C (-3.8)
+#define H1 FP16_C (102)
+
+#define E2 FP16_C (4.9)
+#define F2 FP16_C (-15.8)
+#define G2 FP16_C (39.8)
+#define H2 FP16_C (49)
+
+/* Expected results for vfms.  */
+VECT_VAR_DECL (expected, float, 16, 4) [] = {A0 + -A1 * A2, B0 + -B1 * B2,
+					     C0 + -C1 * C2, D0 + -D1 * D2};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 4);
+
+VECT_VAR_DECL (expected, float, 16, 8) [] = {A0 + -A1 * A2, B0 + -B1 * B2,
+					     C0 + -C1 * C2, D0 + -D1 * D2,
+					     E0 + -E1 * E2, F0 + -F1 * F2,
+					     G0 + -G1 * G2, H0 + -H1 * H2};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 8);
+
+void
+exec_vfms_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VFMS (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 4);
+  DECL_VARIABLE (vsrc_2, float, 16, 4);
+  DECL_VARIABLE (vsrc_3, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A0, B0, C0, D0};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {A1, B1, C1, D1};
+  VECT_VAR_DECL (buf_src_3, float, 16, 4) [] = {A2, B2, C2, D2};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+  VLOAD (vsrc_3, buf_src_3, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4) =
+    vfms_f16 (VECT_VAR (vsrc_1, float, 16, 4), VECT_VAR (vsrc_2, float, 16, 4),
+	      VECT_VAR (vsrc_3, float, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VFMSQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 8);
+  DECL_VARIABLE (vsrc_2, float, 16, 8);
+  DECL_VARIABLE (vsrc_3, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A0, B0, C0, D0, E0, F0, G0, H0};
+  VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {A1, B1, C1, D1, E1, F1, G1, H1};
+  VECT_VAR_DECL (buf_src_3, float, 16, 8) [] = {A2, B2, C2, D2, E2, F2, G2, H2};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+  VLOAD (vsrc_3, buf_src_3, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8) =
+    vfmsq_f16 (VECT_VAR (vsrc_1, float, 16, 8), VECT_VAR (vsrc_2, float, 16, 8),
+	       VECT_VAR (vsrc_3, float, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_static, "");
+}
+
+int
+main (void)
+{
+  exec_vfms_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmax_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmax_f16_1.c
new file mode 100644
index 0000000..830e439
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmax_f16_1.c
@@ -0,0 +1,83 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
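+/* VMAX picks the per-lane maximum; infinities compare as usual, so
+   +Inf wins its lane and -Inf never does.  */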
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (-567.8)
+#define C FP16_C (-34.8)
+#define D FP16_C (1024)
+#define E FP16_C (663.1)
+#define F FP16_C (169.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (77)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (101.23)
+#define L FP16_C (98)
+#define M FP16_C (870.1)
+#define N FP16_C (-8781)
+#define O FP16_C (__builtin_inff ()) /* +Inf */
+#define P FP16_C (-__builtin_inff ()) /* -Inf */
+
+/* Expected results for vmax.  */
+VECT_VAR_DECL (expected, float, 16, 4) [] = {E, F, G, D};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 4);
+
+VECT_VAR_DECL (expected, float, 16, 8) [] = {E, F, G, D, M, J, O, L};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 8);
+
+void
+exec_vmax_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VMAX (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 4);
+  DECL_VARIABLE (vsrc_2, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4) =
+    vmax_f16 (VECT_VAR (vsrc_1, float, 16, 4), VECT_VAR (vsrc_2, float, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VMAXQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 8);
+  DECL_VARIABLE (vsrc_2, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
+  VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8) =
+    vmaxq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+	       VECT_VAR (vsrc_2, float, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_static, "");
+}
+
+int
+main (void)
+{
+  exec_vmax_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxnm_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxnm_f16_1.c
new file mode 100644
index 0000000..5b2c991
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxnm_f16_1.c
@@ -0,0 +1,84 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
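+/* VMAXNM implements IEEE maxNum: when exactly one operand is a NaN,
+   the other is returned, so the B and H lanes yield F and D.  */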
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (__builtin_nanf ("")) /* NaN */
+#define C FP16_C (-34.8)
+#define D FP16_C (1024)
+#define E FP16_C (663.1)
+#define F FP16_C (169.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (-__builtin_nanf ("")) /* NaN */
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (101.23)
+#define L FP16_C (-1098)
+#define M FP16_C (870.1)
+#define N FP16_C (-8781)
+#define O FP16_C (__builtin_inff ()) /* +Inf */
+#define P FP16_C (-__builtin_inff ()) /* -Inf */
+
+/* Expected results for vmaxnm.  */
+VECT_VAR_DECL (expected, float, 16, 4) [] = {E, F, G, D};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 4);
+
+VECT_VAR_DECL (expected, float, 16, 8) [] = {E, F, G, D, M, J, O, L};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 8);
+
+void
+exec_vmaxnm_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VMAXNM (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 4);
+  DECL_VARIABLE (vsrc_2, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4) =
+    vmaxnm_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		VECT_VAR (vsrc_2, float, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VMAXNMQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 8);
+  DECL_VARIABLE (vsrc_2, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
+  VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8) =
+    vmaxnmq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		 VECT_VAR (vsrc_2, float, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_static, "");
+}
+
+int
+main (void)
+{
+  exec_vmaxnm_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmin_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmin_f16_1.c
new file mode 100644
index 0000000..0ee5b4d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmin_f16_1.c
@@ -0,0 +1,81 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (-567.8)
+#define C FP16_C (-34.8)
+#define D FP16_C (1024)
+#define E FP16_C (663.1)
+#define F FP16_C (169.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (77)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (101.23)
+#define L FP16_C (98)
+#define M FP16_C (870.1)
+#define N FP16_C (-8781)
+#define O FP16_C (__builtin_inff ()) /* +Inf */
+#define P FP16_C (-__builtin_inff ()) /* -Inf */
+
+/* Expected results for vmin.  */
+VECT_VAR_DECL (expected, float, 16, 4) [] = {A, B, C, H};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 4);
+
+VECT_VAR_DECL (expected, float, 16, 8) [] = {A, B, C, H, I, N, K, P};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 8);
+
+void
+exec_vmin_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VMIN (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 4);
+  DECL_VARIABLE (vsrc_2, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4) =
+    vmin_f16 (VECT_VAR (vsrc_1, float, 16, 4), VECT_VAR (vsrc_2, float, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VMINQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 8);
+  DECL_VARIABLE (vsrc_2, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
+  VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8) =
+    vminq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+	       VECT_VAR (vsrc_2, float, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_static, "");
+}
+
+int
+main (void)
+{
+  exec_vmin_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminnm_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminnm_f16_1.c
new file mode 100644
index 0000000..75150db
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminnm_f16_1.c
@@ -0,0 +1,82 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (__builtin_nanf ("")) /* NaN */
+#define C FP16_C (-34.8)
+#define D FP16_C (1024)
+#define E FP16_C (663.1)
+#define F FP16_C (169.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (-__builtin_nanf ("")) /* NaN */
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (101.23)
+#define L FP16_C (-1098)
+#define M FP16_C (870.1)
+#define N FP16_C (-8781)
+#define O FP16_C (__builtin_inff ()) /* +Inf */
+#define P FP16_C (-__builtin_inff ()) /* -Inf */
+
+/* Expected results for vminnm.  */
+VECT_VAR_DECL (expected, float, 16, 4) [] = {A, F, C, D};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 4);
+
+VECT_VAR_DECL (expected, float, 16, 8) [] = {A, F, C, D, I, N, K, P};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 8);
+
+void
+exec_vminnm_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VMINNM (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 4);
+  DECL_VARIABLE (vsrc_2, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4) =
+    vminnm_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		VECT_VAR (vsrc_2, float, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VMINNMQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 8);
+  DECL_VARIABLE (vsrc_2, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
+  VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8) =
+    vminnmq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		 VECT_VAR (vsrc_2, float, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_static, "");
+}
+
+int
+main (void)
+{
+  exec_vminnm_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_f16_1.c
new file mode 100644
index 0000000..2bc26d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_f16_1.c
@@ -0,0 +1,82 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (13.4)
+#define B FP16_C (-56.8)
+#define C FP16_C (-34.8)
+#define D FP16_C (12)
+#define E FP16_C (63.1)
+#define F FP16_C (19.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (77)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (11.23)
+#define L FP16_C (98)
+#define M FP16_C (87.1)
+#define N FP16_C (-8)
+#define O FP16_C (-1.1)
+#define P FP16_C (-9.7)
+
+/* Expected results for vmul.  */
+VECT_VAR_DECL (expected, float, 16, 4) [] = {A * E, B * F, C * G, D * H};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 4);
+
+VECT_VAR_DECL (expected, float, 16, 8) [] = {A * E, B * F, C * G, D * H,
+					     I * M, J * N, K * O, L * P};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 8);
+
+void
+exec_vmul_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VMUL (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 4);
+  DECL_VARIABLE (vsrc_2, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4) =
+    vmul_f16 (VECT_VAR (vsrc_1, float, 16, 4), VECT_VAR (vsrc_2, float, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VMULQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 8);
+  DECL_VARIABLE (vsrc_2, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
+  VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8) =
+    vmulq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+	       VECT_VAR (vsrc_2, float, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_static, "");
+}
+
+int
+main (void)
+{
+  exec_vmul_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_lane_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_lane_f16_1.c
new file mode 100644
index 0000000..fa82f19
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_lane_f16_1.c
@@ -0,0 +1,157 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
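+/* vmul_lane_f16 multiplies every element of the first vector by the
+   selected lane of the second, giving one expected set per lane.  */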
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (13.4)
+#define B FP16_C (-56.8)
+#define C FP16_C (-34.8)
+#define D FP16_C (12)
+#define E FP16_C (63.1)
+#define F FP16_C (19.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (77)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (11.23)
+#define L FP16_C (98)
+#define M FP16_C (87.1)
+#define N FP16_C (-8)
+#define O FP16_C (-1.1)
+#define P FP16_C (-9.7)
+
+/* Expected results for vmul_lane.  */
+VECT_VAR_DECL (expected0, float, 16, 4) [] = {A * E, B * E, C * E, D * E};
+hfloat16_t * VECT_VAR (expected0_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected0, float, 16, 4);
+
+VECT_VAR_DECL (expected1, float, 16, 4) [] = {A * F, B * F, C * F, D * F};
+hfloat16_t * VECT_VAR (expected1_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected1, float, 16, 4);
+
+VECT_VAR_DECL (expected2, float, 16, 4) [] = {A * G, B * G, C * G, D * G};
+hfloat16_t * VECT_VAR (expected2_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected2, float, 16, 4);
+
+VECT_VAR_DECL (expected3, float, 16, 4) [] = {A * H, B * H, C * H, D * H};
+hfloat16_t * VECT_VAR (expected3_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected3, float, 16, 4);
+
+VECT_VAR_DECL (expected0, float, 16, 8) [] = {A * E, B * E, C * E, D * E,
+					      I * E, J * E, K * E, L * E};
+hfloat16_t * VECT_VAR (expected0_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected0, float, 16, 8);
+
+VECT_VAR_DECL (expected1, float, 16, 8) [] = {A * F, B * F, C * F, D * F,
+					      I * F, J * F, K * F, L * F};
+hfloat16_t * VECT_VAR (expected1_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected1, float, 16, 8);
+
+VECT_VAR_DECL (expected2, float, 16, 8) [] = {A * G, B * G, C * G, D * G,
+					      I * G, J * G, K * G, L * G};
+hfloat16_t * VECT_VAR (expected2_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected2, float, 16, 8);
+
+VECT_VAR_DECL (expected3, float, 16, 8) [] = {A * H, B * H, C * H, D * H,
+					      I * H, J * H, K * H, L * H};
+hfloat16_t * VECT_VAR (expected3_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected3, float, 16, 8);
+
+void
+exec_vmul_lane_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VMUL_LANE (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 4);
+  DECL_VARIABLE (vsrc_2, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4) =
+    vmul_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		   VECT_VAR (vsrc_2, float, 16, 4), 0);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected0_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4) =
+    vmul_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		   VECT_VAR (vsrc_2, float, 16, 4), 1);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected1_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4) =
+    vmul_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		   VECT_VAR (vsrc_2, float, 16, 4), 2);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected2_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4) =
+    vmul_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		   VECT_VAR (vsrc_2, float, 16, 4), 3);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected3_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VMULQ_LANE (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8) =
+    vmulq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		    VECT_VAR (vsrc_2, float, 16, 4), 0);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected0_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8) =
+    vmulq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		    VECT_VAR (vsrc_2, float, 16, 4), 1);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected1_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8) =
+    vmulq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		    VECT_VAR (vsrc_2, float, 16, 4), 2);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected2_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8) =
+    vmulq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		    VECT_VAR (vsrc_2, float, 16, 4), 3);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected3_static, "");
+}
+
+int
+main (void)
+{
+  exec_vmul_lane_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_n_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_n_f16_1.c
new file mode 100644
index 0000000..3e633d6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_n_f16_1.c
@@ -0,0 +1,194 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
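+/* vmul_n_f16 multiplies every element by the scalar operand; the
+   q-form is also exercised with the scalars M to P.  */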
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (13.4)
+#define B FP16_C (-56.8)
+#define C FP16_C (-34.8)
+#define D FP16_C (12)
+#define E FP16_C (63.1)
+#define F FP16_C (19.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (77)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (11.23)
+#define L FP16_C (98)
+#define M FP16_C (87.1)
+#define N FP16_C (-8)
+#define O FP16_C (-1.1)
+#define P FP16_C (-9.7)
+
+/* Expected results for vmul_n.  */
+VECT_VAR_DECL (expected0, float, 16, 4) [] = {A * E, B * E, C * E, D * E};
+hfloat16_t * VECT_VAR (expected0_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected0, float, 16, 4);
+
+VECT_VAR_DECL (expected1, float, 16, 4) [] = {A * F, B * F, C * F, D * F};
+hfloat16_t * VECT_VAR (expected1_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected1, float, 16, 4);
+
+VECT_VAR_DECL (expected2, float, 16, 4) [] = {A * G, B * G, C * G, D * G};
+hfloat16_t * VECT_VAR (expected2_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected2, float, 16, 4);
+
+VECT_VAR_DECL (expected3, float, 16, 4) [] = {A * H, B * H, C * H, D * H};
+hfloat16_t * VECT_VAR (expected3_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected3, float, 16, 4);
+
+VECT_VAR_DECL (expected0, float, 16, 8) [] = {A * E, B * E, C * E, D * E,
+					      I * E, J * E, K * E, L * E};
+hfloat16_t * VECT_VAR (expected0_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected0, float, 16, 8);
+
+VECT_VAR_DECL (expected1, float, 16, 8) [] = {A * F, B * F, C * F, D * F,
+					      I * F, J * F, K * F, L * F};
+hfloat16_t * VECT_VAR (expected1_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected1, float, 16, 8);
+
+VECT_VAR_DECL (expected2, float, 16, 8) [] = {A * G, B * G, C * G, D * G,
+					      I * G, J * G, K * G, L * G};
+hfloat16_t * VECT_VAR (expected2_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected2, float, 16, 8);
+
+VECT_VAR_DECL (expected3, float, 16, 8) [] = {A * H, B * H, C * H, D * H,
+					      I * H, J * H, K * H, L * H};
+hfloat16_t * VECT_VAR (expected3_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected3, float, 16, 8);
+
+VECT_VAR_DECL (expected4, float, 16, 8) [] = {A * M, B * M, C * M, D * M,
+					      I * M, J * M, K * M, L * M};
+hfloat16_t * VECT_VAR (expected4_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected4, float, 16, 8);
+
+VECT_VAR_DECL (expected5, float, 16, 8) [] = {A * N, B * N, C * N, D * N,
+					      I * N, J * N, K * N, L * N};
+hfloat16_t * VECT_VAR (expected5_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected5, float, 16, 8);
+
+VECT_VAR_DECL (expected6, float, 16, 8) [] = {A * O, B * O, C * O, D * O,
+					      I * O, J * O, K * O, L * O};
+hfloat16_t * VECT_VAR (expected6_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected6, float, 16, 8);
+
+VECT_VAR_DECL (expected7, float, 16, 8) [] = {A * P, B * P, C * P, D * P,
+					      I * P, J * P, K * P, L * P};
+hfloat16_t * VECT_VAR (expected7_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected7, float, 16, 8);
+
+void
+exec_vmul_n_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VMUL_N (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4) =
+    vmul_n_f16 (VECT_VAR (vsrc_1, float, 16, 4), E);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected0_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4) =
+    vmul_n_f16 (VECT_VAR (vsrc_1, float, 16, 4), F);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected1_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4) =
+    vmul_n_f16 (VECT_VAR (vsrc_1, float, 16, 4), G);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected2_static, "");
+
+  VECT_VAR (vector_res, float, 16, 4) =
+    vmul_n_f16 (VECT_VAR (vsrc_1, float, 16, 4), H);
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected3_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VMULQ_N (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8) =
+    vmulq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8), E);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected0_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8) =
+    vmulq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8), F);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected1_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8) =
+    vmulq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8), G);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected2_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8) =
+    vmulq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8), H);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected3_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8) =
+    vmulq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8), M);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected4_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8) =
+    vmulq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8), N);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected5_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8) =
+    vmulq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8), O);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected6_static, "");
+
+  VECT_VAR (vector_res, float, 16, 8) =
+    vmulq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8), P);
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected7_static, "");
+}
+
+int
+main (void)
+{
+  exec_vmul_n_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vneg_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vneg_f16_1.c
new file mode 100644
index 0000000..515bb4d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vneg_f16_1.c
@@ -0,0 +1,65 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (-567.8)
+#define C FP16_C (-34.8)
+#define D FP16_C (1024)
+#define E FP16_C (663.1)
+#define F FP16_C (169.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (77)
+
+/* Expected results for vneg.  */
+VECT_VAR_DECL (expected, float, 16, 4) [] = {-A, -B, -C, -D};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 4);
+
+VECT_VAR_DECL (expected, float, 16, 8) [] = {-A, -B, -C, -D, -E, -F, -G, -H};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 8);
+
+void
+exec_vneg_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VNEG (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 4);
+  VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc, buf_src, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4) =
+    vneg_f16 (VECT_VAR (vsrc, float, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VNEGQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 8);
+  VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A, B, C, D, E, F, G, H};
+  VLOAD (vsrc, buf_src, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8) =
+    vnegq_f16 (VECT_VAR (vsrc, float, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_static, "");
+}
+
+int
+main (void)
+{
+  exec_vneg_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpadd_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpadd_f16_1.c
new file mode 100644
index 0000000..a5b48a1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpadd_f16_1.c
@@ -0,0 +1,87 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (-567.8)
+#define C FP16_C (-34.8)
+#define D FP16_C (1024)
+#define E FP16_C (663.1)
+#define F FP16_C (169.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (77)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (101.23)
+#define L FP16_C (98)
+#define M FP16_C (870.1)
+#define N FP16_C (-8781)
+#define O FP16_C (-1.1)
+#define P FP16_C (47823)
+
+/* Expected results for vpadd.  */
+VECT_VAR_DECL (expected, float, 16, 4) [] = {A + B, C + D, E + F, G + H};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 4);
+
+VECT_VAR_DECL (expected, float, 16, 8) [] = {A + B, C + D, I + J, K + L,
+					     E + F, G + H, M + N, O + P};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 8);
+
+void
+exec_vpadd_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VPADD (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 4);
+  DECL_VARIABLE (vsrc_2, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4) =
+    vpadd_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+	       VECT_VAR (vsrc_2, float, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
+#ifdef __ARM_ARCH_ISA_A64
+
+#undef TEST_MSG
+#define TEST_MSG "VPADDQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 8);
+  DECL_VARIABLE (vsrc_2, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
+  VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8) =
+    vpaddq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		VECT_VAR (vsrc_2, float, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_static, "");
+
+#endif
+}
+
+int
+main (void)
+{
+  exec_vpadd_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpmax_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpmax_f16_1.c
new file mode 100644
index 0000000..af310b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpmax_f16_1.c
@@ -0,0 +1,87 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (-567.8)
+#define C FP16_C (-34.8)
+#define D FP16_C (1024)
+#define E FP16_C (663.1)
+#define F FP16_C (169.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (77)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (101.23)
+#define L FP16_C (98)
+#define M FP16_C (870.1)
+#define N FP16_C (-8781)
+#define O FP16_C (__builtin_inff ()) /* +Inf */
+#define P FP16_C (-__builtin_inff ()) /* -Inf */
+
+
+/* Expected results for vpmax.  */
+VECT_VAR_DECL (expected, float, 16, 4) [] = {A, D, E, H};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 4);
+
+VECT_VAR_DECL (expected, float, 16, 8) [] = {A, D, I, K, E, H, M, O};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 8);
+
+void
+exec_vpmax_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VPMAX (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 4);
+  DECL_VARIABLE (vsrc_2, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4) =
+    vpmax_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+	       VECT_VAR (vsrc_2, float, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
+#ifdef __ARM_ARCH_ISA_A64
+
+#undef TEST_MSG
+#define TEST_MSG "VPMAXQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 8);
+  DECL_VARIABLE (vsrc_2, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
+  VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8) =
+    vpmaxq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		VECT_VAR (vsrc_2, float, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_static, "");
+
+#endif
+}
+
+int
+main (void)
+{
+  exec_vpmax_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpmin_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpmin_f16_1.c
new file mode 100644
index 0000000..7230176
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpmin_f16_1.c
@@ -0,0 +1,86 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (-567.8)
+#define C FP16_C (-34.8)
+#define D FP16_C (1024)
+#define E FP16_C (663.1)
+#define F FP16_C (169.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (77)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (101.23)
+#define L FP16_C (98)
+#define M FP16_C (870.1)
+#define N FP16_C (-8781)
+#define O FP16_C (__builtin_inff ()) /* +Inf */
+#define P FP16_C (-__builtin_inff ()) /* -Inf */
+
+/* Expected results for vpmin.  */
+VECT_VAR_DECL (expected, float, 16, 4) [] = {B, C, F, G};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 4);
+
+VECT_VAR_DECL (expected, float, 16, 8) [] = {B, C, J, L, F, G, N, P};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 8);
+
+void
+exec_vpmin_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VPMIN (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 4);
+  DECL_VARIABLE (vsrc_2, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4) =
+    vpmin_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+	       VECT_VAR (vsrc_2, float, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
+#ifdef __ARM_ARCH_ISA_A64
+
+#undef TEST_MSG
+#define TEST_MSG "VPMINQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 8);
+  DECL_VARIABLE (vsrc_2, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
+  VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8) =
+    vpminq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		VECT_VAR (vsrc_2, float, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_static, "");
+
+#endif
+}
+
+int
+main (void)
+{
+  exec_vpmin_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecpe_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecpe_f16_1.c
new file mode 100644
index 0000000..12cf5ae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecpe_f16_1.c
@@ -0,0 +1,75 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (567.8)
+#define C FP16_C (34.8)
+#define D FP16_C (1024)
+#define E FP16_C (663.1)
+#define F FP16_C (144.0)
+#define G FP16_C (4.8)
+#define H FP16_C (77)
+
+#define RECP_A FP16_C (1/A)
+#define RECP_B FP16_C (1/B)
+#define RECP_C FP16_C (1/C)
+#define RECP_D FP16_C (1/D)
+#define RECP_E FP16_C (1/E)
+#define RECP_F FP16_C (1/F)
+#define RECP_G FP16_C (1/G)
+#define RECP_H FP16_C (1/H)
+
+/* Expected results for vrecpe.  */
+VECT_VAR_DECL (expected, float, 16, 4) [] = {RECP_A, RECP_B, RECP_C, RECP_D};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 4);
+
+VECT_VAR_DECL (expected, float, 16, 8) [] = {RECP_A, RECP_B, RECP_C, RECP_D,
+					     RECP_E, RECP_F, RECP_G, RECP_H};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 8);
+
+void
+exec_vrecpe_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VRECPE (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 4);
+  VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc, buf_src, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4) =
+    vrecpe_f16 (VECT_VAR (vsrc, float, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP_BIAS (TEST_MSG, float, 16, 4, PRIx16, expected_static, "", 5);
+
+#undef TEST_MSG
+#define TEST_MSG "VRECPEQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 8);
+  VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A, B, C, D, E, F, G, H};
+  VLOAD (vsrc, buf_src, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8) =
+    vrecpeq_f16 (VECT_VAR (vsrc, float, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP_BIAS (TEST_MSG, float, 16, 8, PRIx16, expected_static, "", 5);
+}
+
+int
+main (void)
+{
+  exec_vrecpe_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecps_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecps_f16_1.c
new file mode 100644
index 0000000..1e9c511
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecps_f16_1.c
@@ -0,0 +1,86 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (12.4)
+#define B FP16_C (-5.8)
+#define C FP16_C (-3.8)
+#define D FP16_C (10)
+#define E FP16_C (66.1)
+#define F FP16_C (16.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (-77)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (10.23)
+#define L FP16_C (98)
+#define M FP16_C (87)
+#define N FP16_C (-87.81)
+#define O FP16_C (-1.1)
+#define P FP16_C (47.8)
+
+/* Expected results for vrecps.  */
+VECT_VAR_DECL (expected, float, 16, 4) [] = {2.0f - A * E, 2.0f - B * F,
+					     2.0f - C * G, 2.0f - D * H};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 4);
+
+VECT_VAR_DECL (expected, float, 16, 8) [] = {2.0f - A * E, 2.0f - B * F,
+					     2.0f - C * G, 2.0f - D * H,
+					     2.0f - I * M, 2.0f - J * N,
+					     2.0f - K * O, 2.0f - L * P};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 8);
+
+void
+exec_vrecps_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VRECPS (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 4);
+  DECL_VARIABLE (vsrc_2, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4) =
+    vrecps_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		VECT_VAR (vsrc_2, float, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VRECPSQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 8);
+  DECL_VARIABLE (vsrc_2, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
+  VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8) =
+    vrecpsq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		 VECT_VAR (vsrc_2, float, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP_BIAS (TEST_MSG, float, 16, 8, PRIx16, expected_static, "", 1);
+}
+
+int
+main (void)
+{
+  exec_vrecps_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd_f16_1.c
new file mode 100644
index 0000000..99ea119
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd_f16_1.c
@@ -0,0 +1,74 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define RND_A FP16_C (123)
+#define B FP16_C (-567.5)
+#define RND_B FP16_C (-567)
+#define C FP16_C (-34.8)
+#define RND_C FP16_C (-34)
+#define D FP16_C (1024)
+#define RND_D FP16_C (1024)
+#define E FP16_C (663.1)
+#define RND_E FP16_C (663)
+#define F FP16_C (169.1)
+#define RND_F FP16_C (169)
+#define G FP16_C (-4.8)
+#define RND_G FP16_C (-4)
+#define H FP16_C (77)
+#define RND_H FP16_C (77)
+
+/* Expected results for vrnd.  */
+VECT_VAR_DECL (expected, float, 16, 4) [] = {RND_A, RND_B, RND_C, RND_D};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 4);
+
+VECT_VAR_DECL (expected, float, 16, 8) [] = {RND_A, RND_B, RND_C, RND_D,
+					     RND_E, RND_F, RND_G, RND_H};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 8);
+
+void
+exec_vrnd_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VRND (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 4);
+  VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc, buf_src, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4) =
+    vrnd_f16 (VECT_VAR (vsrc, float, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VRNDQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 8);
+  VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A, B, C, D, E, F, G, H};
+  VLOAD (vsrc, buf_src, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8) =
+    vrndq_f16 (VECT_VAR (vsrc, float, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_static, "");
+}
+
+int
+main (void)
+{
+  exec_vrnd_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnda_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnda_f16_1.c
new file mode 100644
index 0000000..86b7fb6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnda_f16_1.c
@@ -0,0 +1,74 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define RNDA_A FP16_C (123)
+#define B FP16_C (-567.5)
+#define RNDA_B FP16_C (-568)
+#define C FP16_C (-34.8)
+#define RNDA_C FP16_C (-35)
+#define D FP16_C (1024)
+#define RNDA_D FP16_C (1024)
+#define E FP16_C (663.1)
+#define RNDA_E FP16_C (663)
+#define F FP16_C (169.1)
+#define RNDA_F FP16_C (169)
+#define G FP16_C (-4.8)
+#define RNDA_G FP16_C (-5)
+#define H FP16_C (77.5)
+#define RNDA_H FP16_C (78)
+
+/* Expected results for vrnda.  */
+VECT_VAR_DECL (expected, float, 16, 4) [] = {RNDA_A, RNDA_B, RNDA_C, RNDA_D};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 4);
+
+VECT_VAR_DECL (expected, float, 16, 8) [] = {RNDA_A, RNDA_B, RNDA_C, RNDA_D,
+					     RNDA_E, RNDA_F, RNDA_G, RNDA_H};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 8);
+
+void
+exec_vrnda_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VRNDA (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 4);
+  VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc, buf_src, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4) =
+    vrnda_f16 (VECT_VAR (vsrc, float, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VRNDAQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 8);
+  VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A, B, C, D, E, F, G, H};
+  VLOAD (vsrc, buf_src, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8) =
+    vrndaq_f16 (VECT_VAR (vsrc, float, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_static, "");
+}
+
+int
+main (void)
+{
+  exec_vrnda_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndm_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndm_f16_1.c
new file mode 100644
index 0000000..904c265
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndm_f16_1.c
@@ -0,0 +1,74 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define RNDM_A FP16_C (123)
+#define B FP16_C (-567.5)
+#define RNDM_B FP16_C (-568)
+#define C FP16_C (-34.8)
+#define RNDM_C FP16_C (-35)
+#define D FP16_C (1024)
+#define RNDM_D FP16_C (1024)
+#define E FP16_C (663.1)
+#define RNDM_E FP16_C (663)
+#define F FP16_C (169.1)
+#define RNDM_F FP16_C (169)
+#define G FP16_C (-4.8)
+#define RNDM_G FP16_C (-5)
+#define H FP16_C (77.5)
+#define RNDM_H FP16_C (77)
+
+/* Expected results for vrndm.  */
+VECT_VAR_DECL (expected, float, 16, 4) [] = {RNDM_A, RNDM_B, RNDM_C, RNDM_D};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 4);
+
+VECT_VAR_DECL (expected, float, 16, 8) [] = {RNDM_A, RNDM_B, RNDM_C, RNDM_D,
+					     RNDM_E, RNDM_F, RNDM_G, RNDM_H};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 8);
+
+void
+exec_vrndm_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VRNDM (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 4);
+  VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc, buf_src, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4) =
+    vrndm_f16 (VECT_VAR (vsrc, float, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VRNDMQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 8);
+  VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A, B, C, D, E, F, G, H};
+  VLOAD (vsrc, buf_src, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8) =
+    vrndmq_f16 (VECT_VAR (vsrc, float, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_static, "");
+}
+
+int
+main (void)
+{
+  exec_vrndm_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndn_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndn_f16_1.c
new file mode 100644
index 0000000..c94fd54
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndn_f16_1.c
@@ -0,0 +1,74 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define RNDN_A FP16_C (123)
+#define B FP16_C (-567.5)
+#define RNDN_B FP16_C (-568)
+#define C FP16_C (-34.8)
+#define RNDN_C FP16_C (-35)
+#define D FP16_C (1024)
+#define RNDN_D FP16_C (1024)
+#define E FP16_C (663.1)
+#define RNDN_E FP16_C (663)
+#define F FP16_C (169.1)
+#define RNDN_F FP16_C (169)
+#define G FP16_C (-4.8)
+#define RNDN_G FP16_C (-5)
+#define H FP16_C (77)
+#define RNDN_H FP16_C (77)
+
+/* Expected results for vrndn.  */
+VECT_VAR_DECL (expected, float, 16, 4) [] = {RNDN_A, RNDN_B, RNDN_C, RNDN_D};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 4);
+
+VECT_VAR_DECL (expected, float, 16, 8) [] = {RNDN_A, RNDN_B, RNDN_C, RNDN_D,
+					     RNDN_E, RNDN_F, RNDN_G, RNDN_H};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 8);
+
+void
+exec_vrndn_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VRNDN (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 4);
+  VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc, buf_src, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4) =
+    vrndn_f16 (VECT_VAR (vsrc, float, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VRNDNQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 8);
+  VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A, B, C, D, E, F, G, H};
+  VLOAD (vsrc, buf_src, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8) =
+    vrndnq_f16 (VECT_VAR (vsrc, float, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_static, "");
+}
+
+int
+main (void)
+{
+  exec_vrndn_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndp_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndp_f16_1.c
new file mode 100644
index 0000000..6bcab83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndp_f16_1.c
@@ -0,0 +1,74 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define RNDP_A FP16_C (124)
+#define B FP16_C (-567.5)
+#define RNDP_B FP16_C (-567)
+#define C FP16_C (-34.8)
+#define RNDP_C FP16_C (-34)
+#define D FP16_C (1024)
+#define RNDP_D FP16_C (1024)
+#define E FP16_C (163.1)
+#define RNDP_E FP16_C (164)
+#define F FP16_C (169.1)
+#define RNDP_F FP16_C (170)
+#define G FP16_C (-4.8)
+#define RNDP_G FP16_C (-4)
+#define H FP16_C (77.5)
+#define RNDP_H FP16_C (78)
+
+/* Expected results for vrndp.  */
+VECT_VAR_DECL (expected, float, 16, 4) [] = {RNDP_A, RNDP_B, RNDP_C, RNDP_D};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 4);
+
+VECT_VAR_DECL (expected, float, 16, 8) [] = {RNDP_A, RNDP_B, RNDP_C, RNDP_D,
+					     RNDP_E, RNDP_F, RNDP_G, RNDP_H};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 8);
+
+void
+exec_vrndp_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VRNDP (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 4);
+  VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc, buf_src, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4) =
+    vrndp_f16 (VECT_VAR (vsrc, float, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VRNDPQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 8);
+  VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A, B, C, D, E, F, G, H};
+  VLOAD (vsrc, buf_src, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8) =
+    vrndpq_f16 (VECT_VAR (vsrc, float, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_static, "");
+}
+
+int
+main (void)
+{
+  exec_vrndp_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndx_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndx_f16_1.c
new file mode 100644
index 0000000..2413dc4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndx_f16_1.c
@@ -0,0 +1,74 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define RNDX_A FP16_C (123)
+#define B FP16_C (-567.8)
+#define RNDX_B FP16_C (-568)
+#define C FP16_C (-34.8)
+#define RNDX_C FP16_C (-35)
+#define D FP16_C (1024)
+#define RNDX_D FP16_C (1024)
+#define E FP16_C (663.1)
+#define RNDX_E FP16_C (663)
+#define F FP16_C (169.1)
+#define RNDX_F FP16_C (169)
+#define G FP16_C (-4.8)
+#define RNDX_G FP16_C (-5)
+#define H FP16_C (77)
+#define RNDX_H FP16_C (77)
+
+/* Expected results for vrndx.  */
+VECT_VAR_DECL (expected, float, 16, 4) [] = {RNDX_A, RNDX_B, RNDX_C, RNDX_D};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 4);
+
+VECT_VAR_DECL (expected, float, 16, 8) [] = {RNDX_A, RNDX_B, RNDX_C, RNDX_D,
+					     RNDX_E, RNDX_F, RNDX_G, RNDX_H};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 8);
+
+void
+exec_vrndx_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VRNDX (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 4);
+  VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A, B, C, D};
+  VLOAD (vsrc, buf_src, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4) =
+    vrndx_f16 (VECT_VAR (vsrc, float, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VRNDXQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc, float, 16, 8);
+  VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A, B, C, D, E, F, G, H};
+  VLOAD (vsrc, buf_src, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8) =
+    vrndxq_f16 (VECT_VAR (vsrc, float, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_static, "");
+}
+
+int
+main (void)
+{
+  exec_vrndx_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrsqrts_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrsqrts_f16_1.c
new file mode 100644
index 0000000..d8b7ee3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrsqrts_f16_1.c
@@ -0,0 +1,92 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (12.4)
+#define B FP16_C (-5.8)
+#define C FP16_C (-3.8)
+#define D FP16_C (10)
+#define E FP16_C (66.1)
+#define F FP16_C (16.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (-77)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (10.23)
+#define L FP16_C (98)
+#define M FP16_C (87)
+#define N FP16_C (-87.81)
+#define O FP16_C (-1.1)
+#define P FP16_C (47.8)
+
+/* Expected results for vrsqrts.  */
+VECT_VAR_DECL (expected, float, 16, 4) [] = {(3.0f + (-A) * E) / 2.0f,
+					     (3.0f + (-B) * F) / 2.0f,
+					     (3.0f + (-C) * G) / 2.0f,
+					     (3.0f + (-D) * H) / 2.0f};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 4);
+
+VECT_VAR_DECL (expected, float, 16, 8) [] = {(3.0f + (-A) * E) / 2.0f,
+					     (3.0f + (-B) * F) / 2.0f,
+					     (3.0f + (-C) * G) / 2.0f,
+					     (3.0f + (-D) * H) / 2.0f,
+					     (3.0f + (-I) * M) / 2.0f,
+					     (3.0f + (-J) * N) / 2.0f,
+					     (3.0f + (-K) * O) / 2.0f,
+					     (3.0f + (-L) * P) / 2.0f};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 8);
+
+void
+exec_vrsqrts_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VRSQRTS (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 4);
+  DECL_VARIABLE (vsrc_2, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4) =
+    vrsqrts_f16 (VECT_VAR (vsrc_1, float, 16, 4),
+		 VECT_VAR (vsrc_2, float, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VRSQRTSQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 8);
+  DECL_VARIABLE (vsrc_2, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
+  VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8) =
+    vrsqrtsq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+		  VECT_VAR (vsrc_2, float, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP_BIAS (TEST_MSG, float, 16, 8, PRIx16, expected_static, "", 1);
+}
+
+int
+main (void)
+{
+  exec_vrsqrts_f16 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsub_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsub_f16_1.c
new file mode 100644
index 0000000..d54b011
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsub_f16_1.c
@@ -0,0 +1,82 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FP16_C(a) ((__fp16) a)
+#define A FP16_C (123.4)
+#define B FP16_C (-567.8)
+#define C FP16_C (-34.8)
+#define D FP16_C (1024)
+#define E FP16_C (663.1)
+#define F FP16_C (169.1)
+#define G FP16_C (-4.8)
+#define H FP16_C (77)
+
+#define I FP16_C (0.7)
+#define J FP16_C (-78)
+#define K FP16_C (101.23)
+#define L FP16_C (98)
+#define M FP16_C (870.1)
+#define N FP16_C (-8781)
+#define O FP16_C (-1.1)
+#define P FP16_C (47823)
+
+/* Expected results for vsub.  */
+VECT_VAR_DECL (expected, float, 16, 4) [] = {A - E, B - F, C - G, D - H};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 4) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 4);
+
+VECT_VAR_DECL (expected, float, 16, 8) [] = {A - E, B - F, C - G, D - H,
+					     I - M, J - N, K - O, L - P};
+hfloat16_t * VECT_VAR (expected_static, hfloat, 16, 8) =
+  (hfloat16_t *) VECT_VAR (expected, float, 16, 8);
+
+void
+exec_vsub_f16 (void)
+{
+#undef TEST_MSG
+#define TEST_MSG "VSUB (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 4);
+  DECL_VARIABLE (vsrc_2, float, 16, 4);
+  VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
+  VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
+  VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
+  VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+  DECL_VARIABLE (vector_res, float, 16, 4) =
+    vsub_f16 (VECT_VAR (vsrc_1, float, 16, 4), VECT_VAR (vsrc_2, float, 16, 4));
+  vst1_f16 (VECT_VAR (result, float, 16, 4),
+	    VECT_VAR (vector_res, float, 16, 4));
+
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VSUBQ (FP16)"
+  clean_results ();
+
+  DECL_VARIABLE (vsrc_1, float, 16, 8);
+  DECL_VARIABLE (vsrc_2, float, 16, 8);
+  VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
+  VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
+  VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
+  VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+  DECL_VARIABLE (vector_res, float, 16, 8) =
+    vsubq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
+	       VECT_VAR (vsrc_2, float, 16, 8));
+  vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	     VECT_VAR (vector_res, float, 16, 8));
+
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_static, "");
+}
+
+int
+main (void)
+{
+  exec_vsub_f16 ();
+  return 0;
+}
-- 
2.1.4


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.
  2016-05-17 14:36 ` [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions Matthew Wahab
@ 2016-05-18  0:52   ` Joseph Myers
  2016-05-18  0:57     ` Joseph Myers
  2016-05-18 13:40     ` Matthew Wahab
  2016-07-04 14:02   ` Matthew Wahab
  1 sibling, 2 replies; 73+ messages in thread
From: Joseph Myers @ 2016-05-18  0:52 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Tue, 17 May 2016, Matthew Wahab wrote:

> In most cases the instructions are added using non-standard pattern
> names. This is to force operations on __fp16 values to be done, by
> conversion, using the single-precision instructions. The exceptions are
> the precision preserving operations ABS and NEG.

But why do you need to force that?  If the instructions follow IEEE 
semantics including for exceptions and rounding modes, then X OP Y 
computed directly with binary16 arithmetic has the same value as results 
from promoting to binary32, doing binary32 arithmetic and converting back 
to binary16, for OP in + - * /.  (Double-rounding problems can only occur 
in round-to-nearest and if the binary32 result is exactly half way between 
two representable binary16 values but the exact result is not exactly half 
way between.  It's obvious that this can't occur for + - * and only a bit 
harder to see this for /.  According to the logic used in 
convert.c:convert_to_real_1, double rounding can't occur in this case for 
square root either, though I haven't verified that.)

So I'd expect e.g.

__fp16 a, b;
__fp16 c = a / b;

to generate the new instructions, because direct binary16 arithmetic is a 
correct implementation of (__fp16) ((float) a / (float) b).

(ISO C, even with DTS 18661-5, does not concern itself with the number of 
times an expression raises a given exception beyond whether that is zero 
or nonzero, so changes between two and one instances of "inexact" are not 
a concern.)
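
For the record, this is cheap to check by brute force.  A throwaway
sketch (assuming a GCC target where __fp16 uses the IEEE format, e.g.
-mfp16-format=ieee on AArch32; the sum of two binary16 values is exact
in binary64, so narrowing the binary64 sum serves as the
single-rounding reference):

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int
main (void)
{
  for (uint32_t i = 0; i < 0x10000; i++)
    for (uint32_t j = 0; j < 0x10000; j++)
      {
	uint16_t bi = i, bj = j;
	__fp16 a, b;
	memcpy (&a, &bi, sizeof a);
	memcpy (&b, &bj, sizeof b);
	if (isnan ((float) a) || isnan ((float) b))
	  continue;
	/* Promote to binary32, add, narrow: the promoted sequence.  */
	__fp16 via_float = (__fp16) ((float) a + (float) b);
	/* Single rounding to binary16, via an exact binary64 sum.  */
	__fp16 once = (__fp16) ((double) a + (double) b);
	if (via_float != once
	    && !(isnan ((float) via_float) && isnan ((float) once)))
	  printf ("mismatch: %a + %a\n", (double) a, (double) b);
      }
  return 0;
}

Nothing should be printed, matching the argument above; analogous loops
for - and * (whose exact results also fit in binary64) behave the same
way.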

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.
  2016-05-18  0:52   ` Joseph Myers
@ 2016-05-18  0:57     ` Joseph Myers
  2016-05-18 13:40     ` Matthew Wahab
  1 sibling, 0 replies; 73+ messages in thread
From: Joseph Myers @ 2016-05-18  0:57 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Wed, 18 May 2016, Joseph Myers wrote:

> But why do you need to force that?  If the instructions follow IEEE 
> semantics including for exceptions and rounding modes, then X OP Y 
> computed directly with binary16 arithmetic has the same value as results 
> from promoting to binary32, doing binary32 arithmetic and converting back 
> to binary16, for OP in + - * /.  (Double-rounding problems can only occur 

I should say: this is not the case for fma - (__fp16) fmaf (a, b, c) need 
not be the same as fmaf16 (a, b, c) for fp16 values a, b, c - but I think 
you should use the standard instruction name there as well - if the 
instruction is a fused multiply-add on binary16, it should be described as 
such.
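
To make that concrete, a rough search sketch (same __fp16 assumptions
as before; narrowing a binary64 fma is taken as the single-rounding
binary16 reference, binary64 being wide enough here where binary32 is
not):

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int
main (void)
{
  srand48 (1);
  for (long n = 0; n < 100000000; n++)
    {
      __fp16 a = (__fp16) (drand48 () * 200 - 100);
      __fp16 b = (__fp16) (drand48 () * 200 - 100);
      __fp16 c = (__fp16) (drand48 () * 200 - 100);
      /* Promote to binary32, fuse, narrow back.  */
      __fp16 via_float = (__fp16) fmaf ((float) a, (float) b, (float) c);
      /* Fuse exactly, round via binary64.  */
      __fp16 fused = (__fp16) fma ((double) a, (double) b, (double) c);
      if (via_float != fused)
	printf ("a=%a b=%a c=%a: %a != %a\n", (double) a, (double) b,
		(double) c, (double) via_float, (double) fused);
    }
  return 0;
}

Each line printed is a case where going through binary32 changes the
fused result.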

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 9/17][ARM] Add NEON FP16 arithmetic instructions.
  2016-05-17 14:37 ` [PATCH 9/17][ARM] Add NEON " Matthew Wahab
@ 2016-05-18  0:58   ` Joseph Myers
  2016-05-19 17:01     ` Jiong Wang
  2016-07-04 14:09     ` Matthew Wahab
  0 siblings, 2 replies; 73+ messages in thread
From: Joseph Myers @ 2016-05-18  0:58 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Tue, 17 May 2016, Matthew Wahab wrote:

> As with the VFP FP16 arithmetic instructions, operations on __fp16
> values are done by conversion to single-precision. Any new optimization
> supported by the instruction descriptions can only apply to code
> generated using intrinsics added in this patch series.

As with the scalar instructions, I think it is legitimate in most cases to 
optimize arithmetic via single precision to work directly on __fp16 values 
(and this would be natural for vectorization of __fp16 arithmetic).

> A number of the instructions are modelled as two variants, one using
> UNSPEC and the other using RTL operations, with the model used decided
> by the funsafe-math-optimizations flag. This follows the
> single-precision instructions and is due to the half-precision
> operations having the same conditions and restrictions on their use in
> optimizations (when they are enabled).

(Of course, these restrictions still apply.)

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 16/17][ARM] Add tests for VFP FP16 ACLE intrinsics.
  2016-05-17 14:51 ` [PATCH 16/17][ARM] Add tests for VFP FP16 ACLE intrinsics Matthew Wahab
@ 2016-05-18  1:07   ` Joseph Myers
  2016-05-18 10:58     ` Matthew Wahab
  0 siblings, 1 reply; 73+ messages in thread
From: Joseph Myers @ 2016-05-18  1:07 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Tue, 17 May 2016, Matthew Wahab wrote:

> In some tests, there are unavoidable differences in precision when
> calculating the actual and the expected results of an FP16 operation. A
> new support function CHECK_FP_BIAS is used so that these tests can check
> for an acceptable margin of error. In these tests, the tolerance is
> given as the absolute integer difference between the bitvectors of the
> expected and the actual results.

As far as I can see, CHECK_FP_BIAS is only used in the following patch, 
but there is another bias test in vsqrth_f16_1.c in this patch.

Could you clarify where the "unavoidable differences in precision" come 
from?  Are the results of some of the new instructions not fully 
specified, only specified within a given precision?  (As far as I can tell 
the existing v8 instructions for reciprocal and reciprocal square root 
estimates do have fully defined results, despite being loosely described 
as estimates.)

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 16/17][ARM] Add tests for VFP FP16 ACLE intrinsics.
  2016-05-18  1:07   ` Joseph Myers
@ 2016-05-18 10:58     ` Matthew Wahab
  2016-07-04 14:18       ` Matthew Wahab
  0 siblings, 1 reply; 73+ messages in thread
From: Matthew Wahab @ 2016-05-18 10:58 UTC (permalink / raw)
  To: Joseph Myers; +Cc: gcc-patches, jiong Wang

On 18/05/16 02:06, Joseph Myers wrote:
> On Tue, 17 May 2016, Matthew Wahab wrote:
>
>> In some tests, there are unavoidable differences in precision when calculating
>> the actual and the expected results of an FP16 operation. A new support function
>> CHECK_FP_BIAS is used so that these tests can check for an acceptable margin of
>> error. In these tests, the tolerance is given as the absolute integer difference
>> between the bitvectors of the expected and the actual results.
>
> As far as I can see, CHECK_FP_BIAS is only used in the following patch, but there
>  is another bias test in vsqrth_f16_1.c in this patch.

This is my mistake: CHECK_FP_BIAS is used for the NEON tests and should have gone
into that patch. The VFP test can do a simpler check, so it doesn't need the macro.

> Could you clarify where the "unavoidable differences in precision" come from? Are
> the results of some of the new instructions not fully specified, only specified
> within a given precision?  (As far as I can tell the existing v8 instructions for
> reciprocal and reciprocal square root estimates do have fully defined results,
> despite being loosely described as estimates.)

The expected results in the new tests are represented as expressions whose value is
expected to be calculated at compile time. This makes the tests more readable, but 
differences in precision between the compiler and the HW calculations mean 
that for vrecpe_f16, vrecps_f16, vrsqrts_f16 and vsqrth_f16_1.c the expected and 
actual results are different.

On reflection, it may be better to remove the CHECK_FP_BIAS macro and, for the tests 
that needed it, to drop the compile-time calculation and just use the expected 
hexadecimal value.

Other tests depending on compile-time calculations involve relatively simple 
arithmetic operations and it's not clear whether they are susceptible to the same 
rounding errors. I have limited knowledge of FP arithmetic, though, so I'll look into this.

Matthew

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.
  2016-05-18  0:52   ` Joseph Myers
  2016-05-18  0:57     ` Joseph Myers
@ 2016-05-18 13:40     ` Matthew Wahab
  2016-05-18 15:21       ` Joseph Myers
  1 sibling, 1 reply; 73+ messages in thread
From: Matthew Wahab @ 2016-05-18 13:40 UTC (permalink / raw)
  To: Joseph Myers; +Cc: gcc-patches

On 18/05/16 01:51, Joseph Myers wrote:
> On Tue, 17 May 2016, Matthew Wahab wrote:
>
>> In most cases the instructions are added using non-standard pattern
>> names. This is to force operations on __fp16 values to be done, by
>> conversion, using the single-precision instructions. The exceptions are
>> the precision preserving operations ABS and NEG.
>
> But why do you need to force that?  If the instructions follow IEEE
> semantics including for exceptions and rounding modes, then X OP Y
> computed directly with binary16 arithmetic has the same value as results
> from promoting to binary32, doing binary32 arithmetic and converting back
> to binary16, for OP in + - * /.  (Double-rounding problems can only occur
> in round-to-nearest and if the binary32 result is exactly half way between
> two representable binary16 values but the exact result is not exactly half
> way between.  It's obvious that this can't occur for + - * and only a bit
> harder to see this for /.  According to the logic used in
> convert.c:convert_to_real_1, double rounding can't occur in this case for
> square root either, though I haven't verified that.)

AArch64 follows IEEE-754 but ARM (AArch32) adds restrictions like flush-to-zero that 
could affect the outcome of a calculation.

> So I'd expect e.g.
>
> __fp16 a, b;
> __fp16 c = a / b;
>
> to generate the new instructions, because direct binary16 arithmetic is a
> correct implementation of (__fp16) ((float) a / (float) b).

Something like

__fp16 a, b, c;
__fp16 d = (a / b) * c;

would be done as the sequence of single precision operations:

vcvtb.f32.f16 s0, s0
vcvtb.f32.f16 s1, s1
vcvtb.f32.f16 s2, s2
vdiv.f32 s15, s0, s1
vmul.f32 s0, s15, s2
vcvtb.f16.f32 s0, s0

Doing this with vdiv.f16 and vmul.f16 could change the calculated result because the 
flush-to-zero rule depends on the precision of the operation, so it affects the result 
of a vdiv.f16 differently from that of a vdiv.f32.

(At least, that's my understanding.)

Matthew

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.
  2016-05-18 13:40     ` Matthew Wahab
@ 2016-05-18 15:21       ` Joseph Myers
  2016-05-19 14:54         ` Matthew Wahab
  0 siblings, 1 reply; 73+ messages in thread
From: Joseph Myers @ 2016-05-18 15:21 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Wed, 18 May 2016, Matthew Wahab wrote:

> AArch64 follows IEEE-754 but ARM (AArch32) adds restrictions like
> flush-to-zero that could affect the outcome of a calculation.

The result of a float computation on two values immediately promoted from 
fp16 cannot be within the subnormal range for float.  Thus, only one flush 
to zero can happen, on the final conversion back to fp16, and that cannot 
make the result different from doing direct arithmetic in fp16 (assuming 
flush to zero affects conversion from float to fp16 the same way it 
affects direct fp16 arithmetic).
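
(Concretely: nonzero binary16 magnitudes lie between 2^-24 and 65504,
so the smallest nonzero magnitude the promoted float operation can
produce is 2^-24 for + and -, 2^-24 * 2^-24 = 2^-48 for *, and more
than 2^-24 / 2^16 = 2^-40 for /, all far above the float subnormal
boundary of 2^-126.)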

> > So I'd expect e.g.
> > 
> > __fp16 a, b;
> > __fp16 c = a / b;
> > 
> > to generate the new instructions, because direct binary16 arithmetic is a
> > correct implementation of (__fp16) ((float) a / (float) b).
> 
> Something like
> 
> __fp16 a, b, c;
> __fp16 d = (a / b) * c;
> 
> would be done as the sequence of single precision operations:
> 
> vcvtb.f32.f16 s0, s0
> vcvtb.f32.f16 s1, s1
> vcvtb.f32.f16 s2, s2
> vdiv.f32 s15, s0, s1
> vmul.f32 s0, s15, s2
> vcvtb.f16.f32 s0, s0
> 
> Doing this with vdiv.f16 and vmul.f16 could change the calculated result
> because the flush-to-zero rule depends on the precision of the operation, so
> it affects the result of a vdiv.f16 differently from that of a vdiv.f32.

Flush to zero is irrelevant here, since that sequence of three operations 
also cannot produce anything in the subnormal range for float.  (It's true 
that double rounding is relevant for your example and so converting it to 
direct fp16 arithmetic would not be safe for that reason.)

That example is also not relevant to my point.  In my example

> > __fp16 a, b;
> > __fp16 c = a / b;

it's already the case that GCC will (a) promote to float, because the 
target hooks say to do so, (b) notice that the result is immediately 
converted back to fp16, and that this means fp16 arithmetic could be used 
directly, and so adjust it back to fp16 arithmetic (see convert_to_real_1, 
and the call therein to real_can_shorten_arithmetic which knows conditions 
under which it's safe to change such promoted arithmetic back to 
arithmetic on a narrower type).  Then the expanders (I think) notice the 
lack of direct HFmode arithmetic and so put the widening / narrowing back 
again.

But in your example, *because* doing it with direct fp16 arithmetic would 
not be equivalent, convert_to_real_1 would not eliminate the conversions 
to float, the float operations would still be present at expansion time, 
and so direct HFmode arithmetic patterns would not match.

In short: instructions for direct HFmode arithmetic should be described 
with patterns with the standard names.  It's the job of the 
architecture-independent compiler to ensure that fp16 arithmetic in the 
user's source code only generates direct fp16 arithmetic in GIMPLE (and 
thus ends up using those patterns) if that is a correct representation of 
the source code's semantics according to ACLE.

The intrinsics you provide can then be written to use direct arithmetic, 
and rely on convert_to_real_1 eliminating the promotions, rather than 
needing built-in functions at all, just like many arm_neon.h intrinsics 
make direct use of GNU C vector arithmetic.
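
For instance, an addition intrinsic could then be just (a sketch, not
the actual arm_neon.h text, and assuming float16x4_t is usable as a
GNU C vector type with arithmetic enabled):

__extension__ static __inline float16x4_t
__attribute__ ((__always_inline__))
vadd_f16 (float16x4_t __a, float16x4_t __b)
{
  /* Plain GNU C vector addition; no built-in function needed.  */
  return __a + __b;
}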

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.
  2016-05-18 15:21       ` Joseph Myers
@ 2016-05-19 14:54         ` Matthew Wahab
  0 siblings, 0 replies; 73+ messages in thread
From: Matthew Wahab @ 2016-05-19 14:54 UTC (permalink / raw)
  To: Joseph Myers; +Cc: gcc-patches

On 18/05/16 16:20, Joseph Myers wrote:
> On Wed, 18 May 2016, Matthew Wahab wrote:
>
>> AArch64 follows IEEE-754 but ARM (AArch32) adds restrictions like
>> flush-to-zero that could affect the outcome of a calculation.
>
> The result of a float computation on two values immediately promoted from
> fp16 cannot be within the subnormal range for float.  Thus, only one flush
> to zero can happen, on the final conversion back to fp16, and that cannot
> make the result different from doing direct arithmetic in fp16 (assuming
> flush to zero affects conversion from float to fp16 the same way it
> affects direct fp16 arithmetic).
>
[..]
>
> In short: instructions for direct HFmode arithmetic should be described
> with patterns with the standard names.  It's the job of the
> architecture-independent compiler to ensure that fp16 arithmetic in the
> user's source code only generates direct fp16 arithmetic in GIMPLE (and
> thus ends up using those patterns) if that is a correct representation of
> the source code's semantics according to ACLE.
>
> The intrinsics you provide can then be written to use direct arithmetic,
> and rely on convert_to_real_1 eliminating the promotions, rather than
> needing built-in functions at all, just like many arm_neon.h intrinsics
> make direct use of GNU C vector arithmetic.

I think it's clear that this has exhausted my knowledge of FP semantics.

Forcing promotion to single-precision was to settle concerns brought up in internal 
discussions about __fp16 semantics. I'll see if anybody has any problem with the 
changes you suggest.

Thanks,
Matthew

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 9/17][ARM] Add NEON FP16 arithmetic instructions.
  2016-05-18  0:58   ` Joseph Myers
@ 2016-05-19 17:01     ` Jiong Wang
  2016-05-19 17:29       ` Joseph Myers
  2016-07-04 14:09     ` Matthew Wahab
  1 sibling, 1 reply; 73+ messages in thread
From: Jiong Wang @ 2016-05-19 17:01 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Matthew Wahab, gcc-patches



On 18/05/16 01:58, Joseph Myers wrote:
> On Tue, 17 May 2016, Matthew Wahab wrote:
>
>> As with the VFP FP16 arithmetic instructions, operations on __fp16
>> values are done by conversion to single-precision. Any new optimization
>> supported by the instruction descriptions can only apply to code
>> generated using intrinsics added in this patch series.
> As with the scalar instructions, I think it is legitimate in most cases to
> optimize arithmetic via single precision to work direct on __fp16 values
> (and this would be natural for vectorization of __fp16 arithmetic).

Hi Joseph,

   Currently, for vector types like v4hf, there is no type promotion:
the operations survive until the vector lowering pass, where they are
split into HFmode operations; these are then widened into SFmode
operations during RTL expansion, as we don't have scalar HFmode
support in the standard patterns.

Then,

   * if we add scalar HF mode to the standard patterns, vector HF mode
     operations will be turned into scalar HF operations instead of
     scalar SF operations;

   * if we add vector HF mode to the standard patterns, vector HF mode
     operations will generate vector HF instructions directly.

   Will this still cause precision inconsistencies with old GCC when
   there are cascaded vector float operations?

   Thanks

Regards,
Jiong

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 9/17][ARM] Add NEON FP16 arithmetic instructions.
  2016-05-19 17:01     ` Jiong Wang
@ 2016-05-19 17:29       ` Joseph Myers
  2016-06-08  8:46         ` James Greenhalgh
  0 siblings, 1 reply; 73+ messages in thread
From: Joseph Myers @ 2016-05-19 17:29 UTC (permalink / raw)
  To: Jiong Wang; +Cc: Matthew Wahab, gcc-patches

On Thu, 19 May 2016, Jiong Wang wrote:

> Then,
> 
>   * if we add scalar HF mode to the standard patterns, vector HF mode
>     operations will be turned into scalar HF operations instead of
>     scalar SF operations;
> 
>   * if we add vector HF mode to the standard patterns, vector HF mode
>     operations will generate vector HF instructions directly.
> 
>   Will this still cause precision inconsistencies with old GCC when
>   there are cascaded vector float operations?

I'm not sure inconsistency with old GCC is what's relevant here.

Standard-named RTL patterns have particular semantics.  Those semantics do 
not depend on the target architecture (except where there are target 
macros / hooks to define such dependence).  If you have an instruction 
that matches those target-independent semantics, it should be available 
for the standard-named pattern.  I believe that is the case here, for both 
the scalar and the vector instructions - they have the standard semantics, 
so should be available for the standard patterns.

It is the responsibility of the target-independent parts of the compiler 
to ensure that the RTL generated matches the source code semantics, so 
that providing a standard pattern for an instruction that matches the 
pattern's semantics does not cause any problems regarding source code 
semantics.

That said: if the expander in old GCC is converting a vector HF operation 
into scalar SF operations, I'd expect it also to include a conversion from 
SFmode back to HFmode after those operations, since it will be producing a 
vector HF result.  And that would apply for each individual operation 
expanded.  So I would not expect inconsistency to arise from making direct 
HFmode operations available (given that the semantics of scalar + - * / 
are the same whether you do them directly on HFmode or promote to SFmode, 
do the operation there and then convert the result back to HFmode before 
doing any further operations on it).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 9/17][ARM] Add NEON FP16 arithmetic instructions.
  2016-05-19 17:29       ` Joseph Myers
@ 2016-06-08  8:46         ` James Greenhalgh
  2016-06-08 20:02           ` Joseph Myers
  0 siblings, 1 reply; 73+ messages in thread
From: James Greenhalgh @ 2016-06-08  8:46 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Jiong Wang, Matthew Wahab, gcc-patches, nd, Szabolcs.Nagy,
	ramana.radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 6025 bytes --]

On Thu, May 19, 2016 at 05:29:16PM +0000, Joseph Myers wrote:
> On Thu, 19 May 2016, Jiong Wang wrote:
> 
> > Then,
> > 
> >   * if we add scalar HF mode to the standard patterns, vector HF mode
> >     operations will be turned into scalar HF operations instead of
> >     scalar SF operations;
> > 
> >   * if we add vector HF mode to the standard patterns, vector HF mode
> >     operations will generate vector HF instructions directly.
> > 
> >   Will this still cause precision inconsistencies with old GCC when
> >   there are cascaded vector float operations?
> 
> I'm not sure inconsistency with old GCC is what's relevant here.
> 
> Standard-named RTL patterns have particular semantics.  Those semantics do 
> not depend on the target architecture (except where there are target 
> macros / hooks to define such dependence).  If you have an instruction 
> that matches those target-independent semantics, it should be available 
> for the standard-named pattern.  I believe that is the case here, for both 
> the scalar and the vector instructions - they have the standard semantics, 
> so should be available for the standard patterns.
> 
> It is the responsibility of the target-independent parts of the compiler 
> to ensure that the RTL generated matches the source code semantics, so 
> that providing a standard pattern for an instruction that matches the 
> pattern's semantics does not cause any problems regarding source code 
> semantics.
> 
> That said: if the expander in old GCC is converting a vector HF operation 
> into scalar SF operations, I'd expect it also to include a conversion from 
> SFmode back to HFmode after those operations, since it will be producing a 
> vector HF result.  And that would apply for each individual operation 
> expanded.  So I would not expect inconsistency to arise from making direct 
> HFmode operations available (given that the semantics of scalar + - * / 
> are the same whether you do them directly on HFmode or promote to SFmode, 
> do the operation there and then convert the result back to HFmode before 
> doing any further operations on it).

I think the confusion here is that these two functions:

  float16x8_t
  __attribute__ ((noinline)) 
  foo (float16x8_t a, float16x8_t b, float16x8_t c)
  {
    return a * b / c;
  }

  float16_t
  __attribute__ ((noinline)) 
  bar (float16_t a, float16_t b, float16_t c)
  {
    return a * b / c;
  }

have different behaviours in terms of when they extend and truncate between
floating-point precisions.

A full testcase calling these functions is attached.

Compile with

  `gcc -O3`
     for AArch64 ARMv8-A
  `gcc -O3 -mfloat-abi=hard -mfpu=neon-fp16 -mfp16-format=ieee -march=armv7-a`
     for ARMv7-A 

This prints:

  Fail:
	Scalar Input	256.000000
	Scalar Output	256.000000
	Vector input	256.000000
	Vector output	inf
  Fail:
	Scalar Input	3.300781
	Scalar Output	3.300781
	Vector input	3.300781
	Vector output	3.302734
  Fail:
	Scalar Input	10000.000000
	Scalar Output	10000.000000
	Vector input	10000.000000
	Vector output	inf
  Fail:
	Scalar Input	0.000003
	Scalar Output	0.000003
	Vector input	0.000003
	Vector output	0.000000
  Fail:
	Scalar Input	0.000400
	Scalar Output	0.000400
	Vector input	0.000400
	Vector output	0.000447

foo, operating on vectors, remains in 16-bit precision throughout gimple,
will scalarise during veclower, and will add float_extend and float_truncate
around each operation during expand to preserve the 16-bit rounding
behaviour. For this testcase, that means two truncates per vector element.
One after the multiply, one after the divide.

bar, operating on scalars, adds promotions early due to TARGET_PROMOTED_TYPE.
In gimple we stay in 32-bit precision for the two operations, and we
truncate only after both operations. That means one truncate, taking place
after the divide.
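
As a sketch, the effective per-element computations are (the helper
names here are hypothetical, for illustration only):

  /* foo: vector path, one truncation back to __fp16 per operation.  */
  __fp16
  foo_elem (__fp16 a, __fp16 b, __fp16 c)
  {
    __fp16 t = (__fp16) ((float) a * (float) b); /* truncate after multiply */
    return (__fp16) ((float) t / (float) c);     /* truncate after divide */
  }

  /* bar: scalar path, promoted early, truncated only after both
     operations.  */
  __fp16
  bar_equiv (__fp16 a, __fp16 b, __fp16 c)
  {
    return (__fp16) (((float) a * (float) b) / (float) c);
  }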

However, I find this surprising at a language level, though I see
that Clang 3.8 has the same behaviour.  ACLE doesn't mention the GCC
vector extensions, so doesn't specify the behaviour of the arithmetic
operators on vector-of-float16_t types. GCC's vector extension documentation
gives this definition for arithmetic operations:

  The types defined in this manner can be used with a subset of normal
  C operations. Currently, GCC allows using the following operators on
  these types: +, -, *, /, unary minus, ^, |, &, ~, %.

  The operations behave like C++ valarrays. Addition is defined as
  the addition of the corresponding elements of the operands. For
  example, in the code below, each of the 4 elements in a is added to
  the corresponding 4 elements in b and the resulting vector is stored
  in c.

  Subtraction, multiplication, division, and the logical operations
  operate in a similar manner. Likewise, the result of using the unary
  minus or complement operators on a vector type is a vector whose
  elements are the negative or complemented values of the corresponding
  elements in the operand. 
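
The example the documentation refers to is, if I'm quoting the manual
correctly:

  typedef int v4si __attribute__ ((vector_size (16)));
  v4si a, b, c;
  c = a + b;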

Without digging in to the compiler code, I would have expected the vector
implementation to give equivalent results to the scalar one.

My question is whether you consider the different behaviour between scalar
float16_t and vector-of-float16_t types to be a bug? I can think of some
ways to fix the vector behaviour if it is buggy, but they would of course
be a change in behaviour from current releases (and from clang 3.8).

Clearly, this makes no difference to your comment that we should implement
these using standard pattern names. Either this is a bug, in which case
the front-end will arrange for the promotion to vector-of-float32_t
types, and implementing the vector standard pattern names would potentially
allow for some optimisation back to vector-of-float16_t type, or this
is not a bug, in which case the vector-of-float16_t standard pattern names
match the expected semantics perfectly.

Thanks,
James


[-- Attachment #2: fp16-scalar-vector.c --]
[-- Type: text/x-csrc, Size: 805 bytes --]

#include "arm_neon.h"
#include "stdio.h"

float16x8_t
__attribute__ ((noinline))
foo (float16x8_t a, float16x8_t b, float16x8_t c)
{
  return a * b / c;
}

float16_t
__attribute__ ((noinline))
bar (float16_t a, float16_t b, float16_t c)
{
  return a * b / c;
}

#define VALS { 1.0f, 256.0f, 1.1f, 2.2f, \
		     3.3f, 10000.0f, 0.000003f, 0.0004f }

int
main (int argc, char **argv)
{
  float16_t x[8] = VALS;
  float16_t y[8];
  float16x8_t vx = VALS;

  for (int i = 0; i < 8; i++)
    y[i] = bar (x[i], x[i], x[i]);

  float16x8_t vy = foo (vx, vx, vx);

  for (int i = 0; i < 8; i++)
    if (y[i] != vy[i])
      printf ("Fail:\n\tScalar Input\t%f\n\tScalar Output\t%f\n\t"
	      "Vector input\t%f\n\tVector output\t%f\n",
	      x[i], y[i], vx[i], vy[i]);
  
}

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 9/17][ARM] Add NEON FP16 arithmetic instructions.
  2016-06-08  8:46         ` James Greenhalgh
@ 2016-06-08 20:02           ` Joseph Myers
  0 siblings, 0 replies; 73+ messages in thread
From: Joseph Myers @ 2016-06-08 20:02 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: Jiong Wang, Matthew Wahab, gcc-patches, nd, Szabolcs.Nagy,
	ramana.radhakrishnan

On Wed, 8 Jun 2016, James Greenhalgh wrote:

> My question is whether you consider the different behaviour between scalar
> float16_t and vector-of-float16_t types to be a bug? I can think of some

No, because it matches how things work for vectors of integer types.  
E.g.:

typedef unsigned char vuc __attribute__((vector_size(8)));

vuc a = { 128, 128, 128, 128, 128, 128, 128, 128 }, b;

int
main (void)
{
  b = a / (a + a);
  return 0;
}

(Does a divide-by-zero, because (a + a) is evaluated without promotion to 
vector of int.)

It's a general rule for vector operations that there are no promotions 
that change the bit-size of the vectors, so arithmetic is done directly on 
unsigned char in this case, even though it normally would not be.  
Conversions when the types match apart from signedness are, as the comment 
in c_common_type notes, not fully defined.

  /* If one type is a vector type, return that type.  (How the usual
     arithmetic conversions apply to the vector types extension is not
     precisely specified.)  */
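
For contrast, the scalar case does promote, so the equivalent scalar
code is well defined (a sketch):

  int
  scalar_case (void)
  {
    unsigned char c = 128;
    return c / (c + c);	/* c + c is evaluated as int: 128 / 256 == 0.  */
  }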

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 1/17][ARM] Add ARMv8.2-A command line option and profile.
  2016-05-17 14:23 ` [PATCH 1/17][ARM] Add ARMv8.2-A command line option and profile Matthew Wahab
@ 2016-07-04 13:46   ` Matthew Wahab
  2016-09-21 13:57     ` Ramana Radhakrishnan
  0 siblings, 1 reply; 73+ messages in thread
From: Matthew Wahab @ 2016-07-04 13:46 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2037 bytes --]

On 17/05/16 15:22, Matthew Wahab wrote:
 > This patch adds the command options for the architecture ARMv8.2-A and
 > the half-precision extension. The architecture is selected by
 > -march=armv8.2-a and has all the properties of -march=armv8.1-a.
 >
 > This patch also enables the CRC extension (+crc) which is required
 > for both ARMv8.2-A and ARMv8.1-A architectures but is not currently
 > enabled by default for -march=armv8.1-a.
 >
 > The half-precision extension is selected using the extension +fp16. This
 > enables the VFP FP16 instructions if an ARMv8 VFP unit is also
 > specified, e.g. by -mfpu=fp-armv8. It also enables the FP16 NEON
 > instructions if an ARMv8 NEON unit is specified, e.g. by
 > -mfpu=neon-fp-armv8. Note that if the NEON FP16 instructions are enabled
 > then so are the VFP FP16 instructions.
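
(For reference: with this patch, a command line along the lines of

   gcc -march=armv8.2-a+fp16 -mfpu=fp-armv8 -mfloat-abi=softfp

enables the scalar FP16 instructions; the FPU and float-abi settings
here are only examples and depend on the target configuration.)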

This is a minor respin that moves the setting of arm_fp16_inst in
arm_option_override to immediately before it is used to set the required
arm_fp16_format.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/arm-arches.def ("armv8.1-a"): Add FL_CRC32.
	("armv8.2-a"): New.
	("armv8.2-a+fp16"): New.
	* config/arm/arm-protos.h (FL2_ARCH8_2): New.
	(FL2_FP16INST): New.
	(FL2_FOR_ARCH8_2A): New.
	* config/arm/arm-tables.opt: Regenerate.
	* config/arm/arm.c (arm_arch8_2): New.
	(arm_fp16_inst): New.
	(arm_option_override): Set arm_arch8_2 and arm_fp16_inst.  Check
	for incompatible fp16-format settings.
	* config/arm/arm.h (TARGET_VFP_FP16INST): New.
	(TARGET_NEON_FP16INST): New.
	(arm_arch8_2): Declare.
	(arm_fp16_inst): Declare.
	* config/arm/bpabi.h (BE8_LINK_SPEC): Add entries for
	march=armv8.2-a and march=armv8.2-a+fp16.
	* config/arm/t-aprofile (Arch Matches): Add entries for armv8.2-a
	and armv8.2-a+fp16.
	* doc/invoke.texi (ARM Options): Add "-march=armv8.1-a",
	"-march=armv8.2-a" and "-march=armv8.2-a+fp16".


[-- Attachment #2: 0001-PATCH-1-17-ARM-Add-ARMv8.2-A-command-line-option-and.patch --]
[-- Type: text/x-patch, Size: 9768 bytes --]

From e165b4e8bc4338608ff9505a7fd1a26d8a996b0a Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 13:31:24 +0100
Subject: [PATCH 01/17] [PATCH 1/17][ARM] Add ARMv8.2-A command line option and
 profile.

2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/arm-arches.def ("armv8.1-a"): Add FL_CRC32.
	("armv8.2-a"): New.
	("armv8.2-a+fp16"): New.
	* config/arm/arm-protos.h (FL2_ARCH8_2): New.
	(FL2_FP16INST): New.
	(FL2_FOR_ARCH8_2A): New.
	* config/arm/arm-tables.opt: Regenerate.
	* config/arm/arm.c (arm_arch8_2): New.
	(arm_fp16_inst): New.
	(arm_option_override): Set arm_arch8_2 and arm_fp16_inst.  Check
	for incompatible fp16-format settings.
	* config/arm/arm.h (TARGET_VFP_FP16INST): New.
	(TARGET_NEON_FP16INST): New.
	(arm_arch8_2): Declare.
	(arm_fp16_inst): Declare.
	* config/arm/bpabi.h (BE8_LINK_SPEC): Add entries for
	march=armv8.2-a and march=armv8.2-a+fp16.
	* config/arm/t-aprofile (Arch Matches): Add entries for armv8.2-a
	and armv8.2-a+fp16.
	* doc/invoke.texi (ARM Options): Add "-march=armv8.1-a",
	"-march=armv8.2-a" and "-march=armv8.2-a+fp16".
---
 gcc/config/arm/arm-arches.def | 10 ++++++++--
 gcc/config/arm/arm-protos.h   |  4 ++++
 gcc/config/arm/arm-tables.opt | 10 ++++++++--
 gcc/config/arm/arm.c          | 15 +++++++++++++++
 gcc/config/arm/arm.h          | 14 ++++++++++++++
 gcc/config/arm/bpabi.h        |  4 ++++
 gcc/config/arm/t-aprofile     |  2 ++
 gcc/doc/invoke.texi           | 13 +++++++++++++
 8 files changed, 68 insertions(+), 4 deletions(-)

diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index fd02b18..2b4a80e 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -58,10 +58,16 @@ ARM_ARCH("armv7e-m", cortexm4,  7EM,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC |	      FL_F
 ARM_ARCH("armv8-a", cortexa53,  8A,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC |             FL_FOR_ARCH8A))
 ARM_ARCH("armv8-a+crc",cortexa53, 8A,   ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_CRC32  | FL_FOR_ARCH8A))
 ARM_ARCH("armv8.1-a", cortexa53,  8A,
-	  ARM_FSET_MAKE (FL_CO_PROC | FL_FOR_ARCH8A,  FL2_FOR_ARCH8_1A))
+	  ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
+			 FL2_FOR_ARCH8_1A))
 ARM_ARCH("armv8.1-a+crc",cortexa53, 8A,
 	  ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
 			 FL2_FOR_ARCH8_1A))
+ARM_ARCH ("armv8.2-a", cortexa53,  8A,
+	  ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
+			 FL2_FOR_ARCH8_2A))
+ARM_ARCH ("armv8.2-a+fp16", cortexa53,  8A,
+	  ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
+			 FL2_FOR_ARCH8_2A | FL2_FP16INST))
 ARM_ARCH("iwmmxt",  iwmmxt,     5TE,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT))
 ARM_ARCH("iwmmxt2", iwmmxt2,    5TE,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2))
-
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 1ba2ebb..960bb63 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -392,6 +392,9 @@ extern bool arm_is_constant_pool_ref (rtx);
 #define FL_ARCH6KZ    (1 << 31)       /* ARMv6KZ architecture.  */
 
 #define FL2_ARCH8_1   (1 << 0)	      /* Architecture 8.1.  */
+#define FL2_ARCH8_2   (1 << 1)	      /* Architecture 8.2.  */
+#define FL2_FP16INST  (1 << 2)	      /* FP16 Instructions for ARMv8.2 and
+					 later.  */
 
 /* Flags that only effect tuning, not available instructions.  */
 #define FL_TUNE		(FL_WBUF | FL_VFPV2 | FL_STRONG | FL_LDSCHED \
@@ -422,6 +425,7 @@ extern bool arm_is_constant_pool_ref (rtx);
 #define FL_FOR_ARCH7EM  (FL_FOR_ARCH7M | FL_ARCH7EM)
 #define FL_FOR_ARCH8A	(FL_FOR_ARCH7VE | FL_ARCH8)
 #define FL2_FOR_ARCH8_1A	FL2_ARCH8_1
+#define FL2_FOR_ARCH8_2A	(FL2_FOR_ARCH8_1A | FL2_ARCH8_2)
 
 /* There are too many feature bits to fit in a single word so the set of cpu and
    fpu capabilities is a structure.  A feature set is created and manipulated
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index a5fe2c3..3a5cd69 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -437,10 +437,16 @@ EnumValue
 Enum(arm_arch) String(armv8.1-a+crc) Value(28)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt) Value(29)
+Enum(arm_arch) String(armv8.2-a) Value(29)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt2) Value(30)
+Enum(arm_arch) String(armv8.2-a+fp16) Value(30)
+
+EnumValue
+Enum(arm_arch) String(iwmmxt) Value(31)
+
+EnumValue
+Enum(arm_arch) String(iwmmxt2) Value(32)
 
 Enum
 Name(arm_fpu) Type(int)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index a7dda1f..75442f8 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -815,6 +815,13 @@ int arm_arch8 = 0;
 /* Nonzero if this chip supports the ARMv8.1 extensions.  */
 int arm_arch8_1 = 0;
 
+/* Nonzero if this chip supports the ARM Architecture 8.2 extensions.  */
+int arm_arch8_2 = 0;
+
+/* Nonzero if this chip supports the FP16 instructions extension of ARM
+   Architecture 8.2.  */
+int arm_fp16_inst = 0;
+
 /* Nonzero if this chip can benefit from load scheduling.  */
 int arm_ld_sched = 0;
 
@@ -3191,6 +3198,7 @@ arm_option_override (void)
   arm_arch7em = ARM_FSET_HAS_CPU1 (insn_flags, FL_ARCH7EM);
   arm_arch8 = ARM_FSET_HAS_CPU1 (insn_flags, FL_ARCH8);
   arm_arch8_1 = ARM_FSET_HAS_CPU2 (insn_flags, FL2_ARCH8_1);
+  arm_arch8_2 = ARM_FSET_HAS_CPU2 (insn_flags, FL2_ARCH8_2);
   arm_arch_thumb1 = ARM_FSET_HAS_CPU1 (insn_flags, FL_THUMB);
   arm_arch_thumb2 = ARM_FSET_HAS_CPU1 (insn_flags, FL_THUMB2);
   arm_arch_xscale = ARM_FSET_HAS_CPU1 (insn_flags, FL_XSCALE);
@@ -3207,6 +3215,13 @@ arm_option_override (void)
   arm_tune_cortex_a9 = (arm_tune == cortexa9) != 0;
   arm_arch_crc = ARM_FSET_HAS_CPU1 (insn_flags, FL_CRC32);
   arm_m_profile_small_mul = ARM_FSET_HAS_CPU1 (insn_flags, FL_SMALLMUL);
+  arm_fp16_inst = ARM_FSET_HAS_CPU2 (insn_flags, FL2_FP16INST);
+  if (arm_fp16_inst)
+    {
+      if (arm_fp16_format == ARM_FP16_FORMAT_ALTERNATIVE)
+	error ("selected fp16 options are incompatible.");
+      arm_fp16_format = ARM_FP16_FORMAT_IEEE;
+    }
 
   /* V5 code we generate is completely interworking capable, so we turn off
      TARGET_INTERWORK here to avoid many tests later on.  */
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index f0cdd66..ee69428 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -217,6 +217,13 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
 /* FPU supports ARMv8.1 Adv.SIMD extensions.  */
 #define TARGET_NEON_RDMA (TARGET_NEON && arm_arch8_1)
 
+/* FPU supports the floating point FP16 instructions for ARMv8.2 and later.  */
+#define TARGET_VFP_FP16INST \
+  (TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_FPU_ARMV8 && arm_fp16_inst)
+
+/* FPU supports the AdvSIMD FP16 instructions for ARMv8.2 and later.  */
+#define TARGET_NEON_FP16INST (TARGET_VFP_FP16INST && TARGET_NEON_RDMA)
+
 /* Q-bit is present.  */
 #define TARGET_ARM_QBIT \
   (TARGET_32BIT && arm_arch5e && (arm_arch_notm || arm_arch7))
@@ -443,6 +450,13 @@ extern int arm_arch8;
 /* Nonzero if this chip supports the ARM Architecture 8.1 extensions.  */
 extern int arm_arch8_1;
 
+/* Nonzero if this chip supports the ARM Architecture 8.2 extensions.  */
+extern int arm_arch8_2;
+
+/* Nonzero if this chip supports the FP16 instructions extension of ARM
+   Architecture 8.2.  */
+extern int arm_fp16_inst;
+
 /* Nonzero if this chip can benefit from load scheduling.  */
 extern int arm_ld_sched;
 
diff --git a/gcc/config/arm/bpabi.h b/gcc/config/arm/bpabi.h
index d6d394a..68b4b01 100644
--- a/gcc/config/arm/bpabi.h
+++ b/gcc/config/arm/bpabi.h
@@ -93,6 +93,8 @@
    |march=armv8-a+crc					\
    |march=armv8.1-a					\
    |march=armv8.1-a+crc					\
+   |march=armv8.2-a					\
+   |march=armv8.2-a+fp16				\
    :%{!r:--be8}}}"
 #else
 #define BE8_LINK_SPEC \
@@ -127,6 +129,8 @@
    |march=armv8-a+crc					\
    |march=armv8.1-a					\
    |march=armv8.1-a+crc					\
+   |march=armv8.2-a					\
+   |march=armv8.2-a+fp16				\
    :%{!r:--be8}}}"
 #endif
 
diff --git a/gcc/config/arm/t-aprofile b/gcc/config/arm/t-aprofile
index 1b34b54..46f148b 100644
--- a/gcc/config/arm/t-aprofile
+++ b/gcc/config/arm/t-aprofile
@@ -104,6 +104,8 @@ MULTILIB_MATCHES       += march?armv8-a=mcpu?xgene1
 MULTILIB_MATCHES       += march?armv8-a=march?armv8-a+crc
 MULTILIB_MATCHES       += march?armv8-a=march?armv8.1-a
 MULTILIB_MATCHES       += march?armv8-a=march?armv8.1-a+crc
+MULTILIB_MATCHES       += march?armv8-a=march?armv8.2-a
+MULTILIB_MATCHES       += march?armv8-a=march?armv8.2-a+fp16
 
 # FPU matches
 MULTILIB_MATCHES       += mfpu?vfpv3-d16=mfpu?vfpv3
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2c87c53..1b09f21 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14159,6 +14159,19 @@ extensions.
 @option{-march=armv8-a+crc} enables code generation for the ARMv8-A
 architecture together with the optional CRC32 extensions.
 
+@option{-march=armv8.1-a} enables compiler support for the ARMv8.1-A
+architecture.  This also enables the features provided by
+@option{-march=armv8-a+crc}.
+
+@option{-march=armv8.2-a} enables compiler support for the ARMv8.2-A
+architecture.  This also enables the features provided by
+@option{-march=armv8.1-a}.
+
+@option{-march=armv8.2-a+fp16} enables compiler support for the
+ARMv8.2-A architecture with the optional FP16 instructions extension.
+This also enables the features provided by @option{-march=armv8.1-a}
+and implies @option{-mfp16-format=ieee}.
+
 @option{-march=native} causes the compiler to auto-detect the architecture
 of the build computer.  At present, this feature is only supported on
 GNU/Linux, and not all architectures are recognized.  If the auto-detect
-- 
2.1.4


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 3/17][Testsuite] Add ARM support for ARMv8.2-A with FP16 arithmetic instructions.
  2016-05-17 14:26 ` [PATCH 3/17][Testsuite] Add ARM support for ARMv8.2-A with FP16 arithmetic instructions Matthew Wahab
@ 2016-07-04 13:49   ` Matthew Wahab
  2016-07-27 13:34     ` Ramana Radhakrishnan
  0 siblings, 1 reply; 73+ messages in thread
From: Matthew Wahab @ 2016-07-04 13:49 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1557 bytes --]

On 17/05/16 15:26, Matthew Wahab wrote:
 > The ARMv8.2-A FP16 extension adds to both the VFP and the NEON
 > instruction sets. This patch adds support to the testsuite to select
 > targets and set options for tests that make use of these
 > instructions. It also adds documentation for ARMv8.1-A selectors.
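
A test would then use the new support along these lines (as in the
tests added later in this series):

   /* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok } */
   /* { dg-add-options arm_v8_2a_fp16_scalar } */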

This is a rebase of the patch to take account of changes in
sourcebuild.texi.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>

	* doc/sourcebuild.texi (ARM-specific attributes): Add anchor for
	arm_v8_1a_neon_ok.  Add entries for arm_v8_2a_fp16_scalar_ok,
	arm_v8_2a_fp16_scalar_hw, arm_v8_2a_fp16_neon_ok and
	arm_v8_2a_fp16_neon_hw.
	(Add options): Add entries for arm_v8_1a_neon, arm_v8_2a_fp16_scalar,
	arm_v8_2a_fp16_neon.
	* lib/target-supports.exp
	(add_options_for_arm_v8_2a_fp16_scalar): New.
	(add_options_for_arm_v8_2a_fp16_neon): New.
	(check_effective_target_arm_arch_v8_2a_ok): Auto-generate.
	(add_options_for_arm_arch_v8_2a): Auto-generate.
	(check_effective_target_arm_arch_v8_2a_multilib): Auto-generate.
	(check_effective_target_arm_v8_2a_fp16_scalar_ok_nocache): New.
	(check_effective_target_arm_v8_2a_fp16_scalar_ok): New.
	(check_effective_target_arm_v8_2a_fp16_neon_ok_nocache): New.
	(check_effective_target_arm_v8_2a_fp16_neon_ok): New.
	(check_effective_target_arm_v8_2a_fp16_scalar_hw): New.
	(check_effective_target_arm_v8_2a_fp16_neon_hw): New.


[-- Attachment #2: 0003-PATCH-3-17-Testsuite-Add-ARM-support-for-ARMv8.2-A-w.patch --]
[-- Type: text/x-patch, Size: 10006 bytes --]

From 47ead98473ac1f6dda5df2638800e5b4c8ec38a1 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 13:34:30 +0100
Subject: [PATCH 03/17] [PATCH 3/17][Testsuite] Add ARM support for ARMv8.2-A
 with FP16   arithmetic instructions.

2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>

	* doc/sourcebuild.texi (ARM-specific attributes): Add anchor for
	arm_v8_1a_neon_ok.  Add entries for arm_v8_2a_fp16_scalar_ok,
	arm_v8_2a_fp16_scalar_hw, arm_v8_2a_fp16_neon_ok and
	arm_v8_2a_fp16_neon_hw.
	(Add options): Add entries for arm_v8_1a_neon, arm_v8_2a_fp16_scalar,
	arm_v8_2a_fp16_neon.
	* lib/target-supports.exp
	(add_options_for_arm_v8_2a_fp16_scalar): New.
	(add_options_for_arm_v8_2a_fp16_neon): New.
	(check_effective_target_arm_arch_v8_2a_ok): Auto-generate.
	(add_options_for_arm_arch_v8_2a): Auto-generate.
	(check_effective_target_arm_arch_v8_2a_multilib): Auto-generate.
	(check_effective_target_arm_v8_2a_fp16_scalar_ok_nocache): New.
	(check_effective_target_arm_v8_2a_fp16_scalar_ok): New.
	(check_effective_target_arm_v8_2a_fp16_neon_ok_nocache): New.
	(check_effective_target_arm_v8_2a_fp16_neon_ok): New.
	(check_effective_target_arm_v8_2a_fp16_scalar_hw): New.
	(check_effective_target_arm_v8_2a_fp16_neon_hw): New.
---
 gcc/doc/sourcebuild.texi              |  40 ++++++++++
 gcc/testsuite/lib/target-supports.exp | 145 +++++++++++++++++++++++++++++++++-
 2 files changed, 184 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 1fa962d..4f83307 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1596,6 +1596,7 @@ ARM target supports @code{-mfpu=neon-fp-armv8 -mfloat-abi=softfp}.
 Some multilibs may be incompatible with these options.
 
 @item arm_v8_1a_neon_ok
+@anchor{arm_v8_1a_neon_ok}
 ARM target supports options to generate ARMv8.1 Adv.SIMD instructions.
 Some multilibs may be incompatible with these options.
 
@@ -1607,6 +1608,28 @@ arm_v8_1a_neon_ok.
 @item arm_acq_rel
 ARM target supports acquire-release instructions.
 
+@item arm_v8_2a_fp16_scalar_ok
+@anchor{arm_v8_2a_fp16_scalar_ok}
+ARM target supports options to generate instructions for ARMv8.2 and
+scalar instructions from the FP16 extension.  Some multilibs may be
+incompatible with these options.
+
+@item arm_v8_2a_fp16_scalar_hw
+ARM target supports executing instructions for ARMv8.2 and scalar
+instructions from the FP16 extension.  Some multilibs may be
+incompatible with these options.  Implies arm_v8_2a_fp16_neon_ok.
+
+@item arm_v8_2a_fp16_neon_ok
+@anchor{arm_v8_2a_fp16_neon_ok}
+ARM target supports options to generate instructions from ARMv8.2 with
+the FP16 extension.  Some multilibs may be incompatible with these
+options.  Implies arm_v8_2a_fp16_scalar_ok.
+
+@item arm_v8_2a_fp16_neon_hw
+ARM target supports executing instructions from ARMv8.2 with the FP16
+extension.  Some multilibs may be incompatible with these options.
+Implies arm_v8_2a_fp16_neon_ok and arm_v8_2a_fp16_scalar_hw.
+
 @item arm_prefer_ldrd_strd
 ARM target prefers @code{LDRD} and @code{STRD} instructions over
 @code{LDM} and @code{STM} instructions.
@@ -2091,6 +2114,23 @@ the @ref{arm_neon_fp16_ok,,arm_neon_fp16_ok effective target keyword}.
 arm vfp3 floating point support; see
 the @ref{arm_vfp3_ok,,arm_vfp3_ok effective target keyword}.
 
+@item arm_v8_1a_neon
+Add options for ARMv8.1 with Adv.SIMD support, if this is supported
+by the target; see the @ref{arm_v8_1a_neon_ok,,arm_v8_1a_neon_ok}
+effective target keyword.
+
+@item arm_v8_2a_fp16_scalar
+Add options for ARMv8.2 with scalar FP16 support, if this is
+supported by the target; see the
+@ref{arm_v8_2a_fp16_scalar_ok,,arm_v8_2a_fp16_scalar_ok} effective
+target keyword.
+
+@item arm_v8_2a_fp16_neon
+Add options for ARMv8.2 with Adv.SIMD FP16 support, if this is
+supported by the target; see the
+@ref{arm_v8_2a_fp16_neon_ok,,arm_v8_2a_fp16_neon_ok} effective target
+keyword.
+
 @item bind_pic_locally
 Add the target-specific flags needed to enable functions to bind
 locally when using pic/PIC passes in the testsuite.
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 2ee7fc0..3e914d3 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2974,6 +2974,28 @@ proc add_options_for_arm_v8_1a_neon { flags } {
     return "$flags $et_arm_v8_1a_neon_flags -march=armv8.1-a"
 }
 
+# Add the options needed for ARMv8.2 with the scalar FP16 extension.
+# Also adds the ARMv8 FP options for ARM.
+
+proc add_options_for_arm_v8_2a_fp16_scalar { flags } {
+    if { ! [check_effective_target_arm_v8_2a_fp16_scalar_ok] } {
+	return "$flags"
+    }
+    global et_arm_v8_2a_fp16_scalar_flags
+    return "$flags $et_arm_v8_2a_fp16_scalar_flags"
+}
+
+# Add the options needed for ARMv8.2 with the FP16 extension.  Also adds
+# the ARMv8 NEON options for ARM.
+
+proc add_options_for_arm_v8_2a_fp16_neon { flags } {
+    if { ! [check_effective_target_arm_v8_2a_fp16_neon_ok] } {
+	return "$flags"
+    }
+    global et_arm_v8_2a_fp16_neon_flags
+    return "$flags $et_arm_v8_2a_fp16_neon_flags"
+}
+
 proc add_options_for_arm_crc { flags } {
     if { ! [check_effective_target_arm_crc_ok] } {
         return "$flags"
@@ -3325,7 +3347,8 @@ foreach { armfunc armflag armdef } { v4 "-march=armv4 -marm" __ARM_ARCH_4__
 				     v7m "-march=armv7-m -mthumb" __ARM_ARCH_7M__
 				     v7em "-march=armv7e-m -mthumb" __ARM_ARCH_7EM__
 				     v8a "-march=armv8-a" __ARM_ARCH_8A__
-				     v8_1a "-march=armv8.1a" __ARM_ARCH_8A__ } {
+				     v8_1a "-march=armv8.1a" __ARM_ARCH_8A__
+				     v8_2a "-march=armv8.2a" __ARM_ARCH_8A__ } {
     eval [string map [list FUNC $armfunc FLAG $armflag DEF $armdef ] {
 	proc check_effective_target_arm_arch_FUNC_ok { } {
 	    if { [ string match "*-marm*" "FLAG" ] &&
@@ -3537,6 +3560,76 @@ proc check_effective_target_arm_v8_1a_neon_ok { } {
 		check_effective_target_arm_v8_1a_neon_ok_nocache]
 }
 
+# Return 1 if the target supports ARMv8.2 scalar FP16 arithmetic
+# instructions, 0 otherwise.  The test is valid for ARM.  Record the
+# command line options needed.
+
+proc check_effective_target_arm_v8_2a_fp16_scalar_ok_nocache { } {
+    global et_arm_v8_2a_fp16_scalar_flags
+    set et_arm_v8_2a_fp16_scalar_flags ""
+
+    if { ![istarget arm*-*-*] } {
+	return 0;
+    }
+
+    # Iterate through sets of options to find the compiler flags that
+    # need to be added to the -march option.
+    foreach flags {"" "-mfpu=fp-armv8" "-mfloat-abi=softfp" \
+		       "-mfpu=fp-armv8 -mfloat-abi=softfp"} {
+	if { [check_no_compiler_messages_nocache \
+		  arm_v8_2a_fp16_scalar_ok object {
+	    #if !defined (__ARM_FEATURE_FP16_SCALAR_ARITHMETIC)
+	    #error "__ARM_FEATURE_FP16_SCALAR_ARITHMETIC not defined"
+	    #endif
+	} "$flags -march=armv8.2-a+fp16"] } {
+	    set et_arm_v8_2a_fp16_scalar_flags "$flags -march=armv8.2-a+fp16"
+	    return 1
+	}
+    }
+
+    return 0;
+}
+
+proc check_effective_target_arm_v8_2a_fp16_scalar_ok { } {
+    return [check_cached_effective_target arm_v8_2a_fp16_scalar_ok \
+		check_effective_target_arm_v8_2a_fp16_scalar_ok_nocache]
+}
+
+# Return 1 if the target supports ARMv8.2 Adv.SIMD FP16 arithmetic
+# instructions, 0 otherwise.  The test is valid for ARM.  Record the
+# command line options needed.
+
+proc check_effective_target_arm_v8_2a_fp16_neon_ok_nocache { } {
+    global et_arm_v8_2a_fp16_neon_flags
+    set et_arm_v8_2a_fp16_neon_flags ""
+
+    if { ![istarget arm*-*-*] } {
+	return 0;
+    }
+
+    # Iterate through sets of options to find the compiler flags that
+    # need to be added to the -march option.
+    foreach flags {"" "-mfpu=neon-fp-armv8" "-mfloat-abi=softfp" \
+		       "-mfpu=neon-fp-armv8 -mfloat-abi=softfp"} {
+	if { [check_no_compiler_messages_nocache \
+		  arm_v8_2a_fp16_neon_ok object {
+	    #if !defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+	    #error "__ARM_FEATURE_FP16_VECTOR_ARITHMETIC not defined"
+	    #endif
+	} "$flags -march=armv8.2-a+fp16"] } {
+	    set et_arm_v8_2a_fp16_neon_flags "$flags -march=armv8.2-a+fp16"
+	    return 1
+	}
+    }
+
+    return 0;
+}
+
+proc check_effective_target_arm_v8_2a_fp16_neon_ok { } {
+    return [check_cached_effective_target arm_v8_2a_fp16_neon_ok \
+		check_effective_target_arm_v8_2a_fp16_neon_ok_nocache]
+}
+
 # Return 1 if the target supports executing ARMv8 NEON instructions, 0
 # otherwise.
 
@@ -3599,6 +3692,56 @@ proc check_effective_target_arm_v8_1a_neon_hw { } {
     } [add_options_for_arm_v8_1a_neon ""]]
 }
 
+# Return 1 if the target supports executing floating point
+# instructions from ARMv8.2 with the FP16 extension, 0 otherwise.  The
+# test is valid for ARM.
+
+proc check_effective_target_arm_v8_2a_fp16_scalar_hw { } {
+    if { ![check_effective_target_arm_v8_2a_fp16_scalar_ok] } {
+	return 0;
+    }
+    return [check_runtime arm_v8_2a_fp16_scalar_hw_available {
+	int
+	main (void)
+	{
+	  __fp16 a = 1.0;
+	  __fp16 result;
+
+	  asm ("vabs.f16 %0, %1"
+	       : "=w"(result)
+	       : "w"(a)
+	       : /* No clobbers.  */);
+
+	  return (result == 1.0) ? 0 : 1;
+	}
+    } [add_options_for_arm_v8_2a_fp16_scalar ""]]
+}
+
+# Return 1 if the target supports executing Adv.SIMD
+# instructions from ARMv8.2 with the FP16 extension, 0 otherwise.  The
+# test is valid for ARM.
+
+proc check_effective_target_arm_v8_2a_fp16_neon_hw { } {
+    if { ![check_effective_target_arm_v8_2a_fp16_neon_ok] } {
+	return 0;
+    }
+    return [check_runtime arm_v8_2a_fp16_neon_hw_available {
+	int
+	main (void)
+	{
+	  __simd64_float16_t a = {1.0, -1.0, 1.0, -1.0};
+	  __simd64_float16_t result;
+
+	  asm ("vabs.f16 %P0, %P1"
+	       : "=w"(result)
+	       : "w"(a)
+	       : /* No clobbers.  */);
+
+	  return (result[0] == 1.0) ? 0 : 1;
+	}
+    } [add_options_for_arm_v8_2a_fp16_neon ""]]
+}
+
 # Return 1 if this is a ARM target with NEON enabled.
 
 proc check_effective_target_arm_neon { } {
-- 
2.1.4


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 7/17][ARM] Add FP16 data movement instructions.
  2016-05-17 14:34 ` [PATCH 7/17][ARM] Add FP16 data movement instructions Matthew Wahab
@ 2016-07-04 13:57   ` Matthew Wahab
  2016-07-27 14:01     ` Ramana Radhakrishnan
  0 siblings, 1 reply; 73+ messages in thread
From: Matthew Wahab @ 2016-07-04 13:57 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1683 bytes --]

On 17/05/16 15:34, Matthew Wahab wrote:
 > The ARMv8.2-A FP16 extension adds a number of instructions to support
 > data movement for FP16 values. This patch adds these instructions to the
 > backend, making them available to the compiler code generator.

This updates the expected output for the test added by the patch since
GCC now generates ldrh/strh for some indexed loads/stores which were
previously done with vld1/vst1.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>
	    Jiong Wang <jiong.wang@arm.com>

	* config/arm/arm.c (coproc_secondary_reload_class): Make HFmode
	available when FP16 instructions are available.
	(output_move_vfp): Add support for 16-bit data moves.
	(arm_validize_comparison): Fix some white-space.  Support HFmode
	by conversion to SFmode.
	* config/arm/arm.md (truncdfhf2): Fix a comment.
	(extendhfdf2): Likewise.
	(cstorehf4): New.
	(movsicc): Fix some white-space.
	(movhfcc): New.
	(movsfcc): Fix some white-space.
	(*cmovhf): New.
	* config/arm/vfp.md (*arm_movhi_vfp): Disable when VFP FP16
	instructions are available.
	(*thumb2_movhi_vfp): Likewise.
	(*arm_movhi_fp16): New.
	(*thumb2_movhi_fp16): New.
	(*movhf_vfp_fp16): New.
	(*movhf_vfp_neon): Disable when VFP FP16 instructions are
	available.
	(*movhf_vfp): Likewise.
	(extendhfsf2): Enable when VFP FP16 instructions are available.
	(truncsfhf2):  Enable when VFP FP16 instructions are available.

testsuite/
2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/arm/armv8_2_fp16-move-1.c: New.


[-- Attachment #2: 0007-PATCH-7-17-ARM-Add-FP16-data-movement-instructions.patch --]
[-- Type: text/x-patch, Size: 18540 bytes --]

From 0633bbb2f2d43a6994adaeb44898e18c304ee728 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 13:35:04 +0100
Subject: [PATCH 07/17] [PATCH 7/17][ARM] Add FP16 data movement instructions.

2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>
	    Jiong Wang <jiong.wang@arm.com>

	* config/arm/arm.c (coproc_secondary_reload_class): Make HFmode
	available when FP16 instructions are available.
	(output_move_vfp): Add support for 16-bit data moves.
	(arm_validize_comparison): Fix some white-space.  Support HFmode
	by conversion to SFmode.
	* config/arm/arm.md (truncdfhf2): Fix a comment.
	(extendhfdf2): Likewise.
	(cstorehf4): New.
	(movsicc): Fix some white-space.
	(movhfcc): New.
	(movsfcc): Fix some white-space.
	(*cmovhf): New.
	* config/arm/vfp.md (*arm_movhi_vfp): Disable when VFP FP16
	instructions are available.
	(*thumb2_movhi_vfp): Likewise.
	(*arm_movhi_fp16): New.
	(*thumb2_movhi_fp16): New.
	(*movhf_vfp_fp16): New.
	(*movhf_vfp_neon): Disable when VFP FP16 instructions are
	available.
	(*movhf_vfp): Likewise.
	(extendhfsf2): Enable when VFP FP16 instructions are available.
	(truncsfhf2):  Enable when VFP FP16 instructions are available.

testsuite/
2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/arm/armv8_2_fp16-move-1.c: New.
---
 gcc/config/arm/arm.c                               |  16 +-
 gcc/config/arm/arm.md                              |  81 ++++++++-
 gcc/config/arm/vfp.md                              | 182 ++++++++++++++++++++-
 gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c | 165 +++++++++++++++++++
 4 files changed, 432 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index ce18f75..f07e2c1 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -13187,7 +13187,7 @@ coproc_secondary_reload_class (machine_mode mode, rtx x, bool wb)
 {
   if (mode == HFmode)
     {
-      if (!TARGET_NEON_FP16)
+      if (!TARGET_NEON_FP16 && !TARGET_VFP_FP16INST)
 	return GENERAL_REGS;
       if (s_register_operand (x, mode) || neon_vector_mem_operand (x, 2, true))
 	return NO_REGS;
@@ -18638,6 +18638,8 @@ output_move_vfp (rtx *operands)
   rtx reg, mem, addr, ops[2];
   int load = REG_P (operands[0]);
   int dp = GET_MODE_SIZE (GET_MODE (operands[0])) == 8;
+  int sp = (!TARGET_VFP_FP16INST
+	    || GET_MODE_SIZE (GET_MODE (operands[0])) == 4);
   int integer_p = GET_MODE_CLASS (GET_MODE (operands[0])) == MODE_INT;
   const char *templ;
   char buff[50];
@@ -18684,7 +18686,7 @@ output_move_vfp (rtx *operands)
 
   sprintf (buff, templ,
 	   load ? "ld" : "st",
-	   dp ? "64" : "32",
+	   dp ? "64" : sp ? "32" : "16",
 	   dp ? "P" : "",
 	   integer_p ? "\t%@ int" : "");
   output_asm_insn (buff, ops);
@@ -29326,7 +29328,7 @@ arm_validize_comparison (rtx *comparison, rtx * op1, rtx * op2)
 {
   enum rtx_code code = GET_CODE (*comparison);
   int code_int;
-  machine_mode mode = (GET_MODE (*op1) == VOIDmode) 
+  machine_mode mode = (GET_MODE (*op1) == VOIDmode)
     ? GET_MODE (*op2) : GET_MODE (*op1);
 
   gcc_assert (GET_MODE (*op1) != VOIDmode || GET_MODE (*op2) != VOIDmode);
@@ -29354,6 +29356,14 @@ arm_validize_comparison (rtx *comparison, rtx * op1, rtx * op2)
 	*op2 = force_reg (mode, *op2);
       return true;
 
+    case HFmode:
+      if (!TARGET_VFP_FP16INST)
+	break;
+      /* FP16 comparisons are done in SF mode.  */
+      mode = SFmode;
+      *op1 = convert_to_mode (mode, *op1, 1);
+      *op2 = convert_to_mode (mode, *op2, 1);
+      /* Fall through.  */
     case SFmode:
     case DFmode:
       if (!arm_float_compare_operand (*op1, mode))
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 21af27c..6a980cd 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4854,7 +4854,7 @@
   ""
 )
 
-/* DFmode -> HFmode conversions have to go through SFmode.  */
+;; DFmode to HFmode conversions have to go through SFmode.
 (define_expand "truncdfhf2"
   [(set (match_operand:HF  0 "general_operand" "")
 	(float_truncate:HF
@@ -5361,7 +5361,7 @@
   ""
 )
 
-/* HFmode -> DFmode conversions have to go through SFmode.  */
+;; HFmode -> DFmode conversions have to go through SFmode.
 (define_expand "extendhfdf2"
   [(set (match_operand:DF                  0 "general_operand" "")
 	(float_extend:DF (match_operand:HF 1 "general_operand"  "")))]
@@ -7366,6 +7366,24 @@
   DONE;
 }")
 
+(define_expand "cstorehf4"
+  [(set (match_operand:SI 0 "s_register_operand")
+	(match_operator:SI 1 "expandable_comparison_operator"
+	 [(match_operand:HF 2 "s_register_operand")
+	  (match_operand:HF 3 "arm_float_compare_operand")]))]
+  "TARGET_VFP_FP16INST"
+  {
+    if (!arm_validize_comparison (&operands[1],
+				  &operands[2],
+				  &operands[3]))
+       FAIL;
+
+    emit_insn (gen_cstore_cc (operands[0], operands[1],
+			      operands[2], operands[3]));
+    DONE;
+  }
+)
+
 (define_expand "cstoresf4"
   [(set (match_operand:SI 0 "s_register_operand" "")
 	(match_operator:SI 1 "expandable_comparison_operator"
@@ -7418,9 +7436,31 @@
     rtx ccreg;
 
     if (!arm_validize_comparison (&operands[1], &XEXP (operands[1], 0), 
-       				  &XEXP (operands[1], 1)))
+				  &XEXP (operands[1], 1)))
       FAIL;
-    
+
+    code = GET_CODE (operands[1]);
+    ccreg = arm_gen_compare_reg (code, XEXP (operands[1], 0),
+				 XEXP (operands[1], 1), NULL_RTX);
+    operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx);
+  }"
+)
+
+(define_expand "movhfcc"
+  [(set (match_operand:HF 0 "s_register_operand")
+	(if_then_else:HF (match_operand 1 "arm_cond_move_operator")
+			 (match_operand:HF 2 "s_register_operand")
+			 (match_operand:HF 3 "s_register_operand")))]
+  "TARGET_VFP_FP16INST"
+  "
+  {
+    enum rtx_code code = GET_CODE (operands[1]);
+    rtx ccreg;
+
+    if (!arm_validize_comparison (&operands[1], &XEXP (operands[1], 0),
+				  &XEXP (operands[1], 1)))
+      FAIL;
+
     code = GET_CODE (operands[1]);
     ccreg = arm_gen_compare_reg (code, XEXP (operands[1], 0),
 				 XEXP (operands[1], 1), NULL_RTX);
@@ -7439,7 +7479,7 @@
     enum rtx_code code = GET_CODE (operands[1]);
     rtx ccreg;
 
-    if (!arm_validize_comparison (&operands[1], &XEXP (operands[1], 0), 
+    if (!arm_validize_comparison (&operands[1], &XEXP (operands[1], 0),
        				  &XEXP (operands[1], 1)))
        FAIL;
 
@@ -7504,6 +7544,37 @@
    (set_attr "type" "fcsel")]
 )
 
+(define_insn "*cmovhf"
+    [(set (match_operand:HF 0 "s_register_operand" "=t")
+	(if_then_else:HF (match_operator 1 "arm_vsel_comparison_operator"
+			 [(match_operand 2 "cc_register" "") (const_int 0)])
+			  (match_operand:HF 3 "s_register_operand" "t")
+			  (match_operand:HF 4 "s_register_operand" "t")))]
+  "TARGET_VFP_FP16INST"
+  "*
+  {
+    enum arm_cond_code code = maybe_get_arm_condition_code (operands[1]);
+    switch (code)
+      {
+      case ARM_GE:
+      case ARM_GT:
+      case ARM_EQ:
+      case ARM_VS:
+	return \"vsel%d1.f16\\t%0, %3, %4\";
+      case ARM_LT:
+      case ARM_LE:
+      case ARM_NE:
+      case ARM_VC:
+	return \"vsel%D1.f16\\t%0, %4, %3\";
+      default:
+	gcc_unreachable ();
+      }
+    return \"\";
+  }"
+  [(set_attr "conds" "use")
+   (set_attr "type" "fcsel")]
+)
+
 (define_insn_and_split "*movsicc_insn"
   [(set (match_operand:SI 0 "s_register_operand" "=r,r,r,r,r,r,r,r")
 	(if_then_else:SI
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index d7c874a..b1c13fa 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -27,6 +27,7 @@
    (match_operand:HI 1 "general_operand"
     "rIk, K, n, r, mi, r, *t, *t"))]
  "TARGET_ARM && TARGET_HARD_FLOAT && TARGET_VFP
+  && !TARGET_VFP_FP16INST
   && (register_operand (operands[0], HImode)
        || register_operand (operands[1], HImode))"
 {
@@ -76,6 +77,7 @@
    (match_operand:HI 1 "general_operand"
     "rk, I, Py, n, r, m, r, *t, *t"))]
  "TARGET_THUMB2 && TARGET_HARD_FLOAT && TARGET_VFP
+  && !TARGET_VFP_FP16INST
   && (register_operand (operands[0], HImode)
        || register_operand (operands[1], HImode))"
 {
@@ -111,6 +113,99 @@
   (set_attr "length" "2, 4, 2, 4, 4, 4, 4, 4, 4")]
 )
 
+;; Patterns for HI moves which provide more data transfer instructions when FP16
+;; instructions are available.
+(define_insn "*arm_movhi_fp16"
+ [(set
+   (match_operand:HI 0 "nonimmediate_operand"
+    "=r,  r, r, m, r, *t,  r, *t")
+   (match_operand:HI 1 "general_operand"
+    "rIk, K, n, r, mi, r, *t, *t"))]
+ "TARGET_ARM && TARGET_VFP_FP16INST
+  && (register_operand (operands[0], HImode)
+       || register_operand (operands[1], HImode))"
+{
+  switch (which_alternative)
+    {
+    case 0:
+      return "mov%?\t%0, %1\t%@ movhi";
+    case 1:
+      return "mvn%?\t%0, #%B1\t%@ movhi";
+    case 2:
+      return "movw%?\t%0, %L1\t%@ movhi";
+    case 3:
+      return "strh%?\t%1, %0\t%@ movhi";
+    case 4:
+      return "ldrh%?\t%0, %1\t%@ movhi";
+    case 5:
+    case 6:
+      return "vmov%?.f16\t%0, %1\t%@ int";
+    case 7:
+      return "vmov%?.f32\t%0, %1\t%@ int";
+    default:
+      gcc_unreachable ();
+    }
+}
+ [(set_attr "predicable" "yes")
+  (set_attr_alternative "type"
+   [(if_then_else
+     (match_operand 1 "const_int_operand" "")
+     (const_string "mov_imm")
+     (const_string "mov_reg"))
+    (const_string "mvn_imm")
+    (const_string "mov_imm")
+    (const_string "store1")
+    (const_string "load1")
+    (const_string "f_mcr")
+    (const_string "f_mrc")
+    (const_string "fmov")])
+  (set_attr "pool_range" "*, *, *, *, 256, *, *, *")
+  (set_attr "neg_pool_range" "*, *, *, *, 244, *, *, *")
+  (set_attr "length" "4")]
+)
+
+(define_insn "*thumb2_movhi_fp16"
+ [(set
+   (match_operand:HI 0 "nonimmediate_operand"
+    "=rk, r, l, r, m, r, *t, r, *t")
+   (match_operand:HI 1 "general_operand"
+    "rk, I, Py, n, r, m, r, *t, *t"))]
+ "TARGET_THUMB2 && TARGET_VFP_FP16INST
+  && (register_operand (operands[0], HImode)
+       || register_operand (operands[1], HImode))"
+{
+  switch (which_alternative)
+    {
+    case 0:
+    case 1:
+    case 2:
+      return "mov%?\t%0, %1\t%@ movhi";
+    case 3:
+      return "movw%?\t%0, %L1\t%@ movhi";
+    case 4:
+      return "strh%?\t%1, %0\t%@ movhi";
+    case 5:
+      return "ldrh%?\t%0, %1\t%@ movhi";
+    case 6:
+    case 7:
+      return "vmov%?.f16\t%0, %1\t%@ int";
+    case 8:
+      return "vmov%?.f32\t%0, %1\t%@ int";
+    default:
+      gcc_unreachable ();
+    }
+}
+ [(set_attr "predicable" "yes")
+  (set_attr "predicable_short_it"
+   "yes, no, yes, no, no, no, no, no, no")
+  (set_attr "type"
+   "mov_reg, mov_imm, mov_imm, mov_imm, store1, load1,\
+    f_mcr, f_mrc, fmov")
+  (set_attr "pool_range" "*, *, *, *, *, 4094, *, *, *")
+  (set_attr "neg_pool_range" "*, *, *, *, *, 250, *, *, *")
+  (set_attr "length" "2, 4, 2, 4, 4, 4, 4, 4, 4")]
+)
+
 ;; SImode moves
 ;; ??? For now do not allow loading constants into vfp regs.  This causes
 ;; problems because small constants get converted into adds.
@@ -304,10 +399,87 @@
  )
 
 ;; HFmode moves
+
+(define_insn "*movhf_vfp_fp16"
+  [(set (match_operand:HF 0 "nonimmediate_operand"
+			  "= r,m,t,r,t,r,t,t,Um,r")
+	(match_operand:HF 1 "general_operand"
+			  "  m,r,t,r,r,t,Dv,Um,t,F"))]
+  "TARGET_32BIT
+   && TARGET_VFP_FP16INST
+   && (s_register_operand (operands[0], HFmode)
+       || s_register_operand (operands[1], HFmode))"
+ {
+  switch (which_alternative)
+    {
+    case 0: /* ARM register from memory.  */
+      return \"ldrh%?\\t%0, %1\\t%@ __fp16\";
+    case 1: /* Memory from ARM register.  */
+      return \"strh%?\\t%1, %0\\t%@ __fp16\";
+    case 2: /* S register from S register.  */
+      return \"vmov\\t%0, %1\t%@ __fp16\";
+    case 3: /* ARM register from ARM register.  */
+      return \"mov%?\\t%0, %1\\t%@ __fp16\";
+    case 4: /* S register from ARM register.  */
+    case 5: /* ARM register from S register.  */
+    case 6: /* S register from immediate.  */
+      return \"vmov.f16\\t%0, %1\t%@ __fp16\";
+    case 7: /* S register from memory.  */
+      return \"vld1.16\\t{%z0}, %A1\";
+    case 8: /* Memory from S register.  */
+      return \"vst1.16\\t{%z1}, %A0\";
+    case 9: /* ARM register from constant.  */
+      {
+	long bits;
+	rtx ops[4];
+
+	bits = real_to_target (NULL, CONST_DOUBLE_REAL_VALUE (operands[1]),
+			       HFmode);
+	ops[0] = operands[0];
+	ops[1] = GEN_INT (bits);
+	ops[2] = GEN_INT (bits & 0xff00);
+	ops[3] = GEN_INT (bits & 0x00ff);
+
+	if (arm_arch_thumb2)
+	  output_asm_insn (\"movw\\t%0, %1\", ops);
+	else
+	  output_asm_insn (\"mov\\t%0, %2\;orr\\t%0, %0, %3\", ops);
+	return \"\";
+       }
+    default:
+      gcc_unreachable ();
+    }
+ }
+  [(set_attr "predicable" "yes, yes, no, yes, no, no, no, no, no, no")
+   (set_attr "predicable_short_it" "no, no, no, yes,\
+				    no, no, no, no,\
+				    no, no")
+   (set_attr_alternative "type"
+    [(const_string "load1") (const_string "store1")
+     (const_string "fmov") (const_string "mov_reg")
+     (const_string "f_mcr") (const_string "f_mrc")
+     (const_string "fconsts") (const_string "neon_load1_1reg")
+     (const_string "neon_store1_1reg")
+     (if_then_else (match_test "arm_arch_thumb2")
+      (const_string "mov_imm")
+      (const_string "multiple"))])
+   (set_attr_alternative "length"
+    [(const_int 4) (const_int 4)
+     (const_int 4) (const_int 4)
+     (const_int 4) (const_int 4)
+     (const_int 4) (const_int 4)
+     (const_int 4)
+     (if_then_else (match_test "arm_arch_thumb2")
+      (const_int 4)
+      (const_int 8))])]
+)
+
 (define_insn "*movhf_vfp_neon"
   [(set (match_operand:HF 0 "nonimmediate_operand" "= t,Um,r,m,t,r,t,r,r")
 	(match_operand:HF 1 "general_operand"	   " Um, t,m,r,t,r,r,t,F"))]
-  "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_NEON_FP16
+  "TARGET_32BIT
+   && TARGET_HARD_FLOAT && TARGET_NEON_FP16
+   && !TARGET_VFP_FP16INST
    && (   s_register_operand (operands[0], HFmode)
        || s_register_operand (operands[1], HFmode))"
   "*
@@ -361,8 +533,10 @@
 (define_insn "*movhf_vfp"
   [(set (match_operand:HF 0 "nonimmediate_operand" "=r,m,t,r,t,r,r")
 	(match_operand:HF 1 "general_operand"	   " m,r,t,r,r,t,F"))]
-  "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP
+  "TARGET_32BIT
+   && TARGET_HARD_FLOAT && TARGET_VFP
    && !TARGET_NEON_FP16
+   && !TARGET_VFP_FP16INST
    && (   s_register_operand (operands[0], HFmode)
        || s_register_operand (operands[1], HFmode))"
   "*
@@ -1095,7 +1269,7 @@
 (define_insn "extendhfsf2"
   [(set (match_operand:SF		   0 "s_register_operand" "=t")
 	(float_extend:SF (match_operand:HF 1 "s_register_operand" "t")))]
-  "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_FP16"
+  "TARGET_32BIT && TARGET_HARD_FLOAT && (TARGET_FP16 || TARGET_VFP_FP16INST)"
   "vcvtb%?.f32.f16\\t%0, %1"
   [(set_attr "predicable" "yes")
    (set_attr "predicable_short_it" "no")
@@ -1105,7 +1279,7 @@
 (define_insn "truncsfhf2"
   [(set (match_operand:HF		   0 "s_register_operand" "=t")
 	(float_truncate:HF (match_operand:SF 1 "s_register_operand" "t")))]
-  "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_FP16"
+  "TARGET_32BIT && TARGET_HARD_FLOAT && (TARGET_FP16 || TARGET_VFP_FP16INST)"
   "vcvtb%?.f16.f32\\t%0, %1"
   [(set_attr "predicable" "yes")
    (set_attr "predicable_short_it" "no")
diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
new file mode 100644
index 0000000..951da23
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
@@ -0,0 +1,165 @@
+/* { dg-do compile }  */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok }  */
+/* { dg-options "-O2" }  */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+__fp16
+test_load_1 (__fp16* a)
+{
+  return *a;
+}
+
+__fp16
+test_load_2 (__fp16* a, int i)
+{
+  return a[i];
+}
+
+/* { dg-final { scan-assembler-times {vld1\.16\t\{d[0-9]+\[[0-9]+\]\}, \[r[0-9]\]+} 1 } }  */
+
+void
+test_store_1 (__fp16* a, __fp16 b)
+{
+  *a = b;
+}
+
+void
+test_store_2 (__fp16* a, int i, __fp16 b)
+{
+  a[i] = b;
+}
+
+/* { dg-final { scan-assembler-times {vst1\.16\t\{d[0-9]+\[[0-9]+\]\}, \[r[0-9]\]+} 1 } }  */
+
+__fp16
+test_load_store_1 (__fp16* a, int i, __fp16* b)
+{
+  a[i] = b[i];
+}
+
+__fp16
+test_load_store_2 (__fp16* a, int i, __fp16* b)
+{
+  a[i] = b[i + 2];
+  return a[i];
+}
+/* { dg-final { scan-assembler-times {ldrh\tr[0-9]+} 3 } }  */
+/* { dg-final { scan-assembler-times {strh\tr[0-9]+} 3 } }  */
+
+__fp16
+test_select_1 (int sel, __fp16 a, __fp16 b)
+{
+  if (sel)
+    return a;
+  else
+    return b;
+}
+
+__fp16
+test_select_2 (int sel, __fp16 a, __fp16 b)
+{
+  return sel ? a : b;
+}
+
+__fp16
+test_select_3 (__fp16 a, __fp16 b, __fp16 c)
+{
+  return (a == b) ? b : c;
+}
+
+__fp16
+test_select_4 (__fp16 a, __fp16 b, __fp16 c)
+{
+  return (a != b) ? b : c;
+}
+
+__fp16
+test_select_5 (__fp16 a, __fp16 b, __fp16 c)
+{
+  return (a < b) ? b : c;
+}
+
+__fp16
+test_select_6 (__fp16 a, __fp16 b, __fp16 c)
+{
+  return (a <= b) ? b : c;
+}
+
+__fp16
+test_select_7 (__fp16 a, __fp16 b, __fp16 c)
+{
+  return (a > b) ? b : c;
+}
+
+__fp16
+test_select_8 (__fp16 a, __fp16 b, __fp16 c)
+{
+  return (a >= b) ? b : c;
+}
+
+/* { dg-final { scan-assembler-times {vseleq\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vselgt\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+/* { dg-final { scan-assembler-times {vselge\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+
+/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, r[0-9]+} 5 } }  */
+/* { dg-final { scan-assembler-times {vmov\.f16\tr[0-9]+, s[0-9]+} 5 } }  */
+
+int
+test_compare_1 (__fp16 a, __fp16 b)
+{
+  if (a == b)
+    return -1;
+  else
+    return 0;
+}
+
+int
+test_compare_2 (__fp16 a, __fp16 b)
+{
+  if (a != b)
+    return -1;
+  else
+    return 0;
+}
+
+int
+test_compare_3 (__fp16 a, __fp16 b)
+{
+  if (a > b)
+    return -1;
+  else
+    return 0;
+}
+
+int
+test_compare_4 (__fp16 a, __fp16 b)
+{
+  if (a >= b)
+    return -1;
+  else
+    return 0;
+}
+
+int
+test_compare_5 (__fp16 a, __fp16 b)
+{
+  if (a < b)
+    return -1;
+  else
+    return 0;
+}
+
+int
+test_compare_6 (__fp16 a, __fp16 b)
+{
+  if (a <= b)
+    return -1;
+  else
+    return 0;
+}
+
+/* { dg-final { scan-assembler-not {vcmp\.f16} } }  */
+/* { dg-final { scan-assembler-not {vcmpe\.f16} } }  */
+
+/* { dg-final { scan-assembler-times {vcmp\.f32} 4 } }  */
+/* { dg-final { scan-assembler-times {vcmpe\.f32} 8 } }  */
-- 
2.1.4


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.
  2016-05-17 14:36 ` [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions Matthew Wahab
  2016-05-18  0:52   ` Joseph Myers
@ 2016-07-04 14:02   ` Matthew Wahab
  2016-07-28 11:37     ` Ramana Radhakrishnan
  1 sibling, 1 reply; 73+ messages in thread
From: Matthew Wahab @ 2016-07-04 14:02 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4173 bytes --]

On 19/05/16 15:54, Matthew Wahab wrote:
 > On 18/05/16 16:20, Joseph Myers wrote:
 >> On Wed, 18 May 2016, Matthew Wahab wrote:
 >>
 >> In short: instructions for direct HFmode arithmetic should be described
 >> with patterns with the standard names.  It's the job of the
 >> architecture-independent compiler to ensure that fp16 arithmetic in the
 >> user's source code only generates direct fp16 arithmetic in GIMPLE (and
 >> thus ends up using those patterns) if that is a correct representation of
 >> the source code's semantics according to ACLE.
 >>
 >> The intrinsics you provide can then be written to use direct arithmetic,
 >> and rely on convert_to_real_1 eliminating the promotions, rather than
 >> needing built-in functions at all, just like many arm_neon.h intrinsics
 >> make direct use of GNU C vector arithmetic.
 >
 > I think it's clear that this has exhausted my knowledge of FP semantics.
 >
 > Forcing promotion to single-precision was to settle concerns brought up in
 > internal discussions about __fp16 semantics. I'll see if anybody has any
 > problem with the changes you suggest.

This patch changes the implementation to use the standard names for the
HFmode arithmetic. Later patches will also be updated to use the
arithmetic operators where appropriate.
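
As an illustration only (a minimal sketch, not part of the patch: it
assumes the scalar intrinsics added later in this series), code written
with the intrinsics now expands through the standard-name patterns:

  #include <arm_fp16.h>

  float16_t
  scale_add (float16_t a, float16_t b, float16_t c)
  {
    /* With this change vmulh_f16 and vaddh_f16 are implemented with
       plain C arithmetic, so GCC expands them through the mulhf3 and
       addhf3 patterns and emits vmul.f16 and vadd.f16 directly.  */
    return vaddh_f16 (vmulh_f16 (a, b), c);
  }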

Changes since the last version of this patch:
- The standard names for plus, minus, mult, div and fma are defined for
   HF mode.
- The patterns supporting the new ACLE intrinsics vnegh_f16, vaddh_f16,
   vsubh_f16, vmulh_f16 and vdivh_f16 are removed, the arithmetic
   operators will be used instead.
- The tests are updated to expect f16 instructions rather than the f32
   instructions that were previously emitted.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/iterators.md (Code iterators): Fix some white-space
	in the comments.
	(GLTE): New.
	(ABSNEG): New.
	(FCVT): Moved from vfp.md.
	(VCVT_HF_US_N): New.
	(VCVT_SI_US_N): New.
	(VCVT_HF_US): New.
	(VCVTH_US): New.
	(FP16_RND): New.
	(absneg_str): New.
	(FCVTI32typename): Moved from vfp.md.
	(sup): Add UNSPEC_VCVTA_S, UNSPEC_VCVTA_U, UNSPEC_VCVTM_S,
	UNSPEC_VCVTM_U, UNSPEC_VCVTN_S, UNSPEC_VCVTN_U, UNSPEC_VCVTP_S,
	UNSPEC_VCVTP_U, UNSPEC_VCVT_HF_S_N, UNSPEC_VCVT_HF_U_N,
	UNSPEC_VCVT_SI_S_N, UNSPEC_VCVT_SI_U_N, UNSPEC_VCVTH_S_N,
	UNSPEC_VCVTH_U_N, UNSPEC_VCVTH_S and UNSPEC_VCVTH_U.
	(vcvth_op): New.
	(fp16_rnd_str): New.
	(fp16_rnd_insn): New.
	* config/arm/unspecs.md (UNSPEC_VCVT_HF_S_N): New.
	(UNSPEC_VCVT_HF_U_N): New.
	(UNSPEC_VCVT_SI_S_N): New.
	(UNSPEC_VCVT_SI_U_N): New.
	(UNSPEC_VCVTH_S): New.
	(UNSPEC_VCVTH_U): New.
	(UNSPEC_VCVTA_S): New.
	(UNSPEC_VCVTA_U): New.
	(UNSPEC_VCVTM_S): New.
	(UNSPEC_VCVTM_U): New.
	(UNSPEC_VCVTN_S): New.
	(UNSPEC_VCVTN_U): New.
	(UNSPEC_VCVTP_S): New.
	(UNSPEC_VCVTP_U): New.
	(UNSPEC_VRND): New.
	(UNSPEC_VRNDA): New.
	(UNSPEC_VRNDI): New.
	(UNSPEC_VRNDM): New.
	(UNSPEC_VRNDN): New.
	(UNSPEC_VRNDP): New.
	(UNSPEC_VRNDX): New.
	* config/arm/vfp.md (<absneg_str>hf2): New.
	(neon_vabshf): New.
	(neon_v<fp16_rnd_str>hf): New.
	(neon_vrndihf): New.
	(addhf3): New.
	(subhf3): New.
	(divhf3): New.
	(mulhf3): New.
	(*mulsf3neghf_vfp): New.
	(*negmulhf3_vfp): New.
	(*mulsf3addhf_vfp): New.
	(*mulhf3subhf_vfp): New.
	(*mulhf3neghfaddhf_vfp): New.
	(*mulhf3neghfsubhf_vfp): New.
	(fmahf4): New.
	(neon_vfmahf): New.
	(fmsubhf4_fp16): New.
	(neon_vfmshf): New.
	(*fnmsubhf4): New.
	(*fnmaddhf4): New.
	(neon_vsqrthf): New.
	(neon_vrsqrtshf): New.
	(FCVT): Move to iterators.md.
	(FCVTI32typename): Likewise.
	(neon_vcvth<sup>hf): New.
	(neon_vcvth<sup>si): New.
	(neon_vcvth<sup>_nhf_unspec): New.
	(neon_vcvth<sup>_nhf): New.
	(neon_vcvth<sup>_nsi_unspec): New.
	(neon_vcvth<sup>_nsi): New.
	(neon_vcvt<vcvth_op>h<sup>si): New.
	(neon_<fmaxmin_op>hf): New.

testsuite/
2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/arm/armv8_2-fp16-arith-1.c: New.
	* gcc.target/arm/armv8_2-fp16-conv-1.c: New.


[-- Attachment #2: 0008-PATCH-8-17-ARM-Add-VFP-FP16-arithmetic-instructions.patch --]
[-- Type: text/x-patch, Size: 28343 bytes --]

From 780903a1c5ef2e4393c9ee2843307d9041f36f87 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 14:49:17 +0100
Subject: [PATCH 08/17] [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.

2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/iterators.md (Code iterators): Fix some white-space
	in the comments.
	(GLTE): New.
	(ABSNEG): New.
	(FCVT): Moved from vfp.md.
	(VCVT_HF_US_N): New.
	(VCVT_SI_US_N): New.
	(VCVT_HF_US): New.
	(VCVTH_US): New.
	(FP16_RND): New.
	(absneg_str): New.
	(FCVTI32typename): Moved from vfp.md.
	(sup): Add UNSPEC_VCVTA_S, UNSPEC_VCVTA_U, UNSPEC_VCVTM_S,
	UNSPEC_VCVTM_U, UNSPEC_VCVTN_S, UNSPEC_VCVTN_U, UNSPEC_VCVTP_S,
	UNSPEC_VCVTP_U, UNSPEC_VCVT_HF_S_N, UNSPEC_VCVT_HF_U_N,
	UNSPEC_VCVT_SI_S_N, UNSPEC_VCVT_SI_U_N, UNSPEC_VCVTH_S_N,
	UNSPEC_VCVTH_U_N, UNSPEC_VCVTH_S and UNSPEC_VCVTH_U.
	(vcvth_op): New.
	(fp16_rnd_str): New.
	(fp16_rnd_insn): New.
	* config/arm/unspecs.md (UNSPEC_VCVT_HF_S_N): New.
	(UNSPEC_VCVT_HF_U_N): New.
	(UNSPEC_VCVT_SI_S_N): New.
	(UNSPEC_VCVT_SI_U_N): New.
	(UNSPEC_VCVTH_S): New.
	(UNSPEC_VCVTH_U): New.
	(UNSPEC_VCVTA_S): New.
	(UNSPEC_VCVTA_U): New.
	(UNSPEC_VCVTM_S): New.
	(UNSPEC_VCVTM_U): New.
	(UNSPEC_VCVTN_S): New.
	(UNSPEC_VCVTN_U): New.
	(UNSPEC_VCVTP_S): New.
	(UNSPEC_VCVTP_U): New.
	(UNSPEC_VRND): New.
	(UNSPEC_VRNDA): New.
	(UNSPEC_VRNDI): New.
	(UNSPEC_VRNDM): New.
	(UNSPEC_VRNDN): New.
	(UNSPEC_VRNDP): New.
	(UNSPEC_VRNDX): New.
	* config/arm/vfp.md (<absneg_str>hf2): New.
	(neon_vabshf): New.
	(neon_v<fp16_rnd_str>hf): New.
	(neon_vrndihf): New.
	(addhf3): New.
	(subhf3): New.
	(divhf3): New.
	(mulhf3): New.
	(*mulsf3neghf_vfp): New.
	(*negmulhf3_vfp): New.
	(*mulsf3addhf_vfp): New.
	(*mulhf3subhf_vfp): New.
	(*mulhf3neghfaddhf_vfp): New.
	(*mulhf3neghfsubhf_vfp): New.
	(fmahf4): New.
	(neon_vfmahf): New.
	(fmsubhf4_fp16): New.
	(neon_vfmshf): New.
	(*fnmsubhf4): New.
	(*fnmaddhf4): New.
	(neon_vsqrthf): New.
	(neon_vrsqrtshf): New.
	(FCVT): Move to iterators.md.
	(FCVTI32typename): Likewise.
	(neon_vcvth<sup>hf): New.
	(neon_vcvth<sup>si): New.
	(neon_vcvth<sup>_nhf_unspec): New.
	(neon_vcvth<sup>_nhf): New.
	(neon_vcvth<sup>_nsi_unspec): New.
	(neon_vcvth<sup>_nsi): New.
	(neon_vcvt<vcvth_op>h<sup>si): New.
	(neon_<fmaxmin_op>hf): New.

testsuite/
2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/arm/armv8_2-fp16-arith-1.c: New.
	* gcc.target/arm/armv8_2-fp16-conv-1.c: New.
---
 gcc/config/arm/iterators.md                        |  59 +++-
 gcc/config/arm/unspecs.md                          |  21 ++
 gcc/config/arm/vfp.md                              | 382 ++++++++++++++++++++-
 .../gcc.target/arm/armv8_2-fp16-arith-1.c          |  68 ++++
 gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c | 101 ++++++
 5 files changed, 625 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 3f9d9e4..9371b6a 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -199,14 +199,17 @@
 ;; Code iterators
 ;;----------------------------------------------------------------------------
 
-;; A list of condition codes used in compare instructions where 
-;; the carry flag from the addition is used instead of doing the 
+;; A list of condition codes used in compare instructions where
+;; the carry flag from the addition is used instead of doing the
 ;; compare a second time.
 (define_code_iterator LTUGEU [ltu geu])
 
 ;; The signed gt, ge comparisons
 (define_code_iterator GTGE [gt ge])
 
+;; The signed gt, ge, lt, le comparisons
+(define_code_iterator GLTE [gt ge lt le])
+
 ;; The unsigned gt, ge comparisons
 (define_code_iterator GTUGEU [gtu geu])
 
@@ -235,6 +238,12 @@
 ;; Binary operators whose second operand can be shifted.
 (define_code_iterator SHIFTABLE_OPS [plus minus ior xor and])
 
+;; Operations on the sign of a number.
+(define_code_iterator ABSNEG [abs neg])
+
+;; Conversions.
+(define_code_iterator FCVT [unsigned_float float])
+
 ;; plus and minus are the only SHIFTABLE_OPS for which Thumb2 allows
 ;; a stack pointer opoerand.  The minus operation is a candidate for an rsub
 ;; and hence only plus is supported.
@@ -330,6 +339,22 @@
 
 (define_int_iterator VCVT_US_N [UNSPEC_VCVT_S_N UNSPEC_VCVT_U_N])
 
+(define_int_iterator VCVT_HF_US_N [UNSPEC_VCVT_HF_S_N UNSPEC_VCVT_HF_U_N])
+
+(define_int_iterator VCVT_SI_US_N [UNSPEC_VCVT_SI_S_N UNSPEC_VCVT_SI_U_N])
+
+(define_int_iterator VCVT_HF_US [UNSPEC_VCVTA_S UNSPEC_VCVTA_U
+				 UNSPEC_VCVTM_S UNSPEC_VCVTM_U
+				 UNSPEC_VCVTN_S UNSPEC_VCVTN_U
+				 UNSPEC_VCVTP_S UNSPEC_VCVTP_U])
+
+(define_int_iterator VCVTH_US [UNSPEC_VCVTH_S UNSPEC_VCVTH_U])
+
+;; Operators for FP16 instructions.
+(define_int_iterator FP16_RND [UNSPEC_VRND UNSPEC_VRNDA
+			       UNSPEC_VRNDM UNSPEC_VRNDN
+			       UNSPEC_VRNDP UNSPEC_VRNDX])
+
 (define_int_iterator VQMOVN [UNSPEC_VQMOVN_S UNSPEC_VQMOVN_U])
 
 (define_int_iterator VMOVL [UNSPEC_VMOVL_S UNSPEC_VMOVL_U])
@@ -687,6 +712,12 @@
 (define_code_attr shift [(ashiftrt "ashr") (lshiftrt "lshr")])
 (define_code_attr shifttype [(ashiftrt "signed") (lshiftrt "unsigned")])
 
+;; String representations of operations on the sign of a number.
+(define_code_attr absneg_str [(abs "abs") (neg "neg")])
+
+;; Conversions.
+(define_code_attr FCVTI32typename [(unsigned_float "u32") (float "s32")])
+
 ;;----------------------------------------------------------------------------
 ;; Int attributes
 ;;----------------------------------------------------------------------------
@@ -718,7 +749,13 @@
   (UNSPEC_VPMAX "s") (UNSPEC_VPMAX_U "u")
   (UNSPEC_VPMIN "s") (UNSPEC_VPMIN_U "u")
   (UNSPEC_VCVT_S "s") (UNSPEC_VCVT_U "u")
+  (UNSPEC_VCVTA_S "s") (UNSPEC_VCVTA_U "u")
+  (UNSPEC_VCVTM_S "s") (UNSPEC_VCVTM_U "u")
+  (UNSPEC_VCVTN_S "s") (UNSPEC_VCVTN_U "u")
+  (UNSPEC_VCVTP_S "s") (UNSPEC_VCVTP_U "u")
   (UNSPEC_VCVT_S_N "s") (UNSPEC_VCVT_U_N "u")
+  (UNSPEC_VCVT_HF_S_N "s") (UNSPEC_VCVT_HF_U_N "u")
+  (UNSPEC_VCVT_SI_S_N "s") (UNSPEC_VCVT_SI_U_N "u")
   (UNSPEC_VQMOVN_S "s") (UNSPEC_VQMOVN_U "u")
   (UNSPEC_VMOVL_S "s") (UNSPEC_VMOVL_U "u")
   (UNSPEC_VSHL_S "s") (UNSPEC_VSHL_U "u")
@@ -733,9 +770,25 @@
   (UNSPEC_VSHLL_S_N "s") (UNSPEC_VSHLL_U_N "u")
   (UNSPEC_VSRA_S_N "s") (UNSPEC_VSRA_U_N "u")
   (UNSPEC_VRSRA_S_N "s") (UNSPEC_VRSRA_U_N "u")
-
+  (UNSPEC_VCVTH_S "s") (UNSPEC_VCVTH_U "u")
 ])
 
+(define_int_attr vcvth_op
+ [(UNSPEC_VCVTA_S "a") (UNSPEC_VCVTA_U "a")
+  (UNSPEC_VCVTM_S "m") (UNSPEC_VCVTM_U "m")
+  (UNSPEC_VCVTN_S "n") (UNSPEC_VCVTN_U "n")
+  (UNSPEC_VCVTP_S "p") (UNSPEC_VCVTP_U "p")])
+
+(define_int_attr fp16_rnd_str
+  [(UNSPEC_VRND "rnd") (UNSPEC_VRNDA "rnda")
+   (UNSPEC_VRNDM "rndm") (UNSPEC_VRNDN "rndn")
+   (UNSPEC_VRNDP "rndp") (UNSPEC_VRNDX "rndx")])
+
+(define_int_attr fp16_rnd_insn
+  [(UNSPEC_VRND "vrintz") (UNSPEC_VRNDA "vrinta")
+   (UNSPEC_VRNDM "vrintm") (UNSPEC_VRNDN "vrintn")
+   (UNSPEC_VRNDP "vrintp") (UNSPEC_VRNDX "vrintx")])
+
 (define_int_attr cmp_op_unsp [(UNSPEC_VCEQ "eq") (UNSPEC_VCGT "gt")
                               (UNSPEC_VCGE "ge") (UNSPEC_VCLE "le")
                               (UNSPEC_VCLT "lt") (UNSPEC_VCAGE "ge")
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 5744c62..57a47ff 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -203,6 +203,20 @@
   UNSPEC_VCVT_U
   UNSPEC_VCVT_S_N
   UNSPEC_VCVT_U_N
+  UNSPEC_VCVT_HF_S_N
+  UNSPEC_VCVT_HF_U_N
+  UNSPEC_VCVT_SI_S_N
+  UNSPEC_VCVT_SI_U_N
+  UNSPEC_VCVTH_S
+  UNSPEC_VCVTH_U
+  UNSPEC_VCVTA_S
+  UNSPEC_VCVTA_U
+  UNSPEC_VCVTM_S
+  UNSPEC_VCVTM_U
+  UNSPEC_VCVTN_S
+  UNSPEC_VCVTN_U
+  UNSPEC_VCVTP_S
+  UNSPEC_VCVTP_U
   UNSPEC_VEXT
   UNSPEC_VHADD_S
   UNSPEC_VHADD_U
@@ -365,5 +379,12 @@
   UNSPEC_NVRINTN
   UNSPEC_VQRDMLAH
   UNSPEC_VQRDMLSH
+  UNSPEC_VRND
+  UNSPEC_VRNDA
+  UNSPEC_VRNDI
+  UNSPEC_VRNDM
+  UNSPEC_VRNDN
+  UNSPEC_VRNDP
+  UNSPEC_VRNDX
 ])
 
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index b1c13fa..5d22c34 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -937,9 +937,63 @@
    (set_attr "type" "ffarithd")]
 )
 
+;; ABS and NEG for FP16.
+(define_insn "<absneg_str>hf2"
+  [(set (match_operand:HF 0 "s_register_operand" "=w")
+    (ABSNEG:HF (match_operand:HF 1 "s_register_operand" "w")))]
+ "TARGET_VFP_FP16INST"
+ "v<absneg_str>.f16\t%0, %1"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "ffariths")]
+)
+
+(define_expand "neon_vabshf"
+ [(set
+   (match_operand:HF 0 "s_register_operand")
+   (abs:HF (match_operand:HF 1 "s_register_operand")))]
+ "TARGET_VFP_FP16INST"
+{
+  emit_insn (gen_abshf2 (operands[0], operands[1]));
+  DONE;
+})
+
+;; VRND for FP16.
+(define_insn "neon_v<fp16_rnd_str>hf"
+  [(set (match_operand:HF 0 "s_register_operand" "=w")
+    (unspec:HF
+     [(match_operand:HF 1 "s_register_operand" "w")]
+     FP16_RND))]
+ "TARGET_VFP_FP16INST"
+ "<fp16_rnd_insn>.f16\t%0, %1"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "neon_fp_round_s")]
+)
+
+(define_insn "neon_vrndihf"
+  [(set (match_operand:HF 0 "s_register_operand" "=w")
+    (unspec:HF
+     [(match_operand:HF 1 "s_register_operand" "w")]
+     UNSPEC_VRNDI))]
+  "TARGET_VFP_FP16INST"
+  "vrintr.f16\t%0, %1"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "neon_fp_round_s")]
+)
 
 ;; Arithmetic insns
 
+(define_insn "addhf3"
+  [(set
+    (match_operand:HF 0 "s_register_operand" "=w")
+    (plus:HF
+     (match_operand:HF 1 "s_register_operand" "w")
+     (match_operand:HF 2 "s_register_operand" "w")))]
+ "TARGET_VFP_FP16INST"
+ "vadd.f16\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fadds")]
+)
+
 (define_insn "*addsf3_vfp"
   [(set (match_operand:SF	   0 "s_register_operand" "=t")
 	(plus:SF (match_operand:SF 1 "s_register_operand" "t")
@@ -962,6 +1016,17 @@
    (set_attr "type" "faddd")]
 )
 
+(define_insn "subhf3"
+ [(set
+   (match_operand:HF 0 "s_register_operand" "=w")
+   (minus:HF
+    (match_operand:HF 1 "s_register_operand" "w")
+    (match_operand:HF 2 "s_register_operand" "w")))]
+ "TARGET_VFP_FP16INST"
+ "vsub.f16\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fadds")]
+)
 
 (define_insn "*subsf3_vfp"
   [(set (match_operand:SF	    0 "s_register_operand" "=t")
@@ -988,6 +1053,19 @@
 
 ;; Division insns
 
+;; FP16 Division.
+(define_insn "divhf3"
+  [(set
+    (match_operand:HF	   0 "s_register_operand" "=w")
+    (div:HF
+     (match_operand:HF 1 "s_register_operand" "w")
+     (match_operand:HF 2 "s_register_operand" "w")))]
+  "TARGET_VFP_FP16INST"
+  "vdiv.f16\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fdivs")]
+)
+
 ; VFP9 Erratum 760019: It's potentially unsafe to overwrite the input
 ; operands, so mark the output as early clobber for VFPv2 on ARMv5 or
 ; earlier.
@@ -1018,6 +1096,17 @@
 
 ;; Multiplication insns
 
+(define_insn "mulhf3"
+ [(set
+   (match_operand:HF 0 "s_register_operand" "=w")
+   (mult:HF (match_operand:HF 1 "s_register_operand" "w")
+	    (match_operand:HF 2 "s_register_operand" "w")))]
+  "TARGET_VFP_FP16INST"
+  "vmul.f16\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fmuls")]
+)
+
 (define_insn "*mulsf3_vfp"
   [(set (match_operand:SF	   0 "s_register_operand" "=t")
 	(mult:SF (match_operand:SF 1 "s_register_operand" "t")
@@ -1040,6 +1129,26 @@
    (set_attr "type" "fmuld")]
 )
 
+(define_insn "*mulsf3neghf_vfp"
+  [(set (match_operand:HF		   0 "s_register_operand" "=t")
+	(mult:HF (neg:HF (match_operand:HF 1 "s_register_operand" "t"))
+		 (match_operand:HF	   2 "s_register_operand" "t")))]
+  "TARGET_VFP_FP16INST && !flag_rounding_math"
+  "vnmul.f16\\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fmuls")]
+)
+
+(define_insn "*negmulhf3_vfp"
+  [(set (match_operand:HF		   0 "s_register_operand" "=t")
+	(neg:HF (mult:HF (match_operand:HF 1 "s_register_operand" "t")
+		 (match_operand:HF	   2 "s_register_operand" "t"))))]
+  "TARGET_VFP_FP16INST"
+  "vnmul.f16\\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fmuls")]
+)
+
 (define_insn "*mulsf3negsf_vfp"
   [(set (match_operand:SF		   0 "s_register_operand" "=t")
 	(mult:SF (neg:SF (match_operand:SF 1 "s_register_operand" "t"))
@@ -1089,6 +1198,18 @@
 ;; Multiply-accumulate insns
 
 ;; 0 = 1 * 2 + 0
+(define_insn "*mulsf3addhf_vfp"
+ [(set (match_operand:HF 0 "s_register_operand" "=t")
+       (plus:HF
+	(mult:HF (match_operand:HF 2 "s_register_operand" "t")
+		 (match_operand:HF 3 "s_register_operand" "t"))
+	(match_operand:HF 1 "s_register_operand" "0")))]
+  "TARGET_VFP_FP16INST"
+  "vmla.f16\\t%0, %2, %3"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fmacs")]
+)
+
 (define_insn "*mulsf3addsf_vfp"
   [(set (match_operand:SF		    0 "s_register_operand" "=t")
 	(plus:SF (mult:SF (match_operand:SF 2 "s_register_operand" "t")
@@ -1114,6 +1235,17 @@
 )
 
 ;; 0 = 1 * 2 - 0
+(define_insn "*mulhf3subhf_vfp"
+  [(set (match_operand:HF 0 "s_register_operand" "=t")
+	(minus:HF (mult:HF (match_operand:HF 2 "s_register_operand" "t")
+			   (match_operand:HF 3 "s_register_operand" "t"))
+		  (match_operand:HF 1 "s_register_operand" "0")))]
+  "TARGET_VFP_FP16INST"
+  "vnmls.f16\\t%0, %2, %3"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fmacs")]
+)
+
 (define_insn "*mulsf3subsf_vfp"
   [(set (match_operand:SF		     0 "s_register_operand" "=t")
 	(minus:SF (mult:SF (match_operand:SF 2 "s_register_operand" "t")
@@ -1139,6 +1271,17 @@
 )
 
 ;; 0 = -(1 * 2) + 0
+(define_insn "*mulhf3neghfaddhf_vfp"
+  [(set (match_operand:HF 0 "s_register_operand" "=t")
+	(minus:HF (match_operand:HF 1 "s_register_operand" "0")
+		  (mult:HF (match_operand:HF 2 "s_register_operand" "t")
+			   (match_operand:HF 3 "s_register_operand" "t"))))]
+  "TARGET_VFP_FP16INST"
+  "vmls.f16\\t%0, %2, %3"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fmacs")]
+)
+
 (define_insn "*mulsf3negsfaddsf_vfp"
   [(set (match_operand:SF		     0 "s_register_operand" "=t")
 	(minus:SF (match_operand:SF	     1 "s_register_operand" "0")
@@ -1165,6 +1308,18 @@
 
 
 ;; 0 = -(1 * 2) - 0
+(define_insn "*mulhf3neghfsubhf_vfp"
+  [(set (match_operand:HF 0 "s_register_operand" "=t")
+	(minus:HF (mult:HF
+		   (neg:HF (match_operand:HF 2 "s_register_operand" "t"))
+		   (match_operand:HF 3 "s_register_operand" "t"))
+		  (match_operand:HF 1 "s_register_operand" "0")))]
+  "TARGET_VFP_FP16INST"
+  "vnmla.f16\\t%0, %2, %3"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fmacs")]
+)
+
 (define_insn "*mulsf3negsfsubsf_vfp"
   [(set (match_operand:SF		      0 "s_register_operand" "=t")
 	(minus:SF (mult:SF
@@ -1193,6 +1348,30 @@
 
 ;; Fused-multiply-accumulate
 
+(define_insn "fmahf4"
+  [(set (match_operand:HF 0 "register_operand" "=w")
+    (fma:HF
+     (match_operand:HF 1 "register_operand" "w")
+     (match_operand:HF 2 "register_operand" "w")
+     (match_operand:HF 3 "register_operand" "0")))]
+ "TARGET_VFP_FP16INST"
+ "vfma.f16\\t%0, %1, %2"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "ffmas")]
+)
+
+(define_expand "neon_vfmahf"
+  [(match_operand:HF 0 "s_register_operand")
+   (match_operand:HF 1 "s_register_operand")
+   (match_operand:HF 2 "s_register_operand")
+   (match_operand:HF 3 "s_register_operand")]
+  "TARGET_VFP_FP16INST"
+{
+  emit_insn (gen_fmahf4 (operands[0], operands[2], operands[3],
+			 operands[1]));
+  DONE;
+})
+
 (define_insn "fma<SDF:mode>4"
   [(set (match_operand:SDF 0 "register_operand" "=<F_constraint>")
         (fma:SDF (match_operand:SDF 1 "register_operand" "<F_constraint>")
@@ -1205,6 +1384,30 @@
    (set_attr "type" "ffma<vfp_type>")]
 )
 
+(define_insn "fmsubhf4_fp16"
+ [(set (match_operand:HF 0 "register_operand" "=w")
+   (fma:HF
+    (neg:HF (match_operand:HF 1 "register_operand" "w"))
+    (match_operand:HF 2 "register_operand" "w")
+    (match_operand:HF 3 "register_operand" "0")))]
+ "TARGET_VFP_FP16INST"
+ "vfms.f16\\t%0, %1, %2"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "ffmas")]
+)
+
+(define_expand "neon_vfmshf"
+  [(match_operand:HF 0 "s_register_operand")
+   (match_operand:HF 1 "s_register_operand")
+   (match_operand:HF 2 "s_register_operand")
+   (match_operand:HF 3 "s_register_operand")]
+  "TARGET_VFP_FP16INST"
+{
+  emit_insn (gen_fmsubhf4_fp16 (operands[0], operands[2], operands[3],
+				operands[1]));
+  DONE;
+})
+
 (define_insn "*fmsub<SDF:mode>4"
   [(set (match_operand:SDF 0 "register_operand" "=<F_constraint>")
 	(fma:SDF (neg:SDF (match_operand:SDF 1 "register_operand"
@@ -1218,6 +1421,17 @@
    (set_attr "type" "ffma<vfp_type>")]
 )
 
+(define_insn "*fnmsubhf4"
+  [(set (match_operand:HF 0 "register_operand" "=w")
+	(fma:HF (match_operand:HF 1 "register_operand" "w")
+		 (match_operand:HF 2 "register_operand" "w")
+		 (neg:HF (match_operand:HF 3 "register_operand" "0"))))]
+  "TARGET_VFP_FP16INST"
+  "vfnms.f16\\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "ffmas")]
+)
+
 (define_insn "*fnmsub<SDF:mode>4"
   [(set (match_operand:SDF 0 "register_operand" "=<F_constraint>")
 	(fma:SDF (match_operand:SDF 1 "register_operand" "<F_constraint>")
@@ -1230,6 +1444,17 @@
    (set_attr "type" "ffma<vfp_type>")]
 )
 
+(define_insn "*fnmaddhf4"
+  [(set (match_operand:HF 0 "register_operand" "=w")
+	(fma:HF (neg:HF (match_operand:HF 1 "register_operand" "w"))
+		 (match_operand:HF 2 "register_operand" "w")
+		 (neg:HF (match_operand:HF 3 "register_operand" "0"))))]
+  "TARGET_VFP_FP16INST"
+  "vfnma.f16\\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "ffmas")]
+)
+
 (define_insn "*fnmadd<SDF:mode>4"
   [(set (match_operand:SDF 0 "register_operand" "=<F_constraint>")
 	(fma:SDF (neg:SDF (match_operand:SDF 1 "register_operand"
@@ -1372,6 +1597,27 @@
 
 ;; Sqrt insns.
 
+(define_insn "neon_vsqrthf"
+  [(set (match_operand:HF 0 "s_register_operand" "=w")
+	(sqrt:HF (match_operand:HF 1 "s_register_operand" "w")))]
+  "TARGET_VFP_FP16INST"
+  "vsqrt.f16\t%0, %1"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fsqrts")]
+)
+
+(define_insn "neon_vrsqrtshf"
+  [(set
+    (match_operand:HF 0 "s_register_operand" "=w")
+    (unspec:HF [(match_operand:HF 1 "s_register_operand" "w")
+		(match_operand:HF 2 "s_register_operand" "w")]
+     UNSPEC_VRSQRTS))]
+ "TARGET_VFP_FP16INST"
+ "vrsqrts.f16\t%0, %1, %2"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "fsqrts")]
+)
+
 ; VFP9 Erratum 760019: It's potentially unsafe to overwrite the input
 ; operands, so mark the output as early clobber for VFPv2 on ARMv5 or
 ; earlier.
@@ -1528,9 +1774,6 @@
 )
 
 ;; Fixed point to floating point conversions.
-(define_code_iterator FCVT [unsigned_float float])
-(define_code_attr FCVTI32typename [(unsigned_float "u32") (float "s32")])
-
 (define_insn "*combine_vcvt_f32_<FCVTI32typename>"
   [(set (match_operand:SF 0 "s_register_operand" "=t")
 	(mult:SF (FCVT:SF (match_operand:SI 1 "s_register_operand" "0"))
@@ -1575,6 +1818,125 @@
    (set_attr "type" "f_cvtf2i")]
  )
 
+;; FP16 conversions.
+(define_insn "neon_vcvth<sup>hf"
+ [(set (match_operand:HF 0 "s_register_operand" "=w")
+   (unspec:HF
+    [(match_operand:SI 1 "s_register_operand" "w")]
+    VCVTH_US))]
+ "TARGET_VFP_FP16INST"
+ "vcvt.f16.<sup>%#32\t%0, %1"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "f_cvti2f")]
+)
+
+(define_insn "neon_vcvth<sup>si"
+ [(set (match_operand:SI 0 "s_register_operand" "=w")
+   (unspec:SI
+    [(match_operand:HF 1 "s_register_operand" "w")]
+    VCVTH_US))]
+ "TARGET_VFP_FP16INST"
+ "vcvt.<sup>%#32.f16\t%0, %1"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "f_cvtf2i")]
+)
+
+;; The neon_vcvth<sup>_nhf patterns are used to generate the instruction for the
+;; vcvth_n_f16_<sup>32 arm_fp16 intrinsics.  They are complicated by the
+;; hardware requirement that the source and destination registers are the same
+;; despite having different machine modes.  The approach is to use a temporary
+;; register for the conversion and move that to the correct destination.
+
+;; Generate an unspec pattern for the intrinsic.
+(define_insn "neon_vcvth<sup>_nhf_unspec"
+ [(set
+   (match_operand:SI 0 "s_register_operand" "=w")
+   (unspec:SI
+    [(match_operand:SI 1 "s_register_operand" "0")
+     (match_operand:SI 2 "immediate_operand" "i")]
+    VCVT_HF_US_N))
+ (set
+  (match_operand:HF 3 "s_register_operand" "=w")
+  (float_truncate:HF (float:SF (match_dup 0))))]
+ "TARGET_VFP_FP16INST"
+{
+  neon_const_bounds (operands[2], 1, 33);
+  return "vcvt.f16.<sup>32\t%0, %0, %2\;vmov.f32\t%3, %0";
+}
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "f_cvti2f")]
+)
+
+;; Generate the instruction patterns for the vcvth_n_f16_<sup>32 intrinsics.
+(define_expand "neon_vcvth<sup>_nhf"
+ [(match_operand:HF 0 "s_register_operand")
+  (unspec:HF [(match_operand:SI 1 "s_register_operand")
+	      (match_operand:SI 2 "immediate_operand")]
+   VCVT_HF_US_N)]
+"TARGET_VFP_FP16INST"
+{
+  rtx op1 = gen_reg_rtx (SImode);
+
+  neon_const_bounds (operands[2], 1, 33);
+
+  emit_move_insn (op1, operands[1]);
+  emit_insn (gen_neon_vcvth<sup>_nhf_unspec (op1, op1, operands[2],
+					     operands[0]));
+  DONE;
+})
+
+;; The neon_vcvth<sup>_nsi patterns are used to generate the instruction for the
+;; vcvth_n_<sup>32_f16 arm_fp16 intrinsics.  They have the same restrictions and
+;; are implemented in the same way as the neon_vcvth<sup>_nhf patterns.
+
+;; Generate an unspec pattern, constraining the registers.
+(define_insn "neon_vcvth<sup>_nsi_unspec"
+ [(set (match_operand:SI 0 "s_register_operand" "=w")
+   (unspec:SI
+    [(fix:SI
+      (fix:SF
+       (float_extend:SF
+	(match_operand:HF 1 "s_register_operand" "w"))))
+     (match_operand:SI 2 "immediate_operand" "i")]
+    VCVT_SI_US_N))]
+ "TARGET_VFP_FP16INST"
+{
+  neon_const_bounds (operands[2], 1, 33);
+  return "vmov.f32\t%0, %1\;vcvt.<sup>%#32.f16\t%0, %0, %2";
+}
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "f_cvtf2i")]
+)
+
+;; Generate the instruction patterns for the vcvth_n_<sup>32_f16 intrinsics.
+(define_expand "neon_vcvth<sup>_nsi"
+ [(match_operand:SI 0 "s_register_operand")
+  (unspec:SI
+   [(match_operand:HF 1 "s_register_operand")
+    (match_operand:SI 2 "immediate_operand")]
+   VCVT_SI_US_N)]
+ "TARGET_VFP_FP16INST"
+{
+  rtx op1 = gen_reg_rtx (SImode);
+
+  neon_const_bounds (operands[2], 1, 33);
+  emit_insn (gen_neon_vcvth<sup>_nsi_unspec (op1, operands[1], operands[2]));
+  emit_move_insn (operands[0], op1);
+  DONE;
+})
+
+(define_insn "neon_vcvt<vcvth_op>h<sup>si"
+ [(set
+   (match_operand:SI 0 "s_register_operand" "=w")
+   (unspec:SI
+    [(match_operand:HF 1 "s_register_operand" "w")]
+    VCVT_HF_US))]
+ "TARGET_VFP_FP16INST"
+ "vcvt<vcvth_op>.<sup>%#32.f16\t%0, %1"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "f_cvtf2i")]
+)
+
 ;; Store multiple insn used in function prologue.
 (define_insn "*push_multi_vfp"
   [(match_parallel 2 "multi_register_push"
@@ -1644,6 +2006,20 @@
 )
 
 ;; Scalar forms for the IEEE-754 fmax()/fmin() functions
+
+(define_insn "neon_<fmaxmin_op>hf"
+ [(set
+   (match_operand:HF 0 "s_register_operand" "=w")
+   (unspec:HF
+    [(match_operand:HF 1 "s_register_operand" "w")
+     (match_operand:HF 2 "s_register_operand" "w")]
+    VMAXMINFNM))]
+ "TARGET_VFP_FP16INST"
+ "<fmaxmin_op>.f16\t%0, %1, %2"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "f_minmaxs")]
+)
+
 (define_insn "<fmaxmin><mode>3"
   [(set (match_operand:SDF 0 "s_register_operand" "=<F_constraint>")
 	(unspec:SDF [(match_operand:SDF 1 "s_register_operand" "<F_constraint>")
diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c
new file mode 100644
index 0000000..e7da3fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c
@@ -0,0 +1,68 @@
+/* { dg-do compile }  */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok }  */
+/* { dg-options "-O2 -ffast-math" }  */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+/* Test instructions generated for half-precision arithmetic.  */
+
+typedef __fp16 float16_t;
+typedef __simd64_float16_t float16x4_t;
+typedef __simd128_float16_t float16x8_t;
+
+float16_t
+fp16_abs (float16_t a)
+{
+  return (a < 0) ? -a : a;
+}
+
+#define TEST_UNOP(NAME, OPERATOR, TY)		\
+  TY test_##NAME##_##TY (TY a)			\
+  {						\
+    return OPERATOR (a);			\
+  }
+
+#define TEST_BINOP(NAME, OPERATOR, TY)		\
+  TY test_##NAME##_##TY (TY a, TY b)		\
+  {						\
+    return a OPERATOR b;			\
+  }
+
+#define TEST_CMP(NAME, OPERATOR, RTY, TY)	\
+  RTY test_##NAME##_##TY (TY a, TY b)		\
+  {						\
+    return a OPERATOR b;			\
+  }
+
+/* Scalars.  */
+
+TEST_UNOP (neg, -, float16_t)
+TEST_UNOP (abs, fp16_abs, float16_t)
+
+TEST_BINOP (add, +, float16_t)
+TEST_BINOP (sub, -, float16_t)
+TEST_BINOP (mult, *, float16_t)
+TEST_BINOP (div, /, float16_t)
+
+TEST_CMP (equal, ==, int, float16_t)
+TEST_CMP (unequal, !=, int, float16_t)
+TEST_CMP (lessthan, <, int, float16_t)
+TEST_CMP (greaterthan, >, int, float16_t)
+TEST_CMP (lessthanequal, <=, int, float16_t)
+TEST_CMP (greaterthanequal, >=, int, float16_t)
+
+/* { dg-final { scan-assembler-times {vneg\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+/* { dg-final { scan-assembler-times {vabs\.f16\ts[0-9]+, s[0-9]+} 2 } }  */
+
+/* { dg-final { scan-assembler-times {vadd\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+/* { dg-final { scan-assembler-times {vsub\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+/* { dg-final { scan-assembler-times {vmul\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+/* { dg-final { scan-assembler-times {vdiv\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+/* { dg-final { scan-assembler-times {vcmp\.f32\ts[0-9]+, s[0-9]+} 2 } }  */
+/* { dg-final { scan-assembler-times {vcmpe\.f32\ts[0-9]+, s[0-9]+} 4 } }  */
+
+/* { dg-final { scan-assembler-not {vadd\.f32} } }  */
+/* { dg-final { scan-assembler-not {vsub\.f32} } }  */
+/* { dg-final { scan-assembler-not {vmul\.f32} } }  */
+/* { dg-final { scan-assembler-not {vdiv\.f32} } }  */
+/* { dg-final { scan-assembler-not {vcmp\.f16} } }  */
+/* { dg-final { scan-assembler-not {vcmpe\.f16} } }  */
diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c
new file mode 100644
index 0000000..c9639a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c
@@ -0,0 +1,101 @@
+/* { dg-do compile }  */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok }  */
+/* { dg-options "-O2" }  */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+/* Test ARMv8.2 FP16 conversions.  */
+#include <arm_fp16.h>
+
+float
+f16_to_f32 (__fp16 a)
+{
+  return (float)a;
+}
+
+float
+f16_to_pf32 (__fp16* a)
+{
+  return (float)*a;
+}
+
+short
+f16_to_s16 (__fp16 a)
+{
+  return (short)a;
+}
+
+short
+pf16_to_s16 (__fp16* a)
+{
+  return (short)*a;
+}
+
+/* { dg-final { scan-assembler-times {vcvtb\.f32\.f16\ts[0-9]+, s[0-9]+} 4 } }  */
+
+__fp16
+f32_to_f16 (float a)
+{
+  return (__fp16)a;
+}
+
+void
+f32_to_pf16 (__fp16* x, float a)
+{
+  *x = (__fp16)a;
+}
+
+__fp16
+s16_to_f16 (short a)
+{
+  return (__fp16)a;
+}
+
+void
+s16_to_pf16 (__fp16* x, short a)
+{
+  *x = (__fp16)a;
+}
+
+/* { dg-final { scan-assembler-times {vcvtb\.f16\.f32\ts[0-9]+, s[0-9]+} 4 } }  */
+
+float
+s16_to_f32 (short a)
+{
+  return (float)a;
+}
+
+/* { dg-final { scan-assembler-times {vcvt\.f32\.s32\ts[0-9]+, s[0-9]+} 3 } }  */
+
+short
+f32_to_s16 (float a)
+{
+  return (short)a;
+}
+
+/* { dg-final { scan-assembler-times {vcvt\.s32\.f32\ts[0-9]+, s[0-9]+} 3 } }  */
+
+unsigned short
+f32_to_u16 (float a)
+{
+  return (unsigned short)a;
+}
+
+/* { dg-final { scan-assembler-times {vcvt\.u32\.f32\ts[0-9]+, s[0-9]+} 1 } }  */
+
+short
+f64_to_s16 (double a)
+{
+  return (short)a;
+}
+
+/* { dg-final { scan-assembler-times {vcvt\.s32\.f64\ts[0-9]+, d[0-9]+} 1 } }  */
+
+unsigned short
+f64_to_u16 (double a)
+{
+  return (unsigned short)a;
+}
+
+/* { dg-final { scan-assembler-times {vcvt\.u32\.f64\ts[0-9]+, d[0-9]+} 1 } }  */
+
+
-- 
2.1.4


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 9/17][ARM] Add NEON FP16 arithmetic instructions.
  2016-05-18  0:58   ` Joseph Myers
  2016-05-19 17:01     ` Jiong Wang
@ 2016-07-04 14:09     ` Matthew Wahab
  2016-07-28 11:53       ` Ramana Radhakrishnan
  1 sibling, 1 reply; 73+ messages in thread
From: Matthew Wahab @ 2016-07-04 14:09 UTC (permalink / raw)
  To: Joseph Myers; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4937 bytes --]

On 18/05/16 01:58, Joseph Myers wrote:
 > On Tue, 17 May 2016, Matthew Wahab wrote:
 >
 >> As with the VFP FP16 arithmetic instructions, operations on __fp16
 >> values are done by conversion to single-precision. Any new optimization
 >> supported by the instruction descriptions can only apply to code
 >> generated using intrinsics added in this patch series.
 >
 > As with the scalar instructions, I think it is legitimate in most cases to
 > optimize arithmetic via single precision to work direct on __fp16 values
 > (and this would be natural for vectorization of __fp16 arithmetic).
 >
 >> A number of the instructions are modelled as two variants, one using
 >> UNSPEC and the other using RTL operations, with the model used decided
 >> by the funsafe-math-optimizations flag. This follows the
 >> single-precision instructions and is due to the half-precision
 >> operations having the same conditions and restrictions on their use in
 >> optimizations (when they are enabled).
 >
 > (Of course, these restrictions still apply.)

The F16 support generally follows the F32 implementation and, for F32,
direct arithmetic vector operations are only available when
unsafe-math-optimizations is enabled. I want to check the behaviour of
the F16 operations when unsafe-math is enabled, so I'll defer the change
to use standard names for the vector operations to a follow-up patch.

There are still some changes from the previous patch:

- Two fma/fmsub patterns, *fma<VH:mode>4 and *fmsub<VH:mode>4, are
   dropped since they just duplicated *fma<VH:mode>4_intrinsic and
   *fmsub<VH:mode>4_intrinsic.

- Patterns neon_vadd<mode>_unspec and neon_vsub<mode>_unspec are
   dropped, they were redundant.

- <absneg_str><mode>2_fp16 is renamed to <absneg_str><mode>2. This
   implements the abs and neg operations which are always safe to use.

- neon_vsqrte<mode> is renamed to neon_vrsqrte<mode>. This is a
   misspelled intrinsic that wasn't caught in testing because the
   relevant test case is missing. The intrinsic is fixed here and in
   other patches and an advsimd-intrinsics test added later in the
   (updated) series.

- neon_vcvt<sup>_n<mode>: The bounds on the scalar were wrong; the
   correct range for f16 is 0-17.

- Test armv8_2-fp16-arith-1.c is updated to expect f16 arithmetic
   instructions rather than f32 and to use the neon command line options.
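
For illustration only (a hypothetical sketch; the vector intrinsics
themselves are added later in this series), code using the new NEON
FP16 patterns would look like:

  #include <arm_neon.h>

  float16x4_t
  mul_accum (float16x4_t acc, float16x4_t a, float16x4_t b)
  {
    /* vfma_f16 (acc, a, b) computes acc + a * b; it expands through
       the fma<VH:mode>4_intrinsic pattern and emits vfma.f16 on the
       D registers.  */
    return vfma_f16 (acc, a, b);
  }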

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/iterators.md (VCVTHI): New.
	(NEON_VCMP): Add UNSPEC_VCLT and UNSPEC_VCLE.  Fix a long line.
	(NEON_VAGLTE): New.
	(VFM_LANE_AS): New.
	(VH_CVTTO): New.
	(V_reg): Add HF, V4HF and V8HF.  Fix white-space.
	(V_HALF): Add V4HF.  Fix white-space.
	(V_if_elem): Add HF, V4HF and V8HF.  Fix white-space.
	(V_s_elem): Likewise.
	(V_sz_elem): Fix white-space.
	(V_elem_ch): Likewise.
	(VH_elem_ch): New.
	(scalar_mul_constraint): Add V8HF and V4HF.
	(Is_float_mode): Fix white-space.
	(Is_d_reg): Fix white-space.
	(q): Add HF.  Fix white-space.
	(float_sup): New.
	(float_SUP): New.
	(cmp_op_unsp): Add UNSPEC_VCALE and UNSPEC_VCALT.
	(neon_vfm_lane_as): New.
	* config/arm/neon.md (add<mode>3_fp16): New.
	(sub<mode>3_fp16): New.
	(mul<mode>3add<mode>_neon): New.
	(fma<VH:mode>4_intrinsic): New.
	(fmsub<VCVTF:mode>4_intrinsic): Fix white-space.
	(fmsub<VH:mode>4_intrinsic): New.
	(<absneg_str><mode>2): New.
	(neon_v<absneg_str><mode>): New.
	(neon_v<fp16_rnd_str><mode>): New.
	(neon_vrsqrte<mode>): New.
	(neon_vpaddv4hf): New.
	(neon_vadd<mode>): New.
	(neon_vsub<mode>): New.
	(neon_vmulf<mode>): New.
	(neon_vfma<VH:mode>): New.
	(neon_vfms<VH:mode>): New.
	(neon_vc<cmp_op><mode>): New.
	(neon_vc<cmp_op><mode>_fp16insn): New.
	(neon_vc<cmp_op_unsp><mode>_fp16insn_unspec): New.
	(neon_vca<cmp_op><mode>): New.
	(neon_vca<cmp_op><mode>_fp16insn): New.
	(neon_vca<cmp_op_unsp><mode>_fp16insn_unspec): New.
	(neon_vc<cmp_op>z<mode>): New.
	(neon_vabd<mode>): New.
	(neon_v<maxmin>f<mode>): New.
	(neon_vp<maxmin>fv4hf): New.
	(neon_<fmaxmin_op><mode>): New.
	(neon_vrecps<mode>): New.
	(neon_vrsqrts<mode>): New.
	(neon_vrecpe<mode>): New (VH variant).
	(neon_vdup_lane<mode>_internal): New.
	(neon_vdup_lane<mode>): New.
	(neon_vcvt<sup><mode>): New (VCVTHI variant).
	(neon_vcvt<sup><mode>): New (VH variant).
	(neon_vcvt<sup>_n<mode>): New (VH variant).
	(neon_vcvt<sup>_n<mode>): New (VCVTHI variant).
	(neon_vcvt<vcvth_op><sup><mode>): New.
	(neon_vmul_lane<mode>): New.
	(neon_vmul_n<mode>): New.
	* config/arm/unspecs.md (UNSPEC_VCALE): New.
	(UNSPEC_VCALT): New.
	(UNSPEC_VFMA_LANE): New.
	(UNSPEC_VFMS_LANE): New.

testsuite/
2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/arm/armv8_2-fp16-arith-1.c: Use arm_v8_2a_fp16_neon
	options.  Add tests for float16x4_t and float16x8_t.


[-- Attachment #2: 0009-PATCH-9-17-ARM-Add-NEON-FP16-arithmetic-instructions.patch --]
[-- Type: text/x-patch, Size: 36684 bytes --]

From 4cbebc297f74f0c2e3ddac600d7902083c09c934 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 16:19:57 +0100
Subject: [PATCH 09/17] [PATCH 9/17][ARM] Add NEON FP16 arithmetic
 instructions.

2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/iterators.md (VCVTHI): New.
	(NEON_VCMP): Add UNSPEC_VCLT and UNSPEC_VCLE.  Fix a long line.
	(NEON_VAGLTE): New.
	(VFM_LANE_AS): New.
	(VH_CVTTO): New.
	(V_reg): Add HF, V4HF and V8HF.  Fix white-space.
	(V_HALF): Add V4HF.  Fix white-space.
	(V_if_elem): Add HF, V4HF and V8HF.  Fix white-space.
	(V_s_elem): Likewise.
	(V_sz_elem): Fix white-space.
	(V_elem_ch): Likewise.
	(VH_elem_ch): New.
	(scalar_mul_constraint): Add V8HF and V4HF.
	(Is_float_mode): Fix white-space.
	(Is_d_reg): Fix white-space.
	(q): Add HF.  Fix white-space.
	(float_sup): New.
	(float_SUP): New.
	(cmp_op_unsp): Add UNSPEC_VCALE and UNSPEC_VCALT.
	(neon_vfm_lane_as): New.
	* config/arm/neon.md (add<mode>3_fp16): New.
	(sub<mode>3_fp16): New.
	(mul<mode>3add<mode>_neon): New.
	(fma<VH:mode>4_intrinsic): New.
	(fmsub<VCVTF:mode>4_intrinsic): Fix white-space.
	(fmsub<VH:mode>4_intrinsic): New.
	(<absneg_str><mode>2): New.
	(neon_v<absneg_str><mode>): New.
	(neon_v<fp16_rnd_str><mode>): New.
	(neon_vrsqrte<mode>): New.
	(neon_vpaddv4hf): New.
	(neon_vadd<mode>): New.
	(neon_vsub<mode>): New.
	(neon_vmulf<mode>): New.
	(neon_vfma<VH:mode>): New.
	(neon_vfms<VH:mode>): New.
	(neon_vc<cmp_op><mode>): New.
	(neon_vc<cmp_op><mode>_fp16insn): New.
	(neon_vc<cmp_op_unsp><mode>_fp16insn_unspec): New.
	(neon_vca<cmp_op><mode>): New.
	(neon_vca<cmp_op><mode>_fp16insn): New.
	(neon_vca<cmp_op_unsp><mode>_fp16insn_unspec): New.
	(neon_vc<cmp_op>z<mode>): New.
	(neon_vabd<mode>): New.
	(neon_v<maxmin>f<mode>): New.
	(neon_vp<maxmin>fv4hf): New.
	(neon_<fmaxmin_op><mode>): New.
	(neon_vrecps<mode>): New.
	(neon_vrsqrts<mode>): New.
	(neon_vrecpe<mode>): New (VH variant).
	(neon_vdup_lane<mode>_internal): New.
	(neon_vdup_lane<mode>): New.
	(neon_vcvt<sup><mode>): New (VCVTHI variant).
	(neon_vcvt<sup><mode>): New (VH variant).
	(neon_vcvt<sup>_n<mode>): New (VH variant).
	(neon_vcvt<sup>_n<mode>): New (VCVTHI variant).
	(neon_vcvt<vcvth_op><sup><mode>): New.
	(neon_vmul_lane<mode>): New.
	(neon_vmul_n<mode>): New.
	* config/arm/unspecs.md (UNSPEC_VCALE): New.
	(UNSPEC_VCALT): New.
	(UNSPEC_VFMA_LANE): New.
	(UNSPEC_VFMS_LANE): New.

testsuite/
2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/arm/armv8_2-fp16-arith-1.c: Use arm_v8_2a_fp16_neon
	options.  Add tests for float16x4_t and float16x8_t.
---
 gcc/config/arm/iterators.md                        | 121 ++++--
 gcc/config/arm/neon.md                             | 459 ++++++++++++++++++++-
 gcc/config/arm/unspecs.md                          |   5 +-
 .../gcc.target/arm/armv8_2-fp16-arith-1.c          |  53 ++-
 4 files changed, 579 insertions(+), 59 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 9371b6a..be39e4a 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -145,6 +145,9 @@
 ;; Vector modes form int->float conversions.
 (define_mode_iterator VCVTI [V2SI V4SI])
 
+;; Vector modes for int->half conversions.
+(define_mode_iterator VCVTHI [V4HI V8HI])
+
 ;; Vector modes for doubleword multiply-accumulate, etc. insns.
 (define_mode_iterator VMD [V4HI V2SI V2SF])
 
@@ -267,10 +270,14 @@
 (define_int_iterator VRINT [UNSPEC_VRINTZ UNSPEC_VRINTP UNSPEC_VRINTM
                             UNSPEC_VRINTR UNSPEC_VRINTX UNSPEC_VRINTA])
 
-(define_int_iterator NEON_VCMP [UNSPEC_VCEQ UNSPEC_VCGT UNSPEC_VCGE UNSPEC_VCLT UNSPEC_VCLE])
+(define_int_iterator NEON_VCMP [UNSPEC_VCEQ UNSPEC_VCGT UNSPEC_VCGE
+				UNSPEC_VCLT UNSPEC_VCLE])
 
 (define_int_iterator NEON_VACMP [UNSPEC_VCAGE UNSPEC_VCAGT])
 
+(define_int_iterator NEON_VAGLTE [UNSPEC_VCAGE UNSPEC_VCAGT
+				  UNSPEC_VCALE UNSPEC_VCALT])
+
 (define_int_iterator VCVT [UNSPEC_VRINTP UNSPEC_VRINTM UNSPEC_VRINTA])
 
 (define_int_iterator NEON_VRINT [UNSPEC_NVRINTP UNSPEC_NVRINTZ UNSPEC_NVRINTM
@@ -398,6 +405,8 @@
 
 (define_int_iterator VQRDMLH_AS [UNSPEC_VQRDMLAH UNSPEC_VQRDMLSH])
 
+(define_int_iterator VFM_LANE_AS [UNSPEC_VFMA_LANE UNSPEC_VFMS_LANE])
+
 ;;----------------------------------------------------------------------------
 ;; Mode attributes
 ;;----------------------------------------------------------------------------
@@ -416,6 +425,10 @@
 (define_mode_attr V_cvtto [(V2SI "v2sf") (V2SF "v2si")
                            (V4SI "v4sf") (V4SF "v4si")])
 
+;; (Opposite) mode to convert to/from for vector-half mode conversions.
+(define_mode_attr VH_CVTTO [(V4HI "V4HF") (V4HF "V4HI")
+			    (V8HI "V8HF") (V8HF "V8HI")])
+
 ;; Define element mode for each vector mode.
 (define_mode_attr V_elem [(V8QI "QI") (V16QI "QI")
 			  (V4HI "HI") (V8HI "HI")
@@ -459,12 +472,13 @@
 
 ;; Register width from element mode
 (define_mode_attr V_reg [(V8QI "P") (V16QI "q")
-                         (V4HI "P") (V8HI  "q")
-                         (V4HF "P") (V8HF  "q")
-                         (V2SI "P") (V4SI  "q")
-                         (V2SF "P") (V4SF  "q")
-                         (DI   "P") (V2DI  "q")
-                         (SF   "")  (DF    "P")])
+			 (V4HI "P") (V8HI  "q")
+			 (V4HF "P") (V8HF  "q")
+			 (V2SI "P") (V4SI  "q")
+			 (V2SF "P") (V4SF  "q")
+			 (DI   "P") (V2DI  "q")
+			 (SF   "")  (DF    "P")
+			 (HF   "")])
 
 ;; Wider modes with the same number of elements.
 (define_mode_attr V_widen [(V8QI "V8HI") (V4HI "V4SI") (V2SI "V2DI")])
@@ -480,7 +494,7 @@
 (define_mode_attr V_HALF [(V16QI "V8QI") (V8HI "V4HI")
 			  (V8HF "V4HF") (V4SI  "V2SI")
 			  (V4SF "V2SF") (V2DF "DF")
-                          (V2DI "DI")])
+			  (V2DI "DI") (V4HF "HF")])
 
 ;; Same, but lower-case.
 (define_mode_attr V_half [(V16QI "v8qi") (V8HI "v4hi")
@@ -529,18 +543,22 @@
 ;; Get element type from double-width mode, for operations where we 
 ;; don't care about signedness.
 (define_mode_attr V_if_elem [(V8QI "i8")  (V16QI "i8")
-                 (V4HI "i16") (V8HI  "i16")
-                             (V2SI "i32") (V4SI  "i32")
-                             (DI   "i64") (V2DI  "i64")
-                 (V2SF "f32") (V4SF  "f32")
-                 (SF "f32") (DF "f64")])
+			     (V4HI "i16") (V8HI  "i16")
+			     (V2SI "i32") (V4SI  "i32")
+			     (DI   "i64") (V2DI  "i64")
+			     (V2SF "f32") (V4SF  "f32")
+			     (SF   "f32") (DF    "f64")
+			     (HF   "f16") (V4HF  "f16")
+			     (V8HF "f16")])
 
 ;; Same, but for operations which work on signed values.
 (define_mode_attr V_s_elem [(V8QI "s8")  (V16QI "s8")
-                (V4HI "s16") (V8HI  "s16")
-                            (V2SI "s32") (V4SI  "s32")
-                            (DI   "s64") (V2DI  "s64")
-                (V2SF "f32") (V4SF  "f32")])
+			    (V4HI "s16") (V8HI  "s16")
+			    (V2SI "s32") (V4SI  "s32")
+			    (DI   "s64") (V2DI  "s64")
+			    (V2SF "f32") (V4SF  "f32")
+			    (HF   "f16") (V4HF  "f16")
+			    (V8HF "f16")])
 
 ;; Same, but for operations which work on unsigned values.
 (define_mode_attr V_u_elem [(V8QI "u8")  (V16QI "u8")
@@ -557,17 +575,22 @@
                              (V2SF "32") (V4SF "32")])
 
 (define_mode_attr V_sz_elem [(V8QI "8")  (V16QI "8")
-                 (V4HI "16") (V8HI  "16")
-                             (V2SI "32") (V4SI  "32")
-                             (DI   "64") (V2DI  "64")
+			     (V4HI "16") (V8HI  "16")
+			     (V2SI "32") (V4SI  "32")
+			     (DI   "64") (V2DI  "64")
 			     (V4HF "16") (V8HF "16")
-                 (V2SF "32") (V4SF  "32")])
+			     (V2SF "32") (V4SF  "32")])
 
 (define_mode_attr V_elem_ch [(V8QI "b")  (V16QI "b")
-                             (V4HI "h") (V8HI  "h")
-                             (V2SI "s") (V4SI  "s")
-                             (DI   "d") (V2DI  "d")
-                             (V2SF "s") (V4SF  "s")])
+			     (V4HI "h") (V8HI  "h")
+			     (V2SI "s") (V4SI  "s")
+			     (DI   "d") (V2DI  "d")
+			     (V2SF "s") (V4SF  "s")])
+
+(define_mode_attr VH_elem_ch [(V4HI "s") (V8HI  "s")
+			      (V4HF "s") (V8HF  "s")
+			      (HF "s")])
 
 ;; Element sizes for duplicating ARM registers to all elements of a vector.
 (define_mode_attr VD_dup [(V8QI "8") (V4HI "16") (V2SI "32") (V2SF "32")])
@@ -603,16 +626,17 @@
 ;; This mode attribute is used to obtain the correct register constraints.
 
 (define_mode_attr scalar_mul_constraint [(V4HI "x") (V2SI "t") (V2SF "t")
-                                         (V8HI "x") (V4SI "t") (V4SF "t")])
+					 (V8HI "x") (V4SI "t") (V4SF "t")
+					 (V8HF "x") (V4HF "x")])
 
 ;; Predicates used for setting type for neon instructions
 
 (define_mode_attr Is_float_mode [(V8QI "false") (V16QI "false")
-                 (V4HI "false") (V8HI "false")
-                 (V2SI "false") (V4SI "false")
-                 (V4HF "true") (V8HF "true")
-                 (V2SF "true") (V4SF "true")
-                 (DI "false") (V2DI "false")])
+				 (V4HI "false") (V8HI "false")
+				 (V2SI "false") (V4SI "false")
+				 (V4HF "true") (V8HF "true")
+				 (V2SF "true") (V4SF "true")
+				 (DI "false") (V2DI "false")])
 
 (define_mode_attr Scalar_mul_8_16 [(V8QI "true") (V16QI "true")
 				   (V4HI "true") (V8HI "true")
@@ -621,10 +645,10 @@
 				   (DI "false") (V2DI "false")])
 
 (define_mode_attr Is_d_reg [(V8QI "true") (V16QI "false")
-                            (V4HI "true") (V8HI  "false")
-                            (V2SI "true") (V4SI  "false")
-                            (V2SF "true") (V4SF  "false")
-                            (DI   "true") (V2DI  "false")
+			    (V4HI "true") (V8HI  "false")
+			    (V2SI "true") (V4SI  "false")
+			    (V2SF "true") (V4SF  "false")
+			    (DI   "true") (V2DI  "false")
 			    (V4HF "true") (V8HF  "false")])
 
 (define_mode_attr V_mode_nunits [(V8QI "8") (V16QI "16")
@@ -670,12 +694,14 @@
 
 ;; Mode attribute used to build the "type" attribute.
 (define_mode_attr q [(V8QI "") (V16QI "_q")
-                     (V4HI "") (V8HI "_q")
-                     (V2SI "") (V4SI "_q")
+		     (V4HI "") (V8HI "_q")
+		     (V2SI "") (V4SI "_q")
 		     (V4HF "") (V8HF "_q")
-                     (V2SF "") (V4SF "_q")
-                     (DI "")   (V2DI "_q")
-                     (DF "")   (V2DF "_q")])
+		     (V2SF "") (V4SF "_q")
+		     (DI "")   (V2DI "_q")
+		     (DF "")   (V2DF "_q")
+		     (HF "")])
 
 (define_mode_attr pf [(V8QI "p") (V16QI "p") (V2SF "f") (V4SF "f")])
 
@@ -718,6 +744,10 @@
 ;; Conversions.
 (define_code_attr FCVTI32typename [(unsigned_float "u32") (float "s32")])
 
+(define_code_attr float_sup [(unsigned_float "u") (float "s")])
+
+(define_code_attr float_SUP [(unsigned_float "U") (float "S")])
+
 ;;----------------------------------------------------------------------------
 ;; Int attributes
 ;;----------------------------------------------------------------------------
@@ -790,9 +820,10 @@
    (UNSPEC_VRNDP "vrintp") (UNSPEC_VRNDX "vrintx")])
 
 (define_int_attr cmp_op_unsp [(UNSPEC_VCEQ "eq") (UNSPEC_VCGT "gt")
-                              (UNSPEC_VCGE "ge") (UNSPEC_VCLE "le")
-                              (UNSPEC_VCLT "lt") (UNSPEC_VCAGE "ge")
-                              (UNSPEC_VCAGT "gt")])
+			      (UNSPEC_VCGE "ge") (UNSPEC_VCLE "le")
+			      (UNSPEC_VCLT "lt") (UNSPEC_VCAGE "ge")
+			      (UNSPEC_VCAGT "gt") (UNSPEC_VCALE "le")
+			      (UNSPEC_VCALT "lt")])
 
 (define_int_attr r [
   (UNSPEC_VRHADD_S "r") (UNSPEC_VRHADD_U "r")
@@ -908,3 +939,7 @@
 
 ;; Attributes for VQRDMLAH/VQRDMLSH
 (define_int_attr neon_rdma_as [(UNSPEC_VQRDMLAH "a") (UNSPEC_VQRDMLSH "s")])
+
+;; Attributes for VFMA_LANE/VFMS_LANE
+(define_int_attr neon_vfm_lane_as
+ [(UNSPEC_VFMA_LANE "a") (UNSPEC_VFMS_LANE "s")])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index c7bb121..0532333 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -505,6 +505,20 @@
                     (const_string "neon_add<q>")))]
 )
 
+(define_insn "add<mode>3_fp16"
+  [(set
+    (match_operand:VH 0 "s_register_operand" "=w")
+    (plus:VH
+     (match_operand:VH 1 "s_register_operand" "w")
+     (match_operand:VH 2 "s_register_operand" "w")))]
+ "TARGET_NEON_FP16INST"
+ "vadd.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set (attr "type")
+   (if_then_else (match_test "<Is_float_mode>")
+    (const_string "neon_fp_addsub_s<q>")
+    (const_string "neon_add<q>")))]
+)
+
 (define_insn "adddi3_neon"
   [(set (match_operand:DI 0 "s_register_operand" "=w,?&r,?&r,?w,?&r,?&r,?&r")
         (plus:DI (match_operand:DI 1 "s_register_operand" "%w,0,0,w,r,0,r")
@@ -543,6 +557,17 @@
                     (const_string "neon_sub<q>")))]
 )
 
+(define_insn "sub<mode>3_fp16"
+ [(set
+   (match_operand:VH 0 "s_register_operand" "=w")
+   (minus:VH
+    (match_operand:VH 1 "s_register_operand" "w")
+    (match_operand:VH 2 "s_register_operand" "w")))]
+ "TARGET_NEON_FP16INST"
+ "vsub.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_sub<q>")]
+)
+
 (define_insn "subdi3_neon"
   [(set (match_operand:DI 0 "s_register_operand" "=w,?&r,?&r,?&r,?w")
         (minus:DI (match_operand:DI 1 "s_register_operand" "w,0,r,0,w")
@@ -591,6 +616,16 @@
 		    (const_string "neon_mla_<V_elem_ch><q>")))]
 )
 
+(define_insn "mul<mode>3add<mode>_neon"
+  [(set (match_operand:VH 0 "s_register_operand" "=w")
+	(plus:VH (mult:VH (match_operand:VH 2 "s_register_operand" "w")
+			  (match_operand:VH 3 "s_register_operand" "w"))
+		  (match_operand:VH 1 "s_register_operand" "0")))]
+  "TARGET_NEON_FP16INST && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+  "vmla.f16\t%<V_reg>0, %<V_reg>2, %<V_reg>3"
+  [(set_attr "type" "neon_fp_mla_s<q>")]
+)
+
 (define_insn "mul<mode>3neg<mode>add<mode>_neon"
   [(set (match_operand:VDQW 0 "s_register_operand" "=w")
         (minus:VDQW (match_operand:VDQW 1 "s_register_operand" "0")
@@ -629,6 +664,19 @@
   [(set_attr "type" "neon_fp_mla_s<q>")]
 )
 
+;; There is limited support for unsafe-math optimizations using the NEON FP16
+;; arithmetic instructions, so only the intrinsic is currently supported.
+(define_insn "fma<VH:mode>4_intrinsic"
+ [(set (match_operand:VH 0 "register_operand" "=w")
+   (fma:VH
+    (match_operand:VH 1 "register_operand" "w")
+    (match_operand:VH 2 "register_operand" "w")
+    (match_operand:VH 3 "register_operand" "0")))]
+ "TARGET_NEON_FP16INST"
+ "vfma.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_mla_s<q>")]
+)
+
 (define_insn "*fmsub<VCVTF:mode>4"
   [(set (match_operand:VCVTF 0 "register_operand" "=w")
         (fma:VCVTF (neg:VCVTF (match_operand:VCVTF 1 "register_operand" "w"))
@@ -640,13 +688,25 @@
 )
 
 (define_insn "fmsub<VCVTF:mode>4_intrinsic"
-  [(set (match_operand:VCVTF 0 "register_operand" "=w")
-        (fma:VCVTF (neg:VCVTF (match_operand:VCVTF 1 "register_operand" "w"))
-		   (match_operand:VCVTF 2 "register_operand" "w")
-		   (match_operand:VCVTF 3 "register_operand" "0")))]
-  "TARGET_NEON && TARGET_FMA"
-  "vfms%?.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
-  [(set_attr "type" "neon_fp_mla_s<q>")]
+ [(set (match_operand:VCVTF 0 "register_operand" "=w")
+   (fma:VCVTF
+    (neg:VCVTF (match_operand:VCVTF 1 "register_operand" "w"))
+    (match_operand:VCVTF 2 "register_operand" "w")
+    (match_operand:VCVTF 3 "register_operand" "0")))]
+ "TARGET_NEON && TARGET_FMA"
+ "vfms%?.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_mla_s<q>")]
+)
+
+(define_insn "fmsub<VH:mode>4_intrinsic"
+ [(set (match_operand:VH 0 "register_operand" "=w")
+   (fma:VH
+    (neg:VH (match_operand:VH 1 "register_operand" "w"))
+    (match_operand:VH 2 "register_operand" "w")
+    (match_operand:VH 3 "register_operand" "0")))]
+ "TARGET_NEON_FP16INST"
+ "vfms.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_mla_s<q>")]
 )
 
 (define_insn "neon_vrint<NEON_VRINT:nvrint_variant><VCVTF:mode>"
@@ -860,6 +920,44 @@
   ""
 )
 
+(define_insn "<absneg_str><mode>2"
+  [(set (match_operand:VH 0 "s_register_operand" "=w")
+    (ABSNEG:VH (match_operand:VH 1 "s_register_operand" "w")))]
+ "TARGET_NEON_FP16INST"
+ "v<absneg_str>.<V_s_elem>\t%<V_reg>0, %<V_reg>1"
+ [(set_attr "type" "neon_abs<q>")]
+)
+
+(define_expand "neon_v<absneg_str><mode>"
+ [(set
+   (match_operand:VH 0 "s_register_operand")
+   (ABSNEG:VH (match_operand:VH 1 "s_register_operand")))]
+ "TARGET_NEON_FP16INST"
+{
+  emit_insn (gen_<absneg_str><mode>2 (operands[0], operands[1]));
+  DONE;
+})
+
+(define_insn "neon_v<fp16_rnd_str><mode>"
+  [(set (match_operand:VH 0 "s_register_operand" "=w")
+    (unspec:VH
+     [(match_operand:VH 1 "s_register_operand" "w")]
+     FP16_RND))]
+ "TARGET_NEON_FP16INST"
+ "<fp16_rnd_insn>.<V_s_elem>\t%<V_reg>0, %<V_reg>1"
+ [(set_attr "type" "neon_fp_round_s<q>")]
+)
+
+(define_insn "neon_vrsqrte<mode>"
+  [(set (match_operand:VH 0 "s_register_operand" "=w")
+    (unspec:VH
+     [(match_operand:VH 1 "s_register_operand" "w")]
+     UNSPEC_VRSQRTE))]
+  "TARGET_NEON_FP16INST"
+  "vrsqrte.f16\t%<V_reg>0, %<V_reg>1"
+ [(set_attr "type" "neon_fp_rsqrte_s<q>")]
+)
+
 (define_insn "*umin<mode>3_neon"
   [(set (match_operand:VDQIW 0 "s_register_operand" "=w")
 	(umin:VDQIW (match_operand:VDQIW 1 "s_register_operand" "w")
@@ -1601,6 +1699,17 @@
                     (const_string "neon_reduc_add<q>")))]
 )
 
+(define_insn "neon_vpaddv4hf"
+ [(set
+   (match_operand:V4HF 0 "s_register_operand" "=w")
+   (unspec:V4HF [(match_operand:V4HF 1 "s_register_operand" "w")
+		 (match_operand:V4HF 2 "s_register_operand" "w")]
+    UNSPEC_VPADD))]
+ "TARGET_NEON_FP16INST"
+ "vpadd.f16\t%P0, %P1, %P2"
+ [(set_attr "type" "neon_reduc_add")]
+)
+
 (define_insn "neon_vpsmin<mode>"
   [(set (match_operand:VD 0 "s_register_operand" "=w")
 	(unspec:VD [(match_operand:VD 1 "s_register_operand" "w")
@@ -1949,6 +2058,26 @@
   DONE;
 })
 
+(define_expand "neon_vadd<mode>"
+  [(match_operand:VH 0 "s_register_operand")
+   (match_operand:VH 1 "s_register_operand")
+   (match_operand:VH 2 "s_register_operand")]
+  "TARGET_NEON_FP16INST"
+{
+  emit_insn (gen_add<mode>3_fp16 (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "neon_vsub<mode>"
+  [(match_operand:VH 0 "s_register_operand")
+   (match_operand:VH 1 "s_register_operand")
+   (match_operand:VH 2 "s_register_operand")]
+  "TARGET_NEON_FP16INST"
+{
+  emit_insn (gen_sub<mode>3_fp16 (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 ; Note that NEON operations don't support the full IEEE 754 standard: in
 ; particular, denormal values are flushed to zero.  This means that GCC cannot
 ; use those instructions for autovectorization, etc. unless
@@ -2040,6 +2169,17 @@
                     (const_string "neon_mul_<V_elem_ch><q>")))]
 )
 
+(define_insn "neon_vmulf<mode>"
+ [(set
+   (match_operand:VH 0 "s_register_operand" "=w")
+   (mult:VH
+    (match_operand:VH 1 "s_register_operand" "w")
+    (match_operand:VH 2 "s_register_operand" "w")))]
+  "TARGET_NEON_FP16INST"
+  "vmul.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_mul_<VH_elem_ch><q>")]
+)
+
 (define_expand "neon_vmla<mode>"
   [(match_operand:VDQW 0 "s_register_operand" "=w")
    (match_operand:VDQW 1 "s_register_operand" "0")
@@ -2068,6 +2208,18 @@
   DONE;
 })
 
+(define_expand "neon_vfma<VH:mode>"
+  [(match_operand:VH 0 "s_register_operand")
+   (match_operand:VH 1 "s_register_operand")
+   (match_operand:VH 2 "s_register_operand")
+   (match_operand:VH 3 "s_register_operand")]
+  "TARGET_NEON_FP16INST"
+{
+  emit_insn (gen_fma<mode>4_intrinsic (operands[0], operands[2], operands[3],
+				       operands[1]));
+  DONE;
+})
+
 (define_expand "neon_vfms<VCVTF:mode>"
   [(match_operand:VCVTF 0 "s_register_operand")
    (match_operand:VCVTF 1 "s_register_operand")
@@ -2080,6 +2232,18 @@
   DONE;
 })
 
+(define_expand "neon_vfms<VH:mode>"
+  [(match_operand:VH 0 "s_register_operand")
+   (match_operand:VH 1 "s_register_operand")
+   (match_operand:VH 2 "s_register_operand")
+   (match_operand:VH 3 "s_register_operand")]
+  "TARGET_NEON_FP16INST"
+{
+  emit_insn (gen_fmsub<mode>4_intrinsic (operands[0], operands[2], operands[3],
+					 operands[1]));
+  DONE;
+})
+
 ; Used for intrinsics when flag_unsafe_math_optimizations is false.
 
 (define_insn "neon_vmla<mode>_unspec"
@@ -2380,6 +2544,72 @@
   [(set_attr "type" "neon_fp_compare_s<q>")]
 )
 
+(define_expand "neon_vc<cmp_op><mode>"
+ [(match_operand:<V_cmp_result> 0 "s_register_operand")
+  (neg:<V_cmp_result>
+   (COMPARISONS:VH
+    (match_operand:VH 1 "s_register_operand")
+    (match_operand:VH 2 "reg_or_zero_operand")))]
+ "TARGET_NEON_FP16INST"
+{
+  /* For FP comparisons, use UNSPECs unless -funsafe-math-optimizations
+     is enabled.  */
+  if (GET_MODE_CLASS (<MODE>mode) == MODE_VECTOR_FLOAT
+      && !flag_unsafe_math_optimizations)
+    emit_insn
+      (gen_neon_vc<cmp_op><mode>_fp16insn_unspec
+       (operands[0], operands[1], operands[2]));
+  else
+    emit_insn
+      (gen_neon_vc<cmp_op><mode>_fp16insn
+       (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_insn "neon_vc<cmp_op><mode>_fp16insn"
+ [(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w,w")
+   (neg:<V_cmp_result>
+    (COMPARISONS:<V_cmp_result>
+     (match_operand:VH 1 "s_register_operand" "w,w")
+     (match_operand:VH 2 "reg_or_zero_operand" "w,Dz"))))]
+ "TARGET_NEON_FP16INST
+  && !(GET_MODE_CLASS (<MODE>mode) == MODE_VECTOR_FLOAT
+  && !flag_unsafe_math_optimizations)"
+{
+  char pattern[100];
+  sprintf (pattern, "vc<cmp_op>.%s%%#<V_sz_elem>\t%%<V_reg>0,"
+	   " %%<V_reg>1, %s",
+	   GET_MODE_CLASS (<MODE>mode) == MODE_VECTOR_FLOAT
+	   ? "f" : "<cmp_type>",
+	   which_alternative == 0
+	   ? "%<V_reg>2" : "#0");
+  output_asm_insn (pattern, operands);
+  return "";
+}
+ [(set (attr "type")
+   (if_then_else (match_operand 2 "zero_operand")
+    (const_string "neon_compare_zero<q>")
+    (const_string "neon_compare<q>")))])
+
+(define_insn "neon_vc<cmp_op_unsp><mode>_fp16insn_unspec"
+ [(set
+   (match_operand:<V_cmp_result> 0 "s_register_operand" "=w,w")
+   (unspec:<V_cmp_result>
+    [(match_operand:VH 1 "s_register_operand" "w,w")
+     (match_operand:VH 2 "reg_or_zero_operand" "w,Dz")]
+    NEON_VCMP))]
+ "TARGET_NEON_FP16INST"
+{
+  char pattern[100];
+  sprintf (pattern, "vc<cmp_op_unsp>.f%%#<V_sz_elem>\t%%<V_reg>0,"
+	   " %%<V_reg>1, %s",
+	   which_alternative == 0
+	   ? "%<V_reg>2" : "#0");
+  output_asm_insn (pattern, operands);
+  return "";
+}
+ [(set_attr "type" "neon_fp_compare_s<q>")])
+
 (define_insn "neon_vc<cmp_op>u<mode>"
   [(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w")
         (neg:<V_cmp_result>
@@ -2431,6 +2661,60 @@
   [(set_attr "type" "neon_fp_compare_s<q>")]
 )
 
+(define_expand "neon_vca<cmp_op><mode>"
+  [(set
+    (match_operand:<V_cmp_result> 0 "s_register_operand")
+    (neg:<V_cmp_result>
+     (GLTE:<V_cmp_result>
+      (abs:VH (match_operand:VH 1 "s_register_operand"))
+      (abs:VH (match_operand:VH 2 "s_register_operand")))))]
+ "TARGET_NEON_FP16INST"
+{
+  if (flag_unsafe_math_optimizations)
+    emit_insn (gen_neon_vca<cmp_op><mode>_fp16insn
+	       (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_neon_vca<cmp_op><mode>_fp16insn_unspec
+	       (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_insn "neon_vca<cmp_op><mode>_fp16insn"
+  [(set
+    (match_operand:<V_cmp_result> 0 "s_register_operand" "=w")
+    (neg:<V_cmp_result>
+     (GLTE:<V_cmp_result>
+      (abs:VH (match_operand:VH 1 "s_register_operand" "w"))
+      (abs:VH (match_operand:VH 2 "s_register_operand" "w")))))]
+ "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
+ "vac<cmp_op>.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_compare_s<q>")]
+)
+
+(define_insn "neon_vca<cmp_op_unsp><mode>_fp16insn_unspec"
+ [(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w")
+   (unspec:<V_cmp_result>
+    [(match_operand:VH 1 "s_register_operand" "w")
+     (match_operand:VH 2 "s_register_operand" "w")]
+    NEON_VAGLTE))]
+ "TARGET_NEON"
+ "vac<cmp_op_unsp>.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_compare_s<q>")]
+)
+
+(define_expand "neon_vc<cmp_op>z<mode>"
+ [(set
+   (match_operand:<V_cmp_result> 0 "s_register_operand")
+   (COMPARISONS:<V_cmp_result>
+    (match_operand:VH 1 "s_register_operand")
+    (const_int 0)))]
+ "TARGET_NEON_FP16INST"
+ {
+  emit_insn (gen_neon_vc<cmp_op><mode> (operands[0], operands[1],
+					CONST0_RTX (<MODE>mode)));
+  DONE;
+})
+
 (define_insn "neon_vtst<mode>"
   [(set (match_operand:VDQIW 0 "s_register_operand" "=w")
         (unspec:VDQIW [(match_operand:VDQIW 1 "s_register_operand" "w")
@@ -2451,6 +2735,16 @@
   [(set_attr "type" "neon_abd<q>")]
 )
 
+(define_insn "neon_vabd<mode>"
+  [(set (match_operand:VH 0 "s_register_operand" "=w")
+    (unspec:VH [(match_operand:VH 1 "s_register_operand" "w")
+		(match_operand:VH 2 "s_register_operand" "w")]
+     UNSPEC_VABD_F))]
+ "TARGET_NEON_FP16INST"
+ "vabd.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+  [(set_attr "type" "neon_abd<q>")]
+)
+
 (define_insn "neon_vabdf<mode>"
   [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
         (unspec:VCVTF [(match_operand:VCVTF 1 "s_register_operand" "w")
@@ -2513,6 +2807,40 @@
   [(set_attr "type" "neon_fp_minmax_s<q>")]
 )
 
+(define_insn "neon_v<maxmin>f<mode>"
+ [(set (match_operand:VH 0 "s_register_operand" "=w")
+   (unspec:VH
+    [(match_operand:VH 1 "s_register_operand" "w")
+     (match_operand:VH 2 "s_register_operand" "w")]
+    VMAXMINF))]
+ "TARGET_NEON_FP16INST"
+ "v<maxmin>.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_minmax_s<q>")]
+)
+
+(define_insn "neon_vp<maxmin>fv4hf"
+ [(set (match_operand:V4HF 0 "s_register_operand" "=w")
+   (unspec:V4HF
+    [(match_operand:V4HF 1 "s_register_operand" "w")
+     (match_operand:V4HF 2 "s_register_operand" "w")]
+    VPMAXMINF))]
+ "TARGET_NEON_FP16INST"
+ "vp<maxmin>.f16\t%P0, %P1, %P2"
+  [(set_attr "type" "neon_reduc_minmax")]
+)
+
+(define_insn "neon_<fmaxmin_op><mode>"
+ [(set
+   (match_operand:VH 0 "s_register_operand" "=w")
+   (unspec:VH
+    [(match_operand:VH 1 "s_register_operand" "w")
+     (match_operand:VH 2 "s_register_operand" "w")]
+    VMAXMINFNM))]
+ "TARGET_NEON_FP16INST"
+ "<fmaxmin_op>.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_minmax_s<q>")]
+)
+
 ;; Vector forms for the IEEE-754 fmax()/fmin() functions
 (define_insn "<fmaxmin><mode>3"
   [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
@@ -2584,6 +2912,17 @@
   [(set_attr "type" "neon_fp_recps_s<q>")]
 )
 
+(define_insn "neon_vrecps<mode>"
+  [(set
+    (match_operand:VH 0 "s_register_operand" "=w")
+    (unspec:VH [(match_operand:VH 1 "s_register_operand" "w")
+		(match_operand:VH 2 "s_register_operand" "w")]
+     UNSPEC_VRECPS))]
+  "TARGET_NEON_FP16INST"
+  "vrecps.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+  [(set_attr "type" "neon_fp_recps_s<q>")]
+)
+
 (define_insn "neon_vrsqrts<mode>"
   [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
         (unspec:VCVTF [(match_operand:VCVTF 1 "s_register_operand" "w")
@@ -2594,6 +2933,17 @@
   [(set_attr "type" "neon_fp_rsqrts_s<q>")]
 )
 
+(define_insn "neon_vrsqrts<mode>"
+  [(set
+    (match_operand:VH 0 "s_register_operand" "=w")
+    (unspec:VH [(match_operand:VH 1 "s_register_operand" "w")
+		 (match_operand:VH 2 "s_register_operand" "w")]
+     UNSPEC_VRSQRTS))]
+ "TARGET_NEON_FP16INST"
+ "vrsqrts.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_rsqrts_s<q>")]
+)
+
 (define_expand "neon_vabs<mode>"
   [(match_operand:VDQW 0 "s_register_operand" "")
    (match_operand:VDQW 1 "s_register_operand" "")]
@@ -2709,6 +3059,15 @@
 })
 
 (define_insn "neon_vrecpe<mode>"
+  [(set (match_operand:VH 0 "s_register_operand" "=w")
+	(unspec:VH [(match_operand:VH 1 "s_register_operand" "w")]
+		   UNSPEC_VRECPE))]
+  "TARGET_NEON_FP16INST"
+  "vrecpe.f16\t%<V_reg>0, %<V_reg>1"
+  [(set_attr "type" "neon_fp_recpe_s<q>")]
+)
+
+(define_insn "neon_vrecpe<mode>"
   [(set (match_operand:V32 0 "s_register_operand" "=w")
 	(unspec:V32 [(match_operand:V32 1 "s_register_operand" "w")]
                     UNSPEC_VRECPE))]
@@ -3251,6 +3610,28 @@ if (BYTES_BIG_ENDIAN)
   [(set_attr "type" "neon_fp_cvt_narrow_s_q")]
 )
 
+(define_insn "neon_vcvt<sup><mode>"
+ [(set
+   (match_operand:<VH_CVTTO> 0 "s_register_operand" "=w")
+   (unspec:<VH_CVTTO>
+    [(match_operand:VCVTHI 1 "s_register_operand" "w")]
+    VCVT_US))]
+ "TARGET_NEON_FP16INST"
+ "vcvt.f16.<sup>%#16\t%<V_reg>0, %<V_reg>1"
+  [(set_attr "type" "neon_int_to_fp_<VH_elem_ch><q>")]
+)
+
+(define_insn "neon_vcvt<sup><mode>"
+ [(set
+   (match_operand:<VH_CVTTO> 0 "s_register_operand" "=w")
+   (unspec:<VH_CVTTO>
+    [(match_operand:VH 1 "s_register_operand" "w")]
+    VCVT_US))]
+ "TARGET_NEON_FP16INST"
+ "vcvt.<sup>%#16.f16\t%<V_reg>0, %<V_reg>1"
+  [(set_attr "type" "neon_fp_to_int_<VH_elem_ch><q>")]
+)
+
 (define_insn "neon_vcvt<sup>_n<mode>"
   [(set (match_operand:<V_CVTTO> 0 "s_register_operand" "=w")
 	(unspec:<V_CVTTO> [(match_operand:VCVTF 1 "s_register_operand" "w")
@@ -3265,6 +3646,20 @@ if (BYTES_BIG_ENDIAN)
 )
 
 (define_insn "neon_vcvt<sup>_n<mode>"
+ [(set (match_operand:<VH_CVTTO> 0 "s_register_operand" "=w")
+   (unspec:<VH_CVTTO>
+    [(match_operand:VH 1 "s_register_operand" "w")
+     (match_operand:SI 2 "immediate_operand" "i")]
+    VCVT_US_N))]
+  "TARGET_NEON_FP16INST"
+{
+  neon_const_bounds (operands[2], 0, 17);
+  return "vcvt.<sup>%#16.f16\t%<V_reg>0, %<V_reg>1, %2";
+}
+ [(set_attr "type" "neon_fp_to_int_<VH_elem_ch><q>")]
+)
+
+(define_insn "neon_vcvt<sup>_n<mode>"
   [(set (match_operand:<V_CVTTO> 0 "s_register_operand" "=w")
 	(unspec:<V_CVTTO> [(match_operand:VCVTI 1 "s_register_operand" "w")
 			   (match_operand:SI 2 "immediate_operand" "i")]
@@ -3277,6 +3672,31 @@ if (BYTES_BIG_ENDIAN)
   [(set_attr "type" "neon_int_to_fp_<V_elem_ch><q>")]
 )
 
+(define_insn "neon_vcvt<sup>_n<mode>"
+ [(set (match_operand:<VH_CVTTO> 0 "s_register_operand" "=w")
+   (unspec:<VH_CVTTO>
+    [(match_operand:VCVTHI 1 "s_register_operand" "w")
+     (match_operand:SI 2 "immediate_operand" "i")]
+    VCVT_US_N))]
+ "TARGET_NEON_FP16INST"
+{
+  neon_const_bounds (operands[2], 0, 17);
+  return "vcvt.f16.<sup>%#16\t%<V_reg>0, %<V_reg>1, %2";
+}
+ [(set_attr "type" "neon_int_to_fp_<VH_elem_ch><q>")]
+)
+
+(define_insn "neon_vcvt<vcvth_op><sup><mode>"
+ [(set
+   (match_operand:<VH_CVTTO> 0 "s_register_operand" "=w")
+   (unspec:<VH_CVTTO>
+    [(match_operand:VH 1 "s_register_operand" "w")]
+    VCVT_HF_US))]
+ "TARGET_NEON_FP16INST"
+ "vcvt<vcvth_op>.<sup>%#16.f16\t%<V_reg>0, %<V_reg>1"
+  [(set_attr "type" "neon_fp_to_int_<VH_elem_ch><q>")]
+)
+
 (define_insn "neon_vmovn<mode>"
   [(set (match_operand:<V_narrow> 0 "s_register_operand" "=w")
 	(unspec:<V_narrow> [(match_operand:VN 1 "s_register_operand" "w")]
@@ -3347,6 +3767,18 @@ if (BYTES_BIG_ENDIAN)
                    (const_string "neon_mul_<V_elem_ch>_scalar<q>")))]
 )
 
+(define_insn "neon_vmul_lane<mode>"
+  [(set (match_operand:VH 0 "s_register_operand" "=w")
+	(unspec:VH [(match_operand:VH 1 "s_register_operand" "w")
+		    (match_operand:V4HF 2 "s_register_operand"
+		     "<scalar_mul_constraint>")
+		     (match_operand:SI 3 "immediate_operand" "i")]
+		     UNSPEC_VMUL_LANE))]
+  "TARGET_NEON_FP16INST"
+  "vmul.f16\t%<V_reg>0, %<V_reg>1, %P2[%c3]"
+  [(set_attr "type" "neon_fp_mul_s_scalar<q>")]
+)
+
 (define_insn "neon_vmull<sup>_lane<mode>"
   [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
 	(unspec:<V_widen> [(match_operand:VMDI 1 "s_register_operand" "w")
@@ -3601,6 +4033,19 @@ if (BYTES_BIG_ENDIAN)
   DONE;
 })
 
+(define_expand "neon_vmul_n<mode>"
+  [(match_operand:VH 0 "s_register_operand")
+   (match_operand:VH 1 "s_register_operand")
+   (match_operand:<V_elem> 2 "s_register_operand")]
+  "TARGET_NEON_FP16INST"
+{
+  rtx tmp = gen_reg_rtx (V4HFmode);
+  emit_insn (gen_neon_vset_lanev4hf (tmp, operands[2], tmp, const0_rtx));
+  emit_insn (gen_neon_vmul_lane<mode> (operands[0], operands[1], tmp,
+				       const0_rtx));
+  DONE;
+})
+
 (define_expand "neon_vmulls_n<mode>"
   [(match_operand:<V_widen> 0 "s_register_operand" "")
    (match_operand:VMDI 1 "s_register_operand" "")
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 57a47ff..bee8795 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -191,6 +191,8 @@
   UNSPEC_VBSL
   UNSPEC_VCAGE
   UNSPEC_VCAGT
+  UNSPEC_VCALE
+  UNSPEC_VCALT
   UNSPEC_VCEQ
   UNSPEC_VCGE
   UNSPEC_VCGEU
@@ -258,6 +260,8 @@
   UNSPEC_VMLSL_S_LANE
   UNSPEC_VMLSL_U_LANE
   UNSPEC_VMLSL_LANE
+  UNSPEC_VFMA_LANE
+  UNSPEC_VFMS_LANE
   UNSPEC_VMOVL_S
   UNSPEC_VMOVL_U
   UNSPEC_VMOVN
@@ -387,4 +391,3 @@
   UNSPEC_VRNDP
   UNSPEC_VRNDX
 ])
-
diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c
index e7da3fc..b88f43f 100644
--- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c
@@ -1,7 +1,7 @@
 /* { dg-do compile }  */
-/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok }  */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_ok }  */
 /* { dg-options "-O2 -ffast-math" }  */
-/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
 
 /* Test instructions generated for half-precision arithmetic.  */
 
@@ -9,6 +9,9 @@ typedef __fp16 float16_t;
 typedef __simd64_float16_t float16x4_t;
 typedef __simd128_float16_t float16x8_t;
 
+typedef short int16x4_t __attribute__ ((vector_size (8)));
+typedef short int int16x8_t  __attribute__ ((vector_size (16)));
+
 float16_t
 fp16_abs (float16_t a)
 {
@@ -50,15 +53,49 @@ TEST_CMP (greaterthan, >, int, float16_t)
 TEST_CMP (lessthanequal, <=, int, float16_t)
 TEST_CMP (greaterthanqual, >=, int, float16_t)
 
+/* Vectors of size 4.  */
+
+TEST_UNOP (neg, -, float16x4_t)
+
+TEST_BINOP (add, +, float16x4_t)
+TEST_BINOP (sub, -, float16x4_t)
+TEST_BINOP (mult, *, float16x4_t)
+TEST_BINOP (div, /, float16x4_t)
+
+TEST_CMP (equal, ==, int16x4_t, float16x4_t)
+TEST_CMP (unequal, !=, int16x4_t, float16x4_t)
+TEST_CMP (lessthan, <, int16x4_t, float16x4_t)
+TEST_CMP (greaterthan, >, int16x4_t, float16x4_t)
+TEST_CMP (lessthanequal, <=, int16x4_t, float16x4_t)
+TEST_CMP (greaterthanqual, >=, int16x4_t, float16x4_t)
+
+/* Vectors of size 8.  */
+
+TEST_UNOP (neg, -, float16x8_t)
+
+TEST_BINOP (add, +, float16x8_t)
+TEST_BINOP (sub, -, float16x8_t)
+TEST_BINOP (mult, *, float16x8_t)
+TEST_BINOP (div, /, float16x8_t)
+
+TEST_CMP (equal, ==, int16x8_t, float16x8_t)
+TEST_CMP (unequal, !=, int16x8_t, float16x8_t)
+TEST_CMP (lessthan, <, int16x8_t, float16x8_t)
+TEST_CMP (greaterthan, >, int16x8_t, float16x8_t)
+TEST_CMP (lessthanequal, <=, int16x8_t, float16x8_t)
+TEST_CMP (greaterthanqual, >=, int16x8_t, float16x8_t)
+
 /* { dg-final { scan-assembler-times {vneg\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+/* { dg-final { scan-assembler-times {vneg\.f16\td[0-9]+, d[0-9]+} 1 } }  */
+/* { dg-final { scan-assembler-times {vneg\.f16\tq[0-9]+, q[0-9]+} 1 } }  */
 /* { dg-final { scan-assembler-times {vabs\.f16\ts[0-9]+, s[0-9]+} 2 } }  */
 
-/* { dg-final { scan-assembler-times {vadd\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
-/* { dg-final { scan-assembler-times {vsub\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
-/* { dg-final { scan-assembler-times {vmul\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
-/* { dg-final { scan-assembler-times {vdiv\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
-/* { dg-final { scan-assembler-times {vcmp\.f32\ts[0-9]+, s[0-9]+} 2 } }  */
-/* { dg-final { scan-assembler-times {vcmpe\.f32\ts[0-9]+, s[0-9]+} 4 } }  */
+/* { dg-final { scan-assembler-times {vadd\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } }  */
+/* { dg-final { scan-assembler-times {vsub\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } }  */
+/* { dg-final { scan-assembler-times {vmul\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } }  */
+/* { dg-final { scan-assembler-times {vdiv\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } }  */
+/* { dg-final { scan-assembler-times {vcmp\.f32\ts[0-9]+, s[0-9]+} 26 } }  */
+/* { dg-final { scan-assembler-times {vcmpe\.f32\ts[0-9]+, s[0-9]+} 52 } }  */
 
 /* { dg-final { scan-assembler-not {vadd\.f32} } }  */
 /* { dg-final { scan-assembler-not {vsub\.f32} } }  */
-- 
2.1.4


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 11/17][ARM] Add builtins for VFP FP16 intrinsics.
  2016-05-17 14:41 ` [PATCH 11/17][ARM] Add builtins for VFP FP16 intrinsics Matthew Wahab
@ 2016-07-04 14:12   ` Matthew Wahab
  2016-07-28 11:55     ` Ramana Radhakrishnan
  0 siblings, 1 reply; 73+ messages in thread
From: Matthew Wahab @ 2016-07-04 14:12 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1493 bytes --]

On 17/05/16 15:41, Matthew Wahab wrote:
 > The ACLE intrinsics introduced to support the ARMv8.2 FP16 extensions
 > require that intrinsics for scalar floating point (VFP) instructions
 > are available under different conditions from those for the NEON
 > intrinsics.
 >
 > This patch adds the support code and builtins data for the new VFP
 > intrinsics. Because of the similarities between the scalar and NEON
 > builtins, the support code for the scalar builtins follows the code for
 > the NEON builtins. The declarations for the VFP builtins are also added
 > in this patch since the support code expects non-empty tables.

Updated the patch to drop the builtins for vneg, vadd, vsub, vmul and
vdiv, which are no longer needed.
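
The remaining entries are wrapped by the arm_fp16.h intrinsics added
later in the series.  As a minimal sketch (not part of the patch) of
how one of them surfaces, assuming a target with the FP16 scalar
extension enabled (e.g. -march=armv8.2-a+fp16):

    /* The "vabs, hf" entry creates __builtin_neon_vabshf, following
       the __builtin_neon_<name><mode> naming scheme.  */
    __fp16
    fp16_abs (__fp16 x)
    {
      return __builtin_neon_vabshf (x);
    }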

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/arm-builtins.c (hf_UP): New.
	(si_UP): New.
	(vfp_builtin_data): New.  Update comment.
	(enum arm_builtins): Include "arm_vfp_builtins.def".
	(ARM_BUILTIN_VFP_PATTERN_START): New.
	(arm_init_vfp_builtins): New.
	(arm_init_builtins): Add arm_init_vfp_builtins.
	(arm_expand_vfp_builtin): New.
	(arm_expand_builtins): Update for arm_expand_vfp_builtin.  Fix
	long line.
	* config/arm/arm_vfp_builtins.def: New file.
	* config/arm/t-arm (arm.o): Add arm_vfp_builtins.def.
	(arm-builtins.o): Likewise.


[-- Attachment #2: 0011-PATCH-11-17-ARM-Add-builtins-for-VFP-FP16-intrinsics.patch --]
[-- Type: text/x-patch, Size: 8624 bytes --]

From 04896868ba0af25b31e9d23c3af5d3a88e70a564 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 15:33:14 +0100
Subject: [PATCH 11/17] [PATCH 11/17][ARM] Add builtins for VFP FP16
 intrinsics.

2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/arm-builtins.c (hf_UP): New.
	(si_UP): New.
	(vfp_builtin_data): New.  Update comment.
	(enum arm_builtins): Include "arm_vfp_builtins.def".
	(ARM_BUILTIN_VFP_PATTERN_START): New.
	(arm_init_vfp_builtins): New.
	(arm_init_builtins): Add arm_init_vfp_builtins.
	(arm_expand_vfp_builtin): New.
	(arm_expand_builtins): Update for arm_expand_vfp_builtin.  Fix
	long line.
	* config/arm/arm_vfp_builtins.def: New file.
	* config/arm/t-arm (arm.o): Add arm_vfp_builtins.def.
	(arm-builtins.o): Likewise.
---
 gcc/config/arm/arm-builtins.c       | 75 +++++++++++++++++++++++++++++++++----
 gcc/config/arm/arm_vfp_builtins.def | 51 +++++++++++++++++++++++++
 gcc/config/arm/t-arm                |  4 +-
 3 files changed, 121 insertions(+), 9 deletions(-)
 create mode 100644 gcc/config/arm/arm_vfp_builtins.def

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 5dd81b1..70bcc07 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -190,6 +190,8 @@ arm_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define ti_UP	 TImode
 #define ei_UP	 EImode
 #define oi_UP	 OImode
+#define hf_UP	 HFmode
+#define si_UP	 SImode
 
 #define UP(X) X##_UP
 
@@ -239,12 +241,22 @@ typedef struct {
   VAR11 (T, N, A, B, C, D, E, F, G, H, I, J, K) \
   VAR1 (T, N, L)
 
-/* The NEON builtin data can be found in arm_neon_builtins.def.
-   The mode entries in the following table correspond to the "key" type of the
-   instruction variant, i.e. equivalent to that which would be specified after
-   the assembler mnemonic, which usually refers to the last vector operand.
-   The modes listed per instruction should be the same as those defined for
-   that instruction's pattern in neon.md.  */
+/* The NEON builtin data can be found in arm_neon_builtins.def and
+   arm_vfp_builtins.def.  The entries in arm_neon_builtins.def require
+   TARGET_NEON to be true.  The entries in arm_vfp_builtins.def require
+   TARGET_VFP to be true.  The feature tests are checked when the builtins are
+   expanded.
+
+   The mode entries in the following table correspond to
+   the "key" type of the instruction variant, i.e. equivalent to that which
+   would be specified after the assembler mnemonic, which usually refers to the
+   last vector operand.  The modes listed per instruction should be the same as
+   those defined for that instruction's pattern in neon.md.  */
+
+static neon_builtin_datum vfp_builtin_data[] =
+{
+#include "arm_vfp_builtins.def"
+};
 
 static neon_builtin_datum neon_builtin_data[] =
 {
@@ -534,6 +546,10 @@ enum arm_builtins
 #undef CRYPTO2
 #undef CRYPTO3
 
+  ARM_BUILTIN_VFP_BASE,
+
+#include "arm_vfp_builtins.def"
+
   ARM_BUILTIN_NEON_BASE,
   ARM_BUILTIN_NEON_LANE_CHECK = ARM_BUILTIN_NEON_BASE,
 
@@ -542,6 +558,9 @@ enum arm_builtins
   ARM_BUILTIN_MAX
 };
 
+#define ARM_BUILTIN_VFP_PATTERN_START \
+  (ARM_BUILTIN_VFP_BASE + 1)
+
 #define ARM_BUILTIN_NEON_PATTERN_START \
   (ARM_BUILTIN_NEON_BASE + 1)
 
@@ -1033,6 +1052,20 @@ arm_init_neon_builtins (void)
     }
 }
 
+/* Set up all the scalar floating point builtins.  */
+
+static void
+arm_init_vfp_builtins (void)
+{
+  unsigned int i, fcode = ARM_BUILTIN_VFP_PATTERN_START;
+
+  for (i = 0; i < ARRAY_SIZE (vfp_builtin_data); i++, fcode++)
+    {
+      neon_builtin_datum *d = &vfp_builtin_data[i];
+      arm_init_neon_builtin (fcode, d);
+    }
+}
+
 static void
 arm_init_crypto_builtins (void)
 {
@@ -1777,7 +1810,7 @@ arm_init_builtins (void)
   if (TARGET_HARD_FLOAT)
     {
       arm_init_neon_builtins ();
-
+      arm_init_vfp_builtins ();
       arm_init_crypto_builtins ();
     }
 
@@ -2324,6 +2357,27 @@ arm_expand_neon_builtin (int fcode, tree exp, rtx target)
   return arm_expand_neon_builtin_1 (fcode, exp, target, d);
 }
 
+/* Expand a VFP builtin.  These builtins require TARGET_VFP and are
+   otherwise treated like NEON builtins except that the data is looked
+   up in table VFP_BUILTIN_DATA.  */
+
+static rtx
+arm_expand_vfp_builtin (int fcode, tree exp, rtx target)
+{
+  if (fcode >= ARM_BUILTIN_VFP_BASE && ! TARGET_VFP)
+    {
+      fatal_error (input_location,
+		   "You must enable VFP instructions"
+		   " to use these intrinsics.");
+      return const0_rtx;
+    }
+
+  neon_builtin_datum *d
+    = &vfp_builtin_data[fcode - ARM_BUILTIN_VFP_PATTERN_START];
+
+  return arm_expand_neon_builtin_1 (fcode, exp, target, d);
+}
+
 /* Expand an expression EXP that calls a built-in function,
    with result going to TARGET if that's convenient
    (and in mode MODE if that's convenient).
@@ -2361,13 +2415,18 @@ arm_expand_builtin (tree exp,
   if (fcode >= ARM_BUILTIN_NEON_BASE)
     return arm_expand_neon_builtin (fcode, exp, target);
 
+  if (fcode >= ARM_BUILTIN_VFP_BASE)
+    return arm_expand_vfp_builtin (fcode, exp, target);
+
   /* Check in the context of the function making the call whether the
      builtin is supported.  */
   if (fcode >= ARM_BUILTIN_CRYPTO_BASE
       && (!TARGET_CRYPTO || !TARGET_HARD_FLOAT))
     {
       fatal_error (input_location,
-		   "You must enable crypto intrinsics (e.g. include -mfloat-abi=softfp -mfpu=crypto-neon...) to use these intrinsics.");
+		   "You must enable crypto instructions"
+		   " (e.g. include -mfloat-abi=softfp -mfpu=crypto-neon...)"
+		   " to use these intrinsics.");
       return const0_rtx;
     }
 
diff --git a/gcc/config/arm/arm_vfp_builtins.def b/gcc/config/arm/arm_vfp_builtins.def
new file mode 100644
index 0000000..5abfcdd
--- /dev/null
+++ b/gcc/config/arm/arm_vfp_builtins.def
@@ -0,0 +1,51 @@
+/* VFP instruction builtin definitions.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   Contributed by ARM Ltd.
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file lists the builtins that may be available when VFP is enabled
+   but NEON is not.  The entries otherwise have the same requirements and
+   generate the same structures as those in arm_neon_builtins.def.  */
+
+/* FP16 Arithmetic instructions.  */
+VAR1 (UNOP, vabs, hf)
+VAR2 (UNOP, vcvths, hf, si)
+VAR2 (UNOP, vcvthu, hf, si)
+VAR1 (UNOP, vcvtahs, si)
+VAR1 (UNOP, vcvtahu, si)
+VAR1 (UNOP, vcvtmhs, si)
+VAR1 (UNOP, vcvtmhu, si)
+VAR1 (UNOP, vcvtnhs, si)
+VAR1 (UNOP, vcvtnhu, si)
+VAR1 (UNOP, vcvtphs, si)
+VAR1 (UNOP, vcvtphu, si)
+VAR1 (UNOP, vrnd, hf)
+VAR1 (UNOP, vrnda, hf)
+VAR1 (UNOP, vrndi, hf)
+VAR1 (UNOP, vrndm, hf)
+VAR1 (UNOP, vrndn, hf)
+VAR1 (UNOP, vrndp, hf)
+VAR1 (UNOP, vrndx, hf)
+VAR1 (UNOP, vsqrt, hf)
+
+VAR2 (BINOP, vcvths_n, hf, si)
+VAR2 (BINOP, vcvthu_n, hf, si)
+VAR1 (BINOP, vmaxnm, hf)
+VAR1 (BINOP, vminnm, hf)
+
+VAR1 (TERNOP, vfma, hf)
+VAR1 (TERNOP, vfms, hf)
diff --git a/gcc/config/arm/t-arm b/gcc/config/arm/t-arm
index 749a58d..803baa2 100644
--- a/gcc/config/arm/t-arm
+++ b/gcc/config/arm/t-arm
@@ -95,7 +95,8 @@ arm.o: $(srcdir)/config/arm/arm.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
   $(srcdir)/config/arm/arm-cores.def \
   $(srcdir)/config/arm/arm-arches.def $(srcdir)/config/arm/arm-fpus.def \
   $(srcdir)/config/arm/arm-protos.h \
-  $(srcdir)/config/arm/arm_neon_builtins.def
+  $(srcdir)/config/arm/arm_neon_builtins.def \
+  $(srcdir)/config/arm/arm_vfp_builtins.def
 
 arm-builtins.o: $(srcdir)/config/arm/arm-builtins.c $(CONFIG_H) \
   $(SYSTEM_H) coretypes.h $(TM_H) \
@@ -103,6 +104,7 @@ arm-builtins.o: $(srcdir)/config/arm/arm-builtins.c $(CONFIG_H) \
   $(DIAGNOSTIC_CORE_H) $(OPTABS_H) \
   $(srcdir)/config/arm/arm-protos.h \
   $(srcdir)/config/arm/arm_neon_builtins.def \
+  $(srcdir)/config/arm/arm_vfp_builtins.def \
   $(srcdir)/config/arm/arm-simd-builtin-types.def
 	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
 		$(srcdir)/config/arm/arm-builtins.c
-- 
2.1.4


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 12/17][ARM] Add builtins for NEON FP16 intrinsics.
  2016-05-17 14:43 ` [PATCH 12/17][ARM] Add builtins for NEON " Matthew Wahab
@ 2016-07-04 14:13   ` Matthew Wahab
  2016-07-28 11:56     ` Ramana Radhakrishnan
  0 siblings, 1 reply; 73+ messages in thread
From: Matthew Wahab @ 2016-07-04 14:13 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2722 bytes --]

On 17/05/16 15:42, Matthew Wahab wrote:
 > This patch adds the builtins data for the ACLE intrinsics introduced to
 > support the NEON instructions of the ARMv8.2-A FP16 extension.

Updated to fix the vsqrte/vrsqrte spelling mistake and correct the changelog.
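
For reference, each VARn line generates one builtin per listed mode.
A sketch of what the new vadd entries provide (my reading of the
__builtin_neon_<name><mode> naming scheme, not taken from the patch):
VAR2 (BINOP, vadd, v8hf, v4hf) yields __builtin_neon_vaddv4hf and
__builtin_neon_vaddv8hf, which expand through the neon_vadd<mode>
patterns added earlier in the series.

    /* Hypothetical direct use of the assumed builtin name.  */
    typedef __simd64_float16_t float16x4_t;

    float16x4_t
    vec_add (float16x4_t a, float16x4_t b)
    {
      return __builtin_neon_vaddv4hf (a, b);
    }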

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/arm_neon_builtins.def (vadd): New (v8hf, v4hf
	variants).
	(vmulf): New (v8hf, v4hf variants).
	(vfma): New (v8hf, v4hf variants).
	(vfms): New (v8hf, v4hf variants).
	(vsub): New (v8hf, v4hf variants).
	(vcage): New (v8hf, v4hf variants).
	(vcagt): New (v8hf, v4hf variants).
	(vcale): New (v8hf, v4hf variants).
	(vcalt): New (v8hf, v4hf variants).
	(vceq): New (v8hf, v4hf variants).
	(vcgt): New (v8hf, v4hf variants).
	(vcge): New (v8hf, v4hf variants).
	(vcle): New (v8hf, v4hf variants).
	(vclt): New (v8hf, v4hf variants).
	(vceqz): New (v8hf, v4hf variants).
	(vcgez): New (v8hf, v4hf variants).
	(vcgtz): New (v8hf, v4hf variants).
	(vcltz): New (v8hf, v4hf variants).
	(vclez): New (v8hf, v4hf variants).
	(vabd): New (v8hf, v4hf variants).
	(vmaxf): New (v8hf, v4hf variants).
	(vmaxnm): New (v8hf, v4hf variants).
	(vminf): New (v8hf, v4hf variants).
	(vminnm): New (v8hf, v4hf variants).
	(vpmaxf): New (v4hf variant).
	(vpminf): New (v4hf variant).
	(vpadd): New (v4hf variant).
	(vrecps): New (v8hf, v4hf variants).
	(vrsqrts): New (v8hf, v4hf variants).
	(vabs): New (v8hf, v4hf variants).
	(vneg): New (v8hf, v4hf variants).
	(vrecpe): New (v8hf, v4hf variants).
	(vrnd): New (v8hf, v4hf variants).
	(vrnda): New (v8hf, v4hf variants).
	(vrndm): New (v8hf, v4hf variants).
	(vrndn): New (v8hf, v4hf variants).
	(vrndp): New (v8hf, v4hf variants).
	(vrndx): New (v8hf, v4hf variants).
	(vrsqrte): New (v8hf, v4hf variants).
	(vmul_lane): Add v4hf and v8hf variants.
	(vmul_n): Add v4hf and v8hf variants.
	(vext): New (v8hf, v4hf variants).
	(vcvts): New (v8hi, v4hi variants).
	(vcvts): New (v8hf, v4hf variants).
	(vcvtu): New (v8hi, v4hi variants).
	(vcvtu): New (v8hf, v4hf variants).
	(vcvts_n): New (v8hf, v4hf variants).
	(vcvtu_n): New (v8hi, v4hi variants).
	(vcvts_n): New (v8hi, v4hi variants).
	(vcvtu_n): New (v8hf, v4hf variants).
	(vbsl): New (v8hf, v4hf variants).
	(vcvtas): New (v8hf, v4hf variants).
	(vcvtau): New (v8hf, v4hf variants).
	(vcvtms): New (v8hf, v4hf variants).
	(vcvtmu): New (v8hf, v4hf variants).
	(vcvtns): New (v8hf, v4hf variants).
	(vcvtnu): New (v8hf, v4hf variants).
	(vcvtps): New (v8hf, v4hf variants).
	(vcvtpu): New (v8hf, v4hf variants).


[-- Attachment #2: 0012-PATCH-12-17-ARM-Add-builtins-for-NEON-FP16-intrinsic.patch --]
[-- Type: text/x-patch, Size: 9856 bytes --]

From 5df552f65de19667400c63ff939ed5e90a8cbadf Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 13:36:41 +0100
Subject: [PATCH 12/17] [PATCH 12/17][ARM] Add builtins for NEON FP16
 intrinsics.

2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/arm_neon_builtins.def (vadd): New (v8hf, v4hf
	variants).
	(vmulf): New (v8hf, v4hf variants).
	(vfma): New (v8hf, v4hf variants).
	(vfms): New (v8hf, v4hf variants).
	(vsub): New (v8hf, v4hf variants).
	(vcage): New (v8hf, v4hf variants).
	(vcagt): New (v8hf, v4hf variants).
	(vcale): New (v8hf, v4hf variants).
	(vcalt): New (v8hf, v4hf variants).
	(vceq): New (v8hf, v4hf variants).
	(vcgt): New (v8hf, v4hf variants).
	(vcge): New (v8hf, v4hf variants).
	(vcle): New (v8hf, v4hf variants).
	(vclt): New (v8hf, v4hf variants).
	(vceqz): New (v8hf, v4hf variants).
	(vcgez): New (v8hf, v4hf variants).
	(vcgtz): New (v8hf, v4hf variants).
	(vcltz): New (v8hf, v4hf variants).
	(vclez): New (v8hf, v4hf variants).
	(vabd): New (v8hf, v4hf variants).
	(vmaxf): New (v8hf, v4hf variants).
	(vmaxnm): New (v8hf, v4hf variants).
	(vminf): New (v8hf, v4hf variants).
	(vminnm): New (v8hf, v4hf variants).
	(vpmaxf): New (v4hf variant).
	(vpminf): New (v4hf variant).
	(vpadd): New (v4hf variant).
	(vrecps): New (v8hf, v4hf variants).
	(vrsqrts): New (v8hf, v4hf variants).
	(vabs): New (v8hf, v4hf variants).
	(vneg): New (v8hf, v4hf variants).
	(vrecpe): New (v8hf, v4hf variants).
	(vrnd): New (v8hf, v4hf variants).
	(vrnda): New (v8hf, v4hf variants).
	(vrndm): New (v8hf, v4hf variants).
	(vrndn): New (v8hf, v4hf variants).
	(vrndp): New (v8hf, v4hf variants).
	(vrndx): New (v8hf, v4hf variants).
	(vrsqrte): New (v8hf, v4hf variants).
	(vmul_lane): Add v4hf and v8hf variants.
	(vmul_n): Add v4hf and v8hf variants.
	(vext): New (v8hf, v4hf variants).
	(vcvts): New (v8hi, v4hi variants).
	(vcvts): New (v8hf, v4hf variants).
	(vcvtu): New (v8hi, v4hi variants).
	(vcvtu): New (v8hf, v4hf variants).
	(vcvts_n): New (v8hf, v4hf variants).
	(vcvtu_n): New (v8hi, v4hi variants).
	(vcvts_n): New (v8hi, v4hi variants).
	(vcvtu_n): New (v8hf, v4hf variants).
	(vbsl): New (v8hf, v4hf variants).
	(vcvtas): New (v8hf, v4hf variants).
	(vcvtau): New (v8hf, v4hf variants).
	(vcvtms): New (v8hf, v4hf variants).
	(vcvtmu): New (v8hf, v4hf variants).
	(vcvtns): New (v8hf, v4hf variants).
	(vcvtnu): New (v8hf, v4hf variants).
	(vcvtps): New (v8hf, v4hf variants).
	(vcvtpu): New (v8hf, v4hf variants).
---
 gcc/config/arm/arm_neon_builtins.def | 59 ++++++++++++++++++++++++++++++++++--
 1 file changed, 57 insertions(+), 2 deletions(-)

diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index a4ba516..b29aa91 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -19,6 +19,7 @@
    <http://www.gnu.org/licenses/>.  */
 
 VAR2 (BINOP, vadd, v2sf, v4sf)
+VAR2 (BINOP, vadd, v8hf, v4hf)
 VAR3 (BINOP, vaddls, v8qi, v4hi, v2si)
 VAR3 (BINOP, vaddlu, v8qi, v4hi, v2si)
 VAR3 (BINOP, vaddws, v8qi, v4hi, v2si)
@@ -32,12 +33,15 @@ VAR8 (BINOP, vqaddu, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
 VAR3 (BINOP, vaddhn, v8hi, v4si, v2di)
 VAR3 (BINOP, vraddhn, v8hi, v4si, v2di)
 VAR2 (BINOP, vmulf, v2sf, v4sf)
+VAR2 (BINOP, vmulf, v8hf, v4hf)
 VAR2 (BINOP, vmulp, v8qi, v16qi)
 VAR8 (TERNOP, vmla, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf)
 VAR3 (TERNOP, vmlals, v8qi, v4hi, v2si)
 VAR3 (TERNOP, vmlalu, v8qi, v4hi, v2si)
 VAR2 (TERNOP, vfma, v2sf, v4sf)
+VAR2 (TERNOP, vfma, v4hf, v8hf)
 VAR2 (TERNOP, vfms, v2sf, v4sf)
+VAR2 (TERNOP, vfms, v4hf, v8hf)
 VAR8 (TERNOP, vmls, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf)
 VAR3 (TERNOP, vmlsls, v8qi, v4hi, v2si)
 VAR3 (TERNOP, vmlslu, v8qi, v4hi, v2si)
@@ -94,6 +98,7 @@ VAR8 (TERNOP_IMM, vsrau_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
 VAR8 (TERNOP_IMM, vrsras_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
 VAR8 (TERNOP_IMM, vrsrau_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
 VAR2 (BINOP, vsub, v2sf, v4sf)
+VAR2 (BINOP, vsub, v8hf, v4hf)
 VAR3 (BINOP, vsubls, v8qi, v4hi, v2si)
 VAR3 (BINOP, vsublu, v8qi, v4hi, v2si)
 VAR3 (BINOP, vsubws, v8qi, v4hi, v2si)
@@ -111,12 +116,27 @@ VAR8 (BINOP, vcgt, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf)
 VAR6 (BINOP, vcgtu, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR2 (BINOP, vcage, v2sf, v4sf)
 VAR2 (BINOP, vcagt, v2sf, v4sf)
+VAR2 (BINOP, vcage, v4hf, v8hf)
+VAR2 (BINOP, vcagt, v4hf, v8hf)
+VAR2 (BINOP, vcale, v4hf, v8hf)
+VAR2 (BINOP, vcalt, v4hf, v8hf)
+VAR2 (BINOP, vceq, v4hf, v8hf)
+VAR2 (BINOP, vcge, v4hf, v8hf)
+VAR2 (BINOP, vcgt, v4hf, v8hf)
+VAR2 (BINOP, vcle, v4hf, v8hf)
+VAR2 (BINOP, vclt, v4hf, v8hf)
+VAR2 (UNOP, vceqz, v4hf, v8hf)
+VAR2 (UNOP, vcgez, v4hf, v8hf)
+VAR2 (UNOP, vcgtz, v4hf, v8hf)
+VAR2 (UNOP, vclez, v4hf, v8hf)
+VAR2 (UNOP, vcltz, v4hf, v8hf)
 VAR6 (BINOP, vtst, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR6 (BINOP, vabds, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR6 (BINOP, vabdu, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR2 (BINOP, vabdf, v2sf, v4sf)
 VAR3 (BINOP, vabdls, v8qi, v4hi, v2si)
 VAR3 (BINOP, vabdlu, v8qi, v4hi, v2si)
+VAR2 (BINOP, vabd, v8hf, v4hf)
 
 VAR6 (TERNOP, vabas, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR6 (TERNOP, vabau, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
@@ -126,27 +146,38 @@ VAR3 (TERNOP, vabalu, v8qi, v4hi, v2si)
 VAR6 (BINOP, vmaxs, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR6 (BINOP, vmaxu, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR2 (BINOP, vmaxf, v2sf, v4sf)
+VAR2 (BINOP, vmaxf, v8hf, v4hf)
+VAR2 (BINOP, vmaxnm, v4hf, v8hf)
 VAR6 (BINOP, vmins, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR6 (BINOP, vminu, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR2 (BINOP, vminf, v2sf, v4sf)
+VAR2 (BINOP, vminf, v4hf, v8hf)
+VAR2 (BINOP, vminnm, v8hf, v4hf)
 
 VAR3 (BINOP, vpmaxs, v8qi, v4hi, v2si)
 VAR3 (BINOP, vpmaxu, v8qi, v4hi, v2si)
 VAR1 (BINOP, vpmaxf, v2sf)
+VAR1 (BINOP, vpmaxf, v4hf)
 VAR3 (BINOP, vpmins, v8qi, v4hi, v2si)
 VAR3 (BINOP, vpminu, v8qi, v4hi, v2si)
 VAR1 (BINOP, vpminf, v2sf)
+VAR1 (BINOP, vpminf, v4hf)
 
 VAR4 (BINOP, vpadd, v8qi, v4hi, v2si, v2sf)
+VAR1 (BINOP, vpadd, v4hf)
 VAR6 (UNOP, vpaddls, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR6 (UNOP, vpaddlu, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR6 (BINOP, vpadals, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR6 (BINOP, vpadalu, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR2 (BINOP, vrecps, v2sf, v4sf)
 VAR2 (BINOP, vrsqrts, v2sf, v4sf)
+VAR2 (BINOP, vrecps, v4hf, v8hf)
+VAR2 (BINOP, vrsqrts, v4hf, v8hf)
 VAR8 (TERNOP_IMM, vsri_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
 VAR8 (TERNOP_IMM, vsli_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
 VAR8 (UNOP, vabs, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf)
+VAR2 (UNOP, vabs, v8hf, v4hf)
+VAR2 (UNOP, vneg, v8hf, v4hf)
 VAR6 (UNOP, vqabs, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR8 (UNOP, vneg, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf)
 VAR6 (UNOP, vqneg, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
@@ -155,8 +186,16 @@ VAR6 (UNOP, vclz, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
 VAR5 (BSWAP, bswap, v4hi, v8hi, v2si, v4si, v2di)
 VAR2 (UNOP, vcnt, v8qi, v16qi)
 VAR4 (UNOP, vrecpe, v2si, v2sf, v4si, v4sf)
+VAR2 (UNOP, vrecpe, v8hf, v4hf)
 VAR4 (UNOP, vrsqrte, v2si, v2sf, v4si, v4sf)
+VAR2 (UNOP, vrsqrte, v4hf, v8hf)
 VAR6 (UNOP, vmvn, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+VAR2 (UNOP, vrnd, v8hf, v4hf)
+VAR2 (UNOP, vrnda, v8hf, v4hf)
+VAR2 (UNOP, vrndm, v8hf, v4hf)
+VAR2 (UNOP, vrndn, v8hf, v4hf)
+VAR2 (UNOP, vrndp, v8hf, v4hf)
+VAR2 (UNOP, vrndx, v8hf, v4hf)
   /* FIXME: vget_lane supports more variants than this!  */
 VAR10 (GETLANE, vget_lane,
 	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
@@ -179,7 +218,7 @@ VAR3 (UNOP, vqmovnu, v8hi, v4si, v2di)
 VAR3 (UNOP, vqmovun, v8hi, v4si, v2di)
 VAR3 (UNOP, vmovls, v8qi, v4hi, v2si)
 VAR3 (UNOP, vmovlu, v8qi, v4hi, v2si)
-VAR6 (SETLANE, vmul_lane, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
+VAR8 (SETLANE, vmul_lane, v4hi, v2si, v2sf, v8hi, v4si, v4sf, v4hf, v8hf)
 VAR6 (MAC_LANE, vmla_lane, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
 VAR2 (MAC_LANE, vmlals_lane, v4hi, v2si)
 VAR2 (MAC_LANE, vmlalu_lane, v4hi, v2si)
@@ -188,7 +227,7 @@ VAR6 (MAC_LANE, vmls_lane, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
 VAR2 (MAC_LANE, vmlsls_lane, v4hi, v2si)
 VAR2 (MAC_LANE, vmlslu_lane, v4hi, v2si)
 VAR2 (MAC_LANE, vqdmlsl_lane, v4hi, v2si)
-VAR6 (BINOP, vmul_n, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
+VAR8 (BINOP, vmul_n, v4hi, v2si, v2sf, v8hi, v4si, v4sf, v4hf, v8hf)
 VAR6 (MAC_N, vmla_n, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
 VAR2 (MAC_N, vmlals_n, v4hi, v2si)
 VAR2 (MAC_N, vmlalu_n, v4hi, v2si)
@@ -204,9 +243,17 @@ VAR8 (UNOP, vrev64, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf)
 VAR4 (UNOP, vrev32, v8qi, v4hi, v16qi, v8hi)
 VAR2 (UNOP, vrev16, v8qi, v16qi)
 VAR4 (UNOP, vcvts, v2si, v2sf, v4si, v4sf)
+VAR2 (UNOP, vcvts, v4hi, v8hi)
+VAR2 (UNOP, vcvts, v4hf, v8hf)
+VAR2 (UNOP, vcvtu, v4hi, v8hi)
+VAR2 (UNOP, vcvtu, v4hf, v8hf)
 VAR4 (UNOP, vcvtu, v2si, v2sf, v4si, v4sf)
 VAR4 (BINOP, vcvts_n, v2si, v2sf, v4si, v4sf)
 VAR4 (BINOP, vcvtu_n, v2si, v2sf, v4si, v4sf)
+VAR2 (BINOP, vcvts_n, v4hf, v8hf)
+VAR2 (BINOP, vcvtu_n, v4hi, v8hi)
+VAR2 (BINOP, vcvts_n, v4hi, v8hi)
+VAR2 (BINOP, vcvtu_n, v4hf, v8hf)
 VAR1 (UNOP, vcvtv4sf, v4hf)
 VAR1 (UNOP, vcvtv4hf, v4sf)
 VAR10 (TERNOP, vbsl,
@@ -223,6 +270,14 @@ VAR1 (UNOP, vcvtav2sf, v2si)
 VAR1 (UNOP, vcvtav4sf, v4si)
 VAR1 (UNOP, vcvtauv2sf, v2si)
 VAR1 (UNOP, vcvtauv4sf, v4si)
+VAR2 (UNOP, vcvtas, v4hf, v8hf)
+VAR2 (UNOP, vcvtau, v4hf, v8hf)
+VAR2 (UNOP, vcvtms, v4hf, v8hf)
+VAR2 (UNOP, vcvtmu, v4hf, v8hf)
+VAR2 (UNOP, vcvtns, v4hf, v8hf)
+VAR2 (UNOP, vcvtnu, v4hf, v8hf)
+VAR2 (UNOP, vcvtps, v4hf, v8hf)
+VAR2 (UNOP, vcvtpu, v4hf, v8hf)
 VAR1 (UNOP, vcvtpv2sf, v2si)
 VAR1 (UNOP, vcvtpv4sf, v4si)
 VAR1 (UNOP, vcvtpuv2sf, v2si)
-- 
2.1.4


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 13/17][ARM] Add VFP FP16 intrinsics.
  2016-05-17 14:44 ` [PATCH 13/17][ARM] Add VFP FP16 intrinsics Matthew Wahab
@ 2016-07-04 14:14   ` Matthew Wahab
  2016-07-28 11:57     ` Ramana Radhakrishnan
  0 siblings, 1 reply; 73+ messages in thread
From: Matthew Wahab @ 2016-07-04 14:14 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 828 bytes --]

On 17/05/16 15:44, Matthew Wahab wrote:
 > The ARMv8.2-A architecture introduces an optional FP16 extension adding
 > half-precision floating point data processing instructions to the
 > existing scalar (floating point) support. A future version of the ACLE
 > will add support for these instructions and this patch implements that
 > support.

Updated to use the standard arithmetic operations for vnegh_f16,
vaddh_f16, vsubh_f16, vmulh_f16 and vdivh_f16.
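
As a user-level illustration (hypothetical example, not part of the
patch), the arithmetic intrinsics now lower to plain __fp16 expressions
that GCC can fold, while the remaining operations still go through
their builtins.  Assumes a target with the FP16 scalar extension
enabled (e.g. -march=armv8.2-a+fp16):

    #include <arm_fp16.h>

    float16_t
    round_product (float16_t a, float16_t b)
    {
      /* vmulh_f16 is now plain (a * b); vrndnh_f16 still calls
         __builtin_neon_vrndnhf.  */
      return vrndnh_f16 (vmulh_f16 (a, b));
    }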

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>

	* config.gcc (extra_headers): Add arm_fp16.h
	* config/arm/arm_fp16.h: New.
	* config/arm/arm_neon.h: Include "arm_fp16.h".


[-- Attachment #2: 0013-PATCH-13-17-ARM-Add-VFP-FP16-instrinsics.patch --]
[-- Type: text/x-patch, Size: 8503 bytes --]

From a9042ae0e0ea4a61436663a1afea81ccf699e9f9 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 15:36:23 +0100
Subject: [PATCH 13/17] [PATCH 13/17][ARM] Add VFP FP16 intrinsics.

2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>

	* config.gcc (extra_headers): Add arm_fp16.h
	* config/arm/arm_fp16.h: New.
	* config/arm/arm_neon.h: Include "arm_fp16.h".
---
 gcc/config.gcc            |   2 +-
 gcc/config/arm/arm_fp16.h | 255 ++++++++++++++++++++++++++++++++++++++++++++++
 gcc/config/arm/arm_neon.h |   1 +
 3 files changed, 257 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/arm/arm_fp16.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 1f75f17..4333bc9 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -320,7 +320,7 @@ arc*-*-*)
 arm*-*-*)
 	cpu_type=arm
 	extra_objs="arm-builtins.o aarch-common.o"
-	extra_headers="mmintrin.h arm_neon.h arm_acle.h"
+	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h"
 	target_type_format_char='%'
 	c_target_objs="arm-c.o"
 	cxx_target_objs="arm-c.o"
diff --git a/gcc/config/arm/arm_fp16.h b/gcc/config/arm/arm_fp16.h
new file mode 100644
index 0000000..c72d8c4
--- /dev/null
+++ b/gcc/config/arm/arm_fp16.h
@@ -0,0 +1,255 @@
+/* ARM FP16 intrinsics include file.
+
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   Contributed by ARM Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _GCC_ARM_FP16_H
+#define _GCC_ARM_FP16_H 1
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+
+/* Intrinsics for FP16 instructions.  */
+#pragma GCC push_options
+#pragma GCC target ("fpu=fp-armv8")
+
+#if defined (__ARM_FEATURE_FP16_SCALAR_ARITHMETIC)
+
+typedef __fp16 float16_t;
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vabsh_f16 (float16_t __a)
+{
+  return __builtin_neon_vabshf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vaddh_f16 (float16_t __a, float16_t __b)
+{
+  return __a + __b;
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvtah_s32_f16 (float16_t __a)
+{
+  return __builtin_neon_vcvtahssi (__a);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcvtah_u32_f16 (float16_t __a)
+{
+  return __builtin_neon_vcvtahusi (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_f16_s32 (int32_t __a)
+{
+  return __builtin_neon_vcvthshf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_f16_u32 (uint32_t __a)
+{
+  return __builtin_neon_vcvthuhf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_n_f16_s32 (int32_t __a, const int __b)
+{
+  return __builtin_neon_vcvths_nhf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_n_f16_u32 (uint32_t __a, const int __b)
+{
+  return __builtin_neon_vcvthu_nhf ((int32_t)__a, __b);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvth_n_s32_f16 (float16_t __a, const int __b)
+{
+  return __builtin_neon_vcvths_nsi (__a, __b);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcvth_n_u32_f16 (float16_t __a, const int __b)
+{
+  return (uint32_t)__builtin_neon_vcvthu_nsi (__a, __b);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvth_s32_f16 (float16_t __a)
+{
+  return __builtin_neon_vcvthssi (__a);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcvth_u32_f16 (float16_t __a)
+{
+  return __builtin_neon_vcvthusi (__a);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvtmh_s32_f16 (float16_t __a)
+{
+  return __builtin_neon_vcvtmhssi (__a);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcvtmh_u32_f16 (float16_t __a)
+{
+  return __builtin_neon_vcvtmhusi (__a);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvtnh_s32_f16 (float16_t __a)
+{
+  return __builtin_neon_vcvtnhssi (__a);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcvtnh_u32_f16 (float16_t __a)
+{
+  return __builtin_neon_vcvtnhusi (__a);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvtph_s32_f16 (float16_t __a)
+{
+  return __builtin_neon_vcvtphssi (__a);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcvtph_u32_f16 (float16_t __a)
+{
+  return __builtin_neon_vcvtphusi (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vdivh_f16 (float16_t __a, float16_t __b)
+{
+  return __a / __b;
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vfmah_f16 (float16_t __a, float16_t __b, float16_t __c)
+{
+  return __builtin_neon_vfmahf (__a, __b, __c);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vfmsh_f16 (float16_t __a, float16_t __b, float16_t __c)
+{
+  return __builtin_neon_vfmshf (__a, __b, __c);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmaxnmh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_neon_vmaxnmhf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vminnmh_f16 (float16_t __a, float16_t __b)
+{
+  return __builtin_neon_vminnmhf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmulh_f16 (float16_t __a, float16_t __b)
+{
+  return __a * __b;
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vnegh_f16 (float16_t __a)
+{
+  return  - __a;
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndah_f16 (float16_t __a)
+{
+  return __builtin_neon_vrndahf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndh_f16 (float16_t __a)
+{
+  return __builtin_neon_vrndhf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndih_f16 (float16_t __a)
+{
+  return __builtin_neon_vrndihf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndmh_f16 (float16_t __a)
+{
+  return __builtin_neon_vrndmhf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndnh_f16 (float16_t __a)
+{
+  return __builtin_neon_vrndnhf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndph_f16 (float16_t __a)
+{
+  return __builtin_neon_vrndphf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndxh_f16 (float16_t __a)
+{
+  return __builtin_neon_vrndxhf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vsqrth_f16 (float16_t __a)
+{
+  return __builtin_neon_vsqrthf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vsubh_f16 (float16_t __a, float16_t __b)
+{
+  return __a - __b;
+}
+
+#endif /* __ARM_FEATURE_FP16_SCALAR_ARITHMETIC  */
+#pragma GCC pop_options
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 3bd9517..8ed5aa8 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -38,6 +38,7 @@
 extern "C" {
 #endif
 
+#include <arm_fp16.h>
 #include <stdint.h>
 
 typedef __simd64_int8_t int8x8_t;
-- 
2.1.4


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 14/17][ARM] Add NEON FP16 intrinsics.
  2016-05-17 14:47 ` [PATCH 14/17][ARM] Add NEON " Matthew Wahab
@ 2016-07-04 14:16   ` Matthew Wahab
  2016-08-03 12:57     ` Ramana Radhakrishnan
  0 siblings, 1 reply; 73+ messages in thread
From: Matthew Wahab @ 2016-07-04 14:16 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2941 bytes --]

On 17/05/16 15:46, Matthew Wahab wrote:
 > The ARMv8.2-A architecture introduces an optional FP16 extension adding
 > half-precision floating point data processing instructions to the
 > existing Adv.SIMD (NEON) support. A future version of the ACLE will add
 > support for these instructions and this patch implements that support.

Updated to fix the vsqrte/vrsqrte spelling mistake.
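
As a usage sketch (hypothetical example, not part of the patch), the
new vector intrinsics compose in the usual arm_neon.h style.  Assumes
the FP16 extension is enabled for NEON, e.g. -march=armv8.2-a+fp16
with a suitable -mfpu/-mfloat-abi combination:

    #include <arm_neon.h>

    /* Multiply each lane by a scalar, then round to nearest (ties to
       even) using the new vmulq_n_f16 and vrndnq_f16 intrinsics.  */
    float16x8_t
    scale_and_round (float16x8_t v, float16_t s)
    {
      return vrndnq_f16 (vmulq_n_f16 (v, s));
    }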

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/arm_neon.h (vabd_f16): New.
	(vabdq_f16): New.
	(vabs_f16): New.
	(vabsq_f16): New.
	(vadd_f16): New.
	(vaddq_f16): New.
	(vcage_f16): New.
	(vcageq_f16): New.
	(vcagt_f16): New.
	(vcagtq_f16): New.
	(vcale_f16): New.
	(vcaleq_f16): New.
	(vcalt_f16): New.
	(vcaltq_f16): New.
	(vceq_f16): New.
	(vceqq_f16): New.
	(vceqz_f16): New.
	(vceqzq_f16): New.
	(vcge_f16): New.
	(vcgeq_f16): New.
	(vcgez_f16): New.
	(vcgezq_f16): New.
	(vcgt_f16): New.
	(vcgtq_f16): New.
	(vcgtz_f16): New.
	(vcgtzq_f16): New.
	(vcle_f16): New.
	(vcleq_f16): New.
	(vclez_f16): New.
	(vclezq_f16): New.
	(vclt_f16): New.
	(vcltq_f16): New.
	(vcltz_f16): New.
	(vcltzq_f16): New.
	(vcvt_f16_s16): New.
	(vcvt_f16_u16): New.
	(vcvt_s16_f16): New.
	(vcvt_u16_f16): New.
	(vcvtq_f16_s16): New.
	(vcvtq_f16_u16): New.
	(vcvtq_s16_f16): New.
	(vcvtq_u16_f16): New.
	(vcvta_s16_f16): New.
	(vcvta_u16_f16): New.
	(vcvtaq_s16_f16): New.
	(vcvtaq_u16_f16): New.
	(vcvtm_s16_f16): New.
	(vcvtm_u16_f16): New.
	(vcvtmq_s16_f16): New.
	(vcvtmq_u16_f16): New.
	(vcvtn_s16_f16): New.
	(vcvtn_u16_f16): New.
	(vcvtnq_s16_f16): New.
	(vcvtnq_u16_f16): New.
	(vcvtp_s16_f16): New.
	(vcvtp_u16_f16): New.
	(vcvtpq_s16_f16): New.
	(vcvtpq_u16_f16): New.
	(vcvt_n_f16_s16): New.
	(vcvt_n_f16_u16): New.
	(vcvtq_n_f16_s16): New.
	(vcvtq_n_f16_u16): New.
	(vcvt_n_s16_f16): New.
	(vcvt_n_u16_f16): New.
	(vcvtq_n_s16_f16): New.
	(vcvtq_n_u16_f16): New.
	(vfma_f16): New.
	(vfmaq_f16): New.
	(vfms_f16): New.
	(vfmsq_f16): New.
	(vmax_f16): New.
	(vmaxq_f16): New.
	(vmaxnm_f16): New.
	(vmaxnmq_f16): New.
	(vmin_f16): New.
	(vminq_f16): New.
	(vminnm_f16): New.
	(vminnmq_f16): New.
	(vmul_f16): New.
	(vmul_lane_f16): New.
	(vmul_n_f16): New.
	(vmulq_f16): New.
	(vmulq_lane_f16): New.
	(vmulq_n_f16): New.
	(vneg_f16): New.
	(vnegq_f16): New.
	(vpadd_f16): New.
	(vpmax_f16): New.
	(vpmin_f16): New.
	(vrecpe_f16): New.
	(vrecpeq_f16): New.
	(vrnd_f16): New.
	(vrndq_f16): New.
	(vrnda_f16): New.
	(vrndaq_f16): New.
	(vrndm_f16): New.
	(vrndmq_f16): New.
	(vrndn_f16): New.
	(vrndnq_f16): New.
	(vrndp_f16): New.
	(vrndpq_f16): New.
	(vrndx_f16): New.
	(vrndxq_f16): New.
	(vrsqrte_f16): New.
	(vrsqrteq_f16): New.
	(vrecps_f16): New.
	(vrecpsq_f16): New.
	(vrsqrts_f16): New.
	(vrsqrtsq_f16): New.
	(vsub_f16): New.
	(vsubq_f16): New.


[-- Attachment #2: 0014-PATCH-14-17-ARM-Add-NEON-FP16-instrinsics.patch --]
[-- Type: text/x-patch, Size: 22992 bytes --]

From c26f43f3127d18971769f891c252ec5e157026f9 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 15:36:34 +0100
Subject: [PATCH 14/17] [PATCH 14/17][ARM] Add NEON FP16 intrinsics.

2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/arm_neon.h (vabd_f16): New.
	(vabdq_f16): New.
	(vabs_f16): New.
	(vabsq_f16): New.
	(vadd_f16): New.
	(vaddq_f16): New.
	(vcage_f16): New.
	(vcageq_f16): New.
	(vcagt_f16): New.
	(vcagtq_f16): New.
	(vcale_f16): New.
	(vcaleq_f16): New.
	(vcalt_f16): New.
	(vcaltq_f16): New.
	(vceq_f16): New.
	(vceqq_f16): New.
	(vceqz_f16): New.
	(vceqzq_f16): New.
	(vcge_f16): New.
	(vcgeq_f16): New.
	(vcgez_f16): New.
	(vcgezq_f16): New.
	(vcgt_f16): New.
	(vcgtq_f16): New.
	(vcgtz_f16): New.
	(vcgtzq_f16): New.
	(vcle_f16): New.
	(vcleq_f16): New.
	(vclez_f16): New.
	(vclezq_f16): New.
	(vclt_f16): New.
	(vcltq_f16): New.
	(vcltz_f16): New.
	(vcltzq_f16): New.
	(vcvt_f16_s16): New.
	(vcvt_f16_u16): New.
	(vcvt_s16_f16): New.
	(vcvt_u16_f16): New.
	(vcvtq_f16_s16): New.
	(vcvtq_f16_u16): New.
	(vcvtq_s16_f16): New.
	(vcvtq_u16_f16): New.
	(vcvta_s16_f16): New.
	(vcvta_u16_f16): New.
	(vcvtaq_s16_f16): New.
	(vcvtaq_u16_f16): New.
	(vcvtm_s16_f16): New.
	(vcvtm_u16_f16): New.
	(vcvtmq_s16_f16): New.
	(vcvtmq_u16_f16): New.
	(vcvtn_s16_f16): New.
	(vcvtn_u16_f16): New.
	(vcvtnq_s16_f16): New.
	(vcvtnq_u16_f16): New.
	(vcvtp_s16_f16): New.
	(vcvtp_u16_f16): New.
	(vcvtpq_s16_f16): New.
	(vcvtpq_u16_f16): New.
	(vcvt_n_f16_s16): New.
	(vcvt_n_f16_u16): New.
	(vcvtq_n_f16_s16): New.
	(vcvtq_n_f16_u16): New.
	(vcvt_n_s16_f16): New.
	(vcvt_n_u16_f16): New.
	(vcvtq_n_s16_f16): New.
	(vcvtq_n_u16_f16): New.
	(vfma_f16): New.
	(vfmaq_f16): New.
	(vfms_f16): New.
	(vfmsq_f16): New.
	(vmax_f16): New.
	(vmaxq_f16): New.
	(vmaxnm_f16): New.
	(vmaxnmq_f16): New.
	(vmin_f16): New.
	(vminq_f16): New.
	(vminnm_f16): New.
	(vminnmq_f16): New.
	(vmul_f16): New.
	(vmul_lane_f16): New.
	(vmul_n_f16): New.
	(vmulq_f16): New.
	(vmulq_lane_f16): New.
	(vmulq_n_f16): New.
	(vneg_f16): New.
	(vnegq_f16): New.
	(vpadd_f16): New.
	(vpmax_f16): New.
	(vpmin_f16): New.
	(vrecpe_f16): New.
	(vrecpeq_f16): New.
	(vrnd_f16): New.
	(vrndq_f16): New.
	(vrnda_f16): New.
	(vrndaq_f16): New.
	(vrndm_f16): New.
	(vrndmq_f16): New.
	(vrndn_f16): New.
	(vrndnq_f16): New.
	(vrndp_f16): New.
	(vrndpq_f16): New.
	(vrndx_f16): New.
	(vrndxq_f16): New.
	(vrsqrte_f16): New.
	(vrsqrteq_f16): New.
	(vrecps_f16): New.
	(vrecpsq_f16): New.
	(vrsqrts_f16): New.
	(vrsqrtsq_f16): New.
	(vsub_f16): New.
	(vsubq_f16): New.
---
 gcc/config/arm/arm_neon.h | 674 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 674 insertions(+)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 8ed5aa8..54bbc7d 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -14843,6 +14843,680 @@ vmull_high_p64 (poly64x2_t __a, poly64x2_t __b)
 
 #pragma GCC pop_options
 
+  /* Intrinsics for FP16 instructions.  */
+#pragma GCC push_options
+#pragma GCC target ("fpu=neon-fp-armv8")
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vabd_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vabdv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vabdq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vabdv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vabs_f16 (float16x4_t __a)
+{
+  return __builtin_neon_vabsv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vabsq_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vabsv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vadd_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vaddv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vaddq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vaddv8hf (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcage_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return (uint16x4_t)__builtin_neon_vcagev4hf (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcageq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return (uint16x8_t)__builtin_neon_vcagev8hf (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcagt_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return (uint16x4_t)__builtin_neon_vcagtv4hf (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcagtq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return (uint16x8_t)__builtin_neon_vcagtv8hf (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcale_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return (uint16x4_t)__builtin_neon_vcalev4hf (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcaleq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return (uint16x8_t)__builtin_neon_vcalev8hf (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcalt_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return (uint16x4_t)__builtin_neon_vcaltv4hf (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcaltq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return (uint16x8_t)__builtin_neon_vcaltv8hf (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vceq_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return (uint16x4_t)__builtin_neon_vceqv4hf (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vceqq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return (uint16x8_t)__builtin_neon_vceqv8hf (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vceqz_f16 (float16x4_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vceqzv4hf (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vceqzq_f16 (float16x8_t __a)
+{
+  return (uint16x8_t)__builtin_neon_vceqzv8hf (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcge_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return (uint16x4_t)__builtin_neon_vcgev4hf (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcgeq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return (uint16x8_t)__builtin_neon_vcgev8hf (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcgez_f16 (float16x4_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vcgezv4hf (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcgezq_f16 (float16x8_t __a)
+{
+  return (uint16x8_t)__builtin_neon_vcgezv8hf (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcgt_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return (uint16x4_t)__builtin_neon_vcgtv4hf (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcgtq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return (uint16x8_t)__builtin_neon_vcgtv8hf (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcgtz_f16 (float16x4_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vcgtzv4hf (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcgtzq_f16 (float16x8_t __a)
+{
+  return (uint16x8_t)__builtin_neon_vcgtzv8hf (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcle_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return (uint16x4_t)__builtin_neon_vclev4hf (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcleq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return (uint16x8_t)__builtin_neon_vclev8hf (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vclez_f16 (float16x4_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vclezv4hf (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vclezq_f16 (float16x8_t __a)
+{
+  return (uint16x8_t)__builtin_neon_vclezv8hf (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vclt_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return (uint16x4_t)__builtin_neon_vcltv4hf (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcltq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return (uint16x8_t)__builtin_neon_vcltv8hf (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcltz_f16 (float16x4_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vcltzv4hf (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcltzq_f16 (float16x8_t __a)
+{
+  return (uint16x8_t)__builtin_neon_vcltzv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcvt_f16_s16 (int16x4_t __a)
+{
+  return (float16x4_t)__builtin_neon_vcvtsv4hi (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcvt_f16_u16 (uint16x4_t __a)
+{
+  return (float16x4_t)__builtin_neon_vcvtuv4hi ((int16x4_t)__a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcvt_s16_f16 (float16x4_t __a)
+{
+  return (int16x4_t)__builtin_neon_vcvtsv4hf (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcvt_u16_f16 (float16x4_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vcvtuv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vcvtq_f16_s16 (int16x8_t __a)
+{
+  return (float16x8_t)__builtin_neon_vcvtsv8hi (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vcvtq_f16_u16 (uint16x8_t __a)
+{
+  return (float16x8_t)__builtin_neon_vcvtuv8hi ((int16x8_t)__a);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcvtq_s16_f16 (float16x8_t __a)
+{
+  return (int16x8_t)__builtin_neon_vcvtsv8hf (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcvtq_u16_f16 (float16x8_t __a)
+{
+  return (uint16x8_t)__builtin_neon_vcvtuv8hf (__a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcvta_s16_f16 (float16x4_t __a)
+{
+  return __builtin_neon_vcvtasv4hf (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcvta_u16_f16 (float16x4_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vcvtauv4hf (__a);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcvtaq_s16_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vcvtasv8hf (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcvtaq_u16_f16 (float16x8_t __a)
+{
+  return (uint16x8_t)__builtin_neon_vcvtauv8hf (__a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcvtm_s16_f16 (float16x4_t __a)
+{
+  return __builtin_neon_vcvtmsv4hf (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcvtm_u16_f16 (float16x4_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vcvtmuv4hf (__a);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcvtmq_s16_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vcvtmsv8hf (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcvtmq_u16_f16 (float16x8_t __a)
+{
+  return (uint16x8_t)__builtin_neon_vcvtmuv8hf (__a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcvtn_s16_f16 (float16x4_t __a)
+{
+  return __builtin_neon_vcvtnsv4hf (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcvtn_u16_f16 (float16x4_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vcvtnuv4hf (__a);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcvtnq_s16_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vcvtnsv8hf (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcvtnq_u16_f16 (float16x8_t __a)
+{
+  return (uint16x8_t)__builtin_neon_vcvtnuv8hf (__a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcvtp_s16_f16 (float16x4_t __a)
+{
+  return __builtin_neon_vcvtpsv4hf (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcvtp_u16_f16 (float16x4_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vcvtpuv4hf (__a);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcvtpq_s16_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vcvtpsv8hf (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcvtpq_u16_f16 (float16x8_t __a)
+{
+  return (uint16x8_t)__builtin_neon_vcvtpuv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcvt_n_f16_s16 (int16x4_t __a, const int __b)
+{
+  return __builtin_neon_vcvts_nv4hi (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcvt_n_f16_u16 (uint16x4_t __a, const int __b)
+{
+  return __builtin_neon_vcvtu_nv4hi ((int16x4_t)__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vcvtq_n_f16_s16 (int16x8_t __a, const int __b)
+{
+  return __builtin_neon_vcvts_nv8hi (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vcvtq_n_f16_u16 (uint16x8_t __a, const int __b)
+{
+  return __builtin_neon_vcvtu_nv8hi ((int16x8_t)__a, __b);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcvt_n_s16_f16 (float16x4_t __a, const int __b)
+{
+  return __builtin_neon_vcvts_nv4hf (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcvt_n_u16_f16 (float16x4_t __a, const int __b)
+{
+  return (uint16x4_t)__builtin_neon_vcvtu_nv4hf (__a, __b);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcvtq_n_s16_f16 (float16x8_t __a, const int __b)
+{
+  return __builtin_neon_vcvts_nv8hf (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcvtq_n_u16_f16 (float16x8_t __a, const int __b)
+{
+  return (uint16x8_t)__builtin_neon_vcvtu_nv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vfma_f16 (float16x4_t __a, float16x4_t __b, float16x4_t __c)
+{
+  return __builtin_neon_vfmav4hf (__a, __b, __c);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vfmaq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
+{
+  return __builtin_neon_vfmav8hf (__a, __b, __c);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vfms_f16 (float16x4_t __a, float16x4_t __b, float16x4_t __c)
+{
+  return __builtin_neon_vfmsv4hf (__a, __b, __c);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vfmsq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
+{
+  return __builtin_neon_vfmsv8hf (__a, __b, __c);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmax_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vmaxfv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmaxq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vmaxfv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmaxnm_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vmaxnmv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmaxnmq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vmaxnmv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmin_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vminfv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vminq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vminfv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vminnm_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vminnmv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vminnmq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vminnmv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmul_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vmulfv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmul_lane_f16 (float16x4_t __a, float16x4_t __b, const int __c)
+{
+  return __builtin_neon_vmul_lanev4hf (__a, __b, __c);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmul_n_f16 (float16x4_t __a, float16_t __b)
+{
+  return __builtin_neon_vmul_nv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vmulfv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulq_lane_f16 (float16x8_t __a, float16x4_t __b, const int __c)
+{
+  return __builtin_neon_vmul_lanev8hf (__a, __b, __c);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulq_n_f16 (float16x8_t __a, float16_t __b)
+{
+  return __builtin_neon_vmul_nv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vneg_f16 (float16x4_t __a)
+{
+  return __builtin_neon_vnegv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vnegq_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vnegv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vpadd_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vpaddv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vpmax_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vpmaxfv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vpmin_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vpminfv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrecpe_f16 (float16x4_t __a)
+{
+  return __builtin_neon_vrecpev4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrecpeq_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vrecpev8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrnd_f16 (float16x4_t __a)
+{
+  return __builtin_neon_vrndv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndq_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vrndv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrnda_f16 (float16x4_t __a)
+{
+  return __builtin_neon_vrndav4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndaq_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vrndav8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrndm_f16 (float16x4_t __a)
+{
+  return __builtin_neon_vrndmv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndmq_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vrndmv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrndn_f16 (float16x4_t __a)
+{
+  return __builtin_neon_vrndnv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndnq_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vrndnv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrndp_f16 (float16x4_t __a)
+{
+  return __builtin_neon_vrndpv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndpq_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vrndpv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrndx_f16 (float16x4_t __a)
+{
+  return __builtin_neon_vrndxv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndxq_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vrndxv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrsqrte_f16 (float16x4_t __a)
+{
+  return __builtin_neon_vrsqrtev4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrsqrteq_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vrsqrtev8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrecps_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vrecpsv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrecpsq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vrecpsv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrsqrts_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vrsqrtsv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrsqrtsq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vrsqrtsv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vsub_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vsubv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vsubq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vsubv8hf (__a, __b);
+}
+
+#endif /* __ARM_FEATURE_FP16_VECTOR_ARITHMETIC.  */
+#pragma GCC pop_options
+
   /* Half-precision data processing intrinsics.  */
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 
-- 
2.1.4


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 15/17][ARM] Add tests for ARMv8.2-A FP16 support.
  2016-05-17 14:49 ` [PATCH 15/17][ARM] Add tests for ARMv8.2-A FP16 support Matthew Wahab
@ 2016-07-04 14:17   ` Matthew Wahab
  2016-08-04  8:34     ` Ramana Radhakrishnan
  0 siblings, 1 reply; 73+ messages in thread
From: Matthew Wahab @ 2016-07-04 14:17 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1280 bytes --]

On 17/05/16 15:48, Matthew Wahab wrote:
 > Support for using the half-precision floating point operations added by
 > the ARMv8.2-A FP16 extension is based on the macros and intrinsics added
 > to the ACLE for the extension.
 >
 > This patch adds tests to check the compiler's treatment of the ACLE
 > macros and the code generated for the new intrinsics. It does not
 > include the executable tests for the
 > gcc.target/aarch64/advsimd-intrinsics testsuite. Those are added later
 > in the patch series.

Changes since the previous version are:

- Fix the vsqrte/vrsqrte spelling mistake.

- armv8_2-fp16-scalar-2.c: Set option -std=c11, needed to test that
   vaddh_f16 (vmulh_f16 (a, b), c) generates a VMLA. (Options enabled
   with the default -std=gnu11 mean that VFMA would be generated
   otherwise.)
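
To illustrate the point (a sketch of ours, not from the patch): the
scalar intrinsics expand to ordinary half-precision multiply and add
operations, so the instruction selected depends on whether contraction
is allowed.  Under -std=gnu11 (-ffp-contract=fast) the pair is fused
into VFMA; under -std=c11 it stays a separate multiply and add, which
the backend matches to the unfused VMLA:

  #include <arm_fp16.h>

  float16_t
  mla (float16_t a, float16_t b, float16_t c)
  {
    /* -std=gnu11: contracted to vfma.f16 (fused multiply-add).
       -std=c11:   kept as vmul + vadd, combined into vmla.f16.  */
    return vaddh_f16 (vmulh_f16 (a, b), c);
  }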

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

testsuite/
2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/arm/armv8_2-fp16-neon-1.c: New.
	* gcc.target/arm/armv8_2-fp16-scalar-1.c: New.
	* gcc.target/arm/armv8_2-fp16-scalar-2.c: New.
	* gcc.target/arm/attr-fp16-arith-1.c: Add a test of intrinsics
	support.


[-- Attachment #2: 0015-PATCH-15-17-ARM-Add-tests-for-ARMv8.2-A-FP16-support.patch --]
[-- Type: text/x-patch, Size: 27366 bytes --]

From b8760efc9da23357dc2bccef36e8ba2fc2f7a856 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 13:38:02 +0100
Subject: [PATCH 15/17] [PATCH 15/17][ARM] Add tests for ARMv8.2-A FP16
 support.

testsuite/
2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/arm/armv8_2-fp16-neon-1.c: New.
	* gcc.target/arm/armv8_2-fp16-scalar-1.c: New.
	* gcc.target/arm/armv8_2-fp16-scalar-2.c: New.
	* gcc.target/arm/attr-fp16-arith-1.c: Add a test of intrinsics
	support.
---
 gcc/testsuite/gcc.target/arm/armv8_2-fp16-neon-1.c | 490 +++++++++++++++++++++
 .../gcc.target/arm/armv8_2-fp16-scalar-1.c         | 203 +++++++++
 .../gcc.target/arm/armv8_2-fp16-scalar-2.c         |  71 +++
 gcc/testsuite/gcc.target/arm/attr-fp16-arith-1.c   |  13 +
 4 files changed, 777 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_2-fp16-neon-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-2.c

diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-neon-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-neon-1.c
new file mode 100644
index 0000000..968efae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-neon-1.c
@@ -0,0 +1,490 @@
+/* { dg-do compile }  */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_ok }  */
+/* { dg-options "-O2" }  */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+/* Test instructions generated for the FP16 vector intrinsics.  */
+
+#include <arm_neon.h>
+
+#define MSTRCAT(L, str)	L##str
+
+#define UNOP_TEST(insn)				\
+  float16x4_t					\
+  MSTRCAT (test_##insn, _16x4) (float16x4_t a)	\
+  {						\
+    return MSTRCAT (insn, _f16) (a);		\
+  }						\
+  float16x8_t					\
+  MSTRCAT (test_##insn, _16x8) (float16x8_t a)	\
+  {						\
+    return MSTRCAT (insn, q_f16) (a);		\
+  }
+
+#define BINOP_TEST(insn)					\
+  float16x4_t							\
+  MSTRCAT (test_##insn, _16x4) (float16x4_t a, float16x4_t b)	\
+  {								\
+    return MSTRCAT (insn, _f16) (a, b);				\
+  }								\
+  float16x8_t							\
+  MSTRCAT (test_##insn, _16x8) (float16x8_t a, float16x8_t b)	\
+  {								\
+    return MSTRCAT (insn, q_f16) (a, b);			\
+  }
+
+#define BINOP_LANE_TEST(insn, I)					\
+  float16x4_t								\
+  MSTRCAT (test_##insn##_lane, _16x4) (float16x4_t a, float16x4_t b)	\
+  {									\
+    return MSTRCAT (insn, _lane_f16) (a, b, I);				\
+  }									\
+  float16x8_t								\
+  MSTRCAT (test_##insn##_lane, _16x8) (float16x8_t a, float16x4_t b)	\
+  {									\
+    return MSTRCAT (insn, q_lane_f16) (a, b, I);			\
+  }
+
+#define BINOP_LANEQ_TEST(insn, I)					\
+  float16x4_t								\
+  MSTRCAT (test_##insn##_laneq, _16x4) (float16x4_t a, float16x8_t b)	\
+  {									\
+    return MSTRCAT (insn, _laneq_f16) (a, b, I);			\
+  }									\
+  float16x8_t								\
+  MSTRCAT (test_##insn##_laneq, _16x8) (float16x8_t a, float16x8_t b)	\
+  {									\
+    return MSTRCAT (insn, q_laneq_f16) (a, b, I);			\
+  }									\
+
+#define BINOP_N_TEST(insn)					\
+  float16x4_t							\
+  MSTRCAT (test_##insn##_n, _16x4) (float16x4_t a, float16_t b)	\
+  {								\
+    return MSTRCAT (insn, _n_f16) (a, b);			\
+  }								\
+  float16x8_t							\
+  MSTRCAT (test_##insn##_n, _16x8) (float16x8_t a, float16_t b)	\
+  {								\
+    return MSTRCAT (insn, q_n_f16) (a, b);			\
+  }
+
+#define TERNOP_TEST(insn)						\
+  float16_t								\
+  MSTRCAT (test_##insn, _16) (float16_t a, float16_t b, float16_t c)	\
+  {									\
+    return MSTRCAT (insn, h_f16) (a, b, c);				\
+  }									\
+  float16x4_t								\
+  MSTRCAT (test_##insn, _16x4) (float16x4_t a, float16x4_t b,		\
+			       float16x4_t c)				\
+  {									\
+    return MSTRCAT (insn, _f16) (a, b, c);				\
+  }									\
+  float16x8_t								\
+  MSTRCAT (test_##insn, _16x8) (float16x8_t a, float16x8_t b,		\
+			       float16x8_t c)				\
+  {									\
+    return MSTRCAT (insn, q_f16) (a, b, c);				\
+  }
+
+#define VCMP1_TEST(insn)			\
+  uint16x4_t					\
+  MSTRCAT (test_##insn, _16x4) (float16x4_t a)	\
+  {						\
+    return MSTRCAT (insn, _f16) (a);		\
+  }						\
+  uint16x8_t					\
+  MSTRCAT (test_##insn, _16x8) (float16x8_t a)	\
+  {						\
+    return MSTRCAT (insn, q_f16) (a);		\
+  }
+
+#define VCMP2_TEST(insn)					\
+  uint16x4_t							\
+  MSTRCAT (test_##insn, _16x4) (float16x4_t a, float16x4_t b)	\
+  {								\
+    return MSTRCAT (insn, _f16) (a, b);				\
+  }								\
+  uint16x8_t							\
+  MSTRCAT (test_##insn, _16x8) (float16x8_t a, float16x8_t b)	\
+  {								\
+    return MSTRCAT (insn, q_f16) (a, b);			\
+  }
+
+#define VCVT_TEST(insn, TY, TO, FR)			\
+  MSTRCAT (TO, 16x4_t)					\
+  MSTRCAT (test_##insn, TY) (MSTRCAT (FR, 16x4_t) a)	\
+  {							\
+    return MSTRCAT (insn, TY) (a);			\
+  }							\
+  MSTRCAT (TO, 16x8_t)					\
+  MSTRCAT (test_##insn##_q, TY) (MSTRCAT (FR, 16x8_t) a)	\
+  {							\
+    return MSTRCAT (insn, q##TY) (a);			\
+  }
+
+#define VCVT_N_TEST(insn, TY, TO, FR)			\
+  MSTRCAT (TO, 16x4_t)					\
+  MSTRCAT (test_##insn##_n, TY) (MSTRCAT (FR, 16x4_t) a)	\
+  {							\
+    return MSTRCAT (insn, _n##TY) (a, 1);		\
+  }							\
+  MSTRCAT (TO, 16x8_t)					\
+  MSTRCAT (test_##insn##_n_q, TY) (MSTRCAT (FR, 16x8_t) a)	\
+  {							\
+    return MSTRCAT (insn, q_n##TY) (a, 1);		\
+  }
+
+VCMP1_TEST (vceqz)
+/* { dg-final { scan-assembler-times {vceq\.f16\td[0-9]+, d[0-9]+, #0} 1 } }  */
+/* { dg-final { scan-assembler-times {vceq\.f16\tq[0-9]+, q[0-9]+, #0} 1 } }  */
+
+VCMP1_TEST (vcgtz)
+/* { dg-final { scan-assembler-times {vcgt\.f16\td[0-9]+, d[0-9]+, #0} 1 } }  */
+/* { dg-final { scan-assembler-times {vcgt\.f16\tq[0-9]+, q[0-9]+, #0} 1 } }  */
+
+VCMP1_TEST (vcgez)
+/* { dg-final { scan-assembler-times {vcge\.f16\td[0-9]+, d[0-9]+, #0} 1 } }  */
+/* { dg-final { scan-assembler-times {vcge\.f16\tq[0-9]+, q[0-9]+, #0} 1 } }  */
+
+VCMP1_TEST (vcltz)
+/* { dg-final { scan-assembler-times {vclt\.f16\td[0-9]+, d[0-9]+, #0} 1 } }  */
+/* { dg-final { scan-assembler-times {vclt\.f16\tq[0-9]+, q[0-9]+, #0} 1 } }  */
+
+VCMP1_TEST (vclez)
+/* { dg-final { scan-assembler-times {vcle\.f16\td[0-9]+, d[0-9]+, #0} 1 } }  */
+/* { dg-final { scan-assembler-times {vcle\.f16\tq[0-9]+, q[0-9]+, #0} 1 } }  */
+
+VCVT_TEST (vcvt, _f16_s16, float, int)
+VCVT_N_TEST (vcvt, _f16_s16, float, int)
+/* { dg-final { scan-assembler-times {vcvt\.f16\.s16\td[0-9]+, d[0-9]+} 2 } }
+   { dg-final { scan-assembler-times {vcvt\.f16\.s16\tq[0-9]+, q[0-9]+} 2 } }
+   { dg-final { scan-assembler-times {vcvt\.f16\.s16\td[0-9]+, d[0-9]+, #1} 1 } }
+   { dg-final { scan-assembler-times {vcvt\.f16\.s16\tq[0-9]+, q[0-9]+, #1} 1 } }  */
+
+VCVT_TEST (vcvt, _f16_u16, float, uint)
+VCVT_N_TEST (vcvt, _f16_u16, float, uint)
+/* { dg-final { scan-assembler-times {vcvt\.f16\.u16\td[0-9]+, d[0-9]+} 2 } }
+   { dg-final { scan-assembler-times {vcvt\.f16\.u16\tq[0-9]+, q[0-9]+} 2 } }
+   { dg-final { scan-assembler-times {vcvt\.f16\.u16\td[0-9]+, d[0-9]+, #1} 1 } }
+   { dg-final { scan-assembler-times {vcvt\.f16\.u16\tq[0-9]+, q[0-9]+, #1} 1 } }  */
+
+VCVT_TEST (vcvt, _s16_f16, int, float)
+VCVT_N_TEST (vcvt, _s16_f16, int, float)
+/* { dg-final { scan-assembler-times {vcvt\.s16\.f16\td[0-9]+, d[0-9]+} 2 } }
+   { dg-final { scan-assembler-times {vcvt\.s16\.f16\tq[0-9]+, q[0-9]+} 2 } }
+   { dg-final { scan-assembler-times {vcvt\.s16\.f16\td[0-9]+, d[0-9]+, #1} 1 } }
+   { dg-final { scan-assembler-times {vcvt\.s16\.f16\tq[0-9]+, q[0-9]+, #1} 1 } }  */
+
+VCVT_TEST (vcvt, _u16_f16, uint, float)
+VCVT_N_TEST (vcvt, _u16_f16, uint, float)
+/* { dg-final { scan-assembler-times {vcvt\.u16\.f16\td[0-9]+, d[0-9]+} 2 } }
+   { dg-final { scan-assembler-times {vcvt\.u16\.f16\tq[0-9]+, q[0-9]+} 2 } }
+   { dg-final { scan-assembler-times {vcvt\.u16\.f16\td[0-9]+, d[0-9]+, #1} 1 } }
+   { dg-final { scan-assembler-times {vcvt\.u16\.f16\tq[0-9]+, q[0-9]+, #1} 1 } }  */
+
+VCVT_TEST (vcvta, _s16_f16, int, float)
+/* { dg-final { scan-assembler-times {vcvta\.s16\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vcvta\.s16\.f16\tq[0-9]+, q[0-9]+} 1 } }
+*/
+
+VCVT_TEST (vcvta, _u16_f16, uint, float)
+/* { dg-final { scan-assembler-times {vcvta\.u16\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vcvta\.u16\.f16\tq[0-9]+, q[0-9]+} 1 } }
+*/
+
+VCVT_TEST (vcvtm, _s16_f16, int, float)
+/* { dg-final { scan-assembler-times {vcvtm\.s16\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vcvtm\.s16\.f16\tq[0-9]+, q[0-9]+} 1 } }
+*/
+
+VCVT_TEST (vcvtm, _u16_f16, uint, float)
+/* { dg-final { scan-assembler-times {vcvtm\.u16\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vcvtm\.u16\.f16\tq[0-9]+, q[0-9]+} 1 } }
+*/
+
+VCVT_TEST (vcvtn, _s16_f16, int, float)
+/* { dg-final { scan-assembler-times {vcvtn\.s16\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vcvtn\.s16\.f16\tq[0-9]+, q[0-9]+} 1 } }
+*/
+
+VCVT_TEST (vcvtn, _u16_f16, uint, float)
+/* { dg-final { scan-assembler-times {vcvtn\.u16\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vcvtn\.u16\.f16\tq[0-9]+, q[0-9]+} 1 } }
+*/
+
+VCVT_TEST (vcvtp, _s16_f16, int, float)
+/* { dg-final { scan-assembler-times {vcvtp\.s16\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vcvtp\.s16\.f16\tq[0-9]+, q[0-9]+} 1 } }
+*/
+
+VCVT_TEST (vcvtp, _u16_f16, uint, float)
+/* { dg-final { scan-assembler-times {vcvtp\.u16\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vcvtp\.u16\.f16\tq[0-9]+, q[0-9]+} 1 } }
+*/
+
+UNOP_TEST (vabs)
+/* { dg-final { scan-assembler-times {vabs\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vabs\.f16\tq[0-9]+, q[0-9]+} 1 } }  */
+
+UNOP_TEST (vneg)
+/* { dg-final { scan-assembler-times {vneg\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vneg\.f16\tq[0-9]+, q[0-9]+} 1 } }  */
+
+UNOP_TEST (vrecpe)
+/* { dg-final { scan-assembler-times {vrecpe\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vrecpe\.f16\tq[0-9]+, q[0-9]+} 1 } }  */
+
+UNOP_TEST (vrnd)
+/* { dg-final { scan-assembler-times {vrintz\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vrintz\.f16\tq[0-9]+, q[0-9]+} 1 } }  */
+
+UNOP_TEST (vrnda)
+/* { dg-final { scan-assembler-times {vrinta\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vrinta\.f16\tq[0-9]+, q[0-9]+} 1 } }  */
+
+UNOP_TEST (vrndm)
+/* { dg-final { scan-assembler-times {vrintm\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vrintm\.f16\tq[0-9]+, q[0-9]+} 1 } }  */
+
+UNOP_TEST (vrndn)
+/* { dg-final { scan-assembler-times {vrintn\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vrintn\.f16\tq[0-9]+, q[0-9]+} 1 } }  */
+
+UNOP_TEST (vrndp)
+/* { dg-final { scan-assembler-times {vrintp\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vrintp\.f16\tq[0-9]+, q[0-9]+} 1 } }  */
+
+UNOP_TEST (vrndx)
+/* { dg-final { scan-assembler-times {vrintx\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vrintx\.f16\tq[0-9]+, q[0-9]+} 1 } }  */
+
+UNOP_TEST (vrsqrte)
+/* { dg-final { scan-assembler-times {vrsqrte\.f16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vrsqrte\.f16\tq[0-9]+, q[0-9]+} 1 } }  */
+
+BINOP_TEST (vadd)
+/* { dg-final { scan-assembler-times {vadd\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vadd\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+BINOP_TEST (vabd)
+/* { dg-final { scan-assembler-times {vabd\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vabd\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+VCMP2_TEST (vcage)
+/* { dg-final { scan-assembler-times {vacge\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vacge\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+VCMP2_TEST (vcagt)
+/* { dg-final { scan-assembler-times {vacgt\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vacgt\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+VCMP2_TEST (vcale)
+/* { dg-final { scan-assembler-times {vacle\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vacle\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+VCMP2_TEST (vcalt)
+/* { dg-final { scan-assembler-times {vaclt\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vaclt\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+VCMP2_TEST (vceq)
+/* { dg-final { scan-assembler-times {vceq\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vceq\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+VCMP2_TEST (vcge)
+/* { dg-final { scan-assembler-times {vcge\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vcge\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+VCMP2_TEST (vcgt)
+/* { dg-final { scan-assembler-times {vcgt\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vcgt\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+VCMP2_TEST (vcle)
+/* { dg-final { scan-assembler-times {vcle\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vcle\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+VCMP2_TEST (vclt)
+/* { dg-final { scan-assembler-times {vclt\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vclt\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+BINOP_TEST (vmax)
+/* { dg-final { scan-assembler-times {vmax\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vmax\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+BINOP_TEST (vmin)
+/* { dg-final { scan-assembler-times {vmin\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vmin\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+BINOP_TEST (vmaxnm)
+/* { dg-final { scan-assembler-times {vmaxnm\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+  { dg-final { scan-assembler-times {vmaxnm\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+BINOP_TEST (vminnm)
+/* { dg-final { scan-assembler-times {vminnm\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+  { dg-final { scan-assembler-times {vminnm\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+BINOP_TEST (vmul)
+/* { dg-final { scan-assembler-times {vmul\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 3 } }
+   { dg-final { scan-assembler-times {vmul\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+BINOP_LANE_TEST (vmul, 2)
+/* { dg-final { scan-assembler-times {vmul\.f16\td[0-9]+, d[0-9]+, d[0-9]+\[2\]} 1 } }
+   { dg-final { scan-assembler-times {vmul\.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[2\]} 1 } }  */
+BINOP_N_TEST (vmul)
+/* { dg-final { scan-assembler-times {vmul\.f16\td[0-9]+, d[0-9]+, d[0-9]+\[0\]} 1 } }
+   { dg-final { scan-assembler-times {vmul\.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[0\]} 1 } }  */
+
+float16x4_t
+test_vpadd_16x4 (float16x4_t a, float16x4_t b)
+{
+  return vpadd_f16 (a, b);
+}
+/* { dg-final { scan-assembler-times {vpadd\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } } */
+
+float16x4_t
+test_vpmax_16x4 (float16x4_t a, float16x4_t b)
+{
+  return vpmax_f16 (a, b);
+}
+/* { dg-final { scan-assembler-times {vpmax\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } } */
+
+float16x4_t
+test_vpmin_16x4 (float16x4_t a, float16x4_t b)
+{
+  return vpmin_f16 (a, b);
+}
+/* { dg-final { scan-assembler-times {vpmin\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } } */
+
+BINOP_TEST (vsub)
+/* { dg-final { scan-assembler-times {vsub\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vsub\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+BINOP_TEST (vrecps)
+/* { dg-final { scan-assembler-times {vrecps\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+  { dg-final { scan-assembler-times {vrecps\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+BINOP_TEST (vrsqrts)
+/* { dg-final { scan-assembler-times {vrsqrts\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+  { dg-final { scan-assembler-times {vrsqrts\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+TERNOP_TEST (vfma)
+/* { dg-final { scan-assembler-times {vfma\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+  { dg-final { scan-assembler-times {vfma\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+TERNOP_TEST (vfms)
+/* { dg-final { scan-assembler-times {vfms\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
+  { dg-final { scan-assembler-times {vfms\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+float16x4_t
+test_vmov_n_f16 (float16_t a)
+{
+  return vmov_n_f16 (a);
+}
+
+float16x4_t
+test_vdup_n_f16 (float16_t a)
+{
+  return vdup_n_f16 (a);
+}
+/* { dg-final { scan-assembler-times {vdup\.16\td[0-9]+, r[0-9]+} 2 } }  */
+
+float16x8_t
+test_vmovq_n_f16 (float16_t a)
+{
+  return vmovq_n_f16 (a);
+}
+
+float16x8_t
+test_vdupq_n_f16 (float16_t a)
+{
+  return vdupq_n_f16 (a);
+}
+/* { dg-final { scan-assembler-times {vdup\.16\tq[0-9]+, r[0-9]+} 2 } }  */
+
+float16x4_t
+test_vdup_lane_f16 (float16x4_t a)
+{
+  return vdup_lane_f16 (a, 1);
+}
+/* { dg-final { scan-assembler-times {vdup\.16\td[0-9]+, d[0-9]+\[1\]} 1 } }  */
+
+float16x8_t
+test_vdupq_lane_f16 (float16x4_t a)
+{
+  return vdupq_lane_f16 (a, 1);
+}
+/* { dg-final { scan-assembler-times {vdup\.16\tq[0-9]+, d[0-9]+\[1\]} 1 } }  */
+
+float16x4_t
+test_vext_f16 (float16x4_t a, float16x4_t b)
+{
+  return vext_f16 (a, b, 1);
+}
+/* { dg-final { scan-assembler-times {vext\.16\td[0-9]+, d[0-9]+, d[0-9]+, #1} 1 } } */
+
+float16x8_t
+test_vextq_f16 (float16x8_t a, float16x8_t b)
+{
+  return vextq_f16 (a, b, 1);
+}
+/* { dg-final { scan-assembler-times {vext\.16\tq[0-9]+, q[0-9]+, q[0-9]+, #1} 1 } }  */
+
+UNOP_TEST (vrev64)
+/* { dg-final { scan-assembler-times {vrev64\.16\td[0-9]+, d[0-9]+} 1 } }
+   { dg-final { scan-assembler-times {vrev64\.16\tq[0-9]+, q[0-9]+} 1 } }  */
+
+float16x4_t
+test_vbsl16x4 (uint16x4_t a, float16x4_t b, float16x4_t c)
+{
+  return vbsl_f16 (a, b, c);
+}
+/* { dg-final { scan-assembler-times {vbsl\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }  */
+
+float16x8_t
+test_vbslq16x8 (uint16x8_t a, float16x8_t b, float16x8_t c)
+{
+  return vbslq_f16 (a, b, c);
+}
+/* { dg-final { scan-assembler-times {vbsl\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+float16x4x2_t
+test_vzip16x4 (float16x4_t a, float16x4_t b)
+{
+  return vzip_f16 (a, b);
+}
+/* { dg-final { scan-assembler-times {vzip\.16\td[0-9]+, d[0-9]+} 1 } }  */
+
+float16x8x2_t
+test_vzipq16x8 (float16x8_t a, float16x8_t b)
+{
+  return vzipq_f16 (a, b);
+}
+/* { dg-final { scan-assembler-times {vzip\.16\tq[0-9]+, q[0-9]+} 1 } }  */
+
+float16x4x2_t
+test_vuzp16x4 (float16x4_t a, float16x4_t b)
+{
+  return vuzp_f16 (a, b);
+}
+/* { dg-final { scan-assembler-times {vuzp\.16\td[0-9]+, d[0-9]+} 1 } }  */
+
+float16x8x2_t
+test_vuzpq16x8 (float16x8_t a, float16x8_t b)
+{
+  return vuzpq_f16 (a, b);
+}
+/* { dg-final { scan-assembler-times {vuzp\.16\tq[0-9]+, q[0-9]+} 1 } }  */
+
+float16x4x2_t
+test_vtrn16x4 (float16x4_t a, float16x4_t b)
+{
+  return vtrn_f16 (a, b);
+}
+/* { dg-final { scan-assembler-times {vtrn\.16\td[0-9]+, d[0-9]+} 1 } }  */
+
+float16x8x2_t
+test_vtrnq16x8 (float16x8_t a, float16x8_t b)
+{
+  return vtrnq_f16 (a, b);
+}
+/* { dg-final { scan-assembler-times {vtrn\.16\tq[0-9]+, q[0-9]+} 1 } }  */
diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-1.c
new file mode 100644
index 0000000..2eddb76
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-1.c
@@ -0,0 +1,203 @@
+/* { dg-do compile }  */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok }  */
+/* { dg-options "-O2" }  */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+/* Test instructions generated for the FP16 scalar intrinsics.  */
+#include <arm_fp16.h>
+
+#define MSTRCAT(L, str)	L##str
+
+#define UNOP_TEST(insn)				\
+  float16_t					\
+  MSTRCAT (test_##insn, 16) (float16_t a)	\
+  {						\
+    return MSTRCAT (insn, h_f16) (a);		\
+  }
+
+#define BINOP_TEST(insn)				\
+  float16_t						\
+  MSTRCAT (test_##insn, 16) (float16_t a, float16_t b)	\
+  {							\
+    return MSTRCAT (insn, h_f16) (a, b);		\
+  }
+
+#define TERNOP_TEST(insn)						\
+  float16_t								\
+  MSTRCAT (test_##insn, 16) (float16_t a, float16_t b, float16_t c)	\
+  {									\
+    return MSTRCAT (insn, h_f16) (a, b, c);				\
+  }
+
+float16_t
+test_vcvth_f16_s32 (int32_t a)
+{
+  return vcvth_f16_s32 (a);
+}
+
+float16_t
+test_vcvth_n_f16_s32 (int32_t a)
+{
+  return vcvth_n_f16_s32 (a, 1);
+}
+/* { dg-final { scan-assembler-times {vcvt\.f16\.s32\ts[0-9]+, s[0-9]+} 2 } }  */
+/* { dg-final { scan-assembler-times {vcvt\.f16\.s32\ts[0-9]+, s[0-9]+, #1} 1 } }  */
+
+float16_t
+test_vcvth_f16_u32 (uint32_t a)
+{
+  return vcvth_f16_u32 (a);
+}
+
+float16_t
+test_vcvth_n_f16_u32 (uint32_t a)
+{
+  return vcvth_n_f16_u32 (a, 1);
+}
+
+/* { dg-final { scan-assembler-times {vcvt\.f16\.u32\ts[0-9]+, s[0-9]+} 2 } }  */
+/* { dg-final { scan-assembler-times {vcvt\.f16\.u32\ts[0-9]+, s[0-9]+, #1} 1 } }  */
+
+uint32_t
+test_vcvth_u32_f16 (float16_t a)
+{
+  return vcvth_u32_f16 (a);
+}
+/* { dg-final { scan-assembler-times {vcvt\.u32\.f16\ts[0-9]+, s[0-9]+} 2 } }  */
+
+uint32_t
+test_vcvth_n_u32_f16 (float16_t a)
+{
+  return vcvth_n_u32_f16 (a, 1);
+}
+/* { dg-final { scan-assembler-times {vcvt\.u32\.f16\ts[0-9]+, s[0-9]+, #1} 1 } }  */
+
+int32_t
+test_vcvth_s32_f16 (float16_t a)
+{
+  return vcvth_s32_f16 (a);
+}
+
+int32_t
+test_vcvth_n_s32_f16 (float16_t a)
+{
+  return vcvth_n_s32_f16 (a, 1);
+}
+
+/* { dg-final { scan-assembler-times {vcvt\.s32\.f16\ts[0-9]+, s[0-9]+} 2 } }  */
+/* { dg-final { scan-assembler-times {vcvt\.s32\.f16\ts[0-9]+, s[0-9]+, #1} 1 } }  */
+
+int32_t
+test_vcvtah_s32_f16 (float16_t a)
+{
+  return vcvtah_s32_f16 (a);
+}
+/* { dg-final { scan-assembler-times {vcvta\.s32\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+
+uint32_t
+test_vcvtah_u32_f16 (float16_t a)
+{
+  return vcvtah_u32_f16 (a);
+}
+/* { dg-final { scan-assembler-times {vcvta\.u32\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+
+int32_t
+test_vcvtmh_s32_f16 (float16_t a)
+{
+  return vcvtmh_s32_f16 (a);
+}
+/* { dg-final { scan-assembler-times {vcvtm\.s32\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+
+uint32_t
+test_vcvtmh_u32_f16 (float16_t a)
+{
+  return vcvtmh_u32_f16 (a);
+}
+/* { dg-final { scan-assembler-times {vcvtm\.u32\.f16\ts[0-9]+, s[0-9]+} 1 } }
+ */
+
+int32_t
+test_vcvtnh_s32_f16 (float16_t a)
+{
+  return vcvtnh_s32_f16 (a);
+}
+/* { dg-final { scan-assembler-times {vcvtn\.s32\.f16\ts[0-9]+, s[0-9]+} 1 } }
+ */
+
+uint32_t
+test_vcvtnh_u32_f16 (float16_t a)
+{
+  return vcvtnh_u32_f16 (a);
+}
+/* { dg-final { scan-assembler-times {vcvtn\.u32\.f16\ts[0-9]+, s[0-9]+} 1 } }
+ */
+
+int32_t
+test_vcvtph_s32_f16 (float16_t a)
+{
+  return vcvtph_s32_f16 (a);
+}
+/* { dg-final { scan-assembler-times {vcvtp\.s32\.f16\ts[0-9]+, s[0-9]+} 1 } }
+ */
+
+uint32_t
+test_vcvtph_u32_f16 (float16_t a)
+{
+  return vcvtph_u32_f16 (a);
+}
+/* { dg-final { scan-assembler-times {vcvtp\.u32\.f16\ts[0-9]+, s[0-9]+} 1 } }
+ */
+
+UNOP_TEST (vabs)
+/* { dg-final { scan-assembler-times {vabs\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+
+UNOP_TEST (vneg)
+/* { dg-final { scan-assembler-times {vneg\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+
+UNOP_TEST (vrnd)
+/* { dg-final { scan-assembler-times {vrintz\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+
+UNOP_TEST (vrndi)
+/* { dg-final { scan-assembler-times {vrintr\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+
+UNOP_TEST (vrnda)
+/* { dg-final { scan-assembler-times {vrinta\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+
+UNOP_TEST (vrndm)
+/* { dg-final { scan-assembler-times {vrintm\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+
+UNOP_TEST (vrndn)
+/* { dg-final { scan-assembler-times {vrintn\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+
+UNOP_TEST (vrndp)
+/* { dg-final { scan-assembler-times {vrintp\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+
+UNOP_TEST (vrndx)
+/* { dg-final { scan-assembler-times {vrintx\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+
+UNOP_TEST (vsqrt)
+/* { dg-final { scan-assembler-times {vsqrt\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+
+BINOP_TEST (vadd)
+/* { dg-final { scan-assembler-times {vadd\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+
+BINOP_TEST (vdiv)
+/* { dg-final { scan-assembler-times {vdiv\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+
+BINOP_TEST (vmaxnm)
+/* { dg-final { scan-assembler-times {vmaxnm\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+
+BINOP_TEST (vminnm)
+/* { dg-final { scan-assembler-times {vminnm\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+
+BINOP_TEST (vmul)
+/* { dg-final { scan-assembler-times {vmul\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+
+BINOP_TEST (vsub)
+/* { dg-final { scan-assembler-times {vsub\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+
+TERNOP_TEST (vfma)
+/* { dg-final { scan-assembler-times {vfma\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+
+TERNOP_TEST (vfms)
+/* { dg-final { scan-assembler-times {vfms\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-2.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-2.c
new file mode 100644
index 0000000..fa4828d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-2.c
@@ -0,0 +1,71 @@
+/* { dg-do compile }  */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok }  */
+/* { dg-options "-O2 -std=c11" }  */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+/* Test compiler use of FP16 instructions.  */
+#include <arm_fp16.h>
+
+float16_t
+test_mov_imm_1 (float16_t a)
+{
+  return 1.0;
+}
+
+float16_t
+test_mov_imm_2 (float16_t a)
+{
+  float16_t b = 1.0;
+  return b;
+}
+
+float16_t
+test_vmov_imm_3 (float16_t a)
+{
+  float16_t b = 1.0;
+  return vaddh_f16 (a, b);
+}
+
+float16_t
+test_vmov_imm_4 (float16_t a)
+{
+  return vaddh_f16 (a, 1.0);
+}
+
+/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, #1\.0e\+0} 4 } }
+   { dg-final { scan-assembler-times {vadd\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 2 } } */
+
+float16_t
+test_vmla_1 (float16_t a, float16_t b, float16_t c)
+{
+  return vaddh_f16 (vmulh_f16 (a, b), c);
+}
+/* { dg-final { scan-assembler-times {vmla\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+
+float16_t
+test_vmla_2 (float16_t a, float16_t b, float16_t c)
+{
+  return vsubh_f16 (vmulh_f16 (vnegh_f16 (a), b), c);
+}
+/* { dg-final { scan-assembler-times {vnmla\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */
+
+float16_t
+test_vmls_1 (float16_t a, float16_t b, float16_t c)
+{
+  return vsubh_f16 (c, vmulh_f16 (a, b));
+}
+
+float16_t
+test_vmls_2 (float16_t a, float16_t b, float16_t c)
+{
+  return vsubh_f16 (a, vmulh_f16 (b, c));
+}
+/* { dg-final { scan-assembler-times {vmls\.f16} 2 } } */
+
+float16_t
+test_vnmls_1 (float16_t a, float16_t b, float16_t c)
+{
+  return vsubh_f16 (vmulh_f16 (a, b), c);
+}
+/* { dg-final { scan-assembler-times {vnmls\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */
+
diff --git a/gcc/testsuite/gcc.target/arm/attr-fp16-arith-1.c b/gcc/testsuite/gcc.target/arm/attr-fp16-arith-1.c
index 5011315..a93d30f 100644
--- a/gcc/testsuite/gcc.target/arm/attr-fp16-arith-1.c
+++ b/gcc/testsuite/gcc.target/arm/attr-fp16-arith-1.c
@@ -28,6 +28,19 @@
 #error Invalid value for __ARM_FP
 #endif
 
+#include "arm_neon.h"
+
+float16_t
+foo (float16x4_t b)
+{
+  float16x4_t a = {2.0, 3.0, 4.0, 5.0};
+  float16x4_t res = vadd_f16 (a, b);
+
+  return res[0];
+}
+
+/* { dg-final { scan-assembler "vadd\\.f16\td\[0-9\]+, d\[0-9\]+" } } */
+
 #pragma GCC pop_options
 
 /* Check that the FP version is correctly reset to mfpu=fp-armv8.  */
-- 
2.1.4


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 16/17][ARM] Add tests for VFP FP16 ACLE intrinsics.
  2016-05-18 10:58     ` Matthew Wahab
@ 2016-07-04 14:18       ` Matthew Wahab
  2016-08-04  8:35         ` Ramana Radhakrishnan
  0 siblings, 1 reply; 73+ messages in thread
From: Matthew Wahab @ 2016-07-04 14:18 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 5548 bytes --]

On 18/05/16 11:58, Matthew Wahab wrote:
 > On 18/05/16 02:06, Joseph Myers wrote:
 >> On Tue, 17 May 2016, Matthew Wahab wrote:
 >>
 >>> In some tests, there are unavoidable differences in precision when
 >>> calculating the actual and the expected results of an FP16 operation. A
 >>> new support function CHECK_FP_BIAS is used so that these tests can check
 >>> for an acceptable margin of error. In these tests, the tolerance is
 >>> given as the absolute integer difference between the bitvectors of the
 >>> expected and the actual results.
 >>
 >> As far as I can see, CHECK_FP_BIAS is only used in the following patch,
 >> but there is another bias test in vsqrth_f16_1.c in this patch.
 >
 > This is my mistake, the CHECK_FP_BIAS is used for the NEON tests and should
 >  have gone into that patch. The VFP test can do a simpler check so doesn't
 > need the macro.
 >
 >> Could you clarify where the "unavoidable differences in precision" come
 >> from? Are the results of some of the new instructions not fully specified,
 >> only specified within a given precision?  (As far as I can tell the
 >> existing v8 instructions for reciprocal and reciprocal square root
 >> estimates do have fully defined results, despite being loosely described
 >> as estimates.)
 >
 > The expected results in the new tests are represented as expressions whose
 > value is expected to be calculated at compile-time. This makes the tests
 > more readable but differences in the precision between the the compiler and
 > the HW calculations mean that for vrecpe_f16, vrecps_f16, vrsqrts_f16 and
 > vsqrth_f16_1.c the expected and actual results are different.
 >
 > On reflection, it may be better to remove the CHECK_FP_BIAS macro and, for
 > the tests that needed it, to drop the compiler calculation and just use the
 >  expected hexadecimal value.
 >
 > Other tests depending on compile-time calculations involve relatively
 > simple arithmetic operations and it's not clear if they are susceptible to
 > the same rounding errors. I have limited knowledge in FP arithmetic though
 > so I'll look into this.

The scalar tests added in this patch and the vector tests added in the
next patch have been reworked to use the exact values for the expected
results rather than compile-time expressions. The CHECK_FP_BIAS macro is
not used and is removed from this patch.
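
As a small illustration of the exact-value style (ours): an expected
output can be written directly as an IEEE binary16 bit pattern, so the
test no longer depends on how the compiler rounds a C expression:

  /* 1.0 in IEEE binary16: sign 0, exponent 01111, mantissa 0.  */
  uint16_t expected_one = 0x3c00;
  /* 2.0 in IEEE binary16: sign 0, exponent 10000, mantissa 0.  */
  uint16_t expected_two = 0x4000;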

The intention with these tests and with the vector tests is to check
that the compiler emits code that produces the same results as the
instruction regardless of any optimizations that it may apply. The
expected results for the tests were produced using inline assembler
taking the same inputs as the intrinsics being tested.
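
A sketch of that approach (ours, not code from the patch); the helper
name and the use of the "t" (VFP single-precision register) constraint
for float16_t operands are our assumptions:

  #include <arm_fp16.h>

  /* Hypothetical helper: apply VSQRT.F16 to the same input the
     intrinsic under test receives and return the hardware result,
     which then serves as the test's expected value.  */
  static float16_t
  ref_vsqrth_f16 (float16_t a)
  {
    float16_t r;
    __asm__ ("vsqrt.f16 %0, %1" : "=t" (r) : "t" (a));
    return r;
  }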

Other changes are to add and use some (limited) templates for scalar
operations and to add progress and error reporting, making the scalar
tests more consistent with those for the vector operations.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

testsuite/
2016-07-04  Jiong Wang  <jiong.wang@arm.com>
	    Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/aarch64/advsimd-intrinsics/binary_scalar_op.inc: New.
	* gcc.target/aarch64/advsimd-intrinsics/unary_scalar_op.inc: New.
	* gcc.target/aarch64/advsimd-intrinsics/ternary_scalar_op.inc: New.
	* gcc.target/aarch64/advsimd-intrinsics/vabsh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vaddh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtah_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtah_u32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s32_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u32_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s32_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u32_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_u32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtph_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtph_u32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vdivh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vfmah_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vfmsh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmaxnmh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vminnmh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmulh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vnegh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndah_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndih_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndmh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndnh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndph_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndxh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vsqrth_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vsubh_f16_1.c: New.


[-- Attachment #2: 0016-PATCH-16-17-ARM-Add-tests-for-VFP-FP16-ACLE-instrins.patch --]
[-- Type: text/x-patch, Size: 70092 bytes --]

From a2ad09ded8e34f4e42b7bba1c04c8b2d0f5c9fdd Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 15:40:52 +0100
Subject: [PATCH 16/17][ARM] Add tests for VFP FP16 ACLE intrinsics.

testsuite/
2016-07-04  Jiong Wang  <jiong.wang@arm.com>
	    Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/aarch64/advsimd-intrinsics/binary_scalar_op.inc: New.
	* gcc.target/aarch64/advsimd-intrinsics/unary_scalar_op.inc: New.
	* gcc.target/aarch64/advsimd-intrinsics/ternary_scalar_op.inc: New.
	* gcc.target/aarch64/advsimd-intrinsics/vabsh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vaddh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtah_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtah_u32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s32_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u32_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s32_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u32_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_u32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtph_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtph_u32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vdivh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vfmah_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vfmsh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmaxnmh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vminnmh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmulh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vnegh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndah_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndih_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndmh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndnh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndph_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vrndxh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vsqrth_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vsubh_f16_1.c: New.
---
 .../advsimd-intrinsics/binary_scalar_op.inc        | 160 ++++++++++++++++
 .../advsimd-intrinsics/ternary_scalar_op.inc       | 206 +++++++++++++++++++++
 .../aarch64/advsimd-intrinsics/unary_scalar_op.inc | 199 ++++++++++++++++++++
 .../aarch64/advsimd-intrinsics/vabsh_f16_1.c       |  40 ++++
 .../aarch64/advsimd-intrinsics/vaddh_f16_1.c       |  40 ++++
 .../aarch64/advsimd-intrinsics/vcvtah_s32_f16_1.c  |  53 ++++++
 .../aarch64/advsimd-intrinsics/vcvtah_u32_f16_1.c  |  53 ++++++
 .../aarch64/advsimd-intrinsics/vcvth_f16_s32_1.c   |  52 ++++++
 .../aarch64/advsimd-intrinsics/vcvth_f16_u32_1.c   |  52 ++++++
 .../aarch64/advsimd-intrinsics/vcvth_n_f16_s32_1.c |  99 ++++++++++
 .../aarch64/advsimd-intrinsics/vcvth_n_f16_u32_1.c |  99 ++++++++++
 .../aarch64/advsimd-intrinsics/vcvth_n_s32_f16_1.c | 100 ++++++++++
 .../aarch64/advsimd-intrinsics/vcvth_n_u32_f16_1.c | 100 ++++++++++
 .../aarch64/advsimd-intrinsics/vcvth_s32_f16_1.c   |  53 ++++++
 .../aarch64/advsimd-intrinsics/vcvth_u32_f16_1.c   |  53 ++++++
 .../aarch64/advsimd-intrinsics/vcvtmh_s32_f16_1.c  |  53 ++++++
 .../aarch64/advsimd-intrinsics/vcvtmh_u32_f16_1.c  |  53 ++++++
 .../aarch64/advsimd-intrinsics/vcvtnh_s32_f16_1.c  |  53 ++++++
 .../aarch64/advsimd-intrinsics/vcvtnh_u32_f16_1.c  |  53 ++++++
 .../aarch64/advsimd-intrinsics/vcvtph_s32_f16_1.c  |  53 ++++++
 .../aarch64/advsimd-intrinsics/vcvtph_u32_f16_1.c  |  53 ++++++
 .../aarch64/advsimd-intrinsics/vdivh_f16_1.c       |  42 +++++
 .../aarch64/advsimd-intrinsics/vfmah_f16_1.c       |  40 ++++
 .../aarch64/advsimd-intrinsics/vfmsh_f16_1.c       |  40 ++++
 .../aarch64/advsimd-intrinsics/vmaxnmh_f16_1.c     |  42 +++++
 .../aarch64/advsimd-intrinsics/vminnmh_f16_1.c     |  42 +++++
 .../aarch64/advsimd-intrinsics/vmulh_f16_1.c       |  42 +++++
 .../aarch64/advsimd-intrinsics/vnegh_f16_1.c       |  39 ++++
 .../aarch64/advsimd-intrinsics/vrndah_f16_1.c      |  40 ++++
 .../aarch64/advsimd-intrinsics/vrndh_f16_1.c       |  40 ++++
 .../aarch64/advsimd-intrinsics/vrndih_f16_1.c      |  40 ++++
 .../aarch64/advsimd-intrinsics/vrndmh_f16_1.c      |  40 ++++
 .../aarch64/advsimd-intrinsics/vrndnh_f16_1.c      |  40 ++++
 .../aarch64/advsimd-intrinsics/vrndph_f16_1.c      |  40 ++++
 .../aarch64/advsimd-intrinsics/vrndxh_f16_1.c      |  40 ++++
 .../aarch64/advsimd-intrinsics/vsqrth_f16_1.c      |  40 ++++
 .../aarch64/advsimd-intrinsics/vsubh_f16_1.c       |  42 +++++
 37 files changed, 2326 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_scalar_op.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/ternary_scalar_op.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/unary_scalar_op.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabsh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_s32_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_u32_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s32_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u32_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s32_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u32_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s32_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u32_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_s32_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_u32_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s32_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u32_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s32_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u32_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_s32_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_u32_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdivh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmah_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmsh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxnmh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminnmh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vnegh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndah_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndih_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndmh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndnh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndph_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndxh_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsqrth_f16_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsubh_f16_1.c

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_scalar_op.inc b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_scalar_op.inc
new file mode 100644
index 0000000..55dedd4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_scalar_op.inc
@@ -0,0 +1,160 @@
+/* Template file for binary scalar operator validation.
+
+   This file is meant to be included by test files for binary scalar
+   operations.  */
+
+/* Check for required settings.  */
+
+#ifndef INSN_NAME
+#error INSN_NAME (the intrinsic to test) must be defined.
+#endif
+
+#ifndef INPUT_TYPE
+#error INPUT_TYPE (basic type of an input value) must be defined.
+#endif
+
+#ifndef OUTPUT_TYPE
+#error OUTPUT_TYPE (basic type of an output value) must be defined.
+#endif
+
+#ifndef OUTPUT_TYPE_SIZE
+#error OUTPUT_TYPE_SIZE (size in bits of an output value) must be defined.
+#endif
+
+/* Optional settings:
+
+   INPUT_1: Input values for the first parameter.  Must be of type INPUT_TYPE.
+   INPUT_2: Input values for the second parameter.  Must be of type
+   INPUT_TYPE.  */
+
+#ifndef TEST_MSG
+#define TEST_MSG "unnamed test"
+#endif
+
+/* The test framework.  */
+
+#include <stdio.h>
+
+extern void abort ();
+
+#define INFF __builtin_inf ()
+
+/* Stringify a macro.  */
+#define STR0(A) #A
+#define STR(A) STR0 (A)
+
+/* Macro concatenation.  */
+#define CAT0(A, B) A##B
+#define CAT(A, B) CAT0 (A, B)
+
+/* Format strings for error reporting.  */
+#define FMT16 "0x%04x"
+#define FMT32 "0x%08x"
+#define FMT CAT (FMT,OUTPUT_TYPE_SIZE)
+
+/* Type construction: forms TS_t, where T is the base type and S the size in
+   bits.  */
+#define MK_TYPE0(T, S) T##S##_t
+#define MK_TYPE(T, S) MK_TYPE0 (T, S)
+
+/* Convenience types for input and output data.  */
+typedef MK_TYPE (uint, OUTPUT_TYPE_SIZE) output_hex_type;
+
+/* Conversion between typed values and their hexadecimal representation.  */
+typedef union
+{
+  OUTPUT_TYPE value;
+  output_hex_type hex;
+} output_conv_type;
+
+/* Default input values.  */
+
+float16_t input_1_float16_t[] =
+{
+  0.0, -0.0,
+  2.0, 3.1,
+  20.0, 0.40,
+  -2.3, 1.33,
+  -7.6, 0.31,
+  0.3353, 0.5,
+  1.0, 13.13,
+  -6.3, 20.0,
+  (float16_t)INFF, (float16_t)-INFF,
+};
+
+float16_t input_2_float16_t[] =
+{
+  1.0, 1.0,
+  -4.33, 100.0,
+  30.0, -0.02,
+  0.5, -7.231,
+  -6.3, 20.0,
+  -7.231, 2.3,
+  -7.6, 5.1,
+  0.31, 0.33353,
+  (float16_t)-INFF, (float16_t)INFF,
+};
+
+#ifndef INPUT_1
+#define INPUT_1 CAT (input_1_,INPUT_TYPE)
+#endif
+
+#ifndef INPUT_2
+#define INPUT_2 CAT (input_2_,INPUT_TYPE)
+#endif
+
+/* Support macros and routines for the test function.  */
+
+#define CHECK()						\
+  {								\
+    output_conv_type actual;					\
+    output_conv_type expect;					\
+								\
+    expect.hex = ((output_hex_type*)EXPECTED)[index];		\
+    actual.value = INSN_NAME ((INPUT_1)[index],			\
+			      (INPUT_2)[index]);		\
+								\
+    if (actual.hex != expect.hex)				\
+      {								\
+	fprintf (stderr,					\
+		 "ERROR in %s (%s line %d), buffer %s, "	\
+		 "index %d: got "				\
+		 FMT " != " FMT "\n",				\
+		 TEST_MSG, __FILE__, __LINE__,			\
+		 STR (EXPECTED), index,				\
+		 actual.hex, expect.hex);			\
+	abort ();						\
+      }								\
+    fprintf (stderr, "CHECKED %s %s\n",				\
+	     STR (EXPECTED), TEST_MSG);				\
+  }
+
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1 (NAME)
+
+/* The test function.  */
+
+void
+FNNAME (INSN_NAME) (void)
+{
+  /* Basic test: y[i] = OP (INPUT_1[i], INPUT_2[i]), for each input pair,
+     then compare the result against EXPECTED[i].  */
+
+  const int num_tests = sizeof (INPUT_1) / sizeof (INPUT_1[0]);
+  int index;
+
+  for (index = 0; index < num_tests; index++)
+    CHECK ();
+
+#ifdef EXTRA_TESTS
+  EXTRA_TESTS ();
+#endif
+}
+
+int
+main (void)
+{
+  FNNAME (INSN_NAME) ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/ternary_scalar_op.inc b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/ternary_scalar_op.inc
new file mode 100644
index 0000000..4765091
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/ternary_scalar_op.inc
@@ -0,0 +1,206 @@
+/* Template file for ternary scalar operator validation.
+
+   This file is meant to be included by test files for ternary scalar
+   operations.  */
+
+/* Check for required settings.  */
+
+#ifndef INSN_NAME
+#error INSN_NAME (the intrinsic to test) must be defined.
+#endif
+
+#ifndef INPUT_TYPE
+#error INPUT_TYPE (basic type of an input value) must be defined.
+#endif
+
+#ifndef OUTPUT_TYPE
+#error OUTPUT_TYPE (basic type of an output value) must be defined.
+#endif
+
+#ifndef OUTPUT_TYPE_SIZE
+#error OUTPUT_TYPE_SIZE (size in bits of an output value) must be defined.
+#endif
+
+/* Optional settings:
+
+   INPUT_1: Input values for the first parameter.  Must be of type INPUT_TYPE.
+   INPUT_2: Input values for the second parameter.  Must be of type INPUT_TYPE.
+   INPUT_3: Input values for the third parameter.  Must be of type
+   INPUT_TYPE.  */
+
+#ifndef TEST_MSG
+#define TEST_MSG "unnamed test"
+#endif
+
+/* The test framework.  */
+
+#include <stdio.h>
+
+extern void abort ();
+
+#define INFF __builtin_inf ()
+
+/* Stringify a macro.  */
+#define STR0(A) #A
+#define STR(A) STR0 (A)
+
+/* Macro concatenation.  */
+#define CAT0(A, B) A##B
+#define CAT(A, B) CAT0 (A, B)
+
+/* Format strings for error reporting.  */
+#define FMT16 "0x%04x"
+#define FMT32 "0x%08x"
+#define FMT CAT (FMT,OUTPUT_TYPE_SIZE)
+
+/* Type construction: forms TS_t, where T is the base type and S the size in
+   bits.  */
+#define MK_TYPE0(T, S) T##S##_t
+#define MK_TYPE(T, S) MK_TYPE0 (T, S)
+
+/* Convenience types for input and output data.  */
+typedef MK_TYPE (uint, OUTPUT_TYPE_SIZE) output_hex_type;
+
+/* Conversion between typed values and their hexadecimal representation.  */
+typedef union
+{
+  OUTPUT_TYPE value;
+  output_hex_type hex;
+} output_conv_type;
+
+/* Default input values.  */
+
+float16_t input_1_float16_t[] =
+{
+  0.0,
+  -0.0,
+  2.0,
+  3.1,
+  20.0,
+  0.40,
+  -2.3,
+  1.33,
+  -7.6,
+  0.31,
+  0.3353,
+  0.5,
+  1.0,
+  13.13,
+  -6.3,
+  20.0,
+  (float16_t)INFF,
+  (float16_t)-INFF,
+};
+
+float16_t input_2_float16_t[] =
+{
+  1.0,
+  1.0,
+  -4.33,
+  100.0,
+  30.0,
+  -0.02,
+  0.5,
+  -7.231,
+  -6.3,
+  20.0,
+  -7.231,
+  2.3,
+  -7.6,
+  5.1,
+  0.31,
+  0.33353,
+  (float16_t)-INFF,
+  (float16_t)INFF,
+};
+
+float16_t input_3_float16_t[] =
+{
+  -0.0,
+  0.0,
+  0.31,
+  -0.31,
+  1.31,
+  2.1,
+  -6.3,
+  1.0,
+  -1.5,
+  5.1,
+  0.3353,
+  9.3,
+  -9.3,
+  -7.231,
+  0.5,
+  -0.33,
+  (float16_t)INFF,
+  (float16_t)INFF,
+};
+
+#ifndef INPUT_1
+#define INPUT_1 CAT (input_1_,INPUT_TYPE)
+#endif
+
+#ifndef INPUT_2
+#define INPUT_2 CAT (input_2_,INPUT_TYPE)
+#endif
+
+#ifndef INPUT_3
+#define INPUT_3 CAT (input_3_,INPUT_TYPE)
+#endif
+
+/* Support macros and routines for the test function.  */
+
+#define CHECK()							\
+  {								\
+    output_conv_type actual;					\
+    output_conv_type expect;					\
+								\
+    expect.hex = ((output_hex_type*)EXPECTED)[index];		\
+    actual.value = INSN_NAME ((INPUT_1)[index],			\
+			      (INPUT_2)[index],			\
+			      (INPUT_3)[index]);		\
+								\
+    if (actual.hex != expect.hex)				\
+      {								\
+	fprintf (stderr,					\
+		 "ERROR in %s (%s line %d), buffer %s, "	\
+		 "index %d: got "				\
+		 FMT " != " FMT "\n",				\
+		 TEST_MSG, __FILE__, __LINE__,			\
+		 STR (EXPECTED), index,				\
+		 actual.hex, expect.hex);			\
+	abort ();						\
+      }								\
+    fprintf (stderr, "CHECKED %s %s\n",				\
+	     STR (EXPECTED), TEST_MSG);				\
+  }
+
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1 (NAME)
+
+/* The test function.  */
+
+void
+FNNAME (INSN_NAME) (void)
+{
+  /* Basic test: y[i] = OP (INPUT_1[i], INPUT_2[i], INPUT_3[i]), for each
+     input triple, then compare the result against EXPECTED[i].  */
+
+  const int num_tests = sizeof (INPUT_1) / sizeof (INPUT_1[0]);
+  int index;
+
+  for (index = 0; index < num_tests; index++)
+    CHECK ();
+
+#ifdef EXTRA_TESTS
+  EXTRA_TESTS ();
+#endif
+}
+
+int
+main (void)
+{
+  FNNAME (INSN_NAME) ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/unary_scalar_op.inc b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/unary_scalar_op.inc
new file mode 100644
index 0000000..86403d2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/unary_scalar_op.inc
@@ -0,0 +1,199 @@
+/* Template file for unary scalar operator validation.
+
+   This file is meant to be included by test files for unary scalar
+   operations.  */
+
+/* Check for required settings.  */
+
+#ifndef INSN_NAME
+#error INSN_NAME (the intrinsic to test) must be defined.
+#endif
+
+#ifndef INPUT_TYPE
+#error INPUT_TYPE (basic type of an input value) must be defined.
+#endif
+
+#ifndef SCALAR_OPERANDS
+#ifndef EXPECTED
+#error EXPECTED (an array of expected output values) must be defined.
+#endif
+#endif
+
+#ifndef OUTPUT_TYPE
+#error OUTPUT_TYPE (basic type of an output value) must be defined.
+#endif
+
+#ifndef OUTPUT_TYPE_SIZE
+#error OUTPUT_TYPE_SIZE (size in bits of an output value) must be defined.
+#endif
+
+/* Optional settings.  */
+
+/* SCALAR_OPERANDS: Defined iff the intrinsic has a scalar operand.
+
+   SCALAR_1, SCALAR_2, ..., SCALAR_4: If SCALAR_OPERANDS is defined, SCALAR_<n>
+   is the scalar operand and EXPECTED_<n> is the array of expected values.
+
+   INPUT: Input values for the first parameter.  Must be of type INPUT_TYPE.  */
+
+/* Additional comments for the error message.  */
+#ifndef COMMENT
+#define COMMENT ""
+#endif
+
+#ifndef TEST_MSG
+#define TEST_MSG "unnamed test"
+#endif
+
+/* The test framework.  */
+
+#include <stdio.h>
+
+extern void abort ();
+
+#define INFF __builtin_inf ()
+
+/* Stringify a macro.  */
+#define STR0(A) #A
+#define STR(A) STR0 (A)
+
+/* Macro concatenation.  */
+#define CAT0(A, B) A##B
+#define CAT(A, B) CAT0 (A, B)
+
+/* Format strings for error reporting.  */
+#define FMT16 "0x%04x"
+#define FMT32 "0x%08x"
+#define FMT CAT (FMT,OUTPUT_TYPE_SIZE)
+
+/* Type construction: forms TS_t, where T is the base type and S the size in
+   bits.  */
+#define MK_TYPE0(T, S) T##S##_t
+#define MK_TYPE(T, S) MK_TYPE0 (T, S)
+
+/* Convenience types for input and output data.  */
+typedef MK_TYPE (uint, OUTPUT_TYPE_SIZE) output_hex_type;
+
+/* Conversion between typed values and their hexadecimal representation.  */
+typedef union
+{
+  OUTPUT_TYPE value;
+  output_hex_type hex;
+} output_conv_type;
+
+/* Default input values.  */
+
+float16_t input_1_float16_t[] =
+{
+  0.0, -0.0,
+  2.0, 3.1,
+  20.0, 0.40,
+  -2.3, 1.33,
+  -7.6, 0.31,
+  0.3353, 0.5,
+  1.0, 13.13,
+  -6.3, 20.0,
+  (float16_t)INFF, (float16_t)-INFF,
+};
+
+#ifndef INPUT
+#define INPUT CAT (input_1_,INPUT_TYPE)
+#endif
+
+/* Support macros and routines for the test function.  */
+
+#define CHECK()							\
+  {								\
+    output_conv_type actual;					\
+    output_conv_type expect;					\
+								\
+    expect.hex = ((output_hex_type*)EXPECTED)[index];		\
+    actual.value = INSN_NAME ((INPUT)[index]);			\
+								\
+    if (actual.hex != expect.hex)				\
+      {								\
+	fprintf (stderr,					\
+		 "ERROR in %s (%s line %d), buffer %s, "	\
+		 "index %d: got "				\
+		 FMT " != " FMT "\n",				\
+		 TEST_MSG, __FILE__, __LINE__,			\
+		 STR (EXPECTED), index,				\
+		 actual.hex, expect.hex);			\
+	abort ();						\
+      }								\
+    fprintf (stderr, "CHECKED %s %s\n",				\
+	     STR (EXPECTED), TEST_MSG);				\
+  }
+
+#define CHECK_N(SCALAR, EXPECTED)				\
+  {								\
+    output_conv_type actual;					\
+    output_conv_type expect;					\
+								\
+    expect.hex							\
+      = ((output_hex_type*)EXPECTED)[index];			\
+    actual.value = INSN_NAME ((INPUT)[index], (SCALAR));	\
+								\
+    if (actual.hex != expect.hex)				\
+      {								\
+	fprintf (stderr,					\
+		 "ERROR in %s (%s line %d), buffer %s, "	\
+		 "index %d: got "				\
+		 FMT " != " FMT "\n",				\
+		 TEST_MSG, __FILE__, __LINE__,			\
+		 STR (EXPECTED), index,				\
+		 actual.hex, expect.hex);			\
+	abort ();						\
+      }								\
+    fprintf (stderr, "CHECKED %s %s\n",				\
+	     STR (EXPECTED), TEST_MSG);				\
+  }
+
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1 (NAME)
+
+/* The test function.  */
+
+void
+FNNAME (INSN_NAME) (void)
+{
+  /* Basic test: y[i] = OP (x[i]), for each INPUT[i], then compare the result
+     against EXPECTED[i].  */
+
+  const int num_tests = sizeof (INPUT) / sizeof (INPUT[0]);
+  int index;
+
+  for (index = 0; index < num_tests; index++)
+    {
+#if defined (SCALAR_OPERANDS)
+
+#ifdef SCALAR_1
+      CHECK_N (SCALAR_1, EXPECTED_1);
+#endif
+#ifdef SCALAR_2
+      CHECK_N (SCALAR_2, EXPECTED_2);
+#endif
+#ifdef SCALAR_3
+      CHECK_N (SCALAR_3, EXPECTED_3);
+#endif
+#ifdef SCALAR_4
+      CHECK_N (SCALAR_4, EXPECTED_4);
+#endif
+
+#else /* !defined (SCALAR_OPERANDS).  */
+      CHECK ();
+#endif
+    }
+
+#ifdef EXTRA_TESTS
+  EXTRA_TESTS ();
+#endif
+}
+
+int
+main (void)
+{
+  FNNAME (INSN_NAME) ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabsh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabsh_f16_1.c
new file mode 100644
index 0000000..16a986a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabsh_f16_1.c
@@ -0,0 +1,40 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+/* Expected results (16-bit hexadecimal representation).  */
+uint16_t expected[] =
+{
+  0x0000 /* 0.000000 */,
+  0x0000 /* 0.000000 */,
+  0x4000 /* 2.000000 */,
+  0x4233 /* 3.099609 */,
+  0x4d00 /* 20.000000 */,
+  0x3666 /* 0.399902 */,
+  0x409a /* 2.300781 */,
+  0x3d52 /* 1.330078 */,
+  0x479a /* 7.601562 */,
+  0x34f6 /* 0.310059 */,
+  0x355d /* 0.335205 */,
+  0x3800 /* 0.500000 */,
+  0x3c00 /* 1.000000 */,
+  0x4a91 /* 13.132812 */,
+  0x464d /* 6.300781 */,
+  0x4d00 /* 20.000000 */,
+  0x7c00 /* inf */,
+  0x7c00 /* inf */
+};
+
+#define TEST_MSG "VABSH_F16"
+#define INSN_NAME vabsh_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddh_f16_1.c
new file mode 100644
index 0000000..4b0e242
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddh_f16_1.c
@@ -0,0 +1,40 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+/* Expected results (16-bit hexadecimal representation).  */
+uint16_t expected[] =
+{
+  0x3c00 /* 1.000000 */,
+  0x3c00 /* 1.000000 */,
+  0xc0a8 /* -2.328125 */,
+  0x5672 /* 103.125000 */,
+  0x5240 /* 50.000000 */,
+  0x3614 /* 0.379883 */,
+  0xbf34 /* -1.800781 */,
+  0xc5e6 /* -5.898438 */,
+  0xcaf4 /* -13.906250 */,
+  0x4d14 /* 20.312500 */,
+  0xc6e5 /* -6.894531 */,
+  0x419a /* 2.800781 */,
+  0xc69a /* -6.601562 */,
+  0x4c8f /* 18.234375 */,
+  0xc5fe /* -5.992188 */,
+  0x4d15 /* 20.328125 */,
+  0x7e00 /* nan */,
+  0x7e00 /* nan */,
+};
+
+#define TEST_MSG "VADDH_F16"
+#define INSN_NAME vaddh_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for binary scalar operations.  */
+#include "binary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_s32_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_s32_f16_1.c
new file mode 100644
index 0000000..ebfd62a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_s32_f16_1.c
@@ -0,0 +1,53 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] =
+{
+  0.0, -0.0,
+  123.4, -567.8,
+  -34.8, 1024,
+  663.1, 169.1,
+  -4.8, 77.0,
+  -144.5, -56.8,
+
+  (float16_t) -16, (float16_t) -15,
+  (float16_t) -14, (float16_t) -13,
+};
+
+/* Expected results (32-bit hexadecimal representation).  */
+uint32_t expected[] =
+{
+  0x00000000,
+  0x00000000,
+  0x0000007b,
+  0xfffffdc8,
+  0xffffffdd,
+  0x00000400,
+  0x00000297,
+  0x000000a9,
+  0xfffffffb,
+  0x0000004d,
+  0xffffff6f,
+  0xffffffc7,
+  0xfffffff0,
+  0xfffffff1,
+  0xfffffff2,
+  0xfffffff3
+};
+
+#define TEST_MSG "VCVTAH_S32_F16"
+#define INSN_NAME vcvtah_s32_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE int32_t
+#define OUTPUT_TYPE_SIZE 32
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_u32_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_u32_f16_1.c
new file mode 100644
index 0000000..5ae28fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_u32_f16_1.c
@@ -0,0 +1,53 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] =
+{
+  0.0, -0.0,
+  123.4, -567.8,
+  -34.8, 1024,
+  663.1, 169.1,
+  -4.8, 77.0,
+  -144.5, -56.8,
+
+  (float16_t) -16, (float16_t) -15,
+  (float16_t) -14, (float16_t) -13,
+};
+
+/* Expected results (32-bit hexadecimal representation).  */
+uint32_t expected[] =
+{
+  0x00000000,
+  0x00000000,
+  0x0000007b,
+  0x00000000,
+  0x00000000,
+  0x00000400,
+  0x00000297,
+  0x000000a9,
+  0x00000000,
+  0x0000004d,
+  0x00000000,
+  0x00000000,
+  0x00000000,
+  0x00000000,
+  0x00000000,
+  0x00000000
+};
+
+#define TEST_MSG "VCVTAH_U32_F16"
+#define INSN_NAME vcvtah_u32_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint32_t
+#define OUTPUT_TYPE_SIZE 32
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s32_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s32_1.c
new file mode 100644
index 0000000..2173a0e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s32_1.c
@@ -0,0 +1,52 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+int32_t input[] =
+{
+  0, -0,
+  123, -567,
+  -34, 1024,
+  -63, 169,
+  -4, 77,
+  -144, -56,
+  -16, -15,
+  -14, -13,
+};
+
+/* Expected results (16-bit hexadecimal representation).  */
+uint16_t expected[] =
+{
+  0x0000 /* 0.000000 */,
+  0x0000 /* 0.000000 */,
+  0x57b0 /* 123.000000 */,
+  0xe06e /* -567.000000 */,
+  0xd040 /* -34.000000 */,
+  0x6400 /* 1024.000000 */,
+  0xd3e0 /* -63.000000 */,
+  0x5948 /* 169.000000 */,
+  0xc400 /* -4.000000 */,
+  0x54d0 /* 77.000000 */,
+  0xd880 /* -144.000000 */,
+  0xd300 /* -56.000000 */,
+  0xcc00 /* -16.000000 */,
+  0xcb80 /* -15.000000 */,
+  0xcb00 /* -14.000000 */,
+  0xca80 /* -13.000000 */
+};
+
+#define TEST_MSG "VCVTH_F16_S32"
+#define INSN_NAME vcvth_f16_s32
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE int32_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u32_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u32_1.c
new file mode 100644
index 0000000..1583202
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u32_1.c
@@ -0,0 +1,52 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+uint32_t input[] =
+{
+  0, -0,
+  123, -567,
+  -34, 1024,
+  -63, 169,
+  -4, 77,
+  -144, -56,
+  -16, -15,
+  -14, -13,
+};
+
+/* Expected results (16-bit hexadecimal representation).  */
+uint16_t expected[] =
+{
+  0x0000 /* 0.000000 */,
+  0x0000 /* 0.000000 */,
+  0x57b0 /* 123.000000 */,
+  0x7c00 /* inf */,
+  0x7c00 /* inf */,
+  0x6400 /* 1024.000000 */,
+  0x7c00 /* inf */,
+  0x5948 /* 169.000000 */,
+  0x7c00 /* inf */,
+  0x54d0 /* 77.000000 */,
+  0x7c00 /* inf */,
+  0x7c00 /* inf */,
+  0x7c00 /* inf */,
+  0x7c00 /* inf */,
+  0x7c00 /* inf */,
+  0x7c00 /* inf */
+};
+
+#define TEST_MSG "VCVTH_F16_U32"
+#define INSN_NAME vcvth_f16_u32
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE uint32_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s32_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s32_1.c
new file mode 100644
index 0000000..9ce9558
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s32_1.c
@@ -0,0 +1,99 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+int32_t input[] =
+{
+  0, -0,
+  123, -567,
+  -34, 1024,
+  -63, 169,
+  -4, 77,
+  -144, -56,
+  -16, -15,
+  -14, -13,
+};
+
+/* Expected results (16-bit hexadecimal representation).  */
+uint16_t expected_1[] =
+{
+  0x0000 /* 0.000000 */,
+  0x0000 /* 0.000000 */,
+  0x53b0 /* 61.500000 */,
+  0xdc6e /* -283.500000 */,
+  0xcc40 /* -17.000000 */,
+  0x6000 /* 512.000000 */,
+  0xcfe0 /* -31.500000 */,
+  0x5548 /* 84.500000 */,
+  0xc000 /* -2.000000 */,
+  0x50d0 /* 38.500000 */,
+  0xd480 /* -72.000000 */,
+  0xcf00 /* -28.000000 */,
+  0xc800 /* -8.000000 */,
+  0xc780 /* -7.500000 */,
+  0xc700 /* -7.000000 */,
+  0xc680 /* -6.500000 */
+};
+
+uint16_t expected_2[] =
+{
+  0x0000 /* 0.000000 */,
+  0x0000 /* 0.000000 */,
+  0x4fb0 /* 30.750000 */,
+  0xd86e /* -141.750000 */,
+  0xc840 /* -8.500000 */,
+  0x5c00 /* 256.000000 */,
+  0xcbe0 /* -15.750000 */,
+  0x5148 /* 42.250000 */,
+  0xbc00 /* -1.000000 */,
+  0x4cd0 /* 19.250000 */,
+  0xd080 /* -36.000000 */,
+  0xcb00 /* -14.000000 */,
+  0xc400 /* -4.000000 */,
+  0xc380 /* -3.750000 */,
+  0xc300 /* -3.500000 */,
+  0xc280 /* -3.250000 */
+};
+
+uint16_t expected_3[] =
+{
+  0x0000 /* 0.000000 */,
+  0x0000 /* 0.000000 */,
+  0x0000 /* 0.000000 */,
+  0x8002 /* -0.000000 */,
+  0x8000 /* -0.000000 */,
+  0x0004 /* 0.000000 */,
+  0x8000 /* -0.000000 */,
+  0x0001 /* 0.000000 */,
+  0x8000 /* -0.000000 */,
+  0x0000 /* 0.000000 */,
+  0x8001 /* -0.000000 */,
+  0x8000 /* -0.000000 */,
+  0x8000 /* -0.000000 */,
+  0x8000 /* -0.000000 */,
+  0x8000 /* -0.000000 */,
+  0x8000 /* -0.000000 */
+};
+
+#define TEST_MSG "VCVTH_N_F16_S32"
+#define INSN_NAME vcvth_n_f16_s32
+
+#define INPUT input
+#define EXPECTED_1 expected_1
+#define EXPECTED_2 expected_2
+#define EXPECTED_3 expected_3
+
+#define INPUT_TYPE int32_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+#define SCALAR_OPERANDS
+#define SCALAR_1 1
+#define SCALAR_2 2
+#define SCALAR_3 32
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u32_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u32_1.c
new file mode 100644
index 0000000..d308c35
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u32_1.c
@@ -0,0 +1,99 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+uint32_t input[] =
+{
+  0, -0,
+  123, -567,
+  -34, 1024,
+  -63, 169,
+  -4, 77,
+  -144, -56,
+  -16, -15,
+  -14, -13,
+};
+
+/* Expected results (16-bit hexadecimal representation).  */
+uint16_t expected_1[] =
+{
+  0x0000 /* 0.000000 */,
+  0x0000 /* 0.000000 */,
+  0x53b0 /* 61.500000 */,
+  0x7c00 /* inf */,
+  0x7c00 /* inf */,
+  0x6000 /* 512.000000 */,
+  0x7c00 /* inf */,
+  0x5548 /* 84.500000 */,
+  0x7c00 /* inf */,
+  0x50d0 /* 38.500000 */,
+  0x7c00 /* inf */,
+  0x7c00 /* inf */,
+  0x7c00 /* inf */,
+  0x7c00 /* inf */,
+  0x7c00 /* inf */,
+  0x7c00 /* inf */
+};
+
+uint16_t expected_2[] =
+{
+  0x0000 /* 0.000000 */,
+  0x0000 /* 0.000000 */,
+  0x4fb0 /* 30.750000 */,
+  0x7c00 /* inf */,
+  0x7c00 /* inf */,
+  0x5c00 /* 256.000000 */,
+  0x7c00 /* inf */,
+  0x5148 /* 42.250000 */,
+  0x7c00 /* inf */,
+  0x4cd0 /* 19.250000 */,
+  0x7c00 /* inf */,
+  0x7c00 /* inf */,
+  0x7c00 /* inf */,
+  0x7c00 /* inf */,
+  0x7c00 /* inf */,
+  0x7c00 /* inf */
+};
+
+uint16_t expected_3[] =
+{
+  0x0000 /* 0.000000 */,
+  0x0000 /* 0.000000 */,
+  0x0000 /* 0.000000 */,
+  0x3c00 /* 1.000000 */,
+  0x3c00 /* 1.000000 */,
+  0x0004 /* 0.000000 */,
+  0x3c00 /* 1.000000 */,
+  0x0001 /* 0.000000 */,
+  0x3c00 /* 1.000000 */,
+  0x0000 /* 0.000000 */,
+  0x3c00 /* 1.000000 */,
+  0x3c00 /* 1.000000 */,
+  0x3c00 /* 1.000000 */,
+  0x3c00 /* 1.000000 */,
+  0x3c00 /* 1.000000 */,
+  0x3c00 /* 1.000000 */
+};
+
+#define TEST_MSG "VCVTH_N_F16_U32"
+#define INSN_NAME vcvth_n_f16_u32
+
+#define INPUT input
+#define EXPECTED_1 expected_1
+#define EXPECTED_2 expected_2
+#define EXPECTED_3 expected_3
+
+#define INPUT_TYPE uint32_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+#define SCALAR_OPERANDS
+#define SCALAR_1 1
+#define SCALAR_2 2
+#define SCALAR_3 32
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s32_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s32_f16_1.c
new file mode 100644
index 0000000..6e2ee50
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s32_f16_1.c
@@ -0,0 +1,100 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] =
+{
+  0.0, -0.0,
+  123.4, -567.8,
+  -34.8, 1024,
+  663.1, 169.1,
+  -4.8, 77.0,
+  -144.5, -56.8,
+
+  (float16_t) -16, (float16_t) -15,
+  (float16_t) -14, (float16_t) -13,
+};
+
+/* Expected results (32-bit hexadecimal representation).  */
+uint32_t expected_1[] =
+{
+  0x00000000,
+  0x00000000,
+  0x000000f6,
+  0xfffffb90,
+  0xffffffbb,
+  0x00000800,
+  0x0000052e,
+  0x00000152,
+  0xfffffff7,
+  0x0000009a,
+  0xfffffedf,
+  0xffffff8f,
+  0xffffffe0,
+  0xffffffe2,
+  0xffffffe4,
+  0xffffffe6,
+};
+
+uint32_t expected_2[] =
+{
+  0x00000000,
+  0x00000000,
+  0x000001ed,
+  0xfffff720,
+  0xffffff75,
+  0x00001000,
+  0x00000a5c,
+  0x000002a4,
+  0xffffffed,
+  0x00000134,
+  0xfffffdbe,
+  0xffffff1d,
+  0xffffffc0,
+  0xffffffc4,
+  0xffffffc8,
+  0xffffffcc,
+};
+
+uint32_t expected_3[] =
+{
+  0x00000000,
+  0x00000000,
+  0x7fffffff,
+  0x80000000,
+  0x80000000,
+  0x7fffffff,
+  0x7fffffff,
+  0x7fffffff,
+  0x80000000,
+  0x7fffffff,
+  0x80000000,
+  0x80000000,
+  0x80000000,
+  0x80000000,
+  0x80000000,
+  0x80000000,
+};
+
+#define TEST_MSG "VCVTH_N_S32_F16"
+#define INSN_NAME vcvth_n_s32_f16
+
+#define INPUT input
+#define EXPECTED_1 expected_1
+#define EXPECTED_2 expected_2
+#define EXPECTED_3 expected_3
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE int32_t
+#define OUTPUT_TYPE_SIZE 32
+
+#define SCALAR_OPERANDS
+#define SCALAR_1 1
+#define SCALAR_2 2
+#define SCALAR_3 32
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u32_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u32_f16_1.c
new file mode 100644
index 0000000..188f60c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u32_f16_1.c
@@ -0,0 +1,100 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] =
+{
+  0.0, -0.0,
+  123.4, -567.8,
+  -34.8, 1024,
+  663.1, 169.1,
+  -4.8, 77.0,
+  -144.5, -56.8,
+
+  (float16_t) -16, (float16_t) -15,
+  (float16_t) -14, (float16_t) -13,
+};
+
+/* Expected results (32-bit hexadecimal representation).  */
+uint32_t expected_1[] =
+{
+  0x00000000,
+  0x00000000,
+  0x000000f6,
+  0x00000000,
+  0x00000000,
+  0x00000800,
+  0x0000052e,
+  0x00000152,
+  0x00000000,
+  0x0000009a,
+  0x00000000,
+  0x00000000,
+  0x00000000,
+  0x00000000,
+  0x00000000,
+  0x00000000,
+};
+
+uint32_t expected_2[] =
+{
+  0x00000000,
+  0x00000000,
+  0x000001ed,
+  0x00000000,
+  0x00000000,
+  0x00001000,
+  0x00000a5c,
+  0x000002a4,
+  0x00000000,
+  0x00000134,
+  0x00000000,
+  0x00000000,
+  0x00000000,
+  0x00000000,
+  0x00000000,
+  0x00000000,
+};
+
+uint32_t expected_3[] =
+{
+  0x00000000,
+  0x00000000,
+  0xffffffff,
+  0x00000000,
+  0x00000000,
+  0xffffffff,
+  0xffffffff,
+  0xffffffff,
+  0x00000000,
+  0xffffffff,
+  0x00000000,
+  0x00000000,
+  0x00000000,
+  0x00000000,
+  0x00000000,
+  0x00000000,
+};
+
+#define TEST_MSG "VCVTH_N_U32_F16"
+#define INSN_NAME vcvth_n_u32_f16
+
+#define INPUT input
+#define EXPECTED_1 expected_1
+#define EXPECTED_2 expected_2
+#define EXPECTED_3 expected_3
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint32_t
+#define OUTPUT_TYPE_SIZE 32
+
+#define SCALAR_OPERANDS
+#define SCALAR_1 1
+#define SCALAR_2 2
+#define SCALAR_3 32
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_s32_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_s32_f16_1.c
new file mode 100644
index 0000000..6bff954
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_s32_f16_1.c
@@ -0,0 +1,53 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] =
+{
+  0.0, -0.0,
+  123.4, -567.8,
+  -34.8, 1024,
+  663.1, 169.1,
+  -4.8, 77.0,
+  -144.5, -56.8,
+
+  (float16_t) -16, (float16_t) -15,
+  (float16_t) -14, (float16_t) -13,
+};
+
+/* Expected results (32-bit hexadecimal representation).  */
+uint32_t expected[] =
+{
+  0x00000000,
+  0x00000000,
+  0x0000007b,
+  0xfffffdc8,
+  0xffffffde,
+  0x00000400,
+  0x00000297,
+  0x000000a9,
+  0xfffffffc,
+  0x0000004d,
+  0xffffff70,
+  0xffffffc8,
+  0xfffffff0,
+  0xfffffff1,
+  0xfffffff2,
+  0xfffffff3,
+};
+
+#define TEST_MSG "VCVTH_S32_F16"
+#define INSN_NAME vcvth_s32_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE int32_t
+#define OUTPUT_TYPE_SIZE 32
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_u32_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_u32_f16_1.c
new file mode 100644
index 0000000..d5807d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_u32_f16_1.c
@@ -0,0 +1,53 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] =
+{
+  0.0, -0.0,
+  123.4, -567.8,
+  -34.8, 1024,
+  663.1, 169.1,
+  -4.8, 77.0,
+  -144.5, -56.8,
+
+  (float16_t) -16, (float16_t) -15,
+  (float16_t) -14, (float16_t) -13,
+};
+
+/* Expected results (32-bit hexadecimal representation).  */
+uint32_t expected[] =
+{
+  0x00000000,
+  0x00000000,
+  0x0000007b,
+  0x00000000,
+  0x00000000,
+  0x00000400,
+  0x00000297,
+  0x000000a9,
+  0x00000000,
+  0x0000004d,
+  0x00000000,
+  0x00000000,
+  0x00000000,
+  0x00000000,
+  0x00000000,
+  0x00000000,
+};
+
+#define TEST_MSG "VCVTH_U32_F16"
+#define INSN_NAME vcvth_u32_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint32_t
+#define OUTPUT_TYPE_SIZE 32
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s32_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s32_f16_1.c
new file mode 100644
index 0000000..f4f7b37
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s32_f16_1.c
@@ -0,0 +1,53 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] =
+{
+  0.0, -0.0,
+  123.4, -567.8,
+  -34.8, 1024,
+  663.1, 169.1,
+  -4.8, 77.0,
+  -144.5, -56.8,
+
+  (float16_t) -16, (float16_t) -15,
+  (float16_t) -14, (float16_t) -13,
+};
+
+/* Expected results (32-bit hexadecimal representation).  */
+uint32_t expected[] =
+{
+  0x00000000,
+  0x00000000,
+  0x0000007b,
+  0xfffffdc8,
+  0xffffffdd,
+  0x00000400,
+  0x00000297,
+  0x000000a9,
+  0xfffffffb,
+  0x0000004d,
+  0xffffff6f,
+  0xffffffc7,
+  0xfffffff0,
+  0xfffffff1,
+  0xfffffff2,
+  0xfffffff3
+};
+
+#define TEST_MSG "VCVTMH_S32_F16"
+#define INSN_NAME vcvtmh_s32_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE int32_t
+#define OUTPUT_TYPE_SIZE 32
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u32_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u32_f16_1.c
new file mode 100644
index 0000000..6cda3b6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u32_f16_1.c
@@ -0,0 +1,53 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] =
+{
+  0.0, -0.0,
+  123.4, -567.8,
+  -34.8, 1024,
+  663.1, 169.1,
+  -4.8, 77.0,
+  -144.5, -56.8,
+
+  (float16_t) -16, (float16_t) -15,
+  (float16_t) -14, (float16_t) -13,
+};
+
+/* Expected results (32-bit hexadecimal representation).  */
+uint32_t expected[] =
+{
+  0x00000000,
+  0x00000000,
+  0x0000007b,
+  0x00000000,
+  0x00000000,
+  0x00000400,
+  0x00000297,
+  0x000000a9,
+  0x00000000,
+  0x0000004d,
+  0x00000000,
+  0x00000000,
+  0x00000000,
+  0x00000000,
+  0x00000000,
+  0x00000000,
+};
+
+#define TEST_MSG "VCVTMH_U32_F16"
+#define INSN_NAME vcvtmh_u32_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint32_t
+#define OUTPUT_TYPE_SIZE 32
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s32_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s32_f16_1.c
new file mode 100644
index 0000000..94c333e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s32_f16_1.c
@@ -0,0 +1,53 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] =
+{
+  0.0, -0.0,
+  123.4, -567.8,
+  -34.8, 1024,
+  663.1, 169.1,
+  -4.8, 77.0,
+  -144.5, -56.8,
+
+  (float16_t) -16, (float16_t) -15,
+  (float16_t) -14, (float16_t) -13,
+};
+
+/* Expected results (32-bit hexadecimal representation).  */
+uint32_t expected[] =
+{
+  0x00000000,
+  0x00000000,
+  0x0000007b,
+  0xfffffdc8,
+  0xffffffdd,
+  0x00000400,
+  0x00000297,
+  0x000000a9,
+  0xfffffffb,
+  0x0000004d,
+  0xffffff70,
+  0xffffffc7,
+  0xfffffff0,
+  0xfffffff1,
+  0xfffffff2,
+  0xfffffff3
+};
+
+#define TEST_MSG "VCVTNH_S32_F16"
+#define INSN_NAME vcvtnh_s32_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE int32_t
+#define OUTPUT_TYPE_SIZE 32
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u32_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u32_f16_1.c
new file mode 100644
index 0000000..97d5fba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u32_f16_1.c
@@ -0,0 +1,53 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] =
+{
+  0.0, -0.0,
+  123.4, -567.8,
+  -34.8, 1024,
+  663.1, 169.1,
+  -4.8, 77.0,
+  -144.5, -56.8,
+
+  (float16_t) -16, (float16_t) -15,
+  (float16_t) -14, (float16_t) -13,
+};
+
+/* Expected results (32-bit hexadecimal representation).  */
+uint32_t expected[] =
+{
+  0x00000000,
+  0x00000000,
+  0x0000007b,
+  0x00000000,
+  0x00000000,
+  0x00000400,
+  0x00000297,
+  0x000000a9,
+  0x00000000,
+  0x0000004d,
+  0x00000000,
+  0x00000000,
+  0x00000000,
+  0x00000000,
+  0x00000000,
+  0x00000000,
+};
+
+#define TEST_MSG "VCVTNH_U32_F16"
+#define INSN_NAME vcvtnh_u32_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint32_t
+#define OUTPUT_TYPE_SIZE 32
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_s32_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_s32_f16_1.c
new file mode 100644
index 0000000..105d236
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_s32_f16_1.c
@@ -0,0 +1,53 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] =
+{
+  0.0, -0.0,
+  123.4, -567.8,
+  -34.8, 1024,
+  663.1, 169.1,
+  -4.8, 77.0,
+  -144.5, -56.8,
+
+  (float16_t) -16, (float16_t) -15,
+  (float16_t) -14, (float16_t) -13,
+};
+
+/* Expected results (32-bit hexadecimal representation).  */
+uint32_t expected[] =
+{
+  0x00000000,
+  0x00000000,
+  0x0000007c,
+  0xfffffdc8,
+  0xffffffde,
+  0x00000400,
+  0x00000297,
+  0x000000aa,
+  0xfffffffc,
+  0x0000004d,
+  0xffffff70,
+  0xffffffc8,
+  0xfffffff0,
+  0xfffffff1,
+  0xfffffff2,
+  0xfffffff3
+};
+
+#define TEST_MSG "VCVTPH_S32_F16"
+#define INSN_NAME vcvtph_s32_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE int32_t
+#define OUTPUT_TYPE_SIZE 32
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_u32_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_u32_f16_1.c
new file mode 100644
index 0000000..d66adcd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_u32_f16_1.c
@@ -0,0 +1,53 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+/* Input values.  */
+float16_t input[] =
+{
+  0.0, -0.0,
+  123.4, -567.8,
+  -34.8, 1024,
+  663.1, 169.1,
+  -4.8, 77.0,
+  -144.5, -56.8,
+
+  (float16_t) -16, (float16_t) -15,
+  (float16_t) -14, (float16_t) -13,
+};
+
+/* Expected results (32-bit hexadecimal representation).  */
+uint32_t expected[] =
+{
+  0x00000000,
+  0x00000000,
+  0x0000007c,
+  0x00000000,
+  0x00000000,
+  0x00000400,
+  0x00000297,
+  0x000000aa,
+  0x00000000,
+  0x0000004d,
+  0x00000000,
+  0x00000000,
+  0x00000000,
+  0x00000000,
+  0x00000000,
+  0x00000000,
+};
+
+#define TEST_MSG "VCVTPH_U32_F16"
+#define INSN_NAME vcvtph_u32_f16
+
+#define INPUT input
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE uint32_t
+#define OUTPUT_TYPE_SIZE 32
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdivh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdivh_f16_1.c
new file mode 100644
index 0000000..6a99109
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdivh_f16_1.c
@@ -0,0 +1,42 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define INFF __builtin_inf ()
+
+/* Expected results (16-bit hexadecimal representation).  */
+uint16_t expected[] =
+{
+  0x0000 /* 0.000000 */,
+  0x8000 /* -0.000000 */,
+  0xb765 /* -0.462158 */,
+  0x27ef /* 0.030991 */,
+  0x3955 /* 0.666504 */,
+  0xccff /* -19.984375 */,
+  0xc49a /* -4.601562 */,
+  0xb1e3 /* -0.183960 */,
+  0x3cd3 /* 1.206055 */,
+  0x23f0 /* 0.015503 */,
+  0xa9ef /* -0.046356 */,
+  0x32f4 /* 0.217285 */,
+  0xb036 /* -0.131592 */,
+  0x4126 /* 2.574219 */,
+  0xcd15 /* -20.328125 */,
+  0x537f /* 59.968750 */,
+  0x7e00 /* nan */,
+  0x7e00 /* nan */
+};
+
+#define TEST_MSG "VDIVH_F16"
+#define INSN_NAME vdivh_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for binary scalar operations.  */
+#include "binary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmah_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmah_f16_1.c
new file mode 100644
index 0000000..1ac6b67
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmah_f16_1.c
@@ -0,0 +1,40 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+/* Expected results (16-bit hexadecimal representation).  */
+uint16_t expected[] =
+{
+  0x0000 /* 0.000000 */,
+  0x0000 /* 0.000000 */,
+  0x3944 /* 0.658203 */,
+  0xcefa /* -27.906250 */,
+  0x5369 /* 59.281250 */,
+  0x35ba /* 0.357910 */,
+  0xc574 /* -5.453125 */,
+  0xc5e6 /* -5.898438 */,
+  0x3f66 /* 1.849609 */,
+  0x5665 /* 102.312500 */,
+  0xc02d /* -2.087891 */,
+  0x4d79 /* 21.890625 */,
+  0x547b /* 71.687500 */,
+  0xcdf0 /* -23.750000 */,
+  0xc625 /* -6.144531 */,
+  0x4cf9 /* 19.890625 */,
+  0x7e00 /* nan */,
+  0x7e00 /* nan */
+};
+
+#define TEST_MSG "VFMAH_F16"
+#define INSN_NAME vfmah_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for ternary scalar operations.  */
+#include "ternary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmsh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmsh_f16_1.c
new file mode 100644
index 0000000..77021be
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmsh_f16_1.c
@@ -0,0 +1,40 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+/* Expected results (16-bit hexadecimal representation).  */
+uint16_t expected[] =
+{
+  0x0000 /* 0.000000 */,
+  0x8000 /* -0.000000 */,
+  0x42af /* 3.341797 */,
+  0x5043 /* 34.093750 */,
+  0xccd2 /* -19.281250 */,
+  0x3712 /* 0.441895 */,
+  0x3acc /* 0.849609 */,
+  0x4848 /* 8.562500 */,
+  0xcc43 /* -17.046875 */,
+  0xd65c /* -101.750000 */,
+  0x4185 /* 2.759766 */,
+  0xcd39 /* -20.890625 */,
+  0xd45b /* -69.687500 */,
+  0x5241 /* 50.031250 */,
+  0xc675 /* -6.457031 */,
+  0x4d07 /* 20.109375 */,
+  0x7c00 /* inf */,
+  0xfc00 /* -inf */
+};
+
+#define TEST_MSG "VFMSH_F16"
+#define INSN_NAME vfmsh_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for ternary scalar operations.  */
+#include "ternary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxnmh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxnmh_f16_1.c
new file mode 100644
index 0000000..4db4b84
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxnmh_f16_1.c
@@ -0,0 +1,42 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define INFF __builtin_inf ()
+
+/* Expected results (16-bit hexadecimal representation).  */
+uint16_t expected[] =
+{
+  0x3c00 /* 1.000000 */,
+  0x3c00 /* 1.000000 */,
+  0x4000 /* 2.000000 */,
+  0x5640 /* 100.000000 */,
+  0x4f80 /* 30.000000 */,
+  0x3666 /* 0.399902 */,
+  0x3800 /* 0.500000 */,
+  0x3d52 /* 1.330078 */,
+  0xc64d /* -6.300781 */,
+  0x4d00 /* 20.000000 */,
+  0x355d /* 0.335205 */,
+  0x409a /* 2.300781 */,
+  0x3c00 /* 1.000000 */,
+  0x4a91 /* 13.132812 */,
+  0x34f6 /* 0.310059 */,
+  0x4d00 /* 20.000000 */,
+  0x7c00 /* inf */,
+  0x7c00 /* inf */
+};
+
+#define TEST_MSG "VMAXNMH_F16"
+#define INSN_NAME vmaxnmh_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for binary scalar operations.  */
+#include "binary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminnmh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminnmh_f16_1.c
new file mode 100644
index 0000000..f6b0216
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminnmh_f16_1.c
@@ -0,0 +1,42 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define INFF __builtin_inf ()
+
+/* Expected results (16-bit hexadecimal representation).  */
+uint16_t expected[] =
+{
+  0x0000 /* 0.000000 */,
+  0x8000 /* -0.000000 */,
+  0xc454 /* -4.328125 */,
+  0x4233 /* 3.099609 */,
+  0x4d00 /* 20.000000 */,
+  0xa51f /* -0.020004 */,
+  0xc09a /* -2.300781 */,
+  0xc73b /* -7.230469 */,
+  0xc79a /* -7.601562 */,
+  0x34f6 /* 0.310059 */,
+  0xc73b /* -7.230469 */,
+  0x3800 /* 0.500000 */,
+  0xc79a /* -7.601562 */,
+  0x451a /* 5.101562 */,
+  0xc64d /* -6.300781 */,
+  0x3556 /* 0.333496 */,
+  0xfc00 /* -inf */,
+  0xfc00 /* -inf */
+};
+
+#define TEST_MSG "VMINNMH_F16"
+#define INSN_NAME vminnmh_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for binary scalar operations.  */
+#include "binary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulh_f16_1.c
new file mode 100644
index 0000000..09684d2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulh_f16_1.c
@@ -0,0 +1,42 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define INFF __builtin_inf ()
+
+/* Expected results (16-bit hexadecimal representation).  */
+uint16_t expected[] =
+{
+  0x0000 /* 0.000000 */,
+  0x8000 /* -0.000000 */,
+  0xc854 /* -8.656250 */,
+  0x5cd8 /* 310.000000 */,
+  0x60b0 /* 600.000000 */,
+  0xa019 /* -0.008003 */,
+  0xbc9a /* -1.150391 */,
+  0xc8cf /* -9.617188 */,
+  0x51fd /* 47.906250 */,
+  0x4634 /* 6.203125 */,
+  0xc0d9 /* -2.423828 */,
+  0x3c9a /* 1.150391 */,
+  0xc79a /* -7.601562 */,
+  0x5430 /* 67.000000 */,
+  0xbfd0 /* -1.953125 */,
+  0x46ac /* 6.671875 */,
+  0xfc00 /* -inf */,
+  0xfc00 /* -inf */
+};
+
+#define TEST_MSG "VMULH_F16"
+#define INSN_NAME vmulh_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for binary scalar operations.  */
+#include "binary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vnegh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vnegh_f16_1.c
new file mode 100644
index 0000000..421d827
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vnegh_f16_1.c
@@ -0,0 +1,40 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
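+/* Expected results (16-bit hexadecimal representation).  */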
+uint16_t expected[] =
+{
+  0x8000 /* -0.000000 */,
+  0x0000 /* 0.000000 */,
+  0xc000 /* -2.000000 */,
+  0xc233 /* -3.099609 */,
+  0xcd00 /* -20.000000 */,
+  0xb666 /* -0.399902 */,
+  0x409a /* 2.300781 */,
+  0xbd52 /* -1.330078 */,
+  0x479a /* 7.601562 */,
+  0xb4f6 /* -0.310059 */,
+  0xb55d /* -0.335205 */,
+  0xb800 /* -0.500000 */,
+  0xbc00 /* -1.000000 */,
+  0xca91 /* -13.132812 */,
+  0x464d /* 6.300781 */,
+  0xcd00 /* -20.000000 */,
+  0xfc00 /* -inf */,
+  0x7c00 /* inf */
+};
+
+#define TEST_MSG "VNEGH_F16"
+#define INSN_NAME vnegh_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndah_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndah_f16_1.c
new file mode 100644
index 0000000..bcf47f6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndah_f16_1.c
@@ -0,0 +1,40 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+/* Expected results (16-bit hexadecimal representation).  */
+uint16_t expected[] =
+{
+  0x0000 /* 0.000000 */,
+  0x8000 /* -0.000000 */,
+  0x4000 /* 2.000000 */,
+  0x4200 /* 3.000000 */,
+  0x4d00 /* 20.000000 */,
+  0x0000 /* 0.000000 */,
+  0xc000 /* -2.000000 */,
+  0x3c00 /* 1.000000 */,
+  0xc800 /* -8.000000 */,
+  0x0000 /* 0.000000 */,
+  0x0000 /* 0.000000 */,
+  0x3c00 /* 1.000000 */,
+  0x3c00 /* 1.000000 */,
+  0x4a80 /* 13.000000 */,
+  0xc600 /* -6.000000 */,
+  0x4d00 /* 20.000000 */,
+  0x7c00 /* inf */,
+  0xfc00 /* -inf */
+};
+
+#define TEST_MSG "VRNDAH_F16"
+#define INSN_NAME vrndah_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndh_f16_1.c
new file mode 100644
index 0000000..3c4649e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndh_f16_1.c
@@ -0,0 +1,40 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+/* Expected results (16-bit hexadecimal representation).  */
+uint16_t expected[] =
+{
+  0x0000 /* 0.000000 */,
+  0x8000 /* -0.000000 */,
+  0x4000 /* 2.000000 */,
+  0x4200 /* 3.000000 */,
+  0x4d00 /* 20.000000 */,
+  0x0000 /* 0.000000 */,
+  0xc000 /* -2.000000 */,
+  0x3c00 /* 1.000000 */,
+  0xc700 /* -7.000000 */,
+  0x0000 /* 0.000000 */,
+  0x0000 /* 0.000000 */,
+  0x0000 /* 0.000000 */,
+  0x3c00 /* 1.000000 */,
+  0x4a80 /* 13.000000 */,
+  0xc600 /* -6.000000 */,
+  0x4d00 /* 20.000000 */,
+  0x7c00 /* inf */,
+  0xfc00 /* -inf */
+};
+
+#define TEST_MSG "VRNDH_F16"
+#define INSN_NAME vrndh_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndih_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndih_f16_1.c
new file mode 100644
index 0000000..4a7b721
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndih_f16_1.c
@@ -0,0 +1,40 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+/* Expected results (16-bit hexadecimal representation).  */
+uint16_t expected[] =
+{
+  0x0000 /* 0.000000 */,
+  0x8000 /* -0.000000 */,
+  0x4000 /* 2.000000 */,
+  0x4200 /* 3.000000 */,
+  0x4d00 /* 20.000000 */,
+  0x0000 /* 0.000000 */,
+  0xc000 /* -2.000000 */,
+  0x3c00 /* 1.000000 */,
+  0xc800 /* -8.000000 */,
+  0x0000 /* 0.000000 */,
+  0x0000 /* 0.000000 */,
+  0x0000 /* 0.000000 */,
+  0x3c00 /* 1.000000 */,
+  0x4a80 /* 13.000000 */,
+  0xc600 /* -6.000000 */,
+  0x4d00 /* 20.000000 */,
+  0x7c00 /* inf */,
+  0xfc00 /* -inf */
+};
+
+#define TEST_MSG "VRNDIH_F16"
+#define INSN_NAME vrndih_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndmh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndmh_f16_1.c
new file mode 100644
index 0000000..9af357d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndmh_f16_1.c
@@ -0,0 +1,40 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+/* Expected results (16-bit hexadecimal representation).  */
+uint16_t expected[] =
+{
+  0x0000 /* 0.000000 */,
+  0x8000 /* -0.000000 */,
+  0x4000 /* 2.000000 */,
+  0x4200 /* 3.000000 */,
+  0x4d00 /* 20.000000 */,
+  0x0000 /* 0.000000 */,
+  0xc200 /* -3.000000 */,
+  0x3c00 /* 1.000000 */,
+  0xc800 /* -8.000000 */,
+  0x0000 /* 0.000000 */,
+  0x0000 /* 0.000000 */,
+  0x0000 /* 0.000000 */,
+  0x3c00 /* 1.000000 */,
+  0x4a80 /* 13.000000 */,
+  0xc700 /* -7.000000 */,
+  0x4d00 /* 20.000000 */,
+  0x7c00 /* inf */,
+  0xfc00 /* -inf */
+};
+
+#define TEST_MSG "VRNDMH_F16"
+#define INSN_NAME vrndmh_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndnh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndnh_f16_1.c
new file mode 100644
index 0000000..eb4b27d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndnh_f16_1.c
@@ -0,0 +1,40 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+/* Expected results (16-bit hexadecimal representation).  */
+uint16_t expected[] =
+{
+  0x0000 /* 0.000000 */,
+  0x8000 /* -0.000000 */,
+  0x4000 /* 2.000000 */,
+  0x4200 /* 3.000000 */,
+  0x4d00 /* 20.000000 */,
+  0x0000 /* 0.000000 */,
+  0xc000 /* -2.000000 */,
+  0x3c00 /* 1.000000 */,
+  0xc800 /* -8.000000 */,
+  0x0000 /* 0.000000 */,
+  0x0000 /* 0.000000 */,
+  0x0000 /* 0.000000 */,
+  0x3c00 /* 1.000000 */,
+  0x4a80 /* 13.000000 */,
+  0xc600 /* -6.000000 */,
+  0x4d00 /* 20.000000 */,
+  0x7c00 /* inf */,
+  0xfc00 /* -inf */
+};
+
+#define TEST_MSG "VRNDNH_F16"
+#define INSN_NAME vrndnh_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndph_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndph_f16_1.c
new file mode 100644
index 0000000..3fa9749
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndph_f16_1.c
@@ -0,0 +1,40 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+/* Expected results (16-bit hexadecimal representation).  */
+uint16_t expected[] =
+{
+  0x0000 /* 0.000000 */,
+  0x8000 /* -0.000000 */,
+  0x4000 /* 2.000000 */,
+  0x4400 /* 4.000000 */,
+  0x4d00 /* 20.000000 */,
+  0x3c00 /* 1.000000 */,
+  0xc000 /* -2.000000 */,
+  0x4000 /* 2.000000 */,
+  0xc700 /* -7.000000 */,
+  0x3c00 /* 1.000000 */,
+  0x3c00 /* 1.000000 */,
+  0x3c00 /* 1.000000 */,
+  0x3c00 /* 1.000000 */,
+  0x4b00 /* 14.000000 */,
+  0xc600 /* -6.000000 */,
+  0x4d00 /* 20.000000 */,
+  0x7c00 /* inf */,
+  0xfc00 /* -inf */
+};
+
+#define TEST_MSG "VRNDPH_F16"
+#define INSN_NAME vrndph_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndxh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndxh_f16_1.c
new file mode 100644
index 0000000..eb4b27d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndxh_f16_1.c
@@ -0,0 +1,40 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+/* Expected results (16-bit hexadecimal representation).  */
+uint16_t expected[] =
+{
+  0x0000 /* 0.000000 */,
+  0x8000 /* -0.000000 */,
+  0x4000 /* 2.000000 */,
+  0x4200 /* 3.000000 */,
+  0x4d00 /* 20.000000 */,
+  0x0000 /* 0.000000 */,
+  0xc000 /* -2.000000 */,
+  0x3c00 /* 1.000000 */,
+  0xc800 /* -8.000000 */,
+  0x0000 /* 0.000000 */,
+  0x0000 /* 0.000000 */,
+  0x0000 /* 0.000000 */,
+  0x3c00 /* 1.000000 */,
+  0x4a80 /* 13.000000 */,
+  0xc600 /* -6.000000 */,
+  0x4d00 /* 20.000000 */,
+  0x7c00 /* inf */,
+  0xfc00 /* -inf */
+};
+
+#define TEST_MSG "VRNDNH_F16"
+#define INSN_NAME vrndnh_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsqrth_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsqrth_f16_1.c
new file mode 100644
index 0000000..7d03827
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsqrth_f16_1.c
@@ -0,0 +1,40 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+/* Expected results (16-bit hexadecimal representation).  */
+uint16_t expected[] =
+{
+  0x0000 /* 0.000000 */,
+  0x8000 /* -0.000000 */,
+  0x3da8 /* 1.414062 */,
+  0x3f0b /* 1.760742 */,
+  0x4479 /* 4.472656 */,
+  0x390f /* 0.632324 */,
+  0x7e00 /* nan */,
+  0x3c9d /* 1.153320 */,
+  0x7e00 /* nan */,
+  0x3874 /* 0.556641 */,
+  0x38a2 /* 0.579102 */,
+  0x39a8 /* 0.707031 */,
+  0x3c00 /* 1.000000 */,
+  0x433f /* 3.623047 */,
+  0x7e00 /* nan */,
+  0x4479 /* 4.472656 */,
+  0x7c00 /* inf */,
+  0x7e00 /* nan */
+};
+
+#define TEST_MSG "VSQRTH_F16"
+#define INSN_NAME vsqrth_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for unary scalar operations.  */
+#include "unary_scalar_op.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsubh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsubh_f16_1.c
new file mode 100644
index 0000000..a7aba11
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsubh_f16_1.c
@@ -0,0 +1,42 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+#include <arm_fp16.h>
+
+#define INFF __builtin_inf ()
+
+/* Expected results (16-bit hexadecimal representation).  */
+uint16_t expected[] =
+{
+  0xbc00 /* -1.000000 */,
+  0xbc00 /* -1.000000 */,
+  0x4654 /* 6.328125 */,
+  0xd60e /* -96.875000 */,
+  0xc900 /* -10.000000 */,
+  0x36b8 /* 0.419922 */,
+  0xc19a /* -2.800781 */,
+  0x4848 /* 8.562500 */,
+  0xbd34 /* -1.300781 */,
+  0xccec /* -19.687500 */,
+  0x4791 /* 7.566406 */,
+  0xbf34 /* -1.800781 */,
+  0x484d /* 8.601562 */,
+  0x4804 /* 8.031250 */,
+  0xc69c /* -6.609375 */,
+  0x4ceb /* 19.671875 */,
+  0x7c00 /* inf */,
+  0xfc00 /* -inf */
+};
+
+#define TEST_MSG "VSUB_F16"
+#define INSN_NAME vsubh_f16
+
+#define EXPECTED expected
+
+#define INPUT_TYPE float16_t
+#define OUTPUT_TYPE float16_t
+#define OUTPUT_TYPE_SIZE 16
+
+/* Include the template for binary scalar operations.  */
+#include "binary_scalar_op.inc"
-- 
2.1.4
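
A note for anyone checking the expected[] tables in these tests by hand:
each entry is the IEEE 754 binary16 encoding (1 sign bit, 5 exponent
bits, 10 mantissa bits) of the value in the comment.  A small
stand-alone decoder, not part of the patch, that reproduces the
comments:

#include <stdio.h>
#include <stdint.h>
#include <math.h>

/* Decode an IEEE 754 binary16 value into a float.  */
static float
fp16_to_float (uint16_t h)
{
  int sign = (h >> 15) & 1;
  int exp = (h >> 10) & 0x1f;
  int mant = h & 0x3ff;
  float v;

  if (exp == 0)
    v = ldexpf (mant, -24);		/* Zero or subnormal.  */
  else if (exp == 31)
    v = mant ? NAN : INFINITY;		/* 0x7e00 is the default quiet NaN.  */
  else
    v = ldexpf (1024 + mant, exp - 25);	/* Normal: implicit leading bit.  */
  return sign ? -v : v;
}

int
main (void)
{
  printf ("%f\n", fp16_to_float (0x3c00));	/* 1.000000 */
  printf ("%f\n", fp16_to_float (0xcd00));	/* -20.000000 */
  printf ("%f\n", fp16_to_float (0x5369));	/* 59.281250 */
  return 0;
}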


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 17/17][ARM] Add tests for NEON FP16 ACLE intrinsics.
  2016-05-17 14:52 ` [PATCH 17/17][ARM] Add tests for NEON FP16 ACLE intrinsics Matthew Wahab
@ 2016-07-04 14:22   ` Matthew Wahab
  2016-08-04  9:01     ` Ramana Radhakrishnan
  0 siblings, 1 reply; 73+ messages in thread
From: Matthew Wahab @ 2016-07-04 14:22 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 6433 bytes --]

On 17/05/16 15:52, Matthew Wahab wrote:
 > Support for using the half-precision floating point operations added by the
 > ARMv8.2-A FP16 extension is based on the macros and intrinsics added to the
 > ACLE for the extension.
 >
 > This patch adds executable tests for the ACLE Adv.SIMD (NEON) intrinsics to
 > the advsimd-intrinsics testsuite.

The tests added in the previous version of the patch, which only tested
the f16 variants of intrinsics, are dropped. Instead, this patch extends
the existing intrinsics tests to support the new f16 variants. Where the
intrinsic is new, a new test for the intrinsic is added with f16 as the
only variant. (This is consistent with existing practice, e.g. vcvt.c.)
The new tests are based on similar existing tests, e.g. vmaxnm_1.c is
derived from vmax.c and the vcvt{a,m,p}_1.c tests, via vcvtX.inc, are
based on vcvt.c.
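
To make the shape of these changes concrete: each existing test gains an
f16 block alongside its f32 one, written with the same harness macros.
A condensed sketch of the pattern, drawn from the binary_op_float.inc
template in the attached patch:

#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
  /* Declare, load and check f16 vectors exactly as the f32 code does.  */
  DECL_VARIABLE(vector, float, 16, 4);
  DECL_VARIABLE(vector2, float, 16, 4);
  DECL_VARIABLE(vector_res, float, 16, 4);

  VLOAD(vector, buffer, , float, f, 16, 4);
  VDUP(vector2, , float, f, 16, 4, -15.5f);
  TEST_BINARY_OP(INSN_NAME, , float, f, 16, 4);
  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected, "");
#endif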

Since they are only available when the FP16 arithmetic instructions are
enabled, advsimd-intrinsics.exp is updated to set -march=armv8.2-a+fp16 when
the hardware supports it and the tests for the f16 intrinsics are guarded
with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC. Where a test has only f16
variants, the test file itself is also guarded with
dg-require-effective-target arm_v8_2a_fp16_neon_hw so that it reports
UNSUPPORTED rather than PASS if FP16 isn't supported.
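
As an illustration of the gating (this sketch is not part of the patch;
the real tests use the shared harness and expected-value tables), a
minimal stand-alone test written this way would be:

/* { dg-do run } */
/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */

#include <arm_neon.h>

extern void abort (void);

int
main (void)
{
#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
  float16x4_t a = vdup_n_f16 (2.0);
  float16x4_t b = vdup_n_f16 (3.0);
  /* vmaxnm_f16 is one of the new f16-only intrinsics.  */
  if (vget_lane_f16 (vmaxnm_f16 (a, b), 0) != 3.0)
    abort ();
#endif
  return 0;
}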

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator. Also tested the advsimd-intrinsics tests cross-compiled
for aarch64-none-elf on an ARMv8.2-A emulator.

Ok for trunk?
Matthew

testsuite/
2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp: Enable
	-march=armv8.2-a+fp16 when supported by the hardware.
	* gcc.target/aarch64/advsimd-intrinsics/binary_op_float.inc: New.
	* gcc.target/aarch64/advsimd-intrinsics/binary_op_no64.inc:
	Add F16 tests, enabled if macro HAS_FLOAT16_VARIANT is defined.  Add
	semi-colons to macro invocations.
	* gcc.target/aarch64/advsimd-intrinsics/cmp_fp_op.inc: Add F16
	tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
	defined.
	* gcc.target/aarch64/advsimd-intrinsics/cmp_op.inc: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/cmp_zero_op.inc: New.
	* gcc.target/aarch64/advsimd-intrinsics/vabd.c: Add F16
	tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
	defined.
	* gcc.target/aarch64/advsimd-intrinsics/vabs.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vadd.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vcage.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vcagt.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vcale.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vcalt.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vceq.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vceqz_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcge.c: Add F16
	tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
	defined.
	* gcc.target/aarch64/advsimd-intrinsics/vcgez_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcgt.c: Add F16
	tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
	defined.
	* gcc.target/aarch64/advsimd-intrinsics/vcgtz_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcle.c: Add F16
	tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
	defined.
	* gcc.target/aarch64/advsimd-intrinsics/vclez_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vclt.c: Add F16
	tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
	defined.
	* gcc.target/aarch64/advsimd-intrinsics/vcltz_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvt.c: Add F16
	tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
	defined.  Also fix some white-space.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtX.inc: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvta_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtm_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtp_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vfma.c: Add F16
	tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
	defined.  Also fix some long lines and white-space.
	* gcc.target/aarch64/advsimd-intrinsics/vfms.c: Add F16
	tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
	defined.  Also fix some long lines and white-space.
	* gcc.target/aarch64/advsimd-intrinsics/vmax.c: Add F16
	tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
	defined.
	* gcc.target/aarch64/advsimd-intrinsics/vmaxnm_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmin.c: Add F16
	tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
	defined.
	* gcc.target/aarch64/advsimd-intrinsics/vminnm_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmul.c: Add F16
	tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
	defined.
	* gcc.target/aarch64/advsimd-intrinsics/vmul_lane.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vmul_n.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vneg.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vpXXX.inc: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vpadd.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vpmax.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vpmin.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vrecpe.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vrecps.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vrnd.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vrndX.inc: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vrnda.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vrndm.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vrndn.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vrndp.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vrndx.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vrsqrte.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vrsqrts.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vsub.c: Likewise.


[-- Attachment #2: 0017-PATCH-17-17-ARM-Add-tests-for-NEON-FP16-ACLE-intrins.patch --]
[-- Type: text/x-patch, Size: 163856 bytes --]

From 2b955bc4acd3b198dd0d1ed96fbe11c60b469996 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 15:41:45 +0100
Subject: [PATCH 17/17][ARM] Add tests for NEON FP16 ACLE intrinsics.

testsuite/
2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp: Enable
	-march=armv8.2-a+fp16 when supported by the hardware.
	* gcc.target/aarch64/advsimd-intrinsics/binary_op_float.inc: New.
	* gcc.target/aarch64/advsimd-intrinsics/binary_op_no64.inc:
	Add F16 tests, enabled if macro HAS_FLOAT16_VARIANT is defined.  Add
	semi-colons to macro invocations.
	* gcc.target/aarch64/advsimd-intrinsics/cmp_fp_op.inc: Add F16
	tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
	defined.
	* gcc.target/aarch64/advsimd-intrinsics/cmp_op.inc: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/cmp_zero_op.inc: New.
	* gcc.target/aarch64/advsimd-intrinsics/vabd.c: Add F16
	tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
	defined.
	* gcc.target/aarch64/advsimd-intrinsics/vabs.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vadd.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vcage.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vcagt.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vcale.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vcalt.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vceq.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vceqz_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcge.c: Add F16
	tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
	defined.
	* gcc.target/aarch64/advsimd-intrinsics/vcgez_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcgt.c: Add F16
	tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
	defined.
	* gcc.target/aarch64/advsimd-intrinsics/vcgtz_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcle.c: Add F16
	tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
	defined.
	* gcc.target/aarch64/advsimd-intrinsics/vclez_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vclt.c: Add F16
	tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
	defined.
	* gcc.target/aarch64/advsimd-intrinsics/vcltz_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvt.c: Add F16
	tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
	defined.  Also fix some white-space.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtX.inc: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvta_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtm_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtp_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vfma.c: Add F16
	tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
	defined.  Also fix some long lines and white-space.
	* gcc.target/aarch64/advsimd-intrinsics/vfms.c: Add F16
	tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
	defined.  Also fix some long lines and white-space.
	* gcc.target/aarch64/advsimd-intrinsics/vmax.c: Add F16
	tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
	defined.
	* gcc.target/aarch64/advsimd-intrinsics/vmaxnm_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmin.c: Add F16
	tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
	defined.
	* gcc.target/aarch64/advsimd-intrinsics/vminnm_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmul.c: Add F16
	tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
	defined.
	* gcc.target/aarch64/advsimd-intrinsics/vmul_lane.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vmul_n.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vneg.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vpXXX.inc: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vpadd.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vpmax.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vpmin.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vrecpe.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vrecps.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vrnd.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vrndX.inc: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vrnda.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vrndm.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vrndn.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vrndp.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vrndx.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vrsqrte.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vrsqrts.c: Likewise.
	* gcc.target/aarch64/advsimd-intrinsics/vsub.c: Likewise.
---
 .../advsimd-intrinsics/advsimd-intrinsics.exp      |   5 +-
 .../aarch64/advsimd-intrinsics/binary_op_float.inc | 170 ++++++++++++++++++
 .../aarch64/advsimd-intrinsics/binary_op_no64.inc  |  57 ++++++-
 .../aarch64/advsimd-intrinsics/cmp_fp_op.inc       |  41 +++++
 .../aarch64/advsimd-intrinsics/cmp_op.inc          |  80 +++++++++
 .../aarch64/advsimd-intrinsics/cmp_zero_op.inc     | 111 ++++++++++++
 .../gcc.target/aarch64/advsimd-intrinsics/vabd.c   |  57 ++++++-
 .../gcc.target/aarch64/advsimd-intrinsics/vabs.c   |  28 +++
 .../gcc.target/aarch64/advsimd-intrinsics/vadd.c   |  31 ++++
 .../gcc.target/aarch64/advsimd-intrinsics/vcage.c  |  10 ++
 .../gcc.target/aarch64/advsimd-intrinsics/vcagt.c  |  10 ++
 .../gcc.target/aarch64/advsimd-intrinsics/vcale.c  |  10 ++
 .../gcc.target/aarch64/advsimd-intrinsics/vcalt.c  |  10 ++
 .../gcc.target/aarch64/advsimd-intrinsics/vceq.c   |  18 ++
 .../aarch64/advsimd-intrinsics/vceqz_1.c           |  27 +++
 .../gcc.target/aarch64/advsimd-intrinsics/vcge.c   |  22 +++
 .../aarch64/advsimd-intrinsics/vcgez_1.c           |  30 ++++
 .../gcc.target/aarch64/advsimd-intrinsics/vcgt.c   |  21 +++
 .../aarch64/advsimd-intrinsics/vcgtz_1.c           |  28 +++
 .../gcc.target/aarch64/advsimd-intrinsics/vcle.c   |  22 +++
 .../aarch64/advsimd-intrinsics/vclez_1.c           |  29 ++++
 .../gcc.target/aarch64/advsimd-intrinsics/vclt.c   |  21 +++
 .../aarch64/advsimd-intrinsics/vcltz_1.c           |  27 +++
 .../gcc.target/aarch64/advsimd-intrinsics/vcvt.c   | 189 ++++++++++++++++++++-
 .../aarch64/advsimd-intrinsics/vcvtX.inc           | 113 ++++++++++++
 .../aarch64/advsimd-intrinsics/vcvta_1.c           |  33 ++++
 .../aarch64/advsimd-intrinsics/vcvtm_1.c           |  33 ++++
 .../aarch64/advsimd-intrinsics/vcvtp_1.c           |  33 ++++
 .../gcc.target/aarch64/advsimd-intrinsics/vfma.c   |  46 ++++-
 .../gcc.target/aarch64/advsimd-intrinsics/vfms.c   |  45 ++++-
 .../gcc.target/aarch64/advsimd-intrinsics/vmax.c   |  33 ++++
 .../aarch64/advsimd-intrinsics/vmaxnm_1.c          |  47 +++++
 .../gcc.target/aarch64/advsimd-intrinsics/vmin.c   |  37 ++++
 .../aarch64/advsimd-intrinsics/vminnm_1.c          |  51 ++++++
 .../gcc.target/aarch64/advsimd-intrinsics/vmul.c   |  35 ++++
 .../aarch64/advsimd-intrinsics/vmul_lane.c         |  37 ++++
 .../gcc.target/aarch64/advsimd-intrinsics/vmul_n.c |  32 ++++
 .../gcc.target/aarch64/advsimd-intrinsics/vneg.c   |  29 ++++
 .../aarch64/advsimd-intrinsics/vpXXX.inc           |  15 ++
 .../gcc.target/aarch64/advsimd-intrinsics/vpadd.c  |   3 +
 .../gcc.target/aarch64/advsimd-intrinsics/vpmax.c  |   3 +
 .../gcc.target/aarch64/advsimd-intrinsics/vpmin.c  |   3 +
 .../gcc.target/aarch64/advsimd-intrinsics/vrecpe.c | 125 ++++++++++++++
 .../gcc.target/aarch64/advsimd-intrinsics/vrecps.c |  98 +++++++++++
 .../gcc.target/aarch64/advsimd-intrinsics/vrnd.c   |   8 +
 .../aarch64/advsimd-intrinsics/vrndX.inc           |  20 +++
 .../gcc.target/aarch64/advsimd-intrinsics/vrnda.c  |   9 +
 .../gcc.target/aarch64/advsimd-intrinsics/vrndm.c  |   9 +
 .../gcc.target/aarch64/advsimd-intrinsics/vrndn.c  |   9 +
 .../gcc.target/aarch64/advsimd-intrinsics/vrndp.c  |   8 +
 .../gcc.target/aarch64/advsimd-intrinsics/vrndx.c  |   8 +
 .../aarch64/advsimd-intrinsics/vrsqrte.c           |  91 ++++++++++
 .../aarch64/advsimd-intrinsics/vrsqrts.c           |  97 +++++++++++
 .../gcc.target/aarch64/advsimd-intrinsics/vsub.c   |  31 ++++
 54 files changed, 2178 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_op_float.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/cmp_zero_op.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqz_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgez_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgtz_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclez_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcltz_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtX.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvta_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtm_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtp_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxnm_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminnm_1.c

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp
index ff39973..e93b8d5 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp
@@ -53,7 +53,10 @@ torture-init
 set-torture-options $C_TORTURE_OPTIONS {{}} $LTO_TORTURE_OPTIONS
 
 # Make sure Neon flags are provided, if necessary.  Use fp16 if we can.
-if {[check_effective_target_arm_neon_fp16_ok]} then {
+# Use fp16 arithmetic operations if the hardware supports them.
+if {[check_effective_target_arm_v8_2a_fp16_neon_hw]} then {
+  set additional_flags [add_options_for_arm_v8_2a_fp16_neon ""]
+} elseif {[check_effective_target_arm_neon_fp16_ok]} then {
   set additional_flags [add_options_for_arm_neon_fp16 ""]
 } else {
   set additional_flags [add_options_for_arm_neon ""]
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_op_float.inc b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_op_float.inc
new file mode 100644
index 0000000..cc1bfb3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_op_float.inc
@@ -0,0 +1,170 @@
+/* Floating-point only version of binary_op_no64.inc template.  Currently only
+   float16_t is used.  */
+
+#include <math.h>
+
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1(NAME)
+
+void FNNAME (INSN_NAME) (void)
+{
+  int i;
+
+  /* Basic test: z = INSN (x, y), then store the result.  */
+#define TEST_BINARY_OP1(INSN, Q, T1, T2, W, N)				\
+  VECT_VAR(vector_res, T1, W, N) =					\
+    INSN##Q##_##T2##W(VECT_VAR(vector, T1, W, N),			\
+		      VECT_VAR(vector2, T1, W, N));			\
+  vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N))
+
+#define TEST_BINARY_OP(INSN, Q, T1, T2, W, N)   \
+  TEST_BINARY_OP1(INSN, Q, T1, T2, W, N)	\
+
+#ifdef HAS_FLOAT16_VARIANT
+  DECL_VARIABLE(vector, float, 16, 4);
+  DECL_VARIABLE(vector2, float, 16, 4);
+  DECL_VARIABLE(vector_res, float, 16, 4);
+
+  DECL_VARIABLE(vector, float, 16, 8);
+  DECL_VARIABLE(vector2, float, 16, 8);
+  DECL_VARIABLE(vector_res, float, 16, 8);
+#endif
+
+#ifdef HAS_FLOAT_VARIANT
+  DECL_VARIABLE(vector, float, 32, 2);
+  DECL_VARIABLE(vector2, float, 32, 2);
+  DECL_VARIABLE(vector_res, float, 32, 2);
+
+  DECL_VARIABLE(vector, float, 32, 4);
+  DECL_VARIABLE(vector2, float, 32, 4);
+  DECL_VARIABLE(vector_res, float, 32, 4);
+#endif
+
+  clean_results ();
+
+  /* Initialize input "vector" from "buffer".  */
+#ifdef HAS_FLOAT16_VARIANT
+  VLOAD(vector, buffer, , float, f, 16, 4);
+  VLOAD(vector, buffer, q, float, f, 16, 8);
+#endif
+#ifdef HAS_FLOAT_VARIANT
+  VLOAD(vector, buffer, , float, f, 32, 2);
+  VLOAD(vector, buffer, q, float, f, 32, 4);
+#endif
+
+  /* Choose init value arbitrarily, will be used as comparison value.  */
+#ifdef HAS_FLOAT16_VARIANT
+  VDUP(vector2, , float, f, 16, 4, -15.5f);
+  VDUP(vector2, q, float, f, 16, 8, -14.5f);
+#endif
+#ifdef HAS_FLOAT_VARIANT
+  VDUP(vector2, , float, f, 32, 2, -15.5f);
+  VDUP(vector2, q, float, f, 32, 4, -14.5f);
+#endif
+
+#ifdef HAS_FLOAT16_VARIANT
+#define FLOAT16_VARIANT(MACRO, VAR)			\
+  MACRO(VAR, , float, f, 16, 4);			\
+  MACRO(VAR, q, float, f, 16, 8);
+#else
+#define FLOAT16_VARIANT(MACRO, VAR)
+#endif
+
+#ifdef HAS_FLOAT_VARIANT
+#define FLOAT_VARIANT(MACRO, VAR)			\
+  MACRO(VAR, , float, f, 32, 2);			\
+  MACRO(VAR, q, float, f, 32, 4);
+#else
+#define FLOAT_VARIANT(MACRO, VAR)
+#endif
+
+#define TEST_MACRO_NO64BIT_VARIANT_1_5(MACRO, VAR)
+
+  /* Apply a binary operator named INSN_NAME.  */
+  FLOAT16_VARIANT(TEST_BINARY_OP, INSN_NAME);
+  FLOAT_VARIANT(TEST_BINARY_OP, INSN_NAME);
+
+#ifdef HAS_FLOAT16_VARIANT
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected, "");
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected, "");
+
+  /* Extra FP tests with special values (NaN, ....)  */
+  VDUP(vector, q, float, f, 16, 8, 1.0f);
+  VDUP(vector2, q, float, f, 16, 8, NAN);
+  TEST_BINARY_OP(INSN_NAME, q, float, f, 16, 8);
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_nan,
+	   " FP special (NaN)");
+
+  VDUP(vector, q, float, f, 16, 8, -NAN);
+  VDUP(vector2, q, float, f, 16, 8, 1.0f);
+  TEST_BINARY_OP(INSN_NAME, q, float, f, 16, 8);
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_mnan,
+	   " FP special (-NaN)");
+
+  VDUP(vector, q, float, f, 16, 8, 1.0f);
+  VDUP(vector2, q, float, f, 16, 8, HUGE_VALF);
+  TEST_BINARY_OP(INSN_NAME, q, float, f, 16, 8);
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_inf,
+	   " FP special (inf)");
+
+  VDUP(vector, q, float, f, 16, 8, -HUGE_VALF);
+  VDUP(vector2, q, float, f, 16, 8, 1.0f);
+  TEST_BINARY_OP(INSN_NAME, q, float, f, 16, 8);
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_minf,
+	   " FP special (-inf)");
+
+  VDUP(vector, q, float, f, 16, 8, 0.0f);
+  VDUP(vector2, q, float, f, 16, 8, -0.0f);
+  TEST_BINARY_OP(INSN_NAME, q, float, f, 16, 8);
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_zero1,
+	   " FP special (-0.0)");
+
+  VDUP(vector, q, float, f, 16, 8, -0.0f);
+  VDUP(vector2, q, float, f, 16, 8, 0.0f);
+  TEST_BINARY_OP(INSN_NAME, q, float, f, 16, 8);
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_zero2,
+	   " FP special (-0.0)");
+#endif
+
+#ifdef HAS_FLOAT_VARIANT
+  CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected, "");
+  CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected, "");
+
+  /* Extra FP tests with special values (NaN, ....)  */
+  VDUP(vector, q, float, f, 32, 4, 1.0f);
+  VDUP(vector2, q, float, f, 32, 4, NAN);
+  TEST_BINARY_OP(INSN_NAME, q, float, f, 32, 4);
+  CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_nan, " FP special (NaN)");
+
+  VDUP(vector, q, float, f, 32, 4, -NAN);
+  VDUP(vector2, q, float, f, 32, 4, 1.0f);
+  TEST_BINARY_OP(INSN_NAME, q, float, f, 32, 4);
+  CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_mnan, " FP special (-NaN)");
+
+  VDUP(vector, q, float, f, 32, 4, 1.0f);
+  VDUP(vector2, q, float, f, 32, 4, HUGE_VALF);
+  TEST_BINARY_OP(INSN_NAME, q, float, f, 32, 4);
+  CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_inf, " FP special (inf)");
+
+  VDUP(vector, q, float, f, 32, 4, -HUGE_VALF);
+  VDUP(vector2, q, float, f, 32, 4, 1.0f);
+  TEST_BINARY_OP(INSN_NAME, q, float, f, 32, 4);
+  CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_minf, " FP special (-inf)");
+
+  VDUP(vector, q, float, f, 32, 4, 0.0f);
+  VDUP(vector2, q, float, f, 32, 4, -0.0f);
+  TEST_BINARY_OP(INSN_NAME, q, float, f, 32, 4);
+  CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_zero1, " FP special (-0.0)");
+
+  VDUP(vector, q, float, f, 32, 4, -0.0f);
+  VDUP(vector2, q, float, f, 32, 4, 0.0f);
+  TEST_BINARY_OP(INSN_NAME, q, float, f, 32, 4);
+  CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_zero2, " FP special (-0.0)");
+#endif
+}
+
+int main (void)
+{
+  FNNAME (INSN_NAME) ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_op_no64.inc b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_op_no64.inc
index 1eb9271..a30f420 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_op_no64.inc
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_op_no64.inc
@@ -28,6 +28,10 @@ void FNNAME (INSN_NAME) (void)
 
   /* Initialize input "vector" from "buffer".  */
   TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer);
+#ifdef HAS_FLOAT16_VARIANT
+  VLOAD(vector, buffer, , float, f, 16, 4);
+  VLOAD(vector, buffer, q, float, f, 16, 8);
+#endif
 #ifdef HAS_FLOAT_VARIANT
   VLOAD(vector, buffer, , float, f, 32, 2);
   VLOAD(vector, buffer, q, float, f, 32, 4);
@@ -46,15 +50,27 @@ void FNNAME (INSN_NAME) (void)
   VDUP(vector2, q, uint, u, 8, 16, 0xf9);
   VDUP(vector2, q, uint, u, 16, 8, 0xfff2);
   VDUP(vector2, q, uint, u, 32, 4, 0xfffffff1);
+#ifdef HAS_FLOAT16_VARIANT
+  VDUP(vector2, , float, f, 16, 4, -15.5f);
+  VDUP(vector2, q, float, f, 16, 8, -14.5f);
+#endif
 #ifdef HAS_FLOAT_VARIANT
   VDUP(vector2, , float, f, 32, 2, -15.5f);
   VDUP(vector2, q, float, f, 32, 4, -14.5f);
 #endif
 
+#ifdef HAS_FLOAT16_VARIANT
+#define FLOAT16_VARIANT(MACRO, VAR)			\
+  MACRO(VAR, , float, f, 16, 4);			\
+  MACRO(VAR, q, float, f, 16, 8);
+#else
+#define FLOAT16_VARIANT(MACRO, VAR)
+#endif
+
 #ifdef HAS_FLOAT_VARIANT
 #define FLOAT_VARIANT(MACRO, VAR)			\
   MACRO(VAR, , float, f, 32, 2);			\
-  MACRO(VAR, q, float, f, 32, 4)
+  MACRO(VAR, q, float, f, 32, 4);
 #else
 #define FLOAT_VARIANT(MACRO, VAR)
 #endif
@@ -72,7 +88,8 @@ void FNNAME (INSN_NAME) (void)
   MACRO(VAR, q, uint, u, 8, 16);			\
   MACRO(VAR, q, uint, u, 16, 8);			\
   MACRO(VAR, q, uint, u, 32, 4);			\
-  FLOAT_VARIANT(MACRO, VAR)
+  FLOAT_VARIANT(MACRO, VAR);				\
+  FLOAT16_VARIANT(MACRO, VAR);
 
   /* Apply a binary operator named INSN_NAME.  */
   TEST_MACRO_NO64BIT_VARIANT_1_5(TEST_BINARY_OP, INSN_NAME);
@@ -90,6 +107,42 @@ void FNNAME (INSN_NAME) (void)
   CHECK(TEST_MSG, uint, 16, 8, PRIx16, expected, "");
   CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected, "");
 
+#ifdef HAS_FLOAT16_VARIANT
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected, "");
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected, "");
+
+  /* Extra FP tests with special values (NaN, ....)  */
+  VDUP(vector, q, float, f, 16, 8, 1.0f);
+  VDUP(vector2, q, float, f, 16, 8, NAN);
+  TEST_BINARY_OP(INSN_NAME, q, float, f, 16, 8);
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_nan, " FP special (NaN)");
+
+  VDUP(vector, q, float, f, 16, 8, -NAN);
+  VDUP(vector2, q, float, f, 16, 8, 1.0f);
+  TEST_BINARY_OP(INSN_NAME, q, float, f, 16, 8);
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_mnan, " FP special (-NaN)");
+
+  VDUP(vector, q, float, f, 16, 8, 1.0f);
+  VDUP(vector2, q, float, f, 16, 8, HUGE_VALF);
+  TEST_BINARY_OP(INSN_NAME, q, float, f, 16, 8);
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_inf, " FP special (inf)");
+
+  VDUP(vector, q, float, f, 16, 8, -HUGE_VALF);
+  VDUP(vector2, q, float, f, 16, 8, 1.0f);
+  TEST_BINARY_OP(INSN_NAME, q, float, f, 16, 8);
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_minf, " FP special (-inf)");
+
+  VDUP(vector, q, float, f, 16, 8, 0.0f);
+  VDUP(vector2, q, float, f, 16, 8, -0.0f);
+  TEST_BINARY_OP(INSN_NAME, q, float, f, 16, 8);
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_zero1, " FP special (-0.0)");
+
+  VDUP(vector, q, float, f, 16, 8, -0.0f);
+  VDUP(vector2, q, float, f, 16, 8, 0.0f);
+  TEST_BINARY_OP(INSN_NAME, q, float, f, 16, 8);
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_zero2, " FP special (-0.0)");
+#endif
+
 #ifdef HAS_FLOAT_VARIANT
   CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected, "");
   CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected, "");
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/cmp_fp_op.inc b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/cmp_fp_op.inc
index 33451d7..313badb 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/cmp_fp_op.inc
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/cmp_fp_op.inc
@@ -15,6 +15,10 @@
    each test file.  */
 extern ARRAY(expected2, uint, 32, 2);
 extern ARRAY(expected2, uint, 32, 4);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+extern ARRAY(expected2, uint, 16, 4);
+extern ARRAY(expected2, uint, 16, 8);
+#endif
 
 #define FNNAME1(NAME) exec_ ## NAME
 #define FNNAME(NAME) FNNAME1(NAME)
@@ -37,17 +41,33 @@ void FNNAME (INSN_NAME) (void)
   DECL_VARIABLE(vector2, float, 32, 4);
   DECL_VARIABLE(vector_res, uint, 32, 2);
   DECL_VARIABLE(vector_res, uint, 32, 4);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE(vector, float, 16, 4);
+  DECL_VARIABLE(vector, float, 16, 8);
+  DECL_VARIABLE(vector2, float, 16, 4);
+  DECL_VARIABLE(vector2, float, 16, 8);
+  DECL_VARIABLE(vector_res, uint, 16, 4);
+  DECL_VARIABLE(vector_res, uint, 16, 8);
+#endif
 
   clean_results ();
 
   /* Initialize input "vector" from "buffer".  */
   VLOAD(vector, buffer, , float, f, 32, 2);
   VLOAD(vector, buffer, q, float, f, 32, 4);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VLOAD(vector, buffer, , float, f, 16, 4);
+  VLOAD(vector, buffer, q, float, f, 16, 8);
+#endif
 
   /* Choose init value arbitrarily, will be used for vector
      comparison.  */
   VDUP(vector2, , float, f, 32, 2, -16.0f);
   VDUP(vector2, q, float, f, 32, 4, -14.0f);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector2, , float, f, 16, 4, -16.0f);
+  VDUP(vector2, q, float, f, 16, 8, -14.0f);
+#endif
 
   /* Apply operator named INSN_NAME.  */
   TEST_VCOMP(INSN_NAME, , float, f, uint, 32, 2);
@@ -56,15 +76,36 @@ void FNNAME (INSN_NAME) (void)
   TEST_VCOMP(INSN_NAME, q, float, f, uint, 32, 4);
   CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected, "");
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VCOMP(INSN_NAME, , float, f, uint, 16, 4);
+  CHECK(TEST_MSG, uint, 16, 4, PRIx16, expected, "");
+
+  TEST_VCOMP(INSN_NAME, q, float, f, uint, 16, 8);
+  CHECK(TEST_MSG, uint, 16, 8, PRIx16, expected, "");
+#endif
+
   /* Test again, with different input values.  */
   VDUP(vector2, , float, f, 32, 2, -10.0f);
   VDUP(vector2, q, float, f, 32, 4, 10.0f);
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector2, , float, f, 16, 4, -10.0f);
+  VDUP(vector2, q, float, f, 16, 8, 10.0f);
+#endif
+
   TEST_VCOMP(INSN_NAME, , float, f, uint, 32, 2);
   CHECK(TEST_MSG, uint, 32, 2, PRIx32, expected2, "");
 
   TEST_VCOMP(INSN_NAME, q, float, f, uint, 32, 4);
   CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected2,"");
+
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VCOMP(INSN_NAME, , float, f, uint, 16, 4);
+  CHECK(TEST_MSG, uint, 16, 4, PRIx16, expected2, "");
+
+  TEST_VCOMP(INSN_NAME, q, float, f, uint, 16, 8);
+  CHECK(TEST_MSG, uint, 16, 8, PRIx16, expected2, "");
+#endif
 }
 
 int main (void)
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/cmp_op.inc b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/cmp_op.inc
index a09c5f5..c8c5dfe 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/cmp_op.inc
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/cmp_op.inc
@@ -11,6 +11,17 @@ extern ARRAY(expected_uint, uint, 32, 2);
 extern ARRAY(expected_q_uint, uint, 8, 16);
 extern ARRAY(expected_q_uint, uint, 16, 8);
 extern ARRAY(expected_q_uint, uint, 32, 4);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+extern ARRAY(expected_float, uint, 16, 4);
+extern ARRAY(expected_q_float, uint, 16, 8);
+extern ARRAY(expected_nan, uint, 16, 4);
+extern ARRAY(expected_mnan, uint, 16, 4);
+extern ARRAY(expected_nan2, uint, 16, 4);
+extern ARRAY(expected_inf, uint, 16, 4);
+extern ARRAY(expected_minf, uint, 16, 4);
+extern ARRAY(expected_inf2, uint, 16, 4);
+extern ARRAY(expected_mzero, uint, 16, 4);
+#endif
 extern ARRAY(expected_float, uint, 32, 2);
 extern ARRAY(expected_q_float, uint, 32, 4);
 extern ARRAY(expected_uint2, uint, 32, 2);
@@ -48,6 +59,9 @@ void FNNAME (INSN_NAME) (void)
   DECL_VARIABLE(vector, uint, 8, 8);
   DECL_VARIABLE(vector, uint, 16, 4);
   DECL_VARIABLE(vector, uint, 32, 2);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE (vector, float, 16, 4);
+#endif
   DECL_VARIABLE(vector, float, 32, 2);
   DECL_VARIABLE(vector, int, 8, 16);
   DECL_VARIABLE(vector, int, 16, 8);
@@ -55,6 +69,9 @@ void FNNAME (INSN_NAME) (void)
   DECL_VARIABLE(vector, uint, 8, 16);
   DECL_VARIABLE(vector, uint, 16, 8);
   DECL_VARIABLE(vector, uint, 32, 4);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE (vector, float, 16, 8);
+#endif
   DECL_VARIABLE(vector, float, 32, 4);
 
   DECL_VARIABLE(vector2, int, 8, 8);
@@ -63,6 +80,9 @@ void FNNAME (INSN_NAME) (void)
   DECL_VARIABLE(vector2, uint, 8, 8);
   DECL_VARIABLE(vector2, uint, 16, 4);
   DECL_VARIABLE(vector2, uint, 32, 2);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE (vector2, float, 16, 4);
+#endif
   DECL_VARIABLE(vector2, float, 32, 2);
   DECL_VARIABLE(vector2, int, 8, 16);
   DECL_VARIABLE(vector2, int, 16, 8);
@@ -70,6 +90,9 @@ void FNNAME (INSN_NAME) (void)
   DECL_VARIABLE(vector2, uint, 8, 16);
   DECL_VARIABLE(vector2, uint, 16, 8);
   DECL_VARIABLE(vector2, uint, 32, 4);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE (vector2, float, 16, 8);
+#endif
   DECL_VARIABLE(vector2, float, 32, 4);
 
   DECL_VARIABLE(vector_res, uint, 8, 8);
@@ -88,6 +111,9 @@ void FNNAME (INSN_NAME) (void)
   VLOAD(vector, buffer, , uint, u, 8, 8);
   VLOAD(vector, buffer, , uint, u, 16, 4);
   VLOAD(vector, buffer, , uint, u, 32, 2);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VLOAD (vector, buffer, , float, f, 16, 4);
+#endif
   VLOAD(vector, buffer, , float, f, 32, 2);
 
   VLOAD(vector, buffer, q, int, s, 8, 16);
@@ -96,6 +122,9 @@ void FNNAME (INSN_NAME) (void)
   VLOAD(vector, buffer, q, uint, u, 8, 16);
   VLOAD(vector, buffer, q, uint, u, 16, 8);
   VLOAD(vector, buffer, q, uint, u, 32, 4);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VLOAD (vector, buffer, q, float, f, 16, 8);
+#endif
   VLOAD(vector, buffer, q, float, f, 32, 4);
 
   /* Choose init value arbitrarily, will be used for vector
@@ -106,6 +135,9 @@ void FNNAME (INSN_NAME) (void)
   VDUP(vector2, , uint, u, 8, 8, 0xF3);
   VDUP(vector2, , uint, u, 16, 4, 0xFFF2);
   VDUP(vector2, , uint, u, 32, 2, 0xFFFFFFF1);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP (vector2, , float, f, 16, 4, -15.0f);
+#endif
   VDUP(vector2, , float, f, 32, 2, -15.0f);
 
   VDUP(vector2, q, int, s, 8, 16, -4);
@@ -114,6 +146,9 @@ void FNNAME (INSN_NAME) (void)
   VDUP(vector2, q, uint, u, 8, 16, 0xF4);
   VDUP(vector2, q, uint, u, 16, 8, 0xFFF6);
   VDUP(vector2, q, uint, u, 32, 4, 0xFFFFFFF2);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP (vector2, q, float, f, 16, 8, -14.0f);
+#endif
   VDUP(vector2, q, float, f, 32, 4, -14.0f);
 
   /* The comparison operators produce only unsigned results, which
@@ -154,9 +189,17 @@ void FNNAME (INSN_NAME) (void)
   CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_q_uint, "");
 
   /* The float variants.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VCOMP (INSN_NAME, , float, f, uint, 16, 4);
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected_float, "");
+#endif
   TEST_VCOMP(INSN_NAME, , float, f, uint, 32, 2);
   CHECK(TEST_MSG, uint, 32, 2, PRIx32, expected_float, "");
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VCOMP (INSN_NAME, q, float, f, uint, 16, 8);
+  CHECK (TEST_MSG, uint, 16, 8, PRIx16, expected_q_float, "");
+#endif
   TEST_VCOMP(INSN_NAME, q, float, f, uint, 32, 4);
   CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_q_float, "");
 
@@ -176,6 +219,43 @@ void FNNAME (INSN_NAME) (void)
 
 
   /* Extra FP tests with special values (NaN, ....).  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP (vector, , float, f, 16, 4, 1.0);
+  VDUP (vector2, , float, f, 16, 4, NAN);
+  TEST_VCOMP (INSN_NAME, , float, f, uint, 16, 4);
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected_nan, "FP special (NaN)");
+
+  VDUP (vector, , float, f, 16, 4, 1.0);
+  VDUP (vector2, , float, f, 16, 4, -NAN);
+  TEST_VCOMP (INSN_NAME, , float, f, uint, 16, 4);
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected_mnan, " FP special (-NaN)");
+
+  VDUP (vector, , float, f, 16, 4, NAN);
+  VDUP (vector2, , float, f, 16, 4, 1.0);
+  TEST_VCOMP (INSN_NAME, , float, f, uint, 16, 4);
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected_nan2, " FP special (NaN)");
+
+  VDUP (vector, , float, f, 16, 4, 1.0);
+  VDUP (vector2, , float, f, 16, 4, HUGE_VALF);
+  TEST_VCOMP (INSN_NAME, , float, f, uint, 16, 4);
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected_inf, " FP special (inf)");
+
+  VDUP (vector, , float, f, 16, 4, 1.0);
+  VDUP (vector2, , float, f, 16, 4, -HUGE_VALF);
+  TEST_VCOMP (INSN_NAME, , float, f, uint, 16, 4);
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected_minf, " FP special (-inf)");
+
+  VDUP (vector, , float, f, 16, 4, HUGE_VALF);
+  VDUP (vector2, , float, f, 16, 4, 1.0);
+  TEST_VCOMP (INSN_NAME, , float, f, uint, 16, 4);
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected_inf2, " FP special (inf)");
+
+  VDUP (vector, , float, f, 16, 4, -0.0);
+  VDUP (vector2, , float, f, 16, 4, 0.0);
+  TEST_VCOMP (INSN_NAME, , float, f, uint, 16, 4);
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected_mzero, " FP special (-0.0)");
+#endif
+
   VDUP(vector, , float, f, 32, 2, 1.0);
   VDUP(vector2, , float, f, 32, 2, NAN);
   TEST_VCOMP(INSN_NAME, , float, f, uint, 32, 2);
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/cmp_zero_op.inc b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/cmp_zero_op.inc
new file mode 100644
index 0000000..610272f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/cmp_zero_op.inc
@@ -0,0 +1,111 @@
+/* Template file for the validation of compare-against-zero operators.
+
+   This file is based on cmp_op.inc.  It is meant to be included by the relevant
+   test files, which have to define the intrinsic family to test.  If a given
+   intrinsic supports variants which are not supported by all the other
+   operators, these can be tested by providing a definition for EXTRA_TESTS.  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+#include <math.h>
+
+/* Additional expected results declarations; they are initialized in
+   each test file.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+extern ARRAY(expected_float, uint, 16, 4);
+extern ARRAY(expected_q_float, uint, 16, 8);
+extern ARRAY(expected_uint2, uint, 16, 4);
+extern ARRAY(expected_uint3, uint, 16, 4);
+extern ARRAY(expected_uint4, uint, 16, 4);
+extern ARRAY(expected_nan, uint, 16, 4);
+extern ARRAY(expected_mnan, uint, 16, 4);
+extern ARRAY(expected_inf, uint, 16, 4);
+extern ARRAY(expected_minf, uint, 16, 4);
+extern ARRAY(expected_zero, uint, 16, 4);
+extern ARRAY(expected_mzero, uint, 16, 4);
+#endif
+
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1(NAME)
+
+void FNNAME (INSN_NAME) (void)
+{
+  /* Basic test: y=vcomp(x, 0), then store the result.  */
+#define TEST_VCOMP1(INSN, Q, T1, T2, T3, W, N)				\
+  VECT_VAR(vector_res, T3, W, N) =					\
+    INSN##Q##_##T2##W(VECT_VAR(vector, T1, W, N));			\
+  vst1##Q##_u##W(VECT_VAR(result, T3, W, N), VECT_VAR(vector_res, T3, W, N))
+
+#define TEST_VCOMP(INSN, Q, T1, T2, T3, W, N)				\
+  TEST_VCOMP1(INSN, Q, T1, T2, T3, W, N)
+
+  /* No need for 64-bit elements.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE (vector, float, 16, 4);
+  DECL_VARIABLE (vector, float, 16, 8);
+#endif
+
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE(vector_res, uint, 16, 4);
+  DECL_VARIABLE(vector_res, uint, 16, 8);
+#endif
+
+  clean_results ();
+
+  /* Choose init value arbitrarily, will be used for vector
+     comparison.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP (vector, , float, f, 16, 4, -15.0f);
+  VDUP (vector, q, float, f, 16, 8, 14.0f);
+#endif
+
+  /* Float variants.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VCOMP (INSN_NAME, , float, f, uint, 16, 4);
+  TEST_VCOMP (INSN_NAME, q, float, f, uint, 16, 8);
+#endif
+
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected_float, "");
+  CHECK (TEST_MSG, uint, 16, 8, PRIx16, expected_q_float, "");
+#endif
+
+  /* Extra FP tests with special values (NaN, ....).  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP (vector, , float, f, 16, 4, NAN);
+  TEST_VCOMP (INSN_NAME, , float, f, uint, 16, 4);
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected_nan, "FP special (NaN)");
+
+  VDUP (vector, , float, f, 16, 4, -NAN);
+  TEST_VCOMP (INSN_NAME, , float, f, uint, 16, 4);
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected_mnan, " FP special (-NaN)");
+
+  VDUP (vector, , float, f, 16, 4, HUGE_VALF);
+  TEST_VCOMP (INSN_NAME, , float, f, uint, 16, 4);
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected_inf, " FP special (inf)");
+
+  VDUP (vector, , float, f, 16, 4, -HUGE_VALF);
+  TEST_VCOMP (INSN_NAME, , float, f, uint, 16, 4);
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected_minf, " FP special (-inf)");
+
+  VDUP (vector, , float, f, 16, 4, 0.0);
+  TEST_VCOMP (INSN_NAME, , float, f, uint, 16, 4);
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected_zero, " FP special (0.0)");
+
+  VDUP (vector, , float, f, 16, 4, -0.0);
+  TEST_VCOMP (INSN_NAME, , float, f, uint, 16, 4);
+  CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected_mzero, " FP special (-0.0)");
+#endif
+
+#ifdef EXTRA_TESTS
+  EXTRA_TESTS();
+#endif
+}
+
+int main (void)
+{
+  FNNAME (INSN_NAME) ();
+
+  return 0;
+}
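
Note: after macro expansion each TEST_VCOMP instance reduces to one
intrinsic call plus a store of the result lanes.  A minimal sketch of the
64-bit vceqz case, assuming the usual VECT_VAR name mangling from
arm-neon-ref.h and with a local array standing in for the harness result
buffer (illustration only, not part of the patch):

  #include <arm_neon.h>

  /* Roughly what TEST_VCOMP (vceqz, , float, f, uint, 16, 4) expands
     to; result_uint16x4 stands in for the shared result buffer.  */
  void
  sketch_vceqz (float16x4_t vector_float16x4)
  {
    uint16x4_t vector_res_uint16x4 = vceqz_f16 (vector_float16x4);
    uint16_t result_uint16x4[4];
    vst1_u16 (result_uint16x4, vector_res_uint16x4);
  }

The CHECK calls then compare the stored half-words against the expected_*
arrays defined by the including test file.
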
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabd.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabd.c
index 67d2af1..3049065 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabd.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabd.c
@@ -30,10 +30,20 @@ VECT_VAR_DECL(expected,uint,32,4) [] = { 0xffffffd0, 0xffffffd1,
 					 0xffffffd2, 0xffffffd3 };
 VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x42407ae1, 0x423c7ae1,
 					   0x42387ae1, 0x42347ae1 };
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0x4e13, 0x4dd3,
+					      0x4d93, 0x4d53 };
+VECT_VAR_DECL(expected, hfloat, 16, 8) [] = { 0x5204, 0x51e4, 0x51c4, 0x51a4,
+					      0x5184, 0x5164, 0x5144, 0x5124 };
+#endif
 
 /* Additional expected results for float32 variants with specially
    chosen input values.  */
 VECT_VAR_DECL(expected_float32,hfloat,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_float16, hfloat, 16, 8) [] = { 0x0, 0x0, 0x0, 0x0,
+						      0x0, 0x0, 0x0, 0x0 };
+#endif
 
 #define TEST_MSG "VABD/VABDQ"
 void exec_vabd (void)
@@ -65,6 +75,17 @@ void exec_vabd (void)
   DECL_VABD_VAR(vector2);
   DECL_VABD_VAR(vector_res);
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE(vector1, float, 16, 4);
+  DECL_VARIABLE(vector1, float, 16, 8);
+
+  DECL_VARIABLE(vector2, float, 16, 4);
+  DECL_VARIABLE(vector2, float, 16, 8);
+
+  DECL_VARIABLE(vector_res, float, 16, 4);
+  DECL_VARIABLE(vector_res, float, 16, 8);
+#endif
+
   clean_results ();
 
   /* Initialize input "vector1" from "buffer".  */
@@ -82,6 +103,12 @@ void exec_vabd (void)
   VLOAD(vector1, buffer, q, uint, u, 16, 8);
   VLOAD(vector1, buffer, q, uint, u, 32, 4);
   VLOAD(vector1, buffer, q, float, f, 32, 4);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VLOAD(vector1, buffer, , float, f, 16, 4);
+  VLOAD(vector1, buffer, q, float, f, 16, 8);
+#endif
 
   /* Choose init value arbitrarily.  */
   VDUP(vector2, , int, s, 8, 8, 1);
@@ -98,6 +125,10 @@ void exec_vabd (void)
   VDUP(vector2, q, uint, u, 16, 8, 12);
   VDUP(vector2, q, uint, u, 32, 4, 32);
   VDUP(vector2, q, float, f, 32, 4, 32.12f);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector2, , float, f, 16, 4, 8.3f);
+  VDUP(vector2, q, float, f, 16, 8, 32.12f);
+#endif
 
   /* Execute the tests.  */
   TEST_VABD(, int, s, 8, 8);
@@ -115,6 +146,11 @@ void exec_vabd (void)
   TEST_VABD(q, uint, u, 32, 4);
   TEST_VABD(q, float, f, 32, 4);
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VABD(, float, f, 16, 4);
+  TEST_VABD(q, float, f, 16, 8);
+#endif
+
   CHECK(TEST_MSG, int, 8, 8, PRIx8, expected, "");
   CHECK(TEST_MSG, int, 16, 4, PRIx16, expected, "");
   CHECK(TEST_MSG, int, 32, 2, PRIx32, expected, "");
@@ -129,7 +165,10 @@ void exec_vabd (void)
   CHECK(TEST_MSG, uint, 16, 8, PRIx16, expected, "");
   CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected, "");
   CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected, "");
-
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected, "");
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected, "");
+#endif
 
   /* Extra FP tests with special values (-0.0, ....) */
   VDUP(vector1, q, float, f, 32, 4, -0.0f);
@@ -137,11 +176,27 @@ void exec_vabd (void)
   TEST_VABD(q, float, f, 32, 4);
   CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_float32, " FP special (-0.0)");
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector1, q, float, f, 16, 8, -0.0f);
+  VDUP(vector2, q, float, f, 16, 8, 0.0);
+  TEST_VABD(q, float, f, 16, 8);
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_float16,
+	   " FP special (-0.0)");
+#endif
+
   /* Extra FP tests with special values (-0.0, ....) */
   VDUP(vector1, q, float, f, 32, 4, 0.0f);
   VDUP(vector2, q, float, f, 32, 4, -0.0);
   TEST_VABD(q, float, f, 32, 4);
   CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_float32, " FP special (-0.0)");
+
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector1, q, float, f, 16, 8, 0.0f);
+  VDUP(vector2, q, float, f, 16, 8, -0.0);
+  TEST_VABD(q, float, f, 16, 8);
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_float16,
+	   " FP special (-0.0)");
+#endif
 }
 
 int main (void)
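
Note: the new f16 expected values can be checked by hand.  Lane 0 of
vector1 is loaded with -16.0 from the standard buffer and vector2 is
VDUP'd with 8.3f, so the first vabd lane is |-16.0 - 8.3| = 24.3, whose
binary16 encoding is 0x4e13.  A throwaway encoder for positive normal
values (illustration only; it rounds half up, which is close enough
here):

  #include <stdio.h>

  /* Hypothetical helper: encode a positive, normal value as IEEE
     binary16 to cross-check the expected arrays.  */
  static unsigned short
  fp16_bits (double v)
  {
    int exp = 0;
    while (v >= 2.0) { v /= 2.0; exp++; }
    while (v < 1.0)  { v *= 2.0; exp--; }
    return (unsigned short) (((exp + 15) << 10)
                             | (int) ((v - 1.0) * 1024.0 + 0.5));
  }

  int
  main (void)
  {
    printf ("0x%x\n", fp16_bits (24.3));  /* prints 0x4e13 */
    return 0;
  }
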
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabs.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabs.c
index 9c80ef1..9d6d5b2 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabs.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabs.c
@@ -21,24 +21,52 @@ VECT_VAR_DECL(expected,int,32,4) [] = { 0x10, 0xf, 0xe, 0xd };
 /* Expected results for float32 variants. Needs to be separated since
    the generic test function does not test floating-point
    versions.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_float16, hfloat, 16, 4) [] = { 0x409a, 0x409a,
+						      0x409a, 0x409a };
+VECT_VAR_DECL(expected_float16, hfloat, 16, 8) [] = { 0x42cd, 0x42cd,
+						      0x42cd, 0x42cd,
+						      0x42cd, 0x42cd,
+						      0x42cd, 0x42cd };
+#endif
 VECT_VAR_DECL(expected_float32,hfloat,32,2) [] = { 0x40133333, 0x40133333 };
 VECT_VAR_DECL(expected_float32,hfloat,32,4) [] = { 0x4059999a, 0x4059999a,
 						   0x4059999a, 0x4059999a };
 
 void exec_vabs_f32(void)
 {
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE(vector, float, 16, 4);
+  DECL_VARIABLE(vector, float, 16, 8);
+#endif
   DECL_VARIABLE(vector, float, 32, 2);
   DECL_VARIABLE(vector, float, 32, 4);
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE(vector_res, float, 16, 4);
+  DECL_VARIABLE(vector_res, float, 16, 8);
+#endif
   DECL_VARIABLE(vector_res, float, 32, 2);
   DECL_VARIABLE(vector_res, float, 32, 4);
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector, , float, f, 16, 4, -2.3f);
+  VDUP(vector, q, float, f, 16, 8, 3.4f);
+#endif
   VDUP(vector, , float, f, 32, 2, -2.3f);
   VDUP(vector, q, float, f, 32, 4, 3.4f);
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_UNARY_OP(INSN_NAME, , float, f, 16, 4);
+  TEST_UNARY_OP(INSN_NAME, q, float, f, 16, 8);
+#endif
   TEST_UNARY_OP(INSN_NAME, , float, f, 32, 2);
   TEST_UNARY_OP(INSN_NAME, q, float, f, 32, 4);
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_float16, "");
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_float16, "");
+#endif
   CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_float32, "");
   CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_float32, "");
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vadd.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vadd.c
index 7be1401..1561dc1 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vadd.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vadd.c
@@ -43,6 +43,14 @@ VECT_VAR_DECL(expected,uint,64,2) [] = { 0xfffffffffffffff3,
 VECT_VAR_DECL(expected_float32,hfloat,32,2) [] = { 0x40d9999a, 0x40d9999a };
 VECT_VAR_DECL(expected_float32,hfloat,32,4) [] = { 0x41100000, 0x41100000,
 						   0x41100000, 0x41100000 };
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_float16, hfloat, 16, 4) [] = { 0x46cd, 0x46cd,
+						      0x46cd, 0x46cd };
+VECT_VAR_DECL(expected_float16, hfloat, 16, 8) [] = { 0x4880, 0x4880,
+						      0x4880, 0x4880,
+						      0x4880, 0x4880,
+						      0x4880, 0x4880 };
+#endif
 
 void exec_vadd_f32(void)
 {
@@ -66,4 +74,27 @@ void exec_vadd_f32(void)
 
   CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_float32, "");
   CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_float32, "");
+
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE(vector, float, 16, 4);
+  DECL_VARIABLE(vector, float, 16, 8);
+
+  DECL_VARIABLE(vector2, float, 16, 4);
+  DECL_VARIABLE(vector2, float, 16, 8);
+
+  DECL_VARIABLE(vector_res, float, 16, 4);
+  DECL_VARIABLE(vector_res, float, 16, 8);
+
+  VDUP(vector, , float, f, 16, 4, 2.3f);
+  VDUP(vector, q, float, f, 16, 8, 3.4f);
+
+  VDUP(vector2, , float, f, 16, 4, 4.5f);
+  VDUP(vector2, q, float, f, 16, 8, 5.6f);
+
+  TEST_BINARY_OP(INSN_NAME, , float, f, 16, 4);
+  TEST_BINARY_OP(INSN_NAME, q, float, f, 16, 8);
+
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_float16, "");
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_float16, "");
+#endif
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcage.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcage.c
index 1fadf66..ab00b96 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcage.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcage.c
@@ -11,3 +11,13 @@ VECT_VAR_DECL(expected,uint,32,4) [] = { 0xffffffff, 0xffffffff,
 VECT_VAR_DECL(expected2,uint,32,2) [] = { 0xffffffff, 0xffffffff };
 VECT_VAR_DECL(expected2,uint,32,4) [] = { 0xffffffff, 0xffffffff,
 					  0xffffffff, 0xffffffff };
+
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL (expected, uint, 16, 4) [] = { 0xffff, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected, uint, 16, 8) [] = { 0xffff, 0xffff, 0xffff, 0x0,
+					     0x0, 0x0, 0x0, 0x0 };
+
+VECT_VAR_DECL (expected2, uint, 16, 4) [] = { 0xffff, 0xffff, 0xffff, 0xffff };
+VECT_VAR_DECL (expected2, uint, 16, 8) [] = { 0xffff, 0xffff, 0xffff, 0xffff,
+					      0xffff, 0xffff, 0xffff, 0x0 };
+#endif
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcagt.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcagt.c
index b1144a2..81c46a6 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcagt.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcagt.c
@@ -11,3 +11,13 @@ VECT_VAR_DECL(expected,uint,32,4) [] = { 0xffffffff, 0xffffffff,
 VECT_VAR_DECL(expected2,uint,32,2) [] = { 0xffffffff, 0xffffffff };
 VECT_VAR_DECL(expected2,uint,32,4) [] = { 0xffffffff, 0xffffffff,
 					  0xffffffff, 0xffffffff };
+
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL (expected, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected, uint, 16, 8) [] = { 0xffff, 0xffff, 0x0, 0x0,
+					     0x0, 0x0, 0x0, 0x0 };
+
+VECT_VAR_DECL (expected2, uint, 16, 4) [] = { 0xffff, 0xffff, 0xffff, 0xffff };
+VECT_VAR_DECL (expected2, uint, 16, 8) [] = { 0xffff, 0xffff, 0xffff, 0xffff,
+					      0xffff, 0xffff, 0x0, 0x0 };
+#endif
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcale.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcale.c
index bff9e4a..091ffaf 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcale.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcale.c
@@ -9,3 +9,13 @@ VECT_VAR_DECL(expected,uint,32,4) [] = { 0x0, 0x0, 0xffffffff, 0xffffffff };
 
 VECT_VAR_DECL(expected2,uint,32,2) [] = { 0x0, 0x0 };
 VECT_VAR_DECL(expected2,uint,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL (expected, uint, 16, 4) [] = { 0xffff, 0xffff, 0xffff, 0xffff };
+VECT_VAR_DECL (expected, uint, 16, 8) [] = { 0x0, 0x0, 0xffff, 0xffff,
+					     0xffff, 0xffff, 0xffff, 0xffff };
+
+VECT_VAR_DECL (expected2, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected2, uint, 16, 8) [] = { 0x0, 0x0, 0x0, 0x0,
+					      0x0, 0x0, 0xffff, 0xffff };
+#endif
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcalt.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcalt.c
index ed652eb..525176a 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcalt.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcalt.c
@@ -9,3 +9,13 @@ VECT_VAR_DECL(expected,uint,32,4) [] = { 0x0, 0x0, 0x0, 0xffffffff };
 
 VECT_VAR_DECL(expected2,uint,32,2) [] = { 0x0, 0x0 };
 VECT_VAR_DECL(expected2,uint,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL (expected, uint, 16, 4) [] = { 0x0, 0xffff, 0xffff, 0xffff };
+VECT_VAR_DECL (expected, uint, 16, 8) [] = { 0x0, 0x0, 0x0, 0xffff,
+					     0xffff, 0xffff, 0xffff, 0xffff };
+
+VECT_VAR_DECL (expected2, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected2, uint, 16, 8) [] = { 0x0, 0x0, 0x0, 0x0,
+					      0x0, 0x0, 0x0, 0xffff };
+#endif
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceq.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceq.c
index 1e21d50..ede01fb 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceq.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceq.c
@@ -32,6 +32,12 @@ VECT_VAR_DECL(expected_q_uint,uint,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
 						0x0, 0x0, 0xffff, 0x0 };
 VECT_VAR_DECL(expected_q_uint,uint,32,4) [] = { 0x0, 0x0, 0xffffffff, 0x0 };
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL (expected_float, uint, 16, 4) [] = { 0x0, 0xffff, 0x0, 0x0 };
+VECT_VAR_DECL (expected_q_float, uint, 16, 8) [] = { 0x0, 0x0, 0xffff, 0x0,
+						     0x0, 0x0, 0x0, 0x0 };
+#endif
+
 VECT_VAR_DECL(expected_float,uint,32,2) [] = { 0x0, 0xffffffff };
 VECT_VAR_DECL(expected_q_float,uint,32,4) [] = { 0x0, 0x0, 0xffffffff, 0x0 };
 
@@ -39,6 +45,18 @@ VECT_VAR_DECL(expected_uint2,uint,32,2) [] = { 0xffffffff, 0x0 };
 VECT_VAR_DECL(expected_uint3,uint,32,2) [] = { 0x0, 0xffffffff };
 VECT_VAR_DECL(expected_uint4,uint,32,2) [] = { 0xffffffff, 0x0 };
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL (expected_nan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0  };
+VECT_VAR_DECL (expected_mnan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_nan2, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+
+VECT_VAR_DECL (expected_inf, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_minf, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_inf2, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_mzero, uint, 16, 4) [] = { 0xffff, 0xffff,
+						   0xffff, 0xffff };
+#endif
+
 VECT_VAR_DECL(expected_nan,uint,32,2) [] = { 0x0, 0x0 };
 VECT_VAR_DECL(expected_mnan,uint,32,2) [] = { 0x0, 0x0 };
 VECT_VAR_DECL(expected_nan2,uint,32,2) [] = { 0x0, 0x0 };
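
Note: the special-value expectations in these comparison tests follow
directly from IEEE 754 semantics: any ordered comparison involving a NaN
is false, hence the all-zero expected_nan and expected_mnan vectors,
while -0.0 compares equal to 0.0, hence the all-ones expected_mzero for
vceq.  The scalar equivalents, for reference:

  #include <math.h>
  #include <stdio.h>

  int
  main (void)
  {
    printf ("%d %d\n", NAN == NAN, -0.0f == 0.0f);  /* prints 0 1 */
    return 0;
  }
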
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqz_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqz_1.c
new file mode 100644
index 0000000..eefaa7a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqz_1.c
@@ -0,0 +1,27 @@
+/* This file tests an intrinsic that currently has only an f16 variant
+   and is only available when FP16 arithmetic instructions are supported.  */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+
+#define INSN_NAME vceqz
+#define TEST_MSG "VCEQZ/VCEQZQ"
+
+#include "cmp_zero_op.inc"
+
+/* Expected results.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL (expected_float, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0  };
+VECT_VAR_DECL (expected_q_float, uint, 16, 8) [] = { 0x0, 0x0, 0x0, 0x0,
+						     0x0, 0x0, 0x0, 0x0 };
+#endif
+
+/* Extra FP tests with special values (NaN, ....).  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL (expected_nan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0  };
+VECT_VAR_DECL (expected_mnan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_inf, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_minf, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_zero, uint, 16, 4) [] = { 0xffff, 0xffff,
+						  0xffff, 0xffff };
+VECT_VAR_DECL (expected_mzero, uint, 16, 4) [] = { 0xffff, 0xffff,
+						   0xffff, 0xffff };
+#endif
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcge.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcge.c
index 22a5d67..0ec7c7b 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcge.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcge.c
@@ -28,6 +28,14 @@ VECT_VAR_DECL(expected_q_uint,uint,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
 						0, 0x0, 0xffff, 0xffff };
 VECT_VAR_DECL(expected_q_uint,uint,32,4) [] = { 0x0, 0x0, 0xffffffff, 0xffffffff };
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL (expected_float, uint, 16, 4) [] = { 0x0, 0xffff, 0xffff, 0xffff };
+VECT_VAR_DECL (expected_q_float, uint, 16, 8) [] = { 0x0, 0x0,
+						     0xffff, 0xffff,
+						     0xffff, 0xffff,
+						     0xffff, 0xffff };
+#endif
+
 VECT_VAR_DECL(expected_float,uint,32,2) [] = { 0x0, 0xffffffff };
 VECT_VAR_DECL(expected_q_float,uint,32,4) [] = { 0x0, 0x0, 0xffffffff, 0xffffffff };
 
@@ -35,6 +43,20 @@ VECT_VAR_DECL(expected_uint2,uint,32,2) [] = { 0xffffffff, 0xffffffff };
 VECT_VAR_DECL(expected_uint3,uint,32,2) [] = { 0x0, 0xffffffff };
 VECT_VAR_DECL(expected_uint4,uint,32,2) [] = { 0xffffffff, 0xffffffff };
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL (expected_nan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0  };
+VECT_VAR_DECL (expected_mnan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_nan2, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+
+VECT_VAR_DECL (expected_inf, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_minf, uint, 16, 4) [] = { 0xffff, 0xffff,
+						  0xffff, 0xffff };
+VECT_VAR_DECL (expected_inf2, uint, 16, 4) [] = { 0xffff, 0xffff,
+						  0xffff, 0xffff };
+VECT_VAR_DECL (expected_mzero, uint, 16, 4) [] = { 0xffff, 0xffff,
+						   0xffff, 0xffff };
+#endif
+
 VECT_VAR_DECL(expected_nan,uint,32,2) [] = { 0x0, 0x0 };
 VECT_VAR_DECL(expected_mnan,uint,32,2) [] = { 0x0, 0x0 };
 VECT_VAR_DECL(expected_nan2,uint,32,2) [] = { 0x0, 0x0 };
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgez_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgez_1.c
new file mode 100644
index 0000000..3ce74f2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgez_1.c
@@ -0,0 +1,30 @@
+/* This file tests an intrinsic that currently has only an f16 variant
+   and is only available when FP16 arithmetic instructions are supported.  */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+
+#define INSN_NAME vcgez
+#define TEST_MSG "VCGEZ/VCGEZQ"
+
+#include "cmp_zero_op.inc"
+
+/* Expected results.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL (expected_float, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0  };
+VECT_VAR_DECL (expected_q_float, uint, 16, 8) [] = { 0xffff, 0xffff,
+						     0xffff, 0xffff,
+						     0xffff, 0xffff,
+						     0xffff, 0xffff };
+#endif
+
+/* Extra FP tests with special values (NaN, ....).  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL (expected_nan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0  };
+VECT_VAR_DECL (expected_mnan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_inf, uint, 16, 4) [] = { 0xffff, 0xffff,
+						 0xffff, 0xffff };
+VECT_VAR_DECL (expected_minf, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_zero, uint, 16, 4) [] = { 0xffff, 0xffff,
+						  0xffff, 0xffff };
+VECT_VAR_DECL (expected_mzero, uint, 16, 4) [] = { 0xffff, 0xffff,
+						   0xffff, 0xffff };
+#endif
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgt.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgt.c
index c44819a..3976d57 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgt.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgt.c
@@ -28,6 +28,14 @@ VECT_VAR_DECL(expected_q_uint,uint,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
 						0x0, 0x0, 0x0, 0xffff };
 VECT_VAR_DECL(expected_q_uint,uint,32,4) [] = { 0x0, 0x0, 0x0, 0xffffffff };
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL (expected_float, uint, 16, 4) [] = { 0x0, 0x0, 0xffff, 0xffff };
+VECT_VAR_DECL (expected_q_float, uint, 16, 8) [] = { 0x0, 0x0,
+						     0x0, 0xffff,
+						     0xffff, 0xffff,
+						     0xffff, 0xffff };
+#endif
+
 VECT_VAR_DECL(expected_float,uint,32,2) [] = { 0x0, 0x0 };
 VECT_VAR_DECL(expected_q_float,uint,32,4) [] = { 0x0, 0x0, 0x0, 0xffffffff };
 
@@ -35,6 +43,19 @@ VECT_VAR_DECL(expected_uint2,uint,32,2) [] = { 0x0, 0xffffffff };
 VECT_VAR_DECL(expected_uint3,uint,32,2) [] = { 0x0, 0x0 };
 VECT_VAR_DECL(expected_uint4,uint,32,2) [] = { 0x0, 0xffffffff };
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL (expected_nan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0  };
+VECT_VAR_DECL (expected_mnan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_nan2, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+
+VECT_VAR_DECL (expected_inf, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_minf, uint, 16, 4) [] = { 0xffff, 0xffff,
+						  0xffff, 0xffff };
+VECT_VAR_DECL (expected_inf2, uint, 16, 4) [] = { 0xffff, 0xffff,
+						  0xffff, 0xffff };
+VECT_VAR_DECL (expected_mzero, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+#endif
+
 VECT_VAR_DECL(expected_nan,uint,32,2) [] = { 0x0, 0x0 };
 VECT_VAR_DECL(expected_mnan,uint,32,2) [] = { 0x0, 0x0 };
 VECT_VAR_DECL(expected_nan2,uint,32,2) [] = { 0x0, 0x0 };
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgtz_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgtz_1.c
new file mode 100644
index 0000000..a096dc7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgtz_1.c
@@ -0,0 +1,28 @@
+/* This file tests an intrinsic that currently has only an f16 variant
+   and is only available when FP16 arithmetic instructions are supported.  */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+
+#define INSN_NAME vcgtz
+#define TEST_MSG "VCGTZ/VCGTZQ"
+
+#include "cmp_zero_op.inc"
+
+/* Expected results.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL (expected_float, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0  };
+VECT_VAR_DECL (expected_q_float, uint, 16, 8) [] = { 0xffff, 0xffff,
+						     0xffff, 0xffff,
+						     0xffff, 0xffff,
+						     0xffff, 0xffff };
+#endif
+
+/* Extra FP tests with special values (NaN, ....).  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL (expected_nan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0  };
+VECT_VAR_DECL (expected_mnan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_inf, uint, 16, 4) [] = { 0xffff, 0xffff,
+						 0xffff, 0xffff };
+VECT_VAR_DECL (expected_minf, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_zero, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_mzero, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+#endif
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcle.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcle.c
index a59b543..49f89d8 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcle.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcle.c
@@ -31,6 +31,14 @@ VECT_VAR_DECL(expected_q_uint,uint,16,8) [] = { 0xffff, 0xffff, 0xffff, 0xffff,
 VECT_VAR_DECL(expected_q_uint,uint,32,4) [] = { 0xffffffff, 0xffffffff,
 						0xffffffff, 0x0 };
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL (expected_float, uint, 16, 4) [] = { 0xffff, 0xffff, 0x0, 0x0 };
+VECT_VAR_DECL (expected_q_float, uint, 16, 8) [] = { 0xffff, 0xffff,
+						     0xffff, 0x0,
+						     0x0, 0x0,
+						     0x0, 0x0 };
+#endif
+
 VECT_VAR_DECL(expected_float,uint,32,2) [] = { 0xffffffff, 0xffffffff };
 VECT_VAR_DECL(expected_q_float,uint,32,4) [] = { 0xffffffff, 0xffffffff,
 						 0xffffffff, 0x0 };
@@ -39,6 +47,20 @@ VECT_VAR_DECL(expected_uint2,uint,32,2) [] = { 0xffffffff, 0x0 };
 VECT_VAR_DECL(expected_uint3,uint,32,2) [] = { 0xffffffff, 0xffffffff };
 VECT_VAR_DECL(expected_uint4,uint,32,2) [] = { 0xffffffff, 0x0 };
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL (expected_nan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0  };
+VECT_VAR_DECL (expected_mnan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_nan2, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+
+VECT_VAR_DECL (expected_inf, uint, 16, 4) [] = { 0xffff, 0xffff,
+						 0xffff, 0xffff };
+VECT_VAR_DECL (expected_minf, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_inf2, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+
+VECT_VAR_DECL (expected_mzero, uint, 16, 4) [] = { 0xffff, 0xffff,
+						   0xffff, 0xffff };
+#endif
+
 VECT_VAR_DECL(expected_nan,uint,32,2) [] = { 0x0, 0x0 };
 VECT_VAR_DECL(expected_mnan,uint,32,2) [] = { 0x0, 0x0 };
 VECT_VAR_DECL(expected_nan2,uint,32,2) [] = { 0x0, 0x0 };
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclez_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclez_1.c
new file mode 100644
index 0000000..7e18e3d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclez_1.c
@@ -0,0 +1,29 @@
+/* This file tests an intrinsic that currently has only an f16 variant
+   and is only available when FP16 arithmetic instructions are supported.  */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+
+#define INSN_NAME vclez
+#define TEST_MSG "VCLEZ/VCLEZQ"
+
+#include "cmp_zero_op.inc"
+
+/* Expected results.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL (expected_float, uint, 16, 4) [] = { 0xffff, 0xffff,
+						   0xffff, 0xffff };
+VECT_VAR_DECL (expected_q_float, uint, 16, 8) [] = { 0x0, 0x0, 0x0, 0x0 };
+#endif
+
+/* Extra FP tests with special values (NaN, ....).  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL (expected_nan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0  };
+VECT_VAR_DECL (expected_mnan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_inf, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+
+VECT_VAR_DECL (expected_minf, uint, 16, 4) [] = { 0xffff, 0xffff,
+						  0xffff, 0xffff };
+VECT_VAR_DECL (expected_zero, uint, 16, 4) [] = { 0xffff, 0xffff,
+						  0xffff, 0xffff };
+VECT_VAR_DECL (expected_mzero, uint, 16, 4) [] = { 0xffff, 0xffff,
+						   0xffff, 0xffff };
+#endif
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclt.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclt.c
index 6ef2b4c..b6f8d87 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclt.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclt.c
@@ -30,6 +30,14 @@ VECT_VAR_DECL(expected_q_uint,uint,16,8) [] = { 0xffff, 0xffff, 0xffff, 0xffff,
 VECT_VAR_DECL(expected_q_uint,uint,32,4) [] = { 0xffffffff, 0xffffffff,
 						0x0, 0x0 };
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL (expected_float, uint, 16, 4) [] = { 0xffff, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_q_float, uint, 16, 8) [] = { 0xffff, 0xffff,
+						     0x0, 0x0,
+						     0x0, 0x0,
+						     0x0, 0x0 };
+#endif
+
 VECT_VAR_DECL(expected_float,uint,32,2) [] = { 0xffffffff, 0x0 };
 VECT_VAR_DECL(expected_q_float,uint,32,4) [] = { 0xffffffff, 0xffffffff,
 						 0x0, 0x0 };
@@ -38,6 +46,19 @@ VECT_VAR_DECL(expected_uint2,uint,32,2) [] = { 0x0, 0x0 };
 VECT_VAR_DECL(expected_uint3,uint,32,2) [] = { 0xffffffff, 0x0 };
 VECT_VAR_DECL(expected_uint4,uint,32,2) [] = { 0x0, 0x0 };
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL (expected_nan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0  };
+VECT_VAR_DECL (expected_mnan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_nan2, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+
+VECT_VAR_DECL (expected_inf, uint, 16, 4) [] = { 0xffff, 0xffff,
+						 0xffff, 0xffff };
+VECT_VAR_DECL (expected_minf, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_inf2, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+
+VECT_VAR_DECL (expected_mzero, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+#endif
+
 VECT_VAR_DECL(expected_nan,uint,32,2) [] = { 0x0, 0x0 };
 VECT_VAR_DECL(expected_mnan,uint,32,2) [] = { 0x0, 0x0 };
 VECT_VAR_DECL(expected_nan2,uint,32,2) [] = { 0x0, 0x0 };
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcltz_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcltz_1.c
new file mode 100644
index 0000000..9b75cc7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcltz_1.c
@@ -0,0 +1,27 @@
+/* This file tests an intrinsic that currently has only an f16 variant
+   and is only available when FP16 arithmetic instructions are supported.  */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+
+#define INSN_NAME vcltz
+#define TEST_MSG "VCLTZ/VCLTZQ"
+
+#include "cmp_zero_op.inc"
+
+/* Expected results.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL (expected_float, uint, 16, 4) [] = { 0xffff, 0xffff,
+						   0xffff, 0xffff };
+VECT_VAR_DECL (expected_q_float, uint, 16, 8) [] = { 0x0, 0x0, 0x0, 0x0 };
+#endif
+
+/* Extra FP tests with special values (NaN, ....).  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL (expected_nan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0  };
+VECT_VAR_DECL (expected_mnan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_inf, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+
+VECT_VAR_DECL (expected_minf, uint, 16, 4) [] = { 0xffff, 0xffff,
+						  0xffff, 0xffff };
+VECT_VAR_DECL (expected_zero, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL (expected_mzero, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
+#endif
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt.c
index 8e80f1e..b2b861a 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt.c
@@ -4,36 +4,99 @@
 #include <math.h>
 
 /* Expected results for vcvt.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_s, hfloat, 16, 4) [] =
+{ 0xcc00, 0xcb80, 0xcb00, 0xca80 };
+VECT_VAR_DECL(expected_u, hfloat, 16, 4) [] =
+{ 0x7c00, 0x7c00, 0x7c00, 0x7c00 };
+VECT_VAR_DECL(expected_s, hfloat, 16, 8) [] =
+{ 0xcc00, 0xcb80, 0xcb00, 0xca80,
+  0xca00, 0xc980, 0xc900, 0xc880 };
+VECT_VAR_DECL(expected_u, hfloat, 16, 8) [] =
+{ 0x7c00, 0x7c00, 0x7c00, 0x7c00,
+  0x7c00, 0x7c00, 0x7c00, 0x7c00 };
+#endif
 VECT_VAR_DECL(expected_s,hfloat,32,2) [] = { 0xc1800000, 0xc1700000 };
 VECT_VAR_DECL(expected_u,hfloat,32,2) [] = { 0x4f800000, 0x4f800000 };
 VECT_VAR_DECL(expected_s,hfloat,32,4) [] = { 0xc1800000, 0xc1700000,
-					   0xc1600000, 0xc1500000 };
+					     0xc1600000, 0xc1500000 };
 VECT_VAR_DECL(expected_u,hfloat,32,4) [] = { 0x4f800000, 0x4f800000,
-					   0x4f800000, 0x4f800000 };
+					     0x4f800000, 0x4f800000 };
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected, int, 16, 4) [] = { 0xfff1, 0x5, 0xfff1, 0x5 };
+VECT_VAR_DECL(expected, uint, 16, 4) [] = { 0x0, 0x5, 0x0, 0x5 };
+VECT_VAR_DECL(expected, int, 16, 8) [] = { 0x0, 0x0, 0xf, 0xfff1,
+					   0x0, 0x0, 0xf, 0xfff1 };
+VECT_VAR_DECL(expected, uint, 16, 8) [] = { 0x0, 0x0, 0xf, 0x0,
+					    0x0, 0x0, 0xf, 0x0 };
+#endif
 VECT_VAR_DECL(expected,int,32,2) [] = { 0xfffffff1, 0x5 };
 VECT_VAR_DECL(expected,uint,32,2) [] = { 0x0, 0x5 };
 VECT_VAR_DECL(expected,int,32,4) [] = { 0x0, 0x0, 0xf, 0xfffffff1 };
 VECT_VAR_DECL(expected,uint,32,4) [] = { 0x0, 0x0, 0xf, 0x0 };
 
 /* Expected results for vcvt_n.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_vcvt_n_s, hfloat, 16, 4) [] = { 0xc400, 0xc380,
+						       0xc300, 0xc280 };
+VECT_VAR_DECL(expected_vcvt_n_u, hfloat, 16, 4) [] = { 0x6000, 0x6000,
+						       0x6000, 0x6000 };
+VECT_VAR_DECL(expected_vcvt_n_s, hfloat, 16, 8) [] = { 0xb000, 0xaf80,
+						       0xaf00, 0xae80,
+						       0xae00, 0xad80,
+						       0xad00, 0xac80 };
+VECT_VAR_DECL(expected_vcvt_n_u, hfloat, 16, 8) [] = { 0x4c00, 0x4c00,
+						       0x4c00, 0x4c00,
+						       0x4c00, 0x4c00,
+						       0x4c00, 0x4c00 };
+#endif
 VECT_VAR_DECL(expected_vcvt_n_s,hfloat,32,2) [] = { 0xc0800000, 0xc0700000 };
 VECT_VAR_DECL(expected_vcvt_n_u,hfloat,32,2) [] = { 0x4c000000, 0x4c000000 };
 VECT_VAR_DECL(expected_vcvt_n_s,hfloat,32,4) [] = { 0xb2800000, 0xb2700000,
 						  0xb2600000, 0xb2500000 };
 VECT_VAR_DECL(expected_vcvt_n_u,hfloat,32,4) [] = { 0x49800000, 0x49800000,
 						  0x49800000, 0x49800000 };
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_vcvt_n, int, 16, 4) [] = { 0xffc3, 0x15,
+						  0xffc3, 0x15 };
+VECT_VAR_DECL(expected_vcvt_n, uint, 16, 4) [] = { 0x0, 0x2a6, 0x0, 0x2a6 };
+VECT_VAR_DECL(expected_vcvt_n, int, 16, 8) [] = { 0x0, 0x0, 0x78f, 0xf871,
+						  0x0, 0x0, 0x78f, 0xf871 };
+VECT_VAR_DECL(expected_vcvt_n, uint, 16, 8) [] = { 0x0, 0x0, 0xf1e0, 0x0,
+						   0x0, 0x0, 0xf1e0, 0x0 };
+#endif
 VECT_VAR_DECL(expected_vcvt_n,int,32,2) [] = { 0xff0b3333, 0x54cccd };
 VECT_VAR_DECL(expected_vcvt_n,uint,32,2) [] = { 0x0, 0x15 };
 VECT_VAR_DECL(expected_vcvt_n,int,32,4) [] = { 0x0, 0x0, 0x1e3d7, 0xfffe1c29 };
 VECT_VAR_DECL(expected_vcvt_n,uint,32,4) [] = { 0x0, 0x0, 0x1e, 0x0 };
 
 /* Expected results for vcvt with rounding.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_rounding, int, 16, 4) [] = { 0xa, 0xa, 0xa, 0xa };
+VECT_VAR_DECL(expected_rounding, uint, 16, 4) [] = { 0xa, 0xa, 0xa, 0xa };
+VECT_VAR_DECL(expected_rounding, int, 16, 8) [] = { 0x7d, 0x7d, 0x7d, 0x7d,
+						    0x7d, 0x7d, 0x7d, 0x7d };
+VECT_VAR_DECL(expected_rounding, uint, 16, 8) [] = { 0x7d, 0x7d, 0x7d, 0x7d,
+						     0x7d, 0x7d, 0x7d, 0x7d };
+#endif
 VECT_VAR_DECL(expected_rounding,int,32,2) [] = { 0xa, 0xa };
 VECT_VAR_DECL(expected_rounding,uint,32,2) [] = { 0xa, 0xa };
 VECT_VAR_DECL(expected_rounding,int,32,4) [] = { 0x7d, 0x7d, 0x7d, 0x7d };
 VECT_VAR_DECL(expected_rounding,uint,32,4) [] = { 0x7d, 0x7d, 0x7d, 0x7d };
 
 /* Expected results for vcvt_n with rounding.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_vcvt_n_rounding, int, 16, 4) [] =
+{ 0x533, 0x533, 0x533, 0x533 };
+VECT_VAR_DECL(expected_vcvt_n_rounding, uint, 16, 4) [] =
+{ 0x533, 0x533, 0x533, 0x533 };
+VECT_VAR_DECL(expected_vcvt_n_rounding, int, 16, 8) [] =
+{ 0x7fff, 0x7fff, 0x7fff, 0x7fff,
+  0x7fff, 0x7fff, 0x7fff, 0x7fff };
+VECT_VAR_DECL(expected_vcvt_n_rounding, uint, 16, 8) [] =
+{ 0xffff, 0xffff, 0xffff, 0xffff,
+  0xffff, 0xffff, 0xffff, 0xffff };
+#endif
 VECT_VAR_DECL(expected_vcvt_n_rounding,int,32,2) [] = { 0xa66666, 0xa66666 };
 VECT_VAR_DECL(expected_vcvt_n_rounding,uint,32,2) [] = { 0xa66666, 0xa66666 };
 VECT_VAR_DECL(expected_vcvt_n_rounding,int,32,4) [] = { 0xfbccc, 0xfbccc,
@@ -42,11 +105,17 @@ VECT_VAR_DECL(expected_vcvt_n_rounding,uint,32,4) [] = { 0xfbccc, 0xfbccc,
 						0xfbccc, 0xfbccc };
 
 /* Expected results for vcvt_n with saturation.  */
-VECT_VAR_DECL(expected_vcvt_n_saturation,int,32,2) [] = { 0x7fffffff,
-							  0x7fffffff };
-VECT_VAR_DECL(expected_vcvt_n_saturation,int,32,4) [] = { 0x7fffffff,
-							  0x7fffffff,
-					       0x7fffffff, 0x7fffffff };
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_vcvt_n_saturation, int, 16, 4) [] =
+{ 0x533, 0x533, 0x533, 0x533 };
+VECT_VAR_DECL(expected_vcvt_n_saturation, int, 16, 8) [] =
+{ 0x7fff, 0x7fff, 0x7fff, 0x7fff,
+  0x7fff, 0x7fff, 0x7fff, 0x7fff };
+#endif
+VECT_VAR_DECL(expected_vcvt_n_saturation,int,32,2) [] =
+{ 0x7fffffff, 0x7fffffff };
+VECT_VAR_DECL(expected_vcvt_n_saturation,int,32,4) [] =
+{ 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff };
 
 #define TEST_MSG "VCVT/VCVTQ"
 void exec_vcvt (void)
@@ -89,11 +158,26 @@ void exec_vcvt (void)
 
   /* Initialize input "vector" from "buffer".  */
   TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VLOAD(vector, buffer, , float, f, 16, 4);
+  VLOAD(vector, buffer, q, float, f, 16, 8);
+#endif
   VLOAD(vector, buffer, , float, f, 32, 2);
   VLOAD(vector, buffer, q, float, f, 32, 4);
 
   /* Make sure some elements have a fractional part, to exercise
      integer conversions.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VSET_LANE(vector, , float, f, 16, 4, 0, -15.3f);
+  VSET_LANE(vector, , float, f, 16, 4, 1, 5.3f);
+  VSET_LANE(vector, , float, f, 16, 4, 2, -15.3f);
+  VSET_LANE(vector, , float, f, 16, 4, 3, 5.3f);
+  VSET_LANE(vector, q, float, f, 16, 8, 4, -15.3f);
+  VSET_LANE(vector, q, float, f, 16, 8, 5, 5.3f);
+  VSET_LANE(vector, q, float, f, 16, 8, 6, -15.3f);
+  VSET_LANE(vector, q, float, f, 16, 8, 7, 5.3f);
+#endif
+
   VSET_LANE(vector, , float, f, 32, 2, 0, -15.3f);
   VSET_LANE(vector, , float, f, 32, 2, 1, 5.3f);
   VSET_LANE(vector, q, float, f, 32, 4, 2, -15.3f);
@@ -103,23 +187,55 @@ void exec_vcvt (void)
      before overwriting them.  */
 #define TEST_MSG2 ""
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  /* vcvt_f16_xx.  */
+  TEST_VCVT_FP(, float, f, 16, 4, int, s, expected_s);
+  TEST_VCVT_FP(, float, f, 16, 4, uint, u, expected_u);
+#endif
   /* vcvt_f32_xx.  */
   TEST_VCVT_FP(, float, f, 32, 2, int, s, expected_s);
   TEST_VCVT_FP(, float, f, 32, 2, uint, u, expected_u);
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  /* vcvtq_f16_xx.  */
+  TEST_VCVT_FP(q, float, f, 16, 8, int, s, expected_s);
+  TEST_VCVT_FP(q, float, f, 16, 8, uint, u, expected_u);
+#endif
   /* vcvtq_f32_xx.  */
   TEST_VCVT_FP(q, float, f, 32, 4, int, s, expected_s);
   TEST_VCVT_FP(q, float, f, 32, 4, uint, u, expected_u);
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  /* vcvt_xx_f16.  */
+  TEST_VCVT(, int, s, 16, 4, float, f, expected);
+  TEST_VCVT(, uint, u, 16, 4, float, f, expected);
+#endif
   /* vcvt_xx_f32.  */
   TEST_VCVT(, int, s, 32, 2, float, f, expected);
   TEST_VCVT(, uint, u, 32, 2, float, f, expected);
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VSET_LANE(vector, q, float, f, 16, 8, 0, 0.0f);
+  VSET_LANE(vector, q, float, f, 16, 8, 1, -0.0f);
+  VSET_LANE(vector, q, float, f, 16, 8, 2, 15.12f);
+  VSET_LANE(vector, q, float, f, 16, 8, 3, -15.12f);
+  VSET_LANE(vector, q, float, f, 16, 8, 4, 0.0f);
+  VSET_LANE(vector, q, float, f, 16, 8, 5, -0.0f);
+  VSET_LANE(vector, q, float, f, 16, 8, 6, 15.12f);
+  VSET_LANE(vector, q, float, f, 16, 8, 7, -15.12f);
+#endif
+
   VSET_LANE(vector, q, float, f, 32, 4, 0, 0.0f);
   VSET_LANE(vector, q, float, f, 32, 4, 1, -0.0f);
   VSET_LANE(vector, q, float, f, 32, 4, 2, 15.12f);
   VSET_LANE(vector, q, float, f, 32, 4, 3, -15.12f);
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  /* vcvtq_xx_f16.  */
+  TEST_VCVT(q, int, s, 16, 8, float, f, expected);
+  TEST_VCVT(q, uint, u, 16, 8, float, f, expected);
+#endif
+
   /* vcvtq_xx_f32.  */
   TEST_VCVT(q, int, s, 32, 4, float, f, expected);
   TEST_VCVT(q, uint, u, 32, 4, float, f, expected);
@@ -129,18 +245,38 @@ void exec_vcvt (void)
 #undef TEST_MSG
 #define TEST_MSG "VCVT_N/VCVTQ_N"
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  /* vcvt_n_f16_xx.  */
+  TEST_VCVT_N_FP(, float, f, 16, 4, int, s, 2, expected_vcvt_n_s);
+  TEST_VCVT_N_FP(, float, f, 16, 4, uint, u, 7, expected_vcvt_n_u);
+#endif
   /* vcvt_n_f32_xx.  */
   TEST_VCVT_N_FP(, float, f, 32, 2, int, s, 2, expected_vcvt_n_s);
   TEST_VCVT_N_FP(, float, f, 32, 2, uint, u, 7, expected_vcvt_n_u);
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  /* vcvtq_n_f16_xx.  */
+  TEST_VCVT_N_FP(q, float, f, 16, 8, int, s, 7, expected_vcvt_n_s);
+  TEST_VCVT_N_FP(q, float, f, 16, 8, uint, u, 12, expected_vcvt_n_u);
+#endif
   /* vcvtq_n_f32_xx.  */
   TEST_VCVT_N_FP(q, float, f, 32, 4, int, s, 30, expected_vcvt_n_s);
   TEST_VCVT_N_FP(q, float, f, 32, 4, uint, u, 12, expected_vcvt_n_u);
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  /* vcvt_n_xx_f16.  */
+  TEST_VCVT_N(, int, s, 16, 4, float, f, 2, expected_vcvt_n);
+  TEST_VCVT_N(, uint, u, 16, 4, float, f, 7, expected_vcvt_n);
+#endif
   /* vcvt_n_xx_f32.  */
   TEST_VCVT_N(, int, s, 32, 2, float, f, 20, expected_vcvt_n);
   TEST_VCVT_N(, uint, u, 32, 2, float, f, 2, expected_vcvt_n);
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  /* vcvtq_n_xx_f16.  */
+  TEST_VCVT_N(q, int, s, 16, 8, float, f, 7, expected_vcvt_n);
+  TEST_VCVT_N(q, uint, u, 16, 8, float, f, 12, expected_vcvt_n);
+#endif
   /* vcvtq_n_xx_f32.  */
   TEST_VCVT_N(q, int, s, 32, 4, float, f, 13, expected_vcvt_n);
   TEST_VCVT_N(q, uint, u, 32, 4, float, f, 1, expected_vcvt_n);
@@ -150,20 +286,49 @@ void exec_vcvt (void)
 #define TEST_MSG "VCVT/VCVTQ"
 #undef TEST_MSG2
 #define TEST_MSG2 "(check rounding)"
+
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector, , float, f, 16, 4, 10.4f);
+  VDUP(vector, q, float, f, 16, 8, 125.9f);
+#endif
   VDUP(vector, , float, f, 32, 2, 10.4f);
   VDUP(vector, q, float, f, 32, 4, 125.9f);
+
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  /* vcvt_xx_f16.  */
+  TEST_VCVT(, int, s, 16, 4, float, f, expected_rounding);
+  TEST_VCVT(, uint, u, 16, 4, float, f, expected_rounding);
+#endif
   /* vcvt_xx_f32.  */
   TEST_VCVT(, int, s, 32, 2, float, f, expected_rounding);
   TEST_VCVT(, uint, u, 32, 2, float, f, expected_rounding);
+
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  /* vcvtq_xx_f16.  */
+  TEST_VCVT(q, int, s, 16, 8, float, f, expected_rounding);
+  TEST_VCVT(q, uint, u, 16, 8, float, f, expected_rounding);
+#endif
   /* vcvtq_xx_f32.  */
   TEST_VCVT(q, int, s, 32, 4, float, f, expected_rounding);
   TEST_VCVT(q, uint, u, 32, 4, float, f, expected_rounding);
 
 #undef TEST_MSG
 #define TEST_MSG "VCVT_N/VCVTQ_N"
+
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  /* vcvt_n_xx_f16.  */
+  TEST_VCVT_N(, int, s, 16, 4, float, f, 7, expected_vcvt_n_rounding);
+  TEST_VCVT_N(, uint, u, 16, 4, float, f, 7, expected_vcvt_n_rounding);
+#endif
   /* vcvt_n_xx_f32.  */
   TEST_VCVT_N(, int, s, 32, 2, float, f, 20, expected_vcvt_n_rounding);
   TEST_VCVT_N(, uint, u, 32, 2, float, f, 20, expected_vcvt_n_rounding);
+
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  /* vcvtq_n_xx_f16.  */
+  TEST_VCVT_N(q, int, s, 16, 8, float, f, 13, expected_vcvt_n_rounding);
+  TEST_VCVT_N(q, uint, u, 16, 8, float, f, 13, expected_vcvt_n_rounding);
+#endif
   /* vcvtq_n_xx_f32.  */
   TEST_VCVT_N(q, int, s, 32, 4, float, f, 13, expected_vcvt_n_rounding);
   TEST_VCVT_N(q, uint, u, 32, 4, float, f, 13, expected_vcvt_n_rounding);
@@ -172,8 +337,18 @@ void exec_vcvt (void)
 #define TEST_MSG "VCVT_N/VCVTQ_N"
 #undef TEST_MSG2
 #define TEST_MSG2 "(check saturation)"
+
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  /* vcvt_n_xx_f16.  */
+  TEST_VCVT_N(, int, s, 16, 4, float, f, 7, expected_vcvt_n_saturation);
+#endif
   /* vcvt_n_xx_f32.  */
   TEST_VCVT_N(, int, s, 32, 2, float, f, 31, expected_vcvt_n_saturation);
+
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  /* vcvtq_n_xx_f16.  */
+  TEST_VCVT_N(q, int, s, 16, 8, float, f, 13, expected_vcvt_n_saturation);
+#endif
   /* vcvtq_n_xx_f32.  */
   TEST_VCVT_N(q, int, s, 32, 4, float, f, 31, expected_vcvt_n_saturation);
 }
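
Note: the vcvt_n expectations can be reproduced by hand: the intrinsic
scales the input by 2^N for N fractional bits, truncates toward zero,
and saturates when the result does not fit the destination type.  In the
4x16 case above, lane 0 holds -15.3 and N is 2, giving
trunc (-15.3 * 4) = -61 = 0xffc3; in the saturation test, 125.9 scaled
by 2^13 is about 1031373, which overflows int16_t and saturates to
0x7fff.  A scalar sketch of the first lane (ignoring the intermediate
binary16 rounding, which does not change the result here):

  #include <math.h>
  #include <stdio.h>

  int
  main (void)
  {
    int lane = (int) truncf (-15.3f * 4.0f);     /* -61 */
    printf ("0x%hx\n", (unsigned short) lane);   /* prints 0xffc3 */
    return 0;
  }
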
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtX.inc b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtX.inc
new file mode 100644
index 0000000..e0a479f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtX.inc
@@ -0,0 +1,113 @@
+/* Template file for VCVT operator validation.
+
+   This file is meant to be included by the relevant test files, which
+   have to define the intrinsic family to test.  If a given intrinsic
+   supports variants which are not supported by all the other vcvt
+   operators, these can be tested by providing a definition for
+   EXTRA_TESTS.
+
+   This file is only used for the VCVT? tests, which currently have only
+   f16-to-integer variants.  It is based on vcvt.c.  */
+
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1 (NAME)
+
+void FNNAME (INSN_NAME) (void)
+{
+  /* Basic test: y=vcvt(x), then store the result.  */
+#define TEST_VCVT1(INSN, Q, T1, T2, W, N, TS1, TS2, EXP)	\
+  VECT_VAR(vector_res, T1, W, N) =				\
+    INSN##Q##_##T2##W##_##TS2##W(VECT_VAR(vector, TS1, W, N));	\
+  vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N),			\
+		    VECT_VAR(vector_res, T1, W, N));		\
+  CHECK(TEST_MSG, T1, W, N, PRIx##W, EXP, TEST_MSG2);
+
+#define TEST_VCVT(INSN, Q, T1, T2, W, N, TS1, TS2, EXP)		\
+  TEST_VCVT1 (INSN, Q, T1, T2, W, N, TS1, TS2, EXP)
+
+  DECL_VARIABLE_ALL_VARIANTS(vector);
+  DECL_VARIABLE_ALL_VARIANTS(vector_res);
+
+  clean_results ();
+
+  /* Initialize input "vector" from "buffer".  */
+  TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VLOAD(vector, buffer, , float, f, 16, 4);
+  VLOAD(vector, buffer, q, float, f, 16, 8);
+#endif
+
+  /* Make sure some elements have a fractional part, to exercise
+     integer conversions.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VSET_LANE(vector, , float, f, 16, 4, 0, -15.3f);
+  VSET_LANE(vector, , float, f, 16, 4, 1, 5.3f);
+  VSET_LANE(vector, , float, f, 16, 4, 2, -15.3f);
+  VSET_LANE(vector, , float, f, 16, 4, 3, 5.3f);
+  VSET_LANE(vector, q, float, f, 16, 8, 4, -15.3f);
+  VSET_LANE(vector, q, float, f, 16, 8, 5, 5.3f);
+  VSET_LANE(vector, q, float, f, 16, 8, 6, -15.3f);
+  VSET_LANE(vector, q, float, f, 16, 8, 7, 5.3f);
+#endif
+
+  /* The same result buffers are used multiple times, so we check them
+     before overwriting them.  */
+#define TEST_MSG2 ""
+
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  /* vcvt?_xx_f16.  */
+  TEST_VCVT(INSN_NAME, , int, s, 16, 4, float, f, expected);
+  TEST_VCVT(INSN_NAME, , uint, u, 16, 4, float, f, expected);
+#endif
+
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VSET_LANE(vector, q, float, f, 16, 8, 0, 0.0f);
+  VSET_LANE(vector, q, float, f, 16, 8, 1, -0.0f);
+  VSET_LANE(vector, q, float, f, 16, 8, 2, 15.12f);
+  VSET_LANE(vector, q, float, f, 16, 8, 3, -15.12f);
+  VSET_LANE(vector, q, float, f, 16, 8, 4, 0.0f);
+  VSET_LANE(vector, q, float, f, 16, 8, 5, -0.0f);
+  VSET_LANE(vector, q, float, f, 16, 8, 6, 15.12f);
+  VSET_LANE(vector, q, float, f, 16, 8, 7, -15.12f);
+#endif
+
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  /* vcvt?q_xx_f16.  */
+  TEST_VCVT(INSN_NAME, q, int, s, 16, 8, float, f, expected);
+  TEST_VCVT(INSN_NAME, q, uint, u, 16, 8, float, f, expected);
+#endif
+
+  /* Check rounding.  */
+#undef TEST_MSG2
+#define TEST_MSG2 "(check rounding)"
+
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector, , float, f, 16, 4, 10.4f);
+  VDUP(vector, q, float, f, 16, 8, 125.9f);
+#endif
+
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  /* vcvt?_xx_f16.  */
+  TEST_VCVT(INSN_NAME, , int, s, 16, 4, float, f, expected_rounding);
+  TEST_VCVT(INSN_NAME, , uint, u, 16, 4, float, f, expected_rounding);
+#endif
+
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  /* vcvt?q_xx_f16.  */
+  TEST_VCVT(INSN_NAME, q, int, s, 16, 8, float, f, expected_rounding);
+  TEST_VCVT(INSN_NAME, q, uint, u, 16, 8, float, f, expected_rounding);
+#endif
+
+#ifdef EXTRA_TESTS
+  EXTRA_TESTS();
+#endif
+}
+
+int
+main (void)
+{
+  FNNAME (INSN_NAME) ();
+  return 0;
+}
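
Note: unlike cmp_zero_op.inc, TEST_VCVT1 here folds the CHECK into the
macro itself, so each instance converts, stores and verifies in one
step.  Roughly what one instance reduces to when the including file
defines INSN_NAME as vcvta, again with a local array standing in for the
harness buffers (illustration only, minus the CHECK):

  #include <arm_neon.h>

  /* TEST_VCVT (vcvta, , int, s, 16, 4, float, f, expected), expanded
     under the usual VECT_VAR name mangling.  */
  void
  sketch_vcvta (float16x4_t vector_float16x4)
  {
    int16x4_t vector_res_int16x4 = vcvta_s16_f16 (vector_float16x4);
    int16_t result_int16x4[4];
    vst1_s16 (result_int16x4, vector_res_int16x4);
  }
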
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvta_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvta_1.c
new file mode 100644
index 0000000..c467f05
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvta_1.c
@@ -0,0 +1,33 @@
+/* This file tests an intrinsic that currently has only an f16 variant
+   and is only available when FP16 arithmetic instructions are supported.  */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+#include <math.h>
+
+/* Expected results.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected, int, 16, 4) [] = { 0xfff1, 0x5, 0xfff1, 0x5 };
+VECT_VAR_DECL(expected, uint, 16, 4) [] = { 0x0, 0x5, 0x0, 0x5 };
+VECT_VAR_DECL(expected, int, 16, 8) [] = { 0x0, 0x0, 0xf, 0xfff1,
+					   0x0, 0x0, 0xf, 0xfff1 };
+VECT_VAR_DECL(expected, uint, 16, 8) [] = { 0x0, 0x0, 0xf, 0x0,
+					    0x0, 0x0, 0xf, 0x0 };
+#endif
+
+/* Expected results with rounding.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_rounding, int, 16, 4) [] = { 0xa, 0xa, 0xa, 0xa };
+VECT_VAR_DECL(expected_rounding, uint, 16, 4) [] = { 0xa, 0xa, 0xa, 0xa };
+VECT_VAR_DECL(expected_rounding, int, 16, 8) [] = { 0x7e, 0x7e, 0x7e, 0x7e,
+						    0x7e, 0x7e, 0x7e, 0x7e };
+VECT_VAR_DECL(expected_rounding, uint, 16, 8) [] = { 0x7e, 0x7e, 0x7e, 0x7e,
+						     0x7e, 0x7e, 0x7e, 0x7e };
+#endif
+
+#define TEST_MSG "VCVTA/VCVTAQ"
+#define INSN_NAME vcvta
+
+#include "vcvtX.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtm_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtm_1.c
new file mode 100644
index 0000000..1c22772
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtm_1.c
@@ -0,0 +1,33 @@
+/* This file tests an intrinsic that currently has only an f16 variant
+   and is only available when FP16 arithmetic instructions are supported.  */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+#include <math.h>
+
+/* Expected results.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected, int, 16, 4) [] = { 0xfff0, 0x5, 0xfff0, 0x5 };
+VECT_VAR_DECL(expected, uint, 16, 4) [] = { 0x0, 0x5, 0x0, 0x5 };
+VECT_VAR_DECL(expected, int, 16, 8) [] = { 0x0, 0x0, 0xf, 0xfff0, 0x0,
+					   0x0, 0xf, 0xfff0 };
+VECT_VAR_DECL(expected, uint, 16, 8) [] = { 0x0, 0x0, 0xf, 0x0,
+					    0x0, 0x0, 0xf, 0x0 };
+#endif
+
+/* Expected results with rounding.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_rounding, int, 16, 4) [] = { 0xa, 0xa, 0xa, 0xa };
+VECT_VAR_DECL(expected_rounding, uint, 16, 4) [] = { 0xa, 0xa, 0xa, 0xa };
+VECT_VAR_DECL(expected_rounding, int, 16, 8) [] = { 0x7d, 0x7d, 0x7d, 0x7d,
+						    0x7d, 0x7d, 0x7d, 0x7d };
+VECT_VAR_DECL(expected_rounding, uint, 16, 8) [] = { 0x7d, 0x7d, 0x7d, 0x7d,
+						     0x7d, 0x7d, 0x7d, 0x7d };
+#endif
+
+#define TEST_MSG "VCVTM/VCVTMQ"
+#define INSN_NAME vcvtm
+
+#include "vcvtX.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtp_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtp_1.c
new file mode 100644
index 0000000..7057909
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtp_1.c
@@ -0,0 +1,33 @@
+/* This file tests an intrinsic that currently has only an f16 variant
+   and is only available when FP16 arithmetic instructions are supported.  */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+#include <math.h>
+
+/* Expected results.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected, int, 16, 4) [] = { 0xfff1, 0x6, 0xfff1, 0x6 };
+VECT_VAR_DECL(expected, uint, 16, 4) [] = { 0x0, 0x6, 0x0, 0x6 };
+VECT_VAR_DECL(expected, int, 16, 8) [] = { 0x0, 0x0, 0x10, 0xfff1,
+					   0x0, 0x0, 0x10, 0xfff1 };
+VECT_VAR_DECL(expected, uint, 16, 8) [] = { 0x0, 0x0, 0x10, 0x0,
+					    0x0, 0x0, 0x10, 0x0 };
+#endif
+
+/* Expected results with rounding.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_rounding, int, 16, 4) [] = { 0xb, 0xb, 0xb, 0xb };
+VECT_VAR_DECL(expected_rounding, uint, 16, 4) [] = { 0xb, 0xb, 0xb, 0xb };
+VECT_VAR_DECL(expected_rounding, int, 16, 8) [] = { 0x7e, 0x7e, 0x7e, 0x7e,
+						    0x7e, 0x7e, 0x7e, 0x7e };
+VECT_VAR_DECL(expected_rounding, uint, 16, 8) [] = { 0x7e, 0x7e, 0x7e, 0x7e,
+						     0x7e, 0x7e, 0x7e, 0x7e };
+#endif
+
+#define TEST_MSG "VCVTP/VCVTPQ"
+#define INSN_NAME vcvtp
+
+#include "vcvtX.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfma.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfma.c
index 8180108..2cf68fe 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfma.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfma.c
@@ -3,11 +3,19 @@
 #include "compute-ref-data.h"
 
 #ifdef __ARM_FEATURE_FMA
+
 /* Expected results.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0x61c6, 0x61c8, 0x61ca, 0x61cc };
+VECT_VAR_DECL(expected, hfloat, 16, 8) [] = { 0x6435, 0x6436, 0x6437, 0x6438,
+					      0x6439, 0x643a, 0x643b, 0x643c };
+#endif
 VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x4438ca3d, 0x44390a3d };
-VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x44869eb8, 0x4486beb8, 0x4486deb8, 0x4486feb8 };
+VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x44869eb8, 0x4486beb8,
+					   0x4486deb8, 0x4486feb8 };
 #ifdef __aarch64__
-VECT_VAR_DECL(expected,hfloat,64,2) [] = { 0x408906e1532b8520, 0x40890ee1532b8520 };
+VECT_VAR_DECL(expected,hfloat,64,2) [] = { 0x408906e1532b8520,
+					   0x40890ee1532b8520 };
 #endif
 
 #define TEST_MSG "VFMA/VFMAQ"
@@ -44,6 +52,18 @@ void exec_vfma (void)
   DECL_VARIABLE(VAR, float, 32, 4);
 #endif
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE(vector1, float, 16, 4);
+  DECL_VARIABLE(vector2, float, 16, 4);
+  DECL_VARIABLE(vector3, float, 16, 4);
+  DECL_VARIABLE(vector_res, float, 16, 4);
+
+  DECL_VARIABLE(vector1, float, 16, 8);
+  DECL_VARIABLE(vector2, float, 16, 8);
+  DECL_VARIABLE(vector3, float, 16, 8);
+  DECL_VARIABLE(vector_res, float, 16, 8);
+#endif
+
   DECL_VFMA_VAR(vector1);
   DECL_VFMA_VAR(vector2);
   DECL_VFMA_VAR(vector3);
@@ -52,6 +72,10 @@ void exec_vfma (void)
   clean_results ();
 
   /* Initialize input "vector1" from "buffer".  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VLOAD(vector1, buffer, , float, f, 16, 4);
+  VLOAD(vector1, buffer, q, float, f, 16, 8);
+#endif
   VLOAD(vector1, buffer, , float, f, 32, 2);
   VLOAD(vector1, buffer, q, float, f, 32, 4);
 #ifdef __aarch64__
@@ -59,13 +83,21 @@ void exec_vfma (void)
 #endif
 
   /* Choose init value arbitrarily.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector2, , float, f, 16, 4, 9.3f);
+  VDUP(vector2, q, float, f, 16, 8, 29.7f);
+#endif
   VDUP(vector2, , float, f, 32, 2, 9.3f);
   VDUP(vector2, q, float, f, 32, 4, 29.7f);
 #ifdef __aarch64__
   VDUP(vector2, q, float, f, 64, 2, 15.8f);
 #endif
-  
+
   /* Choose init value arbitrarily.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector3, , float, f, 16, 4, 81.2f);
+  VDUP(vector3, q, float, f, 16, 8, 36.8f);
+#endif
   VDUP(vector3, , float, f, 32, 2, 81.2f);
   VDUP(vector3, q, float, f, 32, 4, 36.8f);
 #ifdef __aarch64__
@@ -73,12 +105,20 @@ void exec_vfma (void)
 #endif
 
   /* Execute the tests.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VFMA(, float, f, 16, 4);
+  TEST_VFMA(q, float, f, 16, 8);
+#endif
   TEST_VFMA(, float, f, 32, 2);
   TEST_VFMA(q, float, f, 32, 4);
 #ifdef __aarch64__
   TEST_VFMA(q, float, f, 64, 2);
 #endif
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected, "");
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected, "");
+#endif
   CHECK_VFMA_RESULTS (TEST_MSG, "");
 }
 #endif
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfms.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfms.c
index 02bef09..555654d 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfms.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfms.c
@@ -4,10 +4,17 @@
 
 #ifdef __ARM_FEATURE_FMA
 /* Expected results.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0xe206, 0xe204, 0xe202, 0xe200 };
+VECT_VAR_DECL(expected, hfloat, 16, 8) [] = { 0xe455, 0xe454, 0xe453, 0xe452,
+					      0xe451, 0xe450, 0xe44f, 0xe44e };
+#endif
 VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc440ca3d, 0xc4408a3d };
-VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc48a9eb8, 0xc48a7eb8, 0xc48a5eb8, 0xc48a3eb8 };
+VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc48a9eb8, 0xc48a7eb8,
+					   0xc48a5eb8, 0xc48a3eb8 };
 #ifdef __aarch64__
-VECT_VAR_DECL(expected,hfloat,64,2) [] = { 0xc08a06e1532b8520, 0xc089fee1532b8520 };
+VECT_VAR_DECL(expected,hfloat,64,2) [] = { 0xc08a06e1532b8520,
+					   0xc089fee1532b8520 };
 #endif
 
 #define TEST_MSG "VFMS/VFMSQ"
@@ -44,6 +51,18 @@ void exec_vfms (void)
   DECL_VARIABLE(VAR, float, 32, 4);
 #endif
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE(vector1, float, 16, 4);
+  DECL_VARIABLE(vector2, float, 16, 4);
+  DECL_VARIABLE(vector3, float, 16, 4);
+  DECL_VARIABLE(vector_res, float, 16, 4);
+
+  DECL_VARIABLE(vector1, float, 16, 8);
+  DECL_VARIABLE(vector2, float, 16, 8);
+  DECL_VARIABLE(vector3, float, 16, 8);
+  DECL_VARIABLE(vector_res, float, 16, 8);
+#endif
+
   DECL_VFMS_VAR(vector1);
   DECL_VFMS_VAR(vector2);
   DECL_VFMS_VAR(vector3);
@@ -52,6 +71,10 @@ void exec_vfms (void)
   clean_results ();
 
   /* Initialize input "vector1" from "buffer".  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VLOAD(vector1, buffer, , float, f, 16, 4);
+  VLOAD(vector1, buffer, q, float, f, 16, 8);
+#endif
   VLOAD(vector1, buffer, , float, f, 32, 2);
   VLOAD(vector1, buffer, q, float, f, 32, 4);
 #ifdef __aarch64__
@@ -59,13 +82,21 @@ void exec_vfms (void)
 #endif
 
   /* Choose init value arbitrarily.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector2, , float, f, 16, 4, 9.3f);
+  VDUP(vector2, q, float, f, 16, 8, 29.7f);
+#endif
   VDUP(vector2, , float, f, 32, 2, 9.3f);
   VDUP(vector2, q, float, f, 32, 4, 29.7f);
 #ifdef __aarch64__
   VDUP(vector2, q, float, f, 64, 2, 15.8f);
 #endif
-  
+
   /* Choose init value arbitrarily.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector3, , float, f, 16, 4, 81.2f);
+  VDUP(vector3, q, float, f, 16, 8, 36.8f);
+#endif
   VDUP(vector3, , float, f, 32, 2, 81.2f);
   VDUP(vector3, q, float, f, 32, 4, 36.8f);
 #ifdef __aarch64__
@@ -73,12 +104,20 @@ void exec_vfms (void)
 #endif
 
   /* Execute the tests.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VFMS(, float, f, 16, 4);
+  TEST_VFMS(q, float, f, 16, 8);
+#endif
   TEST_VFMS(, float, f, 32, 2);
   TEST_VFMS(q, float, f, 32, 4);
 #ifdef __aarch64__
   TEST_VFMS(q, float, f, 64, 2);
 #endif
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected, "");
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected, "");
+#endif
   CHECK_VFMS_RESULTS (TEST_MSG, "");
 }
 #endif
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmax.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmax.c
index 830603d..80f8bec 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmax.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmax.c
@@ -7,6 +7,10 @@
 
 #define HAS_FLOAT_VARIANT
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+#define HAS_FLOAT16_VARIANT
+#endif
+
 /* Expected results.  */
 VECT_VAR_DECL(expected,int,8,8) [] = { 0xf3, 0xf3, 0xf3, 0xf3,
 				       0xf4, 0xf5, 0xf6, 0xf7 };
@@ -16,6 +20,9 @@ VECT_VAR_DECL(expected,uint,8,8) [] = { 0xf3, 0xf3, 0xf3, 0xf3,
 					0xf4, 0xf5, 0xf6, 0xf7 };
 VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfff1, 0xfff1, 0xfff2, 0xfff3 };
 VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfffffff0, 0xfffffff1 };
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0xcbc0, 0xcb80, 0xcb00, 0xca80 };
+#endif
 VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1780000, 0xc1700000 };
 VECT_VAR_DECL(expected,int,8,16) [] = { 0xf4, 0xf4, 0xf4, 0xf4,
 					0xf4, 0xf5, 0xf6, 0xf7,
@@ -33,10 +40,36 @@ VECT_VAR_DECL(expected,uint,16,8) [] = { 0xfff2, 0xfff2, 0xfff2, 0xfff3,
 					 0xfff4, 0xfff5, 0xfff6, 0xfff7 };
 VECT_VAR_DECL(expected,uint,32,4) [] = { 0xfffffff1, 0xfffffff1,
 					 0xfffffff2, 0xfffffff3 };
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected, hfloat, 16, 8) [] = { 0xcb40, 0xcb40, 0xcb00, 0xca80,
+					      0xca00, 0xc980, 0xc900, 0xc880 };
+#endif
 VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc1680000, 0xc1680000,
 					   0xc1600000, 0xc1500000 };
 
 /* Expected results with special FP values.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_nan, hfloat, 16, 8) [] = { 0x7e00, 0x7e00,
+						  0x7e00, 0x7e00,
+						  0x7e00, 0x7e00,
+						  0x7e00, 0x7e00 };
+VECT_VAR_DECL(expected_mnan, hfloat, 16, 8) [] = { 0x7e00, 0x7e00,
+						   0x7e00, 0x7e00,
+						   0x7e00, 0x7e00,
+						   0x7e00, 0x7e00 };
+VECT_VAR_DECL(expected_inf, hfloat, 16, 8) [] = { 0x7c00, 0x7c00,
+						  0x7c00, 0x7c00,
+						  0x7c00, 0x7c00,
+						  0x7c00, 0x7c00 };
+VECT_VAR_DECL(expected_minf, hfloat, 16, 8) [] = { 0x3c00, 0x3c00,
+						   0x3c00, 0x3c00,
+						   0x3c00, 0x3c00,
+						   0x3c00, 0x3c00 };
+VECT_VAR_DECL(expected_zero1, hfloat, 16, 8) [] = { 0x0, 0x0, 0x0, 0x0,
+						    0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL(expected_zero2, hfloat, 16, 8) [] = { 0x0, 0x0, 0x0, 0x0,
+						    0x0, 0x0, 0x0, 0x0 };
+#endif
 VECT_VAR_DECL(expected_nan,hfloat,32,4) [] = { 0x7fc00000, 0x7fc00000,
 					       0x7fc00000, 0x7fc00000 };
 VECT_VAR_DECL(expected_mnan,hfloat,32,4) [] = { 0x7fc00000, 0x7fc00000,
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxnm_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxnm_1.c
new file mode 100644
index 0000000..e546bd5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxnm_1.c
@@ -0,0 +1,47 @@
+/* This file tests an intrinsic which currently has only an f16 variant and
+   is only available when FP16 arithmetic instructions are supported.  */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define INSN_NAME vmaxnm
+#define TEST_MSG "VMAXNM/VMAXNMQ"
+
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+#define HAS_FLOAT16_VARIANT
+#endif
+
+/* Expected results.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0xcbc0, 0xcb80, 0xcb00, 0xca80 };
+VECT_VAR_DECL(expected, hfloat, 16, 8) [] = { 0xcb40, 0xcb40, 0xcb00, 0xca80,
+					      0xca00, 0xc980, 0xc900, 0xc880 };
+#endif
+
+/* Expected results with special FP values.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_nan, hfloat, 16, 8) [] = { 0x3c00, 0x3c00,
+						  0x3c00, 0x3c00,
+						  0x3c00, 0x3c00,
+						  0x3c00, 0x3c00 };
+VECT_VAR_DECL(expected_mnan, hfloat, 16, 8) [] = { 0x3c00, 0x3c00,
+						   0x3c00, 0x3c00,
+						   0x3c00, 0x3c00,
+						   0x3c00, 0x3c00 };
+VECT_VAR_DECL(expected_inf, hfloat, 16, 8) [] = { 0x7c00, 0x7c00,
+						  0x7c00, 0x7c00,
+						  0x7c00, 0x7c00,
+						  0x7c00, 0x7c00 };
+VECT_VAR_DECL(expected_minf, hfloat, 16, 8) [] = { 0x3c00, 0x3c00,
+						   0x3c00, 0x3c00,
+						   0x3c00, 0x3c00,
+						   0x3c00, 0x3c00 };
+VECT_VAR_DECL(expected_zero1, hfloat, 16, 8) [] = { 0x0, 0x0, 0x0, 0x0,
+						    0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL(expected_zero2, hfloat, 16, 8) [] = { 0x0, 0x0, 0x0, 0x0,
+						    0x0, 0x0, 0x0, 0x0 };
+#endif
+
+#include "binary_op_float.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmin.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmin.c
index 8ad2703..4ee3c1e 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmin.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmin.c
@@ -7,6 +7,10 @@
 
 #define HAS_FLOAT_VARIANT
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+#define HAS_FLOAT16_VARIANT
+#endif
+
 /* Expected results.  */
 VECT_VAR_DECL(expected,int,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
 				       0xf3, 0xf3, 0xf3, 0xf3 };
@@ -16,6 +20,9 @@ VECT_VAR_DECL(expected,uint,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
 					0xf3, 0xf3, 0xf3, 0xf3 };
 VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfff0, 0xfff1, 0xfff1, 0xfff1 };
 VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfffffff0, 0xfffffff0 };
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0xcc00, 0xcbc0, 0xcbc0, 0xcbc0 };
+#endif
 VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1800000, 0xc1780000 };
 VECT_VAR_DECL(expected,int,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
 					0xf4, 0xf4, 0xf4, 0xf4,
@@ -31,11 +38,41 @@ VECT_VAR_DECL(expected,uint,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
 					 0xf9, 0xf9, 0xf9, 0xf9 };
 VECT_VAR_DECL(expected,uint,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff2,
 					 0xfff2, 0xfff2, 0xfff2, 0xfff2 };
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected, hfloat, 16, 8) [] = { 0xcc00, 0xcb80, 0xcb40, 0xcb40,
+					      0xcb40, 0xcb40, 0xcb40, 0xcb40 };
+#endif
 VECT_VAR_DECL(expected,uint,32,4) [] = { 0xfffffff0, 0xfffffff1,
 					 0xfffffff1, 0xfffffff1 };
 VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc1800000, 0xc1700000,
 					   0xc1680000, 0xc1680000 };
 /* Expected results with special FP values.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_nan, hfloat, 16, 8) [] = { 0x7e00, 0x7e00,
+						  0x7e00, 0x7e00,
+						  0x7e00, 0x7e00,
+						  0x7e00, 0x7e00 };
+VECT_VAR_DECL(expected_mnan, hfloat, 16, 8) [] = { 0x7e00, 0x7e00,
+						   0x7e00, 0x7e00,
+						   0x7e00, 0x7e00,
+						   0x7e00, 0x7e00 };
+VECT_VAR_DECL(expected_inf, hfloat, 16, 8) [] = { 0x3c00, 0x3c00,
+						  0x3c00, 0x3c00,
+						  0x3c00, 0x3c00,
+						  0x3c00, 0x3c00 };
+VECT_VAR_DECL(expected_minf, hfloat, 16, 8) [] = { 0xfc00, 0xfc00,
+						   0xfc00, 0xfc00,
+						   0xfc00, 0xfc00,
+						   0xfc00, 0xfc00 };
+VECT_VAR_DECL(expected_zero1, hfloat, 16, 8) [] = { 0x8000, 0x8000,
+						    0x8000, 0x8000,
+						    0x8000, 0x8000,
+						    0x8000, 0x8000 };
+VECT_VAR_DECL(expected_zero2, hfloat, 16, 8) [] = { 0x8000, 0x8000,
+						    0x8000, 0x8000,
+						    0x8000, 0x8000,
+						    0x8000, 0x8000 };
+#endif
 VECT_VAR_DECL(expected_nan,hfloat,32,4) [] = { 0x7fc00000, 0x7fc00000,
 					       0x7fc00000, 0x7fc00000 };
 VECT_VAR_DECL(expected_mnan,hfloat,32,4) [] = { 0x7fc00000, 0x7fc00000,
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminnm_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminnm_1.c
new file mode 100644
index 0000000..975fc56
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminnm_1.c
@@ -0,0 +1,51 @@
+/* This file tests an intrinsic which currently has only an f16 variant and
+   is only available when FP16 arithmetic instructions are supported.  */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define INSN_NAME vminnm
+#define TEST_MSG "VMINNM/VMINNMQ"
+
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+#define HAS_FLOAT16_VARIANT
+#endif
+
+/* Expected results.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0xcc00, 0xcbc0, 0xcbc0, 0xcbc0 };
+VECT_VAR_DECL(expected, hfloat, 16, 8) [] = { 0xcc00, 0xcb80, 0xcb40, 0xcb40,
+					      0xcb40, 0xcb40, 0xcb40, 0xcb40 };
+#endif
+
+/* Expected results with special FP values.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_nan, hfloat, 16, 8) [] = { 0x3c00, 0x3c00,
+						  0x3c00, 0x3c00,
+						  0x3c00, 0x3c00,
+						  0x3c00, 0x3c00 };
+VECT_VAR_DECL(expected_mnan, hfloat, 16, 8) [] = { 0x3c00, 0x3c00,
+						   0x3c00, 0x3c00,
+						   0x3c00, 0x3c00,
+						   0x3c00, 0x3c00 };
+VECT_VAR_DECL(expected_inf, hfloat, 16, 8) [] = { 0x3c00, 0x3c00,
+						  0x3c00, 0x3c00,
+						  0x3c00, 0x3c00,
+						  0x3c00, 0x3c00 };
+VECT_VAR_DECL(expected_minf, hfloat, 16, 8) [] = { 0xfc00, 0xfc00,
+						   0xfc00, 0xfc00,
+						   0xfc00, 0xfc00,
+						   0xfc00, 0xfc00 };
+VECT_VAR_DECL(expected_zero1, hfloat, 16, 8) [] = { 0x8000, 0x8000,
+						    0x8000, 0x8000,
+						    0x8000, 0x8000,
+						    0x8000, 0x8000 };
+VECT_VAR_DECL(expected_zero2, hfloat, 16, 8) [] = { 0x8000, 0x8000,
+						    0x8000, 0x8000,
+						    0x8000, 0x8000,
+						    0x8000, 0x8000 };
+#endif
+
+#include "binary_op_float.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul.c
index 63f0d8d..c5fe31a 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul.c
@@ -13,6 +13,10 @@ VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfab0, 0xfb05, 0xfb5a, 0xfbaf };
 VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfffff9a0, 0xfffffa06 };
 VECT_VAR_DECL(expected,poly,8,8) [] = { 0xc0, 0x84, 0x48, 0xc,
 					0xd0, 0x94, 0x58, 0x1c };
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0xe02a, 0xdfcf,
+					      0xdf4a, 0xdec4 };
+#endif
 VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc4053333, 0xc3f9c000 };
 VECT_VAR_DECL(expected,int,8,16) [] = { 0x90, 0x7, 0x7e, 0xf5,
 					0x6c, 0xe3, 0x5a, 0xd1,
@@ -34,6 +38,10 @@ VECT_VAR_DECL(expected,poly,8,16) [] = { 0x60, 0xca, 0x34, 0x9e,
 					 0xc8, 0x62, 0x9c, 0x36,
 					 0x30, 0x9a, 0x64, 0xce,
 					 0x98, 0x32, 0xcc, 0x66 };
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected, hfloat, 16, 8) [] = { 0xe63a, 0xe5d6, 0xe573, 0xe50f,
+					      0xe4ac, 0xe448, 0xe3c8, 0xe301 };
+#endif
 VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc4c73333, 0xc4bac000,
 					   0xc4ae4ccd, 0xc4a1d999 };
 
@@ -78,6 +86,17 @@ void FNNAME (INSN_NAME) (void)
   DECL_VMUL(poly, 8, 16);
   DECL_VMUL(float, 32, 4);
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE(vector1, float, 16, 4);
+  DECL_VARIABLE(vector1, float, 16, 8);
+
+  DECL_VARIABLE(vector2, float, 16, 4);
+  DECL_VARIABLE(vector2, float, 16, 8);
+
+  DECL_VARIABLE(vector_res, float, 16, 4);
+  DECL_VARIABLE(vector_res, float, 16, 8);
+#endif
+
   clean_results ();
 
   /* Initialize input "vector1" from "buffer".  */
@@ -97,6 +116,10 @@ void FNNAME (INSN_NAME) (void)
   VLOAD(vector1, buffer, q, uint, u, 32, 4);
   VLOAD(vector1, buffer, q, poly, p, 8, 16);
   VLOAD(vector1, buffer, q, float, f, 32, 4);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VLOAD(vector1, buffer, , float, f, 16, 4);
+  VLOAD(vector1, buffer, q, float, f, 16, 8);
+#endif
 
   /* Choose init value arbitrarily.  */
   VDUP(vector2, , int, s, 8, 8, 0x11);
@@ -115,6 +138,10 @@ void FNNAME (INSN_NAME) (void)
   VDUP(vector2, q, uint, u, 32, 4, 0xCC);
   VDUP(vector2, q, poly, p, 8, 16, 0xAA);
   VDUP(vector2, q, float, f, 32, 4, 99.6f);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector2, , float, f, 16, 4, 33.3f);
+  VDUP(vector2, q, float, f, 16, 8, 99.6f);
+#endif
 
   /* Execute the tests.  */
   TEST_VMUL(INSN_NAME, , int, s, 8, 8);
@@ -133,6 +160,10 @@ void FNNAME (INSN_NAME) (void)
   TEST_VMUL(INSN_NAME, q, uint, u, 32, 4);
   TEST_VMUL(INSN_NAME, q, poly, p, 8, 16);
   TEST_VMUL(INSN_NAME, q, float, f, 32, 4);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VMUL(INSN_NAME, , float, f, 16, 4);
+  TEST_VMUL(INSN_NAME, q, float, f, 16, 8);
+#endif
 
   CHECK(TEST_MSG, int, 8, 8, PRIx8, expected, "");
   CHECK(TEST_MSG, int, 16, 4, PRIx16, expected, "");
@@ -150,6 +181,10 @@ void FNNAME (INSN_NAME) (void)
   CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected, "");
   CHECK(TEST_MSG, poly, 8, 16, PRIx8, expected, "");
   CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected, "");
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected, "");
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected, "");
+#endif
 }
 
 int main (void)
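
Note that the f16 expected values for vmul are not simply the f32
results narrowed: the inputs round to binary16 first.  Taking the first
lane of the 4x16 variant, and assuming the f16 buffer starts at -16.0
as the other expected tables suggest, the multiplier 33.3f becomes
33.3125 (0x502a) when stored as binary16, so

  expected[0] = -16.0 * 33.3125 = -533.0 = 0xe02a

which is the first entry in the table above.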
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_lane.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_lane.c
index 978cd9b..e6cf4d7 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_lane.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_lane.c
@@ -7,6 +7,9 @@ VECT_VAR_DECL(expected,int,16,4) [] = { 0xffc0, 0xffc4, 0xffc8, 0xffcc };
 VECT_VAR_DECL(expected,int,32,2) [] = { 0xfffffde0, 0xfffffe02 };
 VECT_VAR_DECL(expected,uint,16,4) [] = { 0xbbc0, 0xc004, 0xc448, 0xc88c };
 VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfffface0, 0xffffb212 };
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0xddb3, 0xdd58, 0xdcfd, 0xdca1 };
+#endif
 VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc3b66666, 0xc3ab0000 };
 VECT_VAR_DECL(expected,int,16,8) [] = { 0xffc0, 0xffc4, 0xffc8, 0xffcc,
 					0xffd0, 0xffd4, 0xffd8, 0xffdc };
@@ -16,6 +19,10 @@ VECT_VAR_DECL(expected,uint,16,8) [] = { 0xbbc0, 0xc004, 0xc448, 0xc88c,
 					 0xccd0, 0xd114, 0xd558, 0xd99c };
 VECT_VAR_DECL(expected,uint,32,4) [] = { 0xfffface0, 0xffffb212,
 					 0xffffb744, 0xffffbc76 };
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected, hfloat, 16, 8) [] = { 0xddb3, 0xdd58, 0xdcfd, 0xdca1,
+					      0xdc46, 0xdbd6, 0xdb20, 0xda69 };
+#endif
 VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc3b66666, 0xc3ab0000,
 					   0xc39f9999, 0xc3943333 };
 
@@ -45,11 +52,20 @@ void exec_vmul_lane (void)
 
   DECL_VMUL(vector);
   DECL_VMUL(vector_res);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE(vector, float, 16, 4);
+  DECL_VARIABLE(vector, float, 16, 8);
+  DECL_VARIABLE(vector_res, float, 16, 4);
+  DECL_VARIABLE(vector_res, float, 16, 8);
+#endif
 
   DECL_VARIABLE(vector2, int, 16, 4);
   DECL_VARIABLE(vector2, int, 32, 2);
   DECL_VARIABLE(vector2, uint, 16, 4);
   DECL_VARIABLE(vector2, uint, 32, 2);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE(vector2, float, 16, 4);
+#endif
   DECL_VARIABLE(vector2, float, 32, 2);
 
   clean_results ();
@@ -59,11 +75,17 @@ void exec_vmul_lane (void)
   VLOAD(vector, buffer, , int, s, 32, 2);
   VLOAD(vector, buffer, , uint, u, 16, 4);
   VLOAD(vector, buffer, , uint, u, 32, 2);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VLOAD(vector, buffer, , float, f, 16, 4);
+#endif
   VLOAD(vector, buffer, , float, f, 32, 2);
   VLOAD(vector, buffer, q, int, s, 16, 8);
   VLOAD(vector, buffer, q, int, s, 32, 4);
   VLOAD(vector, buffer, q, uint, u, 16, 8);
   VLOAD(vector, buffer, q, uint, u, 32, 4);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VLOAD(vector, buffer, q, float, f, 16, 8);
+#endif
   VLOAD(vector, buffer, q, float, f, 32, 4);
 
   /* Initialize vector2.  */
@@ -71,6 +93,9 @@ void exec_vmul_lane (void)
   VDUP(vector2, , int, s, 32, 2, 0x22);
   VDUP(vector2, , uint, u, 16, 4, 0x444);
   VDUP(vector2, , uint, u, 32, 2, 0x532);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector2, , float, f, 16, 4, 22.8f);
+#endif
   VDUP(vector2, , float, f, 32, 2, 22.8f);
 
   /* Choose lane arbitrarily.  */
@@ -78,22 +103,34 @@ void exec_vmul_lane (void)
   TEST_VMUL_LANE(, int, s, 32, 2, 2, 1);
   TEST_VMUL_LANE(, uint, u, 16, 4, 4, 2);
   TEST_VMUL_LANE(, uint, u, 32, 2, 2, 1);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VMUL_LANE(, float, f, 16, 4, 4, 1);
+#endif
   TEST_VMUL_LANE(, float, f, 32, 2, 2, 1);
   TEST_VMUL_LANE(q, int, s, 16, 8, 4, 2);
   TEST_VMUL_LANE(q, int, s, 32, 4, 2, 0);
   TEST_VMUL_LANE(q, uint, u, 16, 8, 4, 2);
   TEST_VMUL_LANE(q, uint, u, 32, 4, 2, 1);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VMUL_LANE(q, float, f, 16, 8, 4, 0);
+#endif
   TEST_VMUL_LANE(q, float, f, 32, 4, 2, 0);
 
   CHECK(TEST_MSG, int, 16, 4, PRIx64, expected, "");
   CHECK(TEST_MSG, int, 32, 2, PRIx32, expected, "");
   CHECK(TEST_MSG, uint, 16, 4, PRIx64, expected, "");
   CHECK(TEST_MSG, uint, 32, 2, PRIx32, expected, "");
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected, "");
+#endif
   CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected, "");
   CHECK(TEST_MSG, int, 16, 8, PRIx64, expected, "");
   CHECK(TEST_MSG, int, 32, 4, PRIx32, expected, "");
   CHECK(TEST_MSG, uint, 16, 8, PRIx64, expected, "");
   CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected, "");
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected, "");
+#endif
   CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected, "");
 }
 
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_n.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_n.c
index be0ee65..16f7dac 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_n.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_n.c
@@ -7,6 +7,9 @@ VECT_VAR_DECL(expected,int,16,4) [] = { 0xfef0, 0xff01, 0xff12, 0xff23 };
 VECT_VAR_DECL(expected,int,32,2) [] = { 0xfffffde0, 0xfffffe02 };
 VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfcd0, 0xfd03, 0xfd36, 0xfd69 };
 VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfffffbc0, 0xfffffc04 };
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0xdd93, 0xdd3a, 0xdce1, 0xdc87 };
+#endif
 VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc3b26666, 0xc3a74000 };
 VECT_VAR_DECL(expected,int,16,8) [] = { 0xfab0, 0xfb05, 0xfb5a, 0xfbaf,
 					0xfc04, 0xfc59, 0xfcae, 0xfd03 };
@@ -16,6 +19,10 @@ VECT_VAR_DECL(expected,uint,16,8) [] = { 0xf890, 0xf907, 0xf97e, 0xf9f5,
 					 0xfa6c, 0xfae3, 0xfb5a, 0xfbd1 };
 VECT_VAR_DECL(expected,uint,32,4) [] = { 0xfffff780, 0xfffff808,
 					 0xfffff890, 0xfffff918 };
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected, hfloat, 16, 8) [] = { 0xe58e, 0xe535, 0xe4dc, 0xe483,
+					      0xe42a, 0xe3a3, 0xe2f2, 0xe240 };
+#endif
 VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc4b1cccd, 0xc4a6b000,
 					   0xc49b9333, 0xc4907667 };
 
@@ -50,6 +57,13 @@ void FNNAME (INSN_NAME) (void)
   DECL_VMUL(vector);
   DECL_VMUL(vector_res);
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE(vector, float, 16, 4);
+  DECL_VARIABLE(vector, float, 16, 8);
+  DECL_VARIABLE(vector_res, float, 16, 4);
+  DECL_VARIABLE(vector_res, float, 16, 8);
+#endif
+
   clean_results ();
 
   /* Initialize vector from pre-initialized values.  */
@@ -57,11 +71,17 @@ void FNNAME (INSN_NAME) (void)
   VLOAD(vector, buffer, , int, s, 32, 2);
   VLOAD(vector, buffer, , uint, u, 16, 4);
   VLOAD(vector, buffer, , uint, u, 32, 2);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VLOAD(vector, buffer, , float, f, 16, 4);
+#endif
   VLOAD(vector, buffer, , float, f, 32, 2);
   VLOAD(vector, buffer, q, int, s, 16, 8);
   VLOAD(vector, buffer, q, int, s, 32, 4);
   VLOAD(vector, buffer, q, uint, u, 16, 8);
   VLOAD(vector, buffer, q, uint, u, 32, 4);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VLOAD(vector, buffer, q, float, f, 16, 8);
+#endif
   VLOAD(vector, buffer, q, float, f, 32, 4);
 
   /* Choose multiplier arbitrarily.  */
@@ -69,22 +89,34 @@ void FNNAME (INSN_NAME) (void)
   TEST_VMUL_N(, int, s, 32, 2, 0x22);
   TEST_VMUL_N(, uint, u, 16, 4, 0x33);
   TEST_VMUL_N(, uint, u, 32, 2, 0x44);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VMUL_N(, float, f, 16, 4, 22.3f);
+#endif
   TEST_VMUL_N(, float, f, 32, 2, 22.3f);
   TEST_VMUL_N(q, int, s, 16, 8, 0x55);
   TEST_VMUL_N(q, int, s, 32, 4, 0x66);
   TEST_VMUL_N(q, uint, u, 16, 8, 0x77);
   TEST_VMUL_N(q, uint, u, 32, 4, 0x88);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VMUL_N(q, float, f, 16, 8, 88.9f);
+#endif
   TEST_VMUL_N(q, float, f, 32, 4, 88.9f);
 
   CHECK(TEST_MSG, int, 16, 4, PRIx64, expected, "");
   CHECK(TEST_MSG, int, 32, 2, PRIx32, expected, "");
   CHECK(TEST_MSG, uint, 16, 4, PRIx64, expected, "");
   CHECK(TEST_MSG, uint, 32, 2, PRIx32, expected, "");
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected, "");
+#endif
   CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected, "");
   CHECK(TEST_MSG, int, 16, 8, PRIx64, expected, "");
   CHECK(TEST_MSG, int, 32, 4, PRIx32, expected, "");
   CHECK(TEST_MSG, uint, 16, 8, PRIx64, expected, "");
   CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected, "");
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected, "");
+#endif
   CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected, "");
 }
 
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vneg.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vneg.c
index 78f17ed..7bd9d55 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vneg.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vneg.c
@@ -21,24 +21,52 @@ VECT_VAR_DECL(expected,int,32,4) [] = { 0x10, 0xf, 0xe, 0xd };
 /* Expected results for float32 variants. Needs to be separated since
    the generic test function does not test floating-point
    versions.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_float16, hfloat, 16, 4) [] = { 0xc09a, 0xc09a,
+						      0xc09a, 0xc09a };
+VECT_VAR_DECL(expected_float16, hfloat, 16, 8) [] = { 0xc2cd, 0xc2cd,
+						      0xc2cd, 0xc2cd,
+						      0xc2cd, 0xc2cd,
+						      0xc2cd, 0xc2cd };
+#endif
 VECT_VAR_DECL(expected_float32,hfloat,32,2) [] = { 0xc0133333, 0xc0133333 };
 VECT_VAR_DECL(expected_float32,hfloat,32,4) [] = { 0xc059999a, 0xc059999a,
 						   0xc059999a, 0xc059999a };
 
 void exec_vneg_f32(void)
 {
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE(vector, float, 16, 4);
+  DECL_VARIABLE(vector, float, 16, 8);
+#endif
   DECL_VARIABLE(vector, float, 32, 2);
   DECL_VARIABLE(vector, float, 32, 4);
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE(vector_res, float, 16, 4);
+  DECL_VARIABLE(vector_res, float, 16, 8);
+#endif
   DECL_VARIABLE(vector_res, float, 32, 2);
   DECL_VARIABLE(vector_res, float, 32, 4);
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector, , float, f, 16, 4, 2.3f);
+  VDUP(vector, q, float, f, 16, 8, 3.4f);
+#endif
   VDUP(vector, , float, f, 32, 2, 2.3f);
   VDUP(vector, q, float, f, 32, 4, 3.4f);
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_UNARY_OP(INSN_NAME, , float, f, 16, 4);
+  TEST_UNARY_OP(INSN_NAME, q, float, f, 16, 8);
+#endif
   TEST_UNARY_OP(INSN_NAME, , float, f, 32, 2);
   TEST_UNARY_OP(INSN_NAME, q, float, f, 32, 4);
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_float16, "");
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_float16, "");
+#endif
   CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_float32, "");
   CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_float32, "");
 }
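
The vneg expectations are easy to derive because FP16 negation, like
the wider formats, only flips the sign bit: 2.3f rounds to 0x409a as
binary16, and negating it gives the 0xc09a in expected_float16.  As a
one-line sketch:

#include <stdint.h>

/* Negate an FP16 value by flipping its sign bit.  */
static uint16_t
fp16_neg_bits (uint16_t h)
{
  return h ^ 0x8000;
}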
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpXXX.inc b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpXXX.inc
index c1b7235..a9b0c62 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpXXX.inc
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpXXX.inc
@@ -21,6 +21,9 @@ void FNNAME (INSN_NAME) (void)
   DECL_VARIABLE(vector, uint, 8, 8);
   DECL_VARIABLE(vector, uint, 16, 4);
   DECL_VARIABLE(vector, uint, 32, 2);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE(vector, float, 16, 4);
+#endif
   DECL_VARIABLE(vector, float, 32, 2);
 
   DECL_VARIABLE(vector_res, int, 8, 8);
@@ -29,6 +32,9 @@ void FNNAME (INSN_NAME) (void)
   DECL_VARIABLE(vector_res, uint, 8, 8);
   DECL_VARIABLE(vector_res, uint, 16, 4);
   DECL_VARIABLE(vector_res, uint, 32, 2);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE(vector_res, float, 16, 4);
+#endif
   DECL_VARIABLE(vector_res, float, 32, 2);
 
   clean_results ();
@@ -40,6 +46,9 @@ void FNNAME (INSN_NAME) (void)
   VLOAD(vector, buffer, , uint, u, 8, 8);
   VLOAD(vector, buffer, , uint, u, 16, 4);
   VLOAD(vector, buffer, , uint, u, 32, 2);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VLOAD(vector, buffer, , float, f, 16, 4);
+#endif
   VLOAD(vector, buffer, , float, f, 32, 2);
 
   /* Apply a binary operator named INSN_NAME.  */
@@ -49,6 +58,9 @@ void FNNAME (INSN_NAME) (void)
   TEST_VPXXX(INSN_NAME, uint, u, 8, 8);
   TEST_VPXXX(INSN_NAME, uint, u, 16, 4);
   TEST_VPXXX(INSN_NAME, uint, u, 32, 2);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VPXXX(INSN_NAME, float, f, 16, 4);
+#endif
   TEST_VPXXX(INSN_NAME, float, f, 32, 2);
 
   CHECK(TEST_MSG, int, 8, 8, PRIx32, expected, "");
@@ -57,6 +69,9 @@ void FNNAME (INSN_NAME) (void)
   CHECK(TEST_MSG, uint, 8, 8, PRIx32, expected, "");
   CHECK(TEST_MSG, uint, 16, 4, PRIx64, expected, "");
   CHECK(TEST_MSG, uint, 32, 2, PRIx32, expected, "");
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected, "");
+#endif
   CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected, "");
 }
 
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpadd.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpadd.c
index 5ddfd3d..f1bbe09 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpadd.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpadd.c
@@ -14,6 +14,9 @@ VECT_VAR_DECL(expected,uint,8,8) [] = { 0xe1, 0xe5, 0xe9, 0xed,
 					0xe1, 0xe5, 0xe9, 0xed };
 VECT_VAR_DECL(expected,uint,16,4) [] = { 0xffe1, 0xffe5, 0xffe1, 0xffe5 };
 VECT_VAR_DECL(expected,uint,32,2) [] = { 0xffffffe1, 0xffffffe1 };
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0xcfc0, 0xcec0, 0xcfc0, 0xcec0 };
+#endif
 VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1f80000, 0xc1f80000 };
 
 #include "vpXXX.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpmax.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpmax.c
index f27a9a9..c962114 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpmax.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpmax.c
@@ -15,6 +15,9 @@ VECT_VAR_DECL(expected,uint,8,8) [] = { 0xf1, 0xf3, 0xf5, 0xf7,
 					0xf1, 0xf3, 0xf5, 0xf7 };
 VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfff1, 0xfff3, 0xfff1, 0xfff3 };
 VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfffffff1, 0xfffffff1 };
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0xcb80, 0xca80, 0xcb80, 0xca80 };
+#endif
 VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1700000, 0xc1700000 };
 
 #include "vpXXX.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpmin.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpmin.c
index a7cb696..7c75cf5 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpmin.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpmin.c
@@ -15,6 +15,9 @@ VECT_VAR_DECL(expected,uint,8,8) [] = { 0xf0, 0xf2, 0xf4, 0xf6,
 					0xf0, 0xf2, 0xf4, 0xf6 };
 VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfff0, 0xfff2, 0xfff0, 0xfff2 };
 VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfffffff0, 0xfffffff0 };
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0xcc00, 0xcb00, 0xcc00, 0xcb00 };
+#endif
 VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1800000, 0xc1800000 };
 
 #include "vpXXX.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecpe.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecpe.c
index 55b45b7..cd6a17f 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecpe.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecpe.c
@@ -7,6 +7,14 @@
 VECT_VAR_DECL(expected_positive,uint,32,2) [] = { 0xffffffff, 0xffffffff };
 VECT_VAR_DECL(expected_positive,uint,32,4) [] = { 0xbf000000, 0xbf000000,
 						  0xbf000000, 0xbf000000 };
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_positive, hfloat, 16, 4) [] = { 0x3834, 0x3834,
+						       0x3834, 0x3834 };
+VECT_VAR_DECL(expected_positive, hfloat, 16, 8) [] = { 0x2018, 0x2018,
+						       0x2018, 0x2018,
+						       0x2018, 0x2018,
+						       0x2018, 0x2018 };
+#endif
 VECT_VAR_DECL(expected_positive,hfloat,32,2) [] = { 0x3f068000, 0x3f068000 };
 VECT_VAR_DECL(expected_positive,hfloat,32,4) [] = { 0x3c030000, 0x3c030000,
 						    0x3c030000, 0x3c030000 };
@@ -15,24 +23,56 @@ VECT_VAR_DECL(expected_positive,hfloat,32,4) [] = { 0x3c030000, 0x3c030000,
 VECT_VAR_DECL(expected_negative,uint,32,2) [] = { 0x80000000, 0x80000000 };
 VECT_VAR_DECL(expected_negative,uint,32,4) [] = { 0xee800000, 0xee800000,
 						  0xee800000, 0xee800000 };
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_negative, hfloat, 16, 4) [] = { 0xae64, 0xae64,
+						       0xae64, 0xae64 };
+VECT_VAR_DECL(expected_negative, hfloat, 16, 8) [] = { 0xa018, 0xa018,
+						       0xa018, 0xa018,
+						       0xa018, 0xa018,
+						       0xa018, 0xa018 };
+#endif
 VECT_VAR_DECL(expected_negative,hfloat,32,2) [] = { 0xbdcc8000, 0xbdcc8000 };
 VECT_VAR_DECL(expected_negative,hfloat,32,4) [] = { 0xbc030000, 0xbc030000,
 						    0xbc030000, 0xbc030000 };
 
 /* Expected results with FP special values (NaN, infinity).  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_fp1, hfloat, 16, 4) [] = { 0x7e00, 0x7e00,
+						  0x7e00, 0x7e00 };
+VECT_VAR_DECL(expected_fp1, hfloat, 16, 8) [] = { 0x0, 0x0, 0x0, 0x0,
+						  0x0, 0x0, 0x0, 0x0 };
+#endif
 VECT_VAR_DECL(expected_fp1,hfloat,32,2) [] = { 0x7fc00000, 0x7fc00000 };
 VECT_VAR_DECL(expected_fp1,hfloat,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
 
 /* Expected results with FP special values (zero, large value).  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_fp2, hfloat, 16, 4) [] = { 0x7c00, 0x7c00,
+						  0x7c00, 0x7c00 };
+VECT_VAR_DECL(expected_fp2, hfloat, 16, 8) [] = { 0x0, 0x0, 0x0, 0x0,
+						  0x0, 0x0, 0x0, 0x0 };
+#endif
 VECT_VAR_DECL(expected_fp2,hfloat,32,2) [] = { 0x7f800000, 0x7f800000 };
 VECT_VAR_DECL(expected_fp2,hfloat,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
 
 /* Expected results with FP special values (-0, -infinity).  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_fp3, hfloat, 16, 4) [] = { 0xfc00, 0xfc00,
+						  0xfc00, 0xfc00 };
+VECT_VAR_DECL(expected_fp3, hfloat, 16, 8) [] = { 0x8000, 0x8000,
+						  0x8000, 0x8000,
+						  0x8000, 0x8000,
+						  0x8000, 0x8000 };
+#endif
 VECT_VAR_DECL(expected_fp3,hfloat,32,2) [] = { 0xff800000, 0xff800000 };
 VECT_VAR_DECL(expected_fp3,hfloat,32,4) [] = { 0x80000000, 0x80000000,
 					       0x80000000, 0x80000000 };
 
 /* Expected results with FP special large negative value.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_fp4, hfloat, 16, 4) [] = { 0x8000, 0x8000,
+						  0x8000, 0x8000 };
+#endif
 VECT_VAR_DECL(expected_fp4,hfloat,32,2) [] = { 0x80000000, 0x80000000 };
 
 #define TEST_MSG "VRECPE/VRECPEQ"
@@ -50,11 +90,19 @@ void exec_vrecpe(void)
   /* No need for 64 bits variants.  */
   DECL_VARIABLE(vector, uint, 32, 2);
   DECL_VARIABLE(vector, uint, 32, 4);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE(vector, float, 16, 4);
+  DECL_VARIABLE(vector, float, 16, 8);
+#endif
   DECL_VARIABLE(vector, float, 32, 2);
   DECL_VARIABLE(vector, float, 32, 4);
 
   DECL_VARIABLE(vector_res, uint, 32, 2);
   DECL_VARIABLE(vector_res, uint, 32, 4);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE(vector_res, float, 16, 4);
+  DECL_VARIABLE(vector_res, float, 16, 8);
+#endif
   DECL_VARIABLE(vector_res, float, 32, 2);
   DECL_VARIABLE(vector_res, float, 32, 4);
 
@@ -62,88 +110,165 @@ void exec_vrecpe(void)
 
   /* Choose init value arbitrarily, positive.  */
   VDUP(vector, , uint, u, 32, 2, 0x12345678);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector, , float, f, 16, 4, 1.9f);
+#endif
   VDUP(vector, , float, f, 32, 2, 1.9f);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector, q, float, f, 16, 8, 125.0f);
+#endif
   VDUP(vector, q, uint, u, 32, 4, 0xABCDEF10);
   VDUP(vector, q, float, f, 32, 4, 125.0f);
 
   /* Apply the operator.  */
   TEST_VRECPE(, uint, u, 32, 2);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VRECPE(, float, f, 16, 4);
+#endif
   TEST_VRECPE(, float, f, 32, 2);
   TEST_VRECPE(q, uint, u, 32, 4);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VRECPE(q, float, f, 16, 8);
+#endif
   TEST_VRECPE(q, float, f, 32, 4);
 
 #define CMT " (positive input)"
   CHECK(TEST_MSG, uint, 32, 2, PRIx32, expected_positive, CMT);
   CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_positive, CMT);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_positive, CMT);
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_positive, CMT);
+#endif
   CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_positive, CMT);
   CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_positive, CMT);
 
  /* Choose init value arbitrarily, negative.  */
   VDUP(vector, , uint, u, 32, 2, 0xFFFFFFFF);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector, , float, f, 16, 4, -10.0f);
+#endif
   VDUP(vector, , float, f, 32, 2, -10.0f);
   VDUP(vector, q, uint, u, 32, 4, 0x89081234);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector, q, float, f, 16, 8, -125.0f);
+#endif
   VDUP(vector, q, float, f, 32, 4, -125.0f);
 
   /* Apply the operator.  */
   TEST_VRECPE(, uint, u, 32, 2);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VRECPE(, float, f, 16, 4);
+#endif
   TEST_VRECPE(, float, f, 32, 2);
   TEST_VRECPE(q, uint, u, 32, 4);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VRECPE(q, float, f, 16, 8);
+#endif
   TEST_VRECPE(q, float, f, 32, 4);
 
 #undef CMT
 #define CMT " (negative input)"
   CHECK(TEST_MSG, uint, 32, 2, PRIx32, expected_negative, CMT);
   CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_negative, CMT);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_negative, CMT);
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_negative, CMT);
+#endif
   CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_negative, CMT);
   CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_negative, CMT);
 
   /* Test FP variants with special input values (NaN, infinity).  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector, , float, f, 16, 4, NAN);
+  VDUP(vector, q, float, f, 16, 8, HUGE_VALF);
+#endif
   VDUP(vector, , float, f, 32, 2, NAN);
   VDUP(vector, q, float, f, 32, 4, HUGE_VALF);
 
   /* Apply the operator.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VRECPE(, float, f, 16, 4);
+  TEST_VRECPE(q, float, f, 16, 8);
+#endif
   TEST_VRECPE(, float, f, 32, 2);
   TEST_VRECPE(q, float, f, 32, 4);
 
 #undef CMT
 #define CMT " FP special (NaN, infinity)"
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_fp1, CMT);
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_fp1, CMT);
+#endif
   CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_fp1, CMT);
   CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_fp1, CMT);
 
   /* Test FP variants with special input values (zero, large value).  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector, , float, f, 16, 4, 0.0f);
+  VDUP(vector, q, float, f, 16, 8, 8.97229e37f /*9.0e37f*/);
+#endif
   VDUP(vector, , float, f, 32, 2, 0.0f);
   VDUP(vector, q, float, f, 32, 4, 8.97229e37f /*9.0e37f*/);
 
   /* Apply the operator.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VRECPE(, float, f, 16, 4);
+  TEST_VRECPE(q, float, f, 16, 8);
+#endif
   TEST_VRECPE(, float, f, 32, 2);
   TEST_VRECPE(q, float, f, 32, 4);
 
 #undef CMT
 #define CMT " FP special (zero, large value)"
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_fp2, CMT);
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_fp2, CMT);
+#endif
   CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_fp2, CMT);
   CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_fp2, CMT);
 
   /* Test FP variants with special input values (-0, -infinity).  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector, , float, f, 16, 4, -0.0f);
+  VDUP(vector, q, float, f, 16, 8, -HUGE_VALF);
+#endif
   VDUP(vector, , float, f, 32, 2, -0.0f);
   VDUP(vector, q, float, f, 32, 4, -HUGE_VALF);
 
   /* Apply the operator.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VRECPE(, float, f, 16, 4);
+  TEST_VRECPE(q, float, f, 16, 8);
+#endif
   TEST_VRECPE(, float, f, 32, 2);
   TEST_VRECPE(q, float, f, 32, 4);
 
 #undef CMT
 #define CMT " FP special (-0, -infinity)"
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_fp3, CMT);
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_fp3, CMT);
+#endif
   CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_fp3, CMT);
   CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_fp3, CMT);
 
   /* Test FP variants with special input values (large negative value).  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector, , float, f, 16, 4, -9.0e37f);
+#endif
   VDUP(vector, , float, f, 32, 2, -9.0e37f);
 
   /* Apply the operator.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VRECPE(, float, f, 16, 4);
+#endif
   TEST_VRECPE(, float, f, 32, 2);
 
 #undef CMT
 #define CMT " FP special (large negative value)"
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_fp4, CMT);
+#endif
   CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_fp4, CMT);
 }
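
The vrecpe expected values are the architected estimates rather than
correctly rounded reciprocals: VRECPE is defined by table lookup with
roughly eight bits of precision.  For instance, 0x3834 decodes to
(1 + 52/1024) * 2^-1 = 0.525390625, against 1/1.9 = 0.52631..., which
is within the expected error of the estimate.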
 
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecps.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecps.c
index 0e41947..b06da22 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecps.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecps.c
@@ -4,22 +4,51 @@
 #include <math.h>
 
 /* Expected results with positive input.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0xd70c, 0xd70c, 0xd70c, 0xd70c };
+VECT_VAR_DECL(expected, hfloat, 16, 8) [] = { 0xcedc, 0xcedc, 0xcedc, 0xcedc,
+					      0xcedc, 0xcedc, 0xcedc, 0xcedc };
+#endif
 VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc2e19eb7, 0xc2e19eb7 };
 VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc1db851f, 0xc1db851f,
 					   0xc1db851f, 0xc1db851f };
 
 /* Expected results with FP special values (NaN).  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_fp1, hfloat, 16, 4) [] = { 0x7e00, 0x7e00,
+						  0x7e00, 0x7e00 };
+VECT_VAR_DECL(expected_fp1, hfloat, 16, 8) [] = { 0x7e00, 0x7e00,
+						  0x7e00, 0x7e00,
+						  0x7e00, 0x7e00,
+						  0x7e00, 0x7e00 };
+#endif
 VECT_VAR_DECL(expected_fp1,hfloat,32,2) [] = { 0x7fc00000, 0x7fc00000 };
 VECT_VAR_DECL(expected_fp1,hfloat,32,4) [] = { 0x7fc00000, 0x7fc00000,
 					       0x7fc00000, 0x7fc00000 };
 
 /* Expected results with FP special values (infinity, 0) and normal
    values.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_fp2, hfloat, 16, 4) [] = { 0xfc00, 0xfc00,
+						  0xfc00, 0xfc00 };
+VECT_VAR_DECL(expected_fp2, hfloat, 16, 8) [] = { 0x4000, 0x4000,
+						  0x4000, 0x4000,
+						  0x4000, 0x4000,
+						  0x4000, 0x4000 };
+#endif
 VECT_VAR_DECL(expected_fp2,hfloat,32,2) [] = { 0xff800000, 0xff800000 };
 VECT_VAR_DECL(expected_fp2,hfloat,32,4) [] = { 0x40000000, 0x40000000,
 					       0x40000000, 0x40000000 };
 
 /* Expected results with FP special values (infinity, 0).  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_fp3, hfloat, 16, 4) [] = { 0x4000, 0x4000,
+						  0x4000, 0x4000 };
+VECT_VAR_DECL(expected_fp3, hfloat, 16, 8) [] = { 0x4000, 0x4000,
+						  0x4000, 0x4000,
+						  0x4000, 0x4000,
+						  0x4000, 0x4000 };
+#endif
 VECT_VAR_DECL(expected_fp3,hfloat,32,2) [] = { 0x40000000, 0x40000000 };
 VECT_VAR_DECL(expected_fp3,hfloat,32,4) [] = { 0x40000000, 0x40000000,
 					       0x40000000, 0x40000000 };
@@ -38,74 +67,141 @@ void exec_vrecps(void)
 		    VECT_VAR(vector_res, T1, W, N))
 
   /* No need for integer variants.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE(vector, float, 16, 4);
+  DECL_VARIABLE(vector, float, 16, 8);
+#endif
   DECL_VARIABLE(vector, float, 32, 2);
   DECL_VARIABLE(vector, float, 32, 4);
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE(vector2, float, 16, 4);
+  DECL_VARIABLE(vector2, float, 16, 8);
+#endif
   DECL_VARIABLE(vector2, float, 32, 2);
   DECL_VARIABLE(vector2, float, 32, 4);
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE(vector_res, float, 16, 4);
+  DECL_VARIABLE(vector_res, float, 16, 8);
+#endif
   DECL_VARIABLE(vector_res, float, 32, 2);
   DECL_VARIABLE(vector_res, float, 32, 4);
 
   clean_results ();
 
   /* Choose init value arbitrarily.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector, , float, f, 16, 4, 12.9f);
+  VDUP(vector, q, float, f, 16, 8, 9.2f);
+#endif
   VDUP(vector, , float, f, 32, 2, 12.9f);
   VDUP(vector, q, float, f, 32, 4, 9.2f);
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector2, , float, f, 16, 4, 8.9f);
+  VDUP(vector2, q, float, f, 16, 8, 3.2f);
+#endif
   VDUP(vector2, , float, f, 32, 2, 8.9f);
   VDUP(vector2, q, float, f, 32, 4, 3.2f);
 
   /* Apply the operator.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VRECPS(, float, f, 16, 4);
+  TEST_VRECPS(q, float, f, 16, 8);
+#endif
   TEST_VRECPS(, float, f, 32, 2);
   TEST_VRECPS(q, float, f, 32, 4);
 
 #define CMT " (positive input)"
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected, CMT);
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected, CMT);
+#endif
   CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected, CMT);
   CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected, CMT);
 
 
   /* Test FP variants with special input values (NaN).  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector, , float, f, 16, 4, NAN);
+  VDUP(vector2, q, float, f, 16, 8, NAN);
+#endif
   VDUP(vector, , float, f, 32, 2, NAN);
   VDUP(vector2, q, float, f, 32, 4, NAN);
 
   /* Apply the operator.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VRECPS(, float, f, 16, 4);
+  TEST_VRECPS(q, float, f, 16, 8);
+#endif
   TEST_VRECPS(, float, f, 32, 2);
   TEST_VRECPS(q, float, f, 32, 4);
 
 #undef CMT
 #define CMT " FP special (NaN)"
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_fp1, CMT);
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_fp1, CMT);
+#endif
   CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_fp1, CMT);
   CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_fp1, CMT);
 
 
   /* Test FP variants with special input values (infinity, 0).  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector, , float, f, 16, 4, HUGE_VALF);
+  VDUP(vector, q, float, f, 16, 8, 0.0f);
+  VDUP(vector2, q, float, f, 16, 8, 3.2f); /* Restore a normal value.  */
+#endif
   VDUP(vector, , float, f, 32, 2, HUGE_VALF);
   VDUP(vector, q, float, f, 32, 4, 0.0f);
   VDUP(vector2, q, float, f, 32, 4, 3.2f); /* Restore a normal value.  */
 
   /* Apply the operator.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VRECPS(, float, f, 16, 4);
+  TEST_VRECPS(q, float, f, 16, 8);
+#endif
   TEST_VRECPS(, float, f, 32, 2);
   TEST_VRECPS(q, float, f, 32, 4);
 
 #undef CMT
 #define CMT " FP special (infinity, 0) and normal value"
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_fp2, CMT);
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_fp2, CMT);
+#endif
   CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_fp2, CMT);
   CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_fp2, CMT);
 
 
   /* Test FP variants with only special input values (infinity, 0).  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector, , float, f, 16, 4, HUGE_VALF);
+  VDUP(vector, q, float, f, 16, 8, 0.0f);
+  VDUP(vector2, , float, f, 16, 4, 0.0f);
+  VDUP(vector2, q, float, f, 16, 8, HUGE_VALF);
+#endif
   VDUP(vector, , float, f, 32, 2, HUGE_VALF);
   VDUP(vector, q, float, f, 32, 4, 0.0f);
   VDUP(vector2, , float, f, 32, 2, 0.0f);
   VDUP(vector2, q, float, f, 32, 4, HUGE_VALF);
 
   /* Apply the operator */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VRECPS(, float, f, 16, 4);
+  TEST_VRECPS(q, float, f, 16, 8);
+#endif
   TEST_VRECPS(, float, f, 32, 2);
   TEST_VRECPS(q, float, f, 32, 4);
 
 #undef CMT
 #define CMT " FP special (infinity, 0)"
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_fp3, CMT);
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_fp3, CMT);
+#endif
   CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_fp3, CMT);
   CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_fp3, CMT);
 }
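
For reference when reading the vrecps tables: the instruction computes
2 - a*b, the correction factor of one Newton-Raphson refinement step
for a reciprocal (x1 = x0 * (2 - d * x0)).  With the arbitrary inputs
12.9 and 8.9 this is about 2 - 114.8 = -112.8, which rounds to the
-112.75 (0xd70c) in "expected"; and the architecture defines the step
for an (infinity, 0) operand pair as 2.0 rather than NaN, hence the
0x4000 entries in expected_fp3.  A scalar sketch:

/* One VRECPS step: 2 - a*b.  */
static float
recps_ref (float a, float b)
{
  return 2.0f - a * b;
}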
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c
index d97a3a2..fe6715f 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c
@@ -6,6 +6,14 @@
 #include "compute-ref-data.h"
 
 /* Expected results.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL (expected, hfloat, 16, 4) [] = { 0xcc00, 0xcb80,
+					       0xcb00, 0xca80 };
+VECT_VAR_DECL (expected, hfloat, 16, 8) [] = { 0xcc00, 0xcb80,
+					       0xcb00, 0xca80,
+					       0xca00, 0xc980,
+					       0xc900, 0xc880 };
+#endif
 VECT_VAR_DECL (expected, hfloat, 32, 2) [] = { 0xc1800000, 0xc1700000 };
 VECT_VAR_DECL (expected, hfloat, 32, 4) [] = { 0xc1800000, 0xc1700000,
 					       0xc1600000, 0xc1500000 };
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndX.inc b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndX.inc
index 629240d..bb4a6ba 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndX.inc
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndX.inc
@@ -17,20 +17,40 @@ void FNNAME (INSN) (void)
 #define TEST_VRND(Q, T1, T2, W, N)		\
   TEST_VRND1 (INSN, Q, T1, T2, W, N)
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE(vector, float, 16, 4);
+  DECL_VARIABLE(vector, float, 16, 8);
+#endif
   DECL_VARIABLE (vector, float, 32, 2);
   DECL_VARIABLE (vector, float, 32, 4);
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE(vector_res, float, 16, 4);
+  DECL_VARIABLE(vector_res, float, 16, 8);
+#endif
   DECL_VARIABLE (vector_res, float, 32, 2);
   DECL_VARIABLE (vector_res, float, 32, 4);
 
   clean_results ();
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VLOAD (vector, buffer, , float, f, 16, 4);
+  VLOAD (vector, buffer, q, float, f, 16, 8);
+#endif
   VLOAD (vector, buffer, , float, f, 32, 2);
   VLOAD (vector, buffer, q, float, f, 32, 4);
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VRND ( , float, f, 16, 4);
+  TEST_VRND (q, float, f, 16, 8);
+#endif
   TEST_VRND ( , float, f, 32, 2);
   TEST_VRND (q, float, f, 32, 4);
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected, "");
+  CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected, "");
+#endif
   CHECK_FP (TEST_MSG, float, 32, 2, PRIx32, expected, "");
   CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected, "");
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnda.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnda.c
index ff2bdc0..9c0f7ff 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnda.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnda.c
@@ -6,6 +6,14 @@
 #include "compute-ref-data.h"
 
 /* Expected results.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL (expected, hfloat, 16, 4) [] = { 0xcc00, 0xcb80,
+					       0xcb00, 0xca80 };
+VECT_VAR_DECL (expected, hfloat, 16, 8) [] = { 0xcc00, 0xcb80,
+					       0xcb00, 0xca80,
+					       0xca00, 0xc980,
+					       0xc900, 0xc880 };
+#endif
 VECT_VAR_DECL (expected, hfloat, 32, 2) [] = { 0xc1800000, 0xc1700000 };
 VECT_VAR_DECL (expected, hfloat, 32, 4) [] = { 0xc1800000, 0xc1700000,
 					       0xc1600000, 0xc1500000 };
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndm.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndm.c
index eae9f61..9bfaffc 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndm.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndm.c
@@ -6,6 +6,14 @@
 #include "compute-ref-data.h"
 
 /* Expected results.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL (expected, hfloat, 16, 4) [] = { 0xcc00, 0xcb80,
+					       0xcb00, 0xca80 };
+VECT_VAR_DECL (expected, hfloat, 16, 8) [] = { 0xcc00, 0xcb80,
+					       0xcb00, 0xca80,
+					       0xca00, 0xc980,
+					       0xc900, 0xc880 };
+#endif
 VECT_VAR_DECL (expected, hfloat, 32, 2) [] = { 0xc1800000, 0xc1700000 };
 VECT_VAR_DECL (expected, hfloat, 32, 4) [] = { 0xc1800000, 0xc1700000,
 					       0xc1600000, 0xc1500000 };
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndn.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndn.c
index c6c707d..52b9942 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndn.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndn.c
@@ -6,6 +6,14 @@
 #include "compute-ref-data.h"
 
 /* Expected results.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL (expected, hfloat, 16, 4) [] = { 0xcc00, 0xcb80,
+					       0xcb00, 0xca80 };
+VECT_VAR_DECL (expected, hfloat, 16, 8) [] = { 0xcc00, 0xcb80,
+					       0xcb00, 0xca80,
+					       0xca00, 0xc980,
+					       0xc900, 0xc880 };
+#endif
 VECT_VAR_DECL (expected, hfloat, 32, 2) [] = { 0xc1800000, 0xc1700000 };
 VECT_VAR_DECL (expected, hfloat, 32, 4) [] = { 0xc1800000, 0xc1700000,
 					       0xc1600000, 0xc1500000 };
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndp.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndp.c
index e94eb6b..2e888b9 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndp.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndp.c
@@ -6,6 +6,14 @@
 #include "compute-ref-data.h"
 
 /* Expected results.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL (expected, hfloat, 16, 4) [] = { 0xcc00, 0xcb80,
+					       0xcb00, 0xca80 };
+VECT_VAR_DECL (expected, hfloat, 16, 8) [] = { 0xcc00, 0xcb80,
+					       0xcb00, 0xca80,
+					       0xca00, 0xc980,
+					       0xc900, 0xc880 };
+#endif
 VECT_VAR_DECL (expected, hfloat, 32, 2) [] = { 0xc1800000, 0xc1700000 };
 VECT_VAR_DECL (expected, hfloat, 32, 4) [] = { 0xc1800000, 0xc1700000,
 					       0xc1600000, 0xc1500000 };
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndx.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndx.c
index 0d2a63e..400ddf8 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndx.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndx.c
@@ -6,6 +6,14 @@
 #include "compute-ref-data.h"
 
 /* Expected results.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL (expected, hfloat, 16, 4) [] = { 0xcc00, 0xcb80,
+					       0xcb00, 0xca80 };
+VECT_VAR_DECL (expected, hfloat, 16, 8) [] = { 0xcc00, 0xcb80,
+					       0xcb00, 0xca80,
+					       0xca00, 0xc980,
+					       0xc900, 0xc880 };
+#endif
 VECT_VAR_DECL (expected, hfloat, 32, 2) [] = { 0xc1800000, 0xc1700000 };
 VECT_VAR_DECL (expected, hfloat, 32, 4) [] = { 0xc1800000, 0xc1700000,
 					       0xc1600000, 0xc1500000 };
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrsqrte.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrsqrte.c
index 0291ec0..77e2210 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrsqrte.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrsqrte.c
@@ -7,6 +7,11 @@
 VECT_VAR_DECL(expected,uint,32,2) [] = { 0xffffffff, 0xffffffff };
 VECT_VAR_DECL(expected,uint,32,4) [] = { 0x9c800000, 0x9c800000,
 					 0x9c800000, 0x9c800000 };
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0x324c, 0x324c, 0x324c, 0x324c };
+VECT_VAR_DECL(expected, hfloat, 16, 8) [] = { 0x3380, 0x3380, 0x3380, 0x3380,
+					      0x3380, 0x3380, 0x3380, 0x3380 };
+#endif
 VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x3e498000, 0x3e498000 };
 VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x3e700000, 0x3e700000,
 					   0x3e700000, 0x3e700000 };
@@ -22,17 +27,39 @@ VECT_VAR_DECL(expected_2,uint,32,4) [] = { 0xed000000, 0xed000000,
 					   0xed000000, 0xed000000 };
 
 /* Expected results with FP special inputs values (NaNs, ...).  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_fp1, hfloat, 16, 4) [] = { 0x7e00, 0x7e00,
+						  0x7e00, 0x7e00 };
+VECT_VAR_DECL(expected_fp1, hfloat, 16, 8) [] = { 0x7c00, 0x7c00,
+						  0x7c00, 0x7c00,
+						  0x7c00, 0x7c00,
+						  0x7c00, 0x7c00 };
+#endif
 VECT_VAR_DECL(expected_fp1,hfloat,32,2) [] = { 0x7fc00000, 0x7fc00000 };
 VECT_VAR_DECL(expected_fp1,hfloat,32,4) [] = { 0x7f800000, 0x7f800000,
 					       0x7f800000, 0x7f800000 };
 
 /* Expected results with FP special inputs values
    (negative, infinity).  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_fp2, hfloat, 16, 4) [] = { 0x7e00, 0x7e00,
+						  0x7e00, 0x7e00 };
+VECT_VAR_DECL(expected_fp2, hfloat, 16, 8) [] = { 0x0, 0x0, 0x0, 0x0, 0x0,
+						  0x0, 0x0, 0x0 };
+#endif
 VECT_VAR_DECL(expected_fp2,hfloat,32,2) [] = { 0x7fc00000, 0x7fc00000 };
 VECT_VAR_DECL(expected_fp2,hfloat,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
 
 /* Expected results with FP special inputs values
    (-0, -infinity).  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_fp3, hfloat, 16, 4) [] = { 0xfc00, 0xfc00,
+						  0xfc00, 0xfc00 };
+VECT_VAR_DECL(expected_fp3, hfloat, 16, 8) [] = { 0x7e00, 0x7e00,
+						  0x7e00, 0x7e00,
+						  0x7e00, 0x7e00,
+						  0x7e00, 0x7e00 };
+#endif
 VECT_VAR_DECL(expected_fp3,hfloat,32,2) [] = { 0xff800000, 0xff800000 };
 VECT_VAR_DECL(expected_fp3,hfloat,32,4) [] = { 0x7fc00000, 0x7fc00000,
 					       0x7fc00000, 0x7fc00000 };
@@ -50,32 +77,60 @@ void exec_vrsqrte(void)
 		    VECT_VAR(vector_res, T1, W, N))
 
   DECL_VARIABLE(vector, uint, 32, 2);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE(vector, float, 16, 4);
+#endif
   DECL_VARIABLE(vector, float, 32, 2);
   DECL_VARIABLE(vector, uint, 32, 4);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE(vector, float, 16, 8);
+#endif
   DECL_VARIABLE(vector, float, 32, 4);
 
   DECL_VARIABLE(vector_res, uint, 32, 2);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE(vector_res, float, 16, 4);
+#endif
   DECL_VARIABLE(vector_res, float, 32, 2);
   DECL_VARIABLE(vector_res, uint, 32, 4);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE(vector_res, float, 16, 8);
+#endif
   DECL_VARIABLE(vector_res, float, 32, 4);
 
   clean_results ();
 
   /* Choose init value arbitrarily.  */
   VDUP(vector, , uint, u, 32, 2, 0x12345678);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector, , float, f, 16, 4, 25.799999f);
+#endif
   VDUP(vector, , float, f, 32, 2, 25.799999f);
   VDUP(vector, q, uint, u, 32, 4, 0xABCDEF10);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector, q, float, f, 16, 8, 18.2f);
+#endif
   VDUP(vector, q, float, f, 32, 4, 18.2f);
 
   /* Apply the operator.  */
   TEST_VRSQRTE(, uint, u, 32, 2);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VRSQRTE(, float, f, 16, 4);
+#endif
   TEST_VRSQRTE(, float, f, 32, 2);
   TEST_VRSQRTE(q, uint, u, 32, 4);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VRSQRTE(q, float, f, 16, 8);
+#endif
   TEST_VRSQRTE(q, float, f, 32, 4);
 
 #define CMT ""
   CHECK(TEST_MSG, uint, 32, 2, PRIx32, expected, CMT);
   CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected, CMT);
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected, CMT);
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected, CMT);
+#endif
   CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected, CMT);
   CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected, CMT);
 
@@ -110,42 +165,78 @@ void exec_vrsqrte(void)
 
 
   /* Test FP variants with special input values (NaNs, ...).  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector, , float, f, 16, 4, NAN);
+  VDUP(vector, q, float, f, 16, 8, 0.0f);
+#endif
   VDUP(vector, , float, f, 32, 2, NAN);
   VDUP(vector, q, float, f, 32, 4, 0.0f);
 
   /* Apply the operator.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VRSQRTE(, float, f, 16, 4);
+  TEST_VRSQRTE(q, float, f, 16, 8);
+#endif
   TEST_VRSQRTE(, float, f, 32, 2);
   TEST_VRSQRTE(q, float, f, 32, 4);
 
 #undef CMT
 #define CMT " FP special (NaN, 0)"
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_fp1, CMT);
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_fp1, CMT);
+#endif
   CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_fp1, CMT);
   CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_fp1, CMT);
 
 
   /* Test FP variants with special input values (negative, infinity).  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector, , float, f, 16, 4, -1.0f);
+  VDUP(vector, q, float, f, 16, 8, HUGE_VALF);
+#endif
   VDUP(vector, , float, f, 32, 2, -1.0f);
   VDUP(vector, q, float, f, 32, 4, HUGE_VALF);
 
   /* Apply the operator.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VRSQRTE(, float, f, 16, 4);
+  TEST_VRSQRTE(q, float, f, 16, 8);
+#endif
   TEST_VRSQRTE(, float, f, 32, 2);
   TEST_VRSQRTE(q, float, f, 32, 4);
 
 #undef CMT
 #define CMT " FP special (negative, infinity)"
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_fp2, CMT);
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_fp2, CMT);
+#endif
   CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_fp2, CMT);
   CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_fp2, CMT);
 
   /* Test FP variants with special input values (-0, -infinity).  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector, , float, f, 16, 4, -0.0f);
+  VDUP(vector, q, float, f, 16, 8, -HUGE_VALF);
+#endif
   VDUP(vector, , float, f, 32, 2, -0.0f);
   VDUP(vector, q, float, f, 32, 4, -HUGE_VALF);
 
   /* Apply the operator.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VRSQRTE(, float, f, 16, 4);
+  TEST_VRSQRTE(q, float, f, 16, 8);
+#endif
   TEST_VRSQRTE(, float, f, 32, 2);
   TEST_VRSQRTE(q, float, f, 32, 4);
 
 #undef CMT
 #define CMT " FP special (-0, -infinity)"
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_fp3, CMT);
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_fp3, CMT);
+#endif
   CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_fp3, CMT);
   CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_fp3, CMT);
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrsqrts.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrsqrts.c
index 4531026..06626e4 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrsqrts.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrsqrts.c
@@ -4,22 +4,51 @@
 #include <math.h>
 
 /* Expected results.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0xd3cb, 0xd3cb, 0xd3cb, 0xd3cb };
+VECT_VAR_DECL(expected, hfloat, 16, 8) [] = { 0xc726, 0xc726, 0xc726, 0xc726,
+					      0xc726, 0xc726, 0xc726, 0xc726 };
+#endif
 VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc2796b84, 0xc2796b84 };
 VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc0e4a3d8, 0xc0e4a3d8,
 					   0xc0e4a3d8, 0xc0e4a3d8 };
 
 /* Expected results with input=NaN.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_nan, hfloat, 16, 4) [] = { 0x7e00, 0x7e00,
+						  0x7e00, 0x7e00 };
+VECT_VAR_DECL(expected_nan, hfloat, 16, 8) [] = { 0x7e00, 0x7e00,
+						  0x7e00, 0x7e00,
+						  0x7e00, 0x7e00,
+						  0x7e00, 0x7e00 };
+#endif
 VECT_VAR_DECL(expected_nan,hfloat,32,2) [] = { 0x7fc00000, 0x7fc00000 };
 VECT_VAR_DECL(expected_nan,hfloat,32,4) [] = { 0x7fc00000, 0x7fc00000,
 					       0x7fc00000, 0x7fc00000 };
 
 /* Expected results with FP special inputs values (infinity, 0).  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_fp1, hfloat, 16, 4) [] = { 0xfc00, 0xfc00,
+						  0xfc00, 0xfc00 };
+VECT_VAR_DECL(expected_fp1, hfloat, 16, 8) [] = { 0x3e00, 0x3e00,
+						  0x3e00, 0x3e00,
+						  0x3e00, 0x3e00,
+						  0x3e00, 0x3e00 };
+#endif
 VECT_VAR_DECL(expected_fp1,hfloat,32,2) [] = { 0xff800000, 0xff800000 };
 VECT_VAR_DECL(expected_fp1,hfloat,32,4) [] = { 0x3fc00000, 0x3fc00000,
 					       0x3fc00000, 0x3fc00000 };
 
 /* Expected results with only FP special inputs values (infinity,
    0).  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_fp2, hfloat, 16, 4) [] = { 0x3e00, 0x3e00,
+						  0x3e00, 0x3e00 };
+VECT_VAR_DECL(expected_fp2, hfloat, 16, 8) [] = { 0x3e00, 0x3e00,
+						  0x3e00, 0x3e00,
+						  0x3e00, 0x3e00,
+						  0x3e00, 0x3e00 };
+#endif
 VECT_VAR_DECL(expected_fp2,hfloat,32,2) [] = { 0x3fc00000, 0x3fc00000 };
 VECT_VAR_DECL(expected_fp2,hfloat,32,4) [] = { 0x3fc00000, 0x3fc00000,
 					       0x3fc00000, 0x3fc00000 };
@@ -38,75 +67,143 @@ void exec_vrsqrts(void)
 		    VECT_VAR(vector_res, T1, W, N))
 
   /* No need for integer variants.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE(vector, float, 16, 4);
+  DECL_VARIABLE(vector, float, 16, 8);
+#endif
   DECL_VARIABLE(vector, float, 32, 2);
   DECL_VARIABLE(vector, float, 32, 4);
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE(vector2, float, 16, 4);
+  DECL_VARIABLE(vector2, float, 16, 8);
+#endif
   DECL_VARIABLE(vector2, float, 32, 2);
   DECL_VARIABLE(vector2, float, 32, 4);
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE(vector_res, float, 16, 4);
+  DECL_VARIABLE(vector_res, float, 16, 8);
+#endif
   DECL_VARIABLE(vector_res, float, 32, 2);
   DECL_VARIABLE(vector_res, float, 32, 4);
 
   clean_results ();
 
   /* Choose init value arbitrarily.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector, , float, f, 16, 4, 12.9f);
+  VDUP(vector, q, float, f, 16, 8, 9.1f);
+#endif
   VDUP(vector, , float, f, 32, 2, 12.9f);
   VDUP(vector, q, float, f, 32, 4, 9.1f);
 
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector2, , float, f, 16, 4, 9.9f);
+  VDUP(vector2, q, float, f, 16, 8, 1.9f);
+#endif
   VDUP(vector2, , float, f, 32, 2, 9.9f);
   VDUP(vector2, q, float, f, 32, 4, 1.9f);
 
   /* Apply the operator.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VRSQRTS(, float, f, 16, 4);
+  TEST_VRSQRTS(q, float, f, 16, 8);
+#endif
   TEST_VRSQRTS(, float, f, 32, 2);
   TEST_VRSQRTS(q, float, f, 32, 4);
 
 #define CMT ""
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected, CMT);
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected, CMT);
+#endif
   CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected, CMT);
   CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected, CMT);
 
 
   /* Test FP variants with special input values (NaN).  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector, , float, f, 16, 4, NAN);
+  VDUP(vector2, q, float, f, 16, 8, NAN);
+#endif
   VDUP(vector, , float, f, 32, 2, NAN);
   VDUP(vector2, q, float, f, 32, 4, NAN);
 
   /* Apply the operator.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VRSQRTS(, float, f, 16, 4);
+  TEST_VRSQRTS(q, float, f, 16, 8);
+#endif
   TEST_VRSQRTS(, float, f, 32, 2);
   TEST_VRSQRTS(q, float, f, 32, 4);
 
 #undef CMT
 #define CMT " FP special (NAN) and normal values"
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_nan, CMT);
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_nan, CMT);
+#endif
   CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_nan, CMT);
   CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_nan, CMT);
 
 
   /* Test FP variants with special input values (infinity, 0).  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector, , float, f, 16, 4, HUGE_VALF);
+  VDUP(vector, q, float, f, 16, 8, 0.0f);
+  /* Restore a normal value in vector2.  */
+  VDUP(vector2, q, float, f, 16, 8, 3.2f);
+#endif
   VDUP(vector, , float, f, 32, 2, HUGE_VALF);
   VDUP(vector, q, float, f, 32, 4, 0.0f);
   /* Restore a normal value in vector2.  */
   VDUP(vector2, q, float, f, 32, 4, 3.2f);
 
   /* Apply the operator.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VRSQRTS(, float, f, 16, 4);
+  TEST_VRSQRTS(q, float, f, 16, 8);
+#endif
   TEST_VRSQRTS(, float, f, 32, 2);
   TEST_VRSQRTS(q, float, f, 32, 4);
 
 #undef CMT
 #define CMT " FP special (infinity, 0) and normal values"
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_fp1, CMT);
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_fp1, CMT);
+#endif
   CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_fp1, CMT);
   CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_fp1, CMT);
 
 
   /* Test FP variants with only special input values (infinity, 0).  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  VDUP(vector, , float, f, 16, 4, HUGE_VALF);
+  VDUP(vector, q, float, f, 16, 8, 0.0f);
+  VDUP(vector2, , float, f, 16, 4, -0.0f);
+  VDUP(vector2, q, float, f, 16, 8, HUGE_VALF);
+#endif
   VDUP(vector, , float, f, 32, 2, HUGE_VALF);
   VDUP(vector, q, float, f, 32, 4, 0.0f);
   VDUP(vector2, , float, f, 32, 2, -0.0f);
   VDUP(vector2, q, float, f, 32, 4, HUGE_VALF);
 
   /* Apply the operator.  */
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  TEST_VRSQRTS(, float, f, 16, 4);
+  TEST_VRSQRTS(q, float, f, 16, 8);
+#endif
   TEST_VRSQRTS(, float, f, 32, 2);
   TEST_VRSQRTS(q, float, f, 32, 4);
 
 #undef CMT
 #define CMT " only FP special (infinity, 0)"
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_fp2, CMT);
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_fp2, CMT);
+#endif
   CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_fp2, CMT);
   CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_fp2, CMT);
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsub.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsub.c
index 1a108d5..19d1fd2 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsub.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsub.c
@@ -44,6 +44,14 @@ VECT_VAR_DECL(expected,uint,64,2) [] = { 0xffffffffffffffed,
 VECT_VAR_DECL(expected_float32,hfloat,32,2) [] = { 0xc00ccccd, 0xc00ccccd };
 VECT_VAR_DECL(expected_float32,hfloat,32,4) [] = { 0xc00ccccc, 0xc00ccccc,
 						   0xc00ccccc, 0xc00ccccc };
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+VECT_VAR_DECL(expected_float16, hfloat, 16, 4) [] = { 0xc066, 0xc066,
+						      0xc066, 0xc066 };
+VECT_VAR_DECL(expected_float16, hfloat, 16, 8) [] = { 0xc067, 0xc067,
+						      0xc067, 0xc067,
+						      0xc067, 0xc067,
+						      0xc067, 0xc067 };
+#endif
 
 void exec_vsub_f32(void)
 {
@@ -67,4 +75,27 @@ void exec_vsub_f32(void)
 
   CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_float32, "");
   CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_float32, "");
+
+#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
+  DECL_VARIABLE(vector, float, 16, 4);
+  DECL_VARIABLE(vector, float, 16, 8);
+
+  DECL_VARIABLE(vector2, float, 16, 4);
+  DECL_VARIABLE(vector2, float, 16, 8);
+
+  DECL_VARIABLE(vector_res, float, 16, 4);
+  DECL_VARIABLE(vector_res, float, 16, 8);
+
+  VDUP(vector, , float, f, 16, 4, 2.3f);
+  VDUP(vector, q, float, f, 16, 8, 3.4f);
+
+  VDUP(vector2, , float, f, 16, 4, 4.5f);
+  VDUP(vector2, q, float, f, 16, 8, 5.6f);
+
+  TEST_BINARY_OP(INSN_NAME, , float, f, 16, 4);
+  TEST_BINARY_OP(INSN_NAME, q, float, f, 16, 8);
+
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_float16, "");
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_float16, "");
+#endif
 }
-- 
2.1.4


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 2/17][Testsuite] Add a selector for ARM FP16 alternative format support.
  2016-05-17 14:25 ` [PATCH 2/17][Testsuite] Add a selector for ARM FP16 alternative format support Matthew Wahab
@ 2016-07-27 13:34   ` Ramana Radhakrishnan
  0 siblings, 0 replies; 73+ messages in thread
From: Ramana Radhakrishnan @ 2016-07-27 13:34 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Tue, May 17, 2016 at 3:24 PM, Matthew Wahab
<matthew.wahab@foss.arm.com> wrote:
> The ARMv8.2-A FP16 extension only supports the IEEE format for FP16
> data. It is not compatible with the option -mfp16-format=none nor with
> the option -mfp16-format=alternative (selecting the ARM alternative FP16
> format). Using either with the FP16 extension will trigger a compiler
> error.
>
> This patch adds the selector arm_fp16_alternative_ok to the testsuite's
> target-support code to allow tests to require support for the
> alternative format. It also adds selector arm_fp16_none_ok to check
> whether -mfp16-format=none is a valid option for the target.  The patch
> also updates existing tests to make use of the new selectors.
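
For illustration, a test that requires the alternative format could
start like this (a hypothetical sketch; only the selector and option
names come from the patch):

  /* { dg-do compile } */
  /* { dg-require-effective-target arm_fp16_alternative_ok } */
  /* { dg-options "-mfp16-format=alternative" } */
  __fp16 x = 42.0;  /* Stored in the ARM alternative FP16 format.  */

A test that needs -mfp16-format=none can guard itself with
arm_fp16_none_ok in the same way.
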
>
> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
> make check and for arm-none-eabi and armeb-none-eabi with make check on an
> ARMv8.2-A emulator.
>
> Ok for trunk?
> Matthew
>
> 2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>
>
>         * doc/sourcebuild.texi (ARM-specific attributes): Add entries for
>         arm_fp16_alternative_ok and arm_fp16_none_ok.
>
> testsuite/
> 2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>
>
>         * g++.dg/ext/arm-fp16/arm-fp16-ops-3.C: Use
>         arm_fp16_alternative_ok.
>         * g++.dg/ext/arm-fp16/arm-fp16-ops-4.C: Likewise.
>         * gcc.dg/torture/arm-fp16-int-convert-alt.c: Likewise.
>         * gcc.dg/torture/arm-fp16-ops-3.c: Likewise.
>         * gcc.dg/torture/arm-fp16-ops-4.c: Likewise.
>         * gcc.target/arm/fp16-compile-alt-1.c: Likewise.
>         * gcc.target/arm/fp16-compile-alt-10.c: Likewise.
>         * gcc.target/arm/fp16-compile-alt-11.c: Likewise.
>         * gcc.target/arm/fp16-compile-alt-12.c: Likewise.
>         * gcc.target/arm/fp16-compile-alt-2.c: Likewise.
>         * gcc.target/arm/fp16-compile-alt-3.c: Likewise.
>         * gcc.target/arm/fp16-compile-alt-4.c: Likewise.
>         * gcc.target/arm/fp16-compile-alt-5.c: Likewise.
>         * gcc.target/arm/fp16-compile-alt-6.c: Likewise.
>         * gcc.target/arm/fp16-compile-alt-7.c: Likewise.
>         * gcc.target/arm/fp16-compile-alt-8.c: Likewise.
>         * gcc.target/arm/fp16-compile-alt-9.c: Likewise.
>         * gcc.target/arm/fp16-compile-none-1.c: Use arm_fp16_none_ok.
>         * gcc.target/arm/fp16-compile-none-2.c: Likewise.
>         * gcc.target/arm/fp16-rounding-alt-1.c: Use
>         arm_fp16_alternative_ok.
>         * lib/target-supports.exp
>         (check_effective_target_arm_fp16_alternative_ok_nocache): New.
>         (check_effective_target_arm_fp16_alternative_ok): New.
>         (check_effective_target_arm_fp16_none_ok_nocache): New.
>         (check_effective_target_arm_fp16_none_ok): New.
>


OK.

Thanks,
ramana

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 3/17][Testsuite] Add ARM support for ARMv8.2-A with FP16 arithmetic instructions.
  2016-07-04 13:49   ` Matthew Wahab
@ 2016-07-27 13:34     ` Ramana Radhakrishnan
  0 siblings, 0 replies; 73+ messages in thread
From: Ramana Radhakrishnan @ 2016-07-27 13:34 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Mon, Jul 4, 2016 at 2:49 PM, Matthew Wahab
<matthew.wahab@foss.arm.com> wrote:
> On 17/05/16 15:26, Matthew Wahab wrote:
>> The ARMv8.2-A FP16 extension adds to both the VFP and the NEON
>> instruction sets. This patch adds support to the testsuite to select
>> targets and set options for tests that make use of these
>> instructions. It also adds documentation for ARMv8.1-A selectors.
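
As an illustration, a test wanting the new NEON FP16 instructions would
start with directives like these (a sketch; the selector and option
names are taken from the ChangeLog below):

  /* { dg-require-effective-target arm_v8_2a_fp16_neon_ok } */
  /* { dg-add-options arm_v8_2a_fp16_neon } */
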
>
> This is a rebase of the patch to take account of changes in
> sourcebuild.texi.
>
> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
> make check and for arm-none-eabi and armeb-none-eabi with make check on
> an ARMv8.2-A emulator.
>
> 2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>
>
>         * doc/sourcebuild.texi (ARM-specific attributes): Add anchor for
>         arm_v8_1a_neon_ok.  Add entries for arm_v8_2a_fp16_scalar_ok,
>         arm_v8_2a_fp16_scalar_hw, arm_v8_2a_fp16_neon_ok and
>         arm_v8_2a_fp16_neon_hw.
>         (Add options): Add entries for arm_v8_1a_neon, arm_v8_2a_fp16_scalar,
>         arm_v8_2a_fp16_neon.
>         * lib/target-supports.exp
>         (add_options_for_arm_v8_2a_fp16_scalar): New.
>         (add_options_for_arm_v8_2a_fp16_neon): New.
>         (check_effective_target_arm_arch_v8_2a_ok): Auto-generate.
>         (add_options_for_arm_arch_v8_2a): Auto-generate.
>         (check_effective_target_arm_arch_v8_2a_multilib): Auto-generate.
>         (check_effective_target_arm_v8_2a_fp16_scalar_ok_nocache): New.
>         (check_effective_target_arm_v8_2a_fp16_scalar_ok): New.
>         (check_effective_target_arm_v8_2a_fp16_neon_ok_nocache): New.
>         (check_effective_target_arm_v8_2a_fp16_neon_ok): New.
>         (check_effective_target_arm_v8_2a_fp16_scalar_hw): New.
>         (check_effective_target_arm_v8_2a_fp16_neon_hw): New.
>


OK.

Ramana

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 4/17][ARM] Define feature macros for FP16.
  2016-05-17 14:28 ` [PATCH 4/17][ARM] Define feature macros for FP16 Matthew Wahab
@ 2016-07-27 13:35   ` Ramana Radhakrishnan
  0 siblings, 0 replies; 73+ messages in thread
From: Ramana Radhakrishnan @ 2016-07-27 13:35 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Tue, May 17, 2016 at 3:28 PM, Matthew Wahab
<matthew.wahab@foss.arm.com> wrote:
> The FP16 extension introduced with the ARMv8.2-A architecture adds
> instructions operating on FP16 values to the VFP and NEON instruction
> sets.
>
> The patch adds the feature macro __ARM_FEATURE_FP16_SCALAR_ARITHMETIC
> which is defined to be 1 if the VFP FP16 instructions are available; it
> is otherwise undefined.
>
> The patch also adds the feature macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
> which is defined to be 1 if the NEON FP16 instructions are available; it
> is otherwise undefined.
>
> These two macros will appear in a future version of the ACLE.
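
For example, user code can test the macros in the usual way (a minimal
sketch; the macro names are as defined by this patch):

  #if defined (__ARM_FEATURE_FP16_SCALAR_ARITHMETIC)
  /* VFP FP16 instructions are available: the scalar __fp16 intrinsics
     can be used.  */
  #endif

  #if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
  /* NEON FP16 instructions are available: the float16x4_t/float16x8_t
     vector intrinsics can be used.  */
  #endif
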
>
> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
> make check and for arm-none-eabi and armeb-none-eabi with make check on
> an ARMv8.2-A emulator.
>
> Ok for trunk?
> Matthew
>
> 2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>
>
>         * config/arm/arm-c.c (arm_cpu_builtins): Define
>         "__ARM_FEATURE_FP16_SCALAR_ARITHMETIC" and
>         "__ARM_FEATURE_FP16_VECTOR_ARITHMETIC".
>
> testsuite/
> 2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>
>
>         * gcc.target/arm/attr-fp16-arith-1.c: New.
>

OK.

Ramana

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 5/17][ARM] Enable HI mode moves for floating point values.
  2016-05-17 14:29 ` [PATCH 5/17][ARM] Enable HI mode moves for floating point values Matthew Wahab
@ 2016-07-27 13:57   ` Ramana Radhakrishnan
  2016-09-26 13:20     ` Christophe Lyon
  0 siblings, 1 reply; 73+ messages in thread
From: Ramana Radhakrishnan @ 2016-07-27 13:57 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Tue, May 17, 2016 at 3:29 PM, Matthew Wahab
<matthew.wahab@foss.arm.com> wrote:
> The handling of 16-bit integer data-movement in the ARM backend doesn't
> make full use of the VFP instructions when they are available, even when
> the values are for use in VFP operations.
>
> This patch adds support for using the VFP instructions and registers
> when moving 16-bit integer and floating point data between registers and
> between registers and memory.
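
For concreteness, the kind of code affected is a 16-bit value feeding a
VFP operation, along these lines (a hypothetical sketch, not taken from
the patch):

  /* With this change, the short can be loaded and converted within the
     VFP register file instead of bouncing through a core register.  */
  float
  f (short *p)
  {
    return (float) *p;
  }
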
>
> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
> make check and for arm-none-eabi and armeb-none-eabi with make check on
> an ARMv8.2-A emulator. Tested this patch for arm-none-linux-gnueabihf
> with native bootstrap and make check and for arm-none-eabi with
> check-gcc on an ARMv8.2-A emulator.
>
> Ok for trunk?


OK.  (For the record, the test function where this will make a
difference is testhisf.)

Ramana
> Matthew
>
> 2016-05-17  Jiong Wang  <jiong.wang@arm.com>
>             Matthew Wahab  <matthew.wahab@arm.com>
>
>         * config/arm/arm.c (output_move_vfp): Weaken assert to allow
>         HImode.
>         (arm_hard_regno_mode_ok): Allow HImode values in VFP registers.
>         * config/arm/arm.md (*movhi_insn_arch4) Disable when VFP registers
> are
>         available.
>         (*movhi_bytes): Likewise.
>         * config/arm/vfp.md (*arm_movhi_vfp): New.
>         (*thumb2_movhi_vfp): New.
>
> testsuite/
> 2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>
>
>         * gcc.target/arm/short-vfp-1.c: New.
>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 6/17][ARM] Add data processing intrinsics for float16_t.
  2016-05-17 14:32 ` [PATCH 6/17][ARM] Add data processing intrinsics for float16_t Matthew Wahab
@ 2016-07-27 13:59   ` Ramana Radhakrishnan
  2016-09-25 14:44     ` Christophe Lyon
  0 siblings, 1 reply; 73+ messages in thread
From: Ramana Radhakrishnan @ 2016-07-27 13:59 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Tue, May 17, 2016 at 3:31 PM, Matthew Wahab
<matthew.wahab@foss.arm.com> wrote:
> The ACLE specifies a number of intrinsics for manipulating vectors
> holding values in most of the integer and floating point types. These
> include 16-bit integer types but not 16-bit floating point even though
> the same instruction is used for both.
>
> A future version of the ACLE extends the data processing intrinsics to
> the 16-bit floating point types, making the intrinsics available
> under the same conditions as the ARM __fp16 type.
>
> This patch adds the new intrinsics:
>  vbsl_f16, vbslq_f16, vdup_n_f16, vdupq_n_f16, vdup_lane_f16,
>  vdupq_lane_f16, vext_f16, vextq_f16, vmov_n_f16, vmovq_n_f16,
>  vrev64_f16, vrev64q_f16, vtrn_f16, vtrnq_f16, vuzp_f16, vuzpq_f16,
>  vzip_f16, vzipq_f16.
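
For example, two of the new intrinsics in use (a sketch; as described
above, availability follows the ARM __fp16 type):

  #include <arm_neon.h>

  float16x4_t
  g (float16x4_t a, float16x4_t b)
  {
    float16x4_t r = vrev64_f16 (a);  /* Reverse the four elements.  */
    return vext_f16 (r, b, 2);       /* r[2], r[3], b[0], b[1].  */
  }
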
>
> This patch also updates the advsimd-intrinsics testsuite to test the f16
> variants for ARM targets. These intrinsics are only implemented in the
> ARM target so the tests are disabled for AArch64 using an extra
> condition on a new convenience macro FP16_SUPPORTED. This patch also
> disables, for the ARM target, the testsuite-defined macro vdup_n_f16 as
> it is no longer needed.
>
> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
> make check and for arm-none-eabi and armeb-none-eabi with make check on
> an ARMv8.2-A emulator. Also tested for aarch64-none-elf with the
> advsimd-intrinsics testsuite using an ARMv8.2-A emulator.
>
> Ok for trunk?
> Matthew
>
> 2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>
>
>         * config/arm/arm.c (arm_evpc_neon_vuzp): Add support for V8HF and
>         V4HF modes.
>         (arm_evpc_neon_vzip): Likewise.
>         (arm_evpc_neon_vrev): Likewise.
>         (arm_evpc_neon_vtrn): Likewise.
>         (arm_evpc_neon_vext): Likewise.
>         * config/arm/arm_neon.h (vbsl_f16): New.
>         (vbslq_f16): New.
>         (vdup_n_f16): New.
>         (vdupq_n_f16): New.
>         (vdup_lane_f16): New.
>         (vdupq_lane_f16): New.
>         (vext_f16): New.
>         (vextq_f16): New.
>         (vmov_n_f16): New.
>         (vmovq_n_f16): New.
>         (vrev64_f16): New.
>         (vrev64q_f16): New.
>         (vtrn_f16): New.
>         (vtrnq_f16): New.
>         (vuzp_f16): New.
>         (vuzpq_f16): New.
>         (vzip_f16): New.
>         (vzipq_f16): New.
>         * config/arm/arm_neon_builtins.def (vdup_n): New (v8hf, v4hf
> variants).
>         (vdup_lane): New (v8hf, v4hf variants).
>         (vext): New (v8hf, v4hf variants).
>         (vbsl): New (v8hf, v4hf variants).
>         * config/arm/iterators.md (VDQWH): New.
>         (VH): New.
>         (V_double_vector_mode): Add V8HF and V4HF.  Fix white-space.
>         (Scalar_mul_8_16): Fix white-space.
>         (Is_d_reg): Add V4HF and V8HF.
>         * config/arm/neon.md (neon_vdup_lane<mode>_internal): New.
>         (neon_vdup_lane<mode>): New.
>         (neon_vtrn<mode>_internal): Replace VDQW with VDQWH.
>         (*neon_vtrn<mode>_insn): Likewise.
>         (neon_vzip<mode>_internal): Likewise. Also fix white-space.
>         (*neon_vzip<mode>_insn): Likewise.
>         (neon_vuzp<mode>_internal): Likewise.
>         (*neon_vuzp<mode>_insn): Likewise.
>         * config/arm/vec-common.md (vec_perm_const<mode>): New.
>
> testsuite/
> 2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>
>
>         * gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
>         (FP16_SUPPORTED): New.
>         (vdup_n_f16): Disable for non-AArch64 targets.
>         * gcc.target/aarch64/advsimd-intrinsics/vbsl.c: Add __fp16 tests,
>         conditional on FP16_SUPPORTED.
>         * gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c: Likewise.
>         * gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c: Likewise.
>         * gcc.target/aarch64/advsimd-intrinsics/vext.c: Likewise.
>         * gcc.target/aarch64/advsimd-intrinsics/vrev.c: Likewise.
>         * gcc.target/aarch64/advsimd-intrinsics/vshuffle.inc: Add support
>         for testing __fp16.
>         * gcc.target/aarch64/advsimd-intrinsics/vtrn.c: Add __fp16 tests,
>         conditional on FP16_SUPPORTED.
>         * gcc.target/aarch64/advsimd-intrinsics/vuzp.c: Likewise.
>         * gcc.target/aarch64/advsimd-intrinsics/vzip.c: Likewise.
>

OK.


Ramana

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 7/17][ARM] Add FP16 data movement instructions.
  2016-07-04 13:57   ` Matthew Wahab
@ 2016-07-27 14:01     ` Ramana Radhakrishnan
  0 siblings, 0 replies; 73+ messages in thread
From: Ramana Radhakrishnan @ 2016-07-27 14:01 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Mon, Jul 4, 2016 at 2:57 PM, Matthew Wahab
<matthew.wahab@foss.arm.com> wrote:
> On 17/05/16 15:34, Matthew Wahab wrote:
>> The ARMv8.2-A FP16 extension adds a number of instructions to support
>> data movement for FP16 values. This patch adds these instructions to the
>> backend, making them available to the compiler code generator.
>
> This updates the expected output for the test added by the patch since
> gcc now generates ldrh/strh for some indexed loads/stores which were
> previously done with vld1/vst1.
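
As an illustration of the sort of access affected (hypothetical code,
not taken from the new test):

  /* An indexed 16-bit load; with this patch GCC may use a core-register
     ldrh here where it previously used a NEON/VFP load sequence.  */
  __fp16
  get (__fp16 *p, int i)
  {
    return p[i];
  }
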
>
> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
> make check and for arm-none-eabi and armeb-none-eabi with make check on
> an ARMv8.2-A emulator.
>
> 2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>
>             Jiong Wang <jiong.wang@arm.com>
>
>         * config/arm/arm.c (coproc_secondary_reload_class): Make HFmode
>         available when FP16 instructions are available.
>         (output_move_vfp): Add support for 16-bit data moves.
>         (arm_validize_comparison): Fix some white-space.  Support HFmode
>         by conversion to SFmode.
>         * config/arm/arm.md (truncdfhf2): Fix a comment.
>         (extendhfdf2): Likewise.
>         (cstorehf4): New.
>         (movsicc): Fix some white-space.
>         (movhfcc): New.
>         (movsfcc): Fix some white-space.
>         (*cmovhf): New.
>         * config/arm/vfp.md (*arm_movhi_vfp): Disable when VFP FP16
>         instructions are available.
>         (*thumb2_movhi_vfp): Likewise.
>         (*arm_movhi_fp16): New.
>         (*thumb2_movhi_fp16): New.
>         (*movhf_vfp_fp16): New.
>         (*movhf_vfp_neon): Disable when VFP FP16 instructions are
>         available.
>         (*movhf_vfp): Likewise.
>         (extendhfsf2): Enable when VFP FP16 instructions are available.
>         (truncsfhf2):  Enable when VFP FP16 instructions are available.
>
> testsuite/
> 2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>
>
>         * gcc.target/arm/armv8_2_fp16-move-1.c: New.
>

OK.

Thanks,
Ramana

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.
  2016-07-04 14:02   ` Matthew Wahab
@ 2016-07-28 11:37     ` Ramana Radhakrishnan
  2016-08-03 11:52       ` Ramana Radhakrishnan
  0 siblings, 1 reply; 73+ messages in thread
From: Ramana Radhakrishnan @ 2016-07-28 11:37 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Mon, Jul 4, 2016 at 3:02 PM, Matthew Wahab
<matthew.wahab@foss.arm.com> wrote:
> On 19/05/16 15:54, Matthew Wahab wrote:
>> On 18/05/16 16:20, Joseph Myers wrote:
>>> On Wed, 18 May 2016, Matthew Wahab wrote:
>>>
>>> In short: instructions for direct HFmode arithmetic should be described
>>> with patterns with the standard names.  It's the job of the
>>> architecture-independent compiler to ensure that fp16 arithmetic in the
>>> user's source code only generates direct fp16 arithmetic in GIMPLE (and
>>> thus ends up using those patterns) if that is a correct representation of
>>> the source code's semantics according to ACLE.
>>>
>>> The intrinsics you provide can then be written to use direct arithmetic,
>>> and rely on convert_to_real_1 eliminating the promotions, rather than
>>> needing built-in functions at all, just like many arm_neon.h intrinsics
>>> make direct use of GNU C vector arithmetic.
>>
>> I think it's clear that this has exhausted my knowledge of FP semantics.
>>
>> Forcing promotion to single-precision was to settle concerns brought up in
>> internal discussions about __fp16 semantics. I'll see if anybody has any
>> problem with the changes you suggest.
>
> This patch changes the implementation to use the standard names for the
> HFmode arithmetic. Later patches will also be updated to use the
> arithmetic operators where appropriate.
>
> Changes since the last version of this patch:
> - The standard names for plus, minus, mult, div and fma are defined for
>   HF mode.
> - The patterns supporting the new ACLE intrinsics vnegh_f16, vaddh_f16,
>   vsubh_f16, vmulh_f16 and vdivh_f16 are removed; the arithmetic
>   operators will be used instead.
> - The tests are updated to expect f16 instructions rather than the f32
>   instructions that were previously emitted.
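
Concretely, once the standard names exist a scalar intrinsic can be
written as plain C arithmetic (a sketch of the idea; the final header
may differ in detail):

  /* vaddh_f16 as direct __fp16 arithmetic; the new addhf3 pattern then
     matches the resulting HFmode plus.  float16_t is the ACLE scalar
     type from the new arm_fp16.h.  */
  __extension__ static __inline float16_t
  vaddh_f16 (float16_t __a, float16_t __b)
  {
    return __a + __b;
  }
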
>
> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
> make check and for arm-none-eabi and armeb-none-eabi with make check on
> an ARMv8.2-A emulator.


All fine except -

Why can we not extend the <vrint_pattern> and the l<vrint_pattern> in
vfp.md for fp16 and avoid all the unspecs for vcvta and vrnd*
instructions ?

Ramana




>
> Ok for trunk?
> Matthew
>
> 2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>
>
>         * config/arm/iterators.md (Code iterators): Fix some white-space
>         in the comments.
>         (GLTE): New.
>         (ABSNEG): New
>         (FCVT): Moved from vfp.md.
>         (VCVT_HF_US_N): New.
>         (VCVT_SI_US_N): New.
>         (VCVT_HF_US): New.
>         (VCVTH_US): New.
>         (FP16_RND): New.
>         (absneg_str): New.
>         (FCVTI32typename): Moved from vfp.md.
>         (sup): Add UNSPEC_VCVTA_S, UNSPEC_VCVTA_U, UNSPEC_VCVTM_S,
>         UNSPEC_VCVTM_U, UNSPEC_VCVTN_S, UNSPEC_VCVTN_U, UNSPEC_VCVTP_S,
>         UNSPEC_VCVTP_U, UNSPEC_VCVT_HF_S_N, UNSPEC_VCVT_HF_U_N,
>         UNSPEC_VCVT_SI_S_N, UNSPEC_VCVT_SI_U_N,  UNSPEC_VCVTH_S_N,
>         UNSPEC_VCVTH_U_N, UNSPEC_VCVTH_S and UNSPEC_VCVTH_U.
>         (vcvth_op): New.
>         (fp16_rnd_str): New.
>         (fp16_rnd_insn): New.
>
>         * config/arm/unspecs.md (UNSPEC_VCVT_HF_S_N): New.
>         (UNSPEC_VCVT_HF_U_N): New.
>         (UNSPEC_VCVT_SI_S_N): New.
>         (UNSPEC_VCVT_SI_U_N): New.
>         (UNSPEC_VCVTH_S): New.
>         (UNSPEC_VCVTH_U): New.
>         (UNSPEC_VCVTA_S): New.
>         (UNSPEC_VCVTA_U): New.
>         (UNSPEC_VCVTM_S): New.
>         (UNSPEC_VCVTM_U): New.
>         (UNSPEC_VCVTN_S): New.
>         (UNSPEC_VCVTN_U): New.
>         (UNSPEC_VCVTP_S): New.
>         (UNSPEC_VCVTP_U): New.
>         (UNSPEC_VRND): New.
>         (UNSPEC_VRNDA): New.
>         (UNSPEC_VRNDI): New.
>         (UNSPEC_VRNDM): New.
>         (UNSPEC_VRNDN): New.
>         (UNSPEC_VRNDP): New.
>         (UNSPEC_VRNDX): New.
>         * config/arm/vfp.md (<absneg_str>hf2): New.
>         (neon_vabshf): New.
>         (neon_v<fp16_rnd_str>hf): New.
>         (neon_vrndihf): New.
>         (addhf3): New.
>         (subhf3): New.
>         (divhf3): New.
>         (mulhf3): New.
>         (*mulsf3neghf_vfp): New.
>         (*negmulhf3_vfp): New.
>         (*mulsf3addhf_vfp): New.
>         (*mulhf3subhf_vfp): New.
>         (*mulhf3neghfaddhf_vfp): New.
>         (*mulhf3neghfsubhf_vfp): New.
>         (fmahf4): New.
>         (neon_vfmahf): New.
>         (fmsubhf4_fp16): New.
>         (neon_vfmshf): New.
>         (*fnmsubhf4): New.
>         (*fnmaddhf4): New.
>         (neon_vsqrthf): New.
>         (neon_vrsqrtshf): New.
>         (FCVT): Move to iterators.md.
>         (FCVTI32typename): Likewise.
>         (neon_vcvth<sup>hf): New.
>         (neon_vcvth<sup>si): New.
>         (neon_vcvth<sup>_nhf_unspec): New.
>         (neon_vcvth<sup>_nhf): New.
>         (neon_vcvth<sup>_nsi_unspec): New.
>         (neon_vcvth<sup>_nsi): New.
>         (neon_vcvt<vcvth_op>h<sup>si): New.
>         (neon_<fmaxmin_op>hf): New.
>
> testsuite/
> 2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>
>
>         * gcc.target/arm/armv8_2-fp16-arith-1.c: New.
>         * gcc.target/arm/armv8_2-fp16-conv-1.c: New.
>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 9/17][ARM] Add NEON FP16 arithmetic instructions.
  2016-07-04 14:09     ` Matthew Wahab
@ 2016-07-28 11:53       ` Ramana Radhakrishnan
  0 siblings, 0 replies; 73+ messages in thread
From: Ramana Radhakrishnan @ 2016-07-28 11:53 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: Joseph Myers, gcc-patches

On Mon, Jul 4, 2016 at 3:09 PM, Matthew Wahab
<matthew.wahab@foss.arm.com> wrote:
> On 18/05/16 01:58, Joseph Myers wrote:
>> On Tue, 17 May 2016, Matthew Wahab wrote:
>>
>>> As with the VFP FP16 arithmetic instructions, operations on __fp16
>>> values are done by conversion to single-precision. Any new optimization
>>> supported by the instruction descriptions can only apply to code
>>> generated using intrinsics added in this patch series.
>>
>> As with the scalar instructions, I think it is legitimate in most cases to
>> optimize arithmetic via single precision to work directly on __fp16 values
>> (and this would be natural for vectorization of __fp16 arithmetic).
>>
>>> A number of the instructions are modelled as two variants, one using
>>> UNSPEC and the other using RTL operations, with the model used decided
>>> by the -funsafe-math-optimizations flag. This follows the
>>> single-precision instructions and is due to the half-precision
>>> operations having the same conditions and restrictions on their use in
>>> optimizations (when they are enabled).
>>
>> (Of course, these restrictions still apply.)
>
> The F16 support generally follows the F32 implementation and, for F32,
> direct arithmetic vector operations are only available when
> unsafe-math-optimizations is enabled. I want to check the behaviour of
> the F16 operations when unsafe-math is enabled, so I'll defer the change
> to use standard names for the vector operations to a follow-up patch.
>
> There are still some changes from the previous patch:
>
> - Two fma/fmsub patterns *fma<VH:mode>4 and *fmsub<VH:mode>4 are
>   dropped since they just duplicated *fma<VH:mode>4_intrinsic and
>   *fmsub<VH:mode>4_intrinsic.
>
> - Patterns neon_vadd<mode>_unspec and neon_vsub<mode>_unspec are
>   dropped, they were redundant.
>
> - <absneg_str><mode>2_fp16 is renamed to <absneg_str><mode>2. This
>   implements the abs and neg operations which are always safe to use.
>
> - neon_vsqrte<mode> is renamed to neon_vrsqrte<mode>. This is a
>   misspelled intrinsic that wasn't caught in testing because the
>   relevant test case is missing. The intrinsic is fixed here and in
>   other patches and an advsimd-intrinsics test added later in the
>   (updated) series.
>
> - neon_vcvt<sup>_n<mode>: The bounds on the scalar were wrong; the
>   correct range for f16 is 0-17.
>
> - Test armv8_2-fp16-arith-1.c is updated to expect f16 arithmetic
>   instructions rather than f32 and to use the neon command line options.
>
> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
> make check and for arm-none-eabi and armeb-none-eabi with make check on
> an ARMv8.2-A emulator.
>
> Ok for trunk?

OK.

Ramana
> Matthew
>
> 2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>
>
>         * config/arm/iterators.md (VCVTHI): New.
>         (NEON_VCMP): Add UNSPEC_VCLT and UNSPEC_VCLE.  Fix a long line.
>         (NEON_VAGLTE): New.
>         (VFM_LANE_AS): New.
>         (VH_CVTTO): New.
>         (V_reg): Add HF, V4HF and V8HF.  Fix white-space.
>         (V_HALF): Add V4HF.  Fix white-space.
>         (V_if_elem): Add HF, V4HF and V8HF.  Fix white-space.
>         (V_s_elem): Likewise.
>         (V_sz_elem): Fix white-space.
>         (V_elem_ch): Likewise.
>         (VH_elem_ch): New.
>         (scalar_mul_constraint): Add V8HF and V4HF.
>         (Is_float_mode): Fix white-space.
>         (Is_d_reg): Fix white-space.
>         (q): Add HF.  Fix white-space.
>         (float_sup): New.
>         (float_SUP): New.
>         (cmp_op_unsp): Add UNSPEC_VCALE and UNSPEC_VCALT.
>         (neon_vfm_lane_as): New.
>         * config/arm/neon.md (add<mode>3_fp16): New.
>         (sub<mode>3_fp16): New.
>         (mul<mode>3add<mode>_neon): New.
>         (fma<VH:mode>4_intrinsic): New.
>         (fmsub<VCVTF:mode>4_intrinsic): Fix white-space.
>         (fmsub<VH:mode>4_intrinsic): New.
>         (<absneg_str><mode>2): New.
>         (neon_v<absneg_str><mode>): New.
>         (neon_v<fp16_rnd_str><mode>): New.
>         (neon_vrsqrte<mode>): New.
>         (neon_vpaddv4hf): New.
>         (neon_vadd<mode>): New.
>         (neon_vsub<mode>): New.
>         (neon_vmulf<mode>): New.
>         (neon_vfma<VH:mode>): New.
>         (neon_vfms<VH:mode>): New.
>         (neon_vc<cmp_op><mode>): New.
>         (neon_vc<cmp_op><mode>_fp16insn): New.
>         (neon_vc<cmp_op_unsp><mode>_fp16insn_unspec): New.
>         (neon_vca<cmp_op><mode>): New.
>         (neon_vca<cmp_op><mode>_fp16insn): New.
>         (neon_vca<cmp_op_unsp><mode>_fp16insn_unspec): New.
>         (neon_vc<cmp_op>z<mode>): New.
>         (neon_vabd<mode>): New.
>         (neon_v<maxmin>f<mode>): New.
>         (neon_vp<maxmin>fv4hf): New.
>         (neon_<fmaxmin_op><mode>): New.
>         (neon_vrecps<mode>): New.
>         (neon_vrsqrts<mode>): New.
>         (neon_vrecpe<mode>): New (VH variant).
>         (neon_vdup_lane<mode>_internal): New.
>         (neon_vdup_lane<mode>): New.
>         (neon_vcvt<sup><mode>): New (VCVTHI variant).
>         (neon_vcvt<sup><mode>): New (VH variant).
>         (neon_vcvt<sup>_n<mode>): New (VH variant).
>         (neon_vcvt<sup>_n<mode>): New (VCVTHI variant).
>         (neon_vcvt<vcvth_op><sup><mode>): New.
>         (neon_vmul_lane<mode>): New.
>         (neon_vmul_n<mode>): New.
>         * config/arm/unspecs.md (UNSPEC_VCALE): New
>         (UNSPEC_VCALT): New.
>         (UNSPEC_VFMA_LANE): New.
>         (UNSPECS_VFMS_LANE): New.
>
> testsuite/
> 2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>
>
>         * gcc.target/arm/armv8_2-fp16-arith-1.c: Use arm_v8_2a_fp16_neon
>         options.  Add tests for float16x4_t and float16x8_t.
>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 10/17][ARM] Refactor support code for NEON builtins.
  2016-05-17 14:39 ` [PATCH 10/17][ARM] Refactor support code for NEON builtins Matthew Wahab
@ 2016-07-28 11:54   ` Ramana Radhakrishnan
  2016-12-05 16:47     ` [arm-embedded][committed][PATCH 10/17] " Andre Vieira (lists)
  0 siblings, 1 reply; 73+ messages in thread
From: Ramana Radhakrishnan @ 2016-07-28 11:54 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Tue, May 17, 2016 at 3:39 PM, Matthew Wahab
<matthew.wahab@foss.arm.com> wrote:
> The ACLE intrinsics introduced to support the ARMv8.2 FP16 extensions
> require that intrinsics for scalar (VFP) instructions are available
> under different conditions from those for the NEON intrinsics. To
> support this, changes to the builtins support code are needed to enable
> the scalar intrinsics to be initialized and expanded independently of
> the NEON intrinsics.
>
> This patch prepares for this by refactoring some of the builtin support
> code so that it can be used for both the scalar and the NEON intrinsics.
>
> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
> make check and for arm-none-eabi and armeb-none-eabi with make check on
> an ARMv8.2-A emulator.


OK.

Ramana
>
> Ok for trunk?
> Matthew
>
> 2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>
>
>         * config/arm/arm-builtins.c (ARM_BUILTIN_NEON_PATTERN_START):
>         Change offset calculation.
>         (arm_init_neon_builtin): New.
>         (arm_init_builtins): Move body of a loop to the standalone
>         function arm_init_neon_builtin.
>         (arm_expand_neon_builtin_1): New.  Update comment.  Function body
>         moved from arm_expand_neon_builtin with some white-space fixes.
>         (arm_expand_neon_builtin): Move code into the standalone function
>         arm_expand_neon_builtin_1.
>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 11/17][ARM] Add builtins for VFP FP16 intrinsics.
  2016-07-04 14:12   ` Matthew Wahab
@ 2016-07-28 11:55     ` Ramana Radhakrishnan
  0 siblings, 0 replies; 73+ messages in thread
From: Ramana Radhakrishnan @ 2016-07-28 11:55 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Mon, Jul 4, 2016 at 3:11 PM, Matthew Wahab
<matthew.wahab@foss.arm.com> wrote:
> On 17/05/16 15:41, Matthew Wahab wrote:
>> The ACLE intrinsics introduced to support the ARMv8.2 FP16 extensions
>> require that intrinsics for scalar floating point (VFP) instructions
>> are available under different conditions from those for the NEON
>> intrinsics.
>>
>> This patch adds the support code and builtins data for the new VFP
>> intrinsics. Because of the similarities between the scalar and NEON
>> builtins, the support code for the scalar builtins follows the code for
>> the NEON builtins. The declarations for the VFP builtins are also added
>> in this patch since the support code expects non-empty tables.
>
> Updated the patch to drop the builtins for vneg, vadd, vsub, vmul and
> vdiv, which are no longer needed.
>
> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
> make check and for arm-none-eabi and armeb-none-eabi with make check on
> an ARMv8.2-A emulator.
>
> Ok for trunk?
> Matthew
>
> 2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>
>
>         * config/arm/arm-builtins.c (hf_UP): New.
>         (si_UP): New.
>         (vfp_builtin_data): New.  Update comment.
>         (enum arm_builtins): Include "arm_vfp_builtins.def".
>         (ARM_BUILTIN_VFP_PATTERN_START): New.
>         (arm_init_vfp_builtins): New.
>         (arm_init_builtins): Add arm_init_vfp_builtins.
>         (arm_expand_vfp_builtin): New.
>         (arm_expand_builtins): Update for arm_expand_vfp_builtin.  Fix
>         long line.
>         * config/arm/arm_vfp_builtins.def: New file.
>
>         * config/arm/t-arm (arm.o): Add arm_vfp_builtins.def.
>         (arm-builtins.o): Likewise.
>


OK.

Ramana

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 12/17][ARM] Add builtins for NEON FP16 intrinsics.
  2016-07-04 14:13   ` Matthew Wahab
@ 2016-07-28 11:56     ` Ramana Radhakrishnan
  0 siblings, 0 replies; 73+ messages in thread
From: Ramana Radhakrishnan @ 2016-07-28 11:56 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Mon, Jul 4, 2016 at 3:13 PM, Matthew Wahab
<matthew.wahab@foss.arm.com> wrote:
> On 17/05/16 15:42, Matthew Wahab wrote:
>> This patch adds the builtins data for the ACLE intrinsics introduced to
>> support the NEON instructions of the ARMv8.2-A FP16 extension.
>
> Updated to fix the vsqrte/vrsqrte spelling mistake and correct the
> changelog.
>
> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
> make check and for arm-none-eabi and armeb-none-eabi with make check on
> an ARMv8.2-A emulator.
>
> Ok for trunk?

Ok ...

Ramana


> Matthew
>
> 2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>
>
>         * config/arm/arm_neon_builtins.def (vadd): New (v8hf, v4hf
>         variants).
>         (vmulf): New (v8hf, v4hf variants).
>         (vfma): New (v8hf, v4hf variants).
>         (vfms): New (v8hf, v4hf variants).
>         (vsub): New (v8hf, v4hf variants).
>         (vcage): New (v8hf, v4hf variants).
>         (vcagt): New (v8hf, v4hf variants).
>         (vcale): New (v8hf, v4hf variants).
>         (vcalt): New (v8hf, v4hf variants).
>         (vceq): New (v8hf, v4hf variants).
>         (vcgt): New (v8hf, v4hf variants).
>         (vcge): New (v8hf, v4hf variants).
>         (vcle): New (v8hf, v4hf variants).
>         (vclt): New (v8hf, v4hf variants).
>         (vceqz): New (v8hf, v4hf variants).
>         (vcgez): New (v8hf, v4hf variants).
>         (vcgtz): New (v8hf, v4hf variants).
>         (vcltz): New (v8hf, v4hf variants).
>         (vclez): New (v8hf, v4hf variants).
>         (vabd): New (v8hf, v4hf variants).
>         (vmaxf): New (v8hf, v4hf variants).
>         (vmaxnm): New (v8hf, v4hf variants).
>         (vminf): New (v8hf, v4hf variants).
>         (vminnm): New (v8hf, v4hf variants).
>         (vpmaxf): New (v4hf variant).
>         (vpminf): New (v4hf variant).
>         (vpadd): New (v4hf variant).
>         (vrecps): New (v8hf, v4hf variants).
>         (vrsqrts): New (v8hf, v4hf variants).
>         (vabs): New (v8hf, v4hf variants).
>         (vneg): New (v8hf, v4hf variants).
>         (vrecpe): New (v8hf, v4hf variants).
>         (vrnd): New (v8hf, v4hf variants).
>         (vrnda): New (v8hf, v4hf variants).
>         (vrndm): New (v8hf, v4hf variants).
>         (vrndn): New (v8hf, v4hf variants).
>         (vrndp): New (v8hf, v4hf variants).
>         (vrndx): New (v8hf, v4hf variants).
>         (vrsqrte): New (v8hf, v4hf variants).
>         (vmul_lane): Add v4hf and v8hf variants.
>         (vmul_n): Add v4hf and v8hf variants.
>         (vext): New (v8hf, v4hf variants).
>         (vcvts): New (v8hi, v4hi variants).
>         (vcvts): New (v8hf, v4hf variants).
>         (vcvtu): New (v8hi, v4hi variants).
>         (vcvtu): New (v8hf, v4hf variants).
>         (vcvts_n): New (v8hf, v4hf variants).
>         (vcvtu_n): New (v8hi, v4hi variants).
>         (vcvts_n): New (v8hi, v4hi variants).
>         (vcvtu_n): New (v8hf, v4hf variants).
>         (vbsl): New (v8hf, v4hf variants).
>         (vcvtas): New (v8hf, v4hf variants).
>         (vcvtau): New (v8hf, v4hf variants).
>         (vcvtms): New (v8hf, v4hf variants).
>         (vcvtmu): New (v8hf, v4hf variants).
>         (vcvtns): New (v8hf, v4hf variants).
>         (vcvtnu): New (v8hf, v4hf variants).
>         (vcvtps): New (v8hf, v4hf variants).
>         (vcvtpu): New (v8hf, v4hf variants).
>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 13/17][ARM] Add VFP FP16 instrinsics.
  2016-07-04 14:14   ` Matthew Wahab
@ 2016-07-28 11:57     ` Ramana Radhakrishnan
  0 siblings, 0 replies; 73+ messages in thread
From: Ramana Radhakrishnan @ 2016-07-28 11:57 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Mon, Jul 4, 2016 at 3:14 PM, Matthew Wahab
<matthew.wahab@foss.arm.com> wrote:
> On 17/05/16 15:44, Matthew Wahab wrote:
>> The ARMv8.2-A architecture introduces an optional FP16 extension adding
>> half-precision floating point data processing instructions to the
>> existing scalar (floating point) support. A future version of the ACLE
>> will add support for these instructions and this patch implements that
>> support.
>
> Updated to use the standard arithmetic operations for vnegh_f16,
> vaddh_f16, vsubh_f16, vmulh_f16 and vdivh_f16.
>
> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
> make check and for arm-none-eabi and armeb-none-eabi with make check on
> an ARMv8.2-A emulator.
>
> Ok for trunk?
> Matthew
>
> 2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>
>
>         * config.gcc (extra_headers): Add arm_fp16.h
>         * config/arm/arm_fp16.h: New.
>         * config/arm/arm_neon.h: Include "arm_fp16.h".
>


OK.

Ramana

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.
  2016-07-28 11:37     ` Ramana Radhakrishnan
@ 2016-08-03 11:52       ` Ramana Radhakrishnan
  2016-08-03 13:10         ` Matthew Wahab
                           ` (2 more replies)
  0 siblings, 3 replies; 73+ messages in thread
From: Ramana Radhakrishnan @ 2016-08-03 11:52 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches, Joseph S. Myers

On Thu, Jul 28, 2016 at 12:37 PM, Ramana Radhakrishnan
<ramana.gcc@googlemail.com> wrote:
> On Mon, Jul 4, 2016 at 3:02 PM, Matthew Wahab
> <matthew.wahab@foss.arm.com> wrote:
>> On 19/05/16 15:54, Matthew Wahab wrote:
>>> On 18/05/16 16:20, Joseph Myers wrote:
>>>> On Wed, 18 May 2016, Matthew Wahab wrote:
>>>>
>>>> In short: instructions for direct HFmode arithmetic should be described
>>>> with patterns with the standard names.  It's the job of the
>>>> architecture-independent compiler to ensure that fp16 arithmetic in the
>>>> user's source code only generates direct fp16 arithmetic in GIMPLE (and
>>>> thus ends up using those patterns) if that is a correct representation of
>>>> the source code's semantics according to ACLE.
>>>>
>>>> The intrinsics you provide can then be written to use direct arithmetic,
>>>> and rely on convert_to_real_1 eliminating the promotions, rather than
>>>> needing built-in functions at all, just like many arm_neon.h intrinsics
>>>> make direct use of GNU C vector arithmetic.
>>>
>>> I think it's clear that this has exhausted my knowledge of FP semantics.
>>>
>>> Forcing promotion to single-precision was to settle concerns brought up in
>>> internal discussions about __fp16 semantics. I'll see if anybody has any
>>> problem with the changes you suggest.
>>
>> This patch changes the implementation to use the standard names for the
>> HFmode arithmetic. Later patches will also be updated to use the
>> arithmetic operators where appropriate.
>>
>> Changes since the last version of this patch:
>> - The standard names for plus, minus, mult, div and fma are defined for
>>   HF mode.
>> - The patterns supporting the new ACLE intrinsics vnegh_f16, vaddh_f16,
>>   vsubh_f16, vmulh_f16 and vdivh_f16 are removed; the arithmetic
>>   operators will be used instead.
>> - The tests are updated to expect f16 instructions rather than the f32
>>   instructions that were previously emitted.
>>
>> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
>> make check and for arm-none-eabi and armeb-none-eabi with make check on
>> an ARMv8.2-A emulator.
>
>
> All fine except -
>
> Why can we not extend the <vrint_pattern> and the l<vrint_pattern> in
> vfp.md for fp16 and avoid all the unspecs for vcvta and vrnd*
> instructions ?
>

I now feel reasonably convinced that these can go away and be replaced
by extending the <vrint_pattern> and l<vrint_pattern> expanders to
consider FP16 as well. Given that we are still only in the middle of
stage1, I'm OK for you to apply this as is and then follow up with a
patch that gets rid of the UNSPECs.  If this holds for add, sub and
other patterns I don't see why it wouldn't hold for all these patterns
as well.

Joseph, do you have any opinions on whether we should be extending the
standard pattern names or not for btrunc, ceil, round, floor,
nearbyint, rint, lround, lfloor and lceil optabs for the HFmode
quantities?

Thanks,
Ramana

> Ramana
>
>
>
>
>>
>> Ok for trunk?
>> Matthew
>>
>> 2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>
>>
>>         * config/arm/iterators.md (Code iterators): Fix some white-space
>>         in the comments.
>>         (GLTE): New.
>>         (ABSNEG): New.
>>         (FCVT): Moved from vfp.md.
>>         (VCVT_HF_US_N): New.
>>         (VCVT_SI_US_N): New.
>>         (VCVT_HF_US): New.
>>         (VCVTH_US): New.
>>         (FP16_RND): New.
>>         (absneg_str): New.
>>         (FCVTI32typename): Moved from vfp.md.
>>         (sup): Add UNSPEC_VCVTA_S, UNSPEC_VCVTA_U, UNSPEC_VCVTM_S,
>>         UNSPEC_VCVTM_U, UNSPEC_VCVTN_S, UNSPEC_VCVTN_U, UNSPEC_VCVTP_S,
>>         UNSPEC_VCVTP_U, UNSPEC_VCVT_HF_S_N, UNSPEC_VCVT_HF_U_N,
>>         UNSPEC_VCVT_SI_S_N, UNSPEC_VCVT_SI_U_N,  UNSPEC_VCVTH_S_N,
>>         UNSPEC_VCVTH_U_N, UNSPEC_VCVTH_S and UNSPEC_VCVTH_U.
>>
>>         (vcvth_op): New.
>>         (fp16_rnd_str): New.
>>         (fp16_rnd_insn): New.
>
>
>>         * config/arm/unspecs.md (UNSPEC_VCVT_HF_S_N): New.
>>         (UNSPEC_VCVT_HF_U_N): New.
>>         (UNSPEC_VCVT_SI_S_N): New.
>>         (UNSPEC_VCVT_SI_U_N): New.
>>         (UNSPEC_VCVTH_S): New.
>>         (UNSPEC_VCVTH_U): New.
>>         (UNSPEC_VCVTA_S): New.
>>         (UNSPEC_VCVTA_U): New.
>>         (UNSPEC_VCVTM_S): New.
>>         (UNSPEC_VCVTM_U): New.
>>         (UNSPEC_VCVTN_S): New.
>>         (UNSPEC_VCVTN_U): New.
>>         (UNSPEC_VCVTP_S): New.
>>         (UNSPEC_VCVTP_U): New.
>>         (UNSPEC_VRND): New.
>>         (UNSPEC_VRNDA): New.
>>         (UNSPEC_VRNDI): New.
>>         (UNSPEC_VRNDM): New.
>>         (UNSPEC_VRNDN): New.
>>         (UNSPEC_VRNDP): New.
>>         (UNSPEC_VRNDX): New.
>>         * config/arm/vfp.md (<absneg_str>hf2): New.
>>         (neon_vabshf): New.
>>         (neon_v<fp16_rnd_str>hf): New.
>>         (neon_vrndihf): New.
>>         (addhf3): New.
>>         (subhf3): New.
>>         (divhf3): New.
>>         (mulhf3): New.
>>         (*mulsf3neghf_vfp): New.
>>         (*negmulhf3_vfp): New.
>>         (*mulsf3addhf_vfp): New.
>>         (*mulhf3subhf_vfp): New.
>>         (*mulhf3neghfaddhf_vfp): New.
>>         (*mulhf3neghfsubhf_vfp): New.
>>         (fmahf4): New.
>>         (neon_vfmahf): New.
>>         (fmsubhf4_fp16): New.
>>         (neon_vfmshf): New.
>>         (*fnmsubhf4): New.
>>         (*fnmaddhf4): New.
>>         (neon_vsqrthf): New.
>>         (neon_vrsqrtshf): New.
>>         (FCVT): Move to iterators.md.
>>         (FCVTI32typename): Likewise.
>>         (neon_vcvth<sup>hf): New.
>>         (neon_vcvth<sup>si): New.
>>         (neon_vcvth<sup>_nhf_unspec): New.
>>         (neon_vcvth<sup>_nhf): New.
>>         (neon_vcvth<sup>_nsi_unspec): New.
>>         (neon_vcvth<sup>_nsi): New.
>>         (neon_vcvt<vcvth_op>h<sup>si): New.
>>         (neon_<fmaxmin_op>hf): New.
>>
>> testsuite/
>> 2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>
>>
>>
>>         * gcc.target/arm/armv8_2-fp16-arith-1.c: New.
>>         * gcc.target/arm/armv8_2-fp16-conv-1.c: New.
>>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 14/17][ARM] Add NEON FP16 intrinsics.
  2016-07-04 14:16   ` Matthew Wahab
@ 2016-08-03 12:57     ` Ramana Radhakrishnan
  0 siblings, 0 replies; 73+ messages in thread
From: Ramana Radhakrishnan @ 2016-08-03 12:57 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Mon, Jul 4, 2016 at 3:15 PM, Matthew Wahab
<matthew.wahab@foss.arm.com> wrote:
> On 17/05/16 15:46, Matthew Wahab wrote:
>> The ARMv8.2-A architecture introduces an optional FP16 extension adding
>> half-precision floating point data processing instructions to the
>> existing Adv.SIMD (NEON) support. A future version of the ACLE will add
>> support for these instructions and this patch implements that support.
>
> Updated to fix the vsqrte/vrsqrte spelling mistake.
>
> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
> make check and for arm-none-eabi and armeb-none-eabi with make check on
> an ARMv8.2-A emulator.
>
> Ok for trunk?
> Matthew
>
> 2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>
>
>         * config/arm/arm_neon.h (vabd_f16): New.
>
>         (vabdq_f16): New.
>         (vabs_f16): New.
>         (vabsq_f16): New.
>         (vadd_f16): New.
>         (vaddq_f16): New.
>         (vcage_f16): New.
>         (vcageq_f16): New.
>         (vcagt_f16): New.
>         (vcagtq_f16): New.
>         (vcale_f16): New.
>         (vcaleq_f16): New.
>         (vcalt_f16): New.
>         (vcaltq_f16): New.
>         (vceq_f16): New.
>         (vceqq_f16): New.
>         (vceqz_f16): New.
>         (vceqzq_f16): New.
>         (vcge_f16): New.
>         (vcgeq_f16): New.
>         (vcgez_f16): New.
>         (vcgezq_f16): New.
>         (vcgt_f16): New.
>         (vcgtq_f16): New.
>         (vcgtz_f16): New.
>         (vcgtzq_f16): New.
>         (vcle_f16): New.
>         (vcleq_f16): New.
>         (vclez_f16): New.
>         (vclezq_f16): New.
>         (vclt_f16): New.
>         (vcltq_f16): New.
>         (vcltz_f16): New.
>         (vcltzq_f16): New.
>         (vcvt_f16_s16): New.
>         (vcvt_f16_u16): New.
>         (vcvt_s16_f16): New.
>         (vcvt_u16_f16): New.
>         (vcvtq_f16_s16): New.
>         (vcvtq_f16_u16): New.
>         (vcvtq_s16_f16): New.
>         (vcvtq_u16_f16): New.
>         (vcvta_s16_f16): New.
>         (vcvta_u16_f16): New.
>         (vcvtaq_s16_f16): New.
>         (vcvtaq_u16_f16): New.
>         (vcvtm_s16_f16): New.
>         (vcvtm_u16_f16): New.
>         (vcvtmq_s16_f16): New.
>         (vcvtmq_u16_f16): New.
>         (vcvtn_s16_f16): New.
>         (vcvtn_u16_f16): New.
>         (vcvtnq_s16_f16): New.
>         (vcvtnq_u16_f16): New.
>         (vcvtp_s16_f16): New.
>         (vcvtp_u16_f16): New.
>         (vcvtpq_s16_f16): New.
>         (vcvtpq_u16_f16): New.
>         (vcvt_n_f16_s16): New.
>         (vcvt_n_f16_u16): New.
>         (vcvtq_n_f16_s16): New.
>         (vcvtq_n_f16_u16): New.
>         (vcvt_n_s16_f16): New.
>         (vcvt_n_u16_f16): New.
>         (vcvtq_n_s16_f16): New.
>         (vcvtq_n_u16_f16): New.
>         (vfma_f16): New.
>         (vfmaq_f16): New.
>         (vfms_f16): New.
>         (vfmsq_f16): New.
>         (vmax_f16): New.
>         (vmaxq_f16): New.
>         (vmaxnm_f16): New.
>         (vmaxnmq_f16): New.
>         (vmin_f16): New.
>         (vminq_f16): New.
>         (vminnm_f16): New.
>         (vminnmq_f16): New.
>         (vmul_f16): New.
>         (vmul_lane_f16): New.
>         (vmul_n_f16): New.
>         (vmulq_f16): New.
>         (vmulq_lane_f16): New.
>         (vmulq_n_f16): New.
>         (vneg_f16): New.
>         (vnegq_f16): New.
>         (vpadd_f16): New.
>         (vpmax_f16): New.
>         (vpmin_f16): New.
>         (vrecpe_f16): New.
>         (vrecpeq_f16): New.
>         (vrnd_f16): New.
>         (vrndq_f16): New.
>         (vrnda_f16): New.
>         (vrndaq_f16): New.
>         (vrndm_f16): New.
>         (vrndmq_f16): New.
>         (vrndn_f16): New.
>         (vrndnq_f16): New.
>         (vrndp_f16): New.
>         (vrndpq_f16): New.
>         (vrndx_f16): New.
>         (vrndxq_f16): New.
>         (vrsqrte_f16): New.
>         (vrsqrteq_f16): New.
>
>         (vrecps_f16): New.
>         (vrecpsq_f16): New.
>         (vrsqrts_f16): New.
>         (vrsqrtsq_f16): New.
>         (vsub_f16): New.
>         (vsubq_f16): New.
>


OK ...

Thanks,
Ramana

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.
  2016-08-03 11:52       ` Ramana Radhakrishnan
@ 2016-08-03 13:10         ` Matthew Wahab
  2016-08-03 14:45         ` James Greenhalgh
  2016-08-03 17:44         ` Joseph Myers
  2 siblings, 0 replies; 73+ messages in thread
From: Matthew Wahab @ 2016-08-03 13:10 UTC (permalink / raw)
  To: Ramana Radhakrishnan; +Cc: gcc-patches, Joseph S. Myers

On 03/08/16 12:52, Ramana Radhakrishnan wrote:
> On Thu, Jul 28, 2016 at 12:37 PM, Ramana Radhakrishnan
> <ramana.gcc@googlemail.com> wrote:
>> On Mon, Jul 4, 2016 at 3:02 PM, Matthew Wahab
>> <matthew.wahab@foss.arm.com> wrote:
>>> On 19/05/16 15:54, Matthew Wahab wrote:
>>>> On 18/05/16 16:20, Joseph Myers wrote:
>>>>> On Wed, 18 May 2016, Matthew Wahab wrote:
>>>>>
>>>>> In short: instructions for direct HFmode arithmetic should be described
>>>>> with patterns with the standard names.  It's the job of the
>>>>> architecture-independent compiler to ensure that fp16 arithmetic in the
>>>>> user's source code only generates direct fp16 arithmetic in GIMPLE (and
>>>>> thus ends up using those patterns) if that is a correct representation of
>>>>> the source code's semantics according to ACLE.
>>>>>
>>>
>>> This patch changes the implementation to use the standard names for the
>>> HFmode arithmetic. Later patches will also be updated to use the
>>> arithmetic operators where appropriate.
>>>
>>
>> All fine except -
>>
>> Why can we not extend the <vrint_pattern> and the l<vrint_pattern> in
>> vfp.md for fp16 and avoid all the unspecs for vcvta and vrnd*
>> instructions?
>>
>
> I now feel reasonably convinced that these can go away and be replaced
> by extending the <vrint_pattern> and l<vrint_pattern> expanders to
> consider FP16 as well. Given that we are still only in the middle of
> stage1, I'm OK for you to apply this as is and then follow up with a
> patch that gets rid of the UNSPECs. If this holds for add, sub and
> other patterns, I don't see why it wouldn't hold for all these patterns
> as well.
>
> Joseph, do you have any opinions on whether we should be extending the
> standard pattern names or not for btrunc, ceil, round, floor,
> nearbyint, rint, lround, lfloor and lceil optabs for the HFmode
> quantities?
>

Sorry for the delay replying.

I didn't extend the lvrint_pattern and vrint_pattern expanders to HF mode
because of the general intention to do fp16 operations through the NEON
intrinsics. If extending them to HF mode produces the expected behaviour for
the standard names that they implement, then I agree that the change should
be made.

I would prefer to do that as a separate patch though, to make sure that the
new operations are properly tested. Some of the existing tests (in
gcc.target/arm) use builtins that aren't available for HF mode, so something
else will be needed.
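
For example, something along these lines (illustrative only; the options
and dg directives are guesses, not taken from a committed test):

  /* { dg-do compile } */
  /* { dg-options "-O2 -march=armv8.2-a+fp16 -mfloat-abi=hard" } */
  #include <arm_fp16.h>

  float16_t
  test_vrnda (float16_t x)
  {
    return vrndah_f16 (x);   /* should assemble to vrinta.f16  */
  }

  /* { dg-final { scan-assembler "vrinta\\.f16" } } */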

Matthew


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.
  2016-08-03 11:52       ` Ramana Radhakrishnan
  2016-08-03 13:10         ` Matthew Wahab
@ 2016-08-03 14:45         ` James Greenhalgh
  2016-08-03 17:44         ` Joseph Myers
  2 siblings, 0 replies; 73+ messages in thread
From: James Greenhalgh @ 2016-08-03 14:45 UTC (permalink / raw)
  To: Ramana Radhakrishnan; +Cc: Matthew Wahab, gcc-patches, Joseph S. Myers, nd

On Wed, Aug 03, 2016 at 12:52:42PM +0100, Ramana Radhakrishnan wrote:
> On Thu, Jul 28, 2016 at 12:37 PM, Ramana Radhakrishnan
> <ramana.gcc@googlemail.com> wrote:
> > On Mon, Jul 4, 2016 at 3:02 PM, Matthew Wahab
> > <matthew.wahab@foss.arm.com> wrote:
> >> On 19/05/16 15:54, Matthew Wahab wrote:
> >>> On 18/05/16 16:20, Joseph Myers wrote:
> >>>> On Wed, 18 May 2016, Matthew Wahab wrote:
> >>>>
> >>>> In short: instructions for direct HFmode arithmetic should be described
> >>>> with patterns with the standard names.  It's the job of the
> >>>> architecture-independent compiler to ensure that fp16 arithmetic in the
> >>>> user's source code only generates direct fp16 arithmetic in GIMPLE (and
> >>>> thus ends up using those patterns) if that is a correct representation of
> >>>> the source code's semantics according to ACLE.
> >>>>
> >>>> The intrinsics you provide can then be written to use direct arithmetic,
> >>>> and rely on convert_to_real_1 eliminating the promotions, rather than
> >>>> needing built-in functions at all, just like many arm_neon.h intrinsics
> >>>> make direct use of GNU C vector arithmetic.
> >>>
> >>> I think it's clear that this has exhausted my knowledge of FP semantics.
> >>>
> >>> Forcing promotion to single-precision was to settle concerns brought up in
> >>> internal discussions about __fp16 semantics. I'll see if anybody has any
> >>> problem with the changes you suggest.
> >>
> >> This patch changes the implementation to use the standard names for the
> >> HFmode arithmetic. Later patches will also be updated to use the
> >> arithmetic operators where appropriate.
> >>
> >> Changes since the last version of this patch:
> >> - The standard names for plus, minus, mult, div and fma are defined for
> >>   HF mode.
> >> - The patterns supporting the new ACLE intrinsics vnegh_f16, vaddh_f16,
> >>   vsubh_f16, vmulh_f16 and vdivh_f16 are removed, the arithmetic
> >>   operators will be used instead.
> >> - The tests are updated to expect f16 instructions rather than the f32
> >>   instructions that were previously emitted.
> >>
> >> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
> >> make check and for arm-none-eabi and armeb-none-eabi with make check on
> >> an ARMv8.2-A emulator.
> >
> >
> > All fine except -
> >
> > Why can we not extend the <vrint_pattern> and the l<vrint_pattern> in
> > vfp.md for fp16 and avoid all the unspecs for vcvta and vrnd*
> > instructions?
> >
> 
> I now feel reasonably convinced that these can go away and be replaced
> by extending the <vrint_pattern> and l<vrint_pattern> expanders to
> consider FP16 as well. Given that we are still only in the middle of
> stage1, I'm OK for you to apply this as is and then follow up with a
> patch that gets rid of the UNSPECs. If this holds for add, sub and
> other patterns, I don't see why it wouldn't hold for all these patterns
> as well.
> 
> Joseph, do you have any opinions on whether we should be extending the
> standard pattern names or not for btrunc, ceil, round, floor,
> nearbyint, rint, lround, lfloor and lceil optabs for the HFmode
> quantities?

Mapping these to standard pattern names is the right thing to do if they
implement the correct semantics for those standard pattern names. That's
true whether you access them by function name (as you would for _Float16),
or as intrinsics (as you may want to do for __fp16 in arm_fp16.h).
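
Concretely, something like this sketch (the _Float16 built-in name is
hypothetical here, not something the current patches provide):

  #include <arm_fp16.h>

  float16_t
  via_intrinsic (float16_t x)
  {
    return vrndh_f16 (x);           /* ACLE route; wants btrunchf2.  */
  }

  _Float16
  via_function (_Float16 x)
  {
    return __builtin_truncf16 (x);  /* hypothetical _Float16 route.  */
  }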

I see that the ARM port doesn't have as general a mechanism for specifying
intrinsics in config/arm/arm_neon_builtins.def as the AArch64 port has in
config/aarch64/aarch64-simd-builtins.def. In the AArch64 port it is
perfectly acceptable for a builtin to map on to a standard pattern name.
In the ARM port it seems there is a limitation such that all builtins *must*
map on to pattern names with the prefix "neon_".

Fixing this limitation (perhaps in the way that AArch64 goes about it with
a series of magic macros) would permit these to be standard pattern names.
See https://gcc.gnu.org/ml/gcc-patches/2013-04/msg01219.html for what I did
to AArch64 3 years ago.
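
From memory (the exact spelling should be checked against
aarch64-simd-builtins.def), an entry there can name a standard pattern
directly, e.g.

  /* Expands to CODE_FOR_btrunc<mode>2 rather than requiring an
     aarch64_-prefixed pattern; syntax from memory, not checked.  */
  BUILTIN_VDQF (UNOP, btrunc, 2)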

I think that's probably the right way to go about resolving this, but I
haven't looked too hard into what it would take in the ARM port to refactor
along those lines.
 
Thanks,
James

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.
  2016-08-03 11:52       ` Ramana Radhakrishnan
  2016-08-03 13:10         ` Matthew Wahab
  2016-08-03 14:45         ` James Greenhalgh
@ 2016-08-03 17:44         ` Joseph Myers
  2 siblings, 0 replies; 73+ messages in thread
From: Joseph Myers @ 2016-08-03 17:44 UTC (permalink / raw)
  To: Ramana Radhakrishnan; +Cc: Matthew Wahab, gcc-patches

On Wed, 3 Aug 2016, Ramana Radhakrishnan wrote:

> Joseph, do you have any opinions on whether we should be extending the
> standard pattern names or not for btrunc, ceil, round, floor,
> nearbyint, rint, lround, lfloor and lceil optabs for the HFmode
> quantities ?

If the semantics match a standard pattern, you should use the standard 
name.

It may well be the case that many of those patterns would not actually be 
used for generic code even after my _FloatN patches, since (a) I only add 
a minimal set of built-in functions, not the full set of all libm 
functions for all _FloatN / _FloatNx types (given possible issues with 
enum size and initialization time when seven new variants of every libm 
function are added as built-in functions) and (b) many relevant 
optimizations only work for float, double and long double.  But I think 
the right pattern names should still be used.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 15/17][ARM] Add tests for ARMv8.2-A FP16 support.
  2016-07-04 14:17   ` Matthew Wahab
@ 2016-08-04  8:34     ` Ramana Radhakrishnan
  0 siblings, 0 replies; 73+ messages in thread
From: Ramana Radhakrishnan @ 2016-08-04  8:34 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Mon, Jul 4, 2016 at 3:17 PM, Matthew Wahab
<matthew.wahab@foss.arm.com> wrote:
> On 17/05/16 15:48, Matthew Wahab wrote:
>> Support for using the half-precision floating point operations added by
>> the ARMv8.2-A FP16 extension is based on the macros and intrinsics added
>> to the ACLE for the extension.
>>
>> This patch adds tests to check the compilers treatment of the ACLE
>> macros and the code generated for the new intrinsics. It does not
>> include the executable tests for the
>> gcc.target/aarch64/advsimd-intrinsics testsuite. Those are added later
>> in the patch series.
>
> Changes since the previous version are:
>
> - Fix the vsqrte/vrsqrte spelling mistake.
>
> - armv8_2-fp16-scalar-2.c: Set option -std=c11, needed to test that
>   vaddh_f16 (vmulh_f16 (a, b), c) generates a VMLA.  (Options enabled
>   with the default -std=gnu11 mean that a VFMA would be generated
>   otherwise; see the sketch below.)
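>
> As a sketch, the sequence under test is the following (illustrative, not
> the test file itself; my understanding is that the gnu modes default to
> -ffp-contract=fast, which fuses the pair into a VFMA, while -std=c11
> keeps the multiply and add separate):
>
>   #include <arm_fp16.h>
>
>   float16_t
>   mla (float16_t a, float16_t b, float16_t c)
>   {
>     return vaddh_f16 (vmulh_f16 (a, b), c);  /* VMLA under -std=c11.  */
>   }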
>
> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
> make check and for arm-none-eabi and armeb-none-eabi with make check on
> an ARMv8.2-A emulator.
>
> Ok for trunk?
> Matthew

OK.

regards
Ramana


>
> testsuite/
> 2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>
>
>
>         * gcc.target/arm/armv8_2-fp16-neon-1.c: New.
>         * gcc.target/arm/armv8_2-fp16-scalar-1.c: New.
>         * gcc.target/arm/armv8_2-fp16-scalar-2.c: New.
>         * gcc.target/arm/attr-fp16-arith-1.c: Add a test of intrinsics
>         support.
>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 16/17][ARM] Add tests for VFP FP16 ACLE intrinsics.
  2016-07-04 14:18       ` Matthew Wahab
@ 2016-08-04  8:35         ` Ramana Radhakrishnan
  0 siblings, 0 replies; 73+ messages in thread
From: Ramana Radhakrishnan @ 2016-08-04  8:35 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Mon, Jul 4, 2016 at 3:18 PM, Matthew Wahab
<matthew.wahab@foss.arm.com> wrote:
> On 18/05/16 11:58, Matthew Wahab wrote:
>> On 18/05/16 02:06, Joseph Myers wrote:
>>> On Tue, 17 May 2016, Matthew Wahab wrote:
>>>
>>>> In some tests, there are unavoidable differences in precision when
>>>> calculating the actual and the expected results of an FP16 operation. A
>>>> new support function CHECK_FP_BIAS is used so that these tests can check
>>>> for an acceptable margin of error. In these tests, the tolerance is
>>>> given as the absolute integer difference between the bitvectors of the
>>>> expected and the actual results.
>>>
>>> As far as I can see, CHECK_FP_BIAS is only used in the following patch,
>>> but there is another bias test in vsqrth_f16_1.c in this patch.
>>
>> This is my mistake: the CHECK_FP_BIAS is used for the NEON tests and should
>> have gone into that patch. The VFP test can do a simpler check so doesn't
>> need the macro.
>>
>>> Could you clarify where the "unavoidable differences in precision" come
>>> from? Are the results of some of the new instructions not fully
>>> specified,
>>> only specified within a given precision?  (As far as I can tell the
>>> existing v8 instructions for reciprocal and reciprocal square root
>>> estimates do have fully defined results, despite being loosely described
>>> as estimates.)
>>
>> The expected results in the new tests are represented as expressions whose
>> value is expected to be calculated at compile-time. This makes the tests
>> more readable but differences in the precision between the compiler and
>> the HW calculations mean that for vrecpe_f16, vrecps_f16, vrsqrts_f16 and
>> vsqrth_f16_1.c the expected and actual results are different.
>>
>> On reflection, it may be better to remove the CHECK_FP_BIAS macro and, for
>> the tests that needed it, to drop the compiler calculation and just use the
>> expected hexadecimal value.
>>
>> Other tests depending on compile-time calculations involve relatively
>> simple arithmetic operations and it's not clear if they are susceptible to
>> the same rounding errors. I have limited knowledge of FP arithmetic though
>> so I'll look into this.
>
> The scalar tests added in this patch and the vector tests added in the
> next patch have been reworked to use the exact values for the expected
> results rather than compile-time expressions. The CHECK_FP_BIAS macro is
> not used and is removed from this patch.
>
> The intention with these tests and with the vector tests is to check
> that the compiler emits code that produces the same results as the
> instruction regardless of any optimizations that it may apply. The
> expected results for the tests were produced using inline assembler
> taking the same inputs as the intrinsics being tested.
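>
> As a sketch of the technique (the constraint letters are from memory,
> not copied from the testsuite):
>
>   /* Capture the hardware result to use as the expected value.  */
>   __fp16
>   ref_sqrt (__fp16 x)
>   {
>     __fp16 r;
>     __asm__ ("vsqrt.f16 %0, %1" : "=t" (r) : "t" (x));
>     return r;
>   }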
>
> Other changes are to add and use some (limited) templates for scalar
> operations and to add progress and error reporting, making the scalar
> tests more consistent with those for the vector operations.
>
> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
> make check and for arm-none-eabi and armeb-none-eabi with make check on
> an ARMv8.2-A emulator.
>
> Ok for trunk?
> Matthew
>

OK, please watch out for any fallout from the autotesters, especially
with strange multilib combinations.

Ramana

> testsuite/
> 2016-07-04  Jiong Wang  <jiong.wang@arm.com>
>             Matthew Wahab  <matthew.wahab@arm.com>
>
>         * gcc.target/aarch64/advsimd-intrinsics/binary_scalar_op.inc: New.
>         * gcc.target/aarch64/advsimd-intrinsics/unary_scalar_op.inc: New.
>         * gcc.target/aarch64/advsimd-intrinsics/ternary_scalar_op.inc: New.
>
>         * gcc.target/aarch64/advsimd-intrinsics/vabsh_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vaddh_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvtah_s32_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvtah_u32_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s32_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u32_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s32_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u32_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s32_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u32_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvth_s32_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvth_u32_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s32_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u32_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s32_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u32_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvtph_s32_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvtph_u32_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vdivh_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vfmah_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vfmsh_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vmaxnmh_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vminnmh_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vmulh_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vnegh_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vrndah_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vrndh_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vrndih_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vrndmh_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vrndnh_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vrndph_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vrndxh_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vsqrth_f16_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vsubh_f16_1.c: New.
>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 17/17][ARM] Add tests for NEON FP16 ACLE intrinsics.
  2016-07-04 14:22   ` Matthew Wahab
@ 2016-08-04  9:01     ` Ramana Radhakrishnan
  0 siblings, 0 replies; 73+ messages in thread
From: Ramana Radhakrishnan @ 2016-08-04  9:01 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Mon, Jul 4, 2016 at 3:22 PM, Matthew Wahab
<matthew.wahab@foss.arm.com> wrote:
> On 17/05/16 15:52, Matthew Wahab wrote:
>> Support for using the half-precision floating point operations added by
>> the
>> ARMv8.2-A FP16 extension is based on the macros and intrinsics added to
>> the
>> ACLE for the extension.
>>
>> This patch adds executable tests for the ACLE Adv.SIMD (NEON) intrinsics
>> to
>> the advsimd-intrinsics testsuite.
>
> The tests added in the previous version of the patch, which only tested
> the f16 variants of intrinsics, are dropped. Instead, this patch extends
> the existing intrinsics tests to support the new f16 variants. Where the
> intrinsic is new, a new test for the intrinsic is added with f16 as the
> only variant. (This is consistent with existing practice, e.g. vcvt.c.)
> The new tests are based on similar existing tests, e.g. maxnm_1.c is
> derived from max.c and the vcvt{a,m,p}_1.c tests, via vcvtX.inc, are
> based on vcvt.c.
>
> Since they are only available when the FP16 arithmetic instructions are
> enabled, advsimd-intrinsics.exp is updated to set -march=armv8.2-a+fp16 when
> the hardware supports it and the tests for the f16 intrinsics are guarded
> with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC. Where a test has only f16
> variants, the test file itself is also guarded with
> dg-require-effective-target arm_v8_2a_fp16_neon_hw so that it reports
> UNSUPPORTED rather than PASS if FP16 isn't supported.
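>
> The guard pattern is roughly this (a sketch, not an exact test file):
>
>   /* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
>   #if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
>     /* f16 variant checks go here.  */
>   #endif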
>
> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
> make check and for arm-none-eabi and armeb-none-eabi with make check on
> an ARMv8.2-A emulator. Also tested the advsimd-intrinsics tests
> cross-compiled for aarch64-none-elf on an ARMv8.2-A emulator.
>
> Ok for trunk?

OK.

Thanks,
Ramana
> Matthew
>
> testsuite/
> 2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>
>
>         * gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp:
>         Enable -march=armv8.2-a+fp16 when supported by the hardware.
>         * gcc.target/aarch64/advsimd-intrinsics/binary_op_float.inc: New.
>         * gcc.target/aarch64/advsimd-intrinsics/binary_op_no64.inc:
>         Add F16 tests, enabled if macro HAS_FLOAT16_VARIANT is defined.  Add
>         semi-colons to macro invocations.
>         * gcc.target/aarch64/advsimd-intrinsics/cmp_fp_op.inc: Add F16
>         tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
>         defined.
>         * gcc.target/aarch64/advsimd-intrinsics/cmp_op.inc: Likewise.
>         * gcc.target/aarch64/advsimd-intrinsics/cmp_zero_op.inc: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vabd.c: Add F16
>         tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
>         defined.
>         * gcc.target/aarch64/advsimd-intrinsics/vabs.c: Likewise.
>         * gcc.target/aarch64/advsimd-intrinsics/vadd.c: Likewise.
>         * gcc.target/aarch64/advsimd-intrinsics/vcage.c: Likewise.
>         * gcc.target/aarch64/advsimd-intrinsics/vcagt.c: Likewise.
>         * gcc.target/aarch64/advsimd-intrinsics/vcale.c: Likewise.
>         * gcc.target/aarch64/advsimd-intrinsics/vcalt.c: Likewise.
>         * gcc.target/aarch64/advsimd-intrinsics/vceq.c: Likewise.
>         * gcc.target/aarch64/advsimd-intrinsics/vceqz_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcge.c: Add F16
>         tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
>         defined.
>         * gcc.target/aarch64/advsimd-intrinsics/vcgez_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcgt.c: Add F16
>         tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
>         defined.
>         * gcc.target/aarch64/advsimd-intrinsics/vcgtz_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcle.c: Add F16
>         tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
>         defined.
>         * gcc.target/aarch64/advsimd-intrinsics/vclez_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vclt.c: Add F16
>         tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
>         defined.
>         * gcc.target/aarch64/advsimd-intrinsics/vcltz_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvt.c: Add F16
>         tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
>         defined.  Also fix some white-space.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvtX.inc: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvta_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvtm_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vcvtp_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vfma.c: Add F16
>         tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
>         defined.  Also fix some long lines and white-space.
>         * gcc.target/aarch64/advsimd-intrinsics/vfms.c: Add F16
>         tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
>         defined.  Also fix some long lines and white-space.
>         * gcc.target/aarch64/advsimd-intrinsics/vmax.c: Add F16
>         tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
>         defined.
>         * gcc.target/aarch64/advsimd-intrinsics/vmaxnm_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vmin.c: Add F16
>         tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
>         defined.
>         * gcc.target/aarch64/advsimd-intrinsics/vminnm_1.c: New.
>         * gcc.target/aarch64/advsimd-intrinsics/vmul.c: Add F16
>         tests, enabled if macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is
>         defined.
>         * gcc.target/aarch64/advsimd-intrinsics/vmul_lane.c: Likewise.
>         * gcc.target/aarch64/advsimd-intrinsics/vmul_n.c: Likewise.
>         * gcc.target/aarch64/advsimd-intrinsics/vneg.c: Likewise.
>         * gcc.target/aarch64/advsimd-intrinsics/vpXXX.inc: Likewise.
>         * gcc.target/aarch64/advsimd-intrinsics/vpadd.c: Likewise.
>         * gcc.target/aarch64/advsimd-intrinsics/vpmax.c: Likewise.
>         * gcc.target/aarch64/advsimd-intrinsics/vpmin.c: Likewise.
>         * gcc.target/aarch64/advsimd-intrinsics/vrecpe.c: Likewise.
>         * gcc.target/aarch64/advsimd-intrinsics/vrecps.c: Likewise.
>         * gcc.target/aarch64/advsimd-intrinsics/vrnd.c: Likewise.
>         * gcc.target/aarch64/advsimd-intrinsics/vrndX.inc: Likewise.
>         * gcc.target/aarch64/advsimd-intrinsics/vrnda.c: Likewise.
>         * gcc.target/aarch64/advsimd-intrinsics/vrndm.c: Likewise.
>         * gcc.target/aarch64/advsimd-intrinsics/vrndn.c: Likewise.
>         * gcc.target/aarch64/advsimd-intrinsics/vrndp.c: Likewise.
>         * gcc.target/aarch64/advsimd-intrinsics/vrndx.c: Likewise.
>         * gcc.target/aarch64/advsimd-intrinsics/vrsqrte.c: Likewise.
>         * gcc.target/aarch64/advsimd-intrinsics/vrsqrts.c: Likewise.
>         * gcc.target/aarch64/advsimd-intrinsics/vsub.c: Likewise.
>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 1/17][ARM] Add ARMv8.2-A command line option and profile.
  2016-07-04 13:46   ` Matthew Wahab
@ 2016-09-21 13:57     ` Ramana Radhakrishnan
  0 siblings, 0 replies; 73+ messages in thread
From: Ramana Radhakrishnan @ 2016-09-21 13:57 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: gcc-patches

On Mon, Jul 4, 2016 at 2:46 PM, Matthew Wahab
<matthew.wahab@foss.arm.com> wrote:
> On 17/05/16 15:22, Matthew Wahab wrote:
>> This patch adds the command options for the architecture ARMv8.2-A and
>> the half-precision extension. The architecture is selected by
>> -march=armv8.2-a and has all the properties of -march=armv8.1-a.
>>
>> This patch also enables the CRC extension (+crc) which is required
>> for both ARMv8.2-A and ARMv8.1-A architectures but is not currently
>> enabled by default for -march=armv8.1-a.
>>
>> The half-precision extension is selected using the extension +fp16. This
>> enables the VFP FP16 instructions if an ARMv8 VFP unit is also
>> specified, e.g. by -mfpu=fp-armv8. It also enables the FP16 NEON
>> instructions if an ARMv8 NEON unit is specified, e.g. by
>> -mfpu=neon-fp-armv8. Note that if the NEON FP16 instructions are enabled
>> then so are the VFP FP16 instructions.
>
> This is a minor respin that moves the setting of arm_fp16_inst in
> arm_option_override to immediately before it is used to set the required
> arm_fp16_format.
>
> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
> make check and for arm-none-eabi and armeb-none-eabi with make check on
> an ARMv8.2-A emulator.


OK.

Thanks,
Ramana
>
> 2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>
>
>
>         * config/arm/arm-arches.def ("armv8.1-a"): Add FL_CRC32.
>         ("armv8.2-a"): New.
>         ("armv8.2-a+fp16"): New.
>         * config/arm/arm-protos.h (FL2_ARCH8_2): New.
>         (FL2_FP16INST): New.
>         (FL2_FOR_ARCH8_2A): New.
>         * config/arm/arm-tables.opt: Regenerate.
>         * config/arm/arm.c (arm_arch8_2): New.
>         (arm_fp16_inst): New.
>         (arm_option_override): Set arm_arch8_2 and arm_fp16_inst.  Check
>         for incompatible fp16-format settings.
>         * config/arm/arm.h (TARGET_VFP_FP16INST): New.
>         (TARGET_NEON_FP16INST): New.
>         (arm_arch8_2): Declare.
>         (arm_fp16_inst): Declare.
>         * config/arm/bpabi.h (BE8_LINK_SPEC): Add entries for
>         march=armv8.2-a and march=armv8.2-a+fp16.
>         * config/arm/t-aprofile (Arch Matches): Add entries for armv8.2-a
>         and armv8.2-a+fp16.
>         * doc/invoke.texi (ARM Options): Add "-march=armv8.1-a",
>         "-march=armv8.2-a" and "-march=armv8.2-a+fp16".
>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 6/17][ARM] Add data processing intrinsics for float16_t.
  2016-07-27 13:59   ` Ramana Radhakrishnan
@ 2016-09-25 14:44     ` Christophe Lyon
  2016-09-26  9:56       ` Matthew Wahab
  0 siblings, 1 reply; 73+ messages in thread
From: Christophe Lyon @ 2016-09-25 14:44 UTC (permalink / raw)
  To: Matthew Wahab, Ramana Radhakrishnan; +Cc: gcc-patches

Hi Matthew,


On 27 July 2016 at 15:59, Ramana Radhakrishnan
<ramana.gcc@googlemail.com> wrote:
> On Tue, May 17, 2016 at 3:31 PM, Matthew Wahab
> <matthew.wahab@foss.arm.com> wrote:
>> The ACLE specifies a number of intrinsics for manipulating vectors
>> holding values in most of the integer and floating point type. These
>> include 16-bit integer types but not 16-bit floating point even though
>> the same instruction is used for both.
>>
>> A future version of the ACLE extends the data processing intrinsics to
>> the 16-bit floating point types, making the intrinsics available
>> under the same conditions as the ARM __fp16 type.
>>
>> This patch adds the new intrinsics:
>>  vbsl_f16, vbslq_f16, vdup_n_f16, vdupq_n_f16, vdup_lane_f16,
>>  vdupq_lane_f16, vext_f16, vextq_f16, vmov_n_f16, vmovq_n_f16,
>>  vrev64_f16, vrev64q_f16, vtrn_f16, vtrnq_f16, vuzp_f16, vuzpq_f16,
>>  vzip_f16, vzipq_f16.
>>
>> This patch also updates the advsimd-intrinsics testsuite to test the f16
>> variants for ARM targets. These intrinsics are only implemented in the
>> ARM target so the tests are disabled for AArch64 using an extra
>> condition on a new convenience macro FP16_SUPPORTED. This patch also
>> disables, for the ARM target, the testsuite defined macro vdup_n_f16 as
>> it is no longer needed.
>>
>> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
>> make check and for arm-none-eabi and armeb-none-eabi with make check on
>> an ARMv8.2-A emulator. Also tested for aarch64-none-elf with the
>> advsimd-intrinsics testsuite using an ARMv8.2-A emulator.
>>
>> Ok for trunk?
>> Matthew
>>
>> 2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>
>>
>>         * config/arm/arm.c (arm_evpc_neon_vuzp): Add support for V8HF and
>>         V4HF modes.
>>         (arm_evpc_neon_vzip): Likewise.
>>         (arm_evpc_neon_vrev): Likewise.
>>         (arm_evpc_neon_vtrn): Likewise.
>>         (arm_evpc_neon_vext): Likewise.
>>         * config/arm/arm_neon.h (vbsl_f16): New.
>>         (vbslq_f16): New.
>>         (vdup_n_f16): New.
>>         (vdupq_n_f16): New.
>>         (vdup_lane_f16): New.
>>         (vdupq_lane_f16): New.
>>         (vext_f16): New.
>>         (vextq_f16): New.
>>         (vmov_n_f16): New.
>>         (vmovq_n_f16): New.
>>         (vrev64_f16): New.
>>         (vrev64q_f16): New.
>>         (vtrn_f16): New.
>>         (vtrnq_f16): New.
>>         (vuzp_f16): New.
>>         (vuzpq_f16): New.
>>         (vzip_f16): New.
>>         (vzipq_f16): New.
>>         * config/arm/arm_neon_builtins.def (vdup_n): New (v8hf, v4hf
>>         variants).
>>         (vdup_lane): New (v8hf, v4hf variants).
>>         (vext): New (v8hf, v4hf variants).
>>         (vbsl): New (v8hf, v4hf variants).
>>         * config/arm/iterators.md (VDQWH): New.
>>         (VH): New.
>>         (V_double_vector_mode): Add V8HF and V4HF.  Fix white-space.
>>         (Scalar_mul_8_16): Fix white-space.
>>         (Is_d_reg): Add V4HF and V8HF.
>>         * config/arm/neon.md (neon_vdup_lane<mode>_internal): New.
>>         (neon_vdup_lane<mode>): New.
>>         (neon_vtrn<mode>_internal): Replace VDQW with VDQWH.
>>         (*neon_vtrn<mode>_insn): Likewise.
>>         (neon_vzip<mode>_internal): Likewise. Also fix white-space.
>>         (*neon_vzip<mode>_insn): Likewise.
>>         (neon_vuzp<mode>_internal): Likewise.
>>         (*neon_vuzp<mode>_insn): Likewise.
>>         * config/arm/vec-common.md (vec_perm_const<mode>): New.
>>
>> testsuite/
>> 2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>
>>
>>         * gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
>>         (FP16_SUPPORTED): New.
>>         (vdup_n_f16): Disable for non-AArch64 targets.
>>         * gcc.target/aarch64/advsimd-intrinsics/vbsl.c: Add __fp16 tests,
>>         conditional on FP16_SUPPORTED.
>>         * gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c: Likewise.
>>         * gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c: Likewise.
>>         * gcc.target/aarch64/advsimd-intrinsics/vext.c: Likewise.
>>         * gcc.target/aarch64/advsimd-intrinsics/vrev.c: Likewise.
>>         * gcc.target/aarch64/advsimd-intrinsics/vshuffle.inc: Add support
>>         for testing __fp16.
>>         * gcc.target/aarch64/advsimd-intrinsics/vtrn.c: Add __fp16 tests,
>>         conditional on FP16_SUPPORTED.
>>         * gcc.target/aarch64/advsimd-intrinsics/vuzp.c: Likewise.
>>         * gcc.target/aarch64/advsimd-intrinsics/vzip.c: Likewise.
>>
>
> OK.
>
>
> Ramana

Since you committed this patch, I've noticed that libgcc fails to build
when GCC is configured:
--target arm-none-eabi and default cpu
/tmp/9649048_29.tmpdir/ccuBwQJJ.s: Assembler messages:
/tmp/9649048_29.tmpdir/ccuBwQJJ.s:64: Error: selected processor does
not support ARM mode `movwlt r0,32768'
/tmp/9649048_29.tmpdir/ccuBwQJJ.s:65: Error: selected processor does
not support ARM mode `movwge r0,32767'
make[4]: *** [_ssaddHQ.o] Error 1
make[4]: Leaving directory
`/tmp/9649048_29.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/arm-none-eabi/fpu/libgcc'

Thanks,

Christophe

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 6/17][ARM] Add data processing intrinsics for float16_t.
  2016-09-25 14:44     ` Christophe Lyon
@ 2016-09-26  9:56       ` Matthew Wahab
  2016-09-26 12:54         ` Christophe Lyon
  0 siblings, 1 reply; 73+ messages in thread
From: Matthew Wahab @ 2016-09-26  9:56 UTC (permalink / raw)
  To: Christophe Lyon, Ramana Radhakrishnan; +Cc: gcc-patches

Hello,

On 25/09/16 14:00, Christophe Lyon wrote:
>>>
>>> This patch adds the new intrinsics:
>>>   vbsl_f16, vbslq_f16, vdup_n_f16, vdupq_n_f16, vdup_lane_f16,
>>>   vdupq_lane_f16, vext_f16, vextq_f16, vmov_n_f16, vmovq_n_f16,
>>>   vrev64_f16, vrev64q_f16, vtrn_f16, vtrnq_f16, vuzp_f16, vuzpq_f16,
>>>   vzip_f16, vzipq_f16.
>>>
>>> This patch also updates the advsimd-intrinsics testsuite to test the f16
>>> variants for ARM targets. These intrinsics are only implemented in the
>>> ARM target so the tests are disabled for AArch64 using an extra
>>> condition on a new convenience macro FP16_SUPPORTED. This patch also
>>> disables, for the ARM target, the testsuite defined macro vdup_n_f16 as
>>> it is no longer needed.
>
> Since you committed this patch, I've noticed that libgcc fails to build
> when GCC is configured:
> --target arm-none-eabi and default cpu
> /tmp/9649048_29.tmpdir/ccuBwQJJ.s: Assembler messages:
> /tmp/9649048_29.tmpdir/ccuBwQJJ.s:64: Error: selected processor does
> not support ARM mode `movwlt r0,32768'
> /tmp/9649048_29.tmpdir/ccuBwQJJ.s:65: Error: selected processor does
> not support ARM mode `movwge r0,32767'
> make[4]: *** [_ssaddHQ.o] Error 1
> make[4]: Leaving directory
> `/tmp/9649048_29.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/arm-none-eabi/fpu/libgcc'
>


I can't reproduce the failure, could you send the configure arguments for the 
build.

I've tried assembling the string 'movw r0, 32768' and get the error when 
-march=armv6kz or earlier. I suspect the new movhi and/or movhf patterns added 
earlier in the series need the architecture level added as a precondition but 
I'll need to look into it.
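
For reference, the failing libgcc object boils down to something like
this reduced C sketch of signed saturating addition (not the libgcc
source; the movwlt/movwge in the error log load the 0x8000/0x7fff clamp
values):

  short
  ssadd_hq (short a, short b)
  {
    int sum = a + b;
    if (sum < -32768)
      sum = -32768;         /* movwlt r0, 32768  */
    else if (sum > 32767)
      sum = 32767;          /* movwge r0, 32767  */
    return sum;
  }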

Matthew

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 6/17][ARM] Add data processing intrinsics for float16_t.
  2016-09-26  9:56       ` Matthew Wahab
@ 2016-09-26 12:54         ` Christophe Lyon
  2016-09-26 13:11           ` Ramana Radhakrishnan
  0 siblings, 1 reply; 73+ messages in thread
From: Christophe Lyon @ 2016-09-26 12:54 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: Ramana Radhakrishnan, gcc-patches

On 26 September 2016 at 11:43, Matthew Wahab <matthew.wahab@foss.arm.com> wrote:
> Hello,
>
> On 25/09/16 14:00, Christophe Lyon wrote:
>>>>
>>>>
>>>> This patch adds the new intrinsics:
>>>>   vbsl_f16, vbslq_f16, vdup_n_f16, vdupq_n_f16, vdup_lane_f16,
>>>>   vdupq_lane_f16, vext_f16, vextq_f16, vmov_n_f16, vmovq_n_f16,
>>>>   vrev64_f16, vrev64q_f16, vtrn_f16, vtrnq_f16, vuzp_f16, vuzpq_f16,
>>>>   vzip_f16, vzipq_f16.
>>>>
>>>> This patch also updates the advsimd-intrinsics testsuite to test the f16
>>>> variants for ARM targets. These intrinsics are only implemented in the
>>>> ARM target so the tests are disabled for AArch64 using an extra
>>>> condition on a new convenience macro FP16_SUPPORTED. This patch also
>>>> disables, for the ARM target, the testsuite defined macro vdup_n_f16 as
>>>> it is no longer needed.
>>
>>
>> Since you committed this patch, I've noticed that libgcc fails to build
>> when GCC is configured:
>> --target arm-none-eabi and default cpu
>> /tmp/9649048_29.tmpdir/ccuBwQJJ.s: Assembler messages:
>> /tmp/9649048_29.tmpdir/ccuBwQJJ.s:64: Error: selected processor does
>> not support ARM mode `movwlt r0,32768'
>> /tmp/9649048_29.tmpdir/ccuBwQJJ.s:65: Error: selected processor does
>> not support ARM mode `movwge r0,32767'
>> make[4]: *** [_ssaddHQ.o] Error 1
>> make[4]: Leaving directory
>>
>> `/tmp/9649048_29.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/arm-none-eabi/fpu/libgcc'
>>
>
>
> I can't reproduce the failure, could you send the configure arguments for
> the build.
>

If I'm not mistaken, that is:
 --target=arm-none-eabi  --disable-nls --disable-libgomp
--disable-libmudflap --disable-libcilkrts --enable-checking
--enable-languages=c,c++ --with-newlib

Maybe you've disabled multilibs?

> I've tried assembling the string 'movw r0, 32768' and get the error when
> -march=armv6kz or earlier. I suspect the new movhi and/or movhf patterns
> added earlier in the series need the architecture level added as a
> precondition but I'll need to look into it.
>
> Matthew

Thanks,

Christophe

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 6/17][ARM] Add data processing intrinsics for float16_t.
  2016-09-26 12:54         ` Christophe Lyon
@ 2016-09-26 13:11           ` Ramana Radhakrishnan
  2016-09-26 13:19             ` Matthew Wahab
  2016-09-26 13:21             ` Christophe Lyon
  0 siblings, 2 replies; 73+ messages in thread
From: Ramana Radhakrishnan @ 2016-09-26 13:11 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: Matthew Wahab, gcc-patches

On Mon, Sep 26, 2016 at 1:48 PM, Christophe Lyon
<christophe.lyon@linaro.org> wrote:
> On 26 September 2016 at 11:43, Matthew Wahab <matthew.wahab@foss.arm.com> wrote:
>> Hello,
>>
>> On 25/09/16 14:00, Christophe Lyon wrote:
>>>>>
>>>>>
>>>>> This patch adds the new intrinsics:
>>>>>   vbsl_f16, vbslq_f16, vdup_n_f16, vdupq_n_f16, vdup_lane_f16,
>>>>>   vdupq_lane_f16, vext_f16, vextq_f16, vmov_n_f16, vmovq_n_f16,
>>>>>   vrev64_f16, vrev64q_f16, vtrn_f16, vtrnq_f16, vuzp_f16, vuzpq_f16,
>>>>>   vzip_f16, vzipq_f16.
>>>>>
>>>>> This patch also updates the advsimd-intrinsics testsuite to test the f16
>>>>> variants for ARM targets. These intrinsics are only implemented in the
>>>>> ARM target so the tests are disabled for AArch64 using an extra
>>>>> condition on a new convenience macro FP16_SUPPORTED. This patch also
>>>>> disables, for the ARM target, the testsuite defined macro vdup_n_f16 as
>>>>> it is no longer needed.
>>>
>>>
>>> Since you committed this patch, I've noticed that libgcc fails to build
>>> when GCC is configured:
>>> --target arm-none-eabi and default cpu
>>> /tmp/9649048_29.tmpdir/ccuBwQJJ.s: Assembler messages:
>>> /tmp/9649048_29.tmpdir/ccuBwQJJ.s:64: Error: selected processor does
>>> not support ARM mode `movwlt r0,32768'
>>> /tmp/9649048_29.tmpdir/ccuBwQJJ.s:65: Error: selected processor does
>>> not support ARM mode `movwge r0,32767'
>>> make[4]: *** [_ssaddHQ.o] Error 1
>>> make[4]: Leaving directory
>>>
>>> `/tmp/9649048_29.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/arm-none-eabi/fpu/libgcc'
>>>
>>
>>
>> I can't reproduce the failure, could you send the configure arguments for
>> the build.
>>
>
> If I'm not mistaken, that is:
>  --target=arm-none-eabi  --disable-nls --disable-libgomp
> --disable-libmudflap --disable-libcilkrts --enable-checking
> --enable-languages=c,c++ --with-newlib
>
> Maybe you've disabled multilibs?


I'm pretty sure I built this as part of reviewing all these patches
with --with-multilib-list=aprofile and didn't see any failures. Not sure
what's going on here.

Ramana
>
>> I've tried assembling the string 'movw r0, 32768' and get the error when
>> -march=armv6kz or earlier. I suspect the new movhi and/or movhf patterns
>> added earlier in the series need the architecture level added as a
>> precondition but I'll need to look into it.
>>
>> Matthew
>
> Thanks,
>
> Christophe

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 6/17][ARM] Add data processing intrinsics for float16_t.
  2016-09-26 13:11           ` Ramana Radhakrishnan
@ 2016-09-26 13:19             ` Matthew Wahab
  2016-09-26 13:21             ` Christophe Lyon
  1 sibling, 0 replies; 73+ messages in thread
From: Matthew Wahab @ 2016-09-26 13:19 UTC (permalink / raw)
  To: Ramana Radhakrishnan, Christophe Lyon; +Cc: gcc-patches

On 26/09/16 14:03, Ramana Radhakrishnan wrote:
> On Mon, Sep 26, 2016 at 1:48 PM, Christophe Lyon
> <christophe.lyon@linaro.org> wrote:
>> On 26 September 2016 at 11:43, Matthew Wahab <matthew.wahab@foss.arm.com> wrote:
>>> Hello,
>>>
>>> On 25/09/16 14:00, Christophe Lyon wrote:
>>>>>>
>>>>>>
>>>>>> This patch adds the new intrinsics:
>>>>>>    vbsl_f16, vbslq_f16, vdup_n_f16, vdupq_n_f16, vdup_lane_f16,
>>>>>>    vdupq_lane_f16, vext_f16, vextq_f16, vmov_n_f16, vmovq_n_f16,
>>>>>>    vrev64_f16, vrev64q_f16, vtrn_f16, vtrnq_f16, vuzp_f16, vuzpq_f16,
>>>>>>    vzip_f16, vzipq_f16.
>>>>>>
>>>>>> This patch also updates the advsimd-intrinsics testsuite to test the f16
>>>>>> variants for ARM targets. These intrinsics are only implemented in the
>>>>>> ARM target so the tests are disabled for AArch64 using an extra
>>>>>> condition on a new convenience macro FP16_SUPPORTED. This patch also
>>>>>> disables, for the ARM target, the testsuite defined macro vdup_n_f16 as
>>>>>> it is no longer needed.
>>>>
>>>>
>>>> Since you committed this patch, I've noticed that libgcc fails to build
>>>> when GCC is configured:
>>>> --target arm-none-eabi and default cpu
>>>> /tmp/9649048_29.tmpdir/ccuBwQJJ.s: Assembler messages:
>>>> /tmp/9649048_29.tmpdir/ccuBwQJJ.s:64: Error: selected processor does
>>>> not support ARM mode `movwlt r0,32768'
>>>> /tmp/9649048_29.tmpdir/ccuBwQJJ.s:65: Error: selected processor does
>>>> not support ARM mode `movwge r0,32767'
>>>> make[4]: *** [_ssaddHQ.o] Error 1
>>>> make[4]: Leaving directory
>>>>
>>>> `/tmp/9649048_29.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/arm-none-eabi/fpu/libgcc'
>>>>
>>>
>>>
>>> I can't reproduce the failure, could you send the configure arguments for
>>> the build.
>>>
>>
>> If I'm not mistaken, that is:
>>   --target=arm-none-eabi  --disable-nls --disable-libgomp
>> --disable-libmudflap --disable-libcilkrts --enable-checking
>> --enable-languages=c,c++ --with-newlib
>>
>> Maybe you've disabled multilibs?
>
>
> I'm pretty sure I built this as part of reviewing all these patches
> with --with-multilib-list=aprofile and didn't see any failures. Not sure
> what's going on here.
>

I think the problem is that the new patterns use MOVW, which is only available
on ARMv6T2 and later (the Thumb-2 architectures), but don't check for that
support in the target. I'm testing a patch to fix this.

Matthew

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 5/17][ARM] Enable HI mode moves for floating point values.
  2016-07-27 13:57   ` Ramana Radhakrishnan
@ 2016-09-26 13:20     ` Christophe Lyon
  2016-09-26 13:26       ` Matthew Wahab
  0 siblings, 1 reply; 73+ messages in thread
From: Christophe Lyon @ 2016-09-26 13:20 UTC (permalink / raw)
  To: Ramana Radhakrishnan; +Cc: Matthew Wahab, gcc-patches

Hi,

Sorry for the delay, I've been travelling.

On 27 July 2016 at 15:57, Ramana Radhakrishnan
<ramana.gcc@googlemail.com> wrote:
> On Tue, May 17, 2016 at 3:29 PM, Matthew Wahab
> <matthew.wahab@foss.arm.com> wrote:
>> The handling of 16-bit integer data-movement in the ARM backend doesn't
>> make full use of the VFP instructions when they are available, even when
>> the values are for use in VFP operations.
>>
>> This patch adds support for using the VFP instructions and registers
>> when moving 16-bit integer and floating point data between registers and
>> between registers and memory.
>>
>> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
>> make check and for arm-none-eabi and armeb-none-eabi with make check on
>> an ARMv8.2-A emulator. Tested this patch for arm-none-linux-gnueabihf
>> with native bootstrap and make check and for arm-none-eabi with
>> check-gcc on an ARMv8.2-A emulator.
>>
>> Ok for trunk?
>

With this patch, I've noticed 2 regressions which still seem present on
trunk.

When GCC is configured with:
 --target=arm-none-linux-gnueabihf  --disable-libgomp
--disable-libmudflap --disable-libcilkrts --enable-checking
--enable-languages=c,c++,fortran --with-float=hard
--enable-build-with-cxx --with-mode=arm --with-cpu=cortex-a9
--with-fpu=vfp

and the tests run with -march=armv5t in RUNTESTFLAGS

FAIL: gcc.target/arm/constant-pool.c (test for excess errors)
because:
/ccepmUiD.s:29: Error: selected processor does not support ARM mode
`movw r0,4660'

FAIL: gfortran.fortran-torture/execute/nan_inf_fmt.f90 compilation,  -O2
because:
/cc76h4mz.s: Assembler messages:
/cc76h4mz.s:413: Error: selected processor does not support ARM mode
`movw r3,8224'
/cc76h4mz.s:496: Error: selected processor does not support ARM mode
`movw r2,8224'
/cc76h4mz.s:578: Error: selected processor does not support ARM mode
`movw ip,8224'
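
For what it's worth, the constant-pool failure seems to reduce to
something as small as this sketch (4660 is 0x1234; the exact test content
may differ):

  short f (void) { return 0x1234; }

With -march=armv5t the constant has to come from a literal pool load
rather than a `movw'.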

Christophe


>
> OK. ( the test function where this will make a difference is testhisf
> for the record) ...
>
> Ramana
>> Matthew
>>
>> 2016-05-17  Jiong Wang  <jiong.wang@arm.com>
>>             Matthew Wahab  <matthew.wahab@arm.com>
>>
>>         * config/arm/arm.c (output_move_vfp): Weaken assert to allow
>>         HImode.
>>         (arm_hard_regno_mode_ok): Allow HImode values in VFP registers.
>>         * config/arm/arm.md (*movhi_insn_arch4) Disable when VFP registers
>> are
>>         available.
>>         (*movhi_bytes): Likewise.
>>         * config/arm/vfp.md (*arm_movhi_vfp): New.
>>         (*thumb2_movhi_vfp): New.
>>
>> testsuite/
>> 2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>
>>
>>         * gcc.target/arm/short-vfp-1.c: New.
>>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 6/17][ARM] Add data processing intrinsics for float16_t.
  2016-09-26 13:11           ` Ramana Radhakrishnan
  2016-09-26 13:19             ` Matthew Wahab
@ 2016-09-26 13:21             ` Christophe Lyon
  2016-09-26 20:02               ` Christophe Lyon
  1 sibling, 1 reply; 73+ messages in thread
From: Christophe Lyon @ 2016-09-26 13:21 UTC (permalink / raw)
  To: Ramana Radhakrishnan; +Cc: Matthew Wahab, gcc-patches

On 26 September 2016 at 15:03, Ramana Radhakrishnan
<ramana.gcc@googlemail.com> wrote:
> On Mon, Sep 26, 2016 at 1:48 PM, Christophe Lyon
> <christophe.lyon@linaro.org> wrote:
>> On 26 September 2016 at 11:43, Matthew Wahab <matthew.wahab@foss.arm.com> wrote:
>>> Hello,
>>>
>>> On 25/09/16 14:00, Christophe Lyon wrote:
>>>>>>
>>>>>>
>>>>>> This patch adds the new intrinsics:
>>>>>>   vbsl_f16, vbslq_f16, vdup_n_f16, vdupq_n_f16, vdup_lane_f16,
>>>>>>   vdupq_lane_f16, vext_f16, vextq_f16, vmov_n_f16, vmovq_n_f16,
>>>>>>   vrev64_f16, vrev64q_f16, vtrn_f16, vtrnq_f16, vuzp_f16, vuzpq_f16,
>>>>>>   vzip_f16, vzipq_f16.
>>>>>>
>>>>>> This patch also updates the advsimd-intrinsics testsuite to test the f16
>>>>>> variants for ARM targets. These intrinsics are only implemented in the
>>>>>> ARM target so the tests are disabled for AArch64 using an extra
>>>>>> condition on a new convenience macro FP16_SUPPORTED. This patch also
>>>>>> disables, for the ARM target, the testsuite defined macro vdup_n_f16 as
>>>>>> it is no longer needed.
>>>>
>>>>
>>>> Since you committed this patch, I've noticed that libgcc fails to build
>>>> when GCC is configured:
>>>> --target arm-none-eabi and default cpu
>>>> /tmp/9649048_29.tmpdir/ccuBwQJJ.s: Assembler messages:
>>>> /tmp/9649048_29.tmpdir/ccuBwQJJ.s:64: Error: selected processor does
>>>> not support ARM mode `movwlt r0,32768'
>>>> /tmp/9649048_29.tmpdir/ccuBwQJJ.s:65: Error: selected processor does
>>>> not support ARM mode `movwge r0,32767'
>>>> make[4]: *** [_ssaddHQ.o] Error 1
>>>> make[4]: Leaving directory
>>>>
>>>> `/tmp/9649048_29.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/arm-none-eabi/fpu/libgcc'
>>>>
>>>
>>>
>>> I can't reproduce the failure, could you send the configure arguments for
>>> the build.
>>>
>>
>> If I'm not mistaken, that is:
>>  --target=arm-none-eabi  --disable-nls --disable-libgomp
>> --disable-libmudflap --disable-libcilkrts --enable-checking
>> --enable-languages=c,c++ --with-newlib
>>
>> Maybe you've disabled multilibs?
>
>
> I'm pretty sure I built this as part of reviewing all these patches
> with --with-multilib-list=aprofile and didn't see any failures. Not sure
> what's going on here.
>

I'm not using very recent binutils (can't remember if that's 2.25 or 2.26,
and I can't easily check remotely). Maybe something changed in
the assembler?

> Ramana
>>
>>> I've tried assembling the string 'movw r0, 32768' and get the error when
>>> -march=armv6kz or earlier. I suspect the new movhi and/or movhf patterns
>>> added earlier in the series need the architecture level added as a
>>> precondition but I'll need to look into it.
>>>
>>> Matthew
>>
>> Thanks,
>>
>> Christophe
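
For readers trying to reproduce this: the quoted errors come from building
libgcc's saturating fixed-point add helper for the 16-bit HQ mode
(_ssaddHQ), and the conditional MOVW instructions are materialising the
16-bit saturation bounds (32768 here is the unsigned encoding of -32768).
A plain-C stand-in for that clamping logic, hypothetical and not taken
from libgcc, is sketched below; whether it triggers the same MOVW
emission depends on the movhi patterns under discussion.

    /* Hypothetical sketch of the saturating-add logic behind _ssaddHQ:
       clamp a widened sum to 16-bit bounds.  The quoted assembler
       errors show GCC loading these bounds with conditional MOVW
       (movwlt/movwge), which targets before ARMv6T2 reject.  */
    short
    clamp_add (short a, short b)
    {
      int sum = (int) a + (int) b;
      if (sum < -32768)
        sum = -32768;
      else if (sum > 32767)
        sum = 32767;
      return (short) sum;
    }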

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 5/17][ARM] Enable HI mode moves for floating point values.
  2016-09-26 13:20     ` Christophe Lyon
@ 2016-09-26 13:26       ` Matthew Wahab
  0 siblings, 0 replies; 73+ messages in thread
From: Matthew Wahab @ 2016-09-26 13:26 UTC (permalink / raw)
  To: Christophe Lyon, Ramana Radhakrishnan; +Cc: gcc-patches

On 26/09/16 14:15, Christophe Lyon wrote:
> Hi,
>
> Sorry for the delay; I've been travelling.
>
> On 27 July 2016 at 15:57, Ramana Radhakrishnan
> <ramana.gcc@googlemail.com> wrote:
>> On Tue, May 17, 2016 at 3:29 PM, Matthew Wahab
>> <matthew.wahab@foss.arm.com> wrote:
>>> The handling of 16-bit integer data-movement in the ARM backend doesn't
>>> make full use of the VFP instructions when they are available, even when
>>> the values are for use in VFP operations.
>>>
>>> This patch adds support for using the VFP instructions and registers
>>> when moving 16-bit integer and floating point data between registers and
>>> between registers and memory.
>>>
> With this patch, I've noticed two regressions which still seem to be
> present on trunk.
>
> When GCC is configured with:
>   --target=arm-none-linux-gnueabihf  --disable-libgomp
> --disable-libmudflap --disable-libcilkrts --enable-checking
> --enable-languages=c,c++,fortran --with-float=hard
> --enable-build-with-cxx --with-mode=arm --with-cpu=cortex-a9
> --with-fpu=vfp
>
> and the tests are run with -march=armv5t in RUNTESTFLAGS:
>
> FAIL: gcc.target/arm/constant-pool.c (test for excess errors)
> because:
> /ccepmUiD.s:29: Error: selected processor does not support ARM mode
> `movw r0,4660'
>
> FAIL: gfortran.fortran-torture/execute/nan_inf_fmt.f90 compilation,  -O2
> because:
> /cc76h4mz.s: Assembler messages:
> /cc76h4mz.s:413: Error: selected processor does not support ARM mode
> `movw r3,8224'
> /cc76h4mz.s:496: Error: selected processor does not support ARM mode
> `movw r2,8224'
> /cc76h4mz.s:578: Error: selected processor does not support ARM mode
> `movw ip,8224'
>
> Christophe

I suspect this is the same MOVW problem as the one reported against the next
patch in the series.

Matthew
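
As a concrete illustration of the failure mode (hypothetical, and not the
actual constant-pool.c test): simply returning a 16-bit constant such as
0x1234 (4660, the value in the quoted error) can tempt the new movhi
patterns into emitting MOVW, which the ARMv5T assembler rejects because
MOVW only exists from ARMv6T2 onwards; the pre-patch code would have used
a constant-pool load or a mov/orr sequence instead.

    /* Hypothetical reproducer: compile with -march=armv5t -O2 on an
       affected compiler.  Emitting "movw r0, #4660" here is invalid
       for this -march; MOVW requires ARMv6T2 or later.  */
    short
    get_constant (void)
    {
      return 0x1234;  /* 4660 */
    }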

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 6/17][ARM] Add data processing intrinsics for float16_t.
  2016-09-26 13:21             ` Christophe Lyon
@ 2016-09-26 20:02               ` Christophe Lyon
  0 siblings, 0 replies; 73+ messages in thread
From: Christophe Lyon @ 2016-09-26 20:02 UTC (permalink / raw)
  To: Ramana Radhakrishnan; +Cc: Matthew Wahab, gcc-patches

On 26 September 2016 at 15:19, Christophe Lyon
<christophe.lyon@linaro.org> wrote:
> On 26 September 2016 at 15:03, Ramana Radhakrishnan
> <ramana.gcc@googlemail.com> wrote:
>> On Mon, Sep 26, 2016 at 1:48 PM, Christophe Lyon
>> <christophe.lyon@linaro.org> wrote:
>>> On 26 September 2016 at 11:43, Matthew Wahab <matthew.wahab@foss.arm.com> wrote:
>>>> Hello,
>>>>
>>>> On 25/09/16 14:00, Christophe Lyon wrote:
>>>>>>>
>>>>>>>
>>>>>>> This patch adds the new intrinsics:
>>>>>>>   vbsl_f16, vbslq_f16, vdup_n_f16, vdupq_n_f16, vdup_lane_f16,
>>>>>>>   vdupq_lane_f16, vext_f16, vextq_f16, vmov_n_f16, vmovq_n_f16,
>>>>>>>   vrev64_f16, vrev64q_f16, vtrn_f16, vtrnq_f16, vuzp_f16, vuzpq_f16,
>>>>>>>   vzip_f16, vzipq_f16.
>>>>>>>
>>>>>>> This patch also updates the advsimd-intrinsics testsuite to test the f16
>>>>>>> variants for ARM targets. These intrinsics are only implemented in the
>>>>>>> ARM target, so the tests are disabled for AArch64 using an extra
>>>>>>> condition on a new convenience macro, FP16_SUPPORTED. This patch also
>>>>>>> disables, for the ARM target, the testsuite-defined macro vdup_n_f16, as
>>>>>>> it is no longer needed.
>>>>>
>>>>>
>>>>> Since you committed this patch, I've noticed that libgcc fails to build
>>>>> when GCC is configured with
>>>>> --target arm-none-eabi and the default CPU:
>>>>> /tmp/9649048_29.tmpdir/ccuBwQJJ.s: Assembler messages:
>>>>> /tmp/9649048_29.tmpdir/ccuBwQJJ.s:64: Error: selected processor does
>>>>> not support ARM mode `movwlt r0,32768'
>>>>> /tmp/9649048_29.tmpdir/ccuBwQJJ.s:65: Error: selected processor does
>>>>> not support ARM mode `movwge r0,32767'
>>>>> make[4]: *** [_ssaddHQ.o] Error 1
>>>>> make[4]: Leaving directory
>>>>>
>>>>> `/tmp/9649048_29.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/arm-none-eabi/fpu/libgcc'
>>>>>
>>>>
>>>>
>>>> I can't reproduce the failure; could you send the configure arguments for
>>>> the build?
>>>>
>>>
>>> If I'm not mistaken, that is:
>>>  --target=arm-none-eabi  --disable-nls --disable-libgomp
>>> --disable-libmudflap --disable-libcilkrts --enable-checking
>>> --enable-languages=c,c++ --with-newlib
>>>
>>> Maybe you've disabled multilibs?
>>
>>
>> I'm pretty sure I built this as part of reviewing all these patches
>> with --with-multilib-list=aprofile and didn't see any failures. Not sure
>> what's going on here.
>>
>
> I'm not using very recent binutils (I can't remember whether that's 2.25 or
> 2.26, and I can't easily check remotely). Maybe something changed in
> the assembler?
>
I checked, and I am still using 2.25. Was there a change in gas
in this area since then?

Christophe

>> Ramana
>>>
>>>> I've tried assembling the string 'movw r0, 32768' and get the error with
>>>> -march=armv6kz or earlier. I suspect the new movhi and/or movhf patterns
>>>> added earlier in the series need the architecture level added as a
>>>> precondition, but I'll need to look into it.
>>>>
>>>> Matthew
>>>
>>> Thanks,
>>>
>>> Christophe

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [arm-embedded][committed][PATCH 10/17] Refactor support code for NEON builtins.
  2016-07-28 11:54   ` Ramana Radhakrishnan
@ 2016-12-05 16:47     ` Andre Vieira (lists)
  0 siblings, 0 replies; 73+ messages in thread
From: Andre Vieira (lists) @ 2016-12-05 16:47 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2117 bytes --]

On 28/07/16 12:54, Ramana Radhakrishnan wrote:
> On Tue, May 17, 2016 at 3:39 PM, Matthew Wahab
> <matthew.wahab@foss.arm.com> wrote:
>> The ACLE intrinsics introduced to support the ARMv8.2 FP16 extensions
>> require that intrinsics for scalar (VFP) instructions be available
>> under different conditions from those for the NEON intrinsics. To
>> support this, changes to the builtins support code are needed to enable
>> the scalar intrinsics to be initialized and expanded independently of
>> the NEON intrinsics.
>>
>> This patch prepares for this by refactoring some of the builtin support
>> code so that it can be used for both the scalar and the NEON intrinsics.
>>
>> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
>> make check and for arm-none-eabi and armeb-none-eabi with make check on
>> an ARMv8.2-A emulator.
> 
> 
> OK.
> 
> Ramana
>>
>> Ok for trunk?
>> Matthew
>>
>> 2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>
>>
>>         * config/arm/arm-builtins.c (ARM_BUILTIN_NEON_PATTERN_START):
>>         Change offset calculation.
>>         (arm_init_neon_builtin): New.
>>         (arm_init_builtins): Move body of a loop to the standalone
>>         function arm_init_neon_builtin.
>>         (arm_expand_neon_builtin_1): New.  Update comment.  Function body
>>         moved from arm_expand_neon_builtin with some white-space fixes.
>>         (arm_expand_neon_builtin): Move code into the standalone function
>>         arm_expand_neon_builtin_1.
>>
> 
Hi,

Backported this to the embedded-6-branch in revision r<revision>.


gcc/ChangeLog.arm:

2016-12-05  Andre Vieira  <andre.simoesdiasvieira@arm.com>

        Backport from mainline
        2016-09-23  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/arm-builtins.c (arm_init_neon_builtin): New.
	(arm_init_builtins): Move body of a loop to the standalone
	function arm_init_neon_builtin.
	(arm_expand_neon_builtin_1): New.  Update comment.  Function body
	moved from arm_expand_neon_builtin with some white-space fixes.
	(arm_expand_neon_builtin): Move code into the standalone function
	arm_expand_neon_builtin_1.

[-- Attachment #2: diff0 --]
[-- Type: text/plain, Size: 13410 bytes --]

diff --git a/gcc/ChangeLog.arm b/gcc/ChangeLog.arm
index 800a4b60efe7fe5ba9077217b7eb1271e9e05180..d9c71983cf05c1fe6b7578e2c3d43a581412e708 100644
--- a/gcc/ChangeLog.arm
+++ b/gcc/ChangeLog.arm
@@ -1,6 +1,19 @@
 2016-12-05  Andre Vieira  <andre.simoesdiasvieira@arm.com>
 
 	Backport from mainline
+	2016-09-23  Matthew Wahab  <matthew.wahab@arm.com>
+
+	 * config/arm/arm-builtins.c (arm_init_neon_builtin): New.
+	 (arm_init_builtins): Move body of a loop to the standalone
+	 function arm_init_neon_builtin.
+	 (arm_expand_neon_builtin_1): New.  Update comment.  Function body
+	 moved from arm_expand_neon_builtin with some white-space fixes.
+	 (arm_expand_neon_builtin): Move code into the standalone function
+	 arm_expand_neon_builtin_1.
+
+2016-12-05  Andre Vieira  <andre.simoesdiasvieira@arm.com>
+
+	Backport from mainline
 	2016-12-02  Andre Vieira  <andre.simoesdiasvieira@arm.com>
 		    Thomas Preud'homme	<thomas.preudhomme@arm.com>
 
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index ac56648706cd81a35fc32bde0bf3fc723387f5d5..b747837313f9ec28496245f253071ac5bd8b08f9 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -545,7 +545,7 @@ enum arm_builtins
 };
 
 #define ARM_BUILTIN_NEON_PATTERN_START \
-    (ARM_BUILTIN_MAX - ARRAY_SIZE (neon_builtin_data))
+  (ARM_BUILTIN_NEON_BASE + 1)
 
 #undef CF
 #undef VAR1
@@ -897,6 +897,110 @@ arm_init_simd_builtin_scalar_types (void)
 					     "__builtin_neon_uti");
 }
 
+/* Set up a NEON builtin.  */
+
+static void
+arm_init_neon_builtin (unsigned int fcode,
+		       neon_builtin_datum *d)
+{
+  bool print_type_signature_p = false;
+  char type_signature[SIMD_MAX_BUILTIN_ARGS] = { 0 };
+  char namebuf[60];
+  tree ftype = NULL;
+  tree fndecl = NULL;
+
+  d->fcode = fcode;
+
+  /* We must track two variables here.  op_num is
+     the operand number as in the RTL pattern.  This is
+     required to access the mode (e.g. V4SF mode) of the
+     argument, from which the base type can be derived.
+     arg_num is an index in to the qualifiers data, which
+     gives qualifiers to the type (e.g. const unsigned).
+     The reason these two variables may differ by one is the
+     void return type.  While all return types take the 0th entry
+     in the qualifiers array, there is no operand for them in the
+     RTL pattern.  */
+  int op_num = insn_data[d->code].n_operands - 1;
+  int arg_num = d->qualifiers[0] & qualifier_void
+    ? op_num + 1
+    : op_num;
+  tree return_type = void_type_node, args = void_list_node;
+  tree eltype;
+
+  /* Build a function type directly from the insn_data for this
+     builtin.  The build_function_type () function takes care of
+     removing duplicates for us.  */
+  for (; op_num >= 0; arg_num--, op_num--)
+    {
+      machine_mode op_mode = insn_data[d->code].operand[op_num].mode;
+      enum arm_type_qualifiers qualifiers = d->qualifiers[arg_num];
+
+      if (qualifiers & qualifier_unsigned)
+	{
+	  type_signature[arg_num] = 'u';
+	  print_type_signature_p = true;
+	}
+      else if (qualifiers & qualifier_poly)
+	{
+	  type_signature[arg_num] = 'p';
+	  print_type_signature_p = true;
+	}
+      else
+	type_signature[arg_num] = 's';
+
+      /* Skip an internal operand for vget_{low, high}.  */
+      if (qualifiers & qualifier_internal)
+	continue;
+
+      /* Some builtins have different user-facing types
+	 for certain arguments, encoded in d->mode.  */
+      if (qualifiers & qualifier_map_mode)
+	op_mode = d->mode;
+
+      /* For pointers, we want a pointer to the basic type
+	 of the vector.  */
+      if (qualifiers & qualifier_pointer && VECTOR_MODE_P (op_mode))
+	op_mode = GET_MODE_INNER (op_mode);
+
+      eltype = arm_simd_builtin_type
+	(op_mode,
+	 (qualifiers & qualifier_unsigned) != 0,
+	 (qualifiers & qualifier_poly) != 0);
+      gcc_assert (eltype != NULL);
+
+      /* Add qualifiers.  */
+      if (qualifiers & qualifier_const)
+	eltype = build_qualified_type (eltype, TYPE_QUAL_CONST);
+
+      if (qualifiers & qualifier_pointer)
+	eltype = build_pointer_type (eltype);
+
+      /* If we have reached arg_num == 0, we are at a non-void
+	 return type.  Otherwise, we are still processing
+	 arguments.  */
+      if (arg_num == 0)
+	return_type = eltype;
+      else
+	args = tree_cons (NULL_TREE, eltype, args);
+    }
+
+  ftype = build_function_type (return_type, args);
+
+  gcc_assert (ftype != NULL);
+
+  if (print_type_signature_p)
+    snprintf (namebuf, sizeof (namebuf), "__builtin_neon_%s_%s",
+	      d->name, type_signature);
+  else
+    snprintf (namebuf, sizeof (namebuf), "__builtin_neon_%s",
+	      d->name);
+
+  fndecl = add_builtin_function (namebuf, ftype, fcode, BUILT_IN_MD,
+				 NULL, NULL_TREE);
+  arm_builtin_decls[fcode] = fndecl;
+}
+
 /* Set up all the NEON builtins, even builtins for instructions that are not
    in the current target ISA to allow the user to compile particular modules
    with different target specific options that differ from the command line
@@ -926,103 +1030,8 @@ arm_init_neon_builtins (void)
 
   for (i = 0; i < ARRAY_SIZE (neon_builtin_data); i++, fcode++)
     {
-      bool print_type_signature_p = false;
-      char type_signature[SIMD_MAX_BUILTIN_ARGS] = { 0 };
       neon_builtin_datum *d = &neon_builtin_data[i];
-      char namebuf[60];
-      tree ftype = NULL;
-      tree fndecl = NULL;
-
-      d->fcode = fcode;
-
-      /* We must track two variables here.  op_num is
-	 the operand number as in the RTL pattern.  This is
-	 required to access the mode (e.g. V4SF mode) of the
-	 argument, from which the base type can be derived.
-	 arg_num is an index in to the qualifiers data, which
-	 gives qualifiers to the type (e.g. const unsigned).
-	 The reason these two variables may differ by one is the
-	 void return type.  While all return types take the 0th entry
-	 in the qualifiers array, there is no operand for them in the
-	 RTL pattern.  */
-      int op_num = insn_data[d->code].n_operands - 1;
-      int arg_num = d->qualifiers[0] & qualifier_void
-		      ? op_num + 1
-		      : op_num;
-      tree return_type = void_type_node, args = void_list_node;
-      tree eltype;
-
-      /* Build a function type directly from the insn_data for this
-	 builtin.  The build_function_type () function takes care of
-	 removing duplicates for us.  */
-      for (; op_num >= 0; arg_num--, op_num--)
-	{
-	  machine_mode op_mode = insn_data[d->code].operand[op_num].mode;
-	  enum arm_type_qualifiers qualifiers = d->qualifiers[arg_num];
-
-	  if (qualifiers & qualifier_unsigned)
-	    {
-	      type_signature[arg_num] = 'u';
-	      print_type_signature_p = true;
-	    }
-	  else if (qualifiers & qualifier_poly)
-	    {
-	      type_signature[arg_num] = 'p';
-	      print_type_signature_p = true;
-	    }
-	  else
-	    type_signature[arg_num] = 's';
-
-	  /* Skip an internal operand for vget_{low, high}.  */
-	  if (qualifiers & qualifier_internal)
-	    continue;
-
-	  /* Some builtins have different user-facing types
-	     for certain arguments, encoded in d->mode.  */
-	  if (qualifiers & qualifier_map_mode)
-	      op_mode = d->mode;
-
-	  /* For pointers, we want a pointer to the basic type
-	     of the vector.  */
-	  if (qualifiers & qualifier_pointer && VECTOR_MODE_P (op_mode))
-	    op_mode = GET_MODE_INNER (op_mode);
-
-	  eltype = arm_simd_builtin_type
-		     (op_mode,
-		      (qualifiers & qualifier_unsigned) != 0,
-		      (qualifiers & qualifier_poly) != 0);
-	  gcc_assert (eltype != NULL);
-
-	  /* Add qualifiers.  */
-	  if (qualifiers & qualifier_const)
-	    eltype = build_qualified_type (eltype, TYPE_QUAL_CONST);
-
-	  if (qualifiers & qualifier_pointer)
-	      eltype = build_pointer_type (eltype);
-
-	  /* If we have reached arg_num == 0, we are at a non-void
-	     return type.  Otherwise, we are still processing
-	     arguments.  */
-	  if (arg_num == 0)
-	    return_type = eltype;
-	  else
-	    args = tree_cons (NULL_TREE, eltype, args);
-	}
-
-      ftype = build_function_type (return_type, args);
-
-      gcc_assert (ftype != NULL);
-
-      if (print_type_signature_p)
-	snprintf (namebuf, sizeof (namebuf), "__builtin_neon_%s_%s",
-		  d->name, type_signature);
-      else
-	snprintf (namebuf, sizeof (namebuf), "__builtin_neon_%s",
-		  d->name);
-
-      fndecl = add_builtin_function (namebuf, ftype, fcode, BUILT_IN_MD,
-				     NULL, NULL_TREE);
-      arm_builtin_decls[fcode] = fndecl;
+      arm_init_neon_builtin (fcode, d);
     }
 }
 
@@ -2224,40 +2233,16 @@ constant_arg:
   return target;
 }
 
-/* Expand a Neon builtin, i.e. those registered only if TARGET_NEON holds.
-   Most of these are "special" because they don't have symbolic
-   constants defined per-instruction or per instruction-variant. Instead, the
-   required info is looked up in the table neon_builtin_data.  */
+/* Expand a neon builtin.  This is also used for vfp builtins, which behave in
+   the same way.  These builtins are "special" because they don't have symbolic
+   constants defined per-instruction or per instruction-variant.  Instead, the
+   required info is looked up in the NEON_BUILTIN_DATA record that is passed
+   into the function.  */
+
 static rtx
-arm_expand_neon_builtin (int fcode, tree exp, rtx target)
+arm_expand_neon_builtin_1 (int fcode, tree exp, rtx target,
+			   neon_builtin_datum *d)
 {
-  /* Check in the context of the function making the call whether the
-     builtin is supported.  */
-  if (! TARGET_NEON)
-    {
-      fatal_error (input_location,
-		   "You must enable NEON instructions (e.g. -mfloat-abi=softfp -mfpu=neon) to use these intrinsics.");
-      return const0_rtx;
-    }
-
-  if (fcode == ARM_BUILTIN_NEON_LANE_CHECK)
-    {
-      /* Builtin is only to check bounds of the lane passed to some intrinsics
-	 that are implemented with gcc vector extensions in arm_neon.h.  */
-
-      tree nlanes = CALL_EXPR_ARG (exp, 0);
-      gcc_assert (TREE_CODE (nlanes) == INTEGER_CST);
-      rtx lane_idx = expand_normal (CALL_EXPR_ARG (exp, 1));
-      if (CONST_INT_P (lane_idx))
-	neon_lane_bounds (lane_idx, 0, TREE_INT_CST_LOW (nlanes), exp);
-      else
-	error ("%Klane index must be a constant immediate", exp);
-      /* Don't generate any RTL.  */
-      return const0_rtx;
-    }
-
-  neon_builtin_datum *d =
-		&neon_builtin_data[fcode - ARM_BUILTIN_NEON_PATTERN_START];
   enum insn_code icode = d->code;
   builtin_arg args[SIMD_MAX_BUILTIN_ARGS + 1];
   int num_args = insn_data[d->code].n_operands;
@@ -2273,8 +2258,8 @@ arm_expand_neon_builtin (int fcode, tree exp, rtx target)
       /* We have four arrays of data, each indexed in a different fashion.
 	 qualifiers - element 0 always describes the function return type.
 	 operands - element 0 is either the operand for return value (if
-	   the function has a non-void return type) or the operand for the
-	   first argument.
+	 the function has a non-void return type) or the operand for the
+	 first argument.
 	 expr_args - element 0 always holds the first argument.
 	 args - element 0 is always used for the return type.  */
       int qualifiers_k = k;
@@ -2296,7 +2281,7 @@ arm_expand_neon_builtin (int fcode, tree exp, rtx target)
 	  bool op_const_int_p =
 	    (CONST_INT_P (arg)
 	     && (*insn_data[icode].operand[operands_k].predicate)
-		(arg, insn_data[icode].operand[operands_k].mode));
+	     (arg, insn_data[icode].operand[operands_k].mode));
 	  args[k] = op_const_int_p ? NEON_ARG_CONSTANT : NEON_ARG_COPY_TO_REG;
 	}
       else if (d->qualifiers[qualifiers_k] & qualifier_pointer)
@@ -2309,8 +2294,47 @@ arm_expand_neon_builtin (int fcode, tree exp, rtx target)
   /* The interface to arm_expand_neon_args expects a 0 if
      the function is void, and a 1 if it is not.  */
   return arm_expand_neon_args
-	  (target, d->mode, fcode, icode, !is_void, exp,
-	   &args[1]);
+    (target, d->mode, fcode, icode, !is_void, exp,
+     &args[1]);
+}
+
+/* Expand a Neon builtin, i.e. those registered only if TARGET_NEON holds.
+   Most of these are "special" because they don't have symbolic
+   constants defined per-instruction or per instruction-variant.  Instead, the
+   required info is looked up in the table neon_builtin_data.  */
+
+static rtx
+arm_expand_neon_builtin (int fcode, tree exp, rtx target)
+{
+  if (fcode >= ARM_BUILTIN_NEON_BASE && ! TARGET_NEON)
+    {
+      fatal_error (input_location,
+		   "You must enable NEON instructions"
+		   " (e.g. -mfloat-abi=softfp -mfpu=neon)"
+		   " to use these intrinsics.");
+      return const0_rtx;
+    }
+
+  if (fcode == ARM_BUILTIN_NEON_LANE_CHECK)
+    {
+      /* Builtin is only to check bounds of the lane passed to some intrinsics
+	 that are implemented with gcc vector extensions in arm_neon.h.  */
+
+      tree nlanes = CALL_EXPR_ARG (exp, 0);
+      gcc_assert (TREE_CODE (nlanes) == INTEGER_CST);
+      rtx lane_idx = expand_normal (CALL_EXPR_ARG (exp, 1));
+      if (CONST_INT_P (lane_idx))
+	neon_lane_bounds (lane_idx, 0, TREE_INT_CST_LOW (nlanes), exp);
+      else
+	error ("%Klane index must be a constant immediate", exp);
+      /* Don't generate any RTL.  */
+      return const0_rtx;
+    }
+
+  neon_builtin_datum *d
+    = &neon_builtin_data[fcode - ARM_BUILTIN_NEON_PATTERN_START];
+
+  return arm_expand_neon_builtin_1 (fcode, exp, target, d);
 }
 
 /* Expand an expression EXP that calls a built-in function,
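
For context on what flows through this expansion path: a call to one of
the f16 intrinsics from earlier in the series, such as vdup_n_f16,
reaches arm_expand_neon_builtin at RTL expansion time.  A minimal
user-level example follows; the exact -mfpu and -mfp16-format options
required are configuration-dependent, so treat the flags in the comment
as a plausible starting point rather than a definitive recipe.

    /* Example use of an f16 NEON intrinsic whose builtin is expanded
       through the code above.  Needs a toolchain with NEON FP16
       support, e.g. options along the lines of
       -mfpu=neon-fp16 -mfp16-format=ieee.  */
    #include <arm_neon.h>

    float16x4_t
    dup4 (float16_t x)
    {
      return vdup_n_f16 (x);  /* broadcast x into all four lanes */
    }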

^ permalink raw reply	[flat|nested] 73+ messages in thread

end of thread, other threads:[~2016-12-05 16:47 UTC | newest]

Thread overview: 73+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-17 14:20 [PATCH 0/17][ARM] ARMv8.2-A and FP16 extension support Matthew Wahab
2016-05-17 14:23 ` [PATCH 1/17][ARM] Add ARMv8.2-A command line option and profile Matthew Wahab
2016-07-04 13:46   ` Matthew Wahab
2016-09-21 13:57     ` Ramana Radhakrishnan
2016-05-17 14:25 ` [PATCH 2/17][Testsuite] Add a selector for ARM FP16 alternative format support Matthew Wahab
2016-07-27 13:34   ` Ramana Radhakrishnan
2016-05-17 14:26 ` [PATCH 3/17][Testsuite] Add ARM support for ARMv8.2-A with FP16 arithmetic instructions Matthew Wahab
2016-07-04 13:49   ` Matthew Wahab
2016-07-27 13:34     ` Ramana Radhakrishnan
2016-05-17 14:28 ` [PATCH 4/17][ARM] Define feature macros for FP16 Matthew Wahab
2016-07-27 13:35   ` Ramana Radhakrishnan
2016-05-17 14:29 ` [PATCH 5/17][ARM] Enable HI mode moves for floating point values Matthew Wahab
2016-07-27 13:57   ` Ramana Radhakrishnan
2016-09-26 13:20     ` Christophe Lyon
2016-09-26 13:26       ` Matthew Wahab
2016-05-17 14:32 ` [PATCH 6/17][ARM] Add data processing intrinsics for float16_t Matthew Wahab
2016-07-27 13:59   ` Ramana Radhakrishnan
2016-09-25 14:44     ` Christophe Lyon
2016-09-26  9:56       ` Matthew Wahab
2016-09-26 12:54         ` Christophe Lyon
2016-09-26 13:11           ` Ramana Radhakrishnan
2016-09-26 13:19             ` Matthew Wahab
2016-09-26 13:21             ` Christophe Lyon
2016-09-26 20:02               ` Christophe Lyon
2016-05-17 14:34 ` [PATCH 7/17][ARM] Add FP16 data movement instructions Matthew Wahab
2016-07-04 13:57   ` Matthew Wahab
2016-07-27 14:01     ` Ramana Radhakrishnan
2016-05-17 14:36 ` [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions Matthew Wahab
2016-05-18  0:52   ` Joseph Myers
2016-05-18  0:57     ` Joseph Myers
2016-05-18 13:40     ` Matthew Wahab
2016-05-18 15:21       ` Joseph Myers
2016-05-19 14:54         ` Matthew Wahab
2016-07-04 14:02   ` Matthew Wahab
2016-07-28 11:37     ` Ramana Radhakrishnan
2016-08-03 11:52       ` Ramana Radhakrishnan
2016-08-03 13:10         ` Matthew Wahab
2016-08-03 14:45         ` James Greenhalgh
2016-08-03 17:44         ` Joseph Myers
2016-05-17 14:37 ` [PATCH 9/17][ARM] Add NEON " Matthew Wahab
2016-05-18  0:58   ` Joseph Myers
2016-05-19 17:01     ` Jiong Wang
2016-05-19 17:29       ` Joseph Myers
2016-06-08  8:46         ` James Greenhalgh
2016-06-08 20:02           ` Joseph Myers
2016-07-04 14:09     ` Matthew Wahab
2016-07-28 11:53       ` Ramana Radhakrishnan
2016-05-17 14:39 ` [PATCH 10/17][ARM] Refactor support code for NEON builtins Matthew Wahab
2016-07-28 11:54   ` Ramana Radhakrishnan
2016-12-05 16:47     ` [arm-embedded][committed][PATCH 10/17] " Andre Vieira (lists)
2016-05-17 14:41 ` [PATCH 11/17][ARM] Add builtins for VFP FP16 intrinsics Matthew Wahab
2016-07-04 14:12   ` Matthew Wahab
2016-07-28 11:55     ` Ramana Radhakrishnan
2016-05-17 14:43 ` [PATCH 12/17][ARM] Add builtins for NEON " Matthew Wahab
2016-07-04 14:13   ` Matthew Wahab
2016-07-28 11:56     ` Ramana Radhakrishnan
2016-05-17 14:44 ` [PATCH 13/17][ARM] Add VFP FP16 instrinsics Matthew Wahab
2016-07-04 14:14   ` Matthew Wahab
2016-07-28 11:57     ` Ramana Radhakrishnan
2016-05-17 14:47 ` [PATCH 14/17][ARM] Add NEON " Matthew Wahab
2016-07-04 14:16   ` Matthew Wahab
2016-08-03 12:57     ` Ramana Radhakrishnan
2016-05-17 14:49 ` [PATCH 15/17][ARM] Add tests for ARMv8.2-A FP16 support Matthew Wahab
2016-07-04 14:17   ` Matthew Wahab
2016-08-04  8:34     ` Ramana Radhakrishnan
2016-05-17 14:51 ` [PATCH 16/17][ARM] Add tests for VFP FP16 ACLE instrinsics Matthew Wahab
2016-05-18  1:07   ` Joseph Myers
2016-05-18 10:58     ` Matthew Wahab
2016-07-04 14:18       ` Matthew Wahab
2016-08-04  8:35         ` Ramana Radhakrishnan
2016-05-17 14:52 ` [PATCH 17/17][ARM] Add tests for NEON FP16 ACLE intrinsics Matthew Wahab
2016-07-04 14:22   ` Matthew Wahab
2016-08-04  9:01     ` Ramana Radhakrishnan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).