From: Alan Lawrence
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 5/14][AArch64] Add basic fp16 support
Date: Wed, 22 Apr 2015 17:04:00 -0000
Message-ID: <5537D47F.6020103@arm.com>

This adds basic support for moving __fp16 values around: passing and
returning them, and operating on them by promoting to 32-bit floats.  It
also adds a few scalar testcases.

Note I've not added an fmov (immediate) variant, because there is no
'fmov h<n>, ...' - the only way to load a 16-bit immediate is to
reinterpret the bit pattern into some other type.  Vector MOVs are turned
off for the same reason.  If this turns out to be practical, it can follow
in a separate patch.

My reading of ACLE suggests the type name to use is __fp16, rather than
__builtin_aarch64_simd_hf.  I can use the latter if that's preferable?

int<->f16 conversions are a little odd; the assembly generated is as
follows (a standalone C sketch of these two routines appears after the
cover letter):

int_to_f16:
        scvtf   d0, w0
        fcvt    h0, d0
        ret
int_from_f16:
        fcvt    s0, h0
        fcvtzs  w0, s0
        ret

The spec is silent on the presence or absence of intermediate rounding
steps; however, I don't think this matters: float32_t offers so many more
bits than __fp16 that any integer which fits into the range of an __fp16
(i.e. is not infinite) can be expressed exactly as a float32_t without any
loss of precision.  So I think the above are OK.  (If they can be
optimized, that can follow in a later patch.)

Note that unlike ARM, where we support both IEEE and Alternative formats
(and, somewhat awkwardly, format-agnostic code too), here we are settling
on IEEE format always.  Technically, we should output an EABI attribute
saying which format we are using; however, aarch64 asm does not support
the .eabi_attribute directive yet, so it seems reasonable to leave this
for now, while there is only one possible format.

Bootstrapped + check-gcc on aarch64-none-linux-gnu.
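For illustration only, and not part of the patch: a minimal standalone C
sketch of the two conversion routines whose assembly is quoted above.  The
function names simply mirror the assembly labels; it assumes a compiler
that already accepts __fp16 (e.g. one with this patch applied), and the
exact code generated will of course depend on options.

/* Minimal sketch, assuming __fp16 support as added by this patch.
   int_to_f16 should compile to roughly scvtf + fcvt, and int_from_f16
   to fcvt + fcvtzs, as in the assembly quoted above.  */

__fp16
int_to_f16 (int x)
{
  return x;
}

int
int_from_f16 (__fp16 x)
{
  return x;
}

int
main (void)
{
  /* 42 is exactly representable in both __fp16 and float, so the
     round trip is lossless and this prints 42.  */
  __fp16 h = int_to_f16 (42);
  __builtin_printf ("%d\n", int_from_f16 (h));
  return 0;
}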
gcc/ChangeLog:

	* config/aarch64/aarch64-builtins.c (aarch64_fp16_type_node): New.
	(aarch64_init_builtins): Make aarch64_fp16_type_node, use for __fp16.
	* config/aarch64/aarch64-modes.def: Add HFmode.
	* config/aarch64/aarch64.h (TARGET_CPU_CPP_BUILTINS): Define
	__ARM_FP16_FORMAT_IEEE and __ARM_FP16_ARGS.  Set bit 1 of __ARM_FP.
	* config/aarch64/aarch64.c (aarch64_init_libfuncs,
	aarch64_promoted_type): New.
	(aarch64_float_const_representable_p): Disable HFmode.
	(aarch64_mangle_type): Mangle half-precision floats to "Dh".
	(TARGET_PROMOTED_TYPE): Define to aarch64_promoted_type.
	(TARGET_INIT_LIBFUNCS): Define to aarch64_init_libfuncs.
	* config/aarch64/aarch64.md (mov<mode>): Include HFmode using GPF_F16.
	(movhf_aarch64, extendhfsf2, extendhfdf2, truncsfhf2, truncdfhf2): New.
	* config/aarch64/iterators.md (GPF_F16): New.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/f16_convs_1.c: New test.
	* gcc.target/aarch64/f16_convs_2.c: New test.
	* gcc.target/aarch64/f16_movs_1.c: New test.

(attachment: 05_aarch64_basic_fp16.patch)

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 87f1ac2ec1e3c774782c567b20c673802ae90d99..5a7b112bd1fe77826bfb84383c86dceb6b1521e3 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -453,6 +453,9 @@ static struct aarch64_simd_type_info aarch64_simd_types[] = {
 };
 #undef ENTRY

+/* This type is not SIMD-specific; it is the user-visible __fp16.  */
+static tree aarch64_fp16_type_node = NULL_TREE;
+
 static tree aarch64_simd_intOI_type_node = NULL_TREE;
 static tree aarch64_simd_intEI_type_node = NULL_TREE;
 static tree aarch64_simd_intCI_type_node = NULL_TREE;
@@ -862,6 +865,12 @@ aarch64_init_builtins (void)
     = add_builtin_function ("__builtin_aarch64_set_fpsr", ftype_set_fpr,
			     AARCH64_BUILTIN_SET_FPSR, BUILT_IN_MD, NULL, NULL_TREE);

+  aarch64_fp16_type_node = make_node (REAL_TYPE);
+  TYPE_PRECISION (aarch64_fp16_type_node) = 16;
+  layout_type (aarch64_fp16_type_node);
+
+  (*lang_hooks.types.register_builtin_type) (aarch64_fp16_type_node, "__fp16");
+
   if (TARGET_SIMD)
     aarch64_init_simd_builtins ();
   if (TARGET_CRC32)
diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def
index b17b90d90601ae0a631a78560da743720c4638ce..c30059b632fa8cb7fd9071917d3f581f0966a86d 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -36,6 +36,10 @@ CC_MODE (CC_DLTU);
 CC_MODE (CC_DGEU);
 CC_MODE (CC_DGTU);

+/* Half-precision floating point for arm_neon.h float16_t.  */
+FLOAT_MODE (HF, 2, 0);
+ADJUST_FLOAT_FORMAT (HF, &ieee_half_format);
+
 /* Vector modes.  */
 VECTOR_MODES (INT, 8);        /*       V8QI V4HI V2SI.  */
 VECTOR_MODES (INT, 16);       /* V16QI V8HI V4SI V2DI.  */
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index bf59e40a64459f6daddef47a5f5214adfd92d9b6..67c37ebc0e06d22e524322e5a82b6bcde550bd93 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -57,7 +57,9 @@
       if (TARGET_FLOAT)                                         \
         {                                                       \
           builtin_define ("__ARM_FEATURE_FMA");                 \
-          builtin_define_with_int_value ("__ARM_FP", 0x0C);     \
+          builtin_define_with_int_value ("__ARM_FP", 0x0E);     \
+          builtin_define ("__ARM_FP16_FORMAT_IEEE");            \
+          builtin_define ("__ARM_FP16_ARGS");                   \
         }                                                       \
       if (TARGET_SIMD)                                          \
         {                                                       \
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b923fdb08a8e653570e51cf516dc551955961704..44956cf0276ed7b1369d1816f472bad61ac421b1 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8058,6 +8058,10 @@ aarch64_mangle_type (const_tree type)
   if (lang_hooks.types_compatible_p (CONST_CAST_TREE (type), va_list_type))
     return "St9__va_list";

+  /* Half-precision float.  */
+  if (TREE_CODE (type) == REAL_TYPE && TYPE_PRECISION (type) == 16)
+    return "Dh";
+
   /* Mangle AArch64-specific internal types.  TYPE_NAME is non-NULL_TREE for
      builtin types.  */
   if (TYPE_NAME (type) != NULL)
@@ -9251,6 +9255,33 @@ aarch64_start_file (void)
   default_file_start();
 }

+static void
+aarch64_init_libfuncs (void)
+{
+  /* Half-precision float operations.  The compiler handles all operations
+     with NULL libfuncs by converting to SFmode.  */
+
+  /* Conversions.  */
+  set_conv_libfunc (trunc_optab, HFmode, SFmode, "__gnu_f2h_ieee");
+  set_conv_libfunc (sext_optab, SFmode, HFmode, "__gnu_h2f_ieee");
+
+  /* Arithmetic.  */
+  set_optab_libfunc (add_optab, HFmode, NULL);
+  set_optab_libfunc (sdiv_optab, HFmode, NULL);
+  set_optab_libfunc (smul_optab, HFmode, NULL);
+  set_optab_libfunc (neg_optab, HFmode, NULL);
+  set_optab_libfunc (sub_optab, HFmode, NULL);
+
+  /* Comparisons.  */
+  set_optab_libfunc (eq_optab, HFmode, NULL);
+  set_optab_libfunc (ne_optab, HFmode, NULL);
+  set_optab_libfunc (lt_optab, HFmode, NULL);
+  set_optab_libfunc (le_optab, HFmode, NULL);
+  set_optab_libfunc (ge_optab, HFmode, NULL);
+  set_optab_libfunc (gt_optab, HFmode, NULL);
+  set_optab_libfunc (unord_optab, HFmode, NULL);
+}
+
 /* Target hook for c_mode_for_suffix.  */
 static machine_mode
 aarch64_c_mode_for_suffix (char suffix)
@@ -9289,7 +9320,8 @@ aarch64_float_const_representable_p (rtx x)
   if (!CONST_DOUBLE_P (x))
     return false;

-  if (GET_MODE (x) == VOIDmode)
+  /* We don't support HFmode constants yet.  */
+  if (GET_MODE (x) == VOIDmode || GET_MODE (x) == HFmode)
     return false;

   REAL_VALUE_FROM_CONST_DOUBLE (r, x);
@@ -11230,6 +11262,14 @@ aarch64_gen_adjusted_ldpstp (rtx *operands, bool load,
   return true;
 }

+/* Implement TARGET_PROMOTED_TYPE to promote float16 to 32 bits.  */
+static tree
+aarch64_promoted_type (const_tree t)
+{
+  if (SCALAR_FLOAT_TYPE_P (t) && TYPE_PRECISION (t) == 16)
+    return float_type_node;
+  return NULL_TREE;
+}
 #undef TARGET_ADDRESS_COST
 #define TARGET_ADDRESS_COST aarch64_address_cost

@@ -11384,6 +11424,9 @@ aarch64_gen_adjusted_ldpstp (rtx *operands, bool load,
 #undef TARGET_SCHED_REASSOCIATION_WIDTH
 #define TARGET_SCHED_REASSOCIATION_WIDTH aarch64_reassociation_width

+#undef TARGET_PROMOTED_TYPE
+#define TARGET_PROMOTED_TYPE aarch64_promoted_type
+
 #undef TARGET_SECONDARY_RELOAD
 #define TARGET_SECONDARY_RELOAD aarch64_secondary_reload

@@ -11476,6 +11519,8 @@ aarch64_gen_adjusted_ldpstp (rtx *operands, bool load,
 #define TARGET_VECTORIZE_VEC_PERM_CONST_OK \
   aarch64_vectorize_vec_perm_const_ok

+#undef TARGET_INIT_LIBFUNCS
+#define TARGET_INIT_LIBFUNCS aarch64_init_libfuncs

 #undef TARGET_FIXED_CONDITION_CODE_REGS
 #define TARGET_FIXED_CONDITION_CODE_REGS aarch64_fixed_condition_code_regs
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 1f4169ee76e7f3f321e5ed7a4d0f08b54ee3bf17..0851f6949adb69bf23221e811230fae08749887c 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -976,8 +976,8 @@
 })

 (define_expand "mov<mode>"
-  [(set (match_operand:GPF 0 "nonimmediate_operand" "")
-	(match_operand:GPF 1 "general_operand" ""))]
+  [(set (match_operand:GPF_F16 0 "nonimmediate_operand" "")
+	(match_operand:GPF_F16 1 "general_operand" ""))]
   ""
   "
   if (!TARGET_FLOAT)
@@ -991,6 +991,26 @@
   "
 )

+(define_insn "*movhf_aarch64"
+  [(set (match_operand:HF 0 "nonimmediate_operand" "=w, ?r,w,w,m,r,m ,r")
+	(match_operand:HF 1 "general_operand"      "?rY, w,w,m,w,m,rY,r"))]
+  "TARGET_FLOAT && (register_operand (operands[0], HFmode)
+    || register_operand (operands[1], HFmode))"
+  "@
+   mov\\t%0.h[0], %w1
+   umov\\t%w0, %1.h[0]
+   mov\\t%0.h[0], %1.h[0]
+   ldr\\t%h0, %1
+   str\\t%h1, %0
+   ldrh\\t%w0, %1
+   strh\\t%w1, %0
+   mov\\t%w0, %w1"
+  [(set_attr "type" "neon_from_gp,neon_to_gp,fmov,\
+                     f_loads,f_stores,load1,store1,mov_reg")
+   (set_attr "simd" "yes,yes,yes,*,*,*,*,*")
+   (set_attr "fp"   "*,*,*,yes,yes,*,*,*")]
+)
+
 (define_insn "*movsf_aarch64"
   [(set (match_operand:SF 0 "nonimmediate_operand" "=w, ?r,w,w ,w,m,r,m ,r")
	(match_operand:SF 1 "general_operand"      "?rY, w,w,Ufc,m,w,m,rY,r"))]
@@ -3882,6 +3902,22 @@
   [(set_attr "type" "f_cvt")]
 )

+(define_insn "extendhfsf2"
+  [(set (match_operand:SF 0 "register_operand" "=w")
+	(float_extend:SF (match_operand:HF 1 "register_operand" "w")))]
+  "TARGET_FLOAT"
+  "fcvt\\t%s0, %h1"
+  [(set_attr "type" "f_cvt")]
+)
+
+(define_insn "extendhfdf2"
+  [(set (match_operand:DF 0 "register_operand" "=w")
+	(float_extend:DF (match_operand:HF 1 "register_operand" "w")))]
+  "TARGET_FLOAT"
+  "fcvt\\t%d0, %h1"
+  [(set_attr "type" "f_cvt")]
+)
+
 (define_insn "truncdfsf2"
   [(set (match_operand:SF 0 "register_operand" "=w")
	(float_truncate:SF (match_operand:DF 1 "register_operand" "w")))]
@@ -3890,6 +3926,22 @@
   [(set_attr "type" "f_cvt")]
 )

+(define_insn "truncsfhf2"
+  [(set (match_operand:HF 0 "register_operand" "=w")
+	(float_truncate:HF (match_operand:SF 1 "register_operand" "w")))]
+  "TARGET_FLOAT"
+  "fcvt\\t%h0, %s1"
+  [(set_attr "type" "f_cvt")]
+)
+
+(define_insn "truncdfhf2"
+  [(set (match_operand:HF 0 "register_operand" "=w")
+	(float_truncate:HF (match_operand:DF 1 "register_operand" "w")))]
+  "TARGET_FLOAT"
+  "fcvt\\t%h0, %d1"
+  [(set_attr "type" "f_cvt")]
+)
+
 (define_insn "fix_trunc<GPF:mode><GPI:mode>2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
	(fix:GPI (match_operand:GPF 1 "register_operand" "w")))]
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 65a2849155c9b331dc6179853501f0a6207d1773..a8b782b887ee914bd2399807d2ccfdf4a8e6433b 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -38,6 +38,9 @@
 ;; Iterator for General Purpose Floating-point registers (32- and 64-bit modes)
 (define_mode_iterator GPF [SF DF])

+;; Iterator for General Purpose Float regs, inc float16_t.
+(define_mode_iterator GPF_F16 [HF SF DF])
+
 ;; Integer vector modes.
 (define_mode_iterator VDQ_I [V8QI V16QI V4HI V8HI V2SI V4SI V2DI])

diff --git a/gcc/testsuite/gcc.target/aarch64/f16_convs_1.c b/gcc/testsuite/gcc.target/aarch64/f16_convs_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..d4e7c02db5e99068c9ddba1b5635e8904bf19e2d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/f16_convs_1.c
@@ -0,0 +1,39 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-inline" } */
+
+#include <arm_neon.h>
+
+extern void abort (void);
+
+#define EPSILON 0.0001
+
+__fp16
+convert_f32_to_f16 (float in)
+{
+  return in;
+}
+
+float
+convert_f16_to_f32 (__fp16 in)
+{
+  return in;
+}
+
+int
+main (int argc, char **argv)
+{
+  __fp16 in1 = convert_f32_to_f16 (3.14159f);
+  __fp16 in2 = convert_f32_to_f16 (2.718f);
+
+  /* Do the addition on __fp16's (implicitly converts both operands to
+     float32, adds, converts back to f16, then we convert back to f32).  */
+  float32_t result1 = convert_f16_to_f32 (in1 + in2);
+
+  /* Do the addition on float32's (we convert both operands to f32, and add,
+     as above, but skip the final conversion f32 -> f16 -> f32).  */
+  float32_t result2 = convert_f16_to_f32 (in1) + convert_f16_to_f32 (in2);
+
+  if (__builtin_fabs (result2 - result1) > EPSILON)
+    abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/f16_convs_2.c b/gcc/testsuite/gcc.target/aarch64/f16_convs_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..3421daef13ff1992775e8c4299623be9779ac45c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/f16_convs_2.c
@@ -0,0 +1,39 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-inline" } */
+
+#include <arm_neon.h>
+
+extern void abort (void);
+
+#define EPSILON 0.0001
+
+__fp16
+convert_to_f16 (int in)
+{
+  return in;
+}
+
+int
+convert_from_f16 (__fp16 in)
+{
+  return in;
+}
+
+int
+main (int argc, char **argv)
+{
+  __fp16 in1 = convert_to_f16 (3);
+  __fp16 in2 = convert_to_f16 (2);
+
+  /* Do the addition on __fp16's (implicitly converts both operands to
+     float32, adds, converts back to f16, then we convert to int).  */
+  int result1 = convert_from_f16 (in1 + in2);
+
+  /* Do the addition on int's (we convert both operands directly to int, add,
+     and we're done).  */
+  int result2 = convert_from_f16 (in1) + convert_from_f16 (in2);
+
+  if (__builtin_abs (result2 - result1) > EPSILON)
+    abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/f16_movs_1.c b/gcc/testsuite/gcc.target/aarch64/f16_movs_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..6cb80866790c5c40a59d22f2bbbfce41ae5f07d0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/f16_movs_1.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+/* { dg-options "-fno-inline -O2" } */
+
+#include <arm_neon.h>
+
+__fp16
+func2 (__fp16 a, __fp16 b)
+{
+  return b;
+}
+
+int
+main (int argc, char **argv)
+{
+  __fp16 array[16];
+  int i;
+
+  for (i = 0; i < sizeof (array) / sizeof (array[0]); i++)
+    array[i] = i;
+
+  array[0] = func2 (array[1], array[2]);
+
+  __builtin_printf ("%f\n", array[0]); /* { dg-output "2.0" } */
+
+  return 0;
+}