From: Alan Lawrence
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 5/14][AArch64] Add basic fp16 support
Date: Wed, 22 Apr 2015 17:04:00 -0000
Message-ID: <5537D47F.6020103@arm.com>

This adds basic support for moving __fp16 values around: passing and
returning them, and operating on them by promoting to 32-bit floats.  It
also adds a few scalar testcases.

Note I've not added an fmov (immediate) variant, because there is no
'fmov h<n>, ...' - the only way to load a 16-bit immediate is to
reinterpret the bit pattern into some other type.  Vector MOVs are turned
off for the same reason.  If this turns out to be practical, it can follow
in a separate patch.

My reading of ACLE suggests the type name to use is __fp16, rather than
__builtin_aarch64_simd_hf.  I can use the latter if that's preferable?

int<->f16 conversions are a little odd; the assembly generated is as
follows (a standalone C sketch of these two routines appears after the
cover letter):

int_to_f16:
        scvtf   d0, w0
        fcvt    h0, d0
        ret
int_from_f16:
        fcvt    s0, h0
        fcvtzs  w0, s0
        ret

The spec is silent on the presence or absence of intermediate rounding
steps; however, I don't think this matters: float32_t offers so many more
bits than __fp16 that any integer which fits into the range of an __fp16
(i.e. is not infinite) can be expressed exactly as a float32_t without any
loss of precision.  So I think the above are OK.  (If they can be
optimized, that can follow in a later patch.)

Note that unlike ARM, where we support both IEEE and Alternative formats
(and, somewhat awkwardly, format-agnostic code too), here we are settling
on IEEE format always.  Technically, we should output an EABI attribute
saying which format we are using; however, aarch64 asm does not support
the .eabi_attribute directive yet, so it seems reasonable to leave this
for now, while there is only one possible format.

Bootstrapped + check-gcc on aarch64-none-linux-gnu.
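For illustration only, and not part of the patch: a minimal standalone C
sketch of the two conversion routines whose assembly is quoted above.  The
function names simply mirror the assembly labels; it assumes a compiler
that already accepts __fp16 (e.g. one with this patch applied), and the
exact code generated will of course depend on options.

/* Minimal sketch, assuming __fp16 support as added by this patch.
   int_to_f16 should compile to roughly scvtf + fcvt, and int_from_f16
   to fcvt + fcvtzs, as in the assembly quoted above.  */

__fp16
int_to_f16 (int x)
{
  return x;
}

int
int_from_f16 (__fp16 x)
{
  return x;
}

int
main (void)
{
  /* 42 is exactly representable in both __fp16 and float, so the
     round trip is lossless and this prints 42.  */
  __fp16 h = int_to_f16 (42);
  __builtin_printf ("%d\n", int_from_f16 (h));
  return 0;
}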
gcc/ChangeLog:

	* config/aarch64/aarch64-builtins.c (aarch64_fp16_type_node): New.
	(aarch64_init_builtins): Make aarch64_fp16_type_node, use for __fp16.
	* config/aarch64/aarch64-modes.def: Add HFmode.
	* config/aarch64/aarch64.h (TARGET_CPU_CPP_BUILTINS): Define
	__ARM_FP16_FORMAT_IEEE and __ARM_FP16_ARGS.  Set bit 1 of __ARM_FP.
	* config/aarch64/aarch64.c (aarch64_init_libfuncs,
	aarch64_promoted_type): New.
	(aarch64_float_const_representable_p): Disable HFmode.
	(aarch64_mangle_type): Mangle half-precision floats to "Dh".
	(TARGET_PROMOTED_TYPE): Define to aarch64_promoted_type.
	(TARGET_INIT_LIBFUNCS): Define to aarch64_init_libfuncs.
	* config/aarch64/aarch64.md (mov<mode>): Include HFmode using GPF_F16.
	(movhf_aarch64, extendhfsf2, extendhfdf2, truncsfhf2, truncdfhf2): New.
	* config/aarch64/iterators.md (GPF_F16): New.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/f16_convs_1.c: New test.
	* gcc.target/aarch64/f16_convs_2.c: New test.
	* gcc.target/aarch64/f16_movs_1.c: New test.

(attachment: 05_aarch64_basic_fp16.patch)

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 87f1ac2ec1e3c774782c567b20c673802ae90d99..5a7b112bd1fe77826bfb84383c86dceb6b1521e3 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -453,6 +453,9 @@ static struct aarch64_simd_type_info aarch64_simd_types[] = {
 };
 #undef ENTRY

+/* This type is not SIMD-specific; it is the user-visible __fp16.  */
+static tree aarch64_fp16_type_node = NULL_TREE;
+
 static tree aarch64_simd_intOI_type_node = NULL_TREE;
 static tree aarch64_simd_intEI_type_node = NULL_TREE;
 static tree aarch64_simd_intCI_type_node = NULL_TREE;
@@ -862,6 +865,12 @@ aarch64_init_builtins (void)
     = add_builtin_function ("__builtin_aarch64_set_fpsr", ftype_set_fpr,
			     AARCH64_BUILTIN_SET_FPSR, BUILT_IN_MD, NULL, NULL_TREE);

+  aarch64_fp16_type_node = make_node (REAL_TYPE);
+  TYPE_PRECISION (aarch64_fp16_type_node) = 16;
+  layout_type (aarch64_fp16_type_node);
+
+  (*lang_hooks.types.register_builtin_type) (aarch64_fp16_type_node, "__fp16");
+
   if (TARGET_SIMD)
     aarch64_init_simd_builtins ();
   if (TARGET_CRC32)
diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def
index b17b90d90601ae0a631a78560da743720c4638ce..c30059b632fa8cb7fd9071917d3f581f0966a86d 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -36,6 +36,10 @@ CC_MODE (CC_DLTU);
 CC_MODE (CC_DGEU);
 CC_MODE (CC_DGTU);

+/* Half-precision floating point for arm_neon.h float16_t.  */
+FLOAT_MODE (HF, 2, 0);
+ADJUST_FLOAT_FORMAT (HF, &ieee_half_format);
+
 /* Vector modes.  */
 VECTOR_MODES (INT, 8);        /*       V8QI V4HI V2SI.  */
 VECTOR_MODES (INT, 16);       /* V16QI V8HI V4SI V2DI.  */
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index bf59e40a64459f6daddef47a5f5214adfd92d9b6..67c37ebc0e06d22e524322e5a82b6bcde550bd93 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -57,7 +57,9 @@
       if (TARGET_FLOAT)                                         \
         {                                                       \
           builtin_define ("__ARM_FEATURE_FMA");                 \
-          builtin_define_with_int_value ("__ARM_FP", 0x0C);     \
+          builtin_define_with_int_value ("__ARM_FP", 0x0E);     \
+          builtin_define ("__ARM_FP16_FORMAT_IEEE");            \
+          builtin_define ("__ARM_FP16_ARGS");                   \
         }                                                       \
       if (TARGET_SIMD)                                          \
         {                                                       \
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b923fdb08a8e653570e51cf516dc551955961704..44956cf0276ed7b1369d1816f472bad61ac421b1 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8058,6 +8058,10 @@ aarch64_mangle_type (const_tree type)
   if (lang_hooks.types_compatible_p (CONST_CAST_TREE (type), va_list_type))
     return "St9__va_list";

+  /* Half-precision float.  */
+  if (TREE_CODE (type) == REAL_TYPE && TYPE_PRECISION (type) == 16)
+    return "Dh";
+
   /* Mangle AArch64-specific internal types.  TYPE_NAME is non-NULL_TREE for
      builtin types.  */
   if (TYPE_NAME (type) != NULL)
@@ -9251,6 +9255,33 @@ aarch64_start_file (void)
   default_file_start();
 }

+static void
+aarch64_init_libfuncs (void)
+{
+  /* Half-precision float operations.  The compiler handles all operations
+     with NULL libfuncs by converting to SFmode.  */
+
+  /* Conversions.  */
+  set_conv_libfunc (trunc_optab, HFmode, SFmode, "__gnu_f2h_ieee");
+  set_conv_libfunc (sext_optab, SFmode, HFmode, "__gnu_h2f_ieee");
+
+  /* Arithmetic.  */
+  set_optab_libfunc (add_optab, HFmode, NULL);
+  set_optab_libfunc (sdiv_optab, HFmode, NULL);
+  set_optab_libfunc (smul_optab, HFmode, NULL);
+  set_optab_libfunc (neg_optab, HFmode, NULL);
+  set_optab_libfunc (sub_optab, HFmode, NULL);
+
+  /* Comparisons.  */
+  set_optab_libfunc (eq_optab, HFmode, NULL);
+  set_optab_libfunc (ne_optab, HFmode, NULL);
+  set_optab_libfunc (lt_optab, HFmode, NULL);
+  set_optab_libfunc (le_optab, HFmode, NULL);
+  set_optab_libfunc (ge_optab, HFmode, NULL);
+  set_optab_libfunc (gt_optab, HFmode, NULL);
+  set_optab_libfunc (unord_optab, HFmode, NULL);
+}
+
 /* Target hook for c_mode_for_suffix.  */
 static machine_mode
 aarch64_c_mode_for_suffix (char suffix)
@@ -9289,7 +9320,8 @@ aarch64_float_const_representable_p (rtx x)
   if (!CONST_DOUBLE_P (x))
     return false;

-  if (GET_MODE (x) == VOIDmode)
+  /* We don't support HFmode constants yet.  */
+  if (GET_MODE (x) == VOIDmode || GET_MODE (x) == HFmode)
     return false;

   REAL_VALUE_FROM_CONST_DOUBLE (r, x);
@@ -11230,6 +11262,14 @@ aarch64_gen_adjusted_ldpstp (rtx *operands, bool load,
   return true;
 }

+/* Implement TARGET_PROMOTED_TYPE to promote float16 to 32 bits.  */
+static tree
+aarch64_promoted_type (const_tree t)
+{
+  if (SCALAR_FLOAT_TYPE_P (t) && TYPE_PRECISION (t) == 16)
+    return float_type_node;
+  return NULL_TREE;
+}
 #undef TARGET_ADDRESS_COST
 #define TARGET_ADDRESS_COST aarch64_address_cost

@@ -11384,6 +11424,9 @@ aarch64_gen_adjusted_ldpstp (rtx *operands, bool load,
 #undef TARGET_SCHED_REASSOCIATION_WIDTH
 #define TARGET_SCHED_REASSOCIATION_WIDTH aarch64_reassociation_width

+#undef TARGET_PROMOTED_TYPE
+#define TARGET_PROMOTED_TYPE aarch64_promoted_type
+
 #undef TARGET_SECONDARY_RELOAD
 #define TARGET_SECONDARY_RELOAD aarch64_secondary_reload

@@ -11476,6 +11519,8 @@ aarch64_gen_adjusted_ldpstp (rtx *operands, bool load,
 #define TARGET_VECTORIZE_VEC_PERM_CONST_OK \
   aarch64_vectorize_vec_perm_const_ok

+#undef TARGET_INIT_LIBFUNCS
+#define TARGET_INIT_LIBFUNCS aarch64_init_libfuncs

 #undef TARGET_FIXED_CONDITION_CODE_REGS
 #define TARGET_FIXED_CONDITION_CODE_REGS aarch64_fixed_condition_code_regs
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 1f4169ee76e7f3f321e5ed7a4d0f08b54ee3bf17..0851f6949adb69bf23221e811230fae08749887c 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -976,8 +976,8 @@
 })

 (define_expand "mov<mode>"
-  [(set (match_operand:GPF 0 "nonimmediate_operand" "")
-	(match_operand:GPF 1 "general_operand" ""))]
+  [(set (match_operand:GPF_F16 0 "nonimmediate_operand" "")
+	(match_operand:GPF_F16 1 "general_operand" ""))]
   ""
   "
   if (!TARGET_FLOAT)
@@ -991,6 +991,26 @@
   "
 )

+(define_insn "*movhf_aarch64"
+  [(set (match_operand:HF 0 "nonimmediate_operand" "=w, ?r,w,w,m,r,m ,r")
+	(match_operand:HF 1 "general_operand"      "?rY, w,w,m,w,m,rY,r"))]
+  "TARGET_FLOAT && (register_operand (operands[0], HFmode)
+    || register_operand (operands[1], HFmode))"
+  "@
+   mov\\t%0.h[0], %w1
+   umov\\t%w0, %1.h[0]
+   mov\\t%0.h[0], %1.h[0]
+   ldr\\t%h0, %1
+   str\\t%h1, %0
+   ldrh\\t%w0, %1
+   strh\\t%w1, %0
+   mov\\t%w0, %w1"
+  [(set_attr "type" "neon_from_gp,neon_to_gp,fmov,\
+                     f_loads,f_stores,load1,store1,mov_reg")
+   (set_attr "simd" "yes,yes,yes,*,*,*,*,*")
+   (set_attr "fp"   "*,*,*,yes,yes,*,*,*")]
+)
+
 (define_insn "*movsf_aarch64"
   [(set (match_operand:SF 0 "nonimmediate_operand" "=w, ?r,w,w ,w,m,r,m ,r")
	(match_operand:SF 1 "general_operand"      "?rY, w,w,Ufc,m,w,m,rY,r"))]
@@ -3882,6 +3902,22 @@
   [(set_attr "type" "f_cvt")]
 )

+(define_insn "extendhfsf2"
+  [(set (match_operand:SF 0 "register_operand" "=w")
+	(float_extend:SF (match_operand:HF 1 "register_operand" "w")))]
+  "TARGET_FLOAT"
+  "fcvt\\t%s0, %h1"
+  [(set_attr "type" "f_cvt")]
+)
+
+(define_insn "extendhfdf2"
+  [(set (match_operand:DF 0 "register_operand" "=w")
+	(float_extend:DF (match_operand:HF 1 "register_operand" "w")))]
+  "TARGET_FLOAT"
+  "fcvt\\t%d0, %h1"
+  [(set_attr "type" "f_cvt")]
+)
+
 (define_insn "truncdfsf2"
   [(set (match_operand:SF 0 "register_operand" "=w")
	(float_truncate:SF (match_operand:DF 1 "register_operand" "w")))]
@@ -3890,6 +3926,22 @@
   [(set_attr "type" "f_cvt")]
 )

+(define_insn "truncsfhf2"
+  [(set (match_operand:HF 0 "register_operand" "=w")
+	(float_truncate:HF (match_operand:SF 1 "register_operand" "w")))]
+  "TARGET_FLOAT"
+  "fcvt\\t%h0, %s1"
+  [(set_attr "type" "f_cvt")]
+)
+
+(define_insn "truncdfhf2"
+  [(set (match_operand:HF 0 "register_operand" "=w")
+	(float_truncate:HF (match_operand:DF 1 "register_operand" "w")))]
+  "TARGET_FLOAT"
+  "fcvt\\t%h0, %d1"
+  [(set_attr "type" "f_cvt")]
+)
+
 (define_insn "fix_trunc<GPF:mode><GPI:mode>2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
	(fix:GPI (match_operand:GPF 1 "register_operand" "w")))]
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 65a2849155c9b331dc6179853501f0a6207d1773..a8b782b887ee914bd2399807d2ccfdf4a8e6433b 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -38,6 +38,9 @@
 ;; Iterator for General Purpose Floating-point registers (32- and 64-bit modes)
 (define_mode_iterator GPF [SF DF])

+;; Iterator for General Purpose Float regs, inc float16_t.
+(define_mode_iterator GPF_F16 [HF SF DF])
+
 ;; Integer vector modes.
 (define_mode_iterator VDQ_I [V8QI V16QI V4HI V8HI V2SI V4SI V2DI])

diff --git a/gcc/testsuite/gcc.target/aarch64/f16_convs_1.c b/gcc/testsuite/gcc.target/aarch64/f16_convs_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..d4e7c02db5e99068c9ddba1b5635e8904bf19e2d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/f16_convs_1.c
@@ -0,0 +1,39 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-inline" } */
+
+#include <arm_neon.h>
+
+extern void abort (void);
+
+#define EPSILON 0.0001
+
+__fp16
+convert_f32_to_f16 (float in)
+{
+  return in;
+}
+
+float
+convert_f16_to_f32 (__fp16 in)
+{
+  return in;
+}
+
+int
+main (int argc, char **argv)
+{
+  __fp16 in1 = convert_f32_to_f16 (3.14159f);
+  __fp16 in2 = convert_f32_to_f16 (2.718f);
+
+  /* Do the addition on __fp16's (implicitly converts both operands to
+     float32, adds, converts back to f16, then we convert back to f32).  */
+  float32_t result1 = convert_f16_to_f32 (in1 + in2);
+
+  /* Do the addition on float32's (we convert both operands to f32, and add,
+     as above, but skip the final conversion f32 -> f16 -> f32).  */
+  float32_t result2 = convert_f16_to_f32 (in1) + convert_f16_to_f32 (in2);
+
+  if (__builtin_fabs (result2 - result1) > EPSILON)
+    abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/f16_convs_2.c b/gcc/testsuite/gcc.target/aarch64/f16_convs_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..3421daef13ff1992775e8c4299623be9779ac45c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/f16_convs_2.c
@@ -0,0 +1,39 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-inline" } */
+
+#include <arm_neon.h>
+
+extern void abort (void);
+
+#define EPSILON 0.0001
+
+__fp16
+convert_to_f16 (int in)
+{
+  return in;
+}
+
+int
+convert_from_f16 (__fp16 in)
+{
+  return in;
+}
+
+int
+main (int argc, char **argv)
+{
+  __fp16 in1 = convert_to_f16 (3);
+  __fp16 in2 = convert_to_f16 (2);
+
+  /* Do the addition on __fp16's (implicitly converts both operands to
+     float32, adds, converts back to f16, then we convert to int).  */
+  int result1 = convert_from_f16 (in1 + in2);
+
+  /* Do the addition on int's (we convert both operands directly to int, add,
+     and we're done).  */
+  int result2 = convert_from_f16 (in1) + convert_from_f16 (in2);
+
+  if (__builtin_abs (result2 - result1) > EPSILON)
+    abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/f16_movs_1.c b/gcc/testsuite/gcc.target/aarch64/f16_movs_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..6cb80866790c5c40a59d22f2bbbfce41ae5f07d0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/f16_movs_1.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+/* { dg-options "-fno-inline -O2" } */
+
+#include <arm_neon.h>
+
+__fp16
+func2 (__fp16 a, __fp16 b)
+{
+  return b;
+}
+
+int
+main (int argc, char **argv)
+{
+  __fp16 array[16];
+  int i;
+
+  for (i = 0; i < sizeof (array) / sizeof (array[0]); i++)
+    array[i] = i;
+
+  array[0] = func2 (array[1], array[2]);
+
+  __builtin_printf ("%f\n", array[0]); /* { dg-output "2.0" } */
+
+  return 0;
+}