From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 17608 invoked by alias); 4 Aug 2015 11:01:45 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 17591 invoked by uid 89); 4 Aug 2015 11:01:44 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.3 required=5.0 tests=AWL,BAYES_50,SPF_PASS autolearn=ham version=3.3.2 X-HELO: eu-smtp-delivery-143.mimecast.com Received: from eu-smtp-delivery-143.mimecast.com (HELO eu-smtp-delivery-143.mimecast.com) (146.101.78.143) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 04 Aug 2015 11:01:41 +0000 Received: from cam-owa2.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.140]) by eu-smtp-1.mimecast.com with ESMTP id uk-mta-15-VluvNU0vRoucU09TGIk95Q-1; Tue, 04 Aug 2015 12:01:35 +0100 Received: from [10.2.207.65] ([10.1.2.79]) by cam-owa2.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959); Tue, 4 Aug 2015 12:01:35 +0100 Message-ID: <55C09B8F.6020700@arm.com> Date: Tue, 04 Aug 2015 11:01:00 -0000 From: Alan Lawrence User-Agent: Thunderbird 2.0.0.24 (X11/20101213) MIME-Version: 1.0 To: James Greenhalgh CC: "gcc-patches@gcc.gnu.org" Subject: Re: [PATCH 8/15][AArch64] Add support for float16x{4,8}_t vectors/builtins References: <55B765DF.4040706@arm.com> <55B766B4.2030305@arm.com> <20150729102409.GC5656@arm.com> In-Reply-To: <20150729102409.GC5656@arm.com> X-MC-Unique: VluvNU0vRoucU09TGIk95Q-1 Content-Type: multipart/mixed; boundary="------------000109010707020108060300" X-IsSubscribed: yes X-SW-Source: 2015-08/txt/msg00158.txt.bz2 This is a multi-part message in MIME format. --------------000109010707020108060300 Content-Type: text/plain; charset=WINDOWS-1252; format=flowed Content-Transfer-Encoding: quoted-printable Content-length: 2769 James Greenhalgh wrote: >> -;; All modes. >> +;; All vector modes on which we support any arithmetic operations. >> (define_mode_iterator VALL [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4= SF V2DF]) >>=20=20 >> -;; All vector modes and DI. >> +;; All vector modes, including HF modes on which we cannot operate >=20 > The wording here is a bit off, we can operate on them - for a limited set > of operations (and you are missing a full stop). How > about something like: >=20 > All vector modes suitable for moving, loading and storing. >=20 >> +(define_mode_iterator VALL_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI >> + V4HF V8HF V2SF V4SF V2DF]) >> + >> +;; All vector modes barring F16, plus DI. >=20 > "barring HF modes" for consistency with the above comment. >=20 >> (define_mode_iterator VALLDI [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF = V4SF V2DF DI]) >>=20=20 >> +;; All vector modes and DI. >> +(define_mode_iterator VALLDI_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI >> + V4HF V8HF V2SF V4SF V2DF DI]) >> + >> ;; All vector modes and DI and DF. >=20 > Except HF modes. Here's a new version, updating the comments much as you suggest, dropping t= he=20 unrelated testsuite changes (already pushed), and adding VRL2/3/4 iterator= =20 values only for V4HF. Bootstrapped + check-gcc on aarch64-none-linux-gnu. gcc/ChangeLog: * config/aarch64/aarch64.c (aarch64_vector_mode_supported_p): Support V4HFmode and V8HFmode. (aarch64_split_simd_move): Add case for V8HFmode. * config/aarch64/aarch64-builtins.c (v4hf_UP, v8hf_UP): Define. (aarch64_simd_builtin_std_type): Handle HFmode. (aarch64_init_simd_builtin_types): Include Float16x4_t and Float16x8_t. * config/aarch64/aarch64-simd.md (mov, aarch64_get_lane, aarch64_ld1, aarch64_st1, aarch64_be_st1): Use VALLDI_F16 iterator. * config/aarch64/aarch64-simd-builtin-types.def: Add Float16x4_t, Float16x8_t. * config/aarch64/aarch64-simd-builtins.def (ld1, st1): Use VALL_F16. * config/aarch64/arm_neon.h (float16x4_t, float16x8_t, float16_t): New typedefs. (vget_lane_f16, vgetq_lane_f16, vset_lane_f16, vsetq_lane_f16, vld1_f16, vld1q_f16, vst1_f16, vst1q_f16, vst1_lane_f16, vst1q_lane_f16): New. * config/aarch64/iterators.md (VD, VQ, VQ_NO2E): Add vectors of HFmode. (VALLDI_F16, VALL_F16): New. (Vmtype, VEL, VCONQ, VHALF, V_TWO_ELEM, V_THREE_ELEM, V_FOUR_ELEM, q): Add cases for V4HF and V8HF. (VDBL, VRL2, VRL3, VRL4): Add V4HF case. gcc/testsuite/ChangeLog: * g++.dg/abi/mangle-neon-aarch64.C: Add cases for float16x4_t and float16x8_t. * gcc.target/aarch64/vset_lane_1.c: Likewise. * gcc.target/aarch64/vld1-vst1_1.c: Likewise. * gcc.target/aarch64/vld1_lane.c: Likewise. --------------000109010707020108060300 Content-Type: text/x-patch; name=08_aarch64_float16_vectors.patch Content-Transfer-Encoding: quoted-printable Content-Disposition: inline; filename="08_aarch64_float16_vectors.patch" Content-length: 26206 commit 49cb53a94a44fcda845c3f6ef11e88f9be458aad Author: Alan Lawrence Date: Tue Dec 2 13:08:15 2014 +0000 AArch64 2/N: Vector/__builtin basics: define+support types, movs, test = ABI. =20=20=20=20 Patterns, builtins, intrinsics for {ld1,st1}{,_lane},v{g,s}et_lane. Tes= ts: vld1-vst1_1, vset_lane_1, vld1_lane.c diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aar= ch64-builtins.c index cfb2dc1..a6c3377 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -66,6 +66,7 @@ =20 #define v8qi_UP V8QImode #define v4hi_UP V4HImode +#define v4hf_UP V4HFmode #define v2si_UP V2SImode #define v2sf_UP V2SFmode #define v1df_UP V1DFmode @@ -73,6 +74,7 @@ #define df_UP DFmode #define v16qi_UP V16QImode #define v8hi_UP V8HImode +#define v8hf_UP V8HFmode #define v4si_UP V4SImode #define v4sf_UP V4SFmode #define v2di_UP V2DImode @@ -523,6 +525,8 @@ aarch64_simd_builtin_std_type (enum machine_mode mode, return aarch64_simd_intCI_type_node; case XImode: return aarch64_simd_intXI_type_node; + case HFmode: + return aarch64_fp16_type_node; case SFmode: return float_type_node; case DFmode: @@ -607,6 +611,8 @@ aarch64_init_simd_builtin_types (void) aarch64_simd_types[Poly64x2_t].eltype =3D aarch64_simd_types[Poly64_t].i= type; =20 /* Continue with standard types. */ + aarch64_simd_types[Float16x4_t].eltype =3D aarch64_fp16_type_node; + aarch64_simd_types[Float16x8_t].eltype =3D aarch64_fp16_type_node; aarch64_simd_types[Float32x2_t].eltype =3D float_type_node; aarch64_simd_types[Float32x4_t].eltype =3D float_type_node; aarch64_simd_types[Float64x1_t].eltype =3D double_type_node; diff --git a/gcc/config/aarch64/aarch64-simd-builtin-types.def b/gcc/config= /aarch64/aarch64-simd-builtin-types.def index bb54e56..ea219b7 100644 --- a/gcc/config/aarch64/aarch64-simd-builtin-types.def +++ b/gcc/config/aarch64/aarch64-simd-builtin-types.def @@ -44,6 +44,8 @@ ENTRY (Poly16x8_t, V8HI, poly, 12) ENTRY (Poly64x1_t, DI, poly, 12) ENTRY (Poly64x2_t, V2DI, poly, 12) + ENTRY (Float16x4_t, V4HF, none, 13) + ENTRY (Float16x8_t, V8HF, none, 13) ENTRY (Float32x2_t, V2SF, none, 13) ENTRY (Float32x4_t, V4SF, none, 13) ENTRY (Float64x1_t, V1DF, none, 13) diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarc= h64/aarch64-simd-builtins.def index dd2bc47..4dd2bc7 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -367,11 +367,11 @@ VAR1 (UNOP, float_extend_lo_, 0, v2df) VAR1 (UNOP, float_truncate_lo_, 0, v2sf) =20 - /* Implemented by aarch64_ld1. */ - BUILTIN_VALL (LOAD1, ld1, 0) + /* Implemented by aarch64_ld1. */ + BUILTIN_VALL_F16 (LOAD1, ld1, 0) =20 - /* Implemented by aarch64_st1. */ - BUILTIN_VALL (STORE1, st1, 0) + /* Implemented by aarch64_st1. */ + BUILTIN_VALL_F16 (STORE1, st1, 0) =20 /* Implemented by fma4. */ BUILTIN_VDQF (TERNOP, fma, 4) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch6= 4-simd.md index b90f938..5cc45ed 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -19,8 +19,8 @@ ;; . =20 (define_expand "mov" - [(set (match_operand:VALL 0 "nonimmediate_operand" "") - (match_operand:VALL 1 "general_operand" ""))] + [(set (match_operand:VALL_F16 0 "nonimmediate_operand" "") + (match_operand:VALL_F16 1 "general_operand" ""))] "TARGET_SIMD" " if (GET_CODE (operands[0]) =3D=3D MEM) @@ -2450,7 +2450,7 @@ (define_insn "aarch64_get_lane" [(set (match_operand: 0 "aarch64_simd_nonimmediate_operand" "=3Dr, = w, Utv") (vec_select: - (match_operand:VALL 1 "register_operand" "w, w, w") + (match_operand:VALL_F16 1 "register_operand" "w, w, w") (parallel [(match_operand:SI 2 "immediate_operand" "i, i, i")])))] "TARGET_SIMD" { @@ -4234,8 +4234,9 @@ ) =20 (define_insn "aarch64_be_ld1" - [(set (match_operand:VALLDI 0 "register_operand" "=3Dw") - (unspec:VALLDI [(match_operand:VALLDI 1 "aarch64_simd_struct_operand" "Ut= v")] + [(set (match_operand:VALLDI_F16 0 "register_operand" "=3Dw") + (unspec:VALLDI_F16 [(match_operand:VALLDI_F16 1 + "aarch64_simd_struct_operand" "Utv")] UNSPEC_LD1))] "TARGET_SIMD" "ld1\\t{%0}, %1" @@ -4243,8 +4244,8 @@ ) =20 (define_insn "aarch64_be_st1" - [(set (match_operand:VALLDI 0 "aarch64_simd_struct_operand" "=3DUtv") - (unspec:VALLDI [(match_operand:VALLDI 1 "register_operand" "w")] + [(set (match_operand:VALLDI_F16 0 "aarch64_simd_struct_operand" "=3DUtv") + (unspec:VALLDI_F16 [(match_operand:VALLDI_F16 1 "register_operand" "w")] UNSPEC_ST1))] "TARGET_SIMD" "st1\\t{%1}, %0" @@ -4533,16 +4534,16 @@ DONE; }) =20 -(define_expand "aarch64_ld1" - [(match_operand:VALL 0 "register_operand") +(define_expand "aarch64_ld1" + [(match_operand:VALL_F16 0 "register_operand") (match_operand:DI 1 "register_operand")] "TARGET_SIMD" { - machine_mode mode =3D mode; + machine_mode mode =3D mode; rtx mem =3D gen_rtx_MEM (mode, operands[1]); =20 if (BYTES_BIG_ENDIAN) - emit_insn (gen_aarch64_be_ld1 (operands[0], mem)); + emit_insn (gen_aarch64_be_ld1 (operands[0], mem)); else emit_move_insn (operands[0], mem); DONE; @@ -4901,16 +4902,16 @@ DONE; }) =20 -(define_expand "aarch64_st1" +(define_expand "aarch64_st1" [(match_operand:DI 0 "register_operand") - (match_operand:VALL 1 "register_operand")] + (match_operand:VALL_F16 1 "register_operand")] "TARGET_SIMD" { - machine_mode mode =3D mode; + machine_mode mode =3D mode; rtx mem =3D gen_rtx_MEM (mode, operands[0]); =20 if (BYTES_BIG_ENDIAN) - emit_insn (gen_aarch64_be_st1 (mem, operands[1])); + emit_insn (gen_aarch64_be_st1 (mem, operands[1])); else emit_move_insn (mem, operands[1]); DONE; diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index f338033..ccf063a 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -1111,6 +1111,9 @@ aarch64_split_simd_move (rtx dst, rtx src) case V2DImode: gen =3D gen_aarch64_split_simd_movv2di; break; + case V8HFmode: + gen =3D gen_aarch64_split_simd_movv8hf; + break; case V4SFmode: gen =3D gen_aarch64_split_simd_movv4sf; break; @@ -8264,6 +8267,7 @@ aarch64_vector_mode_supported_p (machine_mode mode) || mode =3D=3D V2SImode || mode =3D=3D V4HImode || mode =3D=3D V8QImode || mode =3D=3D V2SFmode || mode =3D=3D V4SFmode || mode =3D=3D V2DFmode + || mode =3D=3D V4HFmode || mode =3D=3D V8HFmode || mode =3D=3D V1DFmode)) return true; =20 diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 44fe4f9..1d09930 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -923,7 +923,8 @@ extern enum aarch64_code_model aarch64_cmodel; /* Modes valid for AdvSIMD Q registers. */ #define AARCH64_VALID_SIMD_QREG_MODE(MODE) \ ((MODE) =3D=3D V4SImode || (MODE) =3D=3D V8HImode || (MODE) =3D=3D V16QI= mode \ - || (MODE) =3D=3D V4SFmode || (MODE) =3D=3D V2DImode || mode =3D=3D V2DF= mode) + || (MODE) =3D=3D V4SFmode || (MODE) =3D=3D V8HFmode || (MODE) =3D=3D V2= DImode \ + || (MODE) =3D=3D V2DFmode) =20 #define ENDIAN_LANE_N(mode, n) \ (BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (mode) - 1 - n : n) diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 114994e..7425485 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -40,6 +40,7 @@ typedef __Int8x8_t int8x8_t; typedef __Int16x4_t int16x4_t; typedef __Int32x2_t int32x2_t; typedef __Int64x1_t int64x1_t; +typedef __Float16x4_t float16x4_t; typedef __Float32x2_t float32x2_t; typedef __Poly8x8_t poly8x8_t; typedef __Poly16x4_t poly16x4_t; @@ -52,6 +53,7 @@ typedef __Int8x16_t int8x16_t; typedef __Int16x8_t int16x8_t; typedef __Int32x4_t int32x4_t; typedef __Int64x2_t int64x2_t; +typedef __Float16x8_t float16x8_t; typedef __Float32x4_t float32x4_t; typedef __Float64x2_t float64x2_t; typedef __Poly8x16_t poly8x16_t; @@ -67,6 +69,7 @@ typedef __Poly16_t poly16_t; typedef __Poly64_t poly64_t; typedef __Poly128_t poly128_t; =20 +typedef __fp16 float16_t; typedef float float32_t; typedef double float64_t; =20 @@ -2691,6 +2694,12 @@ vcreate_p16 (uint64_t __a) =20 /* vget_lane */ =20 +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vget_lane_f16 (float16x4_t __a, const int __b) +{ + return __aarch64_vget_lane_any (__a, __b); +} + __extension__ static __inline float32_t __attribute__ ((__always_inline__)) vget_lane_f32 (float32x2_t __a, const int __b) { @@ -2765,6 +2774,12 @@ vget_lane_u64 (uint64x1_t __a, const int __b) =20 /* vgetq_lane */ =20 +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vgetq_lane_f16 (float16x8_t __a, const int __b) +{ + return __aarch64_vget_lane_any (__a, __b); +} + __extension__ static __inline float32_t __attribute__ ((__always_inline__)) vgetq_lane_f32 (float32x4_t __a, const int __b) { @@ -4425,6 +4440,12 @@ vreinterpretq_u32_p16 (poly16x8_t __a) =20 /* vset_lane */ =20 +__extension__ static __inline float16x4_t __attribute__ ((__always_inline_= _)) +vset_lane_f16 (float16_t __elem, float16x4_t __vec, const int __index) +{ + return __aarch64_vset_lane_any (__elem, __vec, __index); +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline_= _)) vset_lane_f32 (float32_t __elem, float32x2_t __vec, const int __index) { @@ -4499,6 +4520,12 @@ vset_lane_u64 (uint64_t __elem, uint64x1_t __vec, co= nst int __index) =20 /* vsetq_lane */ =20 +__extension__ static __inline float16x8_t __attribute__ ((__always_inline_= _)) +vsetq_lane_f16 (float16_t __elem, float16x8_t __vec, const int __index) +{ + return __aarch64_vset_lane_any (__elem, __vec, __index); +} + __extension__ static __inline float32x4_t __attribute__ ((__always_inline_= _)) vsetq_lane_f32 (float32_t __elem, float32x4_t __vec, const int __index) { @@ -14612,6 +14639,12 @@ vfmsq_laneq_f64 (float64x2_t __a, float64x2_t __b, =20 /* vld1 */ =20 +__extension__ static __inline float16x4_t __attribute__ ((__always_inline_= _)) +vld1_f16 (const float16_t *__a) +{ + return __builtin_aarch64_ld1v4hf (__a); +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline_= _)) vld1_f32 (const float32_t *a) { @@ -14691,6 +14724,12 @@ vld1_u64 (const uint64_t *a) =20 /* vld1q */ =20 +__extension__ static __inline float16x8_t __attribute__ ((__always_inline_= _)) +vld1q_f16 (const float16_t *__a) +{ + return __builtin_aarch64_ld1v8hf (__a); +} + __extension__ static __inline float32x4_t __attribute__ ((__always_inline_= _)) vld1q_f32 (const float32_t *a) { @@ -14919,6 +14958,12 @@ vld1q_dup_u64 (const uint64_t* __a) =20 /* vld1_lane */ =20 +__extension__ static __inline float16x4_t __attribute__ ((__always_inline_= _)) +vld1_lane_f16 (const float16_t *__src, float16x4_t __vec, const int __lane) +{ + return __aarch64_vset_lane_any (*__src, __vec, __lane); +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline_= _)) vld1_lane_f32 (const float32_t *__src, float32x2_t __vec, const int __lane) { @@ -14993,6 +15038,12 @@ vld1_lane_u64 (const uint64_t *__src, uint64x1_t _= _vec, const int __lane) =20 /* vld1q_lane */ =20 +__extension__ static __inline float16x8_t __attribute__ ((__always_inline_= _)) +vld1q_lane_f16 (const float16_t *__src, float16x8_t __vec, const int __lan= e) +{ + return __aarch64_vset_lane_any (*__src, __vec, __lane); +} + __extension__ static __inline float32x4_t __attribute__ ((__always_inline_= _)) vld1q_lane_f32 (const float32_t *__src, float32x4_t __vec, const int __lan= e) { @@ -21960,6 +22011,12 @@ vsrid_n_u64 (uint64_t __a, uint64_t __b, const int= __c) /* vst1 */ =20 __extension__ static __inline void __attribute__ ((__always_inline__)) +vst1_f16 (float16_t *__a, float16x4_t __b) +{ + __builtin_aarch64_st1v4hf (__a, __b); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) vst1_f32 (float32_t *a, float32x2_t b) { __builtin_aarch64_st1v2sf ((__builtin_aarch64_simd_sf *) a, b); @@ -22039,6 +22096,12 @@ vst1_u64 (uint64_t *a, uint64x1_t b) /* vst1q */ =20 __extension__ static __inline void __attribute__ ((__always_inline__)) +vst1q_f16 (float16_t *__a, float16x8_t __b) +{ + __builtin_aarch64_st1v8hf (__a, __b); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) vst1q_f32 (float32_t *a, float32x4_t b) { __builtin_aarch64_st1v4sf ((__builtin_aarch64_simd_sf *) a, b); @@ -22119,6 +22182,12 @@ vst1q_u64 (uint64_t *a, uint64x2_t b) /* vst1_lane */ =20 __extension__ static __inline void __attribute__ ((__always_inline__)) +vst1_lane_f16 (float16_t *__a, float16x4_t __b, const int __lane) +{ + *__a =3D __aarch64_vget_lane_any (__b, __lane); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) vst1_lane_f32 (float32_t *__a, float32x2_t __b, const int __lane) { *__a =3D __aarch64_vget_lane_any (__b, __lane); @@ -22193,6 +22262,12 @@ vst1_lane_u64 (uint64_t *__a, uint64x1_t __b, cons= t int __lane) /* vst1q_lane */ =20 __extension__ static __inline void __attribute__ ((__always_inline__)) +vst1q_lane_f16 (float16_t *__a, float16x8_t __b, const int __lane) +{ + *__a =3D __aarch64_vget_lane_any (__b, __lane); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) vst1q_lane_f32 (float32_t *__a, float32x4_t __b, const int __lane) { *__a =3D __aarch64_vget_lane_any (__b, __lane); diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators= .md index a6b351b..a7aaa52 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -52,7 +52,7 @@ (define_mode_iterator VSDQ_I_DI [V8QI V16QI V4HI V8HI V2SI V4SI V2DI DI]) =20 ;; Double vector modes. -(define_mode_iterator VD [V8QI V4HI V2SI V2SF]) +(define_mode_iterator VD [V8QI V4HI V4HF V2SI V2SF]) =20 ;; vector, 64-bit container, all integer modes (define_mode_iterator VD_BHSI [V8QI V4HI V2SI]) @@ -61,10 +61,10 @@ (define_mode_iterator VDQ_BHSI [V8QI V16QI V4HI V8HI V2SI V4SI]) =20 ;; Quad vector modes. -(define_mode_iterator VQ [V16QI V8HI V4SI V2DI V4SF V2DF]) +(define_mode_iterator VQ [V16QI V8HI V4SI V2DI V8HF V4SF V2DF]) =20 ;; VQ without 2 element modes. -(define_mode_iterator VQ_NO2E [V16QI V8HI V4SI V4SF]) +(define_mode_iterator VQ_NO2E [V16QI V8HI V4SI V8HF V4SF]) =20 ;; Quad vector with only 2 element modes. (define_mode_iterator VQ_2E [V2DI V2DF]) @@ -97,12 +97,20 @@ ;; Vector Float modes with 2 elements. (define_mode_iterator V2F [V2SF V2DF]) =20 -;; All modes. +;; All vector modes on which we support any arithmetic operations. (define_mode_iterator VALL [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4SF = V2DF]) =20 -;; All vector modes and DI. +;; All vector modes, including HF modes on which we cannot operate +(define_mode_iterator VALL_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI + V4HF V8HF V2SF V4SF V2DF]) + +;; All vector modes barring F16, plus DI. (define_mode_iterator VALLDI [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4S= F V2DF DI]) =20 +;; All vector modes and DI. +(define_mode_iterator VALLDI_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI + V4HF V8HF V2SF V4SF V2DF DI]) + ;; All vector modes and DI and DF. (define_mode_iterator VALLDIF [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4SF V2DF DI DF]) @@ -364,7 +372,8 @@ (define_mode_attr Vmtype [(V8QI ".8b") (V16QI ".16b") (V4HI ".4h") (V8HI ".8h") (V2SI ".2s") (V4SI ".4s") - (V2DI ".2d") (V2SF ".2s") + (V2DI ".2d") (V4HF ".4h") + (V8HF ".8h") (V2SF ".2s") (V4SF ".4s") (V2DF ".2d") (DI "") (SI "") (HI "") (QI "") @@ -401,6 +410,7 @@ (V4HI "HI") (V8HI "HI") (V2SI "SI") (V4SI "SI") (DI "DI") (V2DI "DI") + (V4HF "HF") (V8HF "HF") (V2SF "SF") (V4SF "SF") (V2DF "DF") (DF "DF") (SI "SI") (HI "HI") @@ -419,6 +429,7 @@ (V4HI "V8HI") (V8HI "V8HI") (V2SI "V4SI") (V4SI "V4SI") (DI "V2DI") (V2DI "V2DI") + (V4HF "V8HF") (V8HF "V8HF") (V2SF "V2SF") (V4SF "V4SF") (V2DF "V2DF") (SI "V4SI") (HI "V8HI") (QI "V16QI")]) @@ -428,10 +439,12 @@ (V4HI "V2HI") (V8HI "V4HI") (V2SI "SI") (V4SI "V2SI") (V2DI "DI") (V2SF "SF") - (V4SF "V2SF") (V2DF "DF")]) + (V4SF "V2SF") (V4HF "V2HF") + (V8HF "V4HF") (V2DF "DF")]) =20 ;; Double modes of vector modes. (define_mode_attr VDBL [(V8QI "V16QI") (V4HI "V8HI") + (V4HF "V8HF") (V2SI "V4SI") (V2SF "V4SF") (SI "V2SI") (DI "V2DI") (DF "V2DF")]) @@ -542,6 +555,7 @@ (define_mode_attr nregs [(OI "2") (CI "3") (XI "4")]) =20 (define_mode_attr VRL2 [(V8QI "V32QI") (V4HI "V16HI") + (V4HF "V16HF") (V2SI "V8SI") (V2SF "V8SF") (DI "V4DI") (DF "V4DF") (V16QI "V32QI") (V8HI "V16HI") @@ -549,16 +563,20 @@ (V2DI "V4DI") (V2DF "V4DF")]) =20 (define_mode_attr VRL3 [(V8QI "V48QI") (V4HI "V24HI") + (V4HF "V24HF") (V2SI "V12SI") (V2SF "V12SF") (DI "V6DI") (DF "V6DF") (V16QI "V48QI") (V8HI "V24HI") + (V8HF "V48HF") (V4SI "V12SI") (V4SF "V12SF") (V2DI "V6DI") (V2DF "V6DF")]) =20 (define_mode_attr VRL4 [(V8QI "V64QI") (V4HI "V32HI") + (V4HF "V32HF") (V2SI "V16SI") (V2SF "V16SF") (DI "V8DI") (DF "V8DF") (V16QI "V64QI") (V8HI "V32HI") + (V8HF "V32HF") (V4SI "V16SI") (V4SF "V16SF") (V2DI "V8DI") (V2DF "V8DF")]) =20 @@ -571,6 +589,7 @@ (V2SI "V2SI") (V4SI "V2SI") (DI "V2DI") (V2DI "V2DI") (V2SF "V2SF") (V4SF "V2SF") + (V4HF "SF") (V8HF "SF") (DF "V2DI") (V2DF "V2DI")]) =20 ;; Similar, for three elements. @@ -579,6 +598,7 @@ (V2SI "BLK") (V4SI "BLK") (DI "EI") (V2DI "EI") (V2SF "BLK") (V4SF "BLK") + (V4HF "BLK") (V8HF "BLK") (DF "EI") (V2DF "EI")]) =20 ;; Similar, for four elements. @@ -587,6 +607,7 @@ (V2SI "V4SI") (V4SI "V4SI") (DI "OI") (V2DI "OI") (V2SF "V4SF") (V4SF "V4SF") + (V4HF "V4HF") (V8HF "V4HF") (DF "OI") (V2DF "OI")]) =20 =20 @@ -645,6 +666,7 @@ (V4HI "") (V8HI "_q") (V2SI "") (V4SI "_q") (DI "") (V2DI "_q") + (V4HF "") (V8HF "_q") (V2SF "") (V4SF "_q") (V2DF "_q") (QI "") (HI "") (SI "") (DI "") (SF "") (DF "")]) diff --git a/gcc/testsuite/g++.dg/abi/mangle-neon-aarch64.C b/gcc/testsuite= /g++.dg/abi/mangle-neon-aarch64.C index 09a20dc..5740c02 100644 --- a/gcc/testsuite/g++.dg/abi/mangle-neon-aarch64.C +++ b/gcc/testsuite/g++.dg/abi/mangle-neon-aarch64.C @@ -13,6 +13,7 @@ void f3 (uint8x8_t a) {} void f4 (uint16x4_t a) {} void f5 (uint32x2_t a) {} void f23 (uint64x1_t a) {} +void f61 (float16x4_t a) {} void f6 (float32x2_t a) {} void f7 (poly8x8_t a) {} void f8 (poly16x4_t a) {} @@ -25,6 +26,7 @@ void f13 (uint8x16_t a) {} void f14 (uint16x8_t a) {} void f15 (uint32x4_t a) {} void f16 (uint64x2_t a) {} +void f171 (float16x8_t a) {} void f17 (float32x4_t a) {} void f18 (float64x2_t a) {} void f19 (poly8x16_t a) {} @@ -42,6 +44,7 @@ void g1 (int8x16_t, int8x16_t) {} // { dg-final { scan-assembler "_Z2f412__Uint16x4_t:" } } // { dg-final { scan-assembler "_Z2f512__Uint32x2_t:" } } // { dg-final { scan-assembler "_Z3f2312__Uint64x1_t:" } } +// { dg-final { scan-assembler "_Z3f6113__Float16x4_t:" } } // { dg-final { scan-assembler "_Z2f613__Float32x2_t:" } } // { dg-final { scan-assembler "_Z2f711__Poly8x8_t:" } } // { dg-final { scan-assembler "_Z2f812__Poly16x4_t:" } } @@ -53,6 +56,7 @@ void g1 (int8x16_t, int8x16_t) {} // { dg-final { scan-assembler "_Z3f1412__Uint16x8_t:" } } // { dg-final { scan-assembler "_Z3f1512__Uint32x4_t:" } } // { dg-final { scan-assembler "_Z3f1612__Uint64x2_t:" } } +// { dg-final { scan-assembler "_Z4f17113__Float16x8_t:" } } // { dg-final { scan-assembler "_Z3f1713__Float32x4_t:" } } // { dg-final { scan-assembler "_Z3f1813__Float64x2_t:" } } // { dg-final { scan-assembler "_Z3f1912__Poly8x16_t:" } } diff --git a/gcc/testsuite/gcc.target/aarch64/vld1-vst1_1.c b/gcc/testsuite= /gcc.target/aarch64/vld1-vst1_1.c index 290444e..fa9ef0f 100644 --- a/gcc/testsuite/gcc.target/aarch64/vld1-vst1_1.c +++ b/gcc/testsuite/gcc.target/aarch64/vld1-vst1_1.c @@ -31,6 +31,7 @@ THING (int8x8_t, 8, int8_t, _s8) \ THING (uint8x8_t, 8, uint8_t, _u8) \ THING (int16x4_t, 4, int16_t, _s16) \ THING (uint16x4_t, 4, uint16_t, _u16) \ +THING (float16x4_t, 4, float16_t, _f16) \ THING (int32x2_t, 2, int32_t, _s32) \ THING (uint32x2_t, 2, uint32_t, _u32) \ THING (float32x2_t, 2, float32_t, _f32) \ @@ -38,8 +39,10 @@ THING (int8x16_t, 16, int8_t, q_s8) \ THING (uint8x16_t, 16, uint8_t, q_u8) \ THING (int16x8_t, 8, int16_t, q_s16) \ THING (uint16x8_t, 8, uint16_t, q_u16) \ +THING (float16x8_t, 8, float16_t, q_f16)\ THING (int32x4_t, 4, int32_t, q_s32) \ THING (uint32x4_t, 4, uint32_t, q_u32) \ +THING (float32x4_t, 4, float32_t, q_f32)\ THING (int64x2_t, 2, int64_t, q_s64) \ THING (uint64x2_t, 2, uint64_t, q_u64) \ THING (float64x2_t, 2, float64_t, q_f64) diff --git a/gcc/testsuite/gcc.target/aarch64/vld1_lane.c b/gcc/testsuite/g= cc.target/aarch64/vld1_lane.c index c2445f8..c70df71 100644 --- a/gcc/testsuite/gcc.target/aarch64/vld1_lane.c +++ b/gcc/testsuite/gcc.target/aarch64/vld1_lane.c @@ -16,6 +16,7 @@ VARIANT (int32, , 2, _s32, 0) \ VARIANT (int64, , 1, _s64, 0) \ VARIANT (poly8, , 8, _p8, 7) \ VARIANT (poly16, , 4, _p16, 2) \ +VARIANT (float16, , 4, _f16, 3) \ VARIANT (float32, , 2, _f32, 1) \ VARIANT (float64, , 1, _f64, 0) \ VARIANT (uint8, q, 16, _u8, 13) \ @@ -28,6 +29,7 @@ VARIANT (int32, q, 4, _s32, 1) \ VARIANT (int64, q, 2, _s64, 1) \ VARIANT (poly8, q, 16, _p8, 7) \ VARIANT (poly16, q, 8, _p16, 4) \ +VARIANT (float16, q, 8, _f16, 3)\ VARIANT (float32, q, 4, _f32, 2)\ VARIANT (float64, q, 2, _f64, 1) =20 @@ -56,7 +58,7 @@ VARIANTS (TESTMETH) =20 #define CHECK(BASE, Q, ELTS, SUFFIX, LANE) \ if (test_vld1##Q##_lane##SUFFIX ((const BASE##_t *)orig_data, \ - BASE##_data) !=3D 0) \ + & BASE##_data) !=3D 0) \ abort (); =20 int @@ -65,20 +67,20 @@ main (int argc, char **argv) /* Original data for all vector formats. */ uint64_t orig_data[2] =3D {0x1234567890abcdefULL, 0x13579bdf02468aceULL}; =20 - /* Data with which vldN_lane will overwrite some of previous. */ - uint8_t uint8_data[4] =3D { 7, 11, 13, 17 }; - uint16_t uint16_data[4] =3D { 257, 263, 269, 271 }; - uint32_t uint32_data[4] =3D { 65537, 65539, 65543, 65551 }; - uint64_t uint64_data[4] =3D { 0xdeadbeefcafebabeULL, 0x0123456789abcdefU= LL, - 0xfedcba9876543210LL, 0xdeadbabecafebeefLL }; - int8_t int8_data[4] =3D { -1, 3, -5, 7 }; - int16_t int16_data[4] =3D { 257, -259, 261, -263 }; - int32_t int32_data[4] =3D { 123456789, -987654321, -135792468, 975318642= }; - int64_t *int64_data =3D (int64_t *)uint64_data; - poly8_t poly8_data[4] =3D { 0, 7, 13, 18, }; - poly16_t poly16_data[4] =3D { 11111, 2222, 333, 44 }; - float32_t float32_data[4] =3D { 3.14159, 2.718, 1.414, 100.0 }; - float64_t float64_data[4] =3D { 1.010010001, 12345.6789, -9876.54321, 1.= 618 }; + /* Data with which vld1_lane will overwrite one element of previous. */ + uint8_t uint8_data =3D 7; + uint16_t uint16_data =3D 257; + uint32_t uint32_data =3D 65537; + uint64_t uint64_data =3D 0xdeadbeefcafebabeULL; + int8_t int8_data =3D -1; + int16_t int16_data =3D -259; + int32_t int32_data =3D -987654321; + int64_t int64_data =3D 0x1234567890abcdefLL; + poly8_t poly8_data =3D 13; + poly16_t poly16_data =3D 11111; + float16_t float16_data =3D 8.75; + float32_t float32_data =3D 3.14159; + float64_t float64_data =3D 1.010010001; =20 VARIANTS (CHECK); return 0; diff --git a/gcc/testsuite/gcc.target/aarch64/vset_lane_1.c b/gcc/testsuite= /gcc.target/aarch64/vset_lane_1.c index 5fb1139..bc0132c 100644 --- a/gcc/testsuite/gcc.target/aarch64/vset_lane_1.c +++ b/gcc/testsuite/gcc.target/aarch64/vset_lane_1.c @@ -16,6 +16,7 @@ VARIANT (int32_t, , 2, int32x2_t, _s32, 0) \ VARIANT (int64_t, , 1, int64x1_t, _s64, 0) \ VARIANT (poly8_t, , 8, poly8x8_t, _p8, 6) \ VARIANT (poly16_t, , 4, poly16x4_t, _p16, 2) \ +VARIANT (float16_t, , 4, float16x4_t, _f16, 3) \ VARIANT (float32_t, , 2, float32x2_t, _f32, 1) \ VARIANT (float64_t, , 1, float64x1_t, _f64, 0) \ VARIANT (uint8_t, q, 16, uint8x16_t, _u8, 11) \ @@ -28,6 +29,7 @@ VARIANT (int32_t, q, 4, int32x4_t, _s32, 3) \ VARIANT (int64_t, q, 2, int64x2_t, _s64, 0) \ VARIANT (poly8_t, q, 16, poly8x16_t, _p8, 14) \ VARIANT (poly16_t, q, 8, poly16x8_t, _p16, 6) \ +VARIANT (float16_t, q, 8, float16x8_t, _f16, 6) \ VARIANT (float32_t, q, 4, float32x4_t, _f32, 2) \ VARIANT (float64_t, q, 2, float64x2_t, _f64, 1) =20 @@ -76,6 +78,9 @@ main (int argc, char **argv) poly8_t poly8_t_data[16] =3D { 0, 7, 13, 18, 22, 25, 27, 28, 29, 31, 34, 38, 43, 49, 56, 64 }; poly16_t poly16_t_data[8] =3D { 11111, 2222, 333, 44, 5, 65432, 54321, 4= 3210 }; + float16_t float16_t_data[8] =3D { 1.25, 4.5, 7.875, 2.3125, 5.675, 8.875, + 3.6875, 6.75}; + float32_t float32_t_data[4] =3D { 3.14159, 2.718, 1.414, 100.0 }; float64_t float64_t_data[2] =3D { 1.01001000100001, 12345.6789 }; =20 --------------000109010707020108060300--