On 18/05/16 01:58, Joseph Myers wrote:
> On Tue, 17 May 2016, Matthew Wahab wrote:
>
>> As with the VFP FP16 arithmetic instructions, operations on __fp16
>> values are done by conversion to single-precision. Any new optimization
>> supported by the instruction descriptions can only apply to code
>> generated using intrinsics added in this patch series.
>
> As with the scalar instructions, I think it is legitimate in most cases to
> optimize arithmetic via single precision to work direct on __fp16 values
> (and this would be natural for vectorization of __fp16 arithmetic).
>
>> A number of the instructions are modelled as two variants, one using
>> UNSPEC and the other using RTL operations, with the model used decided
>> by the funsafe-math-optimizations flag. This follows the
>> single-precision instructions and is due to the half-precision
>> operations having the same conditions and restrictions on their use in
>> optimizations (when they are enabled).
>
> (Of course, these restrictions still apply.)

The F16 support generally follows the F32 implementation and, for F32,
direct arithmetic vector operations are only available when
unsafe-math-optimizations is enabled. I want to check the behaviour of
the F16 operations when unsafe-math is enabled, so I'll defer the change
to use standard names for the vector operations to a follow-up patch.

There are still some changes from the previous patch:

- Two fma/fmsub patterns *fma4 and <*fmsub4 are dropped since they just
  duplicated *fma4_intrinsic and <*fmsub4_intrinsic.

- Patterns neon_vadd_unspec and neon_vsub_unspec are dropped; they were
  redundant.

- 2_fp16 is renamed to 2. This implements the abs and neg operations,
  which are always safe to use.

- neon_vsqrte is renamed to neon_vrsqrte. This is a misspelled intrinsic
  that wasn't caught in testing because the relevant test case is
  missing. The intrinsic is fixed here and in other patches, and an
  advsimd-intrinsics test is added later in the (updated) series.

- neon_vcvt^{_n

* config/arm/iterators.md (VCVTHI): New.
(NEON_VCMP): Add UNSPEC_VCLT and UNSPEC_VCLE. Fix a long line.
(NEON_VAGLTE): New.
(VFM_LANE_AS): New.
(VH_CVTTO): New.
(V_reg): Add HF, V4HF and V8HF. Fix white-space.
(V_HALF): Add V4HF. Fix white-space.
(V_if_elem): Add HF, V4HF and V8HF. Fix white-space.
(V_s_elem): Likewise.
(V_sz_elem): Fix white-space.
(V_elem_ch): Likewise.
(VH_elem_ch): New.
(scalar_mul_constraint): Add V8HF and V4HF.
(Is_float_mode): Fix white-space.
(Is_d_reg): Fix white-space.
(q): Add HF. Fix white-space.
(float_sup): New.
(float_SUP): New.
(cmp_op_unsp): Add UNSPEC_VCALE and UNSPEC_VCALT.
(neon_vfm_lane_as): New.
* config/arm/neon.md (add3_fp16): New.
(sub3_fp16): New.
(mul3add_neon): New.
(fma4_intrinsic): New.
(fmsub4_intrinsic): Fix white-space.
(fmsub4_intrinsic): New.
(2): New.
(neon_v): New.
(neon_v): New.
(neon_vrsqrte): New.
(neon_vpaddv4hf): New.
(neon_vadd): New.
(neon_vsub): New.
(neon_vmulf): New.
(neon_vfma): New.
(neon_vfms): New.
(neon_vc): New.
(neon_vc_fp16insn): New.
(neon_vc_fp16insn_unspec): New.
(neon_vca): New.
(neon_vca_fp16insn): New.
(neon_vca_fp16insn_unspec): New.
(neon_vcz): New.
(neon_vabd): New.
(neon_vf): New.
(neon_vpfv4hf): New.
(neon_): New.
(neon_vrecps): New.
(neon_vrsqrts): New.
(neon_vrecpe): New (VH variant).
(neon_vdup_lane_internal): New.
(neon_vdup_lane): New.
(neon_vcvt^{): New (VCVTHI variant).
(neon_vcvt^{): New (VH variant).
(neon_vcvt^{_n): New (VH variant).
(neon_vcvt^{_n): New (VCVTHI variant).
(neon_vcvt^{): New.
(neon_vmul_lane): New.
(neon_vmul_n): New.
* config/arm/unspecs.md (UNSPEC_VCALE): New.
(UNSPEC_VCALT): New.
(UNSPEC_VFMA_LANE): New.
(UNSPEC_VFMS_LANE): New.

testsuite/
2016-07-04 Matthew Wahab

* gcc.target/arm/armv8_2-fp16-arith-1.c: Use arm_v8_2a_fp16_neon
options. Add tests for float16x4_t and float16x8_t.