On 18/05/16 01:58, Joseph Myers wrote:
> On Tue, 17 May 2016, Matthew Wahab wrote:
>
>> As with the VFP FP16 arithmetic instructions, operations on __fp16
>> values are done by conversion to single-precision. Any new optimization
>> supported by the instruction descriptions can only apply to code
>> generated using intrinsics added in this patch series.
>
> As with the scalar instructions, I think it is legitimate in most cases to
> optimize arithmetic via single precision to work direct on __fp16 values
> (and this would be natural for vectorization of __fp16 arithmetic).
>
>> A number of the instructions are modelled as two variants, one using
>> UNSPEC and the other using RTL operations, with the model used decided
>> by the -funsafe-math-optimizations flag. This follows the
>> single-precision instructions and is due to the half-precision
>> operations having the same conditions and restrictions on their use in
>> optimizations (when they are enabled).
>
> (Of course, these restrictions still apply.)

The F16 support generally follows the F32 implementation and, for F32,
direct arithmetic vector operations are only available when
-funsafe-math-optimizations is enabled. I want to check the behaviour of
the F16 operations when unsafe-math is enabled, so I'll defer the change
to use standard names for the vector operations to a follow-up patch.

There are still some changes from the previous patch:

- Two fma/fmsub patterns, *fma4 and <*fmsub4, are dropped since they just
  duplicated *fma4_intrinsic and <*fmsub4_intrinsic.

- Patterns neon_vadd_unspec and neon_vsub_unspec are dropped; they were
  redundant.

- 2_fp16 is renamed to 2. This implements the abs and neg operations,
  which are always safe to use.

- neon_vsqrte is renamed to neon_vrsqrte. The old name was a misspelling
  of the intrinsic that wasn't caught in testing because the relevant
  test case is missing. The intrinsic is fixed here and in other patches,
  and an advsimd-intrinsics test is added later in the (updated) series.
- neon_vcvt_n

	* config/arm/iterators.md (VCVTHI): New.
	(NEON_VCMP): Add UNSPEC_VCLT and UNSPEC_VCLE.  Fix a long line.
	(NEON_VAGLTE): New.
	(VFM_LANE_AS): New.
	(VH_CVTTO): New.
	(V_reg): Add HF, V4HF and V8HF.  Fix white-space.
	(V_HALF): Add V4HF.  Fix white-space.
	(V_if_elem): Add HF, V4HF and V8HF.  Fix white-space.
	(V_s_elem): Likewise.
	(V_sz_elem): Fix white-space.
	(V_elem_ch): Likewise.
	(VH_elem_ch): New.
	(scalar_mul_constraint): Add V8HF and V4HF.
	(Is_float_mode): Fix white-space.
	(Is_d_reg): Fix white-space.
	(q): Add HF.  Fix white-space.
	(float_sup): New.
	(float_SUP): New.
	(cmp_op_unsp): Add UNSPEC_VCALE and UNSPEC_VCALT.
	(neon_vfm_lane_as): New.
	* config/arm/neon.md (add3_fp16): New.
	(sub3_fp16): New.
	(mul3add_neon): New.
	(fma4_intrinsic): New.
	(fmsub4_intrinsic): Fix white-space.
	(fmsub4_intrinsic): New.
	(2): New.
	(neon_v): New.
	(neon_v): New.
	(neon_vrsqrte): New.
	(neon_vpaddv4hf): New.
	(neon_vadd): New.
	(neon_vsub): New.
	(neon_vmulf): New.
	(neon_vfma): New.
	(neon_vfms): New.
	(neon_vc): New.
	(neon_vc_fp16insn): New.
	(neon_vc_fp16insn_unspec): New.
	(neon_vca): New.
	(neon_vca_fp16insn): New.
	(neon_vca_fp16insn_unspec): New.
	(neon_vcz): New.
	(neon_vabd): New.
	(neon_vf): New.
	(neon_vpfv4hf): New.
	(neon_): New.
	(neon_vrecps): New.
	(neon_vrsqrts): New.
	(neon_vrecpe): New (VH variant).
	(neon_vdup_lane_internal): New.
	(neon_vdup_lane): New.
	(neon_vcvt): New (VCVTHI variant).
	(neon_vcvt): New (VH variant).
	(neon_vcvt_n): New (VH variant).
	(neon_vcvt_n): New (VCVTHI variant).
	(neon_vcvt): New.
	(neon_vmul_lane): New.
	(neon_vmul_n): New.
	* config/arm/unspecs.md (UNSPEC_VCALE): New.
	(UNSPEC_VCALT): New.
	(UNSPEC_VFMA_LANE): New.
	(UNSPEC_VFMS_LANE): New.

testsuite/
2016-07-04  Matthew Wahab

	* gcc.target/arm/armv8_2-fp16-arith-1.c: Use arm_v8_2a_fp16_neon
	options.  Add tests for float16x4_t and float16x8_t.