From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 4153 invoked by alias); 28 Jul 2016 11:53:58 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 4100 invoked by uid 89); 28 Jul 2016 11:53:56 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.6 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 spammy=matthewwahabarmcom, matthew.wahab@foss.arm.com, matthewwahabfossarmcom, matthew.wahab@arm.com
X-HELO: mail-lf0-f48.google.com
Received: from mail-lf0-f48.google.com (HELO mail-lf0-f48.google.com) (209.85.215.48)
 by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Thu, 28 Jul 2016 11:53:45 +0000
Received: by mail-lf0-f48.google.com with SMTP id b199so47794119lfe.0
 for ; Thu, 28 Jul 2016 04:53:44 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:in-reply-to:references:from:date
 :message-id:subject:to:cc;
 bh=z1YMxqrqY7zyX2n8uHXGWfIMcEdp75w774ZUKM8RUrE=;
 b=OW7MUGMhfyRf47SP/m1HhJAh6HuzOuvhRXR5hsvfsEDEqvgiyRcJW2UgCWLBxpOf7I
 8ng9gKM6HYLl4XsXbfSo+6802bt1eigyFAfr7DHxYaKysWzXl+w7tvEA9kLYALnukmkU
 MFFV5fEQxpeJ2DXZM8M4l3n/TI/P0URQquwKFS1NCQA9MwRa6FoXXSceIXSoGd8jloz3
 RYSDNmpdtzTidaTaAJyNjCnVBQcZxuE2couqgGW5ocnRvjZxXSyAF+ogmyV5TgCXqMk9
 XBIBPSpVC+3oI33knnLKnTlM/FLQMu7G+x89otOuvxEgKUanyOdOVAndGRKiHCXpqKP1
 o8Vw==
X-Gm-Message-State: AEkoouvwLqNgWnEpyyOlkN3HgPiGJ1aKJF0ZH2bt95OJC3gm0fG3azFgqh3KHXrlmclL+pqhT8GkkwWnvmRrQQ==
X-Received: by 10.46.32.198 with SMTP id g67mr12327218lji.31.1469706821310;
 Thu, 28 Jul 2016 04:53:41 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.114.25.231 with HTTP; Thu, 28 Jul 2016 04:53:40 -0700 (PDT)
In-Reply-To: <577A6E09.5020607@foss.arm.com>
References:
 <573B28A3.9030603@foss.arm.com> <573B2CA9.5060703@foss.arm.com> <577A6E09.5020607@foss.arm.com>
From: Ramana Radhakrishnan
Date: Thu, 28 Jul 2016 11:53:00 -0000
Message-ID:
Subject: Re: [PATCH 9/17][ARM] Add NEON FP16 arithmetic instructions.
To: Matthew Wahab
Cc: Joseph Myers, gcc-patches
Content-Type: text/plain; charset=UTF-8
X-IsSubscribed: yes
X-SW-Source: 2016-07/txt/msg01857.txt.bz2

On Mon, Jul 4, 2016 at 3:09 PM, Matthew Wahab wrote:
> On 18/05/16 01:58, Joseph Myers wrote:
>> On Tue, 17 May 2016, Matthew Wahab wrote:
>>
>>> As with the VFP FP16 arithmetic instructions, operations on __fp16
>>> values are done by conversion to single-precision. Any new optimization
>>> supported by the instruction descriptions can only apply to code
>>> generated using intrinsics added in this patch series.
>>
>> As with the scalar instructions, I think it is legitimate in most cases to
>> optimize arithmetic via single precision to work direct on __fp16 values
>> (and this would be natural for vectorization of __fp16 arithmetic).
>>
>>> A number of the instructions are modelled as two variants, one using
>>> UNSPEC and the other using RTL operations, with the model used decided
>>> by the funsafe-math-optimizations flag. This follows the
>>> single-precision instructions and is due to the half-precision
>>> operations having the same conditions and restrictions on their use in
>>> optimizations (when they are enabled).
>>
>> (Of course, these restrictions still apply.)
>
> The F16 support generally follows the F32 implementation and, for F32,
> direct arithmetic vector operations are only available when
> unsafe-math-optimizations is enabled. I want to check the behaviour of
> the F16 operations when unsafe-math is enabled so I'll defer to a follow
> up patch the change to use standard names for the vector operations.
>
> There are still some changes from the previous patch:
>
> - Two fma/fmsub patterns *fma4 and <*fmsub4 are
>   dropped since they just duplicated *fma4_intrinsic and
>   <*fmsub4_intrinsic.
>
> - Patterns neon_vadd_unspec and neon_vsub_unspec are
>   dropped, they were redundant.
>
> - 2_fp16 is renamed to 2. This
>   implements the abs and neg operations which are always safe to use.
>
> - neon_vsqrte is renamed to neon_vrsqrte. This is a
>   misspelled intrinsic that wasn't caught in testing because the
>   relevant test case is missing. The intrinsic is fixed here and in
>   other patches and an advsimd-intrinsics test added later in the
>   (updated) series.
>
> - neon_vcvt_n correct range for f16 is 0-17.
>
> - Test armv8_2-fp16-arith-1.c is updated to expect f16 arithmetic
>   instructions rather than f32 and to use the neon command line options.
>
> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
> make check and for arm-none-eabi and armeb-none-eabi with make check on
> an ARMv8.2-A emulator.
>
> Ok for trunk?

OK.

Ramana

> Matthew
>
> 2016-07-04 Matthew Wahab
>
> * config/arm/iterators.md (VCVTHI): New.
> (NEON_VCMP): Add UNSPEC_VCLT and UNSPEC_VCLE. Fix a long line.
> (NEON_VAGLTE): New.
> (VFM_LANE_AS): New.
> (VH_CVTTO): New.
> (V_reg): Add HF, V4HF and V8HF. Fix white-space.
> (V_HALF): Add V4HF. Fix white-space.
> (V_if_elem): Add HF, V4HF and V8HF. Fix white-space.
> (V_s_elem): Likewise.
> (V_sz_elem): Fix white-space.
> (V_elem_ch): Likewise.
> (VH_elem_ch): New.
> (scalar_mul_constraint): Add V8HF and V4HF.
> (Is_float_mode): Fix white-space.
> (Is_d_reg): Fix white-space.
> (q): Add HF. Fix white-space.
> (float_sup): New.
> (float_SUP): New.
> (cmp_op_unsp): Add UNSPEC_VCALE and UNSPEC_VCALT.
> (neon_vfm_lane_as): New.
> * config/arm/neon.md (add3_fp16): New.
> (sub3_fp16): New.
> (mul3add_neon): New.
> (fma4_intrinsic): New.
> (fmsub4_intrinsic): Fix white-space.
> (fmsub4_intrinsic): New.
> (2): New.
> (neon_v): New.
> (neon_v): New.
> (neon_vrsqrte): New.
> (neon_vpaddv4hf): New.
> (neon_vadd): New.
> (neon_vsub): New.
> (neon_vmulf): New.
> (neon_vfma): New.
> (neon_vfms): New.
> (neon_vc): New.
> (neon_vc_fp16insn): New.
> (neon_vc_fp16insn_unspec): New.
> (neon_vca): New.
> (neon_vca_fp16insn): New.
> (neon_vca_fp16insn_unspec): New.
> (neon_vcz): New.
> (neon_vabd): New.
> (neon_vf): New.
> (neon_vpfv4hf): New.
> (neon_): New.
> (neon_vrecps): New.
> (neon_vrsqrts): New.
> (neon_vrecpe): New (VH variant).
> (neon_vdup_lane_internal): New.
> (neon_vdup_lane): New.
> (neon_vcvt): New (VCVTHI variant).
> (neon_vcvt): New (VH variant).
> (neon_vcvt_n): New (VH variant).
> (neon_vcvt_n): New (VCVTHI variant).
> (neon_vcvt): New.
> (neon_vmul_lane): New.
> (neon_vmul_n): New.
> * config/arm/unspecs.md (UNSPEC_VCALE): New.
> (UNSPEC_VCALT): New.
> (UNSPEC_VFMA_LANE): New.
> (UNSPEC_VFMS_LANE): New.
>
> testsuite/
> 2016-07-04 Matthew Wahab
>
> * gcc.target/arm/armv8_2-fp16-arith-1.c: Use arm_v8_2a_fp16_neon
> options. Add tests for float16x4_t and float16x8_t.
>