From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 119547 invoked by alias); 1 Apr 2016 19:47:17 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 119493 invoked by uid 89); 1 Apr 2016 19:47:16 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.0 required=5.0
 tests=AWL,BAYES_00,KAM_LAZY_DOMAIN_SECURITY,RP_MATCHES_RCVD autolearn=ham
 version=3.3.2 spammy=Hx-spam-relays-external:ESMTPA,
 HContent-transfer-encoding:7bit
X-HELO: usmailout1.samsung.com
Received: from mailout1.w2.samsung.com (HELO usmailout1.samsung.com) (211.189.100.11)
 by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a)
 with (AES128-SHA encrypted) ESMTPS; Fri, 01 Apr 2016 19:47:06 +0000
Received: from uscpsbgm2.samsung.com (u115.gpu85.samsung.co.kr [203.254.195.115])
 by mailout1.w2.samsung.com (Oracle Communications Messaging Server 7.0.5.31.0 64bit (built May 5 2014))
 with ESMTP id <0O4Z003Q10AGBN60@mailout1.w2.samsung.com>
 for gcc-patches@gcc.gnu.org; Fri, 01 Apr 2016 15:47:04 -0400 (EDT)
Received: from ussync1.samsung.com ( [203.254.195.81])
 by uscpsbgm2.samsung.com (USCPMTA) with SMTP id A7.0D.07641.730DEF65;
 Fri, 1 Apr 2016 15:47:03 -0400 (EDT)
Received: from [172.31.207.194] ([105.140.31.10])
 by ussync1.samsung.com (Oracle Communications Messaging Server 7.0.5.31.0 64bit (built May 5 2014))
 with ESMTPA id <0O4Z00GV50AFX480@ussync1.samsung.com>;
 Fri, 01 Apr 2016 15:47:03 -0400 (EDT)
Subject: Re: [AArch64] Emit division using the Newton series
To: Wilco Dijkstra , GCC Patches
References: <56EB0EDF.3060401@samsung.com> <56F2C329.10405@samsung.com> <56FDA311.7090309@samsung.com>
Cc: James Greenhalgh , Andrew Pinski , nd
From: Evandro Menezes
Message-id: <56FED036.2070405@samsung.com>
Date: Fri, 01 Apr 2016 19:47:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
 Thunderbird/38.6.0
MIME-version: 1.0
In-reply-to:
Content-type: text/plain; charset=utf-8; format=flowed
Content-transfer-encoding: 7bit
X-IsSubscribed: yes
X-SW-Source: 2016-04/txt/msg00090.txt.bz2

On 04/01/16 08:58, Wilco Dijkstra wrote:
> Evandro Menezes wrote:
> On 03/23/16 11:24, Evandro Menezes wrote:
>> On 03/17/16 15:09, Evandro Menezes wrote:
>>> This patch implements FP division by an approximation using the Newton
>>> series.
>>>
>>> With this patch, DF division is sped up by over 100% and SF division,
>>> zilch, both on A57 and on M1.
> Mentioning throughput is not useful given that the vectorized single precision
> case will give most of the speedup in actual code.
>
>> gcc/
>>     * config/aarch64/aarch64-tuning-flags.def
>>     (AARCH64_EXTRA_TUNE_APPROX_DIV_{SF,DF}: New tuning macros.
>>     * config/aarch64/aarch64-protos.h
>>     (AARCH64_EXTRA_TUNE_APPROX_DIV): New macro.
>>     (aarch64_emit_approx_div): Declare new function.
>>     * config/aarch64/aarch64.c
>>     (aarch64_emit_approx_div): Define new function.
>>     * config/aarch64/aarch64.md ("div3"): New expansion.
>>     * config/aarch64/aarch64-simd.md ("div3"): Likewise.
>>
>>
>> This version of the patch cleans up the changes to the MD files and
>> optimizes the division when the numerator is 1.0.
> Adding support for plain recip is good. Having the enabling logic no longer in
> the md file is an improvement, but I don't believe adding tuning flags for the inner
> mode is correct - we need a more generic solution like I mentioned in my other mail.
>
> The division variant should use the same latency reduction trick I mentioned for sqrt.

Wilco,

I don't think that it applies here, since it doesn't have to deal with
special cases.

As for the finer grained flags, I'll wait for the feedback on
https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00089.html

Thank you,

-- 
Evandro Menezes
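
For readers unfamiliar with the technique discussed in this thread, the following is a minimal sketch in plain C of division through a Newton-series reciprocal. It is not the patch's code: the names approx_recip and approx_div are hypothetical, the linear seed stands in for whatever hardware reciprocal estimate a real implementation would start from, and the divisor is assumed to be finite and nonzero. The numerator-equals-1.0 case mentioned above corresponds to calling approx_recip directly.

```c
#include <math.h>

/* Hypothetical helper, for illustration only: approximate 1/d.
   Assumes d is finite and nonzero; no special cases are handled. */
static double approx_recip(double d)
{
    int exp;
    /* Scale so that |d| = m * 2^exp with m in [0.5, 1). */
    double m = frexp(fabs(d), &exp);

    /* Linear seed for 1/m; its relative error is at most 1/17. */
    double x = 48.0 / 17.0 - 32.0 / 17.0 * m;

    /* Newton step x <- x * (2 - m * x). The relative error is squared
       on every step, so four steps are enough for double precision. */
    for (int i = 0; i < 4; i++)
        x = x * (2.0 - m * x);

    /* Undo the scaling and restore the sign. */
    x = ldexp(x, -exp);
    return d < 0.0 ? -x : x;
}

/* Hypothetical helper: n / d as a multiplication by the reciprocal. */
static double approx_div(double n, double d)
{
    return n * approx_recip(d);
}
```

The result is an approximation: it normally agrees with true division to within a few units in the last place, but it is not guaranteed to be correctly rounded, which is one reason such expansions are usually opt-in.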