From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 7632 invoked by alias); 18 Jul 2018 09:50:25 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 7603 invoked by uid 89); 18 Jul 2018 09:50:24 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-0.9 required=5.0 tests=BAYES_00,KAM_LAZY_DOMAIN_SECURITY autolearn=no version=3.3.2 spammy=headaches, H*f:CAFiYyc2ECcDg, H*i:sk:Q@mail., H*i:CAFiYyc2ECcDg X-HELO: foss.arm.com Received: from usa-sjc-mx-foss1.foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 18 Jul 2018 09:50:22 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id DC65180D; Wed, 18 Jul 2018 02:50:20 -0700 (PDT) Received: from [10.2.207.77] (e100706-lin.cambridge.arm.com [10.2.207.77]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 41E253F318; Wed, 18 Jul 2018 02:50:20 -0700 (PDT) Message-ID: <5B4F0D5A.2030401@foss.arm.com> Date: Wed, 18 Jul 2018 09:50:00 -0000 From: Kyrill Tkachov User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Richard Biener CC: "fortran@gcc.gnu.org" , GCC Patches Subject: Re: [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate References: <5B4DE283.9060100@foss.arm.com> <5B4DF325.2050609@foss.arm.com> In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-SW-Source: 2018-07/txt/msg00986.txt.bz2 On 18/07/18 10:44, Richard Biener wrote: > On Tue, Jul 17, 2018 at 3:46 PM Kyrill Tkachov > wrote: >> Hi Richard, >> >> On 17/07/18 14:27, Richard Biener wrote: >>> On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov >>> wrote: >>>> Hi all, >>>> >>>> This is my first Fortran patch, so apologies if I'm missing something. >>>> The current expansion of the min and max intrinsics explicitly expands >>>> the comparisons between each argument to calculate the global min/max. >>>> Some targets, like aarch64, have instructions that can calculate the min/max >>>> of two real (floating-point) numbers with the proper NaN-handling semantics >>>> (if both inputs are NaN, return Nan. If one is NaN, return the other) and those >>>> are the semantics provided by the __builtin_fmin/max family of functions that expand >>>> to these instructions. >>>> >>>> This patch makes the frontend emit __builtin_fmin/max directly to compare each >>>> pair of numbers when the numbers are floating-point, and use MIN_EXPR/MAX_EXPR otherwise >>>> (integral types and -ffast-math) which should hopefully be easier to recognise in the >>> What is Fortrans requirement on min/max intrinsics? Doesn't it only >>> require things that >>> are guaranteed by MIN/MAX_EXPR anyways? The only restriction here is >> The current implementation expands to: >> mvar = a1; >> if (a2 .op. mvar || isnan (mvar)) >> mvar = a2; >> if (a3 .op. mvar || isnan (mvar)) >> mvar = a3; >> ... >> return mvar; >> >> That is, if one of the operands is a NaN it will return the other argument. >> If both (all) are NaNs, it will return NaN. This is the same as the semantics of fmin/max >> as far as I can tell. >> >>> /* Minimum and maximum values. When used with floating point, if both >>> operands are zeros, or if either operand is NaN, then it is unspecified >>> which of the two operands is returned as the result. */ >>> >>> which means MIN/MAX_EXPR are not strictly IEEE compliant with signed >>> zeros or NaNs. >>> Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS if singed >>> zeros are significant. >> True, MIN/MAX_EXPR would not be appropriate in that condition. I guarded their use >> on !HONOR_NANS (type) only. I'll update it to !HONOR_SIGNED_ZEROS (type) && !HONOR_NANS (type). >> >> >>> I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR >>> is a good idea, >>> this may both generate bigger code and be slower. >> The patch will generate fmin/fmax calls (or the fminf,fminl variants) when mathfn_built_in advertises >> them as available (does that mean they'll have a fast inline implementation?) > This doesn't mean anything given you make them available with your > patch ;) So I expect it may > cause issues for !c99_runtime targets (and long double at least). Urgh, that can cause headaches... >> If the above doesn't hold and we can't use either MIN/MAX_EXPR of fmin/fmax then the patch falls back >> to the existing expansion. > As said I would not use fmin/fmax calls here at all. ... Given the comments from Thomas and Janne, maybe we should just emit MIN/MAX_EXPRs here since there is no language requirement on NaN/signed zero handling on these intrinsics? That should make it simpler and more portable. >> FWIW, this patch does improve performance on 521.wrf from SPEC2017 on aarch64. > You said that, yes. Even without -ffast-math? It improves at -O3 without -ffast-math in particular. With -ffast-math phiopt optimisation is more aggressive and merges the conditionals into MIN/MAX_EXPRs (minmax_replacement in tree-ssa-phiopt.c) Thanks, Kyrill > Richard. > >> Thanks, >> Kyrill >> >>> Richard. >>> >>>> midend and optimise. The previous approach of generating the open-coded version of that >>>> is used when we don't have an appropriate __builtin_fmin/max available. >>>> For example, for a configuration of x86_64-unknown-linux-gnu that I tested there was no >>>> 128-bit __built_fminl available. >>>> >>>> With this patch I'm seeing more than 7000 FMINNM/FMAXNM instructions being generated at -O3 >>>> on aarch64 for 521.wrf from fprate SPEC2017 where none before were generated >>>> (we were generating explicit comparisons and NaN checks). This gave a 2.4% improvement >>>> in performance on a Cortex-A72. >>>> >>>> Bootstrapped and tested on aarch64-none-linux-gnu and x86_64-unknown-linux-gnu. >>>> >>>> Ok for trunk? >>>> Thanks, >>>> Kyrill >>>> >>>> 2018-07-17 Kyrylo Tkachov >>>> >>>> * f95-lang.c (gfc_init_builtin_functions): Define __builtin_fmin, >>>> __builtin_fminf, __builtin_fminl, __builtin_fmax, __builtin_fmaxf, >>>> __builtin_fmaxl. >>>> * trans-intrinsic.c: Include builtins.h. >>>> (gfc_conv_intrinsic_minmax): Emit __builtin_fmin/max or MIN/MAX_EXPR >>>> functions to calculate the min/max. >>>> >>>> 2018-07-17 Kyrylo Tkachov >>>> >>>> * gfortran.dg/max_fmaxf.f90: New test. >>>> * gfortran.dg/min_fminf.f90: Likewise. >>>> * gfortran.dg/minmax_integer.f90: Likewise. >>>> * gfortran.dg/max_fmaxl_aarch64.f90: Likewise. >>>> * gfortran.dg/min_fminl_aarch64.f90: Likewise.