From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 57421 invoked by alias); 18 Jul 2018 11:45:42 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 55252 invoked by uid 89); 18 Jul 2018 11:44:15 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS autolearn=ham version=3.3.2 spammy=H*i:sk:J2i4r3L, H*f:sk:J2i4r3L X-HELO: foss.arm.com Received: from foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 18 Jul 2018 11:44:14 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 677707A9; Wed, 18 Jul 2018 04:44:12 -0700 (PDT) Received: from localhost (unknown [10.32.99.48]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 761343F318; Wed, 18 Jul 2018 04:44:11 -0700 (PDT) From: Richard Sandiford To: Richard Biener Mail-Followup-To: Richard Biener ,kyrylo.tkachov@foss.arm.com, "fortran\@gcc.gnu.org" , GCC Patches , richard.sandiford@arm.com Cc: kyrylo.tkachov@foss.arm.com, "fortran\@gcc.gnu.org" , GCC Patches Subject: Re: [PATCH]Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate References: <5B4DE283.9060100@foss.arm.com> <5B4DF325.2050609@foss.arm.com> <5B4F0D5A.2030401@foss.arm.com> Date: Wed, 18 Jul 2018 11:45:00 -0000 In-Reply-To: (Richard Biener's message of "Wed, 18 Jul 2018 12:06:15 +0200") Message-ID: <87r2k04vli.fsf@arm.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-SW-Source: 2018-07/txt/msg00992.txt.bz2 Richard Biener writes: > On Wed, Jul 18, 2018 at 11:50 AM Kyrill Tkachov > wrote: >> >> >> On 18/07/18 10:44, Richard Biener wrote: >> > On Tue, Jul 17, 2018 at 3:46 PM Kyrill Tkachov >> > wrote: >> >> Hi Richard, >> >> >> >> On 17/07/18 14:27, Richard Biener wrote: >> >>> On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov >> >>> wrote: >> >>>> Hi all, >> >>>> >> >>>> This is my first Fortran patch, so apologies if I'm missing something. >> >>>> The current expansion of the min and max intrinsics explicitly expands >> >>>> the comparisons between each argument to calculate the global min/max. >> >>>> Some targets, like aarch64, have instructions that can calculate >> >>>> the min/max >> >>>> of two real (floating-point) numbers with the proper NaN-handling >> >>>> semantics >> >>>> (if both inputs are NaN, return Nan. If one is NaN, return the >> >>>> other) and those >> >>>> are the semantics provided by the __builtin_fmin/max family of >> >>>> functions that expand >> >>>> to these instructions. >> >>>> >> >>>> This patch makes the frontend emit __builtin_fmin/max directly to >> >>>> compare each >> >>>> pair of numbers when the numbers are floating-point, and use >> >>>> MIN_EXPR/MAX_EXPR otherwise >> >>>> (integral types and -ffast-math) which should hopefully be easier >> >>>> to recognise in the >> >>> What is Fortrans requirement on min/max intrinsics? Doesn't it only >> >>> require things that >> >>> are guaranteed by MIN/MAX_EXPR anyways? The only restriction here is >> >> The current implementation expands to: >> >> mvar = a1; >> >> if (a2 .op. mvar || isnan (mvar)) >> >> mvar = a2; >> >> if (a3 .op. mvar || isnan (mvar)) >> >> mvar = a3; >> >> ... >> >> return mvar; >> >> >> >> That is, if one of the operands is a NaN it will return the other argument. >> >> If both (all) are NaNs, it will return NaN. This is the same as the semantics of fmin/max >> >> as far as I can tell. >> >> >> >>> /* Minimum and maximum values. When used with floating point, if both >> >>> operands are zeros, or if either operand is NaN, then it is >> >>> unspecified >> >>> which of the two operands is returned as the result. */ >> >>> >> >>> which means MIN/MAX_EXPR are not strictly IEEE compliant with signed >> >>> zeros or NaNs. >> >>> Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS >> >>> if singed >> >>> zeros are significant. >> >> True, MIN/MAX_EXPR would not be appropriate in that condition. I >> >> guarded their use >> >> on !HONOR_NANS (type) only. I'll update it to !HONOR_SIGNED_ZEROS >> >> (type) && !HONOR_NANS (type). >> >> >> >> >> >>> I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR >> >>> is a good idea, >> >>> this may both generate bigger code and be slower. >> >> The patch will generate fmin/fmax calls (or the fminf,fminl >> >> variants) when mathfn_built_in advertises >> >> them as available (does that mean they'll have a fast inline >> >> implementation?) >> > This doesn't mean anything given you make them available with your >> > patch ;) So I expect it may >> > cause issues for !c99_runtime targets (and long double at least). >> >> Urgh, that can cause headaches... >> >> >> If the above doesn't hold and we can't use either MIN/MAX_EXPR of >> >> fmin/fmax then the patch falls back >> >> to the existing expansion. >> > As said I would not use fmin/fmax calls here at all. >> >> ... Given the comments from Thomas and Janne, maybe we should just >> emit MIN/MAX_EXPRs here >> since there is no language requirement on NaN/signed zero handling on >> these intrinsics? >> That should make it simpler and more portable. > > That's fortran maintainers call. > >> >> FWIW, this patch does improve performance on 521.wrf from SPEC2017 >> >> on aarch64. >> > You said that, yes. Even without -ffast-math? >> >> It improves at -O3 without -ffast-math in particular. With -ffast-math >> phiopt optimisation >> is more aggressive and merges the conditionals into MIN/MAX_EXPRs >> (minmax_replacement in tree-ssa-phiopt.c) > > The question is will it be slower without -ffast-math, that is, when > fmin/max() calls are emitted rather > than inline conditionals. > > I think a patch just using MAX/MIN_EXPR within the existing > constraints and otherwise falling back to > the current code would be more obvious and other changes should be > mande independently. If going to MIN_EXPR and MAX_EXPR unconditionally isn't acceptable, maybe an alternative would be to go straight to internal functions, under the usual: direct_internal_fn_supported_p (IFN_F{MIN,MAX}, type, OPTIMIZE_FOR_SPEED) condition. Thanks, Richard