From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-481792-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 68525 invoked by alias); 18 Jul 2018 10:06:31 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 68427 invoked by uid 89); 18 Jul 2018 10:06:31 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-2.4 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 spammy=falling, headaches, H*f:CAFiYyc2ECcDg, H*f:sk:25fmb9Y
X-HELO: mail-lj1-f195.google.com
Received: from mail-lj1-f195.google.com (HELO mail-lj1-f195.google.com) (209.85.208.195) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 18 Jul 2018 10:06:29 +0000
Received: by mail-lj1-f195.google.com with SMTP id j19-v6so3576807ljc.7;        Wed, 18 Jul 2018 03:06:28 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;        d=gmail.com; s=20161025;        h=mime-version:references:in-reply-to:from:date:message-id:subject:to         :cc;        bh=hCBc4/AXV4HJ5DSXuoZokv85Q4eBs8Q6lM02D5UHS6E=;        b=tZAKsjCVeNb43JivhP9XQz7XFIFswg3c7fXUpdWvSYCsDIFLG2z05FMpyYCvrZcrm7         tZRJqzulquzEDRFuQ/g/i3poo3rJPI3meMKqPeNPgieza66Xik075jK0e1FcVhJq81Ku         b2JkxpEuT/iqvsq1z47JBLGM/Dson3lVmFnrXCUqliWOcntqyn/yGPf/AufI5RBYs0Fc         jaNjRQsJ1afZLk/gPd8zdsCVsUjSdq7bG5hpAbA31VGOdetLmxwieS51d46xyb7tDlX0         V/yKKkTtdvB12oibBBY1DNSj1FLrX4499fDbKaSlp3ow4ct6VT16XXaN5PV1CyFvTs/H         sCoQ==
MIME-Version: 1.0
References: <5B4DE283.9060100@foss.arm.com> <CAFiYyc2F_H1bSCQg+caLQr8WnqExtkAVyAhaQMky_HbZCC=5hQ@mail.gmail.com> <5B4DF325.2050609@foss.arm.com> <CAFiYyc2ECcDg+25fmb9Yr8y0-6F2kQ3FKLrsm-jyw0LPvyqQ=Q@mail.gmail.com> <5B4F0D5A.2030401@foss.arm.com>
In-Reply-To: <5B4F0D5A.2030401@foss.arm.com>
From: Richard Biener <richard.guenther@gmail.com>
Date: Wed, 18 Jul 2018 10:06:00 -0000
Message-ID: <CAFiYyc2KjSqJjys51QnCuPnFYOv7e3=J2i4r3LA74Xj062utEg@mail.gmail.com>
Subject: Re: [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate
To: kyrylo.tkachov@foss.arm.com
Cc: "fortran@gcc.gnu.org" <fortran@gcc.gnu.org>, GCC Patches <gcc-patches@gcc.gnu.org>
Content-Type: text/plain; charset="UTF-8"
X-IsSubscribed: yes
X-SW-Source: 2018-07/txt/msg00989.txt.bz2

On Wed, Jul 18, 2018 at 11:50 AM Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
>
>
> On 18/07/18 10:44, Richard Biener wrote:
> > On Tue, Jul 17, 2018 at 3:46 PM Kyrill Tkachov
> > <kyrylo.tkachov@foss.arm.com> wrote:
> >> Hi Richard,
> >>
> >> On 17/07/18 14:27, Richard Biener wrote:
> >>> On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov
> >>> <kyrylo.tkachov@foss.arm.com> wrote:
> >>>> Hi all,
> >>>>
> >>>> This is my first Fortran patch, so apologies if I'm missing something.
> >>>> The current expansion of the min and max intrinsics explicitly expands
> >>>> the comparisons between each argument to calculate the global min/max.
> >>>> Some targets, like aarch64, have instructions that can calculate the min/max
> >>>> of two real (floating-point) numbers with the proper NaN-handling semantics
> >>>> (if both inputs are NaN, return Nan. If one is NaN, return the other) and those
> >>>> are the semantics provided by the __builtin_fmin/max family of functions that expand
> >>>> to these instructions.
> >>>>
> >>>> This patch makes the frontend emit __builtin_fmin/max directly to compare each
> >>>> pair of numbers when the numbers are floating-point, and use MIN_EXPR/MAX_EXPR otherwise
> >>>> (integral types and -ffast-math) which should hopefully be easier to recognise in the
> >>> What is Fortrans requirement on min/max intrinsics?  Doesn't it only
> >>> require things that
> >>> are guaranteed by MIN/MAX_EXPR anyways?  The only restriction here is
> >> The current implementation expands to:
> >>       mvar = a1;
> >>       if (a2 .op. mvar || isnan (mvar))
> >>         mvar = a2;
> >>       if (a3 .op. mvar || isnan (mvar))
> >>         mvar = a3;
> >>       ...
> >>       return mvar;
> >>
> >> That is, if one of the operands is a NaN it will return the other argument.
> >> If both (all) are NaNs, it will return NaN. This is the same as the semantics of fmin/max
> >> as far as I can tell.
> >>
> >>> /* Minimum and maximum values.  When used with floating point, if both
> >>>      operands are zeros, or if either operand is NaN, then it is unspecified
> >>>      which of the two operands is returned as the result.  */
> >>>
> >>> which means MIN/MAX_EXPR are not strictly IEEE compliant with signed
> >>> zeros or NaNs.
> >>> Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS if singed
> >>> zeros are significant.
> >> True, MIN/MAX_EXPR would not be appropriate in that condition. I guarded their use
> >> on !HONOR_NANS (type) only. I'll update it to !HONOR_SIGNED_ZEROS (type) && !HONOR_NANS (type).
> >>
> >>
> >>> I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR
> >>> is a good idea,
> >>> this may both generate bigger code and be slower.
> >> The patch will generate fmin/fmax calls (or the fminf,fminl variants) when mathfn_built_in advertises
> >> them as available (does that mean they'll have a fast inline implementation?)
> > This doesn't mean anything given you make them available with your
> > patch ;)  So I expect it may
> > cause issues for !c99_runtime targets (and long double at least).
>
> Urgh, that can cause headaches...
>
> >> If the above doesn't hold and we can't use either MIN/MAX_EXPR of fmin/fmax then the patch falls back
> >> to the existing expansion.
> > As said I would not use fmin/fmax calls here at all.
>
> ... Given the comments from Thomas and Janne, maybe we should just emit MIN/MAX_EXPRs here
> since there is no language requirement on NaN/signed zero handling on these intrinsics?
> That should make it simpler and more portable.

That's fortran maintainers call.

> >> FWIW, this patch does improve performance on 521.wrf from SPEC2017 on aarch64.
> > You said that, yes.  Even without -ffast-math?
>
> It improves at -O3 without -ffast-math in particular. With -ffast-math phiopt optimisation
> is more aggressive and merges the conditionals into MIN/MAX_EXPRs (minmax_replacement in tree-ssa-phiopt.c)

The question is will it be slower without -ffast-math, that is, when
fmin/max() calls are emitted rather
than inline conditionals.

I think a patch just using MAX/MIN_EXPR within the existing
constraints and otherwise falling back to
the current code would be more obvious and other changes should be
mande independently.

Richard.

> Thanks,
> Kyrill
>
> > Richard.
> >
> >> Thanks,
> >> Kyrill
> >>
> >>> Richard.
> >>>
> >>>> midend and optimise. The previous approach of generating the open-coded version of that
> >>>> is used when we don't have an appropriate __builtin_fmin/max available.
> >>>> For example, for a configuration of x86_64-unknown-linux-gnu that I tested there was no
> >>>> 128-bit __built_fminl available.
> >>>>
> >>>> With this patch I'm seeing more than 7000 FMINNM/FMAXNM instructions being generated at -O3
> >>>> on aarch64 for 521.wrf from fprate SPEC2017 where none before were generated
> >>>> (we were generating explicit comparisons and NaN checks). This gave a 2.4% improvement
> >>>> in performance on a Cortex-A72.
> >>>>
> >>>> Bootstrapped and tested on aarch64-none-linux-gnu and x86_64-unknown-linux-gnu.
> >>>>
> >>>> Ok for trunk?
> >>>> Thanks,
> >>>> Kyrill
> >>>>
> >>>> 2018-07-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> >>>>
> >>>>        * f95-lang.c (gfc_init_builtin_functions): Define __builtin_fmin,
> >>>>        __builtin_fminf, __builtin_fminl, __builtin_fmax, __builtin_fmaxf,
> >>>>        __builtin_fmaxl.
> >>>>        * trans-intrinsic.c: Include builtins.h.
> >>>>        (gfc_conv_intrinsic_minmax): Emit __builtin_fmin/max or MIN/MAX_EXPR
> >>>>        functions to calculate the min/max.
> >>>>
> >>>> 2018-07-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> >>>>
> >>>>        * gfortran.dg/max_fmaxf.f90: New test.
> >>>>        * gfortran.dg/min_fminf.f90: Likewise.
> >>>>        * gfortran.dg/minmax_integer.f90: Likewise.
> >>>>        * gfortran.dg/max_fmaxl_aarch64.f90: Likewise.
> >>>>        * gfortran.dg/min_fminl_aarch64.f90: Likewise.
>