From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-508367-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 63916 invoked by alias); 5 Sep 2019 09:50:07 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 63881 invoked by uid 89); 5 Sep 2019 09:50:04 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-2.3 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.1 spammy=smallest, aka, quickest, HX-Gm-Message-State:APjAAAWe
X-HELO: mail-lf1-f67.google.com
Received: from mail-lf1-f67.google.com (HELO mail-lf1-f67.google.com) (209.85.167.67) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 05 Sep 2019 09:50:02 +0000
Received: by mail-lf1-f67.google.com with SMTP id d10so1487718lfi.0        for <gcc-patches@gcc.gnu.org>; Thu, 05 Sep 2019 02:50:00 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;        d=gmail.com; s=20161025;        h=mime-version:references:in-reply-to:from:date:message-id:subject:to         :cc;        bh=+A766iutNYkhAkEFnC8E6o2hXozO1TH2YkL86BD1fzQ=;        b=X7KavmhfVbfO/htGjFfc4stPLOL5pIZ5oLRrYPywCavEZdZFjWS1wbmEJXx2WslDO7         d5HqCGPnXzBcLhO8UiEaRz0pv5W8DJuTrDfNQHrcBMrNz3FBYVIGTtmrqwZ24AUf8HXl         mzLGV21pBQGZ8OA/O3tXfdt0n2O/AjUfHZlOQhlUTiAjPfRIWx5YX40heEMDPq8kFFNl         k/mXT1o0IZh6UluOGHC5zji/dfWg9cRyvMGVFcnpcel729aIYQZ+jwpc9qxFXEku92ca         OlnYBBhGYYFHXi6mMk7hqg/J7kB0rcCWZJv19QCjIP9P6AsvgxtIZaitF0jTXfkuWAQV         kOOA==
MIME-Version: 1.0
References: <571395fe-921b-5a68-ec8d-84850a732253@arm.com> <alpine.LSU.2.20.1909031006590.32458@zhemvz.fhfr.qr> <99a19644-9826-ca59-3f8c-f5b93a0fff1a@arm.com>
In-Reply-To: <99a19644-9826-ca59-3f8c-f5b93a0fff1a@arm.com>
From: Richard Biener <richard.guenther@gmail.com>
Date: Thu, 05 Sep 2019 09:50:00 -0000
Message-ID: <CAFiYyc2m5q_j=AgDMLK18GyH_g+Do0UxQF_2kusncdqRGV2KDg@mail.gmail.com>
Subject: Re: [PATCH][GCC] Simplify to single precision where possible for binary/builtin maths operations.
To: Barnaby Wilks <Barnaby.Wilks@arm.com>
Cc: Richard Biener <rguenther@suse.de>, "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>, nd <nd@arm.com>, 	"law@redhat.com" <law@redhat.com>, "ian@airs.com" <ian@airs.com>, Tamar Christina <Tamar.Christina@arm.com>, 	Wilco Dijkstra <Wilco.Dijkstra@arm.com>, Richard Sandiford <Richard.Sandiford@arm.com>
Content-Type: text/plain; charset="UTF-8"
X-IsSubscribed: yes
X-SW-Source: 2019-09/txt/msg00258.txt.bz2

On Tue, Sep 3, 2019 at 5:23 PM Barnaby Wilks <Barnaby.Wilks@arm.com> wrote:
>
>
>
> On 9/3/19 9:23 AM, Richard Biener wrote:
> > On Mon, 2 Sep 2019, Barnaby Wilks wrote:
> >
> >> Hello,
> >>
> >> This patch introduces an optimization for narrowing binary and builtin
> >> math operations to the smallest type when unsafe math optimizations are
> >> enabled (typically -Ofast or -ffast-math).
> >>
> >> Consider the example:
> >>
> >>     float f (float x) {
> >>       return 1.0 / sqrt (x);
> >>     }
> >>
> >>     f:
> >>       fcvt   d0, s0
> >>       fmov   d1, 1.0e+0
> >>       fsqrt  d0, d0
> >>       fdiv   d0, d1, d0
> >>       fcvt   s0, d0
> >>       ret
> >>
> >> Given that all outputs are of float type, we can do the whole
> >> calculation in single precision and avoid any potentially expensive
> >> conversions between single and double precision.
> >>
> >> Aka the expression would end up looking more like
> >>
> >>     float f (float x) {
> >>       return 1.0f / sqrtf (x);
> >>     }
> >>
> >>     f:
> >>       fsqrt  s0, s0
> >>       fmov   s1, 1.0e+0
> >>       fdiv   s0, s1, s0
> >>       ret
> >>
> >> This optimization will narrow casts around math builtins, and also
> >> not try to find the widest type for calculations when processing binary
> >> math operations (if unsafe math optimizations are enable).
> >>
> >> Added tests to verify that narrower math builtins are chosen and
> >> no unnecessary casts are introduced when appropriate.
> >>
> >> Bootstrapped and regtested on aarch64 and x86_64 with no regressions.
> >>
> >> I don't have write access, so if OK for trunk then can someone commit on
> >> my behalf?
> >
> > @@ -5004,10 +5004,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >                && newtype == type
> >                && types_match (newtype, type))
> >              (op (convert:newtype @1) (convert:newtype @2))
> > -           (with { if (TYPE_PRECISION (ty1) > TYPE_PRECISION (newtype))
> > +           (with
> > +             {
> > +               if (!flag_unsafe_math_optimizations)
> > +                 {
> > +                   if (TYPE_PRECISION (ty1) > TYPE_PRECISION (newtype))
> >                        newtype = ty1;
> > +
> >                      if (TYPE_PRECISION (ty2) > TYPE_PRECISION (newtype))
> > -                     newtype = ty2; }
> > +                     newtype = ty2;
> > +                 }
> > +             }
> > +
> >                 /* Sometimes this transformation is safe (cannot
> >                    change results through affecting double rounding
> >                    cases) and sometimes it is not.  If NEWTYPE is
> >
> > The ChangeLog doesn't mention this change and I wonder what it is
> > for - later flag_unsafe_math_optimizations is checked, in particular
> >
> >                     && (flag_unsafe_math_optimizations
> >                         || (TYPE_PRECISION (newtype) == TYPE_PRECISION
> > (type)
> >                             && real_can_shorten_arithmetic (TYPE_MODE
> > (itype),
> >                                                             TYPE_MODE
> > (type))
> >                             && !excess_precision_type (newtype)))
> >
> > note the !excess_precision_type (newtype) which you fail to check
> > below.
>
> This change prevents the pattern from casting the operands to the widest
> type (the widest between the type of the operands and the type of the
> expression as a whole) if unsafe math optimizations are enabled.
> Whereas the second check of flag_unsafe_math_optimizations is a shortcut
> to enable the transformation as a whole, and does not affect the type
> that the operands are being cast to.
>
> Without the first check then the expression will always use the widest
> type, resulting in unnecessary casts. For example
>
>    float f (float x, float y) {
>      double z = 1.0 / x;
>      return z * y;
>    }
>
> Will generate (With -Ofast)
>
>    float D.3459;
>    double z;
>
>    _1 = (double) x;
>    z = 1.0e+0 / _1;
>    _2 = (double) y;
>    _3 = z * _2;
>    D.3459 = (float) _3;
>    return D.3459;
>
> Note how the parameters are cast to doubles, the whole calculation is
> done in double precision and then cast out to single precision at the
> end. (because double is the widest type in the expression)
>
> Whereas if you include the first flag_unsafe_math_optimizations check,
> and prevent the widening then you get
>
>    float D.3459;
>    double z;
>
>    _1 = (double) x;
>    z = 1.0e+0 / _1;
>    _2 = (float) z;
>    D.3459 = y * _2;
>    return D.3459;
>
> Where only "double z = 1.0 / x" happens in double precision, and the
> rest of the calculation is done in single precision, reducing the amount
> of casts.
>
> The benefits here can be seen in the generated code:
> Without the flag_unsafe_math_optimizations check
>
>          fcvt    d1, s1
>          fcvt    d0, s0
>          fdiv    d0, d1, d0
>          fcvt    s0, d0
>          ret
>
> With the check
>
>          fdiv    s0, s1, s0
>          ret

I see.  Can you please propose this change independently with a
separate testcase?

> >
> > @@ -5654,3 +5662,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >   (simplify
> >    (vec_perm vec_same_elem_p@0 @0 @1)
> >    @0)
> > +
> > +/* Convert expressions of the form
> > +   (x) math_call1 ((y) z) where (x) and z are the same type, into
> > +   math_call2 (z), where math_call2 is the math builtin for
> > +   type x.  Type x (and therefore type of z) must be a lower precision
> > +   than y/math_call1.  */
> > +(if (flag_unsafe_math_optimizations && !flag_errno_math)
> > +  (for op (COSH EXP EXP10 EXP2 EXPM1 GAMMA J0 J1 LGAMMA
> > +          POW10 SINH TGAMMA Y0 Y1 ACOS ACOSH ASIN ASINH
> > +          ATAN ATANH CBRT COS ERF ERFC LOG LOG10 LOG2
> > +          LOG1P SIN TAN TANH SQRT FABS LOGB)
> > +    (simplify
> > +      (convert (op@0 (convert@1 @2)))
> > +       (if (SCALAR_FLOAT_TYPE_P (type) && SCALAR_FLOAT_TYPE_P (TREE_TYPE
> > (@1))
> > +             && SCALAR_FLOAT_TYPE_P (TREE_TYPE (@2))
> > +             && types_match (type, TREE_TYPE (@2))
> > +             && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@1)))
> > +         (with { enum built_in_function fcode = builtin_mathfn_code (@0);
> > +                 tree fn = mathfn_built_in (type, fcode, false); }
> > +           (if (fn)
> > +             (convert { build_call_expr (fn, 1, @2); })))))))
> >
> > This (convert { build_call_expr (..) } ) only works on GENERIC.
>
> I didn't realize that build_call_expr only works on GENERIC, I took
> that snippet from convert_to_real_1 in convert.c, which does a similar
> thing, but is not very generic and wont cover all the cases.
> I suppose this doesn't matter if the transformation is being moved out
> of match.pd into the backprop pass, or if not is there a more generic
> (if you excuse the pun) way to create function call nodes for GIMPLE &
> GENERIC?

Not as a match.pd "C expression", no.  You'd have to work like I outlined below.

> > I also wonder why you needed the mathfn_built_in change.
>
> Because some builtins are not marked as implicit - or more specifically
> reserved in C90 and actually specified in C99.
> To be honest, I wasn't really happy with doing this, as I'm not sure of
> it's implications, so if you have a better way to do this then that
> would be much appreciated?
>  From what I can tell its the float versions of the math builtins that
> are not implicit, and these are the functions that are most commonly
> needed to narrow to.

The issue with "reserved" function is that if the program doesn't
contain a suitable prototype or a call to such function we have to
avoid introducing it ourselves.  I admit we don't have very good
solutions to this, but at gimplification time we do

      /* If we see a call to a declared builtin or see its address
         being taken (we can unify those cases here) then we can mark
         the builtin for implicit generation by GCC.  */
      if (TREE_CODE (op0) == FUNCTION_DECL
          && fndecl_built_in_p (op0, BUILT_IN_NORMAL)
          && builtin_decl_declared_p (DECL_FUNCTION_CODE (op0)))
        set_builtin_decl_implicit_p (DECL_FUNCTION_CODE (op0), true);

so for testcases you have to make sure to provide appropriate declarations.

> > If you look at other examples in match.pd you'd see you should have
> > used sth like
> >
> >   (for op (BUILT_IN_COSH BUILT_IN_EXP ...)
> >        opf (BUILT_IN_COSHF BUILT_IN_EXPF ...)
> >     (simplify
> > ...
> >        (if (types_match (type, float_type_node))
> >          (opf @2)))
> >
> > and you have to repeat this for the COSHL (long double) case
> > with appropriate opd and opf lists.  In theory, if we'd extend
> > genmatch to 'transform' builtin function kinds that could be
> > done prettier like for example with
>
> Why would this need to be the case? The code already does practially the
> same thing by matching on all the builtins and then transforming down to
> the narrowest type with the builtin_mathfn_code/mathfn_built_in combination.

Because of the issue that you have to create the calls appropriately

> >   (for op (COSH EXP ...)
> >    (simplify
> > ...
> >     (op:type @2))
> >
> > which I'd kind-of like.  Note it's not as simple as passing
> > 'type' to mathfn_built_in since that expects literal
> > double_type_node and friends but we could use a {gimple,generic}-match.c
> > private helper for that.
>
> Would "type" not be a double_type_node (or related) literal already?
> If mathfn_built_in does not recognise the given type then it will just
> spit out NULL_TREE, which is checked by:
>
>    +                 tree fn = mathfn_built_in (type, fcode, false); }
>    +           (if (fn)
>    +             (convert { build_call_expr (fn, 1, @2); })))))))
>
>
> Apologies if any of these seem obvious questions - I'm quite new to GCC
> internals.

I was just talking about how to actually extend genmatch to recognize
the (op:type ...) for builtin function calls, automagically turning
sin to sinf if type is a single-precision FP type and said that simply
using mathfn_built_in with 'type' likely will end up with NULL_TREE
as result more often than necessary, so we have to either enhance
mathfn_built_in (maybe compare modes?) map to the canonical
types mathfn_built_in checks for (you'd have to consider systems
where double is equal to long double but they are still different
types).

Richard.

> Regards,
> Barney
>
> > Now - as a general comment I think adding this kind of narrowing is
> > good but doing it via match.pd patterns is quite limiting - eventually
> > the backprop pass would be a fit for propagating "needed precision"
> > and narrowing feeding stmts accordingly in a more general way?
> > Richard can probably tell quickest if it is feasible in that framework.
> >
> > Thanks,
> > Richard.
> >
> >
> >> Regards,
> >> Barney
> >>
> >> gcc/ChangeLog:
> >>
> >> 2019-09-02  Barnaby Wilks  <barnaby.wilks@arm.com>
> >>
> >>      * builtins.c (mathfn_built_in): Expose find implicit builtin parameter.
> >>      * builtins.h (mathfn_built_in): Likewise.
> >>      * match.pd: Add expressions for simplifying builtin and binary
> >>      math expressions.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> 2019-09-02  Barnaby Wilks  <barnaby.wilks@arm.com>
> >>
> >>      * gcc.dg/fold-single-precision.c: New test.
> >>
> >