From: Richard Biener <richard.guenther@gmail.com>
To: Barnaby Wilks <Barnaby.Wilks@arm.com>
Cc: Richard Biener <rguenther@suse.de>,
"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
nd <nd@arm.com>, "law@redhat.com" <law@redhat.com>,
"ian@airs.com" <ian@airs.com>,
Tamar Christina <Tamar.Christina@arm.com>,
Wilco Dijkstra <Wilco.Dijkstra@arm.com>,
Richard Sandiford <Richard.Sandiford@arm.com>
Subject: Re: [PATCH][GCC] Simplify to single precision where possible for binary/builtin maths operations.
Date: Thu, 05 Sep 2019 09:50:00 -0000 [thread overview]
Message-ID: <CAFiYyc2m5q_j=AgDMLK18GyH_g+Do0UxQF_2kusncdqRGV2KDg@mail.gmail.com> (raw)
In-Reply-To: <99a19644-9826-ca59-3f8c-f5b93a0fff1a@arm.com>
On Tue, Sep 3, 2019 at 5:23 PM Barnaby Wilks <Barnaby.Wilks@arm.com> wrote:
>
>
>
> On 9/3/19 9:23 AM, Richard Biener wrote:
> > On Mon, 2 Sep 2019, Barnaby Wilks wrote:
> >
> >> Hello,
> >>
> >> This patch introduces an optimization for narrowing binary and builtin
> >> math operations to the smallest type when unsafe math optimizations are
> >> enabled (typically -Ofast or -ffast-math).
> >>
> >> Consider the example:
> >>
> >> float f (float x) {
> >> return 1.0 / sqrt (x);
> >> }
> >>
> >> f:
> >> fcvt d0, s0
> >> fmov d1, 1.0e+0
> >> fsqrt d0, d0
> >> fdiv d0, d1, d0
> >> fcvt s0, d0
> >> ret
> >>
> >> Given that all outputs are of float type, we can do the whole
> >> calculation in single precision and avoid any potentially expensive
> >> conversions between single and double precision.
> >>
> >> Aka the expression would end up looking more like
> >>
> >> float f (float x) {
> >> return 1.0f / sqrtf (x);
> >> }
> >>
> >> f:
> >> fsqrt s0, s0
> >> fmov s1, 1.0e+0
> >> fdiv s0, s1, s0
> >> ret
> >>
> >> This optimization will narrow casts around math builtins, and also
> >> not try to find the widest type for calculations when processing binary
> >> math operations (if unsafe math optimizations are enable).
> >>
> >> Added tests to verify that narrower math builtins are chosen and
> >> no unnecessary casts are introduced when appropriate.
> >>
> >> Bootstrapped and regtested on aarch64 and x86_64 with no regressions.
> >>
> >> I don't have write access, so if OK for trunk then can someone commit on
> >> my behalf?
> >
> > @@ -5004,10 +5004,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > && newtype == type
> > && types_match (newtype, type))
> > (op (convert:newtype @1) (convert:newtype @2))
> > - (with { if (TYPE_PRECISION (ty1) > TYPE_PRECISION (newtype))
> > + (with
> > + {
> > + if (!flag_unsafe_math_optimizations)
> > + {
> > + if (TYPE_PRECISION (ty1) > TYPE_PRECISION (newtype))
> > newtype = ty1;
> > +
> > if (TYPE_PRECISION (ty2) > TYPE_PRECISION (newtype))
> > - newtype = ty2; }
> > + newtype = ty2;
> > + }
> > + }
> > +
> > /* Sometimes this transformation is safe (cannot
> > change results through affecting double rounding
> > cases) and sometimes it is not. If NEWTYPE is
> >
> > The ChangeLog doesn't mention this change and I wonder what it is
> > for - later flag_unsafe_math_optimizations is checked, in particular
> >
> > && (flag_unsafe_math_optimizations
> > || (TYPE_PRECISION (newtype) == TYPE_PRECISION
> > (type)
> > && real_can_shorten_arithmetic (TYPE_MODE
> > (itype),
> > TYPE_MODE
> > (type))
> > && !excess_precision_type (newtype)))
> >
> > note the !excess_precision_type (newtype) which you fail to check
> > below.
>
> This change prevents the pattern from casting the operands to the widest
> type (the widest between the type of the operands and the type of the
> expression as a whole) if unsafe math optimizations are enabled.
> Whereas the second check of flag_unsafe_math_optimizations is a shortcut
> to enable the transformation as a whole, and does not affect the type
> that the operands are being cast to.
>
> Without the first check then the expression will always use the widest
> type, resulting in unnecessary casts. For example
>
> float f (float x, float y) {
> double z = 1.0 / x;
> return z * y;
> }
>
> Will generate (With -Ofast)
>
> float D.3459;
> double z;
>
> _1 = (double) x;
> z = 1.0e+0 / _1;
> _2 = (double) y;
> _3 = z * _2;
> D.3459 = (float) _3;
> return D.3459;
>
> Note how the parameters are cast to doubles, the whole calculation is
> done in double precision and then cast out to single precision at the
> end. (because double is the widest type in the expression)
>
> Whereas if you include the first flag_unsafe_math_optimizations check,
> and prevent the widening then you get
>
> float D.3459;
> double z;
>
> _1 = (double) x;
> z = 1.0e+0 / _1;
> _2 = (float) z;
> D.3459 = y * _2;
> return D.3459;
>
> Where only "double z = 1.0 / x" happens in double precision, and the
> rest of the calculation is done in single precision, reducing the amount
> of casts.
>
> The benefits here can be seen in the generated code:
> Without the flag_unsafe_math_optimizations check
>
> fcvt d1, s1
> fcvt d0, s0
> fdiv d0, d1, d0
> fcvt s0, d0
> ret
>
> With the check
>
> fdiv s0, s1, s0
> ret
I see. Can you please propose this change independently with a
separate testcase?
> >
> > @@ -5654,3 +5662,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > (simplify
> > (vec_perm vec_same_elem_p@0 @0 @1)
> > @0)
> > +
> > +/* Convert expressions of the form
> > + (x) math_call1 ((y) z) where (x) and z are the same type, into
> > + math_call2 (z), where math_call2 is the math builtin for
> > + type x. Type x (and therefore type of z) must be a lower precision
> > + than y/math_call1. */
> > +(if (flag_unsafe_math_optimizations && !flag_errno_math)
> > + (for op (COSH EXP EXP10 EXP2 EXPM1 GAMMA J0 J1 LGAMMA
> > + POW10 SINH TGAMMA Y0 Y1 ACOS ACOSH ASIN ASINH
> > + ATAN ATANH CBRT COS ERF ERFC LOG LOG10 LOG2
> > + LOG1P SIN TAN TANH SQRT FABS LOGB)
> > + (simplify
> > + (convert (op@0 (convert@1 @2)))
> > + (if (SCALAR_FLOAT_TYPE_P (type) && SCALAR_FLOAT_TYPE_P (TREE_TYPE
> > (@1))
> > + && SCALAR_FLOAT_TYPE_P (TREE_TYPE (@2))
> > + && types_match (type, TREE_TYPE (@2))
> > + && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@1)))
> > + (with { enum built_in_function fcode = builtin_mathfn_code (@0);
> > + tree fn = mathfn_built_in (type, fcode, false); }
> > + (if (fn)
> > + (convert { build_call_expr (fn, 1, @2); })))))))
> >
> > This (convert { build_call_expr (..) } ) only works on GENERIC.
>
> I didn't realize that build_call_expr only works on GENERIC, I took
> that snippet from convert_to_real_1 in convert.c, which does a similar
> thing, but is not very generic and wont cover all the cases.
> I suppose this doesn't matter if the transformation is being moved out
> of match.pd into the backprop pass, or if not is there a more generic
> (if you excuse the pun) way to create function call nodes for GIMPLE &
> GENERIC?
Not as a match.pd "C expression", no. You'd have to work like I outlined below.
> > I also wonder why you needed the mathfn_built_in change.
>
> Because some builtins are not marked as implicit - or more specifically
> reserved in C90 and actually specified in C99.
> To be honest, I wasn't really happy with doing this, as I'm not sure of
> it's implications, so if you have a better way to do this then that
> would be much appreciated?
> From what I can tell its the float versions of the math builtins that
> are not implicit, and these are the functions that are most commonly
> needed to narrow to.
The issue with "reserved" function is that if the program doesn't
contain a suitable prototype or a call to such function we have to
avoid introducing it ourselves. I admit we don't have very good
solutions to this, but at gimplification time we do
/* If we see a call to a declared builtin or see its address
being taken (we can unify those cases here) then we can mark
the builtin for implicit generation by GCC. */
if (TREE_CODE (op0) == FUNCTION_DECL
&& fndecl_built_in_p (op0, BUILT_IN_NORMAL)
&& builtin_decl_declared_p (DECL_FUNCTION_CODE (op0)))
set_builtin_decl_implicit_p (DECL_FUNCTION_CODE (op0), true);
so for testcases you have to make sure to provide appropriate declarations.
> > If you look at other examples in match.pd you'd see you should have
> > used sth like
> >
> > (for op (BUILT_IN_COSH BUILT_IN_EXP ...)
> > opf (BUILT_IN_COSHF BUILT_IN_EXPF ...)
> > (simplify
> > ...
> > (if (types_match (type, float_type_node))
> > (opf @2)))
> >
> > and you have to repeat this for the COSHL (long double) case
> > with appropriate opd and opf lists. In theory, if we'd extend
> > genmatch to 'transform' builtin function kinds that could be
> > done prettier like for example with
>
> Why would this need to be the case? The code already does practially the
> same thing by matching on all the builtins and then transforming down to
> the narrowest type with the builtin_mathfn_code/mathfn_built_in combination.
Because of the issue that you have to create the calls appropriately
> > (for op (COSH EXP ...)
> > (simplify
> > ...
> > (op:type @2))
> >
> > which I'd kind-of like. Note it's not as simple as passing
> > 'type' to mathfn_built_in since that expects literal
> > double_type_node and friends but we could use a {gimple,generic}-match.c
> > private helper for that.
>
> Would "type" not be a double_type_node (or related) literal already?
> If mathfn_built_in does not recognise the given type then it will just
> spit out NULL_TREE, which is checked by:
>
> + tree fn = mathfn_built_in (type, fcode, false); }
> + (if (fn)
> + (convert { build_call_expr (fn, 1, @2); })))))))
>
>
> Apologies if any of these seem obvious questions - I'm quite new to GCC
> internals.
I was just talking about how to actually extend genmatch to recognize
the (op:type ...) for builtin function calls, automagically turning
sin to sinf if type is a single-precision FP type and said that simply
using mathfn_built_in with 'type' likely will end up with NULL_TREE
as result more often than necessary, so we have to either enhance
mathfn_built_in (maybe compare modes?) map to the canonical
types mathfn_built_in checks for (you'd have to consider systems
where double is equal to long double but they are still different
types).
Richard.
> Regards,
> Barney
>
> > Now - as a general comment I think adding this kind of narrowing is
> > good but doing it via match.pd patterns is quite limiting - eventually
> > the backprop pass would be a fit for propagating "needed precision"
> > and narrowing feeding stmts accordingly in a more general way?
> > Richard can probably tell quickest if it is feasible in that framework.
> >
> > Thanks,
> > Richard.
> >
> >
> >> Regards,
> >> Barney
> >>
> >> gcc/ChangeLog:
> >>
> >> 2019-09-02 Barnaby Wilks <barnaby.wilks@arm.com>
> >>
> >> * builtins.c (mathfn_built_in): Expose find implicit builtin parameter.
> >> * builtins.h (mathfn_built_in): Likewise.
> >> * match.pd: Add expressions for simplifying builtin and binary
> >> math expressions.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> 2019-09-02 Barnaby Wilks <barnaby.wilks@arm.com>
> >>
> >> * gcc.dg/fold-single-precision.c: New test.
> >>
> >
prev parent reply other threads:[~2019-09-05 9:50 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-09-02 17:29 Barnaby Wilks
2019-09-03 8:23 ` Richard Biener
2019-09-03 14:19 ` Richard Sandiford
2019-09-03 15:23 ` Barnaby Wilks
2019-09-05 9:50 ` Richard Biener [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAFiYyc2m5q_j=AgDMLK18GyH_g+Do0UxQF_2kusncdqRGV2KDg@mail.gmail.com' \
--to=richard.guenther@gmail.com \
--cc=Barnaby.Wilks@arm.com \
--cc=Richard.Sandiford@arm.com \
--cc=Tamar.Christina@arm.com \
--cc=Wilco.Dijkstra@arm.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=ian@airs.com \
--cc=law@redhat.com \
--cc=nd@arm.com \
--cc=rguenther@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).