From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 63916 invoked by alias); 5 Sep 2019 09:50:07 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 63881 invoked by uid 89); 5 Sep 2019 09:50:04 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-2.3 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.1 spammy=smallest, aka, quickest, HX-Gm-Message-State:APjAAAWe X-HELO: mail-lf1-f67.google.com Received: from mail-lf1-f67.google.com (HELO mail-lf1-f67.google.com) (209.85.167.67) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 05 Sep 2019 09:50:02 +0000 Received: by mail-lf1-f67.google.com with SMTP id d10so1487718lfi.0 for ; Thu, 05 Sep 2019 02:50:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=+A766iutNYkhAkEFnC8E6o2hXozO1TH2YkL86BD1fzQ=; b=X7KavmhfVbfO/htGjFfc4stPLOL5pIZ5oLRrYPywCavEZdZFjWS1wbmEJXx2WslDO7 d5HqCGPnXzBcLhO8UiEaRz0pv5W8DJuTrDfNQHrcBMrNz3FBYVIGTtmrqwZ24AUf8HXl mzLGV21pBQGZ8OA/O3tXfdt0n2O/AjUfHZlOQhlUTiAjPfRIWx5YX40heEMDPq8kFFNl k/mXT1o0IZh6UluOGHC5zji/dfWg9cRyvMGVFcnpcel729aIYQZ+jwpc9qxFXEku92ca OlnYBBhGYYFHXi6mMk7hqg/J7kB0rcCWZJv19QCjIP9P6AsvgxtIZaitF0jTXfkuWAQV kOOA== MIME-Version: 1.0 References: <571395fe-921b-5a68-ec8d-84850a732253@arm.com> <99a19644-9826-ca59-3f8c-f5b93a0fff1a@arm.com> In-Reply-To: <99a19644-9826-ca59-3f8c-f5b93a0fff1a@arm.com> From: Richard Biener Date: Thu, 05 Sep 2019 09:50:00 -0000 Message-ID: Subject: Re: [PATCH][GCC] Simplify to single precision where possible for binary/builtin maths operations. To: Barnaby Wilks Cc: Richard Biener , "gcc-patches@gcc.gnu.org" , nd , "law@redhat.com" , "ian@airs.com" , Tamar Christina , Wilco Dijkstra , Richard Sandiford Content-Type: text/plain; charset="UTF-8" X-IsSubscribed: yes X-SW-Source: 2019-09/txt/msg00258.txt.bz2 On Tue, Sep 3, 2019 at 5:23 PM Barnaby Wilks wrote: > > > > On 9/3/19 9:23 AM, Richard Biener wrote: > > On Mon, 2 Sep 2019, Barnaby Wilks wrote: > > > >> Hello, > >> > >> This patch introduces an optimization for narrowing binary and builtin > >> math operations to the smallest type when unsafe math optimizations are > >> enabled (typically -Ofast or -ffast-math). > >> > >> Consider the example: > >> > >> float f (float x) { > >> return 1.0 / sqrt (x); > >> } > >> > >> f: > >> fcvt d0, s0 > >> fmov d1, 1.0e+0 > >> fsqrt d0, d0 > >> fdiv d0, d1, d0 > >> fcvt s0, d0 > >> ret > >> > >> Given that all outputs are of float type, we can do the whole > >> calculation in single precision and avoid any potentially expensive > >> conversions between single and double precision. > >> > >> Aka the expression would end up looking more like > >> > >> float f (float x) { > >> return 1.0f / sqrtf (x); > >> } > >> > >> f: > >> fsqrt s0, s0 > >> fmov s1, 1.0e+0 > >> fdiv s0, s1, s0 > >> ret > >> > >> This optimization will narrow casts around math builtins, and also > >> not try to find the widest type for calculations when processing binary > >> math operations (if unsafe math optimizations are enable). > >> > >> Added tests to verify that narrower math builtins are chosen and > >> no unnecessary casts are introduced when appropriate. > >> > >> Bootstrapped and regtested on aarch64 and x86_64 with no regressions. > >> > >> I don't have write access, so if OK for trunk then can someone commit on > >> my behalf? > > > > @@ -5004,10 +5004,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > > && newtype == type > > && types_match (newtype, type)) > > (op (convert:newtype @1) (convert:newtype @2)) > > - (with { if (TYPE_PRECISION (ty1) > TYPE_PRECISION (newtype)) > > + (with > > + { > > + if (!flag_unsafe_math_optimizations) > > + { > > + if (TYPE_PRECISION (ty1) > TYPE_PRECISION (newtype)) > > newtype = ty1; > > + > > if (TYPE_PRECISION (ty2) > TYPE_PRECISION (newtype)) > > - newtype = ty2; } > > + newtype = ty2; > > + } > > + } > > + > > /* Sometimes this transformation is safe (cannot > > change results through affecting double rounding > > cases) and sometimes it is not. If NEWTYPE is > > > > The ChangeLog doesn't mention this change and I wonder what it is > > for - later flag_unsafe_math_optimizations is checked, in particular > > > > && (flag_unsafe_math_optimizations > > || (TYPE_PRECISION (newtype) == TYPE_PRECISION > > (type) > > && real_can_shorten_arithmetic (TYPE_MODE > > (itype), > > TYPE_MODE > > (type)) > > && !excess_precision_type (newtype))) > > > > note the !excess_precision_type (newtype) which you fail to check > > below. > > This change prevents the pattern from casting the operands to the widest > type (the widest between the type of the operands and the type of the > expression as a whole) if unsafe math optimizations are enabled. > Whereas the second check of flag_unsafe_math_optimizations is a shortcut > to enable the transformation as a whole, and does not affect the type > that the operands are being cast to. > > Without the first check then the expression will always use the widest > type, resulting in unnecessary casts. For example > > float f (float x, float y) { > double z = 1.0 / x; > return z * y; > } > > Will generate (With -Ofast) > > float D.3459; > double z; > > _1 = (double) x; > z = 1.0e+0 / _1; > _2 = (double) y; > _3 = z * _2; > D.3459 = (float) _3; > return D.3459; > > Note how the parameters are cast to doubles, the whole calculation is > done in double precision and then cast out to single precision at the > end. (because double is the widest type in the expression) > > Whereas if you include the first flag_unsafe_math_optimizations check, > and prevent the widening then you get > > float D.3459; > double z; > > _1 = (double) x; > z = 1.0e+0 / _1; > _2 = (float) z; > D.3459 = y * _2; > return D.3459; > > Where only "double z = 1.0 / x" happens in double precision, and the > rest of the calculation is done in single precision, reducing the amount > of casts. > > The benefits here can be seen in the generated code: > Without the flag_unsafe_math_optimizations check > > fcvt d1, s1 > fcvt d0, s0 > fdiv d0, d1, d0 > fcvt s0, d0 > ret > > With the check > > fdiv s0, s1, s0 > ret I see. Can you please propose this change independently with a separate testcase? > > > > @@ -5654,3 +5662,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > > (simplify > > (vec_perm vec_same_elem_p@0 @0 @1) > > @0) > > + > > +/* Convert expressions of the form > > + (x) math_call1 ((y) z) where (x) and z are the same type, into > > + math_call2 (z), where math_call2 is the math builtin for > > + type x. Type x (and therefore type of z) must be a lower precision > > + than y/math_call1. */ > > +(if (flag_unsafe_math_optimizations && !flag_errno_math) > > + (for op (COSH EXP EXP10 EXP2 EXPM1 GAMMA J0 J1 LGAMMA > > + POW10 SINH TGAMMA Y0 Y1 ACOS ACOSH ASIN ASINH > > + ATAN ATANH CBRT COS ERF ERFC LOG LOG10 LOG2 > > + LOG1P SIN TAN TANH SQRT FABS LOGB) > > + (simplify > > + (convert (op@0 (convert@1 @2))) > > + (if (SCALAR_FLOAT_TYPE_P (type) && SCALAR_FLOAT_TYPE_P (TREE_TYPE > > (@1)) > > + && SCALAR_FLOAT_TYPE_P (TREE_TYPE (@2)) > > + && types_match (type, TREE_TYPE (@2)) > > + && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@1))) > > + (with { enum built_in_function fcode = builtin_mathfn_code (@0); > > + tree fn = mathfn_built_in (type, fcode, false); } > > + (if (fn) > > + (convert { build_call_expr (fn, 1, @2); }))))))) > > > > This (convert { build_call_expr (..) } ) only works on GENERIC. > > I didn't realize that build_call_expr only works on GENERIC, I took > that snippet from convert_to_real_1 in convert.c, which does a similar > thing, but is not very generic and wont cover all the cases. > I suppose this doesn't matter if the transformation is being moved out > of match.pd into the backprop pass, or if not is there a more generic > (if you excuse the pun) way to create function call nodes for GIMPLE & > GENERIC? Not as a match.pd "C expression", no. You'd have to work like I outlined below. > > I also wonder why you needed the mathfn_built_in change. > > Because some builtins are not marked as implicit - or more specifically > reserved in C90 and actually specified in C99. > To be honest, I wasn't really happy with doing this, as I'm not sure of > it's implications, so if you have a better way to do this then that > would be much appreciated? > From what I can tell its the float versions of the math builtins that > are not implicit, and these are the functions that are most commonly > needed to narrow to. The issue with "reserved" function is that if the program doesn't contain a suitable prototype or a call to such function we have to avoid introducing it ourselves. I admit we don't have very good solutions to this, but at gimplification time we do /* If we see a call to a declared builtin or see its address being taken (we can unify those cases here) then we can mark the builtin for implicit generation by GCC. */ if (TREE_CODE (op0) == FUNCTION_DECL && fndecl_built_in_p (op0, BUILT_IN_NORMAL) && builtin_decl_declared_p (DECL_FUNCTION_CODE (op0))) set_builtin_decl_implicit_p (DECL_FUNCTION_CODE (op0), true); so for testcases you have to make sure to provide appropriate declarations. > > If you look at other examples in match.pd you'd see you should have > > used sth like > > > > (for op (BUILT_IN_COSH BUILT_IN_EXP ...) > > opf (BUILT_IN_COSHF BUILT_IN_EXPF ...) > > (simplify > > ... > > (if (types_match (type, float_type_node)) > > (opf @2))) > > > > and you have to repeat this for the COSHL (long double) case > > with appropriate opd and opf lists. In theory, if we'd extend > > genmatch to 'transform' builtin function kinds that could be > > done prettier like for example with > > Why would this need to be the case? The code already does practially the > same thing by matching on all the builtins and then transforming down to > the narrowest type with the builtin_mathfn_code/mathfn_built_in combination. Because of the issue that you have to create the calls appropriately > > (for op (COSH EXP ...) > > (simplify > > ... > > (op:type @2)) > > > > which I'd kind-of like. Note it's not as simple as passing > > 'type' to mathfn_built_in since that expects literal > > double_type_node and friends but we could use a {gimple,generic}-match.c > > private helper for that. > > Would "type" not be a double_type_node (or related) literal already? > If mathfn_built_in does not recognise the given type then it will just > spit out NULL_TREE, which is checked by: > > + tree fn = mathfn_built_in (type, fcode, false); } > + (if (fn) > + (convert { build_call_expr (fn, 1, @2); }))))))) > > > Apologies if any of these seem obvious questions - I'm quite new to GCC > internals. I was just talking about how to actually extend genmatch to recognize the (op:type ...) for builtin function calls, automagically turning sin to sinf if type is a single-precision FP type and said that simply using mathfn_built_in with 'type' likely will end up with NULL_TREE as result more often than necessary, so we have to either enhance mathfn_built_in (maybe compare modes?) map to the canonical types mathfn_built_in checks for (you'd have to consider systems where double is equal to long double but they are still different types). Richard. > Regards, > Barney > > > Now - as a general comment I think adding this kind of narrowing is > > good but doing it via match.pd patterns is quite limiting - eventually > > the backprop pass would be a fit for propagating "needed precision" > > and narrowing feeding stmts accordingly in a more general way? > > Richard can probably tell quickest if it is feasible in that framework. > > > > Thanks, > > Richard. > > > > > >> Regards, > >> Barney > >> > >> gcc/ChangeLog: > >> > >> 2019-09-02 Barnaby Wilks > >> > >> * builtins.c (mathfn_built_in): Expose find implicit builtin parameter. > >> * builtins.h (mathfn_built_in): Likewise. > >> * match.pd: Add expressions for simplifying builtin and binary > >> math expressions. > >> > >> gcc/testsuite/ChangeLog: > >> > >> 2019-09-02 Barnaby Wilks > >> > >> * gcc.dg/fold-single-precision.c: New test. > >> > >