* [PATCH][GCC] Simplify to single precision where possible for binary/builtin maths operations. @ 2019-09-02 17:29 Barnaby Wilks 2019-09-03 8:23 ` Richard Biener 0 siblings, 1 reply; 5+ messages in thread From: Barnaby Wilks @ 2019-09-02 17:29 UTC (permalink / raw) To: gcc-patches; +Cc: nd, law, ian, rguenther, Tamar Christina, Wilco Dijkstra [-- Attachment #1: Type: text/plain, Size: 1702 bytes --] Hello, This patch introduces an optimization for narrowing binary and builtin math operations to the smallest type when unsafe math optimizations are enabled (typically -Ofast or -ffast-math). Consider the example: float f (float x) { return 1.0 / sqrt (x); } f: fcvt d0, s0 fmov d1, 1.0e+0 fsqrt d0, d0 fdiv d0, d1, d0 fcvt s0, d0 ret Given that all outputs are of float type, we can do the whole calculation in single precision and avoid any potentially expensive conversions between single and double precision. Aka the expression would end up looking more like float f (float x) { return 1.0f / sqrtf (x); } f: fsqrt s0, s0 fmov s1, 1.0e+0 fdiv s0, s1, s0 ret This optimization will narrow casts around math builtins, and also not try to find the widest type for calculations when processing binary math operations (if unsafe math optimizations are enable). Added tests to verify that narrower math builtins are chosen and no unnecessary casts are introduced when appropriate. Bootstrapped and regtested on aarch64 and x86_64 with no regressions. I don't have write access, so if OK for trunk then can someone commit on my behalf? Regards, Barney gcc/ChangeLog: 2019-09-02 Barnaby Wilks <barnaby.wilks@arm.com> * builtins.c (mathfn_built_in): Expose find implicit builtin parameter. * builtins.h (mathfn_built_in): Likewise. * match.pd: Add expressions for simplifying builtin and binary math expressions. gcc/testsuite/ChangeLog: 2019-09-02 Barnaby Wilks <barnaby.wilks@arm.com> * gcc.dg/fold-single-precision.c: New test. [-- Attachment #2: gcc-float-narrow-opt.txt --] [-- Type: text/plain, Size: 4701 bytes --] diff --git a/gcc/builtins.h b/gcc/builtins.h index 1ffb491d7850366c74bd694bf9e1c277bcde1da9..5cd02af3be55b041918ad6f1a44d5520f5689fee 100644 --- a/gcc/builtins.h +++ b/gcc/builtins.h @@ -108,6 +108,7 @@ extern void expand_builtin_setjmp_setup (rtx, rtx); extern void expand_builtin_setjmp_receiver (rtx); extern void expand_builtin_update_setjmp_buf (rtx); extern tree mathfn_built_in (tree, enum built_in_function fn); +extern tree mathfn_built_in (tree, enum built_in_function fn, bool implicit); extern tree mathfn_built_in (tree, combined_fn); extern rtx builtin_strncpy_read_str (void *, HOST_WIDE_INT, scalar_int_mode); extern rtx builtin_memset_read_str (void *, HOST_WIDE_INT, scalar_int_mode); diff --git a/gcc/builtins.c b/gcc/builtins.c index 695a9d191af4c4922351e3e59601a87b3fedda5c..6cfd7f4af54110fec9f53ddaf71408e7efc329da 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -2137,6 +2137,12 @@ mathfn_built_in (tree type, enum built_in_function fn) return mathfn_built_in_1 (type, as_combined_fn (fn), /*implicit=*/ 1); } +tree +mathfn_built_in (tree type, enum built_in_function fn, bool implicit) +{ + return mathfn_built_in_1 (type, as_combined_fn (fn), implicit); +} + /* If BUILT_IN_NORMAL function FNDECL has an associated internal function, return its code, otherwise return IFN_LAST. Note that this function only tests whether the function is defined in internals.def, not whether diff --git a/gcc/match.pd b/gcc/match.pd index 0317bc704f771f626ab72189b3a54de00087ad5a..3562548de3ebcb986da20986b868d9a3d318c4ee 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -5004,10 +5004,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) && newtype == type && types_match (newtype, type)) (op (convert:newtype @1) (convert:newtype @2)) - (with { if (TYPE_PRECISION (ty1) > TYPE_PRECISION (newtype)) + (with + { + if (!flag_unsafe_math_optimizations) + { + if (TYPE_PRECISION (ty1) > TYPE_PRECISION (newtype)) newtype = ty1; + if (TYPE_PRECISION (ty2) > TYPE_PRECISION (newtype)) - newtype = ty2; } + newtype = ty2; + } + } + /* Sometimes this transformation is safe (cannot change results through affecting double rounding cases) and sometimes it is not. If NEWTYPE is @@ -5654,3 +5662,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (simplify (vec_perm vec_same_elem_p@0 @0 @1) @0) + +/* Convert expressions of the form + (x) math_call1 ((y) z) where (x) and z are the same type, into + math_call2 (z), where math_call2 is the math builtin for + type x. Type x (and therefore type of z) must be a lower precision + than y/math_call1. */ +(if (flag_unsafe_math_optimizations && !flag_errno_math) + (for op (COSH EXP EXP10 EXP2 EXPM1 GAMMA J0 J1 LGAMMA + POW10 SINH TGAMMA Y0 Y1 ACOS ACOSH ASIN ASINH + ATAN ATANH CBRT COS ERF ERFC LOG LOG10 LOG2 + LOG1P SIN TAN TANH SQRT FABS LOGB) + (simplify + (convert (op@0 (convert@1 @2))) + (if (SCALAR_FLOAT_TYPE_P (type) && SCALAR_FLOAT_TYPE_P (TREE_TYPE (@1)) + && SCALAR_FLOAT_TYPE_P (TREE_TYPE (@2)) + && types_match (type, TREE_TYPE (@2)) + && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@1))) + (with { enum built_in_function fcode = builtin_mathfn_code (@0); + tree fn = mathfn_built_in (type, fcode, false); } + (if (fn) + (convert { build_call_expr (fn, 1, @2); }))))))) diff --git a/gcc/testsuite/gcc.dg/fold-single-precision.c b/gcc/testsuite/gcc.dg/fold-single-precision.c new file mode 100644 index 0000000000000000000000000000000000000000..9209b5ce42d87cda69e84b048f0f0e3eaf0dd973 --- /dev/null +++ b/gcc/testsuite/gcc.dg/fold-single-precision.c @@ -0,0 +1,57 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -fdump-tree-optimized" } */ + +#include <math.h> + +float f (float x) +{ + x = 1.0 / sqrt (x); + return x; +} + +float g (float x, float y) +{ + double t = 1.0 / x; + return t * y; +} + +float h (float x, float y) +{ + float z = pow (y, 2.0); + return sqrt ((x * x) + z); +} + +float i (float x) +{ + return x * (double) sqrtf (x); +} + +void j (float* x, float* y) +{ + double len = h (*x, *y); + *x = *x / len; + *y = *y / len; +} + +float k (float x, float y) +{ + double t = 4.0 * x; + double z = t + y; + return z; +} + +float l (float n) +{ + return cbrt (n); +} + +float m (float n) +{ + float x = n * n; + return sqrt (x) - 1.0f; +} + +/* { dg-final { scan-tree-dump "__builtin_sqrtf" "optimized" } } */ +/* { dg-final { scan-tree-dump "__builtin_cbrtf" "optimized" } } */ +/* { dg-final { scan-tree-dump-not "\\(double\\)" "optimized" } } */ +/* { dg-final { scan-tree-dump-not "\\(float\\)" "optimized" } } */ ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH][GCC] Simplify to single precision where possible for binary/builtin maths operations. 2019-09-02 17:29 [PATCH][GCC] Simplify to single precision where possible for binary/builtin maths operations Barnaby Wilks @ 2019-09-03 8:23 ` Richard Biener 2019-09-03 14:19 ` Richard Sandiford 2019-09-03 15:23 ` Barnaby Wilks 0 siblings, 2 replies; 5+ messages in thread From: Richard Biener @ 2019-09-03 8:23 UTC (permalink / raw) To: Barnaby Wilks Cc: gcc-patches, nd, law, ian, Tamar Christina, Wilco Dijkstra, richard.sandiford [-- Attachment #1: Type: text/plain, Size: 5850 bytes --] On Mon, 2 Sep 2019, Barnaby Wilks wrote: > Hello, > > This patch introduces an optimization for narrowing binary and builtin > math operations to the smallest type when unsafe math optimizations are > enabled (typically -Ofast or -ffast-math). > > Consider the example: > > float f (float x) { > return 1.0 / sqrt (x); > } > > f: > fcvt d0, s0 > fmov d1, 1.0e+0 > fsqrt d0, d0 > fdiv d0, d1, d0 > fcvt s0, d0 > ret > > Given that all outputs are of float type, we can do the whole > calculation in single precision and avoid any potentially expensive > conversions between single and double precision. > > Aka the expression would end up looking more like > > float f (float x) { > return 1.0f / sqrtf (x); > } > > f: > fsqrt s0, s0 > fmov s1, 1.0e+0 > fdiv s0, s1, s0 > ret > > This optimization will narrow casts around math builtins, and also > not try to find the widest type for calculations when processing binary > math operations (if unsafe math optimizations are enable). > > Added tests to verify that narrower math builtins are chosen and > no unnecessary casts are introduced when appropriate. > > Bootstrapped and regtested on aarch64 and x86_64 with no regressions. > > I don't have write access, so if OK for trunk then can someone commit on > my behalf? @@ -5004,10 +5004,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) && newtype == type && types_match (newtype, type)) (op (convert:newtype @1) (convert:newtype @2)) - (with { if (TYPE_PRECISION (ty1) > TYPE_PRECISION (newtype)) + (with + { + if (!flag_unsafe_math_optimizations) + { + if (TYPE_PRECISION (ty1) > TYPE_PRECISION (newtype)) newtype = ty1; + if (TYPE_PRECISION (ty2) > TYPE_PRECISION (newtype)) - newtype = ty2; } + newtype = ty2; + } + } + /* Sometimes this transformation is safe (cannot change results through affecting double rounding cases) and sometimes it is not. If NEWTYPE is The ChangeLog doesn't mention this change and I wonder what it is for - later flag_unsafe_math_optimizations is checked, in particular && (flag_unsafe_math_optimizations || (TYPE_PRECISION (newtype) == TYPE_PRECISION (type) && real_can_shorten_arithmetic (TYPE_MODE (itype), TYPE_MODE (type)) && !excess_precision_type (newtype))) note the !excess_precision_type (newtype) which you fail to check below. @@ -5654,3 +5662,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (simplify (vec_perm vec_same_elem_p@0 @0 @1) @0) + +/* Convert expressions of the form + (x) math_call1 ((y) z) where (x) and z are the same type, into + math_call2 (z), where math_call2 is the math builtin for + type x. Type x (and therefore type of z) must be a lower precision + than y/math_call1. */ +(if (flag_unsafe_math_optimizations && !flag_errno_math) + (for op (COSH EXP EXP10 EXP2 EXPM1 GAMMA J0 J1 LGAMMA + POW10 SINH TGAMMA Y0 Y1 ACOS ACOSH ASIN ASINH + ATAN ATANH CBRT COS ERF ERFC LOG LOG10 LOG2 + LOG1P SIN TAN TANH SQRT FABS LOGB) + (simplify + (convert (op@0 (convert@1 @2))) + (if (SCALAR_FLOAT_TYPE_P (type) && SCALAR_FLOAT_TYPE_P (TREE_TYPE (@1)) + && SCALAR_FLOAT_TYPE_P (TREE_TYPE (@2)) + && types_match (type, TREE_TYPE (@2)) + && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@1))) + (with { enum built_in_function fcode = builtin_mathfn_code (@0); + tree fn = mathfn_built_in (type, fcode, false); } + (if (fn) + (convert { build_call_expr (fn, 1, @2); }))))))) This (convert { build_call_expr (..) } ) only works on GENERIC. I also wonder why you needed the mathfn_built_in change. If you look at other examples in match.pd you'd see you should have used sth like (for op (BUILT_IN_COSH BUILT_IN_EXP ...) opf (BUILT_IN_COSHF BUILT_IN_EXPF ...) (simplify ... (if (types_match (type, float_type_node)) (opf @2))) and you have to repeat this for the COSHL (long double) case with appropriate opd and opf lists. In theory, if we'd extend genmatch to 'transform' builtin function kinds that could be done prettier like for example with (for op (COSH EXP ...) (simplify ... (op:type @2)) which I'd kind-of like. Note it's not as simple as passing 'type' to mathfn_built_in since that expects literal double_type_node and friends but we could use a {gimple,generic}-match.c private helper for that. Now - as a general comment I think adding this kind of narrowing is good but doing it via match.pd patterns is quite limiting - eventually the backprop pass would be a fit for propagating "needed precision" and narrowing feeding stmts accordingly in a more general way? Richard can probably tell quickest if it is feasible in that framework. Thanks, Richard. > Regards, > Barney > > gcc/ChangeLog: > > 2019-09-02 Barnaby Wilks <barnaby.wilks@arm.com> > > * builtins.c (mathfn_built_in): Expose find implicit builtin parameter. > * builtins.h (mathfn_built_in): Likewise. > * match.pd: Add expressions for simplifying builtin and binary > math expressions. > > gcc/testsuite/ChangeLog: > > 2019-09-02 Barnaby Wilks <barnaby.wilks@arm.com> > > * gcc.dg/fold-single-precision.c: New test. > -- Richard Biener <rguenther@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 247165 (AG München) ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH][GCC] Simplify to single precision where possible for binary/builtin maths operations. 2019-09-03 8:23 ` Richard Biener @ 2019-09-03 14:19 ` Richard Sandiford 2019-09-03 15:23 ` Barnaby Wilks 1 sibling, 0 replies; 5+ messages in thread From: Richard Sandiford @ 2019-09-03 14:19 UTC (permalink / raw) To: Richard Biener Cc: Barnaby Wilks, gcc-patches, nd, law, ian, Tamar Christina, Wilco Dijkstra Richard Biener <rguenther@suse.de> writes: > On Mon, 2 Sep 2019, Barnaby Wilks wrote: > >> Hello, >> >> This patch introduces an optimization for narrowing binary and builtin >> math operations to the smallest type when unsafe math optimizations are >> enabled (typically -Ofast or -ffast-math). >> >> Consider the example: >> >> float f (float x) { >> return 1.0 / sqrt (x); >> } >> >> f: >> fcvt d0, s0 >> fmov d1, 1.0e+0 >> fsqrt d0, d0 >> fdiv d0, d1, d0 >> fcvt s0, d0 >> ret >> >> Given that all outputs are of float type, we can do the whole >> calculation in single precision and avoid any potentially expensive >> conversions between single and double precision. >> >> Aka the expression would end up looking more like >> >> float f (float x) { >> return 1.0f / sqrtf (x); >> } >> >> f: >> fsqrt s0, s0 >> fmov s1, 1.0e+0 >> fdiv s0, s1, s0 >> ret >> >> This optimization will narrow casts around math builtins, and also >> not try to find the widest type for calculations when processing binary >> math operations (if unsafe math optimizations are enable). >> >> Added tests to verify that narrower math builtins are chosen and >> no unnecessary casts are introduced when appropriate. >> >> Bootstrapped and regtested on aarch64 and x86_64 with no regressions. >> >> I don't have write access, so if OK for trunk then can someone commit on >> my behalf? > [...] > > Now - as a general comment I think adding this kind of narrowing is > good but doing it via match.pd patterns is quite limiting - eventually > the backprop pass would be a fit for propagating "needed precision" > and narrowing feeding stmts accordingly in a more general way? > Richard can probably tell quickest if it is feasible in that framework. Yeah, I think it would be a good fit, and would for example cope with cases in which we select between two double results before doing the truncation to float. I'd wanted to do something similar for integer truncation but never found the time... At the moment, backprop handles a single piece of information: whether the sign of the value matters. This is (over?)generalised to be one bit of information in a word of flags. I guess we could take the same approach here and have flags for certain well-known floating-point types, but it might be cleaner to instead have a field that records the widest mode that users of the result want. I think to do this we'd need to build an array that maps floating-point machine_modes to their order in the FOR_EACH_MODE_IN_CLASS chain. That'll give us a total ordering over floating-point modes and mean that operator & (the usage_info confluence function) can just take whichever of the input usage_info modes has the highest index in this chain. Thanks, Richard ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH][GCC] Simplify to single precision where possible for binary/builtin maths operations. 2019-09-03 8:23 ` Richard Biener 2019-09-03 14:19 ` Richard Sandiford @ 2019-09-03 15:23 ` Barnaby Wilks 2019-09-05 9:50 ` Richard Biener 1 sibling, 1 reply; 5+ messages in thread From: Barnaby Wilks @ 2019-09-03 15:23 UTC (permalink / raw) To: Richard Biener Cc: gcc-patches, nd, law, ian, Tamar Christina, Wilco Dijkstra, Richard Sandiford On 9/3/19 9:23 AM, Richard Biener wrote: > On Mon, 2 Sep 2019, Barnaby Wilks wrote: > >> Hello, >> >> This patch introduces an optimization for narrowing binary and builtin >> math operations to the smallest type when unsafe math optimizations are >> enabled (typically -Ofast or -ffast-math). >> >> Consider the example: >> >> float f (float x) { >> return 1.0 / sqrt (x); >> } >> >> f: >> fcvt d0, s0 >> fmov d1, 1.0e+0 >> fsqrt d0, d0 >> fdiv d0, d1, d0 >> fcvt s0, d0 >> ret >> >> Given that all outputs are of float type, we can do the whole >> calculation in single precision and avoid any potentially expensive >> conversions between single and double precision. >> >> Aka the expression would end up looking more like >> >> float f (float x) { >> return 1.0f / sqrtf (x); >> } >> >> f: >> fsqrt s0, s0 >> fmov s1, 1.0e+0 >> fdiv s0, s1, s0 >> ret >> >> This optimization will narrow casts around math builtins, and also >> not try to find the widest type for calculations when processing binary >> math operations (if unsafe math optimizations are enable). >> >> Added tests to verify that narrower math builtins are chosen and >> no unnecessary casts are introduced when appropriate. >> >> Bootstrapped and regtested on aarch64 and x86_64 with no regressions. >> >> I don't have write access, so if OK for trunk then can someone commit on >> my behalf? > > @@ -5004,10 +5004,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > && newtype == type > && types_match (newtype, type)) > (op (convert:newtype @1) (convert:newtype @2)) > - (with { if (TYPE_PRECISION (ty1) > TYPE_PRECISION (newtype)) > + (with > + { > + if (!flag_unsafe_math_optimizations) > + { > + if (TYPE_PRECISION (ty1) > TYPE_PRECISION (newtype)) > newtype = ty1; > + > if (TYPE_PRECISION (ty2) > TYPE_PRECISION (newtype)) > - newtype = ty2; } > + newtype = ty2; > + } > + } > + > /* Sometimes this transformation is safe (cannot > change results through affecting double rounding > cases) and sometimes it is not. If NEWTYPE is > > The ChangeLog doesn't mention this change and I wonder what it is > for - later flag_unsafe_math_optimizations is checked, in particular > > && (flag_unsafe_math_optimizations > || (TYPE_PRECISION (newtype) == TYPE_PRECISION > (type) > && real_can_shorten_arithmetic (TYPE_MODE > (itype), > TYPE_MODE > (type)) > && !excess_precision_type (newtype))) > > note the !excess_precision_type (newtype) which you fail to check > below. This change prevents the pattern from casting the operands to the widest type (the widest between the type of the operands and the type of the expression as a whole) if unsafe math optimizations are enabled. Whereas the second check of flag_unsafe_math_optimizations is a shortcut to enable the transformation as a whole, and does not affect the type that the operands are being cast to. Without the first check then the expression will always use the widest type, resulting in unnecessary casts. For example float f (float x, float y) { double z = 1.0 / x; return z * y; } Will generate (With -Ofast) float D.3459; double z; _1 = (double) x; z = 1.0e+0 / _1; _2 = (double) y; _3 = z * _2; D.3459 = (float) _3; return D.3459; Note how the parameters are cast to doubles, the whole calculation is done in double precision and then cast out to single precision at the end. (because double is the widest type in the expression) Whereas if you include the first flag_unsafe_math_optimizations check, and prevent the widening then you get float D.3459; double z; _1 = (double) x; z = 1.0e+0 / _1; _2 = (float) z; D.3459 = y * _2; return D.3459; Where only "double z = 1.0 / x" happens in double precision, and the rest of the calculation is done in single precision, reducing the amount of casts. The benefits here can be seen in the generated code: Without the flag_unsafe_math_optimizations check fcvt d1, s1 fcvt d0, s0 fdiv d0, d1, d0 fcvt s0, d0 ret With the check fdiv s0, s1, s0 ret > > @@ -5654,3 +5662,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > (simplify > (vec_perm vec_same_elem_p@0 @0 @1) > @0) > + > +/* Convert expressions of the form > + (x) math_call1 ((y) z) where (x) and z are the same type, into > + math_call2 (z), where math_call2 is the math builtin for > + type x. Type x (and therefore type of z) must be a lower precision > + than y/math_call1. */ > +(if (flag_unsafe_math_optimizations && !flag_errno_math) > + (for op (COSH EXP EXP10 EXP2 EXPM1 GAMMA J0 J1 LGAMMA > + POW10 SINH TGAMMA Y0 Y1 ACOS ACOSH ASIN ASINH > + ATAN ATANH CBRT COS ERF ERFC LOG LOG10 LOG2 > + LOG1P SIN TAN TANH SQRT FABS LOGB) > + (simplify > + (convert (op@0 (convert@1 @2))) > + (if (SCALAR_FLOAT_TYPE_P (type) && SCALAR_FLOAT_TYPE_P (TREE_TYPE > (@1)) > + && SCALAR_FLOAT_TYPE_P (TREE_TYPE (@2)) > + && types_match (type, TREE_TYPE (@2)) > + && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@1))) > + (with { enum built_in_function fcode = builtin_mathfn_code (@0); > + tree fn = mathfn_built_in (type, fcode, false); } > + (if (fn) > + (convert { build_call_expr (fn, 1, @2); }))))))) > > This (convert { build_call_expr (..) } ) only works on GENERIC. I didn't realize that build_call_expr only works on GENERIC, I took that snippet from convert_to_real_1 in convert.c, which does a similar thing, but is not very generic and wont cover all the cases. I suppose this doesn't matter if the transformation is being moved out of match.pd into the backprop pass, or if not is there a more generic (if you excuse the pun) way to create function call nodes for GIMPLE & GENERIC? > I also wonder why you needed the mathfn_built_in change. Because some builtins are not marked as implicit - or more specifically reserved in C90 and actually specified in C99. To be honest, I wasn't really happy with doing this, as I'm not sure of it's implications, so if you have a better way to do this then that would be much appreciated? From what I can tell its the float versions of the math builtins that are not implicit, and these are the functions that are most commonly needed to narrow to. > If you look at other examples in match.pd you'd see you should have > used sth like > > (for op (BUILT_IN_COSH BUILT_IN_EXP ...) > opf (BUILT_IN_COSHF BUILT_IN_EXPF ...) > (simplify > ... > (if (types_match (type, float_type_node)) > (opf @2))) > > and you have to repeat this for the COSHL (long double) case > with appropriate opd and opf lists. In theory, if we'd extend > genmatch to 'transform' builtin function kinds that could be > done prettier like for example with Why would this need to be the case? The code already does practially the same thing by matching on all the builtins and then transforming down to the narrowest type with the builtin_mathfn_code/mathfn_built_in combination. > (for op (COSH EXP ...) > (simplify > ... > (op:type @2)) > > which I'd kind-of like. Note it's not as simple as passing > 'type' to mathfn_built_in since that expects literal > double_type_node and friends but we could use a {gimple,generic}-match.c > private helper for that. Would "type" not be a double_type_node (or related) literal already? If mathfn_built_in does not recognise the given type then it will just spit out NULL_TREE, which is checked by: + tree fn = mathfn_built_in (type, fcode, false); } + (if (fn) + (convert { build_call_expr (fn, 1, @2); }))))))) Apologies if any of these seem obvious questions - I'm quite new to GCC internals. Regards, Barney > Now - as a general comment I think adding this kind of narrowing is > good but doing it via match.pd patterns is quite limiting - eventually > the backprop pass would be a fit for propagating "needed precision" > and narrowing feeding stmts accordingly in a more general way? > Richard can probably tell quickest if it is feasible in that framework. > > Thanks, > Richard. > > >> Regards, >> Barney >> >> gcc/ChangeLog: >> >> 2019-09-02 Barnaby Wilks <barnaby.wilks@arm.com> >> >> * builtins.c (mathfn_built_in): Expose find implicit builtin parameter. >> * builtins.h (mathfn_built_in): Likewise. >> * match.pd: Add expressions for simplifying builtin and binary >> math expressions. >> >> gcc/testsuite/ChangeLog: >> >> 2019-09-02 Barnaby Wilks <barnaby.wilks@arm.com> >> >> * gcc.dg/fold-single-precision.c: New test. >> > ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH][GCC] Simplify to single precision where possible for binary/builtin maths operations. 2019-09-03 15:23 ` Barnaby Wilks @ 2019-09-05 9:50 ` Richard Biener 0 siblings, 0 replies; 5+ messages in thread From: Richard Biener @ 2019-09-05 9:50 UTC (permalink / raw) To: Barnaby Wilks Cc: Richard Biener, gcc-patches, nd, law, ian, Tamar Christina, Wilco Dijkstra, Richard Sandiford On Tue, Sep 3, 2019 at 5:23 PM Barnaby Wilks <Barnaby.Wilks@arm.com> wrote: > > > > On 9/3/19 9:23 AM, Richard Biener wrote: > > On Mon, 2 Sep 2019, Barnaby Wilks wrote: > > > >> Hello, > >> > >> This patch introduces an optimization for narrowing binary and builtin > >> math operations to the smallest type when unsafe math optimizations are > >> enabled (typically -Ofast or -ffast-math). > >> > >> Consider the example: > >> > >> float f (float x) { > >> return 1.0 / sqrt (x); > >> } > >> > >> f: > >> fcvt d0, s0 > >> fmov d1, 1.0e+0 > >> fsqrt d0, d0 > >> fdiv d0, d1, d0 > >> fcvt s0, d0 > >> ret > >> > >> Given that all outputs are of float type, we can do the whole > >> calculation in single precision and avoid any potentially expensive > >> conversions between single and double precision. > >> > >> Aka the expression would end up looking more like > >> > >> float f (float x) { > >> return 1.0f / sqrtf (x); > >> } > >> > >> f: > >> fsqrt s0, s0 > >> fmov s1, 1.0e+0 > >> fdiv s0, s1, s0 > >> ret > >> > >> This optimization will narrow casts around math builtins, and also > >> not try to find the widest type for calculations when processing binary > >> math operations (if unsafe math optimizations are enable). > >> > >> Added tests to verify that narrower math builtins are chosen and > >> no unnecessary casts are introduced when appropriate. > >> > >> Bootstrapped and regtested on aarch64 and x86_64 with no regressions. > >> > >> I don't have write access, so if OK for trunk then can someone commit on > >> my behalf? > > > > @@ -5004,10 +5004,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > > && newtype == type > > && types_match (newtype, type)) > > (op (convert:newtype @1) (convert:newtype @2)) > > - (with { if (TYPE_PRECISION (ty1) > TYPE_PRECISION (newtype)) > > + (with > > + { > > + if (!flag_unsafe_math_optimizations) > > + { > > + if (TYPE_PRECISION (ty1) > TYPE_PRECISION (newtype)) > > newtype = ty1; > > + > > if (TYPE_PRECISION (ty2) > TYPE_PRECISION (newtype)) > > - newtype = ty2; } > > + newtype = ty2; > > + } > > + } > > + > > /* Sometimes this transformation is safe (cannot > > change results through affecting double rounding > > cases) and sometimes it is not. If NEWTYPE is > > > > The ChangeLog doesn't mention this change and I wonder what it is > > for - later flag_unsafe_math_optimizations is checked, in particular > > > > && (flag_unsafe_math_optimizations > > || (TYPE_PRECISION (newtype) == TYPE_PRECISION > > (type) > > && real_can_shorten_arithmetic (TYPE_MODE > > (itype), > > TYPE_MODE > > (type)) > > && !excess_precision_type (newtype))) > > > > note the !excess_precision_type (newtype) which you fail to check > > below. > > This change prevents the pattern from casting the operands to the widest > type (the widest between the type of the operands and the type of the > expression as a whole) if unsafe math optimizations are enabled. > Whereas the second check of flag_unsafe_math_optimizations is a shortcut > to enable the transformation as a whole, and does not affect the type > that the operands are being cast to. > > Without the first check then the expression will always use the widest > type, resulting in unnecessary casts. For example > > float f (float x, float y) { > double z = 1.0 / x; > return z * y; > } > > Will generate (With -Ofast) > > float D.3459; > double z; > > _1 = (double) x; > z = 1.0e+0 / _1; > _2 = (double) y; > _3 = z * _2; > D.3459 = (float) _3; > return D.3459; > > Note how the parameters are cast to doubles, the whole calculation is > done in double precision and then cast out to single precision at the > end. (because double is the widest type in the expression) > > Whereas if you include the first flag_unsafe_math_optimizations check, > and prevent the widening then you get > > float D.3459; > double z; > > _1 = (double) x; > z = 1.0e+0 / _1; > _2 = (float) z; > D.3459 = y * _2; > return D.3459; > > Where only "double z = 1.0 / x" happens in double precision, and the > rest of the calculation is done in single precision, reducing the amount > of casts. > > The benefits here can be seen in the generated code: > Without the flag_unsafe_math_optimizations check > > fcvt d1, s1 > fcvt d0, s0 > fdiv d0, d1, d0 > fcvt s0, d0 > ret > > With the check > > fdiv s0, s1, s0 > ret I see. Can you please propose this change independently with a separate testcase? > > > > @@ -5654,3 +5662,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > > (simplify > > (vec_perm vec_same_elem_p@0 @0 @1) > > @0) > > + > > +/* Convert expressions of the form > > + (x) math_call1 ((y) z) where (x) and z are the same type, into > > + math_call2 (z), where math_call2 is the math builtin for > > + type x. Type x (and therefore type of z) must be a lower precision > > + than y/math_call1. */ > > +(if (flag_unsafe_math_optimizations && !flag_errno_math) > > + (for op (COSH EXP EXP10 EXP2 EXPM1 GAMMA J0 J1 LGAMMA > > + POW10 SINH TGAMMA Y0 Y1 ACOS ACOSH ASIN ASINH > > + ATAN ATANH CBRT COS ERF ERFC LOG LOG10 LOG2 > > + LOG1P SIN TAN TANH SQRT FABS LOGB) > > + (simplify > > + (convert (op@0 (convert@1 @2))) > > + (if (SCALAR_FLOAT_TYPE_P (type) && SCALAR_FLOAT_TYPE_P (TREE_TYPE > > (@1)) > > + && SCALAR_FLOAT_TYPE_P (TREE_TYPE (@2)) > > + && types_match (type, TREE_TYPE (@2)) > > + && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@1))) > > + (with { enum built_in_function fcode = builtin_mathfn_code (@0); > > + tree fn = mathfn_built_in (type, fcode, false); } > > + (if (fn) > > + (convert { build_call_expr (fn, 1, @2); }))))))) > > > > This (convert { build_call_expr (..) } ) only works on GENERIC. > > I didn't realize that build_call_expr only works on GENERIC, I took > that snippet from convert_to_real_1 in convert.c, which does a similar > thing, but is not very generic and wont cover all the cases. > I suppose this doesn't matter if the transformation is being moved out > of match.pd into the backprop pass, or if not is there a more generic > (if you excuse the pun) way to create function call nodes for GIMPLE & > GENERIC? Not as a match.pd "C expression", no. You'd have to work like I outlined below. > > I also wonder why you needed the mathfn_built_in change. > > Because some builtins are not marked as implicit - or more specifically > reserved in C90 and actually specified in C99. > To be honest, I wasn't really happy with doing this, as I'm not sure of > it's implications, so if you have a better way to do this then that > would be much appreciated? > From what I can tell its the float versions of the math builtins that > are not implicit, and these are the functions that are most commonly > needed to narrow to. The issue with "reserved" function is that if the program doesn't contain a suitable prototype or a call to such function we have to avoid introducing it ourselves. I admit we don't have very good solutions to this, but at gimplification time we do /* If we see a call to a declared builtin or see its address being taken (we can unify those cases here) then we can mark the builtin for implicit generation by GCC. */ if (TREE_CODE (op0) == FUNCTION_DECL && fndecl_built_in_p (op0, BUILT_IN_NORMAL) && builtin_decl_declared_p (DECL_FUNCTION_CODE (op0))) set_builtin_decl_implicit_p (DECL_FUNCTION_CODE (op0), true); so for testcases you have to make sure to provide appropriate declarations. > > If you look at other examples in match.pd you'd see you should have > > used sth like > > > > (for op (BUILT_IN_COSH BUILT_IN_EXP ...) > > opf (BUILT_IN_COSHF BUILT_IN_EXPF ...) > > (simplify > > ... > > (if (types_match (type, float_type_node)) > > (opf @2))) > > > > and you have to repeat this for the COSHL (long double) case > > with appropriate opd and opf lists. In theory, if we'd extend > > genmatch to 'transform' builtin function kinds that could be > > done prettier like for example with > > Why would this need to be the case? The code already does practially the > same thing by matching on all the builtins and then transforming down to > the narrowest type with the builtin_mathfn_code/mathfn_built_in combination. Because of the issue that you have to create the calls appropriately > > (for op (COSH EXP ...) > > (simplify > > ... > > (op:type @2)) > > > > which I'd kind-of like. Note it's not as simple as passing > > 'type' to mathfn_built_in since that expects literal > > double_type_node and friends but we could use a {gimple,generic}-match.c > > private helper for that. > > Would "type" not be a double_type_node (or related) literal already? > If mathfn_built_in does not recognise the given type then it will just > spit out NULL_TREE, which is checked by: > > + tree fn = mathfn_built_in (type, fcode, false); } > + (if (fn) > + (convert { build_call_expr (fn, 1, @2); }))))))) > > > Apologies if any of these seem obvious questions - I'm quite new to GCC > internals. I was just talking about how to actually extend genmatch to recognize the (op:type ...) for builtin function calls, automagically turning sin to sinf if type is a single-precision FP type and said that simply using mathfn_built_in with 'type' likely will end up with NULL_TREE as result more often than necessary, so we have to either enhance mathfn_built_in (maybe compare modes?) map to the canonical types mathfn_built_in checks for (you'd have to consider systems where double is equal to long double but they are still different types). Richard. > Regards, > Barney > > > Now - as a general comment I think adding this kind of narrowing is > > good but doing it via match.pd patterns is quite limiting - eventually > > the backprop pass would be a fit for propagating "needed precision" > > and narrowing feeding stmts accordingly in a more general way? > > Richard can probably tell quickest if it is feasible in that framework. > > > > Thanks, > > Richard. > > > > > >> Regards, > >> Barney > >> > >> gcc/ChangeLog: > >> > >> 2019-09-02 Barnaby Wilks <barnaby.wilks@arm.com> > >> > >> * builtins.c (mathfn_built_in): Expose find implicit builtin parameter. > >> * builtins.h (mathfn_built_in): Likewise. > >> * match.pd: Add expressions for simplifying builtin and binary > >> math expressions. > >> > >> gcc/testsuite/ChangeLog: > >> > >> 2019-09-02 Barnaby Wilks <barnaby.wilks@arm.com> > >> > >> * gcc.dg/fold-single-precision.c: New test. > >> > > ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2019-09-05 9:50 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-09-02 17:29 [PATCH][GCC] Simplify to single precision where possible for binary/builtin maths operations Barnaby Wilks 2019-09-03 8:23 ` Richard Biener 2019-09-03 14:19 ` Richard Sandiford 2019-09-03 15:23 ` Barnaby Wilks 2019-09-05 9:50 ` Richard Biener
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).