public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH][GCC] Simplify to single precision where possible for binary/builtin maths operations.
@ 2019-09-02 17:29 Barnaby Wilks
  2019-09-03  8:23 ` Richard Biener
  0 siblings, 1 reply; 5+ messages in thread
From: Barnaby Wilks @ 2019-09-02 17:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, law, ian, rguenther, Tamar Christina, Wilco Dijkstra

[-- Attachment #1: Type: text/plain, Size: 1702 bytes --]

Hello,

This patch introduces an optimization for narrowing binary and builtin
math operations to the smallest type when unsafe math optimizations are
enabled (typically -Ofast or -ffast-math).

Consider the example:

   float f (float x) {
     return 1.0 / sqrt (x);
   }

   f:
     fcvt	d0, s0
     fmov	d1, 1.0e+0
     fsqrt	d0, d0
     fdiv	d0, d1, d0
     fcvt	s0, d0
     ret

Given that all outputs are of float type, we can do the whole 
calculation in single precision and avoid any potentially expensive 
conversions between single and double precision.

Aka the expression would end up looking more like

   float f (float x) {
     return 1.0f / sqrtf (x);
   }

   f:
     fsqrt	s0, s0
     fmov	s1, 1.0e+0
     fdiv	s0, s1, s0
     ret

This optimization will narrow casts around math builtins, and also
not try to find the widest type for calculations when processing binary
math operations (if unsafe math optimizations are enable).

Added tests to verify that narrower math builtins are chosen and
no unnecessary casts are introduced when appropriate.

Bootstrapped and regtested on aarch64 and x86_64 with no regressions.

I don't have write access, so if OK for trunk then can someone commit on 
my behalf?

Regards,
Barney

gcc/ChangeLog:

2019-09-02  Barnaby Wilks  <barnaby.wilks@arm.com>

	* builtins.c (mathfn_built_in): Expose find implicit builtin parameter.
	* builtins.h (mathfn_built_in): Likewise.
	* match.pd: Add expressions for simplifying builtin and binary
	math expressions.

gcc/testsuite/ChangeLog:

2019-09-02  Barnaby Wilks  <barnaby.wilks@arm.com>

	* gcc.dg/fold-single-precision.c: New test.

[-- Attachment #2: gcc-float-narrow-opt.txt --]
[-- Type: text/plain, Size: 4701 bytes --]

diff --git a/gcc/builtins.h b/gcc/builtins.h
index 1ffb491d7850366c74bd694bf9e1c277bcde1da9..5cd02af3be55b041918ad6f1a44d5520f5689fee 100644
--- a/gcc/builtins.h
+++ b/gcc/builtins.h
@@ -108,6 +108,7 @@ extern void expand_builtin_setjmp_setup (rtx, rtx);
 extern void expand_builtin_setjmp_receiver (rtx);
 extern void expand_builtin_update_setjmp_buf (rtx);
 extern tree mathfn_built_in (tree, enum built_in_function fn);
+extern tree mathfn_built_in (tree, enum built_in_function fn, bool implicit);
 extern tree mathfn_built_in (tree, combined_fn);
 extern rtx builtin_strncpy_read_str (void *, HOST_WIDE_INT, scalar_int_mode);
 extern rtx builtin_memset_read_str (void *, HOST_WIDE_INT, scalar_int_mode);
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 695a9d191af4c4922351e3e59601a87b3fedda5c..6cfd7f4af54110fec9f53ddaf71408e7efc329da 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -2137,6 +2137,12 @@ mathfn_built_in (tree type, enum built_in_function fn)
   return mathfn_built_in_1 (type, as_combined_fn (fn), /*implicit=*/ 1);
 }
 
+tree
+mathfn_built_in (tree type, enum built_in_function fn, bool implicit)
+{
+  return mathfn_built_in_1 (type, as_combined_fn (fn), implicit);
+}
+
 /* If BUILT_IN_NORMAL function FNDECL has an associated internal function,
    return its code, otherwise return IFN_LAST.  Note that this function
    only tests whether the function is defined in internals.def, not whether
diff --git a/gcc/match.pd b/gcc/match.pd
index 0317bc704f771f626ab72189b3a54de00087ad5a..3562548de3ebcb986da20986b868d9a3d318c4ee 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5004,10 +5004,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 	      && newtype == type
 	      && types_match (newtype, type))
 	    (op (convert:newtype @1) (convert:newtype @2))
-	    (with { if (TYPE_PRECISION (ty1) > TYPE_PRECISION (newtype))
+	    (with
+	      {
+		if (!flag_unsafe_math_optimizations)
+		  {
+		    if (TYPE_PRECISION (ty1) > TYPE_PRECISION (newtype))
 		      newtype = ty1;
+
 		    if (TYPE_PRECISION (ty2) > TYPE_PRECISION (newtype))
-		      newtype = ty2; }
+		      newtype = ty2;
+		  }
+	      }
+
 	       /* Sometimes this transformation is safe (cannot
 		  change results through affecting double rounding
 		  cases) and sometimes it is not.  If NEWTYPE is
@@ -5654,3 +5662,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (simplify
  (vec_perm vec_same_elem_p@0 @0 @1)
  @0)
+
+/* Convert expressions of the form
+   (x) math_call1 ((y) z) where (x) and z are the same type, into
+   math_call2 (z), where math_call2 is the math builtin for
+   type x.  Type x (and therefore type of z) must be a lower precision
+   than y/math_call1.  */
+(if (flag_unsafe_math_optimizations && !flag_errno_math)
+  (for op (COSH EXP EXP10 EXP2 EXPM1 GAMMA J0 J1 LGAMMA
+	   POW10 SINH TGAMMA Y0 Y1 ACOS ACOSH ASIN ASINH
+	   ATAN ATANH CBRT COS ERF ERFC LOG LOG10 LOG2
+	   LOG1P SIN TAN TANH SQRT FABS LOGB)
+    (simplify
+      (convert (op@0 (convert@1 @2)))
+	(if (SCALAR_FLOAT_TYPE_P (type) && SCALAR_FLOAT_TYPE_P (TREE_TYPE (@1))
+	      && SCALAR_FLOAT_TYPE_P (TREE_TYPE (@2))
+	      && types_match (type, TREE_TYPE (@2))
+	      && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@1)))
+	  (with { enum built_in_function fcode = builtin_mathfn_code (@0);
+		  tree fn = mathfn_built_in (type, fcode, false); }
+	    (if (fn)
+	      (convert { build_call_expr (fn, 1, @2); })))))))
diff --git a/gcc/testsuite/gcc.dg/fold-single-precision.c b/gcc/testsuite/gcc.dg/fold-single-precision.c
new file mode 100644
index 0000000000000000000000000000000000000000..9209b5ce42d87cda69e84b048f0f0e3eaf0dd973
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-single-precision.c
@@ -0,0 +1,57 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -fdump-tree-optimized" } */
+
+#include <math.h>
+
+float f (float x)
+{
+  x = 1.0 / sqrt (x);
+  return x;
+}
+
+float g (float x, float y)
+{
+  double t = 1.0 / x;
+  return t * y;
+}
+
+float h (float x, float y)
+{
+  float z = pow (y, 2.0);
+  return sqrt ((x * x) + z);
+}
+
+float i (float x)
+{
+  return x * (double) sqrtf (x);
+}
+
+void j (float* x, float* y)
+{
+  double len = h (*x, *y);
+  *x = *x / len;
+  *y = *y / len;
+}
+
+float k (float x, float y)
+{
+  double t = 4.0 * x;
+  double z = t + y;
+  return z;
+}
+
+float l (float n)
+{
+  return cbrt (n);
+}
+
+float m (float n)
+{
+  float x = n * n;
+  return sqrt (x) - 1.0f;
+}
+
+/* { dg-final { scan-tree-dump "__builtin_sqrtf" "optimized" } } */
+/* { dg-final { scan-tree-dump "__builtin_cbrtf" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "\\(double\\)" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "\\(float\\)" "optimized" } } */

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH][GCC] Simplify to single precision where possible for binary/builtin maths operations.
  2019-09-02 17:29 [PATCH][GCC] Simplify to single precision where possible for binary/builtin maths operations Barnaby Wilks
@ 2019-09-03  8:23 ` Richard Biener
  2019-09-03 14:19   ` Richard Sandiford
  2019-09-03 15:23   ` Barnaby Wilks
  0 siblings, 2 replies; 5+ messages in thread
From: Richard Biener @ 2019-09-03  8:23 UTC (permalink / raw)
  To: Barnaby Wilks
  Cc: gcc-patches, nd, law, ian, Tamar Christina, Wilco Dijkstra,
	richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 5850 bytes --]

On Mon, 2 Sep 2019, Barnaby Wilks wrote:

> Hello,
> 
> This patch introduces an optimization for narrowing binary and builtin
> math operations to the smallest type when unsafe math optimizations are
> enabled (typically -Ofast or -ffast-math).
> 
> Consider the example:
> 
>    float f (float x) {
>      return 1.0 / sqrt (x);
>    }
> 
>    f:
>      fcvt	d0, s0
>      fmov	d1, 1.0e+0
>      fsqrt	d0, d0
>      fdiv	d0, d1, d0
>      fcvt	s0, d0
>      ret
> 
> Given that all outputs are of float type, we can do the whole 
> calculation in single precision and avoid any potentially expensive 
> conversions between single and double precision.
> 
> Aka the expression would end up looking more like
> 
>    float f (float x) {
>      return 1.0f / sqrtf (x);
>    }
> 
>    f:
>      fsqrt	s0, s0
>      fmov	s1, 1.0e+0
>      fdiv	s0, s1, s0
>      ret
> 
> This optimization will narrow casts around math builtins, and also
> not try to find the widest type for calculations when processing binary
> math operations (if unsafe math optimizations are enable).
> 
> Added tests to verify that narrower math builtins are chosen and
> no unnecessary casts are introduced when appropriate.
> 
> Bootstrapped and regtested on aarch64 and x86_64 with no regressions.
> 
> I don't have write access, so if OK for trunk then can someone commit on 
> my behalf?

@@ -5004,10 +5004,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
              && newtype == type
              && types_match (newtype, type))
            (op (convert:newtype @1) (convert:newtype @2))
-           (with { if (TYPE_PRECISION (ty1) > TYPE_PRECISION (newtype))
+           (with
+             {
+               if (!flag_unsafe_math_optimizations)
+                 {
+                   if (TYPE_PRECISION (ty1) > TYPE_PRECISION (newtype))
                      newtype = ty1;
+
                    if (TYPE_PRECISION (ty2) > TYPE_PRECISION (newtype))
-                     newtype = ty2; }
+                     newtype = ty2;
+                 }
+             }
+
               /* Sometimes this transformation is safe (cannot
                  change results through affecting double rounding
                  cases) and sometimes it is not.  If NEWTYPE is

The ChangeLog doesn't mention this change and I wonder what it is
for - later flag_unsafe_math_optimizations is checked, in particular

                   && (flag_unsafe_math_optimizations
                       || (TYPE_PRECISION (newtype) == TYPE_PRECISION 
(type)
                           && real_can_shorten_arithmetic (TYPE_MODE 
(itype),
                                                           TYPE_MODE 
(type))
                           && !excess_precision_type (newtype)))

note the !excess_precision_type (newtype) which you fail to check
below.


@@ -5654,3 +5662,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (simplify
  (vec_perm vec_same_elem_p@0 @0 @1)
  @0)
+
+/* Convert expressions of the form
+   (x) math_call1 ((y) z) where (x) and z are the same type, into
+   math_call2 (z), where math_call2 is the math builtin for
+   type x.  Type x (and therefore type of z) must be a lower precision
+   than y/math_call1.  */
+(if (flag_unsafe_math_optimizations && !flag_errno_math)
+  (for op (COSH EXP EXP10 EXP2 EXPM1 GAMMA J0 J1 LGAMMA
+          POW10 SINH TGAMMA Y0 Y1 ACOS ACOSH ASIN ASINH
+          ATAN ATANH CBRT COS ERF ERFC LOG LOG10 LOG2
+          LOG1P SIN TAN TANH SQRT FABS LOGB)
+    (simplify
+      (convert (op@0 (convert@1 @2)))
+       (if (SCALAR_FLOAT_TYPE_P (type) && SCALAR_FLOAT_TYPE_P (TREE_TYPE
(@1))
+             && SCALAR_FLOAT_TYPE_P (TREE_TYPE (@2))
+             && types_match (type, TREE_TYPE (@2))
+             && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@1)))
+         (with { enum built_in_function fcode = builtin_mathfn_code (@0);
+                 tree fn = mathfn_built_in (type, fcode, false); }
+           (if (fn)
+             (convert { build_call_expr (fn, 1, @2); })))))))

This (convert { build_call_expr (..) } ) only works on GENERIC.
I also wonder why you needed the mathfn_built_in change.

If you look at other examples in match.pd you'd see you should have
used sth like

 (for op (BUILT_IN_COSH BUILT_IN_EXP ...)
      opf (BUILT_IN_COSHF BUILT_IN_EXPF ...)
   (simplify
...
      (if (types_match (type, float_type_node))
        (opf @2)))

and you have to repeat this for the COSHL (long double) case
with appropriate opd and opf lists.  In theory, if we'd extend
genmatch to 'transform' builtin function kinds that could be
done prettier like for example with

 (for op (COSH EXP ...)
  (simplify
...
   (op:type @2))

which I'd kind-of like.  Note it's not as simple as passing
'type' to mathfn_built_in since that expects literal
double_type_node and friends but we could use a {gimple,generic}-match.c
private helper for that.

Now - as a general comment I think adding this kind of narrowing is
good but doing it via match.pd patterns is quite limiting - eventually
the backprop pass would be a fit for propagating "needed precision"
and narrowing feeding stmts accordingly in a more general way?
Richard can probably tell quickest if it is feasible in that framework.

Thanks,
Richard.


> Regards,
> Barney
> 
> gcc/ChangeLog:
> 
> 2019-09-02  Barnaby Wilks  <barnaby.wilks@arm.com>
> 
> 	* builtins.c (mathfn_built_in): Expose find implicit builtin parameter.
> 	* builtins.h (mathfn_built_in): Likewise.
> 	* match.pd: Add expressions for simplifying builtin and binary
> 	math expressions.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-09-02  Barnaby Wilks  <barnaby.wilks@arm.com>
> 
> 	* gcc.dg/fold-single-precision.c: New test.
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 247165 (AG München)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH][GCC] Simplify to single precision where possible for binary/builtin maths operations.
  2019-09-03  8:23 ` Richard Biener
@ 2019-09-03 14:19   ` Richard Sandiford
  2019-09-03 15:23   ` Barnaby Wilks
  1 sibling, 0 replies; 5+ messages in thread
From: Richard Sandiford @ 2019-09-03 14:19 UTC (permalink / raw)
  To: Richard Biener
  Cc: Barnaby Wilks, gcc-patches, nd, law, ian, Tamar Christina,
	Wilco Dijkstra

Richard Biener <rguenther@suse.de> writes:
> On Mon, 2 Sep 2019, Barnaby Wilks wrote:
>
>> Hello,
>> 
>> This patch introduces an optimization for narrowing binary and builtin
>> math operations to the smallest type when unsafe math optimizations are
>> enabled (typically -Ofast or -ffast-math).
>> 
>> Consider the example:
>> 
>>    float f (float x) {
>>      return 1.0 / sqrt (x);
>>    }
>> 
>>    f:
>>      fcvt	d0, s0
>>      fmov	d1, 1.0e+0
>>      fsqrt	d0, d0
>>      fdiv	d0, d1, d0
>>      fcvt	s0, d0
>>      ret
>> 
>> Given that all outputs are of float type, we can do the whole 
>> calculation in single precision and avoid any potentially expensive 
>> conversions between single and double precision.
>> 
>> Aka the expression would end up looking more like
>> 
>>    float f (float x) {
>>      return 1.0f / sqrtf (x);
>>    }
>> 
>>    f:
>>      fsqrt	s0, s0
>>      fmov	s1, 1.0e+0
>>      fdiv	s0, s1, s0
>>      ret
>> 
>> This optimization will narrow casts around math builtins, and also
>> not try to find the widest type for calculations when processing binary
>> math operations (if unsafe math optimizations are enable).
>> 
>> Added tests to verify that narrower math builtins are chosen and
>> no unnecessary casts are introduced when appropriate.
>> 
>> Bootstrapped and regtested on aarch64 and x86_64 with no regressions.
>> 
>> I don't have write access, so if OK for trunk then can someone commit on 
>> my behalf?
> [...]
>
> Now - as a general comment I think adding this kind of narrowing is
> good but doing it via match.pd patterns is quite limiting - eventually
> the backprop pass would be a fit for propagating "needed precision"
> and narrowing feeding stmts accordingly in a more general way?
> Richard can probably tell quickest if it is feasible in that framework.

Yeah, I think it would be a good fit, and would for example cope with
cases in which we select between two double results before doing the
truncation to float.  I'd wanted to do something similar for integer
truncation but never found the time...

At the moment, backprop handles a single piece of information: whether
the sign of the value matters.  This is (over?)generalised to be one bit
of information in a word of flags.  I guess we could take the same
approach here and have flags for certain well-known floating-point
types, but it might be cleaner to instead have a field that records the
widest mode that users of the result want.

I think to do this we'd need to build an array that maps floating-point
machine_modes to their order in the FOR_EACH_MODE_IN_CLASS chain.
That'll give us a total ordering over floating-point modes and mean
that operator & (the usage_info confluence function) can just take
whichever of the input usage_info modes has the highest index in this
chain.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH][GCC] Simplify to single precision where possible for binary/builtin maths operations.
  2019-09-03  8:23 ` Richard Biener
  2019-09-03 14:19   ` Richard Sandiford
@ 2019-09-03 15:23   ` Barnaby Wilks
  2019-09-05  9:50     ` Richard Biener
  1 sibling, 1 reply; 5+ messages in thread
From: Barnaby Wilks @ 2019-09-03 15:23 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, nd, law, ian, Tamar Christina, Wilco Dijkstra,
	Richard Sandiford



On 9/3/19 9:23 AM, Richard Biener wrote:
> On Mon, 2 Sep 2019, Barnaby Wilks wrote:
> 
>> Hello,
>>
>> This patch introduces an optimization for narrowing binary and builtin
>> math operations to the smallest type when unsafe math optimizations are
>> enabled (typically -Ofast or -ffast-math).
>>
>> Consider the example:
>>
>>     float f (float x) {
>>       return 1.0 / sqrt (x);
>>     }
>>
>>     f:
>>       fcvt	d0, s0
>>       fmov	d1, 1.0e+0
>>       fsqrt	d0, d0
>>       fdiv	d0, d1, d0
>>       fcvt	s0, d0
>>       ret
>>
>> Given that all outputs are of float type, we can do the whole
>> calculation in single precision and avoid any potentially expensive
>> conversions between single and double precision.
>>
>> Aka the expression would end up looking more like
>>
>>     float f (float x) {
>>       return 1.0f / sqrtf (x);
>>     }
>>
>>     f:
>>       fsqrt	s0, s0
>>       fmov	s1, 1.0e+0
>>       fdiv	s0, s1, s0
>>       ret
>>
>> This optimization will narrow casts around math builtins, and also
>> not try to find the widest type for calculations when processing binary
>> math operations (if unsafe math optimizations are enable).
>>
>> Added tests to verify that narrower math builtins are chosen and
>> no unnecessary casts are introduced when appropriate.
>>
>> Bootstrapped and regtested on aarch64 and x86_64 with no regressions.
>>
>> I don't have write access, so if OK for trunk then can someone commit on
>> my behalf?
> 
> @@ -5004,10 +5004,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>                && newtype == type
>                && types_match (newtype, type))
>              (op (convert:newtype @1) (convert:newtype @2))
> -           (with { if (TYPE_PRECISION (ty1) > TYPE_PRECISION (newtype))
> +           (with
> +             {
> +               if (!flag_unsafe_math_optimizations)
> +                 {
> +                   if (TYPE_PRECISION (ty1) > TYPE_PRECISION (newtype))
>                        newtype = ty1;
> +
>                      if (TYPE_PRECISION (ty2) > TYPE_PRECISION (newtype))
> -                     newtype = ty2; }
> +                     newtype = ty2;
> +                 }
> +             }
> +
>                 /* Sometimes this transformation is safe (cannot
>                    change results through affecting double rounding
>                    cases) and sometimes it is not.  If NEWTYPE is
> 
> The ChangeLog doesn't mention this change and I wonder what it is
> for - later flag_unsafe_math_optimizations is checked, in particular
> 
>                     && (flag_unsafe_math_optimizations
>                         || (TYPE_PRECISION (newtype) == TYPE_PRECISION
> (type)
>                             && real_can_shorten_arithmetic (TYPE_MODE
> (itype),
>                                                             TYPE_MODE
> (type))
>                             && !excess_precision_type (newtype)))
> 
> note the !excess_precision_type (newtype) which you fail to check
> below.

This change prevents the pattern from casting the operands to the widest 
type (the widest between the type of the operands and the type of the 
expression as a whole) if unsafe math optimizations are enabled.
Whereas the second check of flag_unsafe_math_optimizations is a shortcut 
to enable the transformation as a whole, and does not affect the type 
that the operands are being cast to.

Without the first check then the expression will always use the widest 
type, resulting in unnecessary casts. For example

   float f (float x, float y) {
     double z = 1.0 / x;
     return z * y;
   }

Will generate (With -Ofast)

   float D.3459;
   double z;

   _1 = (double) x;
   z = 1.0e+0 / _1;
   _2 = (double) y;
   _3 = z * _2;
   D.3459 = (float) _3;
   return D.3459;

Note how the parameters are cast to doubles, the whole calculation is 
done in double precision and then cast out to single precision at the 
end. (because double is the widest type in the expression)

Whereas if you include the first flag_unsafe_math_optimizations check, 
and prevent the widening then you get

   float D.3459;
   double z;

   _1 = (double) x;
   z = 1.0e+0 / _1;
   _2 = (float) z;
   D.3459 = y * _2;
   return D.3459;

Where only "double z = 1.0 / x" happens in double precision, and the 
rest of the calculation is done in single precision, reducing the amount 
of casts.

The benefits here can be seen in the generated code:
Without the flag_unsafe_math_optimizations check

         fcvt    d1, s1
         fcvt    d0, s0
         fdiv    d0, d1, d0
         fcvt    s0, d0
         ret

With the check

         fdiv    s0, s1, s0
         ret

> 
> @@ -5654,3 +5662,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (simplify
>    (vec_perm vec_same_elem_p@0 @0 @1)
>    @0)
> +
> +/* Convert expressions of the form
> +   (x) math_call1 ((y) z) where (x) and z are the same type, into
> +   math_call2 (z), where math_call2 is the math builtin for
> +   type x.  Type x (and therefore type of z) must be a lower precision
> +   than y/math_call1.  */
> +(if (flag_unsafe_math_optimizations && !flag_errno_math)
> +  (for op (COSH EXP EXP10 EXP2 EXPM1 GAMMA J0 J1 LGAMMA
> +          POW10 SINH TGAMMA Y0 Y1 ACOS ACOSH ASIN ASINH
> +          ATAN ATANH CBRT COS ERF ERFC LOG LOG10 LOG2
> +          LOG1P SIN TAN TANH SQRT FABS LOGB)
> +    (simplify
> +      (convert (op@0 (convert@1 @2)))
> +       (if (SCALAR_FLOAT_TYPE_P (type) && SCALAR_FLOAT_TYPE_P (TREE_TYPE
> (@1))
> +             && SCALAR_FLOAT_TYPE_P (TREE_TYPE (@2))
> +             && types_match (type, TREE_TYPE (@2))
> +             && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@1)))
> +         (with { enum built_in_function fcode = builtin_mathfn_code (@0);
> +                 tree fn = mathfn_built_in (type, fcode, false); }
> +           (if (fn)
> +             (convert { build_call_expr (fn, 1, @2); })))))))
> 
> This (convert { build_call_expr (..) } ) only works on GENERIC.

I didn't realize that build_call_expr only works on GENERIC, I took
that snippet from convert_to_real_1 in convert.c, which does a similar 
thing, but is not very generic and wont cover all the cases.
I suppose this doesn't matter if the transformation is being moved out 
of match.pd into the backprop pass, or if not is there a more generic 
(if you excuse the pun) way to create function call nodes for GIMPLE & 
GENERIC?

> I also wonder why you needed the mathfn_built_in change.

Because some builtins are not marked as implicit - or more specifically 
reserved in C90 and actually specified in C99.
To be honest, I wasn't really happy with doing this, as I'm not sure of 
it's implications, so if you have a better way to do this then that 
would be much appreciated?
 From what I can tell its the float versions of the math builtins that 
are not implicit, and these are the functions that are most commonly 
needed to narrow to.

> If you look at other examples in match.pd you'd see you should have
> used sth like
> 
>   (for op (BUILT_IN_COSH BUILT_IN_EXP ...)
>        opf (BUILT_IN_COSHF BUILT_IN_EXPF ...)
>     (simplify
> ...
>        (if (types_match (type, float_type_node))
>          (opf @2)))
> 
> and you have to repeat this for the COSHL (long double) case
> with appropriate opd and opf lists.  In theory, if we'd extend
> genmatch to 'transform' builtin function kinds that could be
> done prettier like for example with

Why would this need to be the case? The code already does practially the 
same thing by matching on all the builtins and then transforming down to 
the narrowest type with the builtin_mathfn_code/mathfn_built_in combination.

>   (for op (COSH EXP ...)
>    (simplify
> ...
>     (op:type @2))
> 
> which I'd kind-of like.  Note it's not as simple as passing
> 'type' to mathfn_built_in since that expects literal
> double_type_node and friends but we could use a {gimple,generic}-match.c
> private helper for that.

Would "type" not be a double_type_node (or related) literal already?
If mathfn_built_in does not recognise the given type then it will just 
spit out NULL_TREE, which is checked by:

   +                 tree fn = mathfn_built_in (type, fcode, false); }
   +           (if (fn)
   +             (convert { build_call_expr (fn, 1, @2); })))))))


Apologies if any of these seem obvious questions - I'm quite new to GCC 
internals.

Regards,
Barney

> Now - as a general comment I think adding this kind of narrowing is
> good but doing it via match.pd patterns is quite limiting - eventually
> the backprop pass would be a fit for propagating "needed precision"
> and narrowing feeding stmts accordingly in a more general way?
> Richard can probably tell quickest if it is feasible in that framework.
> 
> Thanks,
> Richard.
> 
> 
>> Regards,
>> Barney
>>
>> gcc/ChangeLog:
>>
>> 2019-09-02  Barnaby Wilks  <barnaby.wilks@arm.com>
>>
>> 	* builtins.c (mathfn_built_in): Expose find implicit builtin parameter.
>> 	* builtins.h (mathfn_built_in): Likewise.
>> 	* match.pd: Add expressions for simplifying builtin and binary
>> 	math expressions.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-09-02  Barnaby Wilks  <barnaby.wilks@arm.com>
>>
>> 	* gcc.dg/fold-single-precision.c: New test.
>>
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH][GCC] Simplify to single precision where possible for binary/builtin maths operations.
  2019-09-03 15:23   ` Barnaby Wilks
@ 2019-09-05  9:50     ` Richard Biener
  0 siblings, 0 replies; 5+ messages in thread
From: Richard Biener @ 2019-09-05  9:50 UTC (permalink / raw)
  To: Barnaby Wilks
  Cc: Richard Biener, gcc-patches, nd, law, ian, Tamar Christina,
	Wilco Dijkstra, Richard Sandiford

On Tue, Sep 3, 2019 at 5:23 PM Barnaby Wilks <Barnaby.Wilks@arm.com> wrote:
>
>
>
> On 9/3/19 9:23 AM, Richard Biener wrote:
> > On Mon, 2 Sep 2019, Barnaby Wilks wrote:
> >
> >> Hello,
> >>
> >> This patch introduces an optimization for narrowing binary and builtin
> >> math operations to the smallest type when unsafe math optimizations are
> >> enabled (typically -Ofast or -ffast-math).
> >>
> >> Consider the example:
> >>
> >>     float f (float x) {
> >>       return 1.0 / sqrt (x);
> >>     }
> >>
> >>     f:
> >>       fcvt   d0, s0
> >>       fmov   d1, 1.0e+0
> >>       fsqrt  d0, d0
> >>       fdiv   d0, d1, d0
> >>       fcvt   s0, d0
> >>       ret
> >>
> >> Given that all outputs are of float type, we can do the whole
> >> calculation in single precision and avoid any potentially expensive
> >> conversions between single and double precision.
> >>
> >> Aka the expression would end up looking more like
> >>
> >>     float f (float x) {
> >>       return 1.0f / sqrtf (x);
> >>     }
> >>
> >>     f:
> >>       fsqrt  s0, s0
> >>       fmov   s1, 1.0e+0
> >>       fdiv   s0, s1, s0
> >>       ret
> >>
> >> This optimization will narrow casts around math builtins, and also
> >> not try to find the widest type for calculations when processing binary
> >> math operations (if unsafe math optimizations are enable).
> >>
> >> Added tests to verify that narrower math builtins are chosen and
> >> no unnecessary casts are introduced when appropriate.
> >>
> >> Bootstrapped and regtested on aarch64 and x86_64 with no regressions.
> >>
> >> I don't have write access, so if OK for trunk then can someone commit on
> >> my behalf?
> >
> > @@ -5004,10 +5004,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >                && newtype == type
> >                && types_match (newtype, type))
> >              (op (convert:newtype @1) (convert:newtype @2))
> > -           (with { if (TYPE_PRECISION (ty1) > TYPE_PRECISION (newtype))
> > +           (with
> > +             {
> > +               if (!flag_unsafe_math_optimizations)
> > +                 {
> > +                   if (TYPE_PRECISION (ty1) > TYPE_PRECISION (newtype))
> >                        newtype = ty1;
> > +
> >                      if (TYPE_PRECISION (ty2) > TYPE_PRECISION (newtype))
> > -                     newtype = ty2; }
> > +                     newtype = ty2;
> > +                 }
> > +             }
> > +
> >                 /* Sometimes this transformation is safe (cannot
> >                    change results through affecting double rounding
> >                    cases) and sometimes it is not.  If NEWTYPE is
> >
> > The ChangeLog doesn't mention this change and I wonder what it is
> > for - later flag_unsafe_math_optimizations is checked, in particular
> >
> >                     && (flag_unsafe_math_optimizations
> >                         || (TYPE_PRECISION (newtype) == TYPE_PRECISION
> > (type)
> >                             && real_can_shorten_arithmetic (TYPE_MODE
> > (itype),
> >                                                             TYPE_MODE
> > (type))
> >                             && !excess_precision_type (newtype)))
> >
> > note the !excess_precision_type (newtype) which you fail to check
> > below.
>
> This change prevents the pattern from casting the operands to the widest
> type (the widest between the type of the operands and the type of the
> expression as a whole) if unsafe math optimizations are enabled.
> Whereas the second check of flag_unsafe_math_optimizations is a shortcut
> to enable the transformation as a whole, and does not affect the type
> that the operands are being cast to.
>
> Without the first check then the expression will always use the widest
> type, resulting in unnecessary casts. For example
>
>    float f (float x, float y) {
>      double z = 1.0 / x;
>      return z * y;
>    }
>
> Will generate (With -Ofast)
>
>    float D.3459;
>    double z;
>
>    _1 = (double) x;
>    z = 1.0e+0 / _1;
>    _2 = (double) y;
>    _3 = z * _2;
>    D.3459 = (float) _3;
>    return D.3459;
>
> Note how the parameters are cast to doubles, the whole calculation is
> done in double precision and then cast out to single precision at the
> end. (because double is the widest type in the expression)
>
> Whereas if you include the first flag_unsafe_math_optimizations check,
> and prevent the widening then you get
>
>    float D.3459;
>    double z;
>
>    _1 = (double) x;
>    z = 1.0e+0 / _1;
>    _2 = (float) z;
>    D.3459 = y * _2;
>    return D.3459;
>
> Where only "double z = 1.0 / x" happens in double precision, and the
> rest of the calculation is done in single precision, reducing the amount
> of casts.
>
> The benefits here can be seen in the generated code:
> Without the flag_unsafe_math_optimizations check
>
>          fcvt    d1, s1
>          fcvt    d0, s0
>          fdiv    d0, d1, d0
>          fcvt    s0, d0
>          ret
>
> With the check
>
>          fdiv    s0, s1, s0
>          ret

I see.  Can you please propose this change independently with a
separate testcase?

> >
> > @@ -5654,3 +5662,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >   (simplify
> >    (vec_perm vec_same_elem_p@0 @0 @1)
> >    @0)
> > +
> > +/* Convert expressions of the form
> > +   (x) math_call1 ((y) z) where (x) and z are the same type, into
> > +   math_call2 (z), where math_call2 is the math builtin for
> > +   type x.  Type x (and therefore type of z) must be a lower precision
> > +   than y/math_call1.  */
> > +(if (flag_unsafe_math_optimizations && !flag_errno_math)
> > +  (for op (COSH EXP EXP10 EXP2 EXPM1 GAMMA J0 J1 LGAMMA
> > +          POW10 SINH TGAMMA Y0 Y1 ACOS ACOSH ASIN ASINH
> > +          ATAN ATANH CBRT COS ERF ERFC LOG LOG10 LOG2
> > +          LOG1P SIN TAN TANH SQRT FABS LOGB)
> > +    (simplify
> > +      (convert (op@0 (convert@1 @2)))
> > +       (if (SCALAR_FLOAT_TYPE_P (type) && SCALAR_FLOAT_TYPE_P (TREE_TYPE
> > (@1))
> > +             && SCALAR_FLOAT_TYPE_P (TREE_TYPE (@2))
> > +             && types_match (type, TREE_TYPE (@2))
> > +             && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@1)))
> > +         (with { enum built_in_function fcode = builtin_mathfn_code (@0);
> > +                 tree fn = mathfn_built_in (type, fcode, false); }
> > +           (if (fn)
> > +             (convert { build_call_expr (fn, 1, @2); })))))))
> >
> > This (convert { build_call_expr (..) } ) only works on GENERIC.
>
> I didn't realize that build_call_expr only works on GENERIC, I took
> that snippet from convert_to_real_1 in convert.c, which does a similar
> thing, but is not very generic and wont cover all the cases.
> I suppose this doesn't matter if the transformation is being moved out
> of match.pd into the backprop pass, or if not is there a more generic
> (if you excuse the pun) way to create function call nodes for GIMPLE &
> GENERIC?

Not as a match.pd "C expression", no.  You'd have to work like I outlined below.

> > I also wonder why you needed the mathfn_built_in change.
>
> Because some builtins are not marked as implicit - or more specifically
> reserved in C90 and actually specified in C99.
> To be honest, I wasn't really happy with doing this, as I'm not sure of
> it's implications, so if you have a better way to do this then that
> would be much appreciated?
>  From what I can tell its the float versions of the math builtins that
> are not implicit, and these are the functions that are most commonly
> needed to narrow to.

The issue with "reserved" function is that if the program doesn't
contain a suitable prototype or a call to such function we have to
avoid introducing it ourselves.  I admit we don't have very good
solutions to this, but at gimplification time we do

      /* If we see a call to a declared builtin or see its address
         being taken (we can unify those cases here) then we can mark
         the builtin for implicit generation by GCC.  */
      if (TREE_CODE (op0) == FUNCTION_DECL
          && fndecl_built_in_p (op0, BUILT_IN_NORMAL)
          && builtin_decl_declared_p (DECL_FUNCTION_CODE (op0)))
        set_builtin_decl_implicit_p (DECL_FUNCTION_CODE (op0), true);

so for testcases you have to make sure to provide appropriate declarations.

> > If you look at other examples in match.pd you'd see you should have
> > used sth like
> >
> >   (for op (BUILT_IN_COSH BUILT_IN_EXP ...)
> >        opf (BUILT_IN_COSHF BUILT_IN_EXPF ...)
> >     (simplify
> > ...
> >        (if (types_match (type, float_type_node))
> >          (opf @2)))
> >
> > and you have to repeat this for the COSHL (long double) case
> > with appropriate opd and opf lists.  In theory, if we'd extend
> > genmatch to 'transform' builtin function kinds that could be
> > done prettier like for example with
>
> Why would this need to be the case? The code already does practially the
> same thing by matching on all the builtins and then transforming down to
> the narrowest type with the builtin_mathfn_code/mathfn_built_in combination.

Because of the issue that you have to create the calls appropriately

> >   (for op (COSH EXP ...)
> >    (simplify
> > ...
> >     (op:type @2))
> >
> > which I'd kind-of like.  Note it's not as simple as passing
> > 'type' to mathfn_built_in since that expects literal
> > double_type_node and friends but we could use a {gimple,generic}-match.c
> > private helper for that.
>
> Would "type" not be a double_type_node (or related) literal already?
> If mathfn_built_in does not recognise the given type then it will just
> spit out NULL_TREE, which is checked by:
>
>    +                 tree fn = mathfn_built_in (type, fcode, false); }
>    +           (if (fn)
>    +             (convert { build_call_expr (fn, 1, @2); })))))))
>
>
> Apologies if any of these seem obvious questions - I'm quite new to GCC
> internals.

I was just talking about how to actually extend genmatch to recognize
the (op:type ...) for builtin function calls, automagically turning
sin to sinf if type is a single-precision FP type and said that simply
using mathfn_built_in with 'type' likely will end up with NULL_TREE
as result more often than necessary, so we have to either enhance
mathfn_built_in (maybe compare modes?) map to the canonical
types mathfn_built_in checks for (you'd have to consider systems
where double is equal to long double but they are still different
types).

Richard.

> Regards,
> Barney
>
> > Now - as a general comment I think adding this kind of narrowing is
> > good but doing it via match.pd patterns is quite limiting - eventually
> > the backprop pass would be a fit for propagating "needed precision"
> > and narrowing feeding stmts accordingly in a more general way?
> > Richard can probably tell quickest if it is feasible in that framework.
> >
> > Thanks,
> > Richard.
> >
> >
> >> Regards,
> >> Barney
> >>
> >> gcc/ChangeLog:
> >>
> >> 2019-09-02  Barnaby Wilks  <barnaby.wilks@arm.com>
> >>
> >>      * builtins.c (mathfn_built_in): Expose find implicit builtin parameter.
> >>      * builtins.h (mathfn_built_in): Likewise.
> >>      * match.pd: Add expressions for simplifying builtin and binary
> >>      math expressions.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> 2019-09-02  Barnaby Wilks  <barnaby.wilks@arm.com>
> >>
> >>      * gcc.dg/fold-single-precision.c: New test.
> >>
> >

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-09-05  9:50 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-02 17:29 [PATCH][GCC] Simplify to single precision where possible for binary/builtin maths operations Barnaby Wilks
2019-09-03  8:23 ` Richard Biener
2019-09-03 14:19   ` Richard Sandiford
2019-09-03 15:23   ` Barnaby Wilks
2019-09-05  9:50     ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).