public inbox for gcc-patches@gcc.gnu.org
* [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate
@ 2018-07-17 12:35 Kyrill Tkachov
  2018-07-17 13:27 ` Richard Biener
  0 siblings, 1 reply; 21+ messages in thread
From: Kyrill Tkachov @ 2018-07-17 12:35 UTC (permalink / raw)
  To: fortran, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2176 bytes --]

Hi all,

This is my first Fortran patch, so apologies if I'm missing something.
The current expansion of the min and max intrinsics explicitly expands
the comparisons between each argument to calculate the global min/max.
Some targets, like aarch64, have instructions that can calculate the min/max
of two real (floating-point) numbers with the proper NaN-handling semantics
(if both inputs are NaN, return NaN. If one is NaN, return the other) and those
are the semantics provided by the __builtin_fmin/max family of functions that expand
to these instructions.
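
As a rough illustration of the NaN handling described above, here is a minimal C sketch that checks the behaviour of the standard fmax/fmin functions. It is not part of the patch; the helper names fminmax_ignores_single_nan and fminmax_nan_when_all_nan are made up for this example, and it assumes the code is built without -ffast-math.

```c
#include <math.h>
#include <stdbool.h>

/* True when fmax/fmin treat a single NaN operand as missing data:
   the numeric operand X comes back regardless of argument order.  */
bool
fminmax_ignores_single_nan (double x)
{
  return fmax (x, NAN) == x && fmax (NAN, x) == x
	 && fmin (x, NAN) == x && fmin (NAN, x) == x;
}

/* True when NaN is returned once both operands are NaN.  */
bool
fminmax_nan_when_all_nan (void)
{
  return isnan (fmax (NAN, NAN)) && isnan (fmin (NAN, NAN));
}
```

Both helpers are expected to return true on an IEEE 754 target, which matches the "return the other operand" rule quoted in the text.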

This patch makes the frontend emit __builtin_fmin/max directly to compare each
pair of numbers when the numbers are floating-point, and use MIN_EXPR/MAX_EXPR otherwise
(integral types and -ffast-math) which should hopefully be easier to recognise in the
midend and optimise. The previous approach of generating the open-coded version of that
is used when we don't have an appropriate __builtin_fmin/max available.
For example, for a configuration of x86_64-unknown-linux-gnu that I tested there was no
128-bit __builtin_fminl available.

With this patch I'm seeing more than 7000 FMINNM/FMAXNM instructions being generated at -O3
on aarch64 for 521.wrf from fprate SPEC2017 where none before were generated
(we were generating explicit comparisons and NaN checks). This gave a 2.4% improvement
in performance on a Cortex-A72.

Bootstrapped and tested on aarch64-none-linux-gnu and x86_64-unknown-linux-gnu.

Ok for trunk?
Thanks,
Kyrill

2018-07-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * f95-lang.c (gfc_init_builtin_functions): Define __builtin_fmin,
     __builtin_fminf, __builtin_fminl, __builtin_fmax, __builtin_fmaxf,
     __builtin_fmaxl.
     * trans-intrinsic.c: Include builtins.h.
     (gfc_conv_intrinsic_minmax): Emit __builtin_fmin/max or MIN/MAX_EXPR
     functions to calculate the min/max.

2018-07-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gfortran.dg/max_fmaxf.f90: New test.
     * gfortran.dg/min_fminf.f90: Likewise.
     * gfortran.dg/minmax_integer.f90: Likewise.
     * gfortran.dg/max_fmaxl_aarch64.f90: Likewise.
     * gfortran.dg/min_fminl_aarch64.f90: Likewise.

[-- Attachment #2: fort-fmin.patch --]
[-- Type: text/x-patch, Size: 9835 bytes --]

diff --git a/gcc/fortran/f95-lang.c b/gcc/fortran/f95-lang.c
index 0f39f0ca788ea9e5868d4718c5f90c102958968f..5dd58f3d3d0242d77a5838ffa0395e7b941c8f85 100644
--- a/gcc/fortran/f95-lang.c
+++ b/gcc/fortran/f95-lang.c
@@ -724,6 +724,20 @@ gfc_init_builtin_functions (void)
   gfc_define_builtin ("__builtin_roundf", mfunc_float[0], 
 		      BUILT_IN_ROUNDF, "roundf", ATTR_CONST_NOTHROW_LEAF_LIST);
 
+  gfc_define_builtin  ("__builtin_fmin", mfunc_double[1],
+		      BUILT_IN_FMIN, "fmin", ATTR_CONST_NOTHROW_LEAF_LIST);
+  gfc_define_builtin  ("__builtin_fminf", mfunc_float[1],
+		      BUILT_IN_FMINF, "fminf", ATTR_CONST_NOTHROW_LEAF_LIST);
+  gfc_define_builtin  ("__builtin_fminl", mfunc_longdouble[1],
+		      BUILT_IN_FMINL, "fminl", ATTR_CONST_NOTHROW_LEAF_LIST);
+
+  gfc_define_builtin  ("__builtin_fmax", mfunc_double[1],
+		      BUILT_IN_FMAX, "fmax", ATTR_CONST_NOTHROW_LEAF_LIST);
+  gfc_define_builtin  ("__builtin_fmaxf", mfunc_float[1],
+		      BUILT_IN_FMAXF, "fmaxf", ATTR_CONST_NOTHROW_LEAF_LIST);
+  gfc_define_builtin  ("__builtin_fmaxl", mfunc_longdouble[1],
+		      BUILT_IN_FMAXL, "fmaxl", ATTR_CONST_NOTHROW_LEAF_LIST);
+
   gfc_define_builtin ("__builtin_truncl", mfunc_longdouble[0],
 		      BUILT_IN_TRUNCL, "truncl", ATTR_CONST_NOTHROW_LEAF_LIST);
   gfc_define_builtin ("__builtin_trunc", mfunc_double[0],
diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index d306e3a5a6209c1621d91f99ffc366acecd9c3d0..5dde54a3f3c2606f987b42480b1921e6304ccda5 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "trans.h"
 #include "stringpool.h"
 #include "fold-const.h"
+#include "builtins.h"
 #include "tree-nested.h"
 #include "stor-layout.h"
 #include "toplev.h"	/* For rest_of_decl_compilation.  */
@@ -3874,14 +3875,13 @@ gfc_conv_intrinsic_ttynam (gfc_se * se, gfc_expr * expr)
     minmax (a1, a2, a3, ...)
     {
       mvar = a1;
-      if (a2 .op. mvar || isnan (mvar))
-        mvar = a2;
-      if (a3 .op. mvar || isnan (mvar))
-        mvar = a3;
+      __builtin_fmin/max (mvar, a2);
+      __builtin_fmin/max (mvar, a3);
       ...
-      return mvar
+      return mvar;
     }
- */
+   For integral types or when we don't care about NaNs use
+   MIN/MAX_EXPRs.  */
 
 /* TODO: Mismatching types can occur when specific names are used.
    These should be handled during resolution.  */
@@ -3891,7 +3891,6 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
   tree tmp;
   tree mvar;
   tree val;
-  tree thencase;
   tree *args;
   tree type;
   gfc_actual_arglist *argexpr;
@@ -3912,55 +3911,79 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
 
   mvar = gfc_create_var (type, "M");
   gfc_add_modify (&se->pre, mvar, args[0]);
-  for (i = 1, argexpr = argexpr->next; i < nargs; i++)
-    {
-      tree cond, isnan;
 
+  tree builtin = NULL_TREE;
+  if (SCALAR_FLOAT_TYPE_P (type))
+    builtin = mathfn_built_in (type, op == GT_EXPR
+				     ? BUILT_IN_FMAX : BUILT_IN_FMIN);
+
+  for (i = 1, argexpr = argexpr->next; i < nargs; i++, argexpr = argexpr->next)
+    {
+      tree cond = NULL_TREE;
       val = args[i];
 
       /* Handle absent optional arguments by ignoring the comparison.  */
       if (argexpr->expr->expr_type == EXPR_VARIABLE
 	  && argexpr->expr->symtree->n.sym->attr.optional
 	  && TREE_CODE (val) == INDIRECT_REF)
-	cond = fold_build2_loc (input_location,
+	{
+	  cond = fold_build2_loc (input_location,
 				NE_EXPR, logical_type_node,
 				TREE_OPERAND (val, 0),
 			build_int_cst (TREE_TYPE (TREE_OPERAND (val, 0)), 0));
-      else
-      {
-	cond = NULL_TREE;
-
+	}
+      else if (!VAR_P (val) && !TREE_CONSTANT (val))
 	/* Only evaluate the argument once.  */
-	if (!VAR_P (val) && !TREE_CONSTANT (val))
-	  val = gfc_evaluate_now (val, &se->pre);
-      }
+	val = gfc_evaluate_now (val, &se->pre);
 
-      thencase = build2_v (MODIFY_EXPR, mvar, convert (type, val));
+      tree calc;
+      /* If we are dealing with integral types or we don't care about NaNs
+	 just do a MIN/MAX_EXPR.  */
+      if (!HONOR_NANS (type))
+	{
+
+	  tree_code code = op == GT_EXPR ? MAX_EXPR : MIN_EXPR;
+	  calc = fold_build2_loc (input_location, code, type,
+				  convert (type, val), mvar);
+	  tmp = build2_v (MODIFY_EXPR, mvar, calc);
 
-      tmp = fold_build2_loc (input_location, op, logical_type_node,
-			     convert (type, val), mvar);
+	}
+      /* If we care about NaNs and we have __builtin_fmin/max builtins
+	 to perform the comparison, use those.  */
+      else if (SCALAR_FLOAT_TYPE_P (type) && builtin)
+	{
+	  calc = build_call_expr_loc (input_location, builtin,
+				      2, mvar, convert (type, val));
+	  tmp = build2_v (MODIFY_EXPR, mvar, calc);
 
-      /* FIXME: When the IEEE_ARITHMETIC module is implemented, the call to
-	 __builtin_isnan might be made dependent on that module being loaded,
-	 to help performance of programs that don't rely on IEEE semantics.  */
-      if (FLOAT_TYPE_P (TREE_TYPE (mvar)))
+	}
+      /* Otherwise expand to:
+	mvar = a1;
+	if (a2 .op. mvar || isnan (mvar))
+	  mvar = a2;
+	if (a3 .op. mvar || isnan (mvar))
+	  mvar = a3;
+	...  */
+      else
 	{
-	  isnan = build_call_expr_loc (input_location,
-				       builtin_decl_explicit (BUILT_IN_ISNAN),
-				       1, mvar);
+	  tree isnan = build_call_expr_loc (input_location,
+					builtin_decl_explicit (BUILT_IN_ISNAN),
+					1, mvar);
+	  tmp = fold_build2_loc (input_location, op, logical_type_node,
+				 convert (type, val), mvar);
+
 	  tmp = fold_build2_loc (input_location, TRUTH_OR_EXPR,
-				 logical_type_node, tmp,
-				 fold_convert (logical_type_node, isnan));
+				  logical_type_node, tmp,
+				  fold_convert (logical_type_node, isnan));
+	  tmp = build3_v (COND_EXPR, tmp,
+			  build2_v (MODIFY_EXPR, mvar, convert (type, val)),
+			  build_empty_stmt (input_location));
 	}
-      tmp = build3_v (COND_EXPR, tmp, thencase,
-		      build_empty_stmt (input_location));
 
       if (cond != NULL_TREE)
 	tmp = build3_v (COND_EXPR, cond, tmp,
 			build_empty_stmt (input_location));
-
       gfc_add_expr_to_block (&se->pre, tmp);
-      argexpr = argexpr->next;
     }
   se->expr = mvar;
 }
diff --git a/gcc/testsuite/gfortran.dg/max_fmaxf.f90 b/gcc/testsuite/gfortran.dg/max_fmaxf.f90
new file mode 100644
index 0000000000000000000000000000000000000000..9082d4a7e70378efb36e24e3e575f11a74631657
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/max_fmaxf.f90
@@ -0,0 +1,15 @@
+! { dg-do compile }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine foo (a, b, c, d, e, f, g, h)
+  real (kind=8) :: a, b, c, d, e, f, g, h
+  a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+subroutine foof (a, b, c, d, e, f, g, h)
+  real (kind=4) :: a, b, c, d, e, f, g, h
+  a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "__builtin_fmax " 7 "optimized" } }
+! { dg-final { scan-tree-dump-times "__builtin_fmaxf " 7 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90 b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
new file mode 100644
index 0000000000000000000000000000000000000000..8c8ea063e5d0718dc829c1f5574c5b46040e6786
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
@@ -0,0 +1,9 @@
+! { dg-do compile { target aarch64*-*-* } }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine fool (a, b, c, d, e, f, g, h)
+  real (kind=16) :: a, b, c, d, e, f, g, h
+  a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "__builtin_fmaxl " 7 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/min_fminf.f90 b/gcc/testsuite/gfortran.dg/min_fminf.f90
new file mode 100644
index 0000000000000000000000000000000000000000..6b929611f1f166877a19c475704a9312edf7b170
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/min_fminf.f90
@@ -0,0 +1,15 @@
+! { dg-do compile }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine foo (a, b, c, d, e, f, g, h)
+  real (kind=8) :: a, b, c, d, e, f, g, h
+  a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+subroutine foof (a, b, c, d, e, f, g, h)
+  real (kind=4) :: a, b, c, d, e, f, g, h
+  a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "__builtin_fmin " 7 "optimized" } }
+! { dg-final { scan-tree-dump-times "__builtin_fminf " 7 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90 b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
new file mode 100644
index 0000000000000000000000000000000000000000..92368917fb48e0c468a16d080ab3a9ac842e01a7
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
@@ -0,0 +1,9 @@
+! { dg-do compile { target aarch64*-*-* } }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine fool (a, b, c, d, e, f, g, h)
+  real (kind=16) :: a, b, c, d, e, f, g, h
+  a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "__builtin_fminl " 7 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/minmax_integer.f90 b/gcc/testsuite/gfortran.dg/minmax_integer.f90
new file mode 100644
index 0000000000000000000000000000000000000000..5b6be38c7055ce4e8620cf75ec7d8a182436b24f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/minmax_integer.f90
@@ -0,0 +1,15 @@
+! { dg-do compile }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine foo (a, b, c, d, e, f, g, h)
+  integer (kind=4) :: a, b, c, d, e, f, g, h
+  a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+subroutine foof (a, b, c, d, e, f, g, h)
+  integer (kind=4) :: a, b, c, d, e, f, g, h
+  a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "MIN_EXPR" 7 "optimized" } }
+! { dg-final { scan-tree-dump-times "MAX_EXPR" 7 "optimized" } }

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate
  2018-07-17 12:35 [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate Kyrill Tkachov
@ 2018-07-17 13:27 ` Richard Biener
  2018-07-17 13:46   ` Kyrill Tkachov
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Biener @ 2018-07-17 13:27 UTC (permalink / raw)
  To: kyrylo.tkachov; +Cc: fortran, GCC Patches

On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
>
> Hi all,
>
> This is my first Fortran patch, so apologies if I'm missing something.
> The current expansion of the min and max intrinsics explicitly expands
> the comparisons between each argument to calculate the global min/max.
> Some targets, like aarch64, have instructions that can calculate the min/max
> of two real (floating-point) numbers with the proper NaN-handling semantics
> (if both inputs are NaN, return NaN. If one is NaN, return the other) and those
> are the semantics provided by the __builtin_fmin/max family of functions that expand
> to these instructions.
>
> This patch makes the frontend emit __builtin_fmin/max directly to compare each
> pair of numbers when the numbers are floating-point, and use MIN_EXPR/MAX_EXPR otherwise
> (integral types and -ffast-math) which should hopefully be easier to recognise in the

What is Fortran's requirement on min/max intrinsics?  Doesn't it only require
things that are guaranteed by MIN/MAX_EXPR anyway?  The only restriction here is

/* Minimum and maximum values.  When used with floating point, if both
   operands are zeros, or if either operand is NaN, then it is unspecified
   which of the two operands is returned as the result.  */

which means MIN/MAX_EXPR are not strictly IEEE compliant with signed
zeros or NaNs.
Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS if signed
zeros are significant.
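
To make the signed-zero concern concrete, here is a small C sketch. It assumes one possible lowering of MAX_EXPR, a plain comparison and select; the function names max_by_compare and zero_sign_depends_on_order are hypothetical and do not come from GCC.

```c
#include <math.h>
#include <stdbool.h>

/* Comparison-based maximum, one way a MAX_EXPR could be lowered.  */
double
max_by_compare (double a, double b)
{
  return a > b ? a : b;
}

/* +0.0 and -0.0 compare equal, so the sign of the result follows
   the operand order rather than any IEEE rule.  */
bool
zero_sign_depends_on_order (void)
{
  double r1 = max_by_compare (0.0, -0.0);	/* 0.0 > -0.0 is false: yields -0.0.  */
  double r2 = max_by_compare (-0.0, 0.0);	/* -0.0 > 0.0 is false: yields +0.0.  */
  return signbit (r1) != 0 && signbit (r2) == 0;
}
```

Because either zero may come back, a guard that only checks for NaNs is not enough when the sign of zero matters.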

I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR is a good
idea; this may both generate bigger code and be slower.

Richard.

> midend and optimise. The previous approach of generating the open-coded version of that
> is used when we don't have an appropriate __builtin_fmin/max available.
> For example, for a configuration of x86_64-unknown-linux-gnu that I tested there was no
> 128-bit __builtin_fminl available.
>
> With this patch I'm seeing more than 7000 FMINNM/FMAXNM instructions being generated at -O3
> on aarch64 for 521.wrf from fprate SPEC2017 where none before were generated
> (we were generating explicit comparisons and NaN checks). This gave a 2.4% improvement
> in performance on a Cortex-A72.
>
> Bootstrapped and tested on aarch64-none-linux-gnu and x86_64-unknown-linux-gnu.
>
> Ok for trunk?
> Thanks,
> Kyrill
>
> 2018-07-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>      * f95-lang.c (gfc_init_builtin_functions): Define __builtin_fmin,
>      __builtin_fminf, __builtin_fminl, __builtin_fmax, __builtin_fmaxf,
>      __builtin_fmaxl.
>      * trans-intrinsic.c: Include builtins.h.
>      (gfc_conv_intrinsic_minmax): Emit __builtin_fmin/max or MIN/MAX_EXPR
>      functions to calculate the min/max.
>
> 2018-07-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>      * gfortran.dg/max_fmaxf.f90: New test.
>      * gfortran.dg/min_fminf.f90: Likewise.
>      * gfortran.dg/minmax_integer.f90: Likewise.
>      * gfortran.dg/max_fmaxl_aarch64.f90: Likewise.
>      * gfortran.dg/min_fminl_aarch64.f90: Likewise.


* Re: [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate
  2018-07-17 13:27 ` Richard Biener
@ 2018-07-17 13:46   ` Kyrill Tkachov
  2018-07-17 15:37     ` Thomas Koenig
  2018-07-18  9:44     ` [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate Richard Biener
  0 siblings, 2 replies; 21+ messages in thread
From: Kyrill Tkachov @ 2018-07-17 13:46 UTC (permalink / raw)
  To: Richard Biener; +Cc: fortran, GCC Patches

Hi Richard,

On 17/07/18 14:27, Richard Biener wrote:
> On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov
> <kyrylo.tkachov@foss.arm.com> wrote:
>> Hi all,
>>
>> This is my first Fortran patch, so apologies if I'm missing something.
>> The current expansion of the min and max intrinsics explicitly expands
>> the comparisons between each argument to calculate the global min/max.
>> Some targets, like aarch64, have instructions that can calculate the min/max
>> of two real (floating-point) numbers with the proper NaN-handling semantics
>> (if both inputs are NaN, return NaN. If one is NaN, return the other) and those
>> are the semantics provided by the __builtin_fmin/max family of functions that expand
>> to these instructions.
>>
>> This patch makes the frontend emit __builtin_fmin/max directly to compare each
>> pair of numbers when the numbers are floating-point, and use MIN_EXPR/MAX_EXPR otherwise
>> (integral types and -ffast-math) which should hopefully be easier to recognise in the
> What is Fortran's requirement on min/max intrinsics?  Doesn't it only
> require things that
> are guaranteed by MIN/MAX_EXPR anyways?  The only restriction here is

The current implementation expands to:
     mvar = a1;
     if (a2 .op. mvar || isnan (mvar))
       mvar = a2;
     if (a3 .op. mvar || isnan (mvar))
       mvar = a3;
     ...
     return mvar;

That is, if one of the operands is a NaN it will return the other argument.
If both (all) are NaNs, it will return NaN. This is the same as the semantics of fmin/max
as far as I can tell.

> /* Minimum and maximum values.  When used with floating point, if both
>     operands are zeros, or if either operand is NaN, then it is unspecified
>     which of the two operands is returned as the result.  */
>
> which means MIN/MAX_EXPR are not strictly IEEE compliant with signed
> zeros or NaNs.
> Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS if signed
> zeros are significant.

True, MIN/MAX_EXPR would not be appropriate in that condition. I guarded their use
on !HONOR_NANS (type) only. I'll update it to !HONOR_SIGNED_ZEROS (type) && !HONOR_NANS (type).


>
> I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR
> is a good idea,
> this may both generate bigger code and be slower.

The patch will generate fmin/fmax calls (or the fminf, fminl variants) when mathfn_built_in advertises
them as available (does that mean they'll have a fast inline implementation?).

If the above doesn't hold and we can't use either MIN/MAX_EXPR or fmin/fmax then the patch falls back
to the existing expansion.
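
The three-way choice described in this message, with the guard updated as agreed above, could be modelled roughly as follows. This is a standalone sketch and not the GCC implementation: the enum, the function choose_lowering and its boolean parameters are all hypothetical stand-ins for SCALAR_FLOAT_TYPE_P, HONOR_NANS, HONOR_SIGNED_ZEROS and the result of mathfn_built_in.

```c
#include <stdbool.h>

/* Hypothetical labels for the three expansions discussed in the thread.  */
enum lowering
{
  USE_MINMAX_EXPR,	/* MIN_EXPR/MAX_EXPR.  */
  USE_FMINMAX_BUILTIN,	/* __builtin_fmin/fmax family.  */
  USE_OPEN_CODED	/* Comparison plus isnan check.  */
};

enum lowering
choose_lowering (bool is_float, bool honor_nans, bool honor_signed_zeros,
		 bool have_builtin)
{
  /* Integral types, or floats where neither NaNs nor signed zeros matter.  */
  if (!is_float || (!honor_nans && !honor_signed_zeros))
    return USE_MINMAX_EXPR;
  /* A suitable builtin exists for this floating-point type.  */
  if (have_builtin)
    return USE_FMINMAX_BUILTIN;
  /* Otherwise keep the existing open-coded expansion.  */
  return USE_OPEN_CODED;
}
```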

FWIW, this patch does improve performance on 521.wrf from SPEC2017 on aarch64.

Thanks,
Kyrill

>
> Richard.
>
>> midend and optimise. The previous approach of generating the open-coded version of that
>> is used when we don't have an appropriate __builtin_fmin/max available.
>> For example, for a configuration of x86_64-unknown-linux-gnu that I tested there was no
>> 128-bit __builtin_fminl available.
>>
>> With this patch I'm seeing more than 7000 FMINNM/FMAXNM instructions being generated at -O3
>> on aarch64 for 521.wrf from fprate SPEC2017 where none before were generated
>> (we were generating explicit comparisons and NaN checks). This gave a 2.4% improvement
>> in performance on a Cortex-A72.
>>
>> Bootstrapped and tested on aarch64-none-linux-gnu and x86_64-unknown-linux-gnu.
>>
>> Ok for trunk?
>> Thanks,
>> Kyrill
>>
>> 2018-07-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>       * f95-lang.c (gfc_init_builtin_functions): Define __builtin_fmin,
>>       __builtin_fminf, __builtin_fminl, __builtin_fmax, __builtin_fmaxf,
>>       __builtin_fmaxl.
>>       * trans-intrinsic.c: Include builtins.h.
>>       (gfc_conv_intrinsic_minmax): Emit __builtin_fmin/max or MIN/MAX_EXPR
>>       functions to calculate the min/max.
>>
>> 2018-07-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>       * gfortran.dg/max_fmaxf.f90: New test.
>>       * gfortran.dg/min_fminf.f90: Likewise.
>>       * gfortran.dg/minmax_integer.f90: Likewise.
>>       * gfortran.dg/max_fmaxl_aarch64.f90: Likewise.
>>       * gfortran.dg/min_fminl_aarch64.f90: Likewise.


* Re: [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate
  2018-07-17 13:46   ` Kyrill Tkachov
@ 2018-07-17 15:37     ` Thomas Koenig
  2018-07-17 16:16       ` Kyrill Tkachov
  2018-07-17 20:06       ` Janne Blomqvist
  2018-07-18  9:44     ` [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate Richard Biener
  1 sibling, 2 replies; 21+ messages in thread
From: Thomas Koenig @ 2018-07-17 15:37 UTC (permalink / raw)
  To: Kyrill Tkachov, Richard Biener; +Cc: fortran, GCC Patches

Hi Kyrill,

> The current implementation expands to:
>      mvar = a1;
>      if (a2 .op. mvar || isnan (mvar))
>        mvar = a2;
>      if (a3 .op. mvar || isnan (mvar))
>        mvar = a3;
>      ...
>      return mvar;
> 
> That is, if one of the operands is a NaN it will return the other argument.
> If both (all) are NaNs, it will return NaN. This is the same as the 
> semantics of fmin/max
> as far as I can tell.

I've looked at the F2008 standard, and, interestingly enough, the
requirements on MIN and MAX do not mention NaNs at all. 13.7.106
has, for MAX,

Result Value. The value of the result is that of the largest argument.

plus some stuff about character variables (not relevant here).  Similar
for MIN.

Also, the section on IEEE_ARITHMETIC (14.9) does not mention
comparisons. It states that "Complete conformance with IEC 60559:1989 is
not required"; what is required is correct support for +, -, and *,
plus support for / if IEEE_SUPPORT_DIVIDE is true.

So, the Fortran standard does not impose many requirements. I do think
that a patch such as yours should not change the current behavior unless
we know what it does and do think it is a good idea.  Hmm...

Having said that, I think we pretty much cover all the corner cases
in nan_1.f90, so if that test passes without regression, then that
aspect should be fine.

Question: You have found an advantage on aarch64. Do you have
access to other architectures to see if there is also a speed
advantage, or maybe a disadvantage?

Regards

	Thomas


* Re: [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate
  2018-07-17 15:37     ` Thomas Koenig
@ 2018-07-17 16:16       ` Kyrill Tkachov
  2018-07-17 17:42         ` Thomas Koenig
  2018-07-17 20:06       ` Janne Blomqvist
  1 sibling, 1 reply; 21+ messages in thread
From: Kyrill Tkachov @ 2018-07-17 16:16 UTC (permalink / raw)
  To: Thomas Koenig, Richard Biener; +Cc: fortran, GCC Patches

Hi Thomas,

On 17/07/18 16:36, Thomas Koenig wrote:
> Hi Kyrill,
>
>> The current implementation expands to:
>>      mvar = a1;
>>      if (a2 .op. mvar || isnan (mvar))
>>        mvar = a2;
>>      if (a3 .op. mvar || isnan (mvar))
>>        mvar = a3;
>>      ...
>>      return mvar;
>>
>> That is, if one of the operands is a NaN it will return the other argument.
>> If both (all) are NaNs, it will return NaN. This is the same as the semantics of fmin/max
>> as far as I can tell.
>
> I've looked at the F2008 standard, and, interestingly enough, the
> requirements on MIN and MAX do not mention NaNs at all. 13.7.106
> has, for MAX,
>
> Result Value. The value of the result is that of the largest argument.
>
> plus some stuff about character variables (not relevant here). Similar
> for MIN.
>
> Also, the section on IEEE_ARITHMETIC (14.9) does not mention
> comparisons; also, "Complete conformance with IEC 60559:1989 is not
> required", what is required is the correct support for +,-, and *,
> plus support for / if IEEE_SUPPORT_DIVIDE is covered.
>

Thanks for checking this.

> So, the Fortran standard does not impose many requirements. I do think
> that a patch such as yours should not change the current behavior unless
> we know what it does and do think it is a good idea.  Hmm...
>
> Having said that, I think we pretty much cover all the corner cases
> in nan_1.f90, so if that test passes without regression, then that
> aspect should be fine.
>

Looking at the test, there appears to be a de facto expected behaviour.
For example it contains:
if (max(2.d0, nan) /= 2.d0) STOP 9

So it definitely expects a comparison with NaN to return the non-NaN result,
which is the behaviour that my patch preserves.

On integral arguments or when we don't care about NaNs (-Ofast and such) we'll be using
the MIN/MAX_EXPR, which doesn't specify what's returned on a NaN argument, thus allowing
for more aggressive optimisations.
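
A C analogue of that nan_1.f90 check shows the difference between the two behaviours. It is only a sketch: max_naive stands for one possible comparison-based lowering of MAX_EXPR, and both function names are invented for this example.

```c
#include <math.h>
#include <stdbool.h>

/* One possible lowering when NaN results are left unspecified.  */
double
max_naive (double a, double b)
{
  return a > b ? a : b;
}

/* Every ordered comparison with NaN is false, so the second operand
   wins and the result depends on the argument order.  */
bool
naive_max_nan_depends_on_order (void)
{
  return isnan (max_naive (2.0, NAN)) && max_naive (NAN, 2.0) == 2.0;
}

/* C counterpart of the Fortran check: fmax drops the NaN.  */
bool
fmax_passes_nan_check (void)
{
  return fmax (2.0, NAN) == 2.0 && fmax (NAN, 2.0) == 2.0;
}
```

Code that relies on the de facto behaviour therefore needs the fmax-style path whenever NaNs are honoured.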

> Question: You have found an advantage on aarch64. Do you have
> access to other architectures to see if there is also a speed
> advantage, or maybe a disadvantage?
>

Because the expansion now emits straight-line code rather than conditionals and branches,
it should be easier to optimise in general, so I'd expect this to be an improvement overall.
That said, I have only benchmarked it on SPEC2017 on aarch64.

If you have any benchmarks of interest that you (or somebody else) can run on a target that you
care about, I would be very grateful for any results.

Thanks,
Kyrill

> Regards
>
>     Thomas


* Re: [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate
  2018-07-17 16:16       ` Kyrill Tkachov
@ 2018-07-17 17:42         ` Thomas Koenig
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Koenig @ 2018-07-17 17:42 UTC (permalink / raw)
  To: Kyrill Tkachov, Richard Biener; +Cc: fortran, GCC Patches

Hi Kyrill,

> Because the expansion now emits straightline code rather than 
> conditionals and branches
> it should be easier to optimise in general, so I'd expect this to be an 
> improvement overall.
> That said, I have benchmarked it on SPEC2017 on aarch64.

> If you have any benchmarks of interest to you you (or somebody else) can 
> run on a target that you
> care about I would be very grateful for any results.

Well, most people currently use x86_64 for scientific computing, so I
would be concerned most about this architecture. As for the test case,
min / max performance clearly has an effect on 521.wrf, so this would
be ideal.

If you could run 521.wrf on x86_64, and find that it does not
regress measurably (or even shows an improvement), the patch is OK.
I'd be interested in the timings you get.

Regards

	Thomas


* Re: [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate
  2018-07-17 15:37     ` Thomas Koenig
  2018-07-17 16:16       ` Kyrill Tkachov
@ 2018-07-17 20:06       ` Janne Blomqvist
  2018-07-17 20:35         ` Janne Blomqvist
  1 sibling, 1 reply; 21+ messages in thread
From: Janne Blomqvist @ 2018-07-17 20:06 UTC (permalink / raw)
  To: Thomas Koenig; +Cc: Kyrill Tkachov, Richard Biener, fortran, GCC Patches

On Tue, Jul 17, 2018 at 6:36 PM, Thomas Koenig <tkoenig@netcologne.de> wrote:

> Hi Kyrill,
>
> The current implementation expands to:
>>      mvar = a1;
>>      if (a2 .op. mvar || isnan (mvar))
>>        mvar = a2;
>>      if (a3 .op. mvar || isnan (mvar))
>>        mvar = a3;
>>      ...
>>      return mvar;
>>
>> That is, if one of the operands is a NaN it will return the other
>> argument.
>> If both (all) are NaNs, it will return NaN. This is the same as the
>> semantics of fmin/max
>> as far as I can tell.
>>
>
> I've looked at the F2008 standard, and, interestingly enough, the
> requirements on MIN and MAX do not mention NaNs at all. 13.7.106
> has, for MAX,
>
> Result Value. The value of the result is that of the largest argument.
>
> plus some stuff about character variables (not relevant here).  Similar
> for MIN.
>

FWIW, this has not changed in the latest(?) draft for F2018 (N2146), see
16.9.125.

Also, the section on IEEE_ARITHMETIC (14.9) does not mention
> comparisons; also, "Complete conformance with IEC 60559:1989 is not
> required", what is required is the correct support for +,-, and *,
> plus support for / if IEEE_SUPPORT_DIVIDE is covered.
>

Interestingly, here the F2018 draft has new intrinsics in the
IEEE_ARITHMETIC module, IEEE_MAX_NUM, IEEE_MAX_NUM_MAG, IEEE_MIN_NUM,
IEEE_MIN_NUM_MAG. These correspond to the {max,min}num{,_mag} operations in
IEEE 754-2008, which AFAICT have the same NaN semantics as __builtin_fmax
etc.


> So, the Fortran standard does not impose many requirements.


If so, why don't we just use {MAX,MIN}_EXPR unconditionally? Those who
worry about the behavior wrt. NaNs, infinities etc. can use the intrinsics
from IEEE_ARITHMETIC?


This thread also has some interesting discussion on the topic:
https://github.com/JuliaLang/julia/issues/7866



-- 
Janne Blomqvist


* Re: [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate
  2018-07-17 20:06       ` Janne Blomqvist
@ 2018-07-17 20:35         ` Janne Blomqvist
  2018-07-18 11:17           ` [PATCH][Fortran][v2] Use MIN/MAX_EXPR for min/max intrinsics Kyrill Tkachov
  0 siblings, 1 reply; 21+ messages in thread
From: Janne Blomqvist @ 2018-07-17 20:35 UTC (permalink / raw)
  To: Thomas Koenig; +Cc: Kyrill Tkachov, Richard Biener, fortran, GCC Patches

On Tue, Jul 17, 2018 at 11:06 PM, Janne Blomqvist <blomqvist.janne@gmail.com> wrote:

> On Tue, Jul 17, 2018 at 6:36 PM, Thomas Koenig <tkoenig@netcologne.de>
> wrote:
>
>> Hi Kyrill,
>>
>> The current implementation expands to:
>>>      mvar = a1;
>>>      if (a2 .op. mvar || isnan (mvar))
>>>        mvar = a2;
>>>      if (a3 .op. mvar || isnan (mvar))
>>>        mvar = a3;
>>>      ...
>>>      return mvar;
>>>
>>> That is, if one of the operands is a NaN it will return the other
>>> argument.
>>> If both (all) are NaNs, it will return NaN. This is the same as the
>>> semantics of fmin/max
>>> as far as I can tell.
>>>
>>
>> I've looked at the F2008 standard, and, interestingly enough, the
>> requirements on MIN and MAX do not mention NaNs at all. 13.7.106
>> has, for MAX,
>>
>> Result Value. The value of the result is that of the largest argument.
>>
>> plus some stuff about character variables (not relevant here).  Similar
>> for MIN.
>>
>
> FWIW, this has not changed in the latest(?) draft for F2018 (N2146), see
> 16.9.125.
>
> Also, the section on IEEE_ARITHMETIC (14.9) does not mention
>> comparisons; also, "Complete conformance with IEC 60559:1989 is not
>> required", what is required is the correct support for +,-, and *,
>> plus support for / if IEEE_SUPPORT_DIVIDE is covered.
>>
>
> Interestingly, here the F2018 draft has new intrinsics in the
> IEEE_ARITHMETIC module, IEEE_MAX_NUM, IEEE_MAX_NUM_MAG, IEEE_MIN_NUM,
> IEEE_MIN_NUM_MAG. These correspond to the {max,min}num{,_mag} operations in
> IEEE 754-2008, which AFAICT has the same NaN semantics as __builtin_fmax
> etc.
>
>
>> So, the Fortran standard does not impose many requirements.
>
>
> If so, why don't we just use {MAX,MIN}_EXPR unconditionally? Those who
> worry about the behavior wrt. NaNs, infinities etc. can use the intrinsics
> from IEEE_ARITHMETIC?
>
>
> This thread also has some interesting discussion on the topic:
> https://github.com/JuliaLang/julia/issues/7866
>

Oh, and on http://754r.ucbtest.org/ there is information about the next
update after IEEE 754-2008. In particular,
http://754r.ucbtest.org/changes.html notes that the above-mentioned
{max,min}num{,_mag} operations have been deleted, and "new
{min,max}imum{,Number,Magnitude,MagnitudeNumber} operations are
recommended; NaN and signed zero handling are changed from 754-2008 5.3.1."


-- 
Janne Blomqvist

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate
  2018-07-17 13:46   ` Kyrill Tkachov
  2018-07-17 15:37     ` Thomas Koenig
@ 2018-07-18  9:44     ` Richard Biener
  2018-07-18  9:50       ` Kyrill Tkachov
  1 sibling, 1 reply; 21+ messages in thread
From: Richard Biener @ 2018-07-18  9:44 UTC (permalink / raw)
  To: kyrylo.tkachov; +Cc: fortran, GCC Patches

On Tue, Jul 17, 2018 at 3:46 PM Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
>
> Hi Richard,
>
> On 17/07/18 14:27, Richard Biener wrote:
> > On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov
> > <kyrylo.tkachov@foss.arm.com> wrote:
> >> Hi all,
> >>
> >> This is my first Fortran patch, so apologies if I'm missing something.
> >> The current expansion of the min and max intrinsics explicitly expands
> >> the comparisons between each argument to calculate the global min/max.
> >> Some targets, like aarch64, have instructions that can calculate the min/max
> >> of two real (floating-point) numbers with the proper NaN-handling semantics
> >> (if both inputs are NaN, return NaN. If one is NaN, return the other) and those
> >> are the semantics provided by the __builtin_fmin/max family of functions that expand
> >> to these instructions.
> >>
> >> This patch makes the frontend emit __builtin_fmin/max directly to compare each
> >> pair of numbers when the numbers are floating-point, and use MIN_EXPR/MAX_EXPR otherwise
> >> (integral types and -ffast-math) which should hopefully be easier to recognise in the
> > What is Fortran's requirement on min/max intrinsics?  Doesn't it only
> > require things that
> > are guaranteed by MIN/MAX_EXPR anyways?  The only restriction here is
>
> The current implementation expands to:
>      mvar = a1;
>      if (a2 .op. mvar || isnan (mvar))
>        mvar = a2;
>      if (a3 .op. mvar || isnan (mvar))
>        mvar = a3;
>      ...
>      return mvar;
>
> That is, if one of the operands is a NaN it will return the other argument.
> If both (all) are NaNs, it will return NaN. This is the same as the semantics of fmin/max
> as far as I can tell.
>
> > /* Minimum and maximum values.  When used with floating point, if both
> >     operands are zeros, or if either operand is NaN, then it is unspecified
> >     which of the two operands is returned as the result.  */
> >
> > which means MIN/MAX_EXPR are not strictly IEEE compliant with signed
> > zeros or NaNs.
> > Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS if signed
> > zeros are significant.
>
> True, MIN/MAX_EXPR would not be appropriate in that condition. I guarded their use
> on !HONOR_NANS (type) only. I'll update it to !HONOR_SIGNED_ZEROS (type) && !HONOR_NANS (type).
>
>
> >
> > I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR
> > is a good idea,
> > this may both generate bigger code and be slower.
>
> The patch will generate fmin/fmax calls (or the fminf,fminl variants) when mathfn_built_in advertises
> them as available (does that mean they'll have a fast inline implementation?)

This doesn't mean anything given you make them available with your
patch ;)  So I expect it may
cause issues for !c99_runtime targets (and long double at least).

> If the above doesn't hold and we can't use either MIN/MAX_EXPR or fmin/fmax then the patch falls back
> to the existing expansion.

As said I would not use fmin/fmax calls here at all.

> FWIW, this patch does improve performance on 521.wrf from SPEC2017 on aarch64.

You said that, yes.  Even without -ffast-math?

Richard.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate
  2018-07-18  9:44     ` [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate Richard Biener
@ 2018-07-18  9:50       ` Kyrill Tkachov
  2018-07-18 10:06         ` Richard Biener
  0 siblings, 1 reply; 21+ messages in thread
From: Kyrill Tkachov @ 2018-07-18  9:50 UTC (permalink / raw)
  To: Richard Biener; +Cc: fortran, GCC Patches


On 18/07/18 10:44, Richard Biener wrote:
> On Tue, Jul 17, 2018 at 3:46 PM Kyrill Tkachov
> <kyrylo.tkachov@foss.arm.com> wrote:
>> Hi Richard,
>>
>> On 17/07/18 14:27, Richard Biener wrote:
>>> On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov
>>> <kyrylo.tkachov@foss.arm.com> wrote:
>>>> Hi all,
>>>>
>>>> This is my first Fortran patch, so apologies if I'm missing something.
>>>> The current expansion of the min and max intrinsics explicitly expands
>>>> the comparisons between each argument to calculate the global min/max.
>>>> Some targets, like aarch64, have instructions that can calculate the min/max
>>>> of two real (floating-point) numbers with the proper NaN-handling semantics
>>>> (if both inputs are NaN, return NaN. If one is NaN, return the other) and those
>>>> are the semantics provided by the __builtin_fmin/max family of functions that expand
>>>> to these instructions.
>>>>
>>>> This patch makes the frontend emit __builtin_fmin/max directly to compare each
>>>> pair of numbers when the numbers are floating-point, and use MIN_EXPR/MAX_EXPR otherwise
>>>> (integral types and -ffast-math) which should hopefully be easier to recognise in the
>>> What is Fortran's requirement on min/max intrinsics?  Doesn't it only
>>> require things that
>>> are guaranteed by MIN/MAX_EXPR anyways?  The only restriction here is
>> The current implementation expands to:
>>       mvar = a1;
>>       if (a2 .op. mvar || isnan (mvar))
>>         mvar = a2;
>>       if (a3 .op. mvar || isnan (mvar))
>>         mvar = a3;
>>       ...
>>       return mvar;
>>
>> That is, if one of the operands is a NaN it will return the other argument.
>> If both (all) are NaNs, it will return NaN. This is the same as the semantics of fmin/max
>> as far as I can tell.
>>
>>> /* Minimum and maximum values.  When used with floating point, if both
>>>      operands are zeros, or if either operand is NaN, then it is unspecified
>>>      which of the two operands is returned as the result.  */
>>>
>>> which means MIN/MAX_EXPR are not strictly IEEE compliant with signed
>>> zeros or NaNs.
>>> Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS if signed
>>> zeros are significant.
>> True, MIN/MAX_EXPR would not be appropriate in that condition. I guarded their use
>> on !HONOR_NANS (type) only. I'll update it to !HONOR_SIGNED_ZEROS (type) && !HONOR_NANS (type).
>>
>>
>>> I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR
>>> is a good idea,
>>> this may both generate bigger code and be slower.
>> The patch will generate fmin/fmax calls (or the fminf,fminl variants) when mathfn_built_in advertises
>> them as available (does that mean they'll have a fast inline implementation?)
> This doesn't mean anything given you make them available with your
> patch ;)  So I expect it may
> cause issues for !c99_runtime targets (and long double at least).

Urgh, that can cause headaches...

>> If the above doesn't hold and we can't use either MIN/MAX_EXPR or fmin/fmax then the patch falls back
>> to the existing expansion.
> As said I would not use fmin/fmax calls here at all.

... Given the comments from Thomas and Janne, maybe we should just emit MIN/MAX_EXPRs here
since there is no language requirement on NaN/signed zero handling on these intrinsics?
That should make it simpler and more portable.

>> FWIW, this patch does improve performance on 521.wrf from SPEC2017 on aarch64.
> You said that, yes.  Even without -ffast-math?

It improves at -O3 without -ffast-math in particular. With -ffast-math phiopt optimisation
is more aggressive and merges the conditionals into MIN/MAX_EXPRs (minmax_replacement in tree-ssa-phiopt.c)

Thanks,
Kyrill


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate
  2018-07-18  9:50       ` Kyrill Tkachov
@ 2018-07-18 10:06         ` Richard Biener
  2018-07-18 11:45           ` [PATCH]Use " Richard Sandiford
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Biener @ 2018-07-18 10:06 UTC (permalink / raw)
  To: kyrylo.tkachov; +Cc: fortran, GCC Patches

On Wed, Jul 18, 2018 at 11:50 AM Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
>
>
> On 18/07/18 10:44, Richard Biener wrote:
> > On Tue, Jul 17, 2018 at 3:46 PM Kyrill Tkachov
> > <kyrylo.tkachov@foss.arm.com> wrote:
> >> Hi Richard,
> >>
> >> On 17/07/18 14:27, Richard Biener wrote:
> >>> On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov
> >>> <kyrylo.tkachov@foss.arm.com> wrote:
> >>>> Hi all,
> >>>>
> >>>> This is my first Fortran patch, so apologies if I'm missing something.
> >>>> The current expansion of the min and max intrinsics explicitly expands
> >>>> the comparisons between each argument to calculate the global min/max.
> >>>> Some targets, like aarch64, have instructions that can calculate the min/max
> >>>> of two real (floating-point) numbers with the proper NaN-handling semantics
> >>>> (if both inputs are NaN, return NaN. If one is NaN, return the other) and those
> >>>> are the semantics provided by the __builtin_fmin/max family of functions that expand
> >>>> to these instructions.
> >>>>
> >>>> This patch makes the frontend emit __builtin_fmin/max directly to compare each
> >>>> pair of numbers when the numbers are floating-point, and use MIN_EXPR/MAX_EXPR otherwise
> >>>> (integral types and -ffast-math) which should hopefully be easier to recognise in the
> >>> What is Fortran's requirement on min/max intrinsics?  Doesn't it only
> >>> require things that
> >>> are guaranteed by MIN/MAX_EXPR anyways?  The only restriction here is
> >> The current implementation expands to:
> >>       mvar = a1;
> >>       if (a2 .op. mvar || isnan (mvar))
> >>         mvar = a2;
> >>       if (a3 .op. mvar || isnan (mvar))
> >>         mvar = a3;
> >>       ...
> >>       return mvar;
> >>
> >> That is, if one of the operands is a NaN it will return the other argument.
> >> If both (all) are NaNs, it will return NaN. This is the same as the semantics of fmin/max
> >> as far as I can tell.
> >>
> >>> /* Minimum and maximum values.  When used with floating point, if both
> >>>      operands are zeros, or if either operand is NaN, then it is unspecified
> >>>      which of the two operands is returned as the result.  */
> >>>
> >>> which means MIN/MAX_EXPR are not strictly IEEE compliant with signed
> >>> zeros or NaNs.
> >>> Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS if signed
> >>> zeros are significant.
> >> True, MIN/MAX_EXPR would not be appropriate in that condition. I guarded their use
> >> on !HONOR_NANS (type) only. I'll update it to !HONOR_SIGNED_ZEROS (type) && !HONOR_NANS (type).
> >>
> >>
> >>> I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR
> >>> is a good idea,
> >>> this may both generate bigger code and be slower.
> >> The patch will generate fmin/fmax calls (or the fminf,fminl variants) when mathfn_built_in advertises
> >> them as available (does that mean they'll have a fast inline implementation?)
> > This doesn't mean anything given you make them available with your
> > patch ;)  So I expect it may
> > cause issues for !c99_runtime targets (and long double at least).
>
> Urgh, that can cause headaches...
>
> >> If the above doesn't hold and we can't use either MIN/MAX_EXPR or fmin/fmax then the patch falls back
> >> to the existing expansion.
> > As said I would not use fmin/fmax calls here at all.
>
> ... Given the comments from Thomas and Janne, maybe we should just emit MIN/MAX_EXPRs here
> since there is no language requirement on NaN/signed zero handling on these intrinsics?
> That should make it simpler and more portable.

That's the Fortran maintainers' call.

> >> FWIW, this patch does improve performance on 521.wrf from SPEC2017 on aarch64.
> > You said that, yes.  Even without -ffast-math?
>
> It improves at -O3 without -ffast-math in particular. With -ffast-math phiopt optimisation
> is more aggressive and merges the conditionals into MIN/MAX_EXPRs (minmax_replacement in tree-ssa-phiopt.c)

The question is whether it will be slower without -ffast-math, that is, when
fmin/max() calls are emitted rather than inline conditionals.

I think a patch just using MAX/MIN_EXPR within the existing
constraints and otherwise falling back to the current code would be
more obvious, and other changes should be made independently.

Richard.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH][Fortran][v2] Use MIN/MAX_EXPR for min/max intrinsics
  2018-07-17 20:35         ` Janne Blomqvist
@ 2018-07-18 11:17           ` Kyrill Tkachov
  2018-07-18 13:26             ` Thomas König
  0 siblings, 1 reply; 21+ messages in thread
From: Kyrill Tkachov @ 2018-07-18 11:17 UTC (permalink / raw)
  To: Janne Blomqvist, Thomas Koenig; +Cc: Richard Biener, fortran, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2203 bytes --]

Hi all,

Thank you for the feedback so far.
This version of the patch doesn't try to emit fmin/fmax function calls but instead
emits MIN/MAX_EXPR sequences unconditionally.
I think a source of confusion in the original proposal (for me at least) was
that on aarch64 (which I primarily work on) we implement the fmin/fmax optabs,
so these calls are expanded to a single instruction.
On x86_64 these optabs are not implemented, so the calls expand to actual library calls.
That is why I saw a gain on aarch64 at -O3 (no -ffast-math), but when I measured today
on x86_64 I saw a regression.

Thomas and Janne suggested that the Fortran standard does not impose a requirement
on NaN handling for the min/max intrinsics, which would make emitting MIN/MAX_EXPR
sequences unconditionally a valid approach.

However, the gfortran.dg/nan_1.f90 test checks that handling of NaN values in
these intrinsics follows the IEEE semantics (min (nan, 2.0) == 2.0, for example).
This is not required by the standard, but is the existing gfortran behaviour.

If we end up always emitting MIN/MAX_EXPR sequences, like this version of the patch does,
then that test fails on some configurations of x86_64 and not others (for me it FAILs
by default, but passes with -march=native on my machine) and passes on AArch64.
This is expected since MIN/MAX_EXPR doesn't enforce IEEE behaviour on its arguments.
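
As an illustration of why the result can vary between configurations, the
sketch below is plain C, not what GCC emits. A min written as a bare
comparison returns its second operand whenever either input is NaN, so the
answer depends on argument order. The assumption here is that MIN_EXPR may be
lowered to something of this shape on some targets; the name naive_min is
hypothetical.

```c
#include <math.h>

/* Hypothetical helper: min as a bare comparison with no NaN check. */
double naive_min(double a, double b)
{
    /* Any ordered comparison involving NaN is false, so b is returned
       whenever a or b is NaN. */
    return a < b ? a : b;
}
```

With this shape, naive_min (NAN, 2.0) yields 2.0 but naive_min (2.0, NAN)
yields NaN, which is the kind of order dependence a test like nan_1.f90
would notice.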

However, by always emitting MIN/MAX_EXPR the gfc_conv_intrinsic_minmax function is
simplified and, perhaps more importantly, generates faster code in the -O3 case.
With this patch I see performance improvement on 521.wrf on both AArch64 (3.7%)
and x86_64 (5.4%).

Thomas, Janne, would this relaxation of NaN handling be acceptable given the benefits
mentioned above? If so, what would be the recommended adjustment to the nan_1.f90 test?

Thanks,
Kyrill

2018-07-18  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * trans-intrinsic.c (gfc_conv_intrinsic_minmax): Emit MIN/MAX_EXPR
     sequence to calculate the min/max.

2018-07-18  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gfortran.dg/max_float.f90: New test.
     * gfortran.dg/min_float.f90: Likewise.
     * gfortran.dg/minmax_integer.f90: Likewise.

[-- Attachment #2: fort-v2.patch --]
[-- Type: text/x-patch, Size: 5823 bytes --]

diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index d306e3a5a6209c1621d91f99ffc366acecd9c3d0..e5a1f1ddabeedc7b9f473db11e70f29548fc69ac 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -3874,14 +3874,11 @@ gfc_conv_intrinsic_ttynam (gfc_se * se, gfc_expr * expr)
     minmax (a1, a2, a3, ...)
     {
       mvar = a1;
-      if (a2 .op. mvar || isnan (mvar))
-        mvar = a2;
-      if (a3 .op. mvar || isnan (mvar))
-        mvar = a3;
+      mvar = MIN/MAX_EXPR (mvar, a2);
+      mvar = MIN/MAX_EXPR (mvar, a3);
       ...
-      return mvar
-    }
- */
+      return mvar;
+    }  */
 
 /* TODO: Mismatching types can occur when specific names are used.
    These should be handled during resolution.  */
@@ -3891,7 +3888,6 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
   tree tmp;
   tree mvar;
   tree val;
-  tree thencase;
   tree *args;
   tree type;
   gfc_actual_arglist *argexpr;
@@ -3912,55 +3908,37 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
 
   mvar = gfc_create_var (type, "M");
   gfc_add_modify (&se->pre, mvar, args[0]);
-  for (i = 1, argexpr = argexpr->next; i < nargs; i++)
-    {
-      tree cond, isnan;
 
+  for (i = 1, argexpr = argexpr->next; i < nargs; i++, argexpr = argexpr->next)
+    {
+      tree cond = NULL_TREE;
       val = args[i];
 
       /* Handle absent optional arguments by ignoring the comparison.  */
       if (argexpr->expr->expr_type == EXPR_VARIABLE
 	  && argexpr->expr->symtree->n.sym->attr.optional
 	  && TREE_CODE (val) == INDIRECT_REF)
-	cond = fold_build2_loc (input_location,
+	{
+	  cond = fold_build2_loc (input_location,
 				NE_EXPR, logical_type_node,
 				TREE_OPERAND (val, 0),
 			build_int_cst (TREE_TYPE (TREE_OPERAND (val, 0)), 0));
-      else
-      {
-	cond = NULL_TREE;
-
+	}
+      else if (!VAR_P (val) && !TREE_CONSTANT (val))
 	/* Only evaluate the argument once.  */
-	if (!VAR_P (val) && !TREE_CONSTANT (val))
-	  val = gfc_evaluate_now (val, &se->pre);
-      }
-
-      thencase = build2_v (MODIFY_EXPR, mvar, convert (type, val));
+	val = gfc_evaluate_now (val, &se->pre);
 
-      tmp = fold_build2_loc (input_location, op, logical_type_node,
-			     convert (type, val), mvar);
+      tree calc;
 
-      /* FIXME: When the IEEE_ARITHMETIC module is implemented, the call to
-	 __builtin_isnan might be made dependent on that module being loaded,
-	 to help performance of programs that don't rely on IEEE semantics.  */
-      if (FLOAT_TYPE_P (TREE_TYPE (mvar)))
-	{
-	  isnan = build_call_expr_loc (input_location,
-				       builtin_decl_explicit (BUILT_IN_ISNAN),
-				       1, mvar);
-	  tmp = fold_build2_loc (input_location, TRUTH_OR_EXPR,
-				 logical_type_node, tmp,
-				 fold_convert (logical_type_node, isnan));
-	}
-      tmp = build3_v (COND_EXPR, tmp, thencase,
-		      build_empty_stmt (input_location));
+      tree_code code = op == GT_EXPR ? MAX_EXPR : MIN_EXPR;
+      calc = fold_build2_loc (input_location, code, type,
+				convert (type, val), mvar);
+      tmp = build2_v (MODIFY_EXPR, mvar, calc);
 
       if (cond != NULL_TREE)
 	tmp = build3_v (COND_EXPR, cond, tmp,
 			build_empty_stmt (input_location));
-
       gfc_add_expr_to_block (&se->pre, tmp);
-      argexpr = argexpr->next;
     }
   se->expr = mvar;
 }
diff --git a/gcc/testsuite/gfortran.dg/max_float.f90 b/gcc/testsuite/gfortran.dg/max_float.f90
new file mode 100644
index 0000000000000000000000000000000000000000..a3a5d4f5df29cfa9c4e3abc2c18e7d3de1169fc3
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/max_float.f90
@@ -0,0 +1,19 @@
+! { dg-do compile }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine fool (a, b, c, d, e, f, g, h)
+  real (kind=16) :: a, b, c, d, e, f, g, h
+  a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+subroutine foo (a, b, c, d, e, f, g, h)
+  real (kind=8) :: a, b, c, d, e, f, g, h
+  a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+subroutine foof (a, b, c, d, e, f, g, h)
+  real (kind=4) :: a, b, c, d, e, f, g, h
+  a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "MAX_EXPR " 21 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/min_float.f90 b/gcc/testsuite/gfortran.dg/min_float.f90
new file mode 100644
index 0000000000000000000000000000000000000000..41bd6b3c4062f364791841f7097f9a5c00782ec8
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/min_float.f90
@@ -0,0 +1,19 @@
+! { dg-do compile }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine fool (a, b, c, d, e, f, g, h)
+  real (kind=16) :: a, b, c, d, e, f, g, h
+  a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+subroutine foo (a, b, c, d, e, f, g, h)
+  real (kind=8) :: a, b, c, d, e, f, g, h
+  a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+subroutine foof (a, b, c, d, e, f, g, h)
+  real (kind=4) :: a, b, c, d, e, f, g, h
+  a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "MIN_EXPR " 21 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/minmax_integer.f90 b/gcc/testsuite/gfortran.dg/minmax_integer.f90
new file mode 100644
index 0000000000000000000000000000000000000000..5b6be38c7055ce4e8620cf75ec7d8a182436b24f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/minmax_integer.f90
@@ -0,0 +1,15 @@
+! { dg-do compile }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine foo (a, b, c, d, e, f, g, h)
+  integer (kind=4) :: a, b, c, d, e, f, g, h
+  a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+subroutine foof (a, b, c, d, e, f, g, h)
+  integer (kind=4) :: a, b, c, d, e, f, g, h
+  a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "MIN_EXPR" 7 "optimized" } }
+! { dg-final { scan-tree-dump-times "MAX_EXPR" 7 "optimized" } }

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH]Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate
  2018-07-18 10:06         ` Richard Biener
@ 2018-07-18 11:45           ` Richard Sandiford
  0 siblings, 0 replies; 21+ messages in thread
From: Richard Sandiford @ 2018-07-18 11:45 UTC (permalink / raw)
  To: Richard Biener; +Cc: kyrylo.tkachov, fortran, GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Wed, Jul 18, 2018 at 11:50 AM Kyrill Tkachov
> <kyrylo.tkachov@foss.arm.com> wrote:
>>
>>
>> On 18/07/18 10:44, Richard Biener wrote:
>> > On Tue, Jul 17, 2018 at 3:46 PM Kyrill Tkachov
>> > <kyrylo.tkachov@foss.arm.com> wrote:
>> >> Hi Richard,
>> >>
>> >> On 17/07/18 14:27, Richard Biener wrote:
>> >>> On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov
>> >>> <kyrylo.tkachov@foss.arm.com> wrote:
>> >>>> Hi all,
>> >>>>
>> >>>> This is my first Fortran patch, so apologies if I'm missing something.
>> >>>> The current expansion of the min and max intrinsics explicitly expands
>> >>>> the comparisons between each argument to calculate the global min/max.
>> >>>> Some targets, like aarch64, have instructions that can calculate
>> >>>> the min/max
>> >>>> of two real (floating-point) numbers with the proper NaN-handling
>> >>>> semantics
>> >>>> (if both inputs are NaN, return NaN. If one is NaN, return the
>> >>>> other) and those
>> >>>> are the semantics provided by the __builtin_fmin/max family of
>> >>>> functions that expand
>> >>>> to these instructions.
>> >>>>
>> >>>> This patch makes the frontend emit __builtin_fmin/max directly to
>> >>>> compare each
>> >>>> pair of numbers when the numbers are floating-point, and use
>> >>>> MIN_EXPR/MAX_EXPR otherwise
>> >>>> (integral types and -ffast-math) which should hopefully be easier
>> >>>> to recognise in the
>> >>> What is Fortran's requirement on min/max intrinsics?  Doesn't it only
>> >>> require things that
>> >>> are guaranteed by MIN/MAX_EXPR anyways?  The only restriction here is
>> >> The current implementation expands to:
>> >>       mvar = a1;
>> >>       if (a2 .op. mvar || isnan (mvar))
>> >>         mvar = a2;
>> >>       if (a3 .op. mvar || isnan (mvar))
>> >>         mvar = a3;
>> >>       ...
>> >>       return mvar;
>> >>
>> >> That is, if one of the operands is a NaN it will return the other argument.
>> >> If both (all) are NaNs, it will return NaN. This is the same as the semantics of fmin/max
>> >> as far as I can tell.
>> >>
>> >>> /* Minimum and maximum values.  When used with floating point, if both
>> >>>      operands are zeros, or if either operand is NaN, then it is
>> >>> unspecified
>> >>>      which of the two operands is returned as the result.  */
>> >>>
>> >>> which means MIN/MAX_EXPR are not strictly IEEE compliant with signed
>> >>> zeros or NaNs.
>> >>> Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS
>> >>> if signed
>> >>> zeros are significant.
>> >> True, MIN/MAX_EXPR would not be appropriate in that condition. I
>> >> guarded their use
>> >> on !HONOR_NANS (type) only. I'll update it to !HONOR_SIGNED_ZEROS
>> >> (type) && !HONOR_NANS (type).
>> >>
>> >>
>> >>> I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR
>> >>> is a good idea,
>> >>> this may both generate bigger code and be slower.
>> >> The patch will generate fmin/fmax calls (or the fminf,fminl
>> >> variants) when mathfn_built_in advertises
>> >> them as available (does that mean they'll have a fast inline
>> >> implementation?)
>> > This doesn't mean anything given you make them available with your
>> > patch ;)  So I expect it may
>> > cause issues for !c99_runtime targets (and long double at least).
>>
>> Urgh, that can cause headaches...
>>
>> >> If the above doesn't hold and we can't use either MIN/MAX_EXPR or
>> >> fmin/fmax then the patch falls back
>> >> to the existing expansion.
>> > As said I would not use fmin/fmax calls here at all.
>>
>> ... Given the comments from Thomas and Janne, maybe we should just
>> emit MIN/MAX_EXPRs here
>> since there is no language requirement on NaN/signed zero handling on
>> these intrinsics?
>> That should make it simpler and more portable.
>
> That's the Fortran maintainers' call.
>
>> >> FWIW, this patch does improve performance on 521.wrf from SPEC2017
>> >> on aarch64.
>> > You said that, yes.  Even without -ffast-math?
>>
>> It improves at -O3 without -ffast-math in particular. With -ffast-math
>> phiopt optimisation
>> is more aggressive and merges the conditionals into MIN/MAX_EXPRs
>> (minmax_replacement in tree-ssa-phiopt.c)
>
> The question is will it be slower without -ffast-math, that is, when
> fmin/max() calls are emitted rather
> than inline conditionals.
>
> I think a patch just using MAX/MIN_EXPR within the existing
> constraints and otherwise falling back to
> the current code would be more obvious and other changes should be
> made independently.

If going to MIN_EXPR and MAX_EXPR unconditionally isn't acceptable,
maybe an alternative would be to go straight to internal functions,
under the usual:

  direct_internal_fn_supported_p (IFN_F{MIN,MAX}, type, OPTIMIZE_FOR_SPEED)

condition.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH][Fortran][v2] Use MIN/MAX_EXPR for min/max intrinsics
  2018-07-18 11:17           ` [PATCH][Fortran][v2] Use MIN/MAX_EXPR for min/max intrinsics Kyrill Tkachov
@ 2018-07-18 13:26             ` Thomas König
  2018-07-18 14:03               ` Kyrill Tkachov
  2018-07-18 15:10               ` Janne Blomqvist
  0 siblings, 2 replies; 21+ messages in thread
From: Thomas König @ 2018-07-18 13:26 UTC (permalink / raw)
  To: Kyrill Tkachov
  Cc: Janne Blomqvist, Thomas Koenig, Richard Biener, fortran, GCC Patches

Hi Kyrill,

> Am 18.07.2018 um 13:17 schrieb Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>:
> 
> Thomas, Janne, would this relaxation of NaN handling be acceptable given the benefits
> mentioned above? If so, what would be the recommended adjustment to the nan_1.f90 test?

I would be a bit careful about changing behavior in such a major way. What would the results with NaN and infinity then be, with or without optimization? Would the results be consistent with min(nan,num) vs min(num,nan)? Would they be consistent with the new IEEE standard?

In general, I think that min(nan,num) should be nan and that our current behavior is not the best.

Does anybody have data points on how this is handled by other compilers?

Oh, and if anything is changed, then compile and runtime behavior should always be the same.

Regards, Thomas

* Re: [PATCH][Fortran][v2] Use MIN/MAX_EXPR for min/max intrinsics
  2018-07-18 13:26             ` Thomas König
@ 2018-07-18 14:03               ` Kyrill Tkachov
  2018-07-18 14:55                 ` Janne Blomqvist
  2018-07-18 15:28                 ` Richard Sandiford
  2018-07-18 15:10               ` Janne Blomqvist
  1 sibling, 2 replies; 21+ messages in thread
From: Kyrill Tkachov @ 2018-07-18 14:03 UTC (permalink / raw)
  To: Thomas König
  Cc: Janne Blomqvist, Thomas Koenig, Richard Biener, fortran,
	GCC Patches, Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 2314 bytes --]


On 18/07/18 14:26, Thomas König wrote:
> Hi Kyrill,
>
>> Am 18.07.2018 um 13:17 schrieb Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>:
>>
>> Thomas, Janne, would this relaxation of NaN handling be acceptable given the benefits
>> mentioned above? If so, what would be the recommended adjustment to the nan_1.f90 test?
> I would be a bit careful about changing behavior in such a major way. What would the results with NaN and infinity then be, with or without optimization? Would the results be consistent with min(nan,num) vs min(num,nan)? Would they be consistent with the new IEEE standard?
>
> In general, I think that min(nan,num) should be nan and that our current behavior is not the best.
>
> Does anybody have data points on how this is handled by other compilers?
>
> Oh, and if anything is changed, then compile and runtime behavior should always be the same.

Thanks, that makes it clearer what behaviour is acceptable.

So this v3 patch follows Richard Sandiford's suggested approach of emitting IFN_FMIN/FMAX
when dealing with floating-point values and NaN handling is important and the target
supports the IFN_FMIN/FMAX. Otherwise the current explicit comparison sequence is emitted.
For integer types and -ffast-math floating-point it will emit MIN/MAX_EXPR.
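
The selection order described above can be sketched as a small standalone C
model. This is only an illustration: the enum, the function and its
parameters are hypothetical names, and the four flags merely stand in for
the HONOR_NANS, HONOR_SIGNED_ZEROS, SCALAR_FLOAT_TYPE_P and
direct_internal_fn_supported_p checks that the patch performs on the tree
type.

```c
#include <stdbool.h>

/* Hypothetical model of the three code paths; not GCC data structures.  */
enum minmax_strategy
{
  USE_MINMAX_EXPR,      /* emit MIN_EXPR / MAX_EXPR */
  USE_IFN_FMINMAX,      /* emit IFN_FMIN / IFN_FMAX */
  USE_EXPLICIT_COMPARE  /* open-coded comparison plus isnan check */
};

/* Each flag stands in for one predicate used by the patch:
   honor_nans          ~ HONOR_NANS (type)
   honor_signed_zeros  ~ HONOR_SIGNED_ZEROS (type)
   is_scalar_float     ~ SCALAR_FLOAT_TYPE_P (type)
   target_has_ifn      ~ direct_internal_fn_supported_p (ifn, type, ...)  */
static enum minmax_strategy
choose_strategy (bool honor_nans, bool honor_signed_zeros,
                 bool is_scalar_float, bool target_has_ifn)
{
  /* Integral types and -ffast-math floating point land here.  */
  if (!honor_nans && !honor_signed_zeros)
    return USE_MINMAX_EXPR;

  /* NaNs matter and the target has a fast inline min/max.  */
  if (is_scalar_float && target_has_ifn)
    return USE_IFN_FMINMAX;

  /* Otherwise keep the existing explicit comparison sequence.  */
  return USE_EXPLICIT_COMPARE;
}
```

For an integer type all four flags are false and the model picks
USE_MINMAX_EXPR. A default-math real on a target that lacks the internal
function falls through to USE_EXPLICIT_COMPARE, which is why code generation
stays unchanged there.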

With this patch the nan_1.f90 behaviour is preserved on all targets. On aarch64 we get
the optimal sequence; on x86_64 we avoid the function call, so code generation there is unchanged.

This gives the performance improvement on 521.wrf on aarch64 and leaves it unchanged on x86_64.

I'm hoping this addresses all the concerns raised in this thread:
* The NaN-handling behaviour is unchanged on all platforms.
* The fast inline sequence is emitted where it is available.
* No calls to library fmin*/fmax* are emitted where there were none.
* MIN/MAX_EXPR sequences are emitted where possible.

Is this acceptable?

Thanks,
Kyrill

2018-07-18  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * trans-intrinsic.c (gfc_conv_intrinsic_minmax): Emit MIN/MAX_EXPR
     or IFN_FMIN/FMAX sequence to calculate the min/max when possible.

2018-07-18  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gfortran.dg/max_fmaxl_aarch64.f90: New test.
     * gfortran.dg/min_fminl_aarch64.f90: Likewise.
     * gfortran.dg/minmax_integer.f90: Likewise.

[-- Attachment #2: fort-v3.patch --]
[-- Type: text/x-patch, Size: 7106 bytes --]

diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index d306e3a5a6209c1621d91f99ffc366acecd9c3d0..6f5700f2a421d2a735d77c4c4ec0c4c9c058e727 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "trans.h"
 #include "stringpool.h"
 #include "fold-const.h"
+#include "internal-fn.h"
 #include "tree-nested.h"
 #include "stor-layout.h"
 #include "toplev.h"	/* For rest_of_decl_compilation.  */
@@ -3874,14 +3875,15 @@ gfc_conv_intrinsic_ttynam (gfc_se * se, gfc_expr * expr)
     minmax (a1, a2, a3, ...)
     {
       mvar = a1;
-      if (a2 .op. mvar || isnan (mvar))
-        mvar = a2;
-      if (a3 .op. mvar || isnan (mvar))
-        mvar = a3;
+      mvar = COMP (mvar, a2)
+      mvar = COMP (mvar, a3)
       ...
-      return mvar
+      return mvar;
     }
- */
+    Where COMP is MIN/MAX_EXPR for integral types or when we don't
+    care about NaNs, or IFN_FMIN/MAX when the target has support for
+    fast NaN-honouring min/max.  When neither holds expand a sequence
+    of explicit comparisons.  */
 
 /* TODO: Mismatching types can occur when specific names are used.
    These should be handled during resolution.  */
@@ -3891,7 +3893,6 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
   tree tmp;
   tree mvar;
   tree val;
-  tree thencase;
   tree *args;
   tree type;
   gfc_actual_arglist *argexpr;
@@ -3912,55 +3913,77 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
 
   mvar = gfc_create_var (type, "M");
   gfc_add_modify (&se->pre, mvar, args[0]);
-  for (i = 1, argexpr = argexpr->next; i < nargs; i++)
-    {
-      tree cond, isnan;
 
+  internal_fn ifn = op == GT_EXPR ? IFN_FMAX : IFN_FMIN;
+
+  for (i = 1, argexpr = argexpr->next; i < nargs; i++, argexpr = argexpr->next)
+    {
+      tree cond = NULL_TREE;
       val = args[i];
 
       /* Handle absent optional arguments by ignoring the comparison.  */
       if (argexpr->expr->expr_type == EXPR_VARIABLE
 	  && argexpr->expr->symtree->n.sym->attr.optional
 	  && TREE_CODE (val) == INDIRECT_REF)
-	cond = fold_build2_loc (input_location,
+	{
+	  cond = fold_build2_loc (input_location,
 				NE_EXPR, logical_type_node,
 				TREE_OPERAND (val, 0),
 			build_int_cst (TREE_TYPE (TREE_OPERAND (val, 0)), 0));
-      else
-      {
-	cond = NULL_TREE;
-
+	}
+      else if (!VAR_P (val) && !TREE_CONSTANT (val))
 	/* Only evaluate the argument once.  */
-	if (!VAR_P (val) && !TREE_CONSTANT (val))
-	  val = gfc_evaluate_now (val, &se->pre);
-      }
+	val = gfc_evaluate_now (val, &se->pre);
 
-      thencase = build2_v (MODIFY_EXPR, mvar, convert (type, val));
+      tree calc;
+      /* If we are dealing with integral types or we don't care about NaNs
+	 just do a MIN/MAX_EXPR.  */
+      if (!HONOR_NANS (type) && !HONOR_SIGNED_ZEROS (type))
+	{
+
+	  tree_code code = op == GT_EXPR ? MAX_EXPR : MIN_EXPR;
+	  calc = fold_build2_loc (input_location, code, type,
+				  convert (type, val), mvar);
+	  tmp = build2_v (MODIFY_EXPR, mvar, calc);
 
-      tmp = fold_build2_loc (input_location, op, logical_type_node,
-			     convert (type, val), mvar);
+	}
+      /* If we care about NaNs and we have internal functions available for
+	 fmin/fmax to perform the comparison, use those.  */
+      else if (SCALAR_FLOAT_TYPE_P (type)
+	      && direct_internal_fn_supported_p (ifn, type, OPTIMIZE_FOR_SPEED))
+	{
+	  calc = build_call_expr_internal_loc (input_location, ifn, type,
+				      2, mvar, convert (type, val));
+	  tmp = build2_v (MODIFY_EXPR, mvar, calc);
 
-      /* FIXME: When the IEEE_ARITHMETIC module is implemented, the call to
-	 __builtin_isnan might be made dependent on that module being loaded,
-	 to help performance of programs that don't rely on IEEE semantics.  */
-      if (FLOAT_TYPE_P (TREE_TYPE (mvar)))
+	}
+      /* Otherwise expand to:
+	mvar = a1;
+	if (a2 .op. mvar || isnan (mvar))
+	  mvar = a2;
+	if (a3 .op. mvar || isnan (mvar))
+	  mvar = a3;
+	...  */
+      else
 	{
-	  isnan = build_call_expr_loc (input_location,
-				       builtin_decl_explicit (BUILT_IN_ISNAN),
-				       1, mvar);
+	  tree isnan = build_call_expr_loc (input_location,
+					builtin_decl_explicit (BUILT_IN_ISNAN),
+					1, mvar);
+	  tmp = fold_build2_loc (input_location, op, logical_type_node,
+				 convert (type, val), mvar);
+
 	  tmp = fold_build2_loc (input_location, TRUTH_OR_EXPR,
-				 logical_type_node, tmp,
-				 fold_convert (logical_type_node, isnan));
+				  logical_type_node, tmp,
+				  fold_convert (logical_type_node, isnan));
+	  tmp = build3_v (COND_EXPR, tmp,
+			  build2_v (MODIFY_EXPR, mvar, convert (type, val)),
+			  build_empty_stmt (input_location));
 	}
-      tmp = build3_v (COND_EXPR, tmp, thencase,
-		      build_empty_stmt (input_location));
 
       if (cond != NULL_TREE)
 	tmp = build3_v (COND_EXPR, cond, tmp,
 			build_empty_stmt (input_location));
-
       gfc_add_expr_to_block (&se->pre, tmp);
-      argexpr = argexpr->next;
     }
   se->expr = mvar;
 }
diff --git a/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90 b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
new file mode 100644
index 0000000000000000000000000000000000000000..8c8ea063e5d0718dc829c1f5574c5b46040e6786
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
@@ -0,0 +1,9 @@
+! { dg-do compile { target aarch64*-*-* } }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine fool (a, b, c, d, e, f, g, h)
+  real (kind=16) :: a, b, c, d, e, f, g, h
+  a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "__builtin_fmaxl " 7 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90 b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
new file mode 100644
index 0000000000000000000000000000000000000000..92368917fb48e0c468a16d080ab3a9ac842e01a7
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
@@ -0,0 +1,9 @@
+! { dg-do compile { target aarch64*-*-* } }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine fool (a, b, c, d, e, f, g, h)
+  real (kind=16) :: a, b, c, d, e, f, g, h
+  a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "__builtin_fminl " 7 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/minmax_integer.f90 b/gcc/testsuite/gfortran.dg/minmax_integer.f90
new file mode 100644
index 0000000000000000000000000000000000000000..5b6be38c7055ce4e8620cf75ec7d8a182436b24f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/minmax_integer.f90
@@ -0,0 +1,15 @@
+! { dg-do compile }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine foo (a, b, c, d, e, f, g, h)
+  integer (kind=4) :: a, b, c, d, e, f, g, h
+  a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+subroutine foof (a, b, c, d, e, f, g, h)
+  integer (kind=4) :: a, b, c, d, e, f, g, h
+  a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "MIN_EXPR" 7 "optimized" } }
+! { dg-final { scan-tree-dump-times "MAX_EXPR" 7 "optimized" } }

* Re: [PATCH][Fortran][v2] Use MIN/MAX_EXPR for min/max intrinsics
  2018-07-18 14:03               ` Kyrill Tkachov
@ 2018-07-18 14:55                 ` Janne Blomqvist
  2018-07-18 15:28                 ` Richard Sandiford
  1 sibling, 0 replies; 21+ messages in thread
From: Janne Blomqvist @ 2018-07-18 14:55 UTC (permalink / raw)
  To: Kyrill Tkachov
  Cc: Thomas König, Thomas Koenig, Richard Biener, fortran,
	GCC Patches, Richard Sandiford

On Wed, Jul 18, 2018 at 5:03 PM, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com
> wrote:

>
> On 18/07/18 14:26, Thomas König wrote:
>
>> Hi Kyrill,
>>
>> Am 18.07.2018 um 13:17 schrieb Kyrill Tkachov <
>>> kyrylo.tkachov@foss.arm.com>:
>>>
>>> Thomas, Janne, would this relaxation of NaN handling be acceptable given
>>> the benefits
>>> mentioned above? If so, what would be the recommended adjustment to the
>>> nan_1.f90 test?
>>>
>> I would be a bit careful about changing behavior in such a major way.
>> What would the results with NaN and infinity then be, with or without
>> optimization? Would the results be consistent with min(nan,num) vs
>> min(num,nan)? Would they be consistent with the new IEEE standard?
>>
>> In general, I think that min(nan,num) should be nan and that our current
>> behavior is not the best.
>>
>> Does anybody have data points on how this is handled by other compilers?
>>
>> Oh, and if anything is changed, then compile and runtime behavior should
>> always be the same.
>>
>
> Thanks, that makes it clearer what behaviour is acceptable.
>
> So this v3 patch follows Richard Sandiford's suggested approach of
> emitting IFN_FMIN/FMAX
> when dealing with floating-point values and NaN handling is important and
> the target
> supports the IFN_FMIN/FMAX. Otherwise the current explicit comparison
> sequence is emitted.
> For integer types and -ffast-math floating-point it will emit MIN/MAX_EXPR.
>
> With this patch the nan_1.f90 behaviour is preserved on all targets, we
> get the optimal
> sequence on aarch64 and on x86_64 we avoid the function call, with no
> changes in code generation.
>
> This gives the performance improvement on 521.wrf on aarch64 and leaves it
> unchanged on x86_64.
>
> I'm hoping this addresses all the concerns raised in this thread:
> * The NaN-handling behaviour is unchanged on all platforms.
> * The fast inline sequence is emitted where it is available.
> * No calls to library fmin*/fmax* are emitted where there were none.
> * MIN/MAX_EXPR sequences are emitted where possible.
>
> Is this acceptable?
>

So if I understand it correctly, the "internal fn" thing is a mechanism
for checking whether the target can expand a builtin inline
or whether it requires a call to an external library function?

If so, then yes, Ok, thanks for the patch!


-- 
Janne Blomqvist

* Re: [PATCH][Fortran][v2] Use MIN/MAX_EXPR for min/max intrinsics
  2018-07-18 13:26             ` Thomas König
  2018-07-18 14:03               ` Kyrill Tkachov
@ 2018-07-18 15:10               ` Janne Blomqvist
  2018-07-26 20:36                 ` Joseph Myers
  2018-08-06 12:05                 ` Janne Blomqvist
  1 sibling, 2 replies; 21+ messages in thread
From: Janne Blomqvist @ 2018-07-18 15:10 UTC (permalink / raw)
  To: Thomas König
  Cc: Kyrill Tkachov, Thomas Koenig, Richard Biener, fortran, GCC Patches

On Wed, Jul 18, 2018 at 4:26 PM, Thomas König <tk@tkoenig.net> wrote:

> Hi Kyrill,
>
> > Am 18.07.2018 um 13:17 schrieb Kyrill Tkachov <
> kyrylo.tkachov@foss.arm.com>:
> >
> > Thomas, Janne, would this relaxation of NaN handling be acceptable given
> the benefits
> > mentioned above? If so, what would be the recommended adjustment to the
> nan_1.f90 test?
>
> I would be a bit careful about changing behavior in such a major way. What
> would the results with NaN and infinity then be, with or without
> optimization? Would the results be consistent with min(nan,num) vs
> min(num,nan)? Would they be consistent with the new IEEE standard?
>

AFAIU, MIN/MAX_EXPR do the right thing when comparing a normal number with
Inf. For NaN the result is undefined, and you might indeed have

min(a, NaN) = a
min(NaN, a) = NaN

where "a" is a normal number.

(I think that happens at least on x86 if MIN_EXPR is expanded to
minsd/minpd.)

Apparently what the proper result for min(a, NaN) should be is contentious
enough that minnum was removed from the upcoming IEEE 754 revision, and new
operations AFAICS have the semantics

minimum(a, NaN) = minimum(NaN, a) = NaN
minimumNumber(a, NaN) = minimumNumber(NaN, a) = a

That is, minimumNumber corresponds to minnum in IEEE 754-2008 and fmin* in
C, and to the current behavior of gfortran.
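
The order dependence is easy to reproduce in plain C. Below is a minimal
sketch, assuming IEEE doubles. Both function names are made up for
illustration. naive_min only shows why a bare comparison is order-dependent;
it says nothing about which operand order GCC picks when it expands
MIN_EXPR. explicit_min follows the open-coded sequence quoted in the patch's
comment.

```c
#include <math.h>

/* A bare comparison is false whenever either operand is NaN, so the
   second operand always wins in that case.  */
static double
naive_min (double a, double b)
{
  return a < b ? a : b;
}

/* The open-coded sequence from the patch comment:
     mvar = a1;
     if (a2 .op. mvar || isnan (mvar))
       mvar = a2;
   specialised here to .op. being "<".  */
static double
explicit_min (double a1, double a2)
{
  double mvar = a1;
  if (a2 < mvar || isnan (mvar))
    mvar = a2;
  return mvar;
}
```

With a = 1.0, naive_min (a, NaN) yields NaN while naive_min (NaN, a) yields
a. explicit_min returns a in both orders, which is also what C's fmin
returns when exactly one argument is a quiet NaN.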


> In general, I think that min(nan,num) should be nan and that our current
> behavior is not the best.
>

There was some extensive discussion of that in the Julia bug report I
linked to in an earlier message, and they came to the same conclusion and
changed their behavior.


> Does anybody have data points on how this is handled by other compilers?
>

The only other compiler I have access to at the moment is ifort (and not
the latest version), but maybe somebody has access to a wider variety?


> Oh, and if anything is changed, then compile and runtime behavior should
> always be the same.
>

Well, IFF we place some weight on the runtime behavior being particularly
sensible wrt NaN's, which it wouldn't be if we just use a plain
MIN/MAX_EXPR. Is it worth taking a performance hit for, though? In
particular, if other compilers are inconsistent, we might as well do
whatever is fastest.


-- 
Janne Blomqvist

* Re: [PATCH][Fortran][v2] Use MIN/MAX_EXPR for min/max intrinsics
  2018-07-18 14:03               ` Kyrill Tkachov
  2018-07-18 14:55                 ` Janne Blomqvist
@ 2018-07-18 15:28                 ` Richard Sandiford
  2018-07-18 16:04                   ` Kyrill Tkachov
  1 sibling, 1 reply; 21+ messages in thread
From: Richard Sandiford @ 2018-07-18 15:28 UTC (permalink / raw)
  To: Kyrill Tkachov
  Cc: Thomas König, Janne Blomqvist, Thomas Koenig,
	Richard Biener, fortran, GCC Patches

Thanks for doing this.

Kyrill  Tkachov <kyrylo.tkachov@foss.arm.com> writes:
> +	  calc = build_call_expr_internal_loc (input_location, ifn, type,
> +				      2, mvar, convert (type, val));

(indentation looks off)

> diff --git a/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90 b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
> new file mode 100644
> index 0000000000000000000000000000000000000000..8c8ea063e5d0718dc829c1f5574c5b46040e6786
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
> @@ -0,0 +1,9 @@
> +! { dg-do compile { target aarch64*-*-* } }
> +! { dg-options "-O2 -fdump-tree-optimized" }
> +
> +subroutine fool (a, b, c, d, e, f, g, h)
> +  real (kind=16) :: a, b, c, d, e, f, g, h
> +  a = max (a, b, c, d, e, f, g, h)
> +end subroutine
> +
> +! { dg-final { scan-tree-dump-times "__builtin_fmaxl " 7 "optimized" } }
> diff --git a/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90 b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
> new file mode 100644
> index 0000000000000000000000000000000000000000..92368917fb48e0c468a16d080ab3a9ac842e01a7
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
> @@ -0,0 +1,9 @@
> +! { dg-do compile { target aarch64*-*-* } }
> +! { dg-options "-O2 -fdump-tree-optimized" }
> +
> +subroutine fool (a, b, c, d, e, f, g, h)
> +  real (kind=16) :: a, b, c, d, e, f, g, h
> +  a = min (a, b, c, d, e, f, g, h)
> +end subroutine
> +
> +! { dg-final { scan-tree-dump-times "__builtin_fminl " 7 "optimized" } }

Do these still pass?  I wouldn't have expected us to use __builtin_fmin*
and __builtin_fmax* now.

It would be good to have tests that we use ".FMIN" and ".FMAX" for kind=4
and kind=8 on AArch64, since that's really the end goal here.

Thanks,
Richard

* Re: [PATCH][Fortran][v2] Use MIN/MAX_EXPR for min/max intrinsics
  2018-07-18 15:28                 ` Richard Sandiford
@ 2018-07-18 16:04                   ` Kyrill Tkachov
  0 siblings, 0 replies; 21+ messages in thread
From: Kyrill Tkachov @ 2018-07-18 16:04 UTC (permalink / raw)
  To: Thomas König, Janne Blomqvist, Thomas Koenig,
	Richard Biener, fortran, GCC Patches, richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 2723 bytes --]

Hi Richard,

On 18/07/18 16:27, Richard Sandiford wrote:
> Thanks for doing this.
>
> Kyrill  Tkachov <kyrylo.tkachov@foss.arm.com> writes:
>> +	  calc = build_call_expr_internal_loc (input_location, ifn, type,
>> +				      2, mvar, convert (type, val));
> (indentation looks off)
>
>> diff --git a/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90 b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..8c8ea063e5d0718dc829c1f5574c5b46040e6786
>> --- /dev/null
>> +++ b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
>> @@ -0,0 +1,9 @@
>> +! { dg-do compile { target aarch64*-*-* } }
>> +! { dg-options "-O2 -fdump-tree-optimized" }
>> +
>> +subroutine fool (a, b, c, d, e, f, g, h)
>> +  real (kind=16) :: a, b, c, d, e, f, g, h
>> +  a = max (a, b, c, d, e, f, g, h)
>> +end subroutine
>> +
>> +! { dg-final { scan-tree-dump-times "__builtin_fmaxl " 7 "optimized" } }
>> diff --git a/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90 b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..92368917fb48e0c468a16d080ab3a9ac842e01a7
>> --- /dev/null
>> +++ b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
>> @@ -0,0 +1,9 @@
>> +! { dg-do compile { target aarch64*-*-* } }
>> +! { dg-options "-O2 -fdump-tree-optimized" }
>> +
>> +subroutine fool (a, b, c, d, e, f, g, h)
>> +  real (kind=16) :: a, b, c, d, e, f, g, h
>> +  a = min (a, b, c, d, e, f, g, h)
>> +end subroutine
>> +
>> +! { dg-final { scan-tree-dump-times "__builtin_fminl " 7 "optimized" } }
> Do these still pass?  I wouldn't have expected us to use __builtin_fmin*
> and __builtin_fmax* now.
>
> It would be good to have tests that we use ".FMIN" and ".FMAX" for kind=4
> and kind=8 on AArch64, since that's really the end goal here.

Doh, yes. I had spotted that myself after I had sent out the patch.
I've fixed that and the indentation issue in this small revision.

Given Janne's comments I will commit this tomorrow if there are no objections.
This patch should be a conservative improvement. If the Fortran folks decide
to sacrifice the more predictable NaN handling in favour of more optimisation
leeway by using MIN/MAX_EXPR unconditionally we can do that as a follow-up.

Thanks for the help,
Kyrill

2018-07-18  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * trans-intrinsic.c (gfc_conv_intrinsic_minmax): Emit MIN/MAX_EXPR
     or IFN_FMIN/FMAX sequence to calculate the min/max when possible.

2018-07-18  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gfortran.dg/max_fmax_aarch64.f90: New test.
     * gfortran.dg/min_fmin_aarch64.f90: Likewise.
     * gfortran.dg/minmax_integer.f90: Likewise.


[-- Attachment #2: fort-v4.patch --]
[-- Type: text/x-patch, Size: 7360 bytes --]

diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index d306e3a5a6209c1621d91f99ffc366acecd9c3d0..c9b5479740c3f98f906132fda5c252274c4b6edd 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "trans.h"
 #include "stringpool.h"
 #include "fold-const.h"
+#include "internal-fn.h"
 #include "tree-nested.h"
 #include "stor-layout.h"
 #include "toplev.h"	/* For rest_of_decl_compilation.  */
@@ -3874,14 +3875,15 @@ gfc_conv_intrinsic_ttynam (gfc_se * se, gfc_expr * expr)
     minmax (a1, a2, a3, ...)
     {
       mvar = a1;
-      if (a2 .op. mvar || isnan (mvar))
-        mvar = a2;
-      if (a3 .op. mvar || isnan (mvar))
-        mvar = a3;
+      mvar = COMP (mvar, a2)
+      mvar = COMP (mvar, a3)
       ...
-      return mvar
+      return mvar;
     }
- */
+    Where COMP is MIN/MAX_EXPR for integral types or when we don't
+    care about NaNs, or IFN_FMIN/MAX when the target has support for
+    fast NaN-honouring min/max.  When neither holds expand a sequence
+    of explicit comparisons.  */
 
 /* TODO: Mismatching types can occur when specific names are used.
    These should be handled during resolution.  */
@@ -3891,7 +3893,6 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
   tree tmp;
   tree mvar;
   tree val;
-  tree thencase;
   tree *args;
   tree type;
   gfc_actual_arglist *argexpr;
@@ -3912,55 +3913,77 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
 
   mvar = gfc_create_var (type, "M");
   gfc_add_modify (&se->pre, mvar, args[0]);
-  for (i = 1, argexpr = argexpr->next; i < nargs; i++)
-    {
-      tree cond, isnan;
 
+  internal_fn ifn = op == GT_EXPR ? IFN_FMAX : IFN_FMIN;
+
+  for (i = 1, argexpr = argexpr->next; i < nargs; i++, argexpr = argexpr->next)
+    {
+      tree cond = NULL_TREE;
       val = args[i];
 
       /* Handle absent optional arguments by ignoring the comparison.  */
       if (argexpr->expr->expr_type == EXPR_VARIABLE
 	  && argexpr->expr->symtree->n.sym->attr.optional
 	  && TREE_CODE (val) == INDIRECT_REF)
-	cond = fold_build2_loc (input_location,
+	{
+	  cond = fold_build2_loc (input_location,
 				NE_EXPR, logical_type_node,
 				TREE_OPERAND (val, 0),
 			build_int_cst (TREE_TYPE (TREE_OPERAND (val, 0)), 0));
-      else
-      {
-	cond = NULL_TREE;
-
+	}
+      else if (!VAR_P (val) && !TREE_CONSTANT (val))
 	/* Only evaluate the argument once.  */
-	if (!VAR_P (val) && !TREE_CONSTANT (val))
-	  val = gfc_evaluate_now (val, &se->pre);
-      }
+	val = gfc_evaluate_now (val, &se->pre);
 
-      thencase = build2_v (MODIFY_EXPR, mvar, convert (type, val));
+      tree calc;
+      /* If we are dealing with integral types or we don't care about NaNs
+	 just do a MIN/MAX_EXPR.  */
+      if (!HONOR_NANS (type) && !HONOR_SIGNED_ZEROS (type))
+	{
+
+	  tree_code code = op == GT_EXPR ? MAX_EXPR : MIN_EXPR;
+	  calc = fold_build2_loc (input_location, code, type,
+				  convert (type, val), mvar);
+	  tmp = build2_v (MODIFY_EXPR, mvar, calc);
 
-      tmp = fold_build2_loc (input_location, op, logical_type_node,
-			     convert (type, val), mvar);
+	}
+      /* If we care about NaNs and we have internal functions available for
+	 fmin/fmax to perform the comparison, use those.  */
+      else if (SCALAR_FLOAT_TYPE_P (type)
+	      && direct_internal_fn_supported_p (ifn, type, OPTIMIZE_FOR_SPEED))
+	{
+	  calc = build_call_expr_internal_loc (input_location, ifn, type,
+						2, mvar, convert (type, val));
+	  tmp = build2_v (MODIFY_EXPR, mvar, calc);
 
-      /* FIXME: When the IEEE_ARITHMETIC module is implemented, the call to
-	 __builtin_isnan might be made dependent on that module being loaded,
-	 to help performance of programs that don't rely on IEEE semantics.  */
-      if (FLOAT_TYPE_P (TREE_TYPE (mvar)))
+	}
+      /* Otherwise expand to:
+	mvar = a1;
+	if (a2 .op. mvar || isnan (mvar))
+	  mvar = a2;
+	if (a3 .op. mvar || isnan (mvar))
+	  mvar = a3;
+	...  */
+      else
 	{
-	  isnan = build_call_expr_loc (input_location,
-				       builtin_decl_explicit (BUILT_IN_ISNAN),
-				       1, mvar);
+	  tree isnan = build_call_expr_loc (input_location,
+					builtin_decl_explicit (BUILT_IN_ISNAN),
+					1, mvar);
+	  tmp = fold_build2_loc (input_location, op, logical_type_node,
+				 convert (type, val), mvar);
+
 	  tmp = fold_build2_loc (input_location, TRUTH_OR_EXPR,
-				 logical_type_node, tmp,
-				 fold_convert (logical_type_node, isnan));
+				  logical_type_node, tmp,
+				  fold_convert (logical_type_node, isnan));
+	  tmp = build3_v (COND_EXPR, tmp,
+			  build2_v (MODIFY_EXPR, mvar, convert (type, val)),
+			  build_empty_stmt (input_location));
 	}
-      tmp = build3_v (COND_EXPR, tmp, thencase,
-		      build_empty_stmt (input_location));
 
       if (cond != NULL_TREE)
 	tmp = build3_v (COND_EXPR, cond, tmp,
 			build_empty_stmt (input_location));
-
       gfc_add_expr_to_block (&se->pre, tmp);
-      argexpr = argexpr->next;
     }
   se->expr = mvar;
 }
diff --git a/gcc/testsuite/gfortran.dg/max_fmax_aarch64.f90 b/gcc/testsuite/gfortran.dg/max_fmax_aarch64.f90
new file mode 100644
index 0000000000000000000000000000000000000000..b818241a1f9aa7018efaf300cfecb70f413b7573
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/max_fmax_aarch64.f90
@@ -0,0 +1,15 @@
+! { dg-do compile { target aarch64*-*-* } }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine foo (a, b, c, d, e, f, g, h)
+  real (kind=8) :: a, b, c, d, e, f, g, h
+  a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+subroutine foof (a, b, c, d, e, f, g, h)
+  real (kind=4) :: a, b, c, d, e, f, g, h
+  a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+
+! { dg-final { scan-tree-dump-times "\.FMAX " 14 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/min_fmin_aarch64.f90 b/gcc/testsuite/gfortran.dg/min_fmin_aarch64.f90
new file mode 100644
index 0000000000000000000000000000000000000000..009869b497df7737089971e00c01e1c29c0a3032
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/min_fmin_aarch64.f90
@@ -0,0 +1,15 @@
+! { dg-do compile { target aarch64*-*-* } }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine foo (a, b, c, d, e, f, g, h)
+  real (kind=8) :: a, b, c, d, e, f, g, h
+  a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+
+subroutine foof (a, b, c, d, e, f, g, h)
+  real (kind=4) :: a, b, c, d, e, f, g, h
+  a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "\.FMIN " 14 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/minmax_integer.f90 b/gcc/testsuite/gfortran.dg/minmax_integer.f90
new file mode 100644
index 0000000000000000000000000000000000000000..5b6be38c7055ce4e8620cf75ec7d8a182436b24f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/minmax_integer.f90
@@ -0,0 +1,15 @@
+! { dg-do compile }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine foo (a, b, c, d, e, f, g, h)
+  integer (kind=4) :: a, b, c, d, e, f, g, h
+  a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+subroutine foof (a, b, c, d, e, f, g, h)
+  integer (kind=4) :: a, b, c, d, e, f, g, h
+  a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "MIN_EXPR" 7 "optimized" } }
+! { dg-final { scan-tree-dump-times "MAX_EXPR" 7 "optimized" } }

* Re: [PATCH][Fortran][v2] Use MIN/MAX_EXPR for min/max intrinsics
  2018-07-18 15:10               ` Janne Blomqvist
@ 2018-07-26 20:36                 ` Joseph Myers
  2018-08-06 12:05                 ` Janne Blomqvist
  1 sibling, 0 replies; 21+ messages in thread
From: Joseph Myers @ 2018-07-26 20:36 UTC (permalink / raw)
  To: Janne Blomqvist
  Cc: Thomas König, Kyrill Tkachov, Thomas Koenig, Richard Biener,
	fortran, GCC Patches

On Wed, 18 Jul 2018, Janne Blomqvist wrote:

> minimumNumber(a, NaN) = minimumNumber(NaN, a) = a
> 
> That is minimumNumber corresponds to minnum in IEEE 754-2008 and fmin* in

No, it differs in the handling of signaling NaNs (with minimumNumber, if 
the NaN argument is signaling, it results in the "invalid" exception but 
the non-NaN argument is still returned, whereas with minNum, a quiet NaN 
was returned in that case).  A new fminimum_num function is proposed as a 
C binding to the new operation.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2273.pdf

(The new operations are also more strictly defined regarding zero 
arguments, to treat -0 as less than +0, which was unspecified for minNum 
and fmin.)
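
To make the quiet-NaN and signed-zero rules concrete, here is a rough C
sketch. minimum_number_sketch is a hypothetical helper written for this
illustration; it is not the proposed fminimum_num and not a libm function.
It deliberately leaves out the signaling-NaN case, so it never raises the
"invalid" exception described above.

```c
#include <math.h>

/* Hypothetical helper: returns the non-NaN argument when exactly one
   argument is NaN, and treats -0 as less than +0.  Signaling NaNs and
   floating-point exceptions are not modelled.  */
static double
minimum_number_sketch (double a, double b)
{
  if (isnan (a))
    return b;
  if (isnan (b))
    return a;

  /* a == b holds for any mix of +0 and -0, so pick by sign bit.  */
  if (a == 0.0 && b == 0.0)
    return signbit (a) ? a : b;

  return a < b ? a : b;
}
```

The zero branch is the part that minNum and fmin left unspecified: the
sketch returns -0 whichever order the two zeros are passed in.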

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH][Fortran][v2] Use MIN/MAX_EXPR for min/max intrinsics
  2018-07-18 15:10               ` Janne Blomqvist
  2018-07-26 20:36                 ` Joseph Myers
@ 2018-08-06 12:05                 ` Janne Blomqvist
  1 sibling, 0 replies; 21+ messages in thread
From: Janne Blomqvist @ 2018-08-06 12:05 UTC (permalink / raw)
  To: Thomas König
  Cc: Kyrill Tkachov, Thomas Koenig, Richard Biener, fortran, GCC Patches

On Wed, Jul 18, 2018 at 6:10 PM, Janne Blomqvist <blomqvist.janne@gmail.com>
wrote:

> On Wed, Jul 18, 2018 at 4:26 PM, Thomas König <tk@tkoenig.net> wrote:
>
>> Hi Kyrill,
>>
>> > Am 18.07.2018 um 13:17 schrieb Kyrill Tkachov <
>> kyrylo.tkachov@foss.arm.com>:
>> >
>> > Thomas, Janne, would this relaxation of NaN handling be acceptable
>> given the benefits
>> > mentioned above? If so, what would be the recommended adjustment to the
>> nan_1.f90 test?
>>
>> I would be a bit careful about changing behavior in such a major way.
>> What would the results with NaN and infinity then be, with or without
>> optimization? Would the results be consistent with min(nan,num) vs
>> min(num,nan)? Would they be consistent with the new IEEE standard?
>>
>
> AFAIU, MIN/MAX_EXPR do the right thing when comparing a normal number with
> Inf. For NaN the result is undefined, and you might indeed have
>
> min(a, NaN) = a
> min(NaN, a) = NaN
>
> where "a" is a normal number.
>
> (I think that happens at least on x86 if MIN_EXPR is expanded to
> minsd/minpd.)
>
> Apparently what the proper result for min(a, NaN) should be is contentious
> enough that minnum was removed from the upcoming IEEE 754 revision, and new
> operations AFAICS have the semantics
>
> minimum(a, NaN) = minimum(NaN, a) = NaN
> minimumNumber(a, NaN) = minimumNumber(NaN, a) = a
>
> That is minimumNumber corresponds to minnum in IEEE 754-2008 and fmin* in
> C, and to the current behavior of gfortran.
>
>
>> In general, I think that min(nan,num) should be nan and that our current
>> behavior is not the best.
>>
>
> There was some extensive discussion of that in the Julia bug report I
> linked to in an earlier message, and they came to the same conclusion and
> changed their behavior.
>
>
>> Does anybody have data points on how this is handled by other compilers?
>>
>
> The only other compiler I have access to at the moment is ifort (and not
> the latest version), but maybe somebody has access to a wider variety?
>
>
>> Oh, and if anything is changed, then compile and runtime behavior should
>> always be the same.
>>
>
> Well, IFF we place some weight on the runtime behavior being particularly
> sensible wrt NaN's, which it wouldn't be if we just use a plain
> MIN/MAX_EXPR. Is it worth taking a performance hit for, though? In
> particular, if other compilers are inconsistent, we might as well do
> whatever is fastest.
>
>
> --
> Janne Blomqvist
>


I tried the testcase below (with the functions in a separate file to prevent
inter-procedural and constant propagation optimizations):

program main
  implicit none
  real :: a, b = 1., mymax, mydiv
  external mymax, mydiv
  a = mydiv(0., 0.)
  print *, 'Verify that the following value is a NaN: ', a
  print *, 'max(', a, ',', b, ') = ', mymax(a, b)
  print *, 'max(', b, ',', a, ') = ', mymax(b, a)

  a = mydiv(1., 0.)
  print *, 'Verify that the following is a Inf: ', a
  print *, 'max(', a, ',', b, ') = ', mymax(a, b)
  print *, 'max(', b, ',', a, ') = ', mymax(b, a)
end program main

real function mymax(a, b)
  implicit none
  real :: a, b
  mymax = max(a, b)
end function mymax

real function mydiv(a, b)
  implicit none
  real :: a, b
  mydiv = a/b
end function mydiv


With gfortran 6.2 (didn't bother to check other versions as it shouldn't
have changed lately) and Intel Fortran 17.0.1 I get the following:

% gfortran main.f90 my.f90 && ./a.out
 Verify that the following value is a NaN:               NaN
 max(              NaN ,   1.00000000     ) =    1.00000000
 max(   1.00000000     ,              NaN ) =    1.00000000
 Verify that the following is a Inf:          Infinity
 max(         Infinity ,   1.00000000     ) =          Infinity
 max(   1.00000000     ,         Infinity ) =          Infinity

% gfortran -ffast-math main.f90 my.f90 && ./a.out
 Verify that the following value is a NaN:               NaN
 max(              NaN ,   1.00000000     ) =               NaN
 max(   1.00000000     ,              NaN ) =    1.00000000
 Verify that the following is a Inf:          Infinity
 max(         Infinity ,   1.00000000     ) =          Infinity
 max(   1.00000000     ,         Infinity ) =          Infinity


% ifort main.f90 my.f90 && ./a.out
 Verify that the following value is a NaN:             NaN
 max(            NaN ,   1.000000     ) =    1.000000
 max(   1.000000     ,            NaN ) =             NaN
 Verify that the following is a Inf:        Infinity
 max(       Infinity ,   1.000000     ) =        Infinity
 max(   1.000000     ,       Infinity ) =        Infinity


% ifort -fp-model strict main.f90 my.f90 && ./a.out
 Verify that the following value is a NaN:             NaN
 max(            NaN ,   1.000000     ) =    1.000000
 max(   1.000000     ,            NaN ) =             NaN
 Verify that the following is a Inf:        Infinity
 max(       Infinity ,   1.000000     ) =        Infinity
 max(   1.000000     ,       Infinity ) =        Infinity


For brevity I have omitted tests with various -O[N] optimization levels,
which didn't affect the results on either gfortran or ifort.

This suggests that ifort does the equivalent of MAX_EXPR unconditionally.
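
The NaN rows of the three distinct outputs above are consistent with three
different ways of writing a two-argument max. The C sketch below is only an
illustration of that; the function names are made up, and nothing here is
taken from the gfortran or ifort sources.

```c
#include <assert.h>
#include <math.h>

/* Explicit NaN checks followed by a comparison: a NaN argument is
   ignored when the other argument is a number.  */
float max_ignore_nan(float a, float b)
{
  if (isnan(a))
    return b;
  if (isnan(b))
    return a;
  return a > b ? a : b;
}

/* Plain comparison that falls back to the first argument.  Any ordered
   comparison with a NaN is false, so the first argument wins.  */
float max_first_wins(float a, float b)
{
  return b > a ? b : a;
}

/* Plain comparison that falls back to the second argument, so the
   second argument wins whenever a NaN is involved.  */
float max_second_wins(float a, float b)
{
  return a > b ? a : b;
}

/* The assertions encode the NaN rows of the outputs quoted above.  */
void check_nan_rows(void)
{
  const float one = 1.0f;

  /* gfortran without -ffast-math: 1.0 in both orders.  */
  assert(max_ignore_nan(NAN, one) == one);
  assert(max_ignore_nan(one, NAN) == one);

  /* gfortran -ffast-math: max(NaN, 1) = NaN, max(1, NaN) = 1.  */
  assert(isnan(max_first_wins(NAN, one)));
  assert(max_first_wins(one, NAN) == one);

  /* ifort: max(NaN, 1) = 1, max(1, NaN) = NaN.  */
  assert(max_second_wins(NAN, one) == one);
  assert(isnan(max_second_wins(one, NAN)));
}
```

All three variants agree on the Infinity rows; they only differ once a NaN
is involved, which is where a bare comparison becomes order-dependent.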

Does anyone have access to other compilers, and if so, what results do they give?


-- 
Janne Blomqvist


end of thread, other threads:[~2018-08-06 12:05 UTC | newest]

Thread overview: 21+ messages
2018-07-17 12:35 [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate Kyrill Tkachov
2018-07-17 13:27 ` Richard Biener
2018-07-17 13:46   ` Kyrill Tkachov
2018-07-17 15:37     ` Thomas Koenig
2018-07-17 16:16       ` Kyrill Tkachov
2018-07-17 17:42         ` Thomas Koenig
2018-07-17 20:06       ` Janne Blomqvist
2018-07-17 20:35         ` Janne Blomqvist
2018-07-18 11:17           ` [PATCH][Fortran][v2] Use MIN/MAX_EXPR for min/max intrinsics Kyrill Tkachov
2018-07-18 13:26             ` Thomas König
2018-07-18 14:03               ` Kyrill Tkachov
2018-07-18 14:55                 ` Janne Blomqvist
2018-07-18 15:28                 ` Richard Sandiford
2018-07-18 16:04                   ` Kyrill Tkachov
2018-07-18 15:10               ` Janne Blomqvist
2018-07-26 20:36                 ` Joseph Myers
2018-08-06 12:05                 ` Janne Blomqvist
2018-07-18  9:44     ` [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate Richard Biener
2018-07-18  9:50       ` Kyrill Tkachov
2018-07-18 10:06         ` Richard Biener
2018-07-18 11:45           ` [PATCH]Use " Richard Sandiford
