public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Kyrill  Tkachov <kyrylo.tkachov@foss.arm.com>
To: fortran@gcc.gnu.org,
	 "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate
Date: Tue, 17 Jul 2018 12:35:00 -0000	[thread overview]
Message-ID: <5B4DE283.9060100@foss.arm.com> (raw)

[-- Attachment #1: Type: text/plain, Size: 2176 bytes --]

Hi all,

This is my first Fortran patch, so apologies if I'm missing something.
The current expansion of the min and max intrinsics explicitly expands
the comparisons between each argument to calculate the global min/max.
Some targets, like aarch64, have instructions that can calculate the min/max
of two real (floating-point) numbers with the proper NaN-handling semantics
(if both inputs are NaN, return Nan. If one is NaN, return the other) and those
are the semantics provided by the __builtin_fmin/max family of functions that expand
to these instructions.

This patch makes the frontend emit __builtin_fmin/max directly to compare each
pair of numbers when the numbers are floating-point, and use MIN_EXPR/MAX_EXPR otherwise
(integral types and -ffast-math) which should hopefully be easier to recognise in the
midend and optimise. The previous approach of generating the open-coded version of that
is used when we don't have an appropriate __builtin_fmin/max available.
For example, for a configuration of x86_64-unknown-linux-gnu that I tested there was no
128-bit __built_fminl available.

With this patch I'm seeing more than 7000 FMINNM/FMAXNM instructions being generated at -O3
on aarch64 for 521.wrf from fprate SPEC2017 where none before were generated
(we were generating explicit comparisons and NaN checks). This gave a 2.4% improvement
in performance on a Cortex-A72.

Bootstrapped and tested on aarch64-none-linux-gnu and x86_64-unknown-linux-gnu.

Ok for trunk?
Thanks,
Kyrill

2018-07-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * f95-lang.c (gfc_init_builtin_functions): Define __builtin_fmin,
     __builtin_fminf, __builtin_fminl, __builtin_fmax, __builtin_fmaxf,
     __builtin_fmaxl.
     * trans-intrinsic.c: Include builtins.h.
     (gfc_conv_intrinsic_minmax): Emit __builtin_fmin/max or MIN/MAX_EXPR
     functions to calculate the min/max.

2018-07-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gfortran.dg/max_fmaxf.f90: New test.
     * gfortran.dg/min_fminf.f90: Likewise.
     * gfortran.dg/minmax_integer.f90: Likewise.
     * gfortran.dg/max_fmaxl_aarch64.f90: Likewise.
     * gfortran.dg/min_fminl_aarch64.f90: Likewise.

[-- Attachment #2: fort-fmin.patch --]
[-- Type: text/x-patch, Size: 9835 bytes --]

diff --git a/gcc/fortran/f95-lang.c b/gcc/fortran/f95-lang.c
index 0f39f0ca788ea9e5868d4718c5f90c102958968f..5dd58f3d3d0242d77a5838ffa0395e7b941c8f85 100644
--- a/gcc/fortran/f95-lang.c
+++ b/gcc/fortran/f95-lang.c
@@ -724,6 +724,20 @@ gfc_init_builtin_functions (void)
   gfc_define_builtin ("__builtin_roundf", mfunc_float[0], 
 		      BUILT_IN_ROUNDF, "roundf", ATTR_CONST_NOTHROW_LEAF_LIST);
 
+  gfc_define_builtin  ("__builtin_fmin", mfunc_double[1],
+		      BUILT_IN_FMIN, "fmin", ATTR_CONST_NOTHROW_LEAF_LIST);
+  gfc_define_builtin  ("__builtin_fminf", mfunc_float[1],
+		      BUILT_IN_FMINF, "fminf", ATTR_CONST_NOTHROW_LEAF_LIST);
+  gfc_define_builtin  ("__builtin_fminl", mfunc_longdouble[1],
+		      BUILT_IN_FMINL, "fminl", ATTR_CONST_NOTHROW_LEAF_LIST);
+
+  gfc_define_builtin  ("__builtin_fmax", mfunc_double[1],
+		      BUILT_IN_FMAX, "fmax", ATTR_CONST_NOTHROW_LEAF_LIST);
+  gfc_define_builtin  ("__builtin_fmaxf", mfunc_float[1],
+		      BUILT_IN_FMAXF, "fmaxf", ATTR_CONST_NOTHROW_LEAF_LIST);
+  gfc_define_builtin  ("__builtin_fmaxl", mfunc_longdouble[1],
+		      BUILT_IN_FMAXL, "fmaxl", ATTR_CONST_NOTHROW_LEAF_LIST);
+
   gfc_define_builtin ("__builtin_truncl", mfunc_longdouble[0],
 		      BUILT_IN_TRUNCL, "truncl", ATTR_CONST_NOTHROW_LEAF_LIST);
   gfc_define_builtin ("__builtin_trunc", mfunc_double[0],
diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index d306e3a5a6209c1621d91f99ffc366acecd9c3d0..5dde54a3f3c2606f987b42480b1921e6304ccda5 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "trans.h"
 #include "stringpool.h"
 #include "fold-const.h"
+#include "builtins.h"
 #include "tree-nested.h"
 #include "stor-layout.h"
 #include "toplev.h"	/* For rest_of_decl_compilation.  */
@@ -3874,14 +3875,13 @@ gfc_conv_intrinsic_ttynam (gfc_se * se, gfc_expr * expr)
     minmax (a1, a2, a3, ...)
     {
       mvar = a1;
-      if (a2 .op. mvar || isnan (mvar))
-        mvar = a2;
-      if (a3 .op. mvar || isnan (mvar))
-        mvar = a3;
+      __builtin_fmin/max (mvar, a2);
+      __builtin_fmin/max (mvar, a3);
       ...
-      return mvar
+      return mvar;
     }
- */
+   For integral types or when we don't care about NaNs use
+   MIN/MAX_EXPRs.  */
 
 /* TODO: Mismatching types can occur when specific names are used.
    These should be handled during resolution.  */
@@ -3891,7 +3891,6 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
   tree tmp;
   tree mvar;
   tree val;
-  tree thencase;
   tree *args;
   tree type;
   gfc_actual_arglist *argexpr;
@@ -3912,55 +3911,79 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
 
   mvar = gfc_create_var (type, "M");
   gfc_add_modify (&se->pre, mvar, args[0]);
-  for (i = 1, argexpr = argexpr->next; i < nargs; i++)
-    {
-      tree cond, isnan;
 
+  tree builtin = NULL_TREE;
+  if (SCALAR_FLOAT_TYPE_P (type))
+    builtin = mathfn_built_in (type, op == GT_EXPR
+				     ? BUILT_IN_FMAX : BUILT_IN_FMIN);
+
+  for (i = 1, argexpr = argexpr->next; i < nargs; i++, argexpr = argexpr->next)
+    {
+      tree cond = NULL_TREE;
       val = args[i];
 
       /* Handle absent optional arguments by ignoring the comparison.  */
       if (argexpr->expr->expr_type == EXPR_VARIABLE
 	  && argexpr->expr->symtree->n.sym->attr.optional
 	  && TREE_CODE (val) == INDIRECT_REF)
-	cond = fold_build2_loc (input_location,
+	{
+	  cond = fold_build2_loc (input_location,
 				NE_EXPR, logical_type_node,
 				TREE_OPERAND (val, 0),
 			build_int_cst (TREE_TYPE (TREE_OPERAND (val, 0)), 0));
-      else
-      {
-	cond = NULL_TREE;
-
+	}
+      else if (!VAR_P (val) && !TREE_CONSTANT (val))
 	/* Only evaluate the argument once.  */
-	if (!VAR_P (val) && !TREE_CONSTANT (val))
-	  val = gfc_evaluate_now (val, &se->pre);
-      }
+	val = gfc_evaluate_now (val, &se->pre);
 
-      thencase = build2_v (MODIFY_EXPR, mvar, convert (type, val));
+      tree calc;
+      /* If we dealing with integral types or we don't care about NaNs
+	 just do a MIN/MAX_EXPR.  */
+      if (!HONOR_NANS (type))
+	{
+
+	  tree_code code = op == GT_EXPR ? MAX_EXPR : MIN_EXPR;
+	  calc = fold_build2_loc (input_location, code, type,
+				  convert (type, val), mvar);
+	  tmp = build2_v (MODIFY_EXPR, mvar, calc);
 
-      tmp = fold_build2_loc (input_location, op, logical_type_node,
-			     convert (type, val), mvar);
+	}
+      /* If we care about NaNs and we have __builtin_fmin/max builtins
+	 to perform the comparison, use those.  */
+      else if (SCALAR_FLOAT_TYPE_P (type) && builtin)
+	{
+	  calc = build_call_expr_loc (input_location, builtin,
+				      2, mvar, convert (type, val));
+	  tmp = build2_v (MODIFY_EXPR, mvar, calc);
 
-      /* FIXME: When the IEEE_ARITHMETIC module is implemented, the call to
-	 __builtin_isnan might be made dependent on that module being loaded,
-	 to help performance of programs that don't rely on IEEE semantics.  */
-      if (FLOAT_TYPE_P (TREE_TYPE (mvar)))
+	}
+      /* Otherwise expand to:
+	mvar = a1;
+	if (a2 .op. mvar || isnan (mvar))
+	  mvar = a2;
+	if (a3 .op. mvar || isnan (mvar))
+	  mvar = a3;
+	...  */
+      else
 	{
-	  isnan = build_call_expr_loc (input_location,
-				       builtin_decl_explicit (BUILT_IN_ISNAN),
-				       1, mvar);
+	  tree isnan = build_call_expr_loc (input_location,
+					builtin_decl_explicit (BUILT_IN_ISNAN),
+					1, mvar);
+	  tmp = fold_build2_loc (input_location, op, logical_type_node,
+				 convert (type, val), mvar);
+
 	  tmp = fold_build2_loc (input_location, TRUTH_OR_EXPR,
-				 logical_type_node, tmp,
-				 fold_convert (logical_type_node, isnan));
+				  logical_type_node, tmp,
+				  fold_convert (logical_type_node, isnan));
+	  tmp = build3_v (COND_EXPR, tmp,
+			  build2_v (MODIFY_EXPR, mvar, convert (type, val)),
+			  build_empty_stmt (input_location));
 	}
-      tmp = build3_v (COND_EXPR, tmp, thencase,
-		      build_empty_stmt (input_location));
 
       if (cond != NULL_TREE)
 	tmp = build3_v (COND_EXPR, cond, tmp,
 			build_empty_stmt (input_location));
-
       gfc_add_expr_to_block (&se->pre, tmp);
-      argexpr = argexpr->next;
     }
   se->expr = mvar;
 }
diff --git a/gcc/testsuite/gfortran.dg/max_fmaxf.f90 b/gcc/testsuite/gfortran.dg/max_fmaxf.f90
new file mode 100644
index 0000000000000000000000000000000000000000..9082d4a7e70378efb36e24e3e575f11a74631657
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/max_fmaxf.f90
@@ -0,0 +1,15 @@
+! { dg-do compile }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine foo (a, b, c, d, e, f, g, h)
+  real (kind=8) :: a, b, c, d, e, f, g, h
+  a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+subroutine foof (a, b, c, d, e, f, g, h)
+  real (kind=4) :: a, b, c, d, e, f, g, h
+  a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "__builtin_fmax " 7 "optimized" } }
+! { dg-final { scan-tree-dump-times "__builtin_fmaxf " 7 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90 b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
new file mode 100644
index 0000000000000000000000000000000000000000..8c8ea063e5d0718dc829c1f5574c5b46040e6786
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
@@ -0,0 +1,9 @@
+! { dg-do compile { target aarch64*-*-* } }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine fool (a, b, c, d, e, f, g, h)
+  real (kind=16) :: a, b, c, d, e, f, g, h
+  a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "__builtin_fmaxl " 7 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/min_fminf.f90 b/gcc/testsuite/gfortran.dg/min_fminf.f90
new file mode 100644
index 0000000000000000000000000000000000000000..6b929611f1f166877a19c475704a9312edf7b170
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/min_fminf.f90
@@ -0,0 +1,15 @@
+! { dg-do compile }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine foo (a, b, c, d, e, f, g, h)
+  real (kind=8) :: a, b, c, d, e, f, g, h
+  a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+subroutine foof (a, b, c, d, e, f, g, h)
+  real (kind=4) :: a, b, c, d, e, f, g, h
+  a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "__builtin_fmin " 7 "optimized" } }
+! { dg-final { scan-tree-dump-times "__builtin_fminf " 7 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90 b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
new file mode 100644
index 0000000000000000000000000000000000000000..92368917fb48e0c468a16d080ab3a9ac842e01a7
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
@@ -0,0 +1,9 @@
+! { dg-do compile { target aarch64*-*-* } }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine fool (a, b, c, d, e, f, g, h)
+  real (kind=16) :: a, b, c, d, e, f, g, h
+  a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "__builtin_fminl " 7 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/minmax_integer.f90 b/gcc/testsuite/gfortran.dg/minmax_integer.f90
new file mode 100644
index 0000000000000000000000000000000000000000..5b6be38c7055ce4e8620cf75ec7d8a182436b24f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/minmax_integer.f90
@@ -0,0 +1,15 @@
+! { dg-do compile }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine foo (a, b, c, d, e, f, g, h)
+  integer (kind=4) :: a, b, c, d, e, f, g, h
+  a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+subroutine foof (a, b, c, d, e, f, g, h)
+  integer (kind=4) :: a, b, c, d, e, f, g, h
+  a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "MIN_EXPR" 7 "optimized" } }
+! { dg-final { scan-tree-dump-times "MAX_EXPR" 7 "optimized" } }

             reply	other threads:[~2018-07-17 12:35 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-17 12:35 Kyrill Tkachov [this message]
2018-07-17 13:27 ` Richard Biener
2018-07-17 13:46   ` Kyrill Tkachov
2018-07-17 15:37     ` Thomas Koenig
2018-07-17 16:16       ` Kyrill Tkachov
2018-07-17 17:42         ` Thomas Koenig
2018-07-17 20:06       ` Janne Blomqvist
2018-07-17 20:35         ` Janne Blomqvist
2018-07-18 11:17           ` [PATCH][Fortran][v2] Use MIN/MAX_EXPR for min/max intrinsics Kyrill Tkachov
2018-07-18 13:26             ` Thomas König
2018-07-18 14:03               ` Kyrill Tkachov
2018-07-18 14:55                 ` Janne Blomqvist
2018-07-18 15:28                 ` Richard Sandiford
2018-07-18 16:04                   ` Kyrill Tkachov
2018-07-18 15:10               ` Janne Blomqvist
2018-07-26 20:36                 ` Joseph Myers
2018-08-06 12:05                 ` Janne Blomqvist
2018-07-18  9:44     ` [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate Richard Biener
2018-07-18  9:50       ` Kyrill Tkachov
2018-07-18 10:06         ` Richard Biener
2018-07-18 11:45           ` [PATCH]Use " Richard Sandiford

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5B4DE283.9060100@foss.arm.com \
    --to=kyrylo.tkachov@foss.arm.com \
    --cc=fortran@gcc.gnu.org \
    --cc=gcc-patches@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).