From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-481724-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 115289 invoked by alias); 17 Jul 2018 12:35:23 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 115262 invoked by uid 89); 17 Jul 2018 12:35:21 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-25.6 required=5.0 tests=BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,KAM_LAZY_DOMAIN_SECURITY,KAM_LOTSOFHASH autolearn=ham version=3.3.2 spammy=IEEE, subroutine
X-HELO: foss.arm.com
Received: from usa-sjc-mx-foss1.foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 17 Jul 2018 12:35:19 +0000
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249])	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 29BFC18A;	Tue, 17 Jul 2018 05:35:18 -0700 (PDT)
Received: from [10.2.207.77] (e100706-lin.cambridge.arm.com [10.2.207.77])	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 4A3193F5A0;	Tue, 17 Jul 2018 05:35:17 -0700 (PDT)
Message-ID: <5B4DE283.9060100@foss.arm.com>
Date: Tue, 17 Jul 2018 12:35:00 -0000
From: Kyrill  Tkachov <kyrylo.tkachov@foss.arm.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: fortran@gcc.gnu.org,  "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate
Content-Type: multipart/mixed; boundary="------------060408070703010602030600"
X-SW-Source: 2018-07/txt/msg00921.txt.bz2

This is a multi-part message in MIME format.
--------------060408070703010602030600
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Content-length: 2176

Hi all,

This is my first Fortran patch, so apologies if I'm missing something.
The current expansion of the min and max intrinsics explicitly expands
the comparisons between each argument to calculate the global min/max.
Some targets, like aarch64, have instructions that can calculate the min/max
of two real (floating-point) numbers with the proper NaN-handling semantics
(if both inputs are NaN, return Nan. If one is NaN, return the other) and those
are the semantics provided by the __builtin_fmin/max family of functions that expand
to these instructions.

This patch makes the frontend emit __builtin_fmin/max directly to compare each
pair of numbers when the numbers are floating-point, and use MIN_EXPR/MAX_EXPR otherwise
(integral types and -ffast-math) which should hopefully be easier to recognise in the
midend and optimise. The previous approach of generating the open-coded version of that
is used when we don't have an appropriate __builtin_fmin/max available.
For example, for a configuration of x86_64-unknown-linux-gnu that I tested there was no
128-bit __built_fminl available.

With this patch I'm seeing more than 7000 FMINNM/FMAXNM instructions being generated at -O3
on aarch64 for 521.wrf from fprate SPEC2017 where none before were generated
(we were generating explicit comparisons and NaN checks). This gave a 2.4% improvement
in performance on a Cortex-A72.

Bootstrapped and tested on aarch64-none-linux-gnu and x86_64-unknown-linux-gnu.

Ok for trunk?
Thanks,
Kyrill

2018-07-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * f95-lang.c (gfc_init_builtin_functions): Define __builtin_fmin,
     __builtin_fminf, __builtin_fminl, __builtin_fmax, __builtin_fmaxf,
     __builtin_fmaxl.
     * trans-intrinsic.c: Include builtins.h.
     (gfc_conv_intrinsic_minmax): Emit __builtin_fmin/max or MIN/MAX_EXPR
     functions to calculate the min/max.

2018-07-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gfortran.dg/max_fmaxf.f90: New test.
     * gfortran.dg/min_fminf.f90: Likewise.
     * gfortran.dg/minmax_integer.f90: Likewise.
     * gfortran.dg/max_fmaxl_aarch64.f90: Likewise.
     * gfortran.dg/min_fminl_aarch64.f90: Likewise.

--------------060408070703010602030600
Content-Type: text/x-patch;
 name="fort-fmin.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="fort-fmin.patch"
Content-length: 9835

diff --git a/gcc/fortran/f95-lang.c b/gcc/fortran/f95-lang.c
index 0f39f0ca788ea9e5868d4718c5f90c102958968f..5dd58f3d3d0242d77a5838ffa0395e7b941c8f85 100644
--- a/gcc/fortran/f95-lang.c
+++ b/gcc/fortran/f95-lang.c
@@ -724,6 +724,20 @@ gfc_init_builtin_functions (void)
   gfc_define_builtin ("__builtin_roundf", mfunc_float[0], 
 		      BUILT_IN_ROUNDF, "roundf", ATTR_CONST_NOTHROW_LEAF_LIST);
 
+  gfc_define_builtin  ("__builtin_fmin", mfunc_double[1],
+		      BUILT_IN_FMIN, "fmin", ATTR_CONST_NOTHROW_LEAF_LIST);
+  gfc_define_builtin  ("__builtin_fminf", mfunc_float[1],
+		      BUILT_IN_FMINF, "fminf", ATTR_CONST_NOTHROW_LEAF_LIST);
+  gfc_define_builtin  ("__builtin_fminl", mfunc_longdouble[1],
+		      BUILT_IN_FMINL, "fminl", ATTR_CONST_NOTHROW_LEAF_LIST);
+
+  gfc_define_builtin  ("__builtin_fmax", mfunc_double[1],
+		      BUILT_IN_FMAX, "fmax", ATTR_CONST_NOTHROW_LEAF_LIST);
+  gfc_define_builtin  ("__builtin_fmaxf", mfunc_float[1],
+		      BUILT_IN_FMAXF, "fmaxf", ATTR_CONST_NOTHROW_LEAF_LIST);
+  gfc_define_builtin  ("__builtin_fmaxl", mfunc_longdouble[1],
+		      BUILT_IN_FMAXL, "fmaxl", ATTR_CONST_NOTHROW_LEAF_LIST);
+
   gfc_define_builtin ("__builtin_truncl", mfunc_longdouble[0],
 		      BUILT_IN_TRUNCL, "truncl", ATTR_CONST_NOTHROW_LEAF_LIST);
   gfc_define_builtin ("__builtin_trunc", mfunc_double[0],
diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index d306e3a5a6209c1621d91f99ffc366acecd9c3d0..5dde54a3f3c2606f987b42480b1921e6304ccda5 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "trans.h"
 #include "stringpool.h"
 #include "fold-const.h"
+#include "builtins.h"
 #include "tree-nested.h"
 #include "stor-layout.h"
 #include "toplev.h"	/* For rest_of_decl_compilation.  */
@@ -3874,14 +3875,13 @@ gfc_conv_intrinsic_ttynam (gfc_se * se, gfc_expr * expr)
     minmax (a1, a2, a3, ...)
     {
       mvar = a1;
-      if (a2 .op. mvar || isnan (mvar))
-        mvar = a2;
-      if (a3 .op. mvar || isnan (mvar))
-        mvar = a3;
+      __builtin_fmin/max (mvar, a2);
+      __builtin_fmin/max (mvar, a3);
       ...
-      return mvar
+      return mvar;
     }
- */
+   For integral types or when we don't care about NaNs use
+   MIN/MAX_EXPRs.  */
 
 /* TODO: Mismatching types can occur when specific names are used.
    These should be handled during resolution.  */
@@ -3891,7 +3891,6 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
   tree tmp;
   tree mvar;
   tree val;
-  tree thencase;
   tree *args;
   tree type;
   gfc_actual_arglist *argexpr;
@@ -3912,55 +3911,79 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
 
   mvar = gfc_create_var (type, "M");
   gfc_add_modify (&se->pre, mvar, args[0]);
-  for (i = 1, argexpr = argexpr->next; i < nargs; i++)
-    {
-      tree cond, isnan;
 
+  tree builtin = NULL_TREE;
+  if (SCALAR_FLOAT_TYPE_P (type))
+    builtin = mathfn_built_in (type, op == GT_EXPR
+				     ? BUILT_IN_FMAX : BUILT_IN_FMIN);
+
+  for (i = 1, argexpr = argexpr->next; i < nargs; i++, argexpr = argexpr->next)
+    {
+      tree cond = NULL_TREE;
       val = args[i];
 
       /* Handle absent optional arguments by ignoring the comparison.  */
       if (argexpr->expr->expr_type == EXPR_VARIABLE
 	  && argexpr->expr->symtree->n.sym->attr.optional
 	  && TREE_CODE (val) == INDIRECT_REF)
-	cond = fold_build2_loc (input_location,
+	{
+	  cond = fold_build2_loc (input_location,
 				NE_EXPR, logical_type_node,
 				TREE_OPERAND (val, 0),
 			build_int_cst (TREE_TYPE (TREE_OPERAND (val, 0)), 0));
-      else
-      {
-	cond = NULL_TREE;
-
+	}
+      else if (!VAR_P (val) && !TREE_CONSTANT (val))
 	/* Only evaluate the argument once.  */
-	if (!VAR_P (val) && !TREE_CONSTANT (val))
-	  val = gfc_evaluate_now (val, &se->pre);
-      }
+	val = gfc_evaluate_now (val, &se->pre);
 
-      thencase = build2_v (MODIFY_EXPR, mvar, convert (type, val));
+      tree calc;
+      /* If we dealing with integral types or we don't care about NaNs
+	 just do a MIN/MAX_EXPR.  */
+      if (!HONOR_NANS (type))
+	{
+
+	  tree_code code = op == GT_EXPR ? MAX_EXPR : MIN_EXPR;
+	  calc = fold_build2_loc (input_location, code, type,
+				  convert (type, val), mvar);
+	  tmp = build2_v (MODIFY_EXPR, mvar, calc);
 
-      tmp = fold_build2_loc (input_location, op, logical_type_node,
-			     convert (type, val), mvar);
+	}
+      /* If we care about NaNs and we have __builtin_fmin/max builtins
+	 to perform the comparison, use those.  */
+      else if (SCALAR_FLOAT_TYPE_P (type) && builtin)
+	{
+	  calc = build_call_expr_loc (input_location, builtin,
+				      2, mvar, convert (type, val));
+	  tmp = build2_v (MODIFY_EXPR, mvar, calc);
 
-      /* FIXME: When the IEEE_ARITHMETIC module is implemented, the call to
-	 __builtin_isnan might be made dependent on that module being loaded,
-	 to help performance of programs that don't rely on IEEE semantics.  */
-      if (FLOAT_TYPE_P (TREE_TYPE (mvar)))
+	}
+      /* Otherwise expand to:
+	mvar = a1;
+	if (a2 .op. mvar || isnan (mvar))
+	  mvar = a2;
+	if (a3 .op. mvar || isnan (mvar))
+	  mvar = a3;
+	...  */
+      else
 	{
-	  isnan = build_call_expr_loc (input_location,
-				       builtin_decl_explicit (BUILT_IN_ISNAN),
-				       1, mvar);
+	  tree isnan = build_call_expr_loc (input_location,
+					builtin_decl_explicit (BUILT_IN_ISNAN),
+					1, mvar);
+	  tmp = fold_build2_loc (input_location, op, logical_type_node,
+				 convert (type, val), mvar);
+
 	  tmp = fold_build2_loc (input_location, TRUTH_OR_EXPR,
-				 logical_type_node, tmp,
-				 fold_convert (logical_type_node, isnan));
+				  logical_type_node, tmp,
+				  fold_convert (logical_type_node, isnan));
+	  tmp = build3_v (COND_EXPR, tmp,
+			  build2_v (MODIFY_EXPR, mvar, convert (type, val)),
+			  build_empty_stmt (input_location));
 	}
-      tmp = build3_v (COND_EXPR, tmp, thencase,
-		      build_empty_stmt (input_location));
 
       if (cond != NULL_TREE)
 	tmp = build3_v (COND_EXPR, cond, tmp,
 			build_empty_stmt (input_location));
-
       gfc_add_expr_to_block (&se->pre, tmp);
-      argexpr = argexpr->next;
     }
   se->expr = mvar;
 }
diff --git a/gcc/testsuite/gfortran.dg/max_fmaxf.f90 b/gcc/testsuite/gfortran.dg/max_fmaxf.f90
new file mode 100644
index 0000000000000000000000000000000000000000..9082d4a7e70378efb36e24e3e575f11a74631657
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/max_fmaxf.f90
@@ -0,0 +1,15 @@
+! { dg-do compile }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine foo (a, b, c, d, e, f, g, h)
+  real (kind=8) :: a, b, c, d, e, f, g, h
+  a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+subroutine foof (a, b, c, d, e, f, g, h)
+  real (kind=4) :: a, b, c, d, e, f, g, h
+  a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "__builtin_fmax " 7 "optimized" } }
+! { dg-final { scan-tree-dump-times "__builtin_fmaxf " 7 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90 b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
new file mode 100644
index 0000000000000000000000000000000000000000..8c8ea063e5d0718dc829c1f5574c5b46040e6786
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
@@ -0,0 +1,9 @@
+! { dg-do compile { target aarch64*-*-* } }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine fool (a, b, c, d, e, f, g, h)
+  real (kind=16) :: a, b, c, d, e, f, g, h
+  a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "__builtin_fmaxl " 7 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/min_fminf.f90 b/gcc/testsuite/gfortran.dg/min_fminf.f90
new file mode 100644
index 0000000000000000000000000000000000000000..6b929611f1f166877a19c475704a9312edf7b170
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/min_fminf.f90
@@ -0,0 +1,15 @@
+! { dg-do compile }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine foo (a, b, c, d, e, f, g, h)
+  real (kind=8) :: a, b, c, d, e, f, g, h
+  a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+subroutine foof (a, b, c, d, e, f, g, h)
+  real (kind=4) :: a, b, c, d, e, f, g, h
+  a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "__builtin_fmin " 7 "optimized" } }
+! { dg-final { scan-tree-dump-times "__builtin_fminf " 7 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90 b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
new file mode 100644
index 0000000000000000000000000000000000000000..92368917fb48e0c468a16d080ab3a9ac842e01a7
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
@@ -0,0 +1,9 @@
+! { dg-do compile { target aarch64*-*-* } }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine fool (a, b, c, d, e, f, g, h)
+  real (kind=16) :: a, b, c, d, e, f, g, h
+  a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "__builtin_fminl " 7 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/minmax_integer.f90 b/gcc/testsuite/gfortran.dg/minmax_integer.f90
new file mode 100644
index 0000000000000000000000000000000000000000..5b6be38c7055ce4e8620cf75ec7d8a182436b24f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/minmax_integer.f90
@@ -0,0 +1,15 @@
+! { dg-do compile }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine foo (a, b, c, d, e, f, g, h)
+  integer (kind=4) :: a, b, c, d, e, f, g, h
+  a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+subroutine foof (a, b, c, d, e, f, g, h)
+  integer (kind=4) :: a, b, c, d, e, f, g, h
+  a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "MIN_EXPR" 7 "optimized" } }
+! { dg-final { scan-tree-dump-times "MAX_EXPR" 7 "optimized" } }

--------------060408070703010602030600--