From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 115289 invoked by alias); 17 Jul 2018 12:35:23 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 115262 invoked by uid 89); 17 Jul 2018 12:35:21 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-25.6 required=5.0 tests=BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,KAM_LAZY_DOMAIN_SECURITY,KAM_LOTSOFHASH autolearn=ham version=3.3.2 spammy=IEEE, subroutine X-HELO: foss.arm.com Received: from usa-sjc-mx-foss1.foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 17 Jul 2018 12:35:19 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 29BFC18A; Tue, 17 Jul 2018 05:35:18 -0700 (PDT) Received: from [10.2.207.77] (e100706-lin.cambridge.arm.com [10.2.207.77]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 4A3193F5A0; Tue, 17 Jul 2018 05:35:17 -0700 (PDT) Message-ID: <5B4DE283.9060100@foss.arm.com> Date: Tue, 17 Jul 2018 12:35:00 -0000 From: Kyrill Tkachov User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: fortran@gcc.gnu.org, "gcc-patches@gcc.gnu.org" Subject: [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate Content-Type: multipart/mixed; boundary="------------060408070703010602030600" X-SW-Source: 2018-07/txt/msg00921.txt.bz2 This is a multi-part message in MIME format. --------------060408070703010602030600 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Content-length: 2176 Hi all, This is my first Fortran patch, so apologies if I'm missing something. The current expansion of the min and max intrinsics explicitly expands the comparisons between each argument to calculate the global min/max. Some targets, like aarch64, have instructions that can calculate the min/max of two real (floating-point) numbers with the proper NaN-handling semantics (if both inputs are NaN, return Nan. If one is NaN, return the other) and those are the semantics provided by the __builtin_fmin/max family of functions that expand to these instructions. This patch makes the frontend emit __builtin_fmin/max directly to compare each pair of numbers when the numbers are floating-point, and use MIN_EXPR/MAX_EXPR otherwise (integral types and -ffast-math) which should hopefully be easier to recognise in the midend and optimise. The previous approach of generating the open-coded version of that is used when we don't have an appropriate __builtin_fmin/max available. For example, for a configuration of x86_64-unknown-linux-gnu that I tested there was no 128-bit __built_fminl available. With this patch I'm seeing more than 7000 FMINNM/FMAXNM instructions being generated at -O3 on aarch64 for 521.wrf from fprate SPEC2017 where none before were generated (we were generating explicit comparisons and NaN checks). This gave a 2.4% improvement in performance on a Cortex-A72. Bootstrapped and tested on aarch64-none-linux-gnu and x86_64-unknown-linux-gnu. Ok for trunk? Thanks, Kyrill 2018-07-17 Kyrylo Tkachov * f95-lang.c (gfc_init_builtin_functions): Define __builtin_fmin, __builtin_fminf, __builtin_fminl, __builtin_fmax, __builtin_fmaxf, __builtin_fmaxl. * trans-intrinsic.c: Include builtins.h. (gfc_conv_intrinsic_minmax): Emit __builtin_fmin/max or MIN/MAX_EXPR functions to calculate the min/max. 2018-07-17 Kyrylo Tkachov * gfortran.dg/max_fmaxf.f90: New test. * gfortran.dg/min_fminf.f90: Likewise. * gfortran.dg/minmax_integer.f90: Likewise. * gfortran.dg/max_fmaxl_aarch64.f90: Likewise. * gfortran.dg/min_fminl_aarch64.f90: Likewise. --------------060408070703010602030600 Content-Type: text/x-patch; name="fort-fmin.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="fort-fmin.patch" Content-length: 9835 diff --git a/gcc/fortran/f95-lang.c b/gcc/fortran/f95-lang.c index 0f39f0ca788ea9e5868d4718c5f90c102958968f..5dd58f3d3d0242d77a5838ffa0395e7b941c8f85 100644 --- a/gcc/fortran/f95-lang.c +++ b/gcc/fortran/f95-lang.c @@ -724,6 +724,20 @@ gfc_init_builtin_functions (void) gfc_define_builtin ("__builtin_roundf", mfunc_float[0], BUILT_IN_ROUNDF, "roundf", ATTR_CONST_NOTHROW_LEAF_LIST); + gfc_define_builtin ("__builtin_fmin", mfunc_double[1], + BUILT_IN_FMIN, "fmin", ATTR_CONST_NOTHROW_LEAF_LIST); + gfc_define_builtin ("__builtin_fminf", mfunc_float[1], + BUILT_IN_FMINF, "fminf", ATTR_CONST_NOTHROW_LEAF_LIST); + gfc_define_builtin ("__builtin_fminl", mfunc_longdouble[1], + BUILT_IN_FMINL, "fminl", ATTR_CONST_NOTHROW_LEAF_LIST); + + gfc_define_builtin ("__builtin_fmax", mfunc_double[1], + BUILT_IN_FMAX, "fmax", ATTR_CONST_NOTHROW_LEAF_LIST); + gfc_define_builtin ("__builtin_fmaxf", mfunc_float[1], + BUILT_IN_FMAXF, "fmaxf", ATTR_CONST_NOTHROW_LEAF_LIST); + gfc_define_builtin ("__builtin_fmaxl", mfunc_longdouble[1], + BUILT_IN_FMAXL, "fmaxl", ATTR_CONST_NOTHROW_LEAF_LIST); + gfc_define_builtin ("__builtin_truncl", mfunc_longdouble[0], BUILT_IN_TRUNCL, "truncl", ATTR_CONST_NOTHROW_LEAF_LIST); gfc_define_builtin ("__builtin_trunc", mfunc_double[0], diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c index d306e3a5a6209c1621d91f99ffc366acecd9c3d0..5dde54a3f3c2606f987b42480b1921e6304ccda5 100644 --- a/gcc/fortran/trans-intrinsic.c +++ b/gcc/fortran/trans-intrinsic.c @@ -31,6 +31,7 @@ along with GCC; see the file COPYING3. If not see #include "trans.h" #include "stringpool.h" #include "fold-const.h" +#include "builtins.h" #include "tree-nested.h" #include "stor-layout.h" #include "toplev.h" /* For rest_of_decl_compilation. */ @@ -3874,14 +3875,13 @@ gfc_conv_intrinsic_ttynam (gfc_se * se, gfc_expr * expr) minmax (a1, a2, a3, ...) { mvar = a1; - if (a2 .op. mvar || isnan (mvar)) - mvar = a2; - if (a3 .op. mvar || isnan (mvar)) - mvar = a3; + __builtin_fmin/max (mvar, a2); + __builtin_fmin/max (mvar, a3); ... - return mvar + return mvar; } - */ + For integral types or when we don't care about NaNs use + MIN/MAX_EXPRs. */ /* TODO: Mismatching types can occur when specific names are used. These should be handled during resolution. */ @@ -3891,7 +3891,6 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op) tree tmp; tree mvar; tree val; - tree thencase; tree *args; tree type; gfc_actual_arglist *argexpr; @@ -3912,55 +3911,79 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op) mvar = gfc_create_var (type, "M"); gfc_add_modify (&se->pre, mvar, args[0]); - for (i = 1, argexpr = argexpr->next; i < nargs; i++) - { - tree cond, isnan; + tree builtin = NULL_TREE; + if (SCALAR_FLOAT_TYPE_P (type)) + builtin = mathfn_built_in (type, op == GT_EXPR + ? BUILT_IN_FMAX : BUILT_IN_FMIN); + + for (i = 1, argexpr = argexpr->next; i < nargs; i++, argexpr = argexpr->next) + { + tree cond = NULL_TREE; val = args[i]; /* Handle absent optional arguments by ignoring the comparison. */ if (argexpr->expr->expr_type == EXPR_VARIABLE && argexpr->expr->symtree->n.sym->attr.optional && TREE_CODE (val) == INDIRECT_REF) - cond = fold_build2_loc (input_location, + { + cond = fold_build2_loc (input_location, NE_EXPR, logical_type_node, TREE_OPERAND (val, 0), build_int_cst (TREE_TYPE (TREE_OPERAND (val, 0)), 0)); - else - { - cond = NULL_TREE; - + } + else if (!VAR_P (val) && !TREE_CONSTANT (val)) /* Only evaluate the argument once. */ - if (!VAR_P (val) && !TREE_CONSTANT (val)) - val = gfc_evaluate_now (val, &se->pre); - } + val = gfc_evaluate_now (val, &se->pre); - thencase = build2_v (MODIFY_EXPR, mvar, convert (type, val)); + tree calc; + /* If we dealing with integral types or we don't care about NaNs + just do a MIN/MAX_EXPR. */ + if (!HONOR_NANS (type)) + { + + tree_code code = op == GT_EXPR ? MAX_EXPR : MIN_EXPR; + calc = fold_build2_loc (input_location, code, type, + convert (type, val), mvar); + tmp = build2_v (MODIFY_EXPR, mvar, calc); - tmp = fold_build2_loc (input_location, op, logical_type_node, - convert (type, val), mvar); + } + /* If we care about NaNs and we have __builtin_fmin/max builtins + to perform the comparison, use those. */ + else if (SCALAR_FLOAT_TYPE_P (type) && builtin) + { + calc = build_call_expr_loc (input_location, builtin, + 2, mvar, convert (type, val)); + tmp = build2_v (MODIFY_EXPR, mvar, calc); - /* FIXME: When the IEEE_ARITHMETIC module is implemented, the call to - __builtin_isnan might be made dependent on that module being loaded, - to help performance of programs that don't rely on IEEE semantics. */ - if (FLOAT_TYPE_P (TREE_TYPE (mvar))) + } + /* Otherwise expand to: + mvar = a1; + if (a2 .op. mvar || isnan (mvar)) + mvar = a2; + if (a3 .op. mvar || isnan (mvar)) + mvar = a3; + ... */ + else { - isnan = build_call_expr_loc (input_location, - builtin_decl_explicit (BUILT_IN_ISNAN), - 1, mvar); + tree isnan = build_call_expr_loc (input_location, + builtin_decl_explicit (BUILT_IN_ISNAN), + 1, mvar); + tmp = fold_build2_loc (input_location, op, logical_type_node, + convert (type, val), mvar); + tmp = fold_build2_loc (input_location, TRUTH_OR_EXPR, - logical_type_node, tmp, - fold_convert (logical_type_node, isnan)); + logical_type_node, tmp, + fold_convert (logical_type_node, isnan)); + tmp = build3_v (COND_EXPR, tmp, + build2_v (MODIFY_EXPR, mvar, convert (type, val)), + build_empty_stmt (input_location)); } - tmp = build3_v (COND_EXPR, tmp, thencase, - build_empty_stmt (input_location)); if (cond != NULL_TREE) tmp = build3_v (COND_EXPR, cond, tmp, build_empty_stmt (input_location)); - gfc_add_expr_to_block (&se->pre, tmp); - argexpr = argexpr->next; } se->expr = mvar; } diff --git a/gcc/testsuite/gfortran.dg/max_fmaxf.f90 b/gcc/testsuite/gfortran.dg/max_fmaxf.f90 new file mode 100644 index 0000000000000000000000000000000000000000..9082d4a7e70378efb36e24e3e575f11a74631657 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/max_fmaxf.f90 @@ -0,0 +1,15 @@ +! { dg-do compile } +! { dg-options "-O2 -fdump-tree-optimized" } + +subroutine foo (a, b, c, d, e, f, g, h) + real (kind=8) :: a, b, c, d, e, f, g, h + a = max (a, b, c, d, e, f, g, h) +end subroutine + +subroutine foof (a, b, c, d, e, f, g, h) + real (kind=4) :: a, b, c, d, e, f, g, h + a = max (a, b, c, d, e, f, g, h) +end subroutine + +! { dg-final { scan-tree-dump-times "__builtin_fmax " 7 "optimized" } } +! { dg-final { scan-tree-dump-times "__builtin_fmaxf " 7 "optimized" } } diff --git a/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90 b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90 new file mode 100644 index 0000000000000000000000000000000000000000..8c8ea063e5d0718dc829c1f5574c5b46040e6786 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90 @@ -0,0 +1,9 @@ +! { dg-do compile { target aarch64*-*-* } } +! { dg-options "-O2 -fdump-tree-optimized" } + +subroutine fool (a, b, c, d, e, f, g, h) + real (kind=16) :: a, b, c, d, e, f, g, h + a = max (a, b, c, d, e, f, g, h) +end subroutine + +! { dg-final { scan-tree-dump-times "__builtin_fmaxl " 7 "optimized" } } diff --git a/gcc/testsuite/gfortran.dg/min_fminf.f90 b/gcc/testsuite/gfortran.dg/min_fminf.f90 new file mode 100644 index 0000000000000000000000000000000000000000..6b929611f1f166877a19c475704a9312edf7b170 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/min_fminf.f90 @@ -0,0 +1,15 @@ +! { dg-do compile } +! { dg-options "-O2 -fdump-tree-optimized" } + +subroutine foo (a, b, c, d, e, f, g, h) + real (kind=8) :: a, b, c, d, e, f, g, h + a = min (a, b, c, d, e, f, g, h) +end subroutine + +subroutine foof (a, b, c, d, e, f, g, h) + real (kind=4) :: a, b, c, d, e, f, g, h + a = min (a, b, c, d, e, f, g, h) +end subroutine + +! { dg-final { scan-tree-dump-times "__builtin_fmin " 7 "optimized" } } +! { dg-final { scan-tree-dump-times "__builtin_fminf " 7 "optimized" } } diff --git a/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90 b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90 new file mode 100644 index 0000000000000000000000000000000000000000..92368917fb48e0c468a16d080ab3a9ac842e01a7 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90 @@ -0,0 +1,9 @@ +! { dg-do compile { target aarch64*-*-* } } +! { dg-options "-O2 -fdump-tree-optimized" } + +subroutine fool (a, b, c, d, e, f, g, h) + real (kind=16) :: a, b, c, d, e, f, g, h + a = min (a, b, c, d, e, f, g, h) +end subroutine + +! { dg-final { scan-tree-dump-times "__builtin_fminl " 7 "optimized" } } diff --git a/gcc/testsuite/gfortran.dg/minmax_integer.f90 b/gcc/testsuite/gfortran.dg/minmax_integer.f90 new file mode 100644 index 0000000000000000000000000000000000000000..5b6be38c7055ce4e8620cf75ec7d8a182436b24f --- /dev/null +++ b/gcc/testsuite/gfortran.dg/minmax_integer.f90 @@ -0,0 +1,15 @@ +! { dg-do compile } +! { dg-options "-O2 -fdump-tree-optimized" } + +subroutine foo (a, b, c, d, e, f, g, h) + integer (kind=4) :: a, b, c, d, e, f, g, h + a = min (a, b, c, d, e, f, g, h) +end subroutine + +subroutine foof (a, b, c, d, e, f, g, h) + integer (kind=4) :: a, b, c, d, e, f, g, h + a = max (a, b, c, d, e, f, g, h) +end subroutine + +! { dg-final { scan-tree-dump-times "MIN_EXPR" 7 "optimized" } } +! { dg-final { scan-tree-dump-times "MAX_EXPR" 7 "optimized" } } --------------060408070703010602030600--