[PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
@ 2015-08-06  9:39 David Sherwood
  0 siblings, 0 replies; 24+ messages in thread
From: David Sherwood @ 2015-08-06  9:39 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2944 bytes --]

Hi,

Sorry to bother people again. Is this OK to go now?

Thanks!
David.

> >
> > > On Mon, 29 Jun 2015, David Sherwood wrote:
> > >
> > > > Hi,
> > > >
> > > > I have added new STRICT_MAX_EXPR and STRICT_MIN_EXPR expressions to support the
> > > > IEEE versions of fmin and fmax. This is done by recognising the math library
> > > > "fmax" and "fmin" builtin functions in a similar way to how this is done for
> > > > -ffast-math. This also allows us to vectorise the IEEE max/min functions for
> > > > targets that support it, for example aarch64/aarch32.
> > >
> > > This patch is missing documentation.  You need to document the new insn
> > > patterns in md.texi and the new tree codes in generic.texi.
> >
> > Hi, I've uploaded a new patch with the documentation. Hope this is ok.
> 
> In various places where you refer to one operand being NaN, I think you
> mean one operand being a *quiet* NaN (if one is a signaling NaN - only
> supported by GCC if -fsignaling-nans - the IEEE minNum and maxNum
> operations raise "invalid" and return a quiet NaN).

Hi, I have a new patch that hopefully addresses the documentation issues.

Thanks,
David.

ChangeLog:

2015-07-15  David Sherwood  <david.sherwood@arm.com>

gcc/
    * builtins.c (integer_valued_real_p): Add STRICT_MIN_EXPR and
    STRICT_MAX_EXPR.
    (fold_builtin_fmin_fmax): For strict math, convert builting fmin and 
    fmax to STRICT_MIN_EXPR and STRICT_MIN_EXPR, respectively.
    * expr.c (expand_expr_real_2): Add STRICT_MIN_EXPR and STRICT_MAX_EXPR.
    * fold-const.c (const_binop): Likewise.
    (fold_binary_loc, tree_binary_nonnegative_warnv_p): Likewise.
    (tree_binary_nonzero_warnv_p): Likewise.
    * optabs.h (strict_minmax_support): Declare.
    * optabs.def: Add new optabs strict_max_optab/strict_min_optab.
    * optabs.c (optab_for_tree_code): Return new optabs for STRICT_MIN_EXPR
    and STRICT_MAX_EXPR.
    (strict_minmax_support): New function.
    * real.c (real_arithmetic): Add STRICT_MIN_EXPR and STRICT_MAX_EXPR.
    * tree.def: Likewise.
    * tree.c (associative_tree_code, commutative_tree_code): Likewise.
    * tree-cfg.c (verify_expr): Likewise.
    (verify_gimple_assign_binary): Likewise.
    * tree-inline.c (estimate_operator_cost): Likewise.
    * tree-pretty-print.c (dump_generic_node, op_code_prio): Likewise.
    (op_symbol_code): Likewise.
gcc/config:
    * aarch64/aarch64.md: New pattern.
    * aarch64/aarch64-simd.md: Likewise.
    * aarch64/iterators.md: New unspecs, iterators.
    * arm/iterators.md: New iterators.
    * arm/unspecs.md: New unspecs.
    * arm/neon.md: New pattern.
    * arm/vfp.md: Likewise.
gcc/doc:
    * generic.texi: Add STRICT_MAX_EXPR and STRICT_MIN_EXPR.
    * md.texi: Add strict_min and strict_max patterns.
gcc/testsuite
    * gcc.target/aarch64/maxmin_strict.c: New test.
    * gcc.target/arm/maxmin_strict.c: New test.

[-- Attachment #2: strict_max.patch --]
[-- Type: application/octet-stream, Size: 21492 bytes --]

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 1b5e659..ef1a15f 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -7438,6 +7438,8 @@ integer_valued_real_p (tree t)
     case MULT_EXPR:
     case MIN_EXPR:
     case MAX_EXPR:
+    case STRICT_MIN_EXPR:
+    case STRICT_MAX_EXPR:
       return integer_valued_real_p (TREE_OPERAND (t, 0))
 	     && integer_valued_real_p (TREE_OPERAND (t, 1));
 
@@ -9221,6 +9223,10 @@ fold_builtin_fmin_fmax (location_t loc, tree arg0, tree arg1,
 	return fold_build2_loc (loc, (max ? MAX_EXPR : MIN_EXPR), type,
 			    fold_convert_loc (loc, type, arg0),
 			    fold_convert_loc (loc, type, arg1));
+      else if (strict_minmax_support (type, max))
+	return fold_build2_loc (loc, (max ? STRICT_MAX_EXPR : STRICT_MIN_EXPR),
+			    type, fold_convert_loc (loc, type, arg0),
+			    fold_convert_loc (loc, type, arg1));
     }
   return NULL_TREE;
 }
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index b90f938..72f5877 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1821,6 +1821,15 @@
   [(set_attr "type" "neon_fp_minmax_<Vetype><q>")]
 )
 
+(define_insn "<maxmin_strict><mode>3"
+  [(set (match_operand:VDQF 0 "register_operand" "=w")
+	(unspec:VDQF [(match_operand:VDQF 1 "register_operand" "0")
+		      (match_operand:VDQF 2 "register_operand" "w")]
+		      FMAXMIN_STRICT))]
+  "TARGET_SIMD"
+  "<maxmin_strict_op>\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
+)
+
 (define_insn "<maxmin_uns><mode>3"
   [(set (match_operand:VDQF 0 "register_operand" "=w")
        (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index d3f5d5b..ee9bf99 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4234,6 +4234,15 @@
   [(set_attr "type" "f_minmax<s>")]
 )
 
+(define_insn "<maxmin_strict><mode>3"
+  [(set (match_operand:GPF 0 "register_operand" "=w")
+	(unspec:GPF [(match_operand:GPF 1 "register_operand" "0")
+		     (match_operand:GPF 2 "register_operand" "w")]
+		     FMAXMIN_STRICT))]
+  "TARGET_FLOAT"
+  "<maxmin_strict_op>\\t%<s>0, %<s>1, %<s>2"
+)
+
 ;; -------------------------------------------------------------------
 ;; Reload support
 ;; -------------------------------------------------------------------
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 498358a..0a7c760 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -279,6 +279,8 @@
     UNSPEC_PMULL2       ; Used in aarch64-simd.md.
     UNSPEC_REV_REGLIST  ; Used in aarch64-simd.md.
     UNSPEC_VEC_SHR      ; Used in aarch64-simd.md.
+    UNSPEC_FMAX_STRICT  ; Used in aarch64-simd.md.
+    UNSPEC_FMIN_STRICT  ; Used in aarch64-simd.md.
 ])
 
 ;; -------------------------------------------------------------------
@@ -868,6 +870,8 @@
 
 (define_int_iterator FMAXMIN_UNS [UNSPEC_FMAX UNSPEC_FMIN])
 
+(define_int_iterator FMAXMIN_STRICT [UNSPEC_FMAX_STRICT UNSPEC_FMIN_STRICT])
+
 (define_int_iterator VQDMULH [UNSPEC_SQDMULH UNSPEC_SQRDMULH])
 
 (define_int_iterator USSUQADD [UNSPEC_SUQADD UNSPEC_USQADD])
@@ -948,6 +952,12 @@
 				 (UNSPEC_FMINNMV "fminnm")
 				 (UNSPEC_FMINV "fmin")])
 
+(define_int_attr  maxmin_strict [(UNSPEC_FMAX_STRICT "strict_max")
+				 (UNSPEC_FMIN_STRICT "strict_min")])
+
+(define_int_attr  maxmin_strict_op [(UNSPEC_FMAX_STRICT "fmaxnm")
+				    (UNSPEC_FMIN_STRICT "fminnm")])
+
 (define_int_attr sur [(UNSPEC_SHADD "s") (UNSPEC_UHADD "u")
 		      (UNSPEC_SRHADD "sr") (UNSPEC_URHADD "ur")
 		      (UNSPEC_SHSUB "s") (UNSPEC_UHSUB "u")
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 1e7f3f1..3b24e4d 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -292,6 +292,8 @@
 
 (define_int_iterator VMAXMINF [UNSPEC_VMAX UNSPEC_VMIN])
 
+(define_int_iterator VMAXMINF_STRICT [UNSPEC_VMAX_STRICT UNSPEC_VMIN_STRICT])
+
 (define_int_iterator VPADDL [UNSPEC_VPADDL_S UNSPEC_VPADDL_U])
 
 (define_int_iterator VPADAL [UNSPEC_VPADAL_S UNSPEC_VPADAL_U])
@@ -716,6 +718,13 @@
   (UNSPEC_VPMIN "min") (UNSPEC_VPMIN_U "min")
 ])
 
+(define_int_attr  maxmin_strict [
+  (UNSPEC_VMAX_STRICT "strict_max") (UNSPEC_VMIN_STRICT "strict_min")])
+
+(define_int_attr maxmin_strict_op [
+  (UNSPEC_VMAX_STRICT "vmaxnm") (UNSPEC_VMIN_STRICT "vminnm")
+])
+
 (define_int_attr shift_op [
   (UNSPEC_VSHL_S "shl") (UNSPEC_VSHL_U "shl")
   (UNSPEC_VRSHL_S "rshl") (UNSPEC_VRSHL_U "rshl")
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 654d9d5..e71e31f 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -2354,6 +2354,15 @@
   [(set_attr "type" "neon_fp_minmax_s<q>")]
 )
 
+(define_insn "<maxmin_strict><mode>3"
+  [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
+	(unspec:VCVTF [(match_operand:VCVTF 1 "s_register_operand" "w")
+		       (match_operand:VCVTF 2 "s_register_operand" "w")]
+		       VMAXMINF_STRICT))]
+  "TARGET_NEON && TARGET_FPU_ARMV8"
+  "<maxmin_strict_op>.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+)
+
 (define_expand "neon_vpadd<mode>"
   [(match_operand:VD 0 "s_register_operand" "=w")
    (match_operand:VD 1 "s_register_operand" "w")
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 0ec2c48..83094d5 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -224,8 +224,10 @@
   UNSPEC_VLD4_LANE
   UNSPEC_VMAX
   UNSPEC_VMAX_U
+  UNSPEC_VMAX_STRICT
   UNSPEC_VMIN
   UNSPEC_VMIN_U
+  UNSPEC_VMIN_STRICT
   UNSPEC_VMLA
   UNSPEC_VMLA_LANE
   UNSPEC_VMLAL_S
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index f62ff79..351af4f 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -1345,6 +1345,15 @@
    (set_attr "conds" "unconditional")]
 )
 
+(define_insn "<maxmin_strict><mode>3"
+  [(set (match_operand:SDF 0 "s_register_operand" "=<F_constraint>")
+	(unspec:SDF [(match_operand:SDF 1 "s_register_operand" "<F_constraint>")
+		     (match_operand:SDF 2 "s_register_operand" "<F_constraint>")]
+		     VMAXMINF_STRICT))]
+  "TARGET_HARD_FLOAT && TARGET_VFP5 <vfp_double_cond>"
+  "<maxmin_strict_op>.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+)
+
 ;; Write Floating-point Status and Control Register.
 (define_insn "set_fpscr"
   [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")] VUNSPEC_SET_FPSCR)]
diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
index bbafad9..8dad9a7 100644
--- a/gcc/doc/generic.texi
+++ b/gcc/doc/generic.texi
@@ -1268,6 +1268,8 @@ the byte offset of the field, but should not be used directly; call
 @tindex TARGET_EXPR
 @tindex VA_ARG_EXPR
 @tindex ANNOTATE_EXPR
+@tindex STRICT_MAX_EXPR
+@tindex STRICT_MIN_EXPR
 
 @table @code
 @item NEGATE_EXPR
@@ -1687,8 +1689,16 @@ its sole argument yields the representation for @code{ap}.
 This node is used to attach markers to an expression. The first operand
 is the annotated expression, the second is an @code{INTEGER_CST} with
 a value from @code{enum annot_expr_kind}.
-@end table
 
+@item STRICT_MAX_EXPR
+@item STRICT_MIN_EXPR
+These nodes represent IEEE-conformant maximum and minimum operations.  If either
+operand is a quiet @code{NaN} the other operand is returned.  If both operands
+are quiet @code{NaN}, then a quiet @code{NaN} is returned.  In the case when gcc
+supports signalling @code{NaN} (-fsignaling-nans) an invalid floating point
+exception is raised and a quiet @code{NaN} is returned.
+
+@end table
 
 @node Vectors
 @subsection Vectors
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index e991286..f1c3417 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4869,6 +4869,15 @@ Signed minimum and maximum operations.  When used with floating point,
 if both operands are zeros, or if either operand is @code{NaN}, then
 it is unspecified which of the two operands is returned as the result.
 
+@cindex @code{strict_min@var{m}3} instruction pattern
+@cindex @code{strict_max@var{m}3} instruction pattern
+@item @samp{strict_min@var{m}3}, @samp{strict_max@var{m}3}
+IEEE-conformant minimum and maximum operations.  If one operand is a quiet
+@code{NaN}, then the other operand is returned.  If both operands are quiet
+@code{NaN}, then a quiet @code{NaN} is returned.  In the case when gcc supports
+signalling @code{NaN} (-fsignaling-nans) an invalid floating point exception is
+raised and a quiet @code{NaN} is returned.
+
 @cindex @code{reduc_smin_@var{m}} instruction pattern
 @cindex @code{reduc_smax_@var{m}} instruction pattern
 @item @samp{reduc_smin_@var{m}}, @samp{reduc_smax_@var{m}}
diff --git a/gcc/expr.c b/gcc/expr.c
index 78904c2..e2adb01 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -8729,6 +8729,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
       return expand_abs (mode, op0, target, unsignedp,
 			 safe_from_p (target, treeop0, 1));
 
+    case STRICT_MAX_EXPR:
+    case STRICT_MIN_EXPR:
     case MAX_EXPR:
     case MIN_EXPR:
       target = original_target;
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 60aa210..143f457 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -1164,6 +1164,8 @@ const_binop (enum tree_code code, tree arg1, tree arg2)
 	case RDIV_EXPR:
 	case MIN_EXPR:
 	case MAX_EXPR:
+	case STRICT_MIN_EXPR:
+	case STRICT_MAX_EXPR:
 	  break;
 
 	default:
@@ -9872,7 +9874,8 @@ fold_binary_loc (location_t loc,
      cases, the appropriate type conversions should be put back in
      the tree that will get out of the constant folder.  */
 
-  if (kind == tcc_comparison || code == MIN_EXPR || code == MAX_EXPR)
+  if (kind == tcc_comparison || code == MIN_EXPR || code == MAX_EXPR
+      || code == STRICT_MIN_EXPR || code == STRICT_MAX_EXPR)
     {
       STRIP_SIGN_NOPS (arg0);
       STRIP_SIGN_NOPS (arg1);
@@ -14773,6 +14776,7 @@ tree_binary_nonnegative_warnv_p (enum tree_code code, tree type, tree op0,
 
     case BIT_AND_EXPR:
     case MAX_EXPR:
+    case STRICT_MAX_EXPR:
       return (tree_expr_nonnegative_warnv_p (op0,
 					     strict_overflow_p)
 	      || tree_expr_nonnegative_warnv_p (op1,
@@ -14781,6 +14785,7 @@ tree_binary_nonnegative_warnv_p (enum tree_code code, tree type, tree op0,
     case BIT_IOR_EXPR:
     case BIT_XOR_EXPR:
     case MIN_EXPR:
+    case STRICT_MIN_EXPR:
     case RDIV_EXPR:
     case TRUNC_DIV_EXPR:
     case CEIL_DIV_EXPR:
@@ -15235,6 +15240,7 @@ tree_binary_nonzero_warnv_p (enum tree_code code,
       break;
 
     case MIN_EXPR:
+    case STRICT_MIN_EXPR:
       sub_strict_overflow_p = false;
       if (tree_expr_nonzero_warnv_p (op0,
 				     &sub_strict_overflow_p)
@@ -15247,6 +15253,7 @@ tree_binary_nonzero_warnv_p (enum tree_code code,
       break;
 
     case MAX_EXPR:
+    case STRICT_MAX_EXPR:
       sub_strict_overflow_p = false;
       if (tree_expr_nonzero_warnv_p (op0,
 				     &sub_strict_overflow_p))
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 491341b..ca642de 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -482,6 +482,12 @@ optab_for_tree_code (enum tree_code code, const_tree type,
     case MIN_EXPR:
       return TYPE_UNSIGNED (type) ? umin_optab : smin_optab;
 
+    case STRICT_MAX_EXPR:
+      return strict_max_optab;
+
+    case STRICT_MIN_EXPR:
+      return strict_min_optab;
+
     case REALIGN_LOAD_EXPR:
       return vec_realign_load_optab;
 
@@ -6798,6 +6804,16 @@ expand_vec_perm (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target)
   return tmp;
 }
 
+/* Return true if the target supports strict math max (MAX = TRUE) and min
+   (MAX = FALSE) operations on type TYPE.  */
+bool
+strict_minmax_support (tree type, bool max)
+{
+  optab optab = optab_for_tree_code
+    (max ? STRICT_MAX_EXPR : STRICT_MIN_EXPR, type, optab_default);
+  return optab_handler (optab, TYPE_MODE (type)) != CODE_FOR_nothing;
+}
+
 /* Return insn code for a conditional operator with a comparison in
    mode CMODE, unsigned if UNS is true, resulting in a value of mode VMODE.  */
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 888b21c..7a79e76 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -244,6 +244,10 @@ OPTAB_D (sin_optab, "sin$a2")
 OPTAB_D (sincos_optab, "sincos$a3")
 OPTAB_D (tan_optab, "tan$a2")
 
+/* C99 implementations of fmax/fmin.  */
+OPTAB_D (strict_max_optab, "strict_max$a3")
+OPTAB_D (strict_min_optab, "strict_min$a3")
+
 /* Vector reduction to a scalar.  */
 OPTAB_D (reduc_smax_scal_optab, "reduc_smax_scal_$a")
 OPTAB_D (reduc_smin_scal_optab, "reduc_smin_scal_$a")
diff --git a/gcc/optabs.h b/gcc/optabs.h
index 95f5cbc..14b7a39 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -565,4 +565,6 @@ extern bool lshift_cheap_p (bool);
 
 extern enum rtx_code get_rtx_code (enum tree_code tcode, bool unsignedp);
 
+extern bool strict_minmax_support (tree, bool);
+
 #endif /* GCC_OPTABS_H */
diff --git a/gcc/real.c b/gcc/real.c
index 2d34b62..aa2f63c 100644
--- a/gcc/real.c
+++ b/gcc/real.c
@@ -1034,6 +1034,15 @@ real_arithmetic (REAL_VALUE_TYPE *r, int icode, const REAL_VALUE_TYPE *op0,
 	*r = *op1;
       break;
 
+    case STRICT_MIN_EXPR:
+      if (op0->cl == rvc_nan)
+	*r = *op1;
+      else if (do_compare (op0, op1, -1) < 0)
+	*r = *op0;
+      else
+	*r = *op1;
+      break;
+
     case MAX_EXPR:
       if (op1->cl == rvc_nan)
 	*r = *op1;
@@ -1043,6 +1052,15 @@ real_arithmetic (REAL_VALUE_TYPE *r, int icode, const REAL_VALUE_TYPE *op0,
 	*r = *op0;
       break;
 
+    case STRICT_MAX_EXPR:
+      if (op0->cl == rvc_nan)
+	*r = *op1;
+      else if (do_compare (op0, op1, 1) < 0)
+	*r = *op1;
+      else
+	*r = *op0;
+      break;
+
     case NEGATE_EXPR:
       *r = *op0;
       r->sign ^= 1;
diff --git a/gcc/testsuite/gcc.target/aarch64/maxmin_strict.c b/gcc/testsuite/gcc.target/aarch64/maxmin_strict.c
new file mode 100644
index 0000000..09cea1d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/maxmin_strict.c
@@ -0,0 +1,69 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-vectorize -fno-inline -save-temps" } */
+
+
+extern void abort (void);
+double fmax(double, double);
+float fmaxf(float, float);
+double fmin(double, double);
+float fminf(float, float);
+
+#define isnan __builtin_isnan
+#define isinf __builtin_isinf
+
+#define NAN __builtin_nan ("")
+#define INFINITY __builtin_inf ()
+
+#define NUM_ELEMS(TYPE) (16 / sizeof (TYPE))
+
+#define DEF_MAXMIN(TYPE,FUN)\
+void test_##FUN (TYPE *__restrict__ r, TYPE *__restrict__ a,\
+		 TYPE *__restrict__ b)\
+{\
+  int i;\
+  for (i = 0; i < NUM_ELEMS (TYPE); i++)\
+    r[i] = FUN (a[i], b[i]);\
+}\
+
+DEF_MAXMIN (float, fmaxf)
+DEF_MAXMIN (double, fmax)
+
+DEF_MAXMIN (float, fminf)
+DEF_MAXMIN (double, fmin)
+
+int main ()
+{
+  float a_f[4] = { 4, NAN, -3, INFINITY };
+  float b_f[4] = { 1,   7,NAN, 0 };
+  float r_f[4];
+  double a_d[4] = { 4, NAN,  -3,  INFINITY };
+  double b_d[4] = { 1,   7, NAN,  0 };
+  double r_d[4];
+
+  test_fmaxf (r_f, a_f, b_f);
+  if (r_f[0] != 4 || isnan (r_f[1]) || isnan (r_f[2]) || !isinf (r_f[3]))
+    abort ();
+
+  test_fminf (r_f, a_f, b_f);
+  if (r_f[0] != 1 || isnan (r_f[1]) || isnan (r_f[2]) || isinf (r_f[3]))
+    abort ();
+
+  test_fmax (r_d, a_d, b_d);
+  test_fmax (&r_d[2], &a_d[2], &b_d[2]);
+  if (r_d[0] != 4 || isnan (r_d[1]) || isnan (r_d[2]) || !isinf (r_d[3]))
+    abort ();
+
+  test_fmin (r_d, a_d, b_d);
+  test_fmin (&r_d[2], &a_d[2], &b_d[2]);
+  if (r_d[0] != 1 || isnan (r_d[1]) || isnan (r_d[2]) || isinf (r_d[3]))
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "fmaxnm\tv\[0-9\]+\.4s, v\[0-9\]+\.4s, v\[0-9\]+\.4s" 1 } } */
+/* { dg-final { scan-assembler-times "fmaxnm\tv\[0-9\]+\.2d, v\[0-9\]+\.2d, v\[0-9\]+\.2d" 1 } } */
+
+/* { dg-final { scan-assembler-times "fminnm\tv\[0-9\]+\.4s, v\[0-9\]+\.4s, v\[0-9\]+\.4s" 1 } } */
+/* { dg-final { scan-assembler-times "fminnm\tv\[0-9\]+\.2d, v\[0-9\]+\.2d, v\[0-9\]+\.2d" 1 } } */
+
diff --git a/gcc/testsuite/gcc.target/arm/maxmin_strict.c b/gcc/testsuite/gcc.target/arm/maxmin_strict.c
new file mode 100644
index 0000000..aa1dd6c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/maxmin_strict.c
@@ -0,0 +1,67 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-options "-O2 -ftree-vectorize -fno-inline -march=armv8-a -save-temps" } */
+/* { dg-add-options arm_v8_neon } */
+
+extern void abort (void);
+double fmax(double, double);
+float fmaxf(float, float);
+double fmin(double, double);
+float fminf(float, float);
+
+#define isnan __builtin_isnan
+#define isinf __builtin_isinf
+
+#define NAN __builtin_nan ("")
+#define INFINITY __builtin_inf ()
+
+#define DEF_MAXMIN(TYPE,FUN)\
+void test_##FUN (TYPE *__restrict__ r, TYPE *__restrict__ a,\
+		 TYPE *__restrict__ b)\
+{\
+  int i;\
+  for (i = 0; i < 4; i++)\
+    r[i] = FUN (a[i], b[i]);\
+}\
+
+DEF_MAXMIN (float, fmaxf)
+DEF_MAXMIN (double, fmax)
+
+DEF_MAXMIN (float, fminf)
+DEF_MAXMIN (double, fmin)
+
+int main ()
+{
+  float a_f[4] = { 4, NAN, -3, INFINITY };
+  float b_f[4] = { 1,   7,NAN, 0 };
+  float r_f[4];
+  double a_d[4] = { 4, NAN,  -3,  INFINITY };
+  double b_d[4] = { 1,   7, NAN,  0 };
+  double r_d[4];
+
+  test_fmaxf (r_f, a_f, b_f);
+  if (r_f[0] != 4 || isnan (r_f[1]) || isnan (r_f[2]) || !isinf (r_f[3]))
+    abort ();
+
+  test_fminf (r_f, a_f, b_f);
+  if (r_f[0] != 1 || isnan (r_f[1]) || isnan (r_f[2]) || isinf (r_f[3]))
+    abort ();
+
+  test_fmax (r_d, a_d, b_d);
+  if (r_d[0] != 4 || isnan (r_d[1]) || isnan (r_d[2]) || !isinf (r_d[3]))
+    abort ();
+
+  test_fmin (r_d, a_d, b_d);
+  if (r_d[0] != 1 || isnan (r_d[1]) || isnan (r_d[2]) || isinf (r_d[3]))
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "vmaxnm.f32\tq\[0-9\]+, q\[0-9\]+, q\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vminnm.f32\tq\[0-9\]+, q\[0-9\]+, q\[0-9\]+" 1 } } */
+
+/* NOTE: There are no double precision vector versions of vmaxnm/vminnm.  */
+/* { dg-final { scan-assembler-times "vmaxnm.f64\td\[0-9\]+, d\[0-9\]+, d\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vminnm.f64\td\[0-9\]+, d\[0-9\]+, d\[0-9\]+" 1 } } */
+
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index adc56ba..f717e37 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3091,6 +3091,8 @@ verify_expr (tree *tp, int *walk_subtrees, void *data ATTRIBUTE_UNUSED)
     case EXACT_DIV_EXPR:
     case MIN_EXPR:
     case MAX_EXPR:
+    case STRICT_MIN_EXPR:
+    case STRICT_MAX_EXPR:
     case LSHIFT_EXPR:
     case RSHIFT_EXPR:
     case LROTATE_EXPR:
@@ -3916,6 +3918,8 @@ verify_gimple_assign_binary (gassign *stmt)
     case EXACT_DIV_EXPR:
     case MIN_EXPR:
     case MAX_EXPR:
+    case STRICT_MIN_EXPR:
+    case STRICT_MAX_EXPR:
     case BIT_IOR_EXPR:
     case BIT_XOR_EXPR:
     case BIT_AND_EXPR:
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index ce9495d..1b95154 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -3888,6 +3888,8 @@ estimate_operator_cost (enum tree_code code, eni_weights *weights,
     case FLOAT_EXPR:
     case MIN_EXPR:
     case MAX_EXPR:
+    case STRICT_MIN_EXPR:
+    case STRICT_MAX_EXPR:
     case ABS_EXPR:
 
     case LSHIFT_EXPR:
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 13587e6..6d13fd2 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -2849,6 +2849,8 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, int flags,
       pp_string (pp, " > ");
       break;
 
+    case STRICT_MIN_EXPR:
+    case STRICT_MAX_EXPR:
     case VEC_WIDEN_MULT_HI_EXPR:
     case VEC_WIDEN_MULT_LO_EXPR:
     case VEC_WIDEN_MULT_EVEN_EXPR:
@@ -3223,6 +3225,8 @@ op_code_prio (enum tree_code code)
       /* Special expressions.  */
     case MIN_EXPR:
     case MAX_EXPR:
+    case STRICT_MIN_EXPR:
+    case STRICT_MAX_EXPR:
     case ABS_EXPR:
     case REALPART_EXPR:
     case IMAGPART_EXPR:
@@ -3419,6 +3423,12 @@ op_symbol_code (enum tree_code code)
     case MIN_EXPR:
       return "min";
 
+    case STRICT_MAX_EXPR:
+      return "strictmax";
+
+    case STRICT_MIN_EXPR:
+      return "strictmin";
+
     default:
       return "<<< ??? >>>";
     }
diff --git a/gcc/tree.c b/gcc/tree.c
index f6ab441..2d6b909 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -7529,6 +7529,8 @@ associative_tree_code (enum tree_code code)
     case MULT_EXPR:
     case MIN_EXPR:
     case MAX_EXPR:
+    case STRICT_MIN_EXPR:
+    case STRICT_MAX_EXPR:
       return true;
 
     default:
@@ -7549,6 +7551,8 @@ commutative_tree_code (enum tree_code code)
     case MULT_HIGHPART_EXPR:
     case MIN_EXPR:
     case MAX_EXPR:
+    case STRICT_MIN_EXPR:
+    case STRICT_MAX_EXPR:
     case BIT_IOR_EXPR:
     case BIT_XOR_EXPR:
     case BIT_AND_EXPR:
diff --git a/gcc/tree.def b/gcc/tree.def
index 56580af..daa4c77 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -722,6 +722,14 @@ DEFTREECODE (NEGATE_EXPR, "negate_expr", tcc_unary, 1)
 DEFTREECODE (MIN_EXPR, "min_expr", tcc_binary, 2)
 DEFTREECODE (MAX_EXPR, "max_expr", tcc_binary, 2)
 
+/* Minimum and maximum values, but when used with floating point it conforms to
+   the C99 definition of fmax and fmin, i.e.
+     1. if one operand is NaN the other numeric value is returned,
+     2. if both operands are NaN then a NaN is returned,
+     3. there is no distinction between -0 and 0.  */
+DEFTREECODE (STRICT_MIN_EXPR, "strict_min_expr", tcc_binary, 2)
+DEFTREECODE (STRICT_MAX_EXPR, "strict_max_expr", tcc_binary, 2)
+
 /* Represents the absolute value of the operand.
 
    An ABS_EXPR must have either an INTEGER_TYPE or a REAL_TYPE.  The

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
@ 2015-08-13 10:13 David Sherwood
  2015-08-13 11:12 ` Richard Biener
  0 siblings, 1 reply; 24+ messages in thread
From: David Sherwood @ 2015-08-13 10:13 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2944 bytes --]

Hi,

Sorry to bother people again. Is this OK to go now?

Thanks!
David.

> >
> > > On Mon, 29 Jun 2015, David Sherwood wrote:
> > >
> > > > Hi,
> > > >
> > > > I have added new STRICT_MAX_EXPR and STRICT_MIN_EXPR expressions to support the
> > > > IEEE versions of fmin and fmax. This is done by recognising the math library
> > > > "fmax" and "fmin" builtin functions in a similar way to how this is done for
> > > > -ffast-math. This also allows us to vectorise the IEEE max/min functions for
> > > > targets that support it, for example aarch64/aarch32.
> > >
> > > This patch is missing documentation.  You need to document the new insn
> > > patterns in md.texi and the new tree codes in generic.texi.
> >
> > Hi, I've uploaded a new patch with the documentation. Hope this is ok.
> 
> In various places where you refer to one operand being NaN, I think you
> mean one operand being a *quiet* NaN (if one is a signaling NaN - only
> supported by GCC if -fsignaling-nans - the IEEE minNum and maxNum
> operations raise "invalid" and return a quiet NaN).

Hi, I have a new patch that hopefully addresses the documentation issues.

Thanks,
David.

ChangeLog:

2015-07-15  David Sherwood  <david.sherwood@arm.com>

gcc/
    * builtins.c (integer_valued_real_p): Add STRICT_MIN_EXPR and
    STRICT_MAX_EXPR.
    (fold_builtin_fmin_fmax): For strict math, convert builting fmin and 
    fmax to STRICT_MIN_EXPR and STRICT_MIN_EXPR, respectively.
    * expr.c (expand_expr_real_2): Add STRICT_MIN_EXPR and STRICT_MAX_EXPR.
    * fold-const.c (const_binop): Likewise.
    (fold_binary_loc, tree_binary_nonnegative_warnv_p): Likewise.
    (tree_binary_nonzero_warnv_p): Likewise.
    * optabs.h (strict_minmax_support): Declare.
    * optabs.def: Add new optabs strict_max_optab/strict_min_optab.
    * optabs.c (optab_for_tree_code): Return new optabs for STRICT_MIN_EXPR
    and STRICT_MAX_EXPR.
    (strict_minmax_support): New function.
    * real.c (real_arithmetic): Add STRICT_MIN_EXPR and STRICT_MAX_EXPR.
    * tree.def: Likewise.
    * tree.c (associative_tree_code, commutative_tree_code): Likewise.
    * tree-cfg.c (verify_expr): Likewise.
    (verify_gimple_assign_binary): Likewise.
    * tree-inline.c (estimate_operator_cost): Likewise.
    * tree-pretty-print.c (dump_generic_node, op_code_prio): Likewise.
    (op_symbol_code): Likewise.
gcc/config:
    * aarch64/aarch64.md: New pattern.
    * aarch64/aarch64-simd.md: Likewise.
    * aarch64/iterators.md: New unspecs, iterators.
    * arm/iterators.md: New iterators.
    * arm/unspecs.md: New unspecs.
    * arm/neon.md: New pattern.
    * arm/vfp.md: Likewise.
gcc/doc:
    * generic.texi: Add STRICT_MAX_EXPR and STRICT_MIN_EXPR.
    * md.texi: Add strict_min and strict_max patterns.
gcc/testsuite
    * gcc.target/aarch64/maxmin_strict.c: New test.
    * gcc.target/arm/maxmin_strict.c: New test.

[-- Attachment #2: strict_max.patch --]
[-- Type: application/octet-stream, Size: 21492 bytes --]

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 1b5e659..ef1a15f 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -7438,6 +7438,8 @@ integer_valued_real_p (tree t)
     case MULT_EXPR:
     case MIN_EXPR:
     case MAX_EXPR:
+    case STRICT_MIN_EXPR:
+    case STRICT_MAX_EXPR:
       return integer_valued_real_p (TREE_OPERAND (t, 0))
 	     && integer_valued_real_p (TREE_OPERAND (t, 1));
 
@@ -9221,6 +9223,10 @@ fold_builtin_fmin_fmax (location_t loc, tree arg0, tree arg1,
 	return fold_build2_loc (loc, (max ? MAX_EXPR : MIN_EXPR), type,
 			    fold_convert_loc (loc, type, arg0),
 			    fold_convert_loc (loc, type, arg1));
+      else if (strict_minmax_support (type, max))
+	return fold_build2_loc (loc, (max ? STRICT_MAX_EXPR : STRICT_MIN_EXPR),
+			    type, fold_convert_loc (loc, type, arg0),
+			    fold_convert_loc (loc, type, arg1));
     }
   return NULL_TREE;
 }
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index b90f938..72f5877 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1821,6 +1821,15 @@
   [(set_attr "type" "neon_fp_minmax_<Vetype><q>")]
 )
 
+(define_insn "<maxmin_strict><mode>3"
+  [(set (match_operand:VDQF 0 "register_operand" "=w")
+	(unspec:VDQF [(match_operand:VDQF 1 "register_operand" "0")
+		      (match_operand:VDQF 2 "register_operand" "w")]
+		      FMAXMIN_STRICT))]
+  "TARGET_SIMD"
+  "<maxmin_strict_op>\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
+)
+
 (define_insn "<maxmin_uns><mode>3"
   [(set (match_operand:VDQF 0 "register_operand" "=w")
        (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index d3f5d5b..ee9bf99 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4234,6 +4234,15 @@
   [(set_attr "type" "f_minmax<s>")]
 )
 
+(define_insn "<maxmin_strict><mode>3"
+  [(set (match_operand:GPF 0 "register_operand" "=w")
+	(unspec:GPF [(match_operand:GPF 1 "register_operand" "0")
+		     (match_operand:GPF 2 "register_operand" "w")]
+		     FMAXMIN_STRICT))]
+  "TARGET_FLOAT"
+  "<maxmin_strict_op>\\t%<s>0, %<s>1, %<s>2"
+)
+
 ;; -------------------------------------------------------------------
 ;; Reload support
 ;; -------------------------------------------------------------------
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 498358a..0a7c760 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -279,6 +279,8 @@
     UNSPEC_PMULL2       ; Used in aarch64-simd.md.
     UNSPEC_REV_REGLIST  ; Used in aarch64-simd.md.
     UNSPEC_VEC_SHR      ; Used in aarch64-simd.md.
+    UNSPEC_FMAX_STRICT  ; Used in aarch64-simd.md.
+    UNSPEC_FMIN_STRICT  ; Used in aarch64-simd.md.
 ])
 
 ;; -------------------------------------------------------------------
@@ -868,6 +870,8 @@
 
 (define_int_iterator FMAXMIN_UNS [UNSPEC_FMAX UNSPEC_FMIN])
 
+(define_int_iterator FMAXMIN_STRICT [UNSPEC_FMAX_STRICT UNSPEC_FMIN_STRICT])
+
 (define_int_iterator VQDMULH [UNSPEC_SQDMULH UNSPEC_SQRDMULH])
 
 (define_int_iterator USSUQADD [UNSPEC_SUQADD UNSPEC_USQADD])
@@ -948,6 +952,12 @@
 				 (UNSPEC_FMINNMV "fminnm")
 				 (UNSPEC_FMINV "fmin")])
 
+(define_int_attr  maxmin_strict [(UNSPEC_FMAX_STRICT "strict_max")
+				 (UNSPEC_FMIN_STRICT "strict_min")])
+
+(define_int_attr  maxmin_strict_op [(UNSPEC_FMAX_STRICT "fmaxnm")
+				    (UNSPEC_FMIN_STRICT "fminnm")])
+
 (define_int_attr sur [(UNSPEC_SHADD "s") (UNSPEC_UHADD "u")
 		      (UNSPEC_SRHADD "sr") (UNSPEC_URHADD "ur")
 		      (UNSPEC_SHSUB "s") (UNSPEC_UHSUB "u")
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 1e7f3f1..3b24e4d 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -292,6 +292,8 @@
 
 (define_int_iterator VMAXMINF [UNSPEC_VMAX UNSPEC_VMIN])
 
+(define_int_iterator VMAXMINF_STRICT [UNSPEC_VMAX_STRICT UNSPEC_VMIN_STRICT])
+
 (define_int_iterator VPADDL [UNSPEC_VPADDL_S UNSPEC_VPADDL_U])
 
 (define_int_iterator VPADAL [UNSPEC_VPADAL_S UNSPEC_VPADAL_U])
@@ -716,6 +718,13 @@
   (UNSPEC_VPMIN "min") (UNSPEC_VPMIN_U "min")
 ])
 
+(define_int_attr  maxmin_strict [
+  (UNSPEC_VMAX_STRICT "strict_max") (UNSPEC_VMIN_STRICT "strict_min")])
+
+(define_int_attr maxmin_strict_op [
+  (UNSPEC_VMAX_STRICT "vmaxnm") (UNSPEC_VMIN_STRICT "vminnm")
+])
+
 (define_int_attr shift_op [
   (UNSPEC_VSHL_S "shl") (UNSPEC_VSHL_U "shl")
   (UNSPEC_VRSHL_S "rshl") (UNSPEC_VRSHL_U "rshl")
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 654d9d5..e71e31f 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -2354,6 +2354,15 @@
   [(set_attr "type" "neon_fp_minmax_s<q>")]
 )
 
+(define_insn "<maxmin_strict><mode>3"
+  [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
+	(unspec:VCVTF [(match_operand:VCVTF 1 "s_register_operand" "w")
+		       (match_operand:VCVTF 2 "s_register_operand" "w")]
+		       VMAXMINF_STRICT))]
+  "TARGET_NEON && TARGET_FPU_ARMV8"
+  "<maxmin_strict_op>.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+)
+
 (define_expand "neon_vpadd<mode>"
   [(match_operand:VD 0 "s_register_operand" "=w")
    (match_operand:VD 1 "s_register_operand" "w")
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 0ec2c48..83094d5 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -224,8 +224,10 @@
   UNSPEC_VLD4_LANE
   UNSPEC_VMAX
   UNSPEC_VMAX_U
+  UNSPEC_VMAX_STRICT
   UNSPEC_VMIN
   UNSPEC_VMIN_U
+  UNSPEC_VMIN_STRICT
   UNSPEC_VMLA
   UNSPEC_VMLA_LANE
   UNSPEC_VMLAL_S
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index f62ff79..351af4f 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -1345,6 +1345,15 @@
    (set_attr "conds" "unconditional")]
 )
 
+(define_insn "<maxmin_strict><mode>3"
+  [(set (match_operand:SDF 0 "s_register_operand" "=<F_constraint>")
+	(unspec:SDF [(match_operand:SDF 1 "s_register_operand" "<F_constraint>")
+		     (match_operand:SDF 2 "s_register_operand" "<F_constraint>")]
+		     VMAXMINF_STRICT))]
+  "TARGET_HARD_FLOAT && TARGET_VFP5 <vfp_double_cond>"
+  "<maxmin_strict_op>.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+)
+
 ;; Write Floating-point Status and Control Register.
 (define_insn "set_fpscr"
   [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")] VUNSPEC_SET_FPSCR)]
diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
index bbafad9..8dad9a7 100644
--- a/gcc/doc/generic.texi
+++ b/gcc/doc/generic.texi
@@ -1268,6 +1268,8 @@ the byte offset of the field, but should not be used directly; call
 @tindex TARGET_EXPR
 @tindex VA_ARG_EXPR
 @tindex ANNOTATE_EXPR
+@tindex STRICT_MAX_EXPR
+@tindex STRICT_MIN_EXPR
 
 @table @code
 @item NEGATE_EXPR
@@ -1687,8 +1689,16 @@ its sole argument yields the representation for @code{ap}.
 This node is used to attach markers to an expression. The first operand
 is the annotated expression, the second is an @code{INTEGER_CST} with
 a value from @code{enum annot_expr_kind}.
-@end table
 
+@item STRICT_MAX_EXPR
+@item STRICT_MIN_EXPR
+These nodes represent IEEE-conformant maximum and minimum operations.  If either
+operand is a quiet @code{NaN} the other operand is returned.  If both operands
+are quiet @code{NaN}, then a quiet @code{NaN} is returned.  In the case when gcc
+supports signalling @code{NaN} (-fsignaling-nans) an invalid floating point
+exception is raised and a quiet @code{NaN} is returned.
+
+@end table
 
 @node Vectors
 @subsection Vectors
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index e991286..f1c3417 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4869,6 +4869,15 @@ Signed minimum and maximum operations.  When used with floating point,
 if both operands are zeros, or if either operand is @code{NaN}, then
 it is unspecified which of the two operands is returned as the result.
 
+@cindex @code{strict_min@var{m}3} instruction pattern
+@cindex @code{strict_max@var{m}3} instruction pattern
+@item @samp{strict_min@var{m}3}, @samp{strict_max@var{m}3}
+IEEE-conformant minimum and maximum operations.  If one operand is a quiet
+@code{NaN}, then the other operand is returned.  If both operands are quiet
+@code{NaN}, then a quiet @code{NaN} is returned.  In the case when gcc supports
+signalling @code{NaN} (-fsignaling-nans) an invalid floating point exception is
+raised and a quiet @code{NaN} is returned.
+
 @cindex @code{reduc_smin_@var{m}} instruction pattern
 @cindex @code{reduc_smax_@var{m}} instruction pattern
 @item @samp{reduc_smin_@var{m}}, @samp{reduc_smax_@var{m}}
diff --git a/gcc/expr.c b/gcc/expr.c
index 78904c2..e2adb01 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -8729,6 +8729,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
       return expand_abs (mode, op0, target, unsignedp,
 			 safe_from_p (target, treeop0, 1));
 
+    case STRICT_MAX_EXPR:
+    case STRICT_MIN_EXPR:
     case MAX_EXPR:
     case MIN_EXPR:
       target = original_target;
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 60aa210..143f457 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -1164,6 +1164,8 @@ const_binop (enum tree_code code, tree arg1, tree arg2)
 	case RDIV_EXPR:
 	case MIN_EXPR:
 	case MAX_EXPR:
+	case STRICT_MIN_EXPR:
+	case STRICT_MAX_EXPR:
 	  break;
 
 	default:
@@ -9872,7 +9874,8 @@ fold_binary_loc (location_t loc,
      cases, the appropriate type conversions should be put back in
      the tree that will get out of the constant folder.  */
 
-  if (kind == tcc_comparison || code == MIN_EXPR || code == MAX_EXPR)
+  if (kind == tcc_comparison || code == MIN_EXPR || code == MAX_EXPR
+      || code == STRICT_MIN_EXPR || code == STRICT_MAX_EXPR)
     {
       STRIP_SIGN_NOPS (arg0);
       STRIP_SIGN_NOPS (arg1);
@@ -14773,6 +14776,7 @@ tree_binary_nonnegative_warnv_p (enum tree_code code, tree type, tree op0,
 
     case BIT_AND_EXPR:
     case MAX_EXPR:
+    case STRICT_MAX_EXPR:
       return (tree_expr_nonnegative_warnv_p (op0,
 					     strict_overflow_p)
 	      || tree_expr_nonnegative_warnv_p (op1,
@@ -14781,6 +14785,7 @@ tree_binary_nonnegative_warnv_p (enum tree_code code, tree type, tree op0,
     case BIT_IOR_EXPR:
     case BIT_XOR_EXPR:
     case MIN_EXPR:
+    case STRICT_MIN_EXPR:
     case RDIV_EXPR:
     case TRUNC_DIV_EXPR:
     case CEIL_DIV_EXPR:
@@ -15235,6 +15240,7 @@ tree_binary_nonzero_warnv_p (enum tree_code code,
       break;
 
     case MIN_EXPR:
+    case STRICT_MIN_EXPR:
       sub_strict_overflow_p = false;
       if (tree_expr_nonzero_warnv_p (op0,
 				     &sub_strict_overflow_p)
@@ -15247,6 +15253,7 @@ tree_binary_nonzero_warnv_p (enum tree_code code,
       break;
 
     case MAX_EXPR:
+    case STRICT_MAX_EXPR:
       sub_strict_overflow_p = false;
       if (tree_expr_nonzero_warnv_p (op0,
 				     &sub_strict_overflow_p))
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 491341b..ca642de 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -482,6 +482,12 @@ optab_for_tree_code (enum tree_code code, const_tree type,
     case MIN_EXPR:
       return TYPE_UNSIGNED (type) ? umin_optab : smin_optab;
 
+    case STRICT_MAX_EXPR:
+      return strict_max_optab;
+
+    case STRICT_MIN_EXPR:
+      return strict_min_optab;
+
     case REALIGN_LOAD_EXPR:
       return vec_realign_load_optab;
 
@@ -6798,6 +6804,16 @@ expand_vec_perm (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target)
   return tmp;
 }
 
+/* Return true if the target supports strict math max (MAX = TRUE) and min
+   (MAX = FALSE) operations on type TYPE.  */
+bool
+strict_minmax_support (tree type, bool max)
+{
+  optab optab = optab_for_tree_code
+    (max ? STRICT_MAX_EXPR : STRICT_MIN_EXPR, type, optab_default);
+  return optab_handler (optab, TYPE_MODE (type)) != CODE_FOR_nothing;
+}
+
 /* Return insn code for a conditional operator with a comparison in
    mode CMODE, unsigned if UNS is true, resulting in a value of mode VMODE.  */
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 888b21c..7a79e76 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -244,6 +244,10 @@ OPTAB_D (sin_optab, "sin$a2")
 OPTAB_D (sincos_optab, "sincos$a3")
 OPTAB_D (tan_optab, "tan$a2")
 
+/* C99 implementations of fmax/fmin.  */
+OPTAB_D (strict_max_optab, "strict_max$a3")
+OPTAB_D (strict_min_optab, "strict_min$a3")
+
 /* Vector reduction to a scalar.  */
 OPTAB_D (reduc_smax_scal_optab, "reduc_smax_scal_$a")
 OPTAB_D (reduc_smin_scal_optab, "reduc_smin_scal_$a")
diff --git a/gcc/optabs.h b/gcc/optabs.h
index 95f5cbc..14b7a39 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -565,4 +565,6 @@ extern bool lshift_cheap_p (bool);
 
 extern enum rtx_code get_rtx_code (enum tree_code tcode, bool unsignedp);
 
+extern bool strict_minmax_support (tree, bool);
+
 #endif /* GCC_OPTABS_H */
diff --git a/gcc/real.c b/gcc/real.c
index 2d34b62..aa2f63c 100644
--- a/gcc/real.c
+++ b/gcc/real.c
@@ -1034,6 +1034,15 @@ real_arithmetic (REAL_VALUE_TYPE *r, int icode, const REAL_VALUE_TYPE *op0,
 	*r = *op1;
       break;
 
+    case STRICT_MIN_EXPR:
+      if (op0->cl == rvc_nan)
+	*r = *op1;
+      else if (do_compare (op0, op1, -1) < 0)
+	*r = *op0;
+      else
+	*r = *op1;
+      break;
+
     case MAX_EXPR:
       if (op1->cl == rvc_nan)
 	*r = *op1;
@@ -1043,6 +1052,15 @@ real_arithmetic (REAL_VALUE_TYPE *r, int icode, const REAL_VALUE_TYPE *op0,
 	*r = *op0;
       break;
 
+    case STRICT_MAX_EXPR:
+      if (op0->cl == rvc_nan)
+	*r = *op1;
+      else if (do_compare (op0, op1, 1) < 0)
+	*r = *op1;
+      else
+	*r = *op0;
+      break;
+
     case NEGATE_EXPR:
       *r = *op0;
       r->sign ^= 1;
diff --git a/gcc/testsuite/gcc.target/aarch64/maxmin_strict.c b/gcc/testsuite/gcc.target/aarch64/maxmin_strict.c
new file mode 100644
index 0000000..09cea1d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/maxmin_strict.c
@@ -0,0 +1,69 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-vectorize -fno-inline -save-temps" } */
+
+
+extern void abort (void);
+double fmax(double, double);
+float fmaxf(float, float);
+double fmin(double, double);
+float fminf(float, float);
+
+#define isnan __builtin_isnan
+#define isinf __builtin_isinf
+
+#define NAN __builtin_nan ("")
+#define INFINITY __builtin_inf ()
+
+#define NUM_ELEMS(TYPE) (16 / sizeof (TYPE))
+
+#define DEF_MAXMIN(TYPE,FUN)\
+void test_##FUN (TYPE *__restrict__ r, TYPE *__restrict__ a,\
+		 TYPE *__restrict__ b)\
+{\
+  int i;\
+  for (i = 0; i < NUM_ELEMS (TYPE); i++)\
+    r[i] = FUN (a[i], b[i]);\
+}\
+
+DEF_MAXMIN (float, fmaxf)
+DEF_MAXMIN (double, fmax)
+
+DEF_MAXMIN (float, fminf)
+DEF_MAXMIN (double, fmin)
+
+int main ()
+{
+  float a_f[4] = { 4, NAN, -3, INFINITY };
+  float b_f[4] = { 1,   7,NAN, 0 };
+  float r_f[4];
+  double a_d[4] = { 4, NAN,  -3,  INFINITY };
+  double b_d[4] = { 1,   7, NAN,  0 };
+  double r_d[4];
+
+  test_fmaxf (r_f, a_f, b_f);
+  if (r_f[0] != 4 || isnan (r_f[1]) || isnan (r_f[2]) || !isinf (r_f[3]))
+    abort ();
+
+  test_fminf (r_f, a_f, b_f);
+  if (r_f[0] != 1 || isnan (r_f[1]) || isnan (r_f[2]) || isinf (r_f[3]))
+    abort ();
+
+  test_fmax (r_d, a_d, b_d);
+  test_fmax (&r_d[2], &a_d[2], &b_d[2]);
+  if (r_d[0] != 4 || isnan (r_d[1]) || isnan (r_d[2]) || !isinf (r_d[3]))
+    abort ();
+
+  test_fmin (r_d, a_d, b_d);
+  test_fmin (&r_d[2], &a_d[2], &b_d[2]);
+  if (r_d[0] != 1 || isnan (r_d[1]) || isnan (r_d[2]) || isinf (r_d[3]))
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "fmaxnm\tv\[0-9\]+\.4s, v\[0-9\]+\.4s, v\[0-9\]+\.4s" 1 } } */
+/* { dg-final { scan-assembler-times "fmaxnm\tv\[0-9\]+\.2d, v\[0-9\]+\.2d, v\[0-9\]+\.2d" 1 } } */
+
+/* { dg-final { scan-assembler-times "fminnm\tv\[0-9\]+\.4s, v\[0-9\]+\.4s, v\[0-9\]+\.4s" 1 } } */
+/* { dg-final { scan-assembler-times "fminnm\tv\[0-9\]+\.2d, v\[0-9\]+\.2d, v\[0-9\]+\.2d" 1 } } */
+
diff --git a/gcc/testsuite/gcc.target/arm/maxmin_strict.c b/gcc/testsuite/gcc.target/arm/maxmin_strict.c
new file mode 100644
index 0000000..aa1dd6c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/maxmin_strict.c
@@ -0,0 +1,67 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-options "-O2 -ftree-vectorize -fno-inline -march=armv8-a -save-temps" } */
+/* { dg-add-options arm_v8_neon } */
+
+extern void abort (void);
+double fmax(double, double);
+float fmaxf(float, float);
+double fmin(double, double);
+float fminf(float, float);
+
+#define isnan __builtin_isnan
+#define isinf __builtin_isinf
+
+#define NAN __builtin_nan ("")
+#define INFINITY __builtin_inf ()
+
+#define DEF_MAXMIN(TYPE,FUN)\
+void test_##FUN (TYPE *__restrict__ r, TYPE *__restrict__ a,\
+		 TYPE *__restrict__ b)\
+{\
+  int i;\
+  for (i = 0; i < 4; i++)\
+    r[i] = FUN (a[i], b[i]);\
+}\
+
+DEF_MAXMIN (float, fmaxf)
+DEF_MAXMIN (double, fmax)
+
+DEF_MAXMIN (float, fminf)
+DEF_MAXMIN (double, fmin)
+
+int main ()
+{
+  float a_f[4] = { 4, NAN, -3, INFINITY };
+  float b_f[4] = { 1,   7,NAN, 0 };
+  float r_f[4];
+  double a_d[4] = { 4, NAN,  -3,  INFINITY };
+  double b_d[4] = { 1,   7, NAN,  0 };
+  double r_d[4];
+
+  test_fmaxf (r_f, a_f, b_f);
+  if (r_f[0] != 4 || isnan (r_f[1]) || isnan (r_f[2]) || !isinf (r_f[3]))
+    abort ();
+
+  test_fminf (r_f, a_f, b_f);
+  if (r_f[0] != 1 || isnan (r_f[1]) || isnan (r_f[2]) || isinf (r_f[3]))
+    abort ();
+
+  test_fmax (r_d, a_d, b_d);
+  if (r_d[0] != 4 || isnan (r_d[1]) || isnan (r_d[2]) || !isinf (r_d[3]))
+    abort ();
+
+  test_fmin (r_d, a_d, b_d);
+  if (r_d[0] != 1 || isnan (r_d[1]) || isnan (r_d[2]) || isinf (r_d[3]))
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "vmaxnm.f32\tq\[0-9\]+, q\[0-9\]+, q\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vminnm.f32\tq\[0-9\]+, q\[0-9\]+, q\[0-9\]+" 1 } } */
+
+/* NOTE: There are no double precision vector versions of vmaxnm/vminnm.  */
+/* { dg-final { scan-assembler-times "vmaxnm.f64\td\[0-9\]+, d\[0-9\]+, d\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vminnm.f64\td\[0-9\]+, d\[0-9\]+, d\[0-9\]+" 1 } } */
+
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index adc56ba..f717e37 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3091,6 +3091,8 @@ verify_expr (tree *tp, int *walk_subtrees, void *data ATTRIBUTE_UNUSED)
     case EXACT_DIV_EXPR:
     case MIN_EXPR:
     case MAX_EXPR:
+    case STRICT_MIN_EXPR:
+    case STRICT_MAX_EXPR:
     case LSHIFT_EXPR:
     case RSHIFT_EXPR:
     case LROTATE_EXPR:
@@ -3916,6 +3918,8 @@ verify_gimple_assign_binary (gassign *stmt)
     case EXACT_DIV_EXPR:
     case MIN_EXPR:
     case MAX_EXPR:
+    case STRICT_MIN_EXPR:
+    case STRICT_MAX_EXPR:
     case BIT_IOR_EXPR:
     case BIT_XOR_EXPR:
     case BIT_AND_EXPR:
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index ce9495d..1b95154 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -3888,6 +3888,8 @@ estimate_operator_cost (enum tree_code code, eni_weights *weights,
     case FLOAT_EXPR:
     case MIN_EXPR:
     case MAX_EXPR:
+    case STRICT_MIN_EXPR:
+    case STRICT_MAX_EXPR:
     case ABS_EXPR:
 
     case LSHIFT_EXPR:
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 13587e6..6d13fd2 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -2849,6 +2849,8 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, int flags,
       pp_string (pp, " > ");
       break;
 
+    case STRICT_MIN_EXPR:
+    case STRICT_MAX_EXPR:
     case VEC_WIDEN_MULT_HI_EXPR:
     case VEC_WIDEN_MULT_LO_EXPR:
     case VEC_WIDEN_MULT_EVEN_EXPR:
@@ -3223,6 +3225,8 @@ op_code_prio (enum tree_code code)
       /* Special expressions.  */
     case MIN_EXPR:
     case MAX_EXPR:
+    case STRICT_MIN_EXPR:
+    case STRICT_MAX_EXPR:
     case ABS_EXPR:
     case REALPART_EXPR:
     case IMAGPART_EXPR:
@@ -3419,6 +3423,12 @@ op_symbol_code (enum tree_code code)
     case MIN_EXPR:
       return "min";
 
+    case STRICT_MAX_EXPR:
+      return "strictmax";
+
+    case STRICT_MIN_EXPR:
+      return "strictmin";
+
     default:
       return "<<< ??? >>>";
     }
diff --git a/gcc/tree.c b/gcc/tree.c
index f6ab441..2d6b909 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -7529,6 +7529,8 @@ associative_tree_code (enum tree_code code)
     case MULT_EXPR:
     case MIN_EXPR:
     case MAX_EXPR:
+    case STRICT_MIN_EXPR:
+    case STRICT_MAX_EXPR:
       return true;
 
     default:
@@ -7549,6 +7551,8 @@ commutative_tree_code (enum tree_code code)
     case MULT_HIGHPART_EXPR:
     case MIN_EXPR:
     case MAX_EXPR:
+    case STRICT_MIN_EXPR:
+    case STRICT_MAX_EXPR:
     case BIT_IOR_EXPR:
     case BIT_XOR_EXPR:
     case BIT_AND_EXPR:
diff --git a/gcc/tree.def b/gcc/tree.def
index 56580af..daa4c77 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -722,6 +722,14 @@ DEFTREECODE (NEGATE_EXPR, "negate_expr", tcc_unary, 1)
 DEFTREECODE (MIN_EXPR, "min_expr", tcc_binary, 2)
 DEFTREECODE (MAX_EXPR, "max_expr", tcc_binary, 2)
 
+/* Minimum and maximum values, but when used with floating point it conforms to
+   the C99 definition of fmax and fmin, i.e.
+     1. if one operand is NaN the other numeric value is returned,
+     2. if both operands are NaN then a NaN is returned,
+     3. there is no distinction between -0 and 0.  */
+DEFTREECODE (STRICT_MIN_EXPR, "strict_min_expr", tcc_binary, 2)
+DEFTREECODE (STRICT_MAX_EXPR, "strict_max_expr", tcc_binary, 2)
+
 /* Represents the absolute value of the operand.
 
    An ABS_EXPR must have either an INTEGER_TYPE or a REAL_TYPE.  The

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
  2015-08-13 10:13 David Sherwood
@ 2015-08-13 11:12 ` Richard Biener
  2015-08-17  9:41   ` David Sherwood
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Biener @ 2015-08-13 11:12 UTC (permalink / raw)
  To: David Sherwood; +Cc: GCC Patches

On Thu, Aug 13, 2015 at 12:11 PM, David Sherwood <david.sherwood@arm.com> wrote:
> Hi,
>
> Sorry to bother people again. Is this OK to go now?

Hmm, why don't you go the vectorized function call path for this,
implementing the builtin_vectorized_function target hook?

Richard.

> Thanks!
> David.
>
>> >
>> > > On Mon, 29 Jun 2015, David Sherwood wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > I have added new STRICT_MAX_EXPR and STRICT_MIN_EXPR expressions to support the
>> > > > IEEE versions of fmin and fmax. This is done by recognising the math library
>> > > > "fmax" and "fmin" builtin functions in a similar way to how this is done for
>> > > > -ffast-math. This also allows us to vectorise the IEEE max/min functions for
>> > > > targets that support it, for example aarch64/aarch32.
>> > >
>> > > This patch is missing documentation.  You need to document the new insn
>> > > patterns in md.texi and the new tree codes in generic.texi.
>> >
>> > Hi, I've uploaded a new patch with the documentation. Hope this is ok.
>>
>> In various places where you refer to one operand being NaN, I think you
>> mean one operand being a *quiet* NaN (if one is a signaling NaN - only
>> supported by GCC if -fsignaling-nans - the IEEE minNum and maxNum
>> operations raise "invalid" and return a quiet NaN).
>
> Hi, I have a new patch that hopefully addresses the documentation issues.
>
> Thanks,
> David.
>
> ChangeLog:
>
> 2015-07-15  David Sherwood  <david.sherwood@arm.com>
>
> gcc/
>     * builtins.c (integer_valued_real_p): Add STRICT_MIN_EXPR and
>     STRICT_MAX_EXPR.
>     (fold_builtin_fmin_fmax): For strict math, convert builting fmin and
>     fmax to STRICT_MIN_EXPR and STRICT_MIN_EXPR, respectively.
>     * expr.c (expand_expr_real_2): Add STRICT_MIN_EXPR and STRICT_MAX_EXPR.
>     * fold-const.c (const_binop): Likewise.
>     (fold_binary_loc, tree_binary_nonnegative_warnv_p): Likewise.
>     (tree_binary_nonzero_warnv_p): Likewise.
>     * optabs.h (strict_minmax_support): Declare.
>     * optabs.def: Add new optabs strict_max_optab/strict_min_optab.
>     * optabs.c (optab_for_tree_code): Return new optabs for STRICT_MIN_EXPR
>     and STRICT_MAX_EXPR.
>     (strict_minmax_support): New function.
>     * real.c (real_arithmetic): Add STRICT_MIN_EXPR and STRICT_MAX_EXPR.
>     * tree.def: Likewise.
>     * tree.c (associative_tree_code, commutative_tree_code): Likewise.
>     * tree-cfg.c (verify_expr): Likewise.
>     (verify_gimple_assign_binary): Likewise.
>     * tree-inline.c (estimate_operator_cost): Likewise.
>     * tree-pretty-print.c (dump_generic_node, op_code_prio): Likewise.
>     (op_symbol_code): Likewise.
> gcc/config:
>     * aarch64/aarch64.md: New pattern.
>     * aarch64/aarch64-simd.md: Likewise.
>     * aarch64/iterators.md: New unspecs, iterators.
>     * arm/iterators.md: New iterators.
>     * arm/unspecs.md: New unspecs.
>     * arm/neon.md: New pattern.
>     * arm/vfp.md: Likewise.
> gcc/doc:
>     * generic.texi: Add STRICT_MAX_EXPR and STRICT_MIN_EXPR.
>     * md.texi: Add strict_min and strict_max patterns.
> gcc/testsuite
>     * gcc.target/aarch64/maxmin_strict.c: New test.
>     * gcc.target/arm/maxmin_strict.c: New test.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
  2015-08-13 11:12 ` Richard Biener
@ 2015-08-17  9:41   ` David Sherwood
  2015-08-17 14:02     ` Richard Biener
  0 siblings, 1 reply; 24+ messages in thread
From: David Sherwood @ 2015-08-17  9:41 UTC (permalink / raw)
  To: 'Richard Biener'; +Cc: GCC Patches

Hi Richard,

Thanks for the reply. I'd chosen to add new expressions as this seemed more
consistent with the existing MAX_EXPR and MIN_EXPR tree codes. In addition it
would seem to provide more opportunities for optimisation than a target-specific
builtin implementation would. I accept that optimisation opportunities will
be more limited for strict math compilation, but that it was still worth having
them. Also, if we did map it to builtins then the scalar version would go
through the optabs and the vector version would go through the target's builtin
expansion, which doesn't seem very consistent.

Regards,
David.

> -----Original Message-----
> From: Richard Biener [mailto:richard.guenther@gmail.com]
> Sent: 13 August 2015 12:10
> To: David Sherwood
> Cc: GCC Patches
> Subject: Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
> 
> On Thu, Aug 13, 2015 at 12:11 PM, David Sherwood <david.sherwood@arm.com> wrote:
> > Hi,
> >
> > Sorry to bother people again. Is this OK to go now?
> 
> Hmm, why don't you go the vectorized function call path for this,
> implementing the builtin_vectorized_function target hook?
> 
> Richard.
> 
> > Thanks!
> > David.
> >
> >> >
> >> > > On Mon, 29 Jun 2015, David Sherwood wrote:
> >> > >
> >> > > > Hi,
> >> > > >
> >> > > > I have added new STRICT_MAX_EXPR and STRICT_MIN_EXPR expressions to support the
> >> > > > IEEE versions of fmin and fmax. This is done by recognising the math library
> >> > > > "fmax" and "fmin" builtin functions in a similar way to how this is done for
> >> > > > -ffast-math. This also allows us to vectorise the IEEE max/min functions for
> >> > > > targets that support it, for example aarch64/aarch32.
> >> > >
> >> > > This patch is missing documentation.  You need to document the new insn
> >> > > patterns in md.texi and the new tree codes in generic.texi.
> >> >
> >> > Hi, I've uploaded a new patch with the documentation. Hope this is ok.
> >>
> >> In various places where you refer to one operand being NaN, I think you
> >> mean one operand being a *quiet* NaN (if one is a signaling NaN - only
> >> supported by GCC if -fsignaling-nans - the IEEE minNum and maxNum
> >> operations raise "invalid" and return a quiet NaN).
> >
> > Hi, I have a new patch that hopefully addresses the documentation issues.
> >
> > Thanks,
> > David.
> >
> > ChangeLog:
> >
> > 2015-07-15  David Sherwood  <david.sherwood@arm.com>
> >
> > gcc/
> >     * builtins.c (integer_valued_real_p): Add STRICT_MIN_EXPR and
> >     STRICT_MAX_EXPR.
> >     (fold_builtin_fmin_fmax): For strict math, convert builting fmin and
> >     fmax to STRICT_MIN_EXPR and STRICT_MIN_EXPR, respectively.
> >     * expr.c (expand_expr_real_2): Add STRICT_MIN_EXPR and STRICT_MAX_EXPR.
> >     * fold-const.c (const_binop): Likewise.
> >     (fold_binary_loc, tree_binary_nonnegative_warnv_p): Likewise.
> >     (tree_binary_nonzero_warnv_p): Likewise.
> >     * optabs.h (strict_minmax_support): Declare.
> >     * optabs.def: Add new optabs strict_max_optab/strict_min_optab.
> >     * optabs.c (optab_for_tree_code): Return new optabs for STRICT_MIN_EXPR
> >     and STRICT_MAX_EXPR.
> >     (strict_minmax_support): New function.
> >     * real.c (real_arithmetic): Add STRICT_MIN_EXPR and STRICT_MAX_EXPR.
> >     * tree.def: Likewise.
> >     * tree.c (associative_tree_code, commutative_tree_code): Likewise.
> >     * tree-cfg.c (verify_expr): Likewise.
> >     (verify_gimple_assign_binary): Likewise.
> >     * tree-inline.c (estimate_operator_cost): Likewise.
> >     * tree-pretty-print.c (dump_generic_node, op_code_prio): Likewise.
> >     (op_symbol_code): Likewise.
> > gcc/config:
> >     * aarch64/aarch64.md: New pattern.
> >     * aarch64/aarch64-simd.md: Likewise.
> >     * aarch64/iterators.md: New unspecs, iterators.
> >     * arm/iterators.md: New iterators.
> >     * arm/unspecs.md: New unspecs.
> >     * arm/neon.md: New pattern.
> >     * arm/vfp.md: Likewise.
> > gcc/doc:
> >     * generic.texi: Add STRICT_MAX_EXPR and STRICT_MIN_EXPR.
> >     * md.texi: Add strict_min and strict_max patterns.
> > gcc/testsuite
> >     * gcc.target/aarch64/maxmin_strict.c: New test.
> >     * gcc.target/arm/maxmin_strict.c: New test.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
  2015-08-17  9:41   ` David Sherwood
@ 2015-08-17 14:02     ` Richard Biener
  2015-08-18 11:10       ` David Sherwood
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Biener @ 2015-08-17 14:02 UTC (permalink / raw)
  To: David Sherwood; +Cc: GCC Patches

On Mon, Aug 17, 2015 at 11:29 AM, David Sherwood <david.sherwood@arm.com> wrote:
> Hi Richard,
>
> Thanks for the reply. I'd chosen to add new expressions as this seemed more
> consistent with the existing MAX_EXPR and MIN_EXPR tree codes. In addition it
> would seem to provide more opportunities for optimisation than a target-specific
> builtin implementation would. I accept that optimisation opportunities will
> be more limited for strict math compilation, but that it was still worth having
> them. Also, if we did map it to builtins then the scalar version would go
> through the optabs and the vector version would go through the target's builtin
> expansion, which doesn't seem very consistent.

On another note ISTR you can't associate STRICT_MIN/MAX_EXPR and thus
you can't vectorize anyway?  (strict IEEE behavior is about NaNs, correct?)

Richard.

> Regards,
> David.
>
>> -----Original Message-----
>> From: Richard Biener [mailto:richard.guenther@gmail.com]
>> Sent: 13 August 2015 12:10
>> To: David Sherwood
>> Cc: GCC Patches
>> Subject: Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
>>
>> On Thu, Aug 13, 2015 at 12:11 PM, David Sherwood <david.sherwood@arm.com> wrote:
>> > Hi,
>> >
>> > Sorry to bother people again. Is this OK to go now?
>>
>> Hmm, why don't you go the vectorized function call path for this,
>> implementing the builtin_vectorized_function target hook?
>>
>> Richard.
>>
>> > Thanks!
>> > David.
>> >
>> >> >
>> >> > > On Mon, 29 Jun 2015, David Sherwood wrote:
>> >> > >
>> >> > > > Hi,
>> >> > > >
>> >> > > > I have added new STRICT_MAX_EXPR and STRICT_MIN_EXPR expressions to support the
>> >> > > > IEEE versions of fmin and fmax. This is done by recognising the math library
>> >> > > > "fmax" and "fmin" builtin functions in a similar way to how this is done for
>> >> > > > -ffast-math. This also allows us to vectorise the IEEE max/min functions for
>> >> > > > targets that support it, for example aarch64/aarch32.
>> >> > >
>> >> > > This patch is missing documentation.  You need to document the new insn
>> >> > > patterns in md.texi and the new tree codes in generic.texi.
>> >> >
>> >> > Hi, I've uploaded a new patch with the documentation. Hope this is ok.
>> >>
>> >> In various places where you refer to one operand being NaN, I think you
>> >> mean one operand being a *quiet* NaN (if one is a signaling NaN - only
>> >> supported by GCC if -fsignaling-nans - the IEEE minNum and maxNum
>> >> operations raise "invalid" and return a quiet NaN).
>> >
>> > Hi, I have a new patch that hopefully addresses the documentation issues.
>> >
>> > Thanks,
>> > David.
>> >
>> > ChangeLog:
>> >
>> > 2015-07-15  David Sherwood  <david.sherwood@arm.com>
>> >
>> > gcc/
>> >     * builtins.c (integer_valued_real_p): Add STRICT_MIN_EXPR and
>> >     STRICT_MAX_EXPR.
>> >     (fold_builtin_fmin_fmax): For strict math, convert builting fmin and
>> >     fmax to STRICT_MIN_EXPR and STRICT_MIN_EXPR, respectively.
>> >     * expr.c (expand_expr_real_2): Add STRICT_MIN_EXPR and STRICT_MAX_EXPR.
>> >     * fold-const.c (const_binop): Likewise.
>> >     (fold_binary_loc, tree_binary_nonnegative_warnv_p): Likewise.
>> >     (tree_binary_nonzero_warnv_p): Likewise.
>> >     * optabs.h (strict_minmax_support): Declare.
>> >     * optabs.def: Add new optabs strict_max_optab/strict_min_optab.
>> >     * optabs.c (optab_for_tree_code): Return new optabs for STRICT_MIN_EXPR
>> >     and STRICT_MAX_EXPR.
>> >     (strict_minmax_support): New function.
>> >     * real.c (real_arithmetic): Add STRICT_MIN_EXPR and STRICT_MAX_EXPR.
>> >     * tree.def: Likewise.
>> >     * tree.c (associative_tree_code, commutative_tree_code): Likewise.
>> >     * tree-cfg.c (verify_expr): Likewise.
>> >     (verify_gimple_assign_binary): Likewise.
>> >     * tree-inline.c (estimate_operator_cost): Likewise.
>> >     * tree-pretty-print.c (dump_generic_node, op_code_prio): Likewise.
>> >     (op_symbol_code): Likewise.
>> > gcc/config:
>> >     * aarch64/aarch64.md: New pattern.
>> >     * aarch64/aarch64-simd.md: Likewise.
>> >     * aarch64/iterators.md: New unspecs, iterators.
>> >     * arm/iterators.md: New iterators.
>> >     * arm/unspecs.md: New unspecs.
>> >     * arm/neon.md: New pattern.
>> >     * arm/vfp.md: Likewise.
>> > gcc/doc:
>> >     * generic.texi: Add STRICT_MAX_EXPR and STRICT_MIN_EXPR.
>> >     * md.texi: Add strict_min and strict_max patterns.
>> > gcc/testsuite
>> >     * gcc.target/aarch64/maxmin_strict.c: New test.
>> >     * gcc.target/arm/maxmin_strict.c: New test.
>
>
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
  2015-08-17 14:02     ` Richard Biener
@ 2015-08-18 11:10       ` David Sherwood
  2015-08-18 13:31         ` Richard Biener
  0 siblings, 1 reply; 24+ messages in thread
From: David Sherwood @ 2015-08-18 11:10 UTC (permalink / raw)
  To: 'Richard Biener'; +Cc: GCC Patches

> On Mon, Aug 17, 2015 at 11:29 AM, David Sherwood <david.sherwood@arm.com> wrote:
> > Hi Richard,
> >
> > Thanks for the reply. I'd chosen to add new expressions as this seemed more
> > consistent with the existing MAX_EXPR and MIN_EXPR tree codes. In addition it
> > would seem to provide more opportunities for optimisation than a target-specific
> > builtin implementation would. I accept that optimisation opportunities will
> > be more limited for strict math compilation, but that it was still worth having
> > them. Also, if we did map it to builtins then the scalar version would go
> > through the optabs and the vector version would go through the target's builtin
> > expansion, which doesn't seem very consistent.
> 
> On another note ISTR you can't associate STRICT_MIN/MAX_EXPR and thus
> you can't vectorize anyway?  (strict IEEE behavior is about NaNs, correct?)
I thought for this particular case associativity wasn't an issue? We're not doing any
reductions here, just simply performing max/min operations on each pair of elements
in the vectors. I thought for IEEE-compliant behaviour we just need to ensure that for
each pair of elements if one element is a NaN we return the other one.

David.

> 
> Richard.
> 
> > Regards,
> > David.
> >
> >> -----Original Message-----
> >> From: Richard Biener [mailto:richard.guenther@gmail.com]
> >> Sent: 13 August 2015 12:10
> >> To: David Sherwood
> >> Cc: GCC Patches
> >> Subject: Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
> >>
> >> On Thu, Aug 13, 2015 at 12:11 PM, David Sherwood <david.sherwood@arm.com> wrote:
> >> > Hi,
> >> >
> >> > Sorry to bother people again. Is this OK to go now?
> >>
> >> Hmm, why don't you go the vectorized function call path for this,
> >> implementing the builtin_vectorized_function target hook?
> >>
> >> Richard.
> >>
> >> > Thanks!
> >> > David.
> >> >
> >> >> >
> >> >> > > On Mon, 29 Jun 2015, David Sherwood wrote:
> >> >> > >
> >> >> > > > Hi,
> >> >> > > >
> >> >> > > > I have added new STRICT_MAX_EXPR and STRICT_MIN_EXPR expressions to support the
> >> >> > > > IEEE versions of fmin and fmax. This is done by recognising the math library
> >> >> > > > "fmax" and "fmin" builtin functions in a similar way to how this is done for
> >> >> > > > -ffast-math. This also allows us to vectorise the IEEE max/min functions for
> >> >> > > > targets that support it, for example aarch64/aarch32.
> >> >> > >
> >> >> > > This patch is missing documentation.  You need to document the new insn
> >> >> > > patterns in md.texi and the new tree codes in generic.texi.
> >> >> >
> >> >> > Hi, I've uploaded a new patch with the documentation. Hope this is ok.
> >> >>
> >> >> In various places where you refer to one operand being NaN, I think you
> >> >> mean one operand being a *quiet* NaN (if one is a signaling NaN - only
> >> >> supported by GCC if -fsignaling-nans - the IEEE minNum and maxNum
> >> >> operations raise "invalid" and return a quiet NaN).
> >> >
> >> > Hi, I have a new patch that hopefully addresses the documentation issues.
> >> >
> >> > Thanks,
> >> > David.
> >> >
> >> > ChangeLog:
> >> >
> >> > 2015-07-15  David Sherwood  <david.sherwood@arm.com>
> >> >
> >> > gcc/
> >> >     * builtins.c (integer_valued_real_p): Add STRICT_MIN_EXPR and
> >> >     STRICT_MAX_EXPR.
> >> >     (fold_builtin_fmin_fmax): For strict math, convert builting fmin and
> >> >     fmax to STRICT_MIN_EXPR and STRICT_MIN_EXPR, respectively.
> >> >     * expr.c (expand_expr_real_2): Add STRICT_MIN_EXPR and STRICT_MAX_EXPR.
> >> >     * fold-const.c (const_binop): Likewise.
> >> >     (fold_binary_loc, tree_binary_nonnegative_warnv_p): Likewise.
> >> >     (tree_binary_nonzero_warnv_p): Likewise.
> >> >     * optabs.h (strict_minmax_support): Declare.
> >> >     * optabs.def: Add new optabs strict_max_optab/strict_min_optab.
> >> >     * optabs.c (optab_for_tree_code): Return new optabs for STRICT_MIN_EXPR
> >> >     and STRICT_MAX_EXPR.
> >> >     (strict_minmax_support): New function.
> >> >     * real.c (real_arithmetic): Add STRICT_MIN_EXPR and STRICT_MAX_EXPR.
> >> >     * tree.def: Likewise.
> >> >     * tree.c (associative_tree_code, commutative_tree_code): Likewise.
> >> >     * tree-cfg.c (verify_expr): Likewise.
> >> >     (verify_gimple_assign_binary): Likewise.
> >> >     * tree-inline.c (estimate_operator_cost): Likewise.
> >> >     * tree-pretty-print.c (dump_generic_node, op_code_prio): Likewise.
> >> >     (op_symbol_code): Likewise.
> >> > gcc/config:
> >> >     * aarch64/aarch64.md: New pattern.
> >> >     * aarch64/aarch64-simd.md: Likewise.
> >> >     * aarch64/iterators.md: New unspecs, iterators.
> >> >     * arm/iterators.md: New iterators.
> >> >     * arm/unspecs.md: New unspecs.
> >> >     * arm/neon.md: New pattern.
> >> >     * arm/vfp.md: Likewise.
> >> > gcc/doc:
> >> >     * generic.texi: Add STRICT_MAX_EXPR and STRICT_MIN_EXPR.
> >> >     * md.texi: Add strict_min and strict_max patterns.
> >> > gcc/testsuite
> >> >     * gcc.target/aarch64/maxmin_strict.c: New test.
> >> >     * gcc.target/arm/maxmin_strict.c: New test.
> >
> >
> >



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
  2015-08-18 11:10       ` David Sherwood
@ 2015-08-18 13:31         ` Richard Biener
  2015-08-18 14:20           ` Richard Sandiford
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Biener @ 2015-08-18 13:31 UTC (permalink / raw)
  To: David Sherwood; +Cc: GCC Patches

On Tue, Aug 18, 2015 at 1:07 PM, David Sherwood <david.sherwood@arm.com> wrote:
>> On Mon, Aug 17, 2015 at 11:29 AM, David Sherwood <david.sherwood@arm.com> wrote:
>> > Hi Richard,
>> >
>> > Thanks for the reply. I'd chosen to add new expressions as this seemed more
>> > consistent with the existing MAX_EXPR and MIN_EXPR tree codes. In addition it
>> > would seem to provide more opportunities for optimisation than a target-specific
>> > builtin implementation would. I accept that optimisation opportunities will
>> > be more limited for strict math compilation, but that it was still worth having
>> > them. Also, if we did map it to builtins then the scalar version would go
>> > through the optabs and the vector version would go through the target's builtin
>> > expansion, which doesn't seem very consistent.
>>
>> On another note ISTR you can't associate STRICT_MIN/MAX_EXPR and thus
>> you can't vectorize anyway?  (strict IEEE behavior is about NaNs, correct?)
> I thought for this particular case associativity wasn't an issue? We're not doing any
> reductions here, just simply performing max/min operations on each pair of elements
> in the vectors. I thought for IEEE-compliant behaviour we just need to ensure that for
> each pair of elements if one element is a NaN we return the other one.

Hmm, true.  Ok, my comment still stands - I don't see that using a tree code is
the best thing to do here.  You can add fmin/max optabs and special expansion
of BUILT_IN_FMIN/MAX and you can use a target builtin for the
vectorized variant.

The reason I am pushing against a new tree code is that we'd have an awful
lot of similar codes when pushing other flag related IL specialities to actual
IL constructs.  And we still need to find a consistent way to do that.

Richard.

> David.
>
>>
>> Richard.
>>
>> > Regards,
>> > David.
>> >
>> >> -----Original Message-----
>> >> From: Richard Biener [mailto:richard.guenther@gmail.com]
>> >> Sent: 13 August 2015 12:10
>> >> To: David Sherwood
>> >> Cc: GCC Patches
>> >> Subject: Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
>> >>
>> >> On Thu, Aug 13, 2015 at 12:11 PM, David Sherwood <david.sherwood@arm.com> wrote:
>> >> > Hi,
>> >> >
>> >> > Sorry to bother people again. Is this OK to go now?
>> >>
>> >> Hmm, why don't you go the vectorized function call path for this,
>> >> implementing the builtin_vectorized_function target hook?
>> >>
>> >> Richard.
>> >>
>> >> > Thanks!
>> >> > David.
>> >> >
>> >> >> >
>> >> >> > > On Mon, 29 Jun 2015, David Sherwood wrote:
>> >> >> > >
>> >> >> > > > Hi,
>> >> >> > > >
>> >> >> > > > I have added new STRICT_MAX_EXPR and STRICT_MIN_EXPR expressions to support the
>> >> >> > > > IEEE versions of fmin and fmax. This is done by recognising the math library
>> >> >> > > > "fmax" and "fmin" builtin functions in a similar way to how this is done for
>> >> >> > > > -ffast-math. This also allows us to vectorise the IEEE max/min functions for
>> >> >> > > > targets that support it, for example aarch64/aarch32.
>> >> >> > >
>> >> >> > > This patch is missing documentation.  You need to document the new insn
>> >> >> > > patterns in md.texi and the new tree codes in generic.texi.
>> >> >> >
>> >> >> > Hi, I've uploaded a new patch with the documentation. Hope this is ok.
>> >> >>
>> >> >> In various places where you refer to one operand being NaN, I think you
>> >> >> mean one operand being a *quiet* NaN (if one is a signaling NaN - only
>> >> >> supported by GCC if -fsignaling-nans - the IEEE minNum and maxNum
>> >> >> operations raise "invalid" and return a quiet NaN).
>> >> >
>> >> > Hi, I have a new patch that hopefully addresses the documentation issues.
>> >> >
>> >> > Thanks,
>> >> > David.
>> >> >
>> >> > ChangeLog:
>> >> >
>> >> > 2015-07-15  David Sherwood  <david.sherwood@arm.com>
>> >> >
>> >> > gcc/
>> >> >     * builtins.c (integer_valued_real_p): Add STRICT_MIN_EXPR and
>> >> >     STRICT_MAX_EXPR.
>> >> >     (fold_builtin_fmin_fmax): For strict math, convert builting fmin and
>> >> >     fmax to STRICT_MIN_EXPR and STRICT_MIN_EXPR, respectively.
>> >> >     * expr.c (expand_expr_real_2): Add STRICT_MIN_EXPR and STRICT_MAX_EXPR.
>> >> >     * fold-const.c (const_binop): Likewise.
>> >> >     (fold_binary_loc, tree_binary_nonnegative_warnv_p): Likewise.
>> >> >     (tree_binary_nonzero_warnv_p): Likewise.
>> >> >     * optabs.h (strict_minmax_support): Declare.
>> >> >     * optabs.def: Add new optabs strict_max_optab/strict_min_optab.
>> >> >     * optabs.c (optab_for_tree_code): Return new optabs for STRICT_MIN_EXPR
>> >> >     and STRICT_MAX_EXPR.
>> >> >     (strict_minmax_support): New function.
>> >> >     * real.c (real_arithmetic): Add STRICT_MIN_EXPR and STRICT_MAX_EXPR.
>> >> >     * tree.def: Likewise.
>> >> >     * tree.c (associative_tree_code, commutative_tree_code): Likewise.
>> >> >     * tree-cfg.c (verify_expr): Likewise.
>> >> >     (verify_gimple_assign_binary): Likewise.
>> >> >     * tree-inline.c (estimate_operator_cost): Likewise.
>> >> >     * tree-pretty-print.c (dump_generic_node, op_code_prio): Likewise.
>> >> >     (op_symbol_code): Likewise.
>> >> > gcc/config:
>> >> >     * aarch64/aarch64.md: New pattern.
>> >> >     * aarch64/aarch64-simd.md: Likewise.
>> >> >     * aarch64/iterators.md: New unspecs, iterators.
>> >> >     * arm/iterators.md: New iterators.
>> >> >     * arm/unspecs.md: New unspecs.
>> >> >     * arm/neon.md: New pattern.
>> >> >     * arm/vfp.md: Likewise.
>> >> > gcc/doc:
>> >> >     * generic.texi: Add STRICT_MAX_EXPR and STRICT_MIN_EXPR.
>> >> >     * md.texi: Add strict_min and strict_max patterns.
>> >> > gcc/testsuite
>> >> >     * gcc.target/aarch64/maxmin_strict.c: New test.
>> >> >     * gcc.target/arm/maxmin_strict.c: New test.
>> >
>> >
>> >
>
>
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
  2015-08-18 13:31         ` Richard Biener
@ 2015-08-18 14:20           ` Richard Sandiford
  2015-08-19  9:48             ` Richard Biener
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Sandiford @ 2015-08-18 14:20 UTC (permalink / raw)
  To: Richard Biener; +Cc: David Sherwood, GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Tue, Aug 18, 2015 at 1:07 PM, David Sherwood <david.sherwood@arm.com> wrote:
>>> On Mon, Aug 17, 2015 at 11:29 AM, David Sherwood
>>> <david.sherwood@arm.com> wrote:
>>> > Hi Richard,
>>> >
>>> > Thanks for the reply. I'd chosen to add new expressions as this seemed more
>>> > consistent with the existing MAX_EXPR and MIN_EXPR tree codes. In
>>> > addition it
>>> > would seem to provide more opportunities for optimisation than a
>>> > target-specific
>>> > builtin implementation would. I accept that optimisation opportunities will
>>> > be more limited for strict math compilation, but that it was still
>>> > worth having
>>> > them. Also, if we did map it to builtins then the scalar version would go
>>> > through the optabs and the vector version would go through the
>>> > target's builtin
>>> > expansion, which doesn't seem very consistent.
>>>
>>> On another note ISTR you can't associate STRICT_MIN/MAX_EXPR and thus
>>> you can't vectorize anyway?  (strict IEEE behavior is about NaNs, correct?)
>> I thought for this particular case associativity wasn't an issue?
>> We're not doing any
>> reductions here, just simply performing max/min operations on each
>> pair of elements
>> in the vectors. I thought for IEEE-compliant behaviour we just need to
>> ensure that for
>> each pair of elements if one element is a NaN we return the other one.
>
> Hmm, true.  Ok, my comment still stands - I don't see that using a
> tree code is the best thing to do here.  You can add fmin/max optabs
> and special expansion of BUILT_IN_FMIN/MAX and you can use a target
> builtin for the vectorized variant.
>
> The reason I am pushing against a new tree code is that we'd have an
> awful lot of similar codes when pushing other flag related IL
> specialities to actual IL constructs.  And we still need to find a
> consistent way to do that.

In this case though the new code is really the "native" min/max operation
for fp, rather than some weird flag-dependent behaviour.  Maybe it's
a bit unfortunate that the non-strict min/max fp operation got mapped
to the generic MIN_EXPR and MAX_EXPR when the non-strict version is really
the flag-related modification.  The STRICT_* prefix is forced by that and
might make it seem like more of a special case than it really is.

If you're still not convinced, how about an internal function instead
of a built-in function, so that we can continue to use optabs for all
cases?  I'd really like to avoid forcing such a generic concept down to
target-specific builtins with target-specific expansion code, especially
when the same concept is exposed by target-independent code for scalars.

TBH though I'm not sure why an internal_fn value (or a target-specific
builtin enum value) is worse than a tree-code value, unless the limit
of the tree_code bitfield is in sight (maybe it is).

Thanks,
Richard

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
  2015-08-18 14:20           ` Richard Sandiford
@ 2015-08-19  9:48             ` Richard Biener
  2015-08-19 10:04               ` Richard Sandiford
  2015-08-19 15:07               ` Michael Matz
  0 siblings, 2 replies; 24+ messages in thread
From: Richard Biener @ 2015-08-19  9:48 UTC (permalink / raw)
  To: Richard Biener, David Sherwood, GCC Patches, richard.sandiford

On Tue, Aug 18, 2015 at 4:15 PM, Richard Sandiford
<richard.sandiford@arm.com> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Tue, Aug 18, 2015 at 1:07 PM, David Sherwood <david.sherwood@arm.com> wrote:
>>>> On Mon, Aug 17, 2015 at 11:29 AM, David Sherwood
>>>> <david.sherwood@arm.com> wrote:
>>>> > Hi Richard,
>>>> >
>>>> > Thanks for the reply. I'd chosen to add new expressions as this seemed more
>>>> > consistent with the existing MAX_EXPR and MIN_EXPR tree codes. In
>>>> > addition it
>>>> > would seem to provide more opportunities for optimisation than a
>>>> > target-specific
>>>> > builtin implementation would. I accept that optimisation opportunities will
>>>> > be more limited for strict math compilation, but that it was still
>>>> > worth having
>>>> > them. Also, if we did map it to builtins then the scalar version would go
>>>> > through the optabs and the vector version would go through the
>>>> > target's builtin
>>>> > expansion, which doesn't seem very consistent.
>>>>
>>>> On another note ISTR you can't associate STRICT_MIN/MAX_EXPR and thus
>>>> you can't vectorize anyway?  (strict IEEE behavior is about NaNs, correct?)
>>> I thought for this particular case associativity wasn't an issue?
>>> We're not doing any
>>> reductions here, just simply performing max/min operations on each
>>> pair of elements
>>> in the vectors. I thought for IEEE-compliant behaviour we just need to
>>> ensure that for
>>> each pair of elements if one element is a NaN we return the other one.
>>
>> Hmm, true.  Ok, my comment still stands - I don't see that using a
>> tree code is the best thing to do here.  You can add fmin/max optabs
>> and special expansion of BUILT_IN_FMIN/MAX and you can use a target
>> builtin for the vectorized variant.
>>
>> The reason I am pushing against a new tree code is that we'd have an
>> awful lot of similar codes when pushing other flag related IL
>> specialities to actual IL constructs.  And we still need to find a
>> consistent way to do that.
>
> In this case though the new code is really the "native" min/max operation
> for fp, rather than some weird flag-dependent behaviour.  Maybe it's
> a bit unfortunate that the non-strict min/max fp operation got mapped
> to the generic MIN_EXPR and MAX_EXPR when the non-strict version is really
> the flag-related modification.  The STRICT_* prefix is forced by that and
> might make it seem like more of a special case than it really is.

In some sense.  But the "strict" version already has a builtin (just no
special expander in builtins.c).  We usually don't add 1:1 tree codes
for existing builtins (why have builtins at all then?).

> If you're still not convinced, how about an internal function instead
> of a built-in function, so that we can continue to use optabs for all
> cases?  I'd really like to avoid forcing such a generic concept down to
> target-specific builtins with target-specific expansion code, especially
> when the same concept is exposed by target-independent code for scalars.

The target builtin is for the vectorized variant - not all targets might have
that and we'd need to query the target about this.  So using a IFN would
mean adding a target hook for that query.

> TBH though I'm not sure why an internal_fn value (or a target-specific
> builtin enum value) is worse than a tree-code value, unless the limit
> of the tree_code bitfield is in sight (maybe it is).

I think tree_code is 64bits now.

Richard.

> Thanks,
> Richard
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
  2015-08-19  9:48             ` Richard Biener
@ 2015-08-19 10:04               ` Richard Sandiford
  2015-08-19 10:31                 ` Richard Biener
  2015-08-19 15:07               ` Michael Matz
  1 sibling, 1 reply; 24+ messages in thread
From: Richard Sandiford @ 2015-08-19 10:04 UTC (permalink / raw)
  To: Richard Biener; +Cc: David Sherwood, GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Tue, Aug 18, 2015 at 4:15 PM, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Tue, Aug 18, 2015 at 1:07 PM, David Sherwood
>>> <david.sherwood@arm.com> wrote:
>>>>> On Mon, Aug 17, 2015 at 11:29 AM, David Sherwood
>>>>> <david.sherwood@arm.com> wrote:
>>>>> > Hi Richard,
>>>>> >
>>>>> > Thanks for the reply. I'd chosen to add new expressions as this
>>>>> > seemed more
>>>>> > consistent with the existing MAX_EXPR and MIN_EXPR tree codes. In
>>>>> > addition it
>>>>> > would seem to provide more opportunities for optimisation than a
>>>>> > target-specific
>>>>> > builtin implementation would. I accept that optimisation
>>>>> > opportunities will
>>>>> > be more limited for strict math compilation, but that it was still
>>>>> > worth having
>>>>> > them. Also, if we did map it to builtins then the scalar version would go
>>>>> > through the optabs and the vector version would go through the
>>>>> > target's builtin
>>>>> > expansion, which doesn't seem very consistent.
>>>>>
>>>>> On another note ISTR you can't associate STRICT_MIN/MAX_EXPR and thus
>>>>> you can't vectorize anyway?  (strict IEEE behavior is about NaNs, correct?)
>>>> I thought for this particular case associativity wasn't an issue?
>>>> We're not doing any
>>>> reductions here, just simply performing max/min operations on each
>>>> pair of elements
>>>> in the vectors. I thought for IEEE-compliant behaviour we just need to
>>>> ensure that for
>>>> each pair of elements if one element is a NaN we return the other one.
>>>
>>> Hmm, true.  Ok, my comment still stands - I don't see that using a
>>> tree code is the best thing to do here.  You can add fmin/max optabs
>>> and special expansion of BUILT_IN_FMIN/MAX and you can use a target
>>> builtin for the vectorized variant.
>>>
>>> The reason I am pushing against a new tree code is that we'd have an
>>> awful lot of similar codes when pushing other flag related IL
>>> specialities to actual IL constructs.  And we still need to find a
>>> consistent way to do that.
>>
>> In this case though the new code is really the "native" min/max operation
>> for fp, rather than some weird flag-dependent behaviour.  Maybe it's
>> a bit unfortunate that the non-strict min/max fp operation got mapped
>> to the generic MIN_EXPR and MAX_EXPR when the non-strict version is really
>> the flag-related modification.  The STRICT_* prefix is forced by that and
>> might make it seem like more of a special case than it really is.
>
> In some sense.  But the "strict" version already has a builtin (just no
> special expander in builtins.c).  We usually don't add 1:1 tree codes
> for existing builtins (why have builtins at all then?).

We still need the builtin to match the C function (and to allow direct
calls to __builtin_fmin, etc., which are occasionally useful).

>> If you're still not convinced, how about an internal function instead
>> of a built-in function, so that we can continue to use optabs for all
>> cases?  I'd really like to avoid forcing such a generic concept down to
>> target-specific builtins with target-specific expansion code, especially
>> when the same concept is exposed by target-independent code for scalars.
>
> The target builtin is for the vectorized variant - not all targets might have
> that and we'd need to query the target about this.  So using a IFN would
> mean adding a target hook for that query.

No, the idea is that if we have a tree code or an internal function, the
decision about whether we have target support is based on a query of the
optabs (just like it is for scalar, and for other vectorisable tree codes).
No new hooks are needed.

The patch checked for target support that way.

> > TBH though I'm not sure why an internal_fn value (or a target-specific
> > builtin enum value) is worse than a tree-code value, unless the limit
> > of the tree_code bitfield is in sight (maybe it is).
>
> I think tree_code is 64bits now.

Even better :-)

Thanks,
Richard

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
  2015-08-19 10:04               ` Richard Sandiford
@ 2015-08-19 10:31                 ` Richard Biener
  2015-08-19 12:23                   ` Richard Sandiford
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Biener @ 2015-08-19 10:31 UTC (permalink / raw)
  To: Richard Biener, David Sherwood, GCC Patches, richard.sandiford

On Wed, Aug 19, 2015 at 11:54 AM, Richard Sandiford
<richard.sandiford@arm.com> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Tue, Aug 18, 2015 at 4:15 PM, Richard Sandiford
>> <richard.sandiford@arm.com> wrote:
>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>> On Tue, Aug 18, 2015 at 1:07 PM, David Sherwood
>>>> <david.sherwood@arm.com> wrote:
>>>>>> On Mon, Aug 17, 2015 at 11:29 AM, David Sherwood
>>>>>> <david.sherwood@arm.com> wrote:
>>>>>> > Hi Richard,
>>>>>> >
>>>>>> > Thanks for the reply. I'd chosen to add new expressions as this
>>>>>> > seemed more
>>>>>> > consistent with the existing MAX_EXPR and MIN_EXPR tree codes. In
>>>>>> > addition it
>>>>>> > would seem to provide more opportunities for optimisation than a
>>>>>> > target-specific
>>>>>> > builtin implementation would. I accept that optimisation
>>>>>> > opportunities will
>>>>>> > be more limited for strict math compilation, but that it was still
>>>>>> > worth having
>>>>>> > them. Also, if we did map it to builtins then the scalar version would go
>>>>>> > through the optabs and the vector version would go through the
>>>>>> > target's builtin
>>>>>> > expansion, which doesn't seem very consistent.
>>>>>>
>>>>>> On another note ISTR you can't associate STRICT_MIN/MAX_EXPR and thus
>>>>>> you can't vectorize anyway?  (strict IEEE behavior is about NaNs, correct?)
>>>>> I thought for this particular case associativity wasn't an issue?
>>>>> We're not doing any
>>>>> reductions here, just simply performing max/min operations on each
>>>>> pair of elements
>>>>> in the vectors. I thought for IEEE-compliant behaviour we just need to
>>>>> ensure that for
>>>>> each pair of elements if one element is a NaN we return the other one.
>>>>
>>>> Hmm, true.  Ok, my comment still stands - I don't see that using a
>>>> tree code is the best thing to do here.  You can add fmin/max optabs
>>>> and special expansion of BUILT_IN_FMIN/MAX and you can use a target
>>>> builtin for the vectorized variant.
>>>>
>>>> The reason I am pushing against a new tree code is that we'd have an
>>>> awful lot of similar codes when pushing other flag related IL
>>>> specialities to actual IL constructs.  And we still need to find a
>>>> consistent way to do that.
>>>
>>> In this case though the new code is really the "native" min/max operation
>>> for fp, rather than some weird flag-dependent behaviour.  Maybe it's
>>> a bit unfortunate that the non-strict min/max fp operation got mapped
>>> to the generic MIN_EXPR and MAX_EXPR when the non-strict version is really
>>> the flag-related modification.  The STRICT_* prefix is forced by that and
>>> might make it seem like more of a special case than it really is.
>>
>> In some sense.  But the "strict" version already has a builtin (just no
>> special expander in builtins.c).  We usually don't add 1:1 tree codes
>> for existing builtins (why have builtins at all then?).
>
> We still need the builtin to match the C function (and to allow direct
> calls to __builtin_fmin, etc., which are occasionally useful).
>
>>> If you're still not convinced, how about an internal function instead
>>> of a built-in function, so that we can continue to use optabs for all
>>> cases?  I'd really like to avoid forcing such a generic concept down to
>>> target-specific builtins with target-specific expansion code, especially
>>> when the same concept is exposed by target-independent code for scalars.
>>
>> The target builtin is for the vectorized variant - not all targets might have
>> that and we'd need to query the target about this.  So using a IFN would
>> mean adding a target hook for that query.
>
> No, the idea is that if we have a tree code or an internal function, the
> decision about whether we have target support is based on a query of the
> optabs (just like it is for scalar, and for other vectorisable tree codes).
> No new hooks are needed.
>
> The patch checked for target support that way.

Fair enough.  Still this means we should have tree codes for all builtins
that eventually are vectorized?  So why don't we have SIN_EXPR,
POW_EXPR (ok, I did argue and have patches for that in the past),
RINT_EXPR, SQRT_EXPR, etc?

This patch starts to go down that route which is why I ask for the
whole picture to be considered and hinted at the alternative implementation
which follows existing practice.  Add a expander in builtins.c, add an optab,
and eventual support to vectorized_function.

See for example ix86_builtin_vectorized_function which handles
sqrt, floor, ceil, etc. and even FMA (we only fold FMA to FMA_EXPR
if the target supports it for the scalar mode, so not sure if there is
any x86 ISA where it has vectorized FMA but not scalar FMA).

>> > TBH though I'm not sure why an internal_fn value (or a target-specific
>> > builtin enum value) is worse than a tree-code value, unless the limit
>> > of the tree_code bitfield is in sight (maybe it is).
>>
>> I think tree_code is 64bits now.
>
> Even better :-)

Yes.

I'm not against adding a corresponding tree code for all math builtin functions,
we just have to decide whether this is the way to go (and of course support
expanding those back to libcalls to libc/m rather than libgcc).  There are
also constraints on what kind of STRICT_FMIN_EXPR the compiler may
generate as the target may not be able to expand the long double variant
directly but needs a libcall but libm might not be linked or may not
have support
for it.  That would be a new thing compared to libgcc providing a fallback
for all other tree codes.

Richard.

> Thanks,
> Richard
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
  2015-08-19 10:31                 ` Richard Biener
@ 2015-08-19 12:23                   ` Richard Sandiford
  2015-08-19 12:35                     ` Richard Biener
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Sandiford @ 2015-08-19 12:23 UTC (permalink / raw)
  To: Richard Biener; +Cc: David Sherwood, GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Wed, Aug 19, 2015 at 11:54 AM, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Tue, Aug 18, 2015 at 4:15 PM, Richard Sandiford
>>> <richard.sandiford@arm.com> wrote:
>>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>>> On Tue, Aug 18, 2015 at 1:07 PM, David Sherwood
>>>>> <david.sherwood@arm.com> wrote:
>>>>>>> On Mon, Aug 17, 2015 at 11:29 AM, David Sherwood
>>>>>>> <david.sherwood@arm.com> wrote:
>>>>>>> > Hi Richard,
>>>>>>> >
>>>>>>> > Thanks for the reply. I'd chosen to add new expressions as this
>>>>>>> > seemed more
>>>>>>> > consistent with the existing MAX_EXPR and MIN_EXPR tree codes. In
>>>>>>> > addition it
>>>>>>> > would seem to provide more opportunities for optimisation than a
>>>>>>> > target-specific
>>>>>>> > builtin implementation would. I accept that optimisation
>>>>>>> > opportunities will
>>>>>>> > be more limited for strict math compilation, but that it was still
>>>>>>> > worth having
>>>>>>> > them. Also, if we did map it to builtins then the scalar
>>>>>>> > version would go
>>>>>>> > through the optabs and the vector version would go through the
>>>>>>> > target's builtin
>>>>>>> > expansion, which doesn't seem very consistent.
>>>>>>>
>>>>>>> On another note ISTR you can't associate STRICT_MIN/MAX_EXPR and thus
>>>>>>> you can't vectorize anyway?  (strict IEEE behavior is about NaNs,
>>>>>>> correct?)
>>>>>> I thought for this particular case associativity wasn't an issue?
>>>>>> We're not doing any
>>>>>> reductions here, just simply performing max/min operations on each
>>>>>> pair of elements
>>>>>> in the vectors. I thought for IEEE-compliant behaviour we just need to
>>>>>> ensure that for
>>>>>> each pair of elements if one element is a NaN we return the other one.
>>>>>
>>>>> Hmm, true.  Ok, my comment still stands - I don't see that using a
>>>>> tree code is the best thing to do here.  You can add fmin/max optabs
>>>>> and special expansion of BUILT_IN_FMIN/MAX and you can use a target
>>>>> builtin for the vectorized variant.
>>>>>
>>>>> The reason I am pushing against a new tree code is that we'd have an
>>>>> awful lot of similar codes when pushing other flag related IL
>>>>> specialities to actual IL constructs.  And we still need to find a
>>>>> consistent way to do that.
>>>>
>>>> In this case though the new code is really the "native" min/max operation
>>>> for fp, rather than some weird flag-dependent behaviour.  Maybe it's
>>>> a bit unfortunate that the non-strict min/max fp operation got mapped
>>>> to the generic MIN_EXPR and MAX_EXPR when the non-strict version is really
>>>> the flag-related modification.  The STRICT_* prefix is forced by that and
>>>> might make it seem like more of a special case than it really is.
>>>
>>> In some sense.  But the "strict" version already has a builtin (just no
>>> special expander in builtins.c).  We usually don't add 1:1 tree codes
>>> for existing builtins (why have builtins at all then?).
>>
>> We still need the builtin to match the C function (and to allow direct
>> calls to __builtin_fmin, etc., which are occasionally useful).
>>
>>>> If you're still not convinced, how about an internal function instead
>>>> of a built-in function, so that we can continue to use optabs for all
>>>> cases?  I'd really like to avoid forcing such a generic concept down to
>>>> target-specific builtins with target-specific expansion code, especially
>>>> when the same concept is exposed by target-independent code for scalars.
>>>
>>> The target builtin is for the vectorized variant - not all targets might have
>>> that and we'd need to query the target about this.  So using a IFN would
>>> mean adding a target hook for that query.
>>
>> No, the idea is that if we have a tree code or an internal function, the
>> decision about whether we have target support is based on a query of the
>> optabs (just like it is for scalar, and for other vectorisable tree codes).
>> No new hooks are needed.
>>
>> The patch checked for target support that way.
>
> Fair enough.  Still this means we should have tree codes for all builtins
> that eventually are vectorized?  So why don't we have SIN_EXPR,
> POW_EXPR (ok, I did argue and have patches for that in the past),
> RINT_EXPR, SQRT_EXPR, etc?

Yeah, it doesn't sound so bad to me :-)  The choice of what's a function
in C and what's inherent is pretty arbitrary.  E.g. % on doubles could
have implemented fmod() or remainder().  Casts from double to int could
have used the current rounding mode, but instead they truncate and
conversions using the rounding mode need to go through something like
(l)lrint().  Like you say, pow() could have been an operator (and is in
many languages), but instead it's a function.

> This patch starts to go down that route which is why I ask for the
> whole picture to be considered and hinted at the alternative implementation
> which follows existing practice.  Add a expander in builtins.c, add an optab,
> and eventual support to vectorized_function.
>
> See for example ix86_builtin_vectorized_function which handles
> sqrt, floor, ceil, etc. and even FMA (we only fold FMA to FMA_EXPR
> if the target supports it for the scalar mode, so not sure if there is
> any x86 ISA where it has vectorized FMA but not scalar FMA).

Yeah.  TBH I'm really against doing that unless (a) there's good reason
to believe that the concept really is specific to one target and
wouldn't be implemented on others or (b) there really is a function
rather than an instruction underneath (usually the case for sin, etc.).
But (b) could also be handled by the optab support library mechanism.

Reasons against using target-specific builtins for operations that
have direct support in the ISA:

1. Like you say, in practice vector ops only tend to be supported if the
   associated scalar op is also supported.  Sticking to this approach
   means that vector ops follow a different path from scalar ops whereas
   (for example) division follows the same path for both.  It just seems
   confusing to have some floating-point optabs that support both scalar
   and vector operands and others that only support scalar operands.

2. Once converted to a target-specific function, the target-independent
   code has no idea what the function does or how expensive it is.
   We might start out with just one hook to convert a scalar operation
   to a target-dependent built-in function, but I bet over time we'll
   grow other hooks to query properties about the function, such as
   costs.

3. builtin_vectorized_function returns a decl rather than a call.
   If the target's vector API doesn't already have a built-in for the
   operation we need, with the exact types and arguments that we expect,
   the target needs to define one, presumably marked so that it isn't
   callable by input code.

   E.g. on targets where FP conversion instructions allow an explicit
   rounding mode to be specified as an operand, it's reasonable for a
   target's vector API to expose that operand as a constant argument to
   the API function.  There'd then be one API function for all vector-
   float-to-vector-float integer rounding operations, rather than one
   for vector rint(), one for vector ceil(), etc.  (I'm thinking of
   System z instructions here, although I don't know offhand what the
   vector API is there.)  IMO it doesn't make sense to force the target
   to define "fake" built-in functions for all those possibilities
   purely for the sake of the target hook.  It's a lot of extra code,
   and it's extra code that would be duplicated on any target that needs
   to do this.

IMO optabs are the best way for the target to tell the target-independent
code what it can do.  If it supports sqrt on df it defines sqrtdf and
if it supports vector sqrt on v2df it defines sqrtv2df.  These patterns
will often be a single define_expand or define_insn template -- the
vectorness often comes "for free" in terms of writing the pattern.

>>> > TBH though I'm not sure why an internal_fn value (or a target-specific
>>> > builtin enum value) is worse than a tree-code value, unless the limit
>>> > of the tree_code bitfield is in sight (maybe it is).
>>>
>>> I think tree_code is 64bits now.
>>
>> Even better :-)
>
> Yes.
>
> I'm not against adding a corresponding tree code for all math builtin functions,
> we just have to decide whether this is the way to go (and of course support
> expanding those back to libcalls to libc/m rather than libgcc).  There are
> also constraints on what kind of STRICT_FMIN_EXPR the compiler may
> generate as the target may not be able to expand the long double variant
> directly but needs a libcall but libm might not be linked or may not
> have support
> for it.  That would be a new thing compared to libgcc providing a fallback
> for all other tree codes.

True, but that doesn't seem too bad.  The constraints would be the same
if we're operating on built-in functions rather than codes.  I suppose
built-in functions make this more explicit, but at the end of the day
it's a costing decision.  We should no more be converting a cheap
operation into an expensive libgcc function than converting a cheap
operation into an expensive libm function, even if the libgcc conversion
links.

There's certainly precedent for introducing calls to things that libgcc
doesn't define.  E.g. we already introduce calls to memcpy in things
like loop distribution, even though we don't provide a fallback memcpy
in libgcc.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
  2015-08-19 12:23                   ` Richard Sandiford
@ 2015-08-19 12:35                     ` Richard Biener
  2015-08-19 13:16                       ` Richard Sandiford
                                         ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Richard Biener @ 2015-08-19 12:35 UTC (permalink / raw)
  To: Richard Biener, David Sherwood, GCC Patches, richard.sandiford

On Wed, Aug 19, 2015 at 2:11 PM, Richard Sandiford
<richard.sandiford@arm.com> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Wed, Aug 19, 2015 at 11:54 AM, Richard Sandiford
>> <richard.sandiford@arm.com> wrote:
>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>> On Tue, Aug 18, 2015 at 4:15 PM, Richard Sandiford
>>>> <richard.sandiford@arm.com> wrote:
>>>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>>>> On Tue, Aug 18, 2015 at 1:07 PM, David Sherwood
>>>>>> <david.sherwood@arm.com> wrote:
>>>>>>>> On Mon, Aug 17, 2015 at 11:29 AM, David Sherwood
>>>>>>>> <david.sherwood@arm.com> wrote:
>>>>>>>> > Hi Richard,
>>>>>>>> >
>>>>>>>> > Thanks for the reply. I'd chosen to add new expressions as this
>>>>>>>> > seemed more
>>>>>>>> > consistent with the existing MAX_EXPR and MIN_EXPR tree codes. In
>>>>>>>> > addition it
>>>>>>>> > would seem to provide more opportunities for optimisation than a
>>>>>>>> > target-specific
>>>>>>>> > builtin implementation would. I accept that optimisation
>>>>>>>> > opportunities will
>>>>>>>> > be more limited for strict math compilation, but that it was still
>>>>>>>> > worth having
>>>>>>>> > them. Also, if we did map it to builtins then the scalar
>>>>>>>> > version would go
>>>>>>>> > through the optabs and the vector version would go through the
>>>>>>>> > target's builtin
>>>>>>>> > expansion, which doesn't seem very consistent.
>>>>>>>>
>>>>>>>> On another note ISTR you can't associate STRICT_MIN/MAX_EXPR and thus
>>>>>>>> you can't vectorize anyway?  (strict IEEE behavior is about NaNs,
>>>>>>>> correct?)
>>>>>>> I thought for this particular case associativity wasn't an issue?
>>>>>>> We're not doing any
>>>>>>> reductions here, just simply performing max/min operations on each
>>>>>>> pair of elements
>>>>>>> in the vectors. I thought for IEEE-compliant behaviour we just need to
>>>>>>> ensure that for
>>>>>>> each pair of elements if one element is a NaN we return the other one.
>>>>>>
>>>>>> Hmm, true.  Ok, my comment still stands - I don't see that using a
>>>>>> tree code is the best thing to do here.  You can add fmin/max optabs
>>>>>> and special expansion of BUILT_IN_FMIN/MAX and you can use a target
>>>>>> builtin for the vectorized variant.
>>>>>>
>>>>>> The reason I am pushing against a new tree code is that we'd have an
>>>>>> awful lot of similar codes when pushing other flag related IL
>>>>>> specialities to actual IL constructs.  And we still need to find a
>>>>>> consistent way to do that.
>>>>>
>>>>> In this case though the new code is really the "native" min/max operation
>>>>> for fp, rather than some weird flag-dependent behaviour.  Maybe it's
>>>>> a bit unfortunate that the non-strict min/max fp operation got mapped
>>>>> to the generic MIN_EXPR and MAX_EXPR when the non-strict version is really
>>>>> the flag-related modification.  The STRICT_* prefix is forced by that and
>>>>> might make it seem like more of a special case than it really is.
>>>>
>>>> In some sense.  But the "strict" version already has a builtin (just no
>>>> special expander in builtins.c).  We usually don't add 1:1 tree codes
>>>> for existing builtins (why have builtins at all then?).
>>>
>>> We still need the builtin to match the C function (and to allow direct
>>> calls to __builtin_fmin, etc., which are occasionally useful).
>>>
>>>>> If you're still not convinced, how about an internal function instead
>>>>> of a built-in function, so that we can continue to use optabs for all
>>>>> cases?  I'd really like to avoid forcing such a generic concept down to
>>>>> target-specific builtins with target-specific expansion code, especially
>>>>> when the same concept is exposed by target-independent code for scalars.
>>>>
>>>> The target builtin is for the vectorized variant - not all targets might have
>>>> that and we'd need to query the target about this.  So using a IFN would
>>>> mean adding a target hook for that query.
>>>
>>> No, the idea is that if we have a tree code or an internal function, the
>>> decision about whether we have target support is based on a query of the
>>> optabs (just like it is for scalar, and for other vectorisable tree codes).
>>> No new hooks are needed.
>>>
>>> The patch checked for target support that way.
>>
>> Fair enough.  Still this means we should have tree codes for all builtins
>> that eventually are vectorized?  So why don't we have SIN_EXPR,
>> POW_EXPR (ok, I did argue and have patches for that in the past),
>> RINT_EXPR, SQRT_EXPR, etc?
>
> Yeah, it doesn't sound so bad to me :-)  The choice of what's a function
> in C and what's inherent is pretty arbitrary.  E.g. % on doubles could
> have implemented fmod() or remainder().  Casts from double to int could
> have used the current rounding mode, but instead they truncate and
> conversions using the rounding mode need to go through something like
> (l)lrint().  Like you say, pow() could have been an operator (and is in
> many languages), but instead it's a function.
>
>> This patch starts to go down that route which is why I ask for the
>> whole picture to be considered and hinted at the alternative implementation
>> which follows existing practice.  Add a expander in builtins.c, add an optab,
>> and eventual support to vectorized_function.
>>
>> See for example ix86_builtin_vectorized_function which handles
>> sqrt, floor, ceil, etc. and even FMA (we only fold FMA to FMA_EXPR
>> if the target supports it for the scalar mode, so not sure if there is
>> any x86 ISA where it has vectorized FMA but not scalar FMA).
>
> Yeah.  TBH I'm really against doing that unless (a) there's good reason
> to believe that the concept really is specific to one target and
> wouldn't be implemented on others or (b) there really is a function
> rather than an instruction underneath (usually the case for sin, etc.).
> But (b) could also be handled by the optab support library mechanism.
>
> Reasons against using target-specific builtins for operations that
> have direct support in the ISA:
>
> 1. Like you say, in practice vector ops only tend to be supported if the
>    associated scalar op is also supported.  Sticking to this approach
>    means that vector ops follow a different path from scalar ops whereas
>    (for example) division follows the same path for both.  It just seems
>    confusing to have some floating-point optabs that support both scalar
>    and vector operands and others that only support scalar operands.
>
> 2. Once converted to a target-specific function, the target-independent
>    code has no idea what the function does or how expensive it is.
>    We might start out with just one hook to convert a scalar operation
>    to a target-dependent built-in function, but I bet over time we'll
>    grow other hooks to query properties about the function, such as
>    costs.
>
> 3. builtin_vectorized_function returns a decl rather than a call.
>    If the target's vector API doesn't already have a built-in for the
>    operation we need, with the exact types and arguments that we expect,
>    the target needs to define one, presumably marked so that it isn't
>    callable by input code.
>
>    E.g. on targets where FP conversion instructions allow an explicit
>    rounding mode to be specified as an operand, it's reasonable for a
>    target's vector API to expose that operand as a constant argument to
>    the API function.  There'd then be one API function for all vector-
>    float-to-vector-float integer rounding operations, rather than one
>    for vector rint(), one for vector ceil(), etc.  (I'm thinking of
>    System z instructions here, although I don't know offhand what the
>    vector API is there.)  IMO it doesn't make sense to force the target
>    to define "fake" built-in functions for all those possibilities
>    purely for the sake of the target hook.  It's a lot of extra code,
>    and it's extra code that would be duplicated on any target that needs
>    to do this.
>
> IMO optabs are the best way for the target to tell the target-independent
> code what it can do.  If it supports sqrt on df it defines sqrtdf and
> if it supports vector sqrt on v2df it defines sqrtv2df.  These patterns
> will often be a single define_expand or define_insn template -- the
> vectorness often comes "for free" in terms of writing the pattern.
>
>>>> > TBH though I'm not sure why an internal_fn value (or a target-specific
>>>> > builtin enum value) is worse than a tree-code value, unless the limit
>>>> > of the tree_code bitfield is in sight (maybe it is).
>>>>
>>>> I think tree_code is 64bits now.
>>>
>>> Even better :-)
>>
>> Yes.
>>
>> I'm not against adding a corresponding tree code for all math builtin functions,
>> we just have to decide whether this is the way to go (and of course support
>> expanding those back to libcalls to libc/m rather than libgcc).  There are
>> also constraints on what kind of STRICT_FMIN_EXPR the compiler may
>> generate as the target may not be able to expand the long double variant
>> directly but needs a libcall but libm might not be linked or may not
>> have support
>> for it.  That would be a new thing compared to libgcc providing a fallback
>> for all other tree codes.
>
> True, but that doesn't seem too bad.  The constraints would be the same
> if we're operating on built-in functions rather than codes.  I suppose
> built-in functions make this more explicit, but at the end of the day
> it's a costing decision.  We should no more be converting a cheap
> operation into an expensive libgcc function than converting a cheap
> operation into an expensive libm function, even if the libgcc conversion
> links.
>
> There's certainly precedent for introducing calls to things that libgcc
> doesn't define.  E.g. we already introduce calls to memcpy in things
> like loop distribution, even though we don't provide a fallback memcpy
> in libgcc.

As an additional point for many math functions we have to support errno
which means, like, BUILT_IN_SQRT can be rewritten to SQRT_EXPR
only if -fno-math-errno is in effect.  But then code has to handle
both variants for things like constant folding and expression combining.
That's very unfortunate and something we want to avoid (one reason
the POW_EXPR thing didn't fly when I tried).  STRICT_FMIN/MAX_EXPR
is an example where this doesn't apply, of course (but I detest the name,
just use FMIN/FMAX_EXPR?).  Still you'd need to handle both,
FMIN_EXPR and BUILT_IN_FMIN, in code doing analysis/transform.

Richard.


> Thanks,
> Richard
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
  2015-08-19 12:35                     ` Richard Biener
@ 2015-08-19 13:16                       ` Richard Sandiford
  2015-08-19 13:41                         ` Richard Biener
  2015-08-19 15:32                       ` Joseph Myers
  2015-11-23  9:21                       ` David Sherwood
  2 siblings, 1 reply; 24+ messages in thread
From: Richard Sandiford @ 2015-08-19 13:16 UTC (permalink / raw)
  To: Richard Biener; +Cc: David Sherwood, GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> As an additional point for many math functions we have to support errno
> which means, like, BUILT_IN_SQRT can be rewritten to SQRT_EXPR
> only if -fno-math-errno is in effect.  But then code has to handle
> both variants for things like constant folding and expression combining.
> That's very unfortunate and something we want to avoid (one reason
> the POW_EXPR thing didn't fly when I tried).  STRICT_FMIN/MAX_EXPR
> is an example where this doesn't apply, of course (but I detest the name,
> just use FMIN/FMAX_EXPR?).  Still you'd need to handle both,
> FMIN_EXPR and BUILT_IN_FMIN, in code doing analysis/transform.

Yeah, but match.pd makes that easy, right? ;-)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
  2015-08-19 13:16                       ` Richard Sandiford
@ 2015-08-19 13:41                         ` Richard Biener
  2015-09-14 10:47                           ` David Sherwood
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Biener @ 2015-08-19 13:41 UTC (permalink / raw)
  To: Richard Biener, David Sherwood, GCC Patches, richard.sandiford

On Wed, Aug 19, 2015 at 3:06 PM, Richard Sandiford
<richard.sandiford@arm.com> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> As an additional point for many math functions we have to support errno
>> which means, like, BUILT_IN_SQRT can be rewritten to SQRT_EXPR
>> only if -fno-math-errno is in effect.  But then code has to handle
>> both variants for things like constant folding and expression combining.
>> That's very unfortunate and something we want to avoid (one reason
>> the POW_EXPR thing didn't fly when I tried).  STRICT_FMIN/MAX_EXPR
>> is an example where this doesn't apply, of course (but I detest the name,
>> just use FMIN/FMAX_EXPR?).  Still you'd need to handle both,
>> FMIN_EXPR and BUILT_IN_FMIN, in code doing analysis/transform.
>
> Yeah, but match.pd makes that easy, right? ;-)

Sure, but that only addresses stmt combining, not other passes.  And of course
it causes {gimple,generic}-match.c to become even bigger ;)

Richard.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
  2015-08-19 13:41                         ` Richard Biener
@ 2015-09-14 10:47                           ` David Sherwood
  2015-09-14 13:42                             ` Richard Biener
  0 siblings, 1 reply; 24+ messages in thread
From: David Sherwood @ 2015-09-14 10:47 UTC (permalink / raw)
  To: 'Richard Biener', GCC Patches, Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 3232 bytes --]

Hi All,

For what it's worth I have uploaded a new patch that changes the name
from STRICT_FMIN/MAX to just FMIN/FMAX, although I realise that this
discussion has not yet been resolved. I have also added scheduling
attributes to the aarch64 instructions.

Regards,
David Sherwood.

ChangeLog:

2015-08-28  David Sherwood  <david.sherwood@arm.com>

    gcc/
	* builtins.c (integer_valued_real_p): Add FMIN_EXPR and FMAX_EXPR.
	(fold_builtin_fmin_fmax): For strict math, convert builtins fmin and
	fmax to FMIN_EXPR and FMIN_EXPR, respectively.
	* expr.c (expand_expr_real_2): Add FMIN_EXPR and FMAX_EXPR.
	* fold-const.c (const_binop): Likewise.
	(fold_binary_loc, tree_binary_nonnegative_warnv_p): Likewise.
	(tree_binary_nonzero_warnv_p): Likewise.
	* optabs.h (fminmax_support): Declare.
	* optabs.def: Add new optabs fmax_optab/fmin_optab.
	* optabs.c (optab_for_tree_code): Return new optabs for FMIN_EXPR and
	FMAX_EXPR.
	(fminmax_support): New function.
	* real.c (real_arithmetic): Add FMIN_EXPR and FMAX_EXPR.
	* tree.def: Likewise.
	* tree.c (associative_tree_code, commutative_tree_code): Likewise.
	* tree-cfg.c (verify_expr): Likewise.
	(verify_gimple_assign_binary): Likewise.
	* tree-inline.c (estimate_operator_cost): Likewise.
	* tree-pretty-print.c (dump_generic_node, op_code_prio): Likewise.
	(op_symbol_code): Likewise.
	* config/aarch64/aarch64.md: New pattern.
	* config/aarch64/aarch64-simd.md: Likewise.
	* config/aarch64/iterators.md: New unspecs, iterators.
	* config/arm/iterators.md: New iterators.
	* config/arm/unspecs.md: New unspecs.
	* config/arm/neon.md: New pattern.
	* config/arm/vfp.md: Likewise.
        * doc/generic.texi: Add FMAX_EXPR and FMIN_EXPR.
        * doc/md.texi: Add fmin and fmax patterns.
    gcc/testsuite
        * gcc.target/aarch64/fmaxmin.c: New test.
        * gcc.target/arm/fmaxmin.c: New test.


> -----Original Message-----
> From: Richard Biener [mailto:richard.guenther@gmail.com]
> Sent: 19 August 2015 14:41
> To: Richard Biener; David Sherwood; GCC Patches; Richard Sandiford
> Subject: Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
> 
> On Wed, Aug 19, 2015 at 3:06 PM, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
> > Richard Biener <richard.guenther@gmail.com> writes:
> >> As an additional point for many math functions we have to support errno
> >> which means, like, BUILT_IN_SQRT can be rewritten to SQRT_EXPR
> >> only if -fno-math-errno is in effect.  But then code has to handle
> >> both variants for things like constant folding and expression combining.
> >> That's very unfortunate and something we want to avoid (one reason
> >> the POW_EXPR thing didn't fly when I tried).  STRICT_FMIN/MAX_EXPR
> >> is an example where this doesn't apply, of course (but I detest the name,
> >> just use FMIN/FMAX_EXPR?).  Still you'd need to handle both,
> >> FMIN_EXPR and BUILT_IN_FMIN, in code doing analysis/transform.
> >
> > Yeah, but match.pd makes that easy, right? ;-)
> 
> Sure, but that only addresses stmt combining, not other passes.  And of course
> it causes {gimple,generic}-match.c to become even bigger ;)
> 
> Richard.


[-- Attachment #2: fmaxmin.patch --]
[-- Type: application/octet-stream, Size: 21233 bytes --]

diff --git a/gcc/builtins.c b/gcc/builtins.c
index d79372c..284534f 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -7393,6 +7393,8 @@ integer_valued_real_p (tree t)
     case MULT_EXPR:
     case MIN_EXPR:
     case MAX_EXPR:
+    case FMIN_EXPR:
+    case FMAX_EXPR:
       return integer_valued_real_p (TREE_OPERAND (t, 0))
 	     && integer_valued_real_p (TREE_OPERAND (t, 1));
 
@@ -9176,6 +9178,10 @@ fold_builtin_fmin_fmax (location_t loc, tree arg0, tree arg1,
 	return fold_build2_loc (loc, (max ? MAX_EXPR : MIN_EXPR), type,
 			    fold_convert_loc (loc, type, arg0),
 			    fold_convert_loc (loc, type, arg1));
+      else if (fminmax_support (type, max))
+	return fold_build2_loc (loc, (max ? FMAX_EXPR : FMIN_EXPR), type,
+			    fold_convert_loc (loc, type, arg0),
+			    fold_convert_loc (loc, type, arg1));
     }
   return NULL_TREE;
 }
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 9777418..0a704b6 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1821,6 +1821,16 @@
   [(set_attr "type" "neon_fp_minmax_<Vetype><q>")]
 )
 
+(define_insn "<fmaxmin><mode>3"
+  [(set (match_operand:VDQF 0 "register_operand" "=w")
+	(unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")
+		      (match_operand:VDQF 2 "register_operand" "w")]
+		      FMAXMIN_STRICT))]
+  "TARGET_SIMD"
+  "<fmaxmin_op>\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
+  [(set_attr "type" "neon_fp_minmax_<Vetype><q>")]
+)
+
 (define_insn "<maxmin_uns><mode>3"
   [(set (match_operand:VDQF 0 "register_operand" "=w")
        (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c8511f0..023be58 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4315,6 +4315,16 @@
   [(set_attr "type" "f_minmax<s>")]
 )
 
+(define_insn "<fmaxmin><mode>3"
+  [(set (match_operand:GPF 0 "register_operand" "=w")
+	(unspec:GPF [(match_operand:GPF 1 "register_operand" "w")
+		     (match_operand:GPF 2 "register_operand" "w")]
+		     FMAXMIN_STRICT))]
+  "TARGET_FLOAT"
+  "<fmaxmin_op>\\t%<s>0, %<s>1, %<s>2"
+  [(set_attr "type" "f_minmax<s>")]
+)
+
 ;; -------------------------------------------------------------------
 ;; Reload support
 ;; -------------------------------------------------------------------
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index b8a45d1..5e248bc 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -282,6 +282,8 @@
     UNSPEC_PMULL2       ; Used in aarch64-simd.md.
     UNSPEC_REV_REGLIST  ; Used in aarch64-simd.md.
     UNSPEC_VEC_SHR      ; Used in aarch64-simd.md.
+    UNSPEC_FMAX_STRICT  ; Used in aarch64-simd.md.
+    UNSPEC_FMIN_STRICT  ; Used in aarch64-simd.md.
 ])
 
 ;; -------------------------------------------------------------------
@@ -873,6 +875,8 @@
 
 (define_int_iterator FMAXMIN_UNS [UNSPEC_FMAX UNSPEC_FMIN])
 
+(define_int_iterator FMAXMIN_STRICT [UNSPEC_FMAX_STRICT UNSPEC_FMIN_STRICT])
+
 (define_int_iterator VQDMULH [UNSPEC_SQDMULH UNSPEC_SQRDMULH])
 
 (define_int_iterator USSUQADD [UNSPEC_SUQADD UNSPEC_USQADD])
@@ -953,6 +957,12 @@
 				 (UNSPEC_FMINNMV "fminnm")
 				 (UNSPEC_FMINV "fmin")])
 
+(define_int_attr fmaxmin [(UNSPEC_FMAX_STRICT "fmax")
+			  (UNSPEC_FMIN_STRICT "fmin")])
+
+(define_int_attr fmaxmin_op [(UNSPEC_FMAX_STRICT "fmaxnm")
+			     (UNSPEC_FMIN_STRICT "fminnm")])
+
 (define_int_attr sur [(UNSPEC_SHADD "s") (UNSPEC_UHADD "u")
 		      (UNSPEC_SRHADD "sr") (UNSPEC_URHADD "ur")
 		      (UNSPEC_SHSUB "s") (UNSPEC_UHSUB "u")
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 1e7f3f1..42fc688 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -292,6 +292,8 @@
 
 (define_int_iterator VMAXMINF [UNSPEC_VMAX UNSPEC_VMIN])
 
+(define_int_iterator VMAXMINF_STRICT [UNSPEC_VMAX_STRICT UNSPEC_VMIN_STRICT])
+
 (define_int_iterator VPADDL [UNSPEC_VPADDL_S UNSPEC_VPADDL_U])
 
 (define_int_iterator VPADAL [UNSPEC_VPADAL_S UNSPEC_VPADAL_U])
@@ -716,6 +718,13 @@
   (UNSPEC_VPMIN "min") (UNSPEC_VPMIN_U "min")
 ])
 
+(define_int_attr fmaxmin [
+  (UNSPEC_VMAX_STRICT "fmax") (UNSPEC_VMIN_STRICT "fmin")])
+
+(define_int_attr fmaxmin_op [
+  (UNSPEC_VMAX_STRICT "vmaxnm") (UNSPEC_VMIN_STRICT "vminnm")
+])
+
 (define_int_attr shift_op [
   (UNSPEC_VSHL_S "shl") (UNSPEC_VSHL_U "shl")
   (UNSPEC_VRSHL_S "rshl") (UNSPEC_VRSHL_U "rshl")
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 873330f..d39f7ff 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -2354,6 +2354,16 @@
   [(set_attr "type" "neon_fp_minmax_s<q>")]
 )
 
+(define_insn "<fmaxmin><mode>3"
+  [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
+	(unspec:VCVTF [(match_operand:VCVTF 1 "s_register_operand" "w")
+		       (match_operand:VCVTF 2 "s_register_operand" "w")]
+		       VMAXMINF_STRICT))]
+  "TARGET_NEON && TARGET_FPU_ARMV8"
+  "<fmaxmin_op>.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+  [(set_attr "type" "neon_fp_minmax_s<q>")]
+)
+
 (define_expand "neon_vpadd<mode>"
   [(match_operand:VD 0 "s_register_operand" "=w")
    (match_operand:VD 1 "s_register_operand" "w")
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 0ec2c48..83094d5 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -224,8 +224,10 @@
   UNSPEC_VLD4_LANE
   UNSPEC_VMAX
   UNSPEC_VMAX_U
+  UNSPEC_VMAX_STRICT
   UNSPEC_VMIN
   UNSPEC_VMIN_U
+  UNSPEC_VMIN_STRICT
   UNSPEC_VMLA
   UNSPEC_VMLA_LANE
   UNSPEC_VMLAL_S
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 081aab2..a9a5949 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -1368,6 +1368,17 @@
    (set_attr "conds" "unconditional")]
 )
 
+(define_insn "<fmaxmin><mode>3"
+  [(set (match_operand:SDF 0 "s_register_operand" "=<F_constraint>")
+	(unspec:SDF [(match_operand:SDF 1 "s_register_operand" "<F_constraint>")
+		     (match_operand:SDF 2 "s_register_operand" "<F_constraint>")]
+		     VMAXMINF_STRICT))]
+  "TARGET_HARD_FLOAT && TARGET_VFP5 <vfp_double_cond>"
+  "<fmaxmin_op>.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+  [(set_attr "type" "f_minmax<vfp_type>")
+   (set_attr "conds" "unconditional")]
+)
+
 ;; Write Floating-point Status and Control Register.
 (define_insn "set_fpscr"
   [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")] VUNSPEC_SET_FPSCR)]
diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
index bbafad9..937ca5b 100644
--- a/gcc/doc/generic.texi
+++ b/gcc/doc/generic.texi
@@ -1268,6 +1268,8 @@ the byte offset of the field, but should not be used directly; call
 @tindex TARGET_EXPR
 @tindex VA_ARG_EXPR
 @tindex ANNOTATE_EXPR
+@tindex FMAX_EXPR
+@tindex FMIN_EXPR
 
 @table @code
 @item NEGATE_EXPR
@@ -1687,8 +1689,16 @@ its sole argument yields the representation for @code{ap}.
 This node is used to attach markers to an expression. The first operand
 is the annotated expression, the second is an @code{INTEGER_CST} with
 a value from @code{enum annot_expr_kind}.
-@end table
 
+@item FMAX_EXPR
+@item FMIN_EXPR
+These nodes represent IEEE-conformant maximum and minimum operations.  If either
+operand is a quiet @code{NaN} the other operand is returned.  If both operands
+are quiet @code{NaN}, then a quiet @code{NaN} is returned.  In the case when gcc
+supports signalling @code{NaN} (-fsignaling-nans) an invalid floating point
+exception is raised and a quiet @code{NaN} is returned.
+
+@end table
 
 @node Vectors
 @subsection Vectors
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 0bffdc6..31b1d24 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4879,6 +4879,15 @@ Signed minimum and maximum operations.  When used with floating point,
 if both operands are zeros, or if either operand is @code{NaN}, then
 it is unspecified which of the two operands is returned as the result.
 
+@cindex @code{fmin@var{m}3} instruction pattern
+@cindex @code{fmax@var{m}3} instruction pattern
+@item @samp{fmin@var{m}3}, @samp{fmax@var{m}3}
+IEEE-conformant minimum and maximum operations.  If one operand is a quiet
+@code{NaN}, then the other operand is returned.  If both operands are quiet
+@code{NaN}, then a quiet @code{NaN} is returned.  In the case when gcc supports
+signalling @code{NaN} (-fsignaling-nans) an invalid floating point exception is
+raised and a quiet @code{NaN} is returned.
+
 @cindex @code{reduc_smin_@var{m}} instruction pattern
 @cindex @code{reduc_smax_@var{m}} instruction pattern
 @item @samp{reduc_smin_@var{m}}, @samp{reduc_smax_@var{m}}
diff --git a/gcc/expr.c b/gcc/expr.c
index 1e820b4..f69ba80 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -8689,6 +8689,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
       return expand_abs (mode, op0, target, unsignedp,
 			 safe_from_p (target, treeop0, 1));
 
+    case FMAX_EXPR:
+    case FMIN_EXPR:
     case MAX_EXPR:
     case MIN_EXPR:
       target = original_target;
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index c826e67..7846e42 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -1155,6 +1155,8 @@ const_binop (enum tree_code code, tree arg1, tree arg2)
 	case RDIV_EXPR:
 	case MIN_EXPR:
 	case MAX_EXPR:
+	case FMIN_EXPR:
+	case FMAX_EXPR:
 	  break;
 
 	default:
@@ -9069,7 +9071,8 @@ fold_binary_loc (location_t loc,
      cases, the appropriate type conversions should be put back in
      the tree that will get out of the constant folder.  */
 
-  if (kind == tcc_comparison || code == MIN_EXPR || code == MAX_EXPR)
+  if (kind == tcc_comparison || code == MIN_EXPR || code == MAX_EXPR
+      || code == FMIN_EXPR || code == FMAX_EXPR)
     {
       STRIP_SIGN_NOPS (arg0);
       STRIP_SIGN_NOPS (arg1);
@@ -13017,6 +13020,7 @@ tree_binary_nonnegative_warnv_p (enum tree_code code, tree type, tree op0,
 
     case BIT_AND_EXPR:
     case MAX_EXPR:
+    case FMAX_EXPR:
       return (tree_expr_nonnegative_warnv_p (op0,
 					     strict_overflow_p)
 	      || tree_expr_nonnegative_warnv_p (op1,
@@ -13025,6 +13029,7 @@ tree_binary_nonnegative_warnv_p (enum tree_code code, tree type, tree op0,
     case BIT_IOR_EXPR:
     case BIT_XOR_EXPR:
     case MIN_EXPR:
+    case FMIN_EXPR:
     case RDIV_EXPR:
     case TRUNC_DIV_EXPR:
     case CEIL_DIV_EXPR:
@@ -13479,6 +13484,7 @@ tree_binary_nonzero_warnv_p (enum tree_code code,
       break;
 
     case MIN_EXPR:
+    case FMIN_EXPR:
       sub_strict_overflow_p = false;
       if (tree_expr_nonzero_warnv_p (op0,
 				     &sub_strict_overflow_p)
@@ -13491,6 +13497,7 @@ tree_binary_nonzero_warnv_p (enum tree_code code,
       break;
 
     case MAX_EXPR:
+    case FMAX_EXPR:
       sub_strict_overflow_p = false;
       if (tree_expr_nonzero_warnv_p (op0,
 				     &sub_strict_overflow_p))
diff --git a/gcc/optabs.c b/gcc/optabs.c
index e533e6e..b733a2e 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -476,6 +476,12 @@ optab_for_tree_code (enum tree_code code, const_tree type,
     case MIN_EXPR:
       return TYPE_UNSIGNED (type) ? umin_optab : smin_optab;
 
+    case FMAX_EXPR:
+      return fmax_optab;
+
+    case FMIN_EXPR:
+      return fmin_optab;
+
     case REALIGN_LOAD_EXPR:
       return vec_realign_load_optab;
 
@@ -6791,6 +6797,16 @@ expand_vec_perm (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target)
   return tmp;
 }
 
+/* Return true if the target supports strict FP math max (MAX = TRUE) and min
+   (MAX = FALSE) operations on type TYPE.  */
+bool
+fminmax_support (tree type, bool max)
+{
+  optab optab = optab_for_tree_code
+    (max ? FMAX_EXPR : FMIN_EXPR, type, optab_default);
+  return optab_handler (optab, TYPE_MODE (type)) != CODE_FOR_nothing;
+}
+
 /* Return insn code for a conditional operator with a comparison in
    mode CMODE, unsigned if UNS is true, resulting in a value of mode VMODE.  */
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 888b21c..36c72d8 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -244,6 +244,10 @@ OPTAB_D (sin_optab, "sin$a2")
 OPTAB_D (sincos_optab, "sincos$a3")
 OPTAB_D (tan_optab, "tan$a2")
 
+/* C99 implementations of fmax/fmin.  */
+OPTAB_D (fmax_optab, "fmax$a3")
+OPTAB_D (fmin_optab, "fmin$a3")
+
 /* Vector reduction to a scalar.  */
 OPTAB_D (reduc_smax_scal_optab, "reduc_smax_scal_$a")
 OPTAB_D (reduc_smin_scal_optab, "reduc_smin_scal_$a")
diff --git a/gcc/optabs.h b/gcc/optabs.h
index 95f5cbc..0bbf736 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -565,4 +565,6 @@ extern bool lshift_cheap_p (bool);
 
 extern enum rtx_code get_rtx_code (enum tree_code tcode, bool unsignedp);
 
+extern bool fminmax_support (tree, bool);
+
 #endif /* GCC_OPTABS_H */
diff --git a/gcc/real.c b/gcc/real.c
index c1ff78d..3a2d7b6 100644
--- a/gcc/real.c
+++ b/gcc/real.c
@@ -1033,6 +1033,15 @@ real_arithmetic (REAL_VALUE_TYPE *r, int icode, const REAL_VALUE_TYPE *op0,
 	*r = *op1;
       break;
 
+    case FMIN_EXPR:
+      if (op0->cl == rvc_nan)
+	*r = *op1;
+      else if (do_compare (op0, op1, -1) < 0)
+	*r = *op0;
+      else
+	*r = *op1;
+      break;
+
     case MAX_EXPR:
       if (op1->cl == rvc_nan)
 	*r = *op1;
@@ -1042,6 +1051,15 @@ real_arithmetic (REAL_VALUE_TYPE *r, int icode, const REAL_VALUE_TYPE *op0,
 	*r = *op0;
       break;
 
+    case FMAX_EXPR:
+      if (op0->cl == rvc_nan)
+	*r = *op1;
+      else if (do_compare (op0, op1, 1) < 0)
+	*r = *op1;
+      else
+	*r = *op0;
+      break;
+
     case NEGATE_EXPR:
       *r = *op0;
       r->sign ^= 1;
diff --git a/gcc/testsuite/gcc.target/aarch64/fmaxmin.c b/gcc/testsuite/gcc.target/aarch64/fmaxmin.c
new file mode 100644
index 0000000..7654955
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/fmaxmin.c
@@ -0,0 +1,69 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-vectorize -fno-inline -save-temps" } */
+
+
+extern void abort (void);
+double fmax (double, double);
+float fmaxf (float, float);
+double fmin (double, double);
+float fminf (float, float);
+
+#define isnan __builtin_isnan
+#define isinf __builtin_isinf
+
+#define NAN __builtin_nan ("")
+#define INFINITY __builtin_inf ()
+
+#define NUM_ELEMS(TYPE) (16 / sizeof (TYPE))
+
+#define DEF_MAXMIN(TYPE,FUN)\
+void test_##FUN (TYPE *__restrict__ r, TYPE *__restrict__ a,\
+		 TYPE *__restrict__ b)\
+{\
+  int i;\
+  for (i = 0; i < NUM_ELEMS (TYPE); i++)\
+    r[i] = FUN (a[i], b[i]);\
+}\
+
+DEF_MAXMIN (float, fmaxf)
+DEF_MAXMIN (double, fmax)
+
+DEF_MAXMIN (float, fminf)
+DEF_MAXMIN (double, fmin)
+
+int main ()
+{
+  float a_f[4] = { 4, NAN, -3, INFINITY };
+  float b_f[4] = { 1,   7,NAN, 0 };
+  float r_f[4];
+  double a_d[4] = { 4, NAN,  -3,  INFINITY };
+  double b_d[4] = { 1,   7, NAN,  0 };
+  double r_d[4];
+
+  test_fmaxf (r_f, a_f, b_f);
+  if (r_f[0] != 4 || isnan (r_f[1]) || isnan (r_f[2]) || !isinf (r_f[3]))
+    abort ();
+
+  test_fminf (r_f, a_f, b_f);
+  if (r_f[0] != 1 || isnan (r_f[1]) || isnan (r_f[2]) || isinf (r_f[3]))
+    abort ();
+
+  test_fmax (r_d, a_d, b_d);
+  test_fmax (&r_d[2], &a_d[2], &b_d[2]);
+  if (r_d[0] != 4 || isnan (r_d[1]) || isnan (r_d[2]) || !isinf (r_d[3]))
+    abort ();
+
+  test_fmin (r_d, a_d, b_d);
+  test_fmin (&r_d[2], &a_d[2], &b_d[2]);
+  if (r_d[0] != 1 || isnan (r_d[1]) || isnan (r_d[2]) || isinf (r_d[3]))
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "fmaxnm\tv\[0-9\]+\.4s, v\[0-9\]+\.4s, v\[0-9\]+\.4s" 1 } } */
+/* { dg-final { scan-assembler-times "fmaxnm\tv\[0-9\]+\.2d, v\[0-9\]+\.2d, v\[0-9\]+\.2d" 1 } } */
+
+/* { dg-final { scan-assembler-times "fminnm\tv\[0-9\]+\.4s, v\[0-9\]+\.4s, v\[0-9\]+\.4s" 1 } } */
+/* { dg-final { scan-assembler-times "fminnm\tv\[0-9\]+\.2d, v\[0-9\]+\.2d, v\[0-9\]+\.2d" 1 } } */
+
diff --git a/gcc/testsuite/gcc.target/arm/fmaxmin.c b/gcc/testsuite/gcc.target/arm/fmaxmin.c
new file mode 100644
index 0000000..f55ac5f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/fmaxmin.c
@@ -0,0 +1,67 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-options "-O2 -ftree-vectorize -fno-inline -march=armv8-a -save-temps" } */
+/* { dg-add-options arm_v8_neon } */
+
+extern void abort (void);
+double fmax (double, double);
+float fmaxf (float, float);
+double fmin (double, double);
+float fminf (float, float);
+
+#define isnan __builtin_isnan
+#define isinf __builtin_isinf
+
+#define NAN __builtin_nan ("")
+#define INFINITY __builtin_inf ()
+
+#define DEF_MAXMIN(TYPE,FUN)\
+void test_##FUN (TYPE *__restrict__ r, TYPE *__restrict__ a,\
+		 TYPE *__restrict__ b)\
+{\
+  int i;\
+  for (i = 0; i < 4; i++)\
+    r[i] = FUN (a[i], b[i]);\
+}\
+
+DEF_MAXMIN (float, fmaxf)
+DEF_MAXMIN (double, fmax)
+
+DEF_MAXMIN (float, fminf)
+DEF_MAXMIN (double, fmin)
+
+int main ()
+{
+  float a_f[4] = { 4, NAN, -3, INFINITY };
+  float b_f[4] = { 1,   7,NAN, 0 };
+  float r_f[4];
+  double a_d[4] = { 4, NAN,  -3,  INFINITY };
+  double b_d[4] = { 1,   7, NAN,  0 };
+  double r_d[4];
+
+  test_fmaxf (r_f, a_f, b_f);
+  if (r_f[0] != 4 || isnan (r_f[1]) || isnan (r_f[2]) || !isinf (r_f[3]))
+    abort ();
+
+  test_fminf (r_f, a_f, b_f);
+  if (r_f[0] != 1 || isnan (r_f[1]) || isnan (r_f[2]) || isinf (r_f[3]))
+    abort ();
+
+  test_fmax (r_d, a_d, b_d);
+  if (r_d[0] != 4 || isnan (r_d[1]) || isnan (r_d[2]) || !isinf (r_d[3]))
+    abort ();
+
+  test_fmin (r_d, a_d, b_d);
+  if (r_d[0] != 1 || isnan (r_d[1]) || isnan (r_d[2]) || isinf (r_d[3]))
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "vmaxnm.f32\tq\[0-9\]+, q\[0-9\]+, q\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vminnm.f32\tq\[0-9\]+, q\[0-9\]+, q\[0-9\]+" 1 } } */
+
+/* NOTE: There are no double precision vector versions of vmaxnm/vminnm.  */
+/* { dg-final { scan-assembler-times "vmaxnm.f64\td\[0-9\]+, d\[0-9\]+, d\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vminnm.f64\td\[0-9\]+, d\[0-9\]+, d\[0-9\]+" 1 } } */
+
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 5ac73b3..36ab7a9 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3055,6 +3055,8 @@ verify_expr (tree *tp, int *walk_subtrees, void *data ATTRIBUTE_UNUSED)
     case EXACT_DIV_EXPR:
     case MIN_EXPR:
     case MAX_EXPR:
+    case FMIN_EXPR:
+    case FMAX_EXPR:
     case LSHIFT_EXPR:
     case RSHIFT_EXPR:
     case LROTATE_EXPR:
@@ -3880,6 +3882,8 @@ verify_gimple_assign_binary (gassign *stmt)
     case EXACT_DIV_EXPR:
     case MIN_EXPR:
     case MAX_EXPR:
+    case FMIN_EXPR:
+    case FMAX_EXPR:
     case BIT_IOR_EXPR:
     case BIT_XOR_EXPR:
     case BIT_AND_EXPR:
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index e1ceea4..d41c8ff 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -3873,6 +3873,8 @@ estimate_operator_cost (enum tree_code code, eni_weights *weights,
     case FLOAT_EXPR:
     case MIN_EXPR:
     case MAX_EXPR:
+    case FMIN_EXPR:
+    case FMAX_EXPR:
     case ABS_EXPR:
 
     case LSHIFT_EXPR:
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 7cd1fe7..dc950f7 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -2844,6 +2844,8 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, int flags,
       pp_string (pp, " > ");
       break;
 
+    case FMIN_EXPR:
+    case FMAX_EXPR:
     case VEC_WIDEN_MULT_HI_EXPR:
     case VEC_WIDEN_MULT_LO_EXPR:
     case VEC_WIDEN_MULT_EVEN_EXPR:
@@ -3218,6 +3220,8 @@ op_code_prio (enum tree_code code)
       /* Special expressions.  */
     case MIN_EXPR:
     case MAX_EXPR:
+    case FMIN_EXPR:
+    case FMAX_EXPR:
     case ABS_EXPR:
     case REALPART_EXPR:
     case IMAGPART_EXPR:
@@ -3414,6 +3418,12 @@ op_symbol_code (enum tree_code code)
     case MIN_EXPR:
       return "min";
 
+    case FMAX_EXPR:
+      return "fmax";
+
+    case FMIN_EXPR:
+      return "fmin";
+
     default:
       return "<<< ??? >>>";
     }
diff --git a/gcc/tree.c b/gcc/tree.c
index af3a6a3..183a9e6 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -7533,6 +7533,8 @@ associative_tree_code (enum tree_code code)
     case MULT_EXPR:
     case MIN_EXPR:
     case MAX_EXPR:
+    case FMIN_EXPR:
+    case FMAX_EXPR:
       return true;
 
     default:
@@ -7553,6 +7555,8 @@ commutative_tree_code (enum tree_code code)
     case MULT_HIGHPART_EXPR:
     case MIN_EXPR:
     case MAX_EXPR:
+    case FMIN_EXPR:
+    case FMAX_EXPR:
     case BIT_IOR_EXPR:
     case BIT_XOR_EXPR:
     case BIT_AND_EXPR:
diff --git a/gcc/tree.def b/gcc/tree.def
index 56580af..cf19392 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -722,6 +722,14 @@ DEFTREECODE (NEGATE_EXPR, "negate_expr", tcc_unary, 1)
 DEFTREECODE (MIN_EXPR, "min_expr", tcc_binary, 2)
 DEFTREECODE (MAX_EXPR, "max_expr", tcc_binary, 2)
 
+/* Minimum and maximum values, but when used with floating point it conforms to
+   the C99 definition of fmax and fmin, i.e.
+     1. if one operand is NaN the other numeric value is returned,
+     2. if both operands are NaN then a NaN is returned,
+     3. there is no distinction between -0 and 0.  */
+DEFTREECODE (FMIN_EXPR, "fmin_expr", tcc_binary, 2)
+DEFTREECODE (FMAX_EXPR, "fmax_expr", tcc_binary, 2)
+
 /* Represents the absolute value of the operand.
 
    An ABS_EXPR must have either an INTEGER_TYPE or a REAL_TYPE.  The

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
  2015-09-14 10:47                           ` David Sherwood
@ 2015-09-14 13:42                             ` Richard Biener
  2015-09-14 20:38                               ` Joseph Myers
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Biener @ 2015-09-14 13:42 UTC (permalink / raw)
  To: David Sherwood; +Cc: GCC Patches, Richard Sandiford

To make progess here I think adding new optabs is fine.  So can you
split out that part and implement builtin expanders
for fmin/max instead?

Btw, FMIN/MAX_EXPR are not commutative AFAIK because of behavior for
fmax (-NaN, NaN)
vs. fmax (NaN, -NaN)?

Richard.

On Mon, Sep 14, 2015 at 12:36 PM, David Sherwood <david.sherwood@arm.com> wrote:
> Hi All,
>
> For what it's worth I have uploaded a new patch that changes the name
> from STRICT_FMIN/MAX to just FMIN/FMAX, although I realise that this
> discussion has not yet been resolved. I have also added scheduling
> attributes to the aarch64 instructions.
>
> Regards,
> David Sherwood.
>
> ChangeLog:
>
> 2015-08-28  David Sherwood  <david.sherwood@arm.com>
>
>     gcc/
>         * builtins.c (integer_valued_real_p): Add FMIN_EXPR and FMAX_EXPR.
>         (fold_builtin_fmin_fmax): For strict math, convert builtins fmin and
>         fmax to FMIN_EXPR and FMIN_EXPR, respectively.
>         * expr.c (expand_expr_real_2): Add FMIN_EXPR and FMAX_EXPR.
>         * fold-const.c (const_binop): Likewise.
>         (fold_binary_loc, tree_binary_nonnegative_warnv_p): Likewise.
>         (tree_binary_nonzero_warnv_p): Likewise.
>         * optabs.h (fminmax_support): Declare.
>         * optabs.def: Add new optabs fmax_optab/fmin_optab.
>         * optabs.c (optab_for_tree_code): Return new optabs for FMIN_EXPR and
>         FMAX_EXPR.
>         (fminmax_support): New function.
>         * real.c (real_arithmetic): Add FMIN_EXPR and FMAX_EXPR.
>         * tree.def: Likewise.
>         * tree.c (associative_tree_code, commutative_tree_code): Likewise.
>         * tree-cfg.c (verify_expr): Likewise.
>         (verify_gimple_assign_binary): Likewise.
>         * tree-inline.c (estimate_operator_cost): Likewise.
>         * tree-pretty-print.c (dump_generic_node, op_code_prio): Likewise.
>         (op_symbol_code): Likewise.
>         * config/aarch64/aarch64.md: New pattern.
>         * config/aarch64/aarch64-simd.md: Likewise.
>         * config/aarch64/iterators.md: New unspecs, iterators.
>         * config/arm/iterators.md: New iterators.
>         * config/arm/unspecs.md: New unspecs.
>         * config/arm/neon.md: New pattern.
>         * config/arm/vfp.md: Likewise.
>         * doc/generic.texi: Add FMAX_EXPR and FMIN_EXPR.
>         * doc/md.texi: Add fmin and fmax patterns.
>     gcc/testsuite
>         * gcc.target/aarch64/fmaxmin.c: New test.
>         * gcc.target/arm/fmaxmin.c: New test.
>
>
>> -----Original Message-----
>> From: Richard Biener [mailto:richard.guenther@gmail.com]
>> Sent: 19 August 2015 14:41
>> To: Richard Biener; David Sherwood; GCC Patches; Richard Sandiford
>> Subject: Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
>>
>> On Wed, Aug 19, 2015 at 3:06 PM, Richard Sandiford
>> <richard.sandiford@arm.com> wrote:
>> > Richard Biener <richard.guenther@gmail.com> writes:
>> >> As an additional point for many math functions we have to support errno
>> >> which means, like, BUILT_IN_SQRT can be rewritten to SQRT_EXPR
>> >> only if -fno-math-errno is in effect.  But then code has to handle
>> >> both variants for things like constant folding and expression combining.
>> >> That's very unfortunate and something we want to avoid (one reason
>> >> the POW_EXPR thing didn't fly when I tried).  STRICT_FMIN/MAX_EXPR
>> >> is an example where this doesn't apply, of course (but I detest the name,
>> >> just use FMIN/FMAX_EXPR?).  Still you'd need to handle both,
>> >> FMIN_EXPR and BUILT_IN_FMIN, in code doing analysis/transform.
>> >
>> > Yeah, but match.pd makes that easy, right? ;-)
>>
>> Sure, but that only addresses stmt combining, not other passes.  And of course
>> it causes {gimple,generic}-match.c to become even bigger ;)
>>
>> Richard.
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
  2015-09-14 13:42                             ` Richard Biener
@ 2015-09-14 20:38                               ` Joseph Myers
  0 siblings, 0 replies; 24+ messages in thread
From: Joseph Myers @ 2015-09-14 20:38 UTC (permalink / raw)
  To: Richard Biener; +Cc: David Sherwood, GCC Patches, Richard Sandiford

On Mon, 14 Sep 2015, Richard Biener wrote:

> To make progess here I think adding new optabs is fine.  So can you
> split out that part and implement builtin expanders
> for fmin/max instead?
> 
> Btw, FMIN/MAX_EXPR are not commutative AFAIK because of behavior for
> fmax (-NaN, NaN)
> vs. fmax (NaN, -NaN)?

For those cases, it's unspecified which NaN is returned, so they can be 
considered commutative (and likewise for the case when the arguments are 0 
and -0).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
  2015-08-19 12:35                     ` Richard Biener
  2015-08-19 13:16                       ` Richard Sandiford
@ 2015-08-19 15:32                       ` Joseph Myers
  2015-11-23  9:21                       ` David Sherwood
  2 siblings, 0 replies; 24+ messages in thread
From: Joseph Myers @ 2015-08-19 15:32 UTC (permalink / raw)
  To: Richard Biener; +Cc: David Sherwood, GCC Patches, richard.sandiford

On Wed, 19 Aug 2015, Richard Biener wrote:

> As an additional point for many math functions we have to support errno
> which means, like, BUILT_IN_SQRT can be rewritten to SQRT_EXPR
> only if -fno-math-errno is in effect.  But then code has to handle

I'd say that for functions like that (which can be expanded inline only 
for -fno-math-errno) there should be no-errno built-in function variants 
that users can call even if -fmath-errno (if not expanded inline, they'd 
still result in a call to a libm function that might set errno).

An example of a use for that is AArch64 sqrt intrinsics that need an 
architecture-specific built-in __builtin_aarch64_sqrtdf when 
__builtin_sqrt_noerrno would do just as well if it existed.  As another 
example: various libm functions are marked in builtins.def as not setting 
errno, even though their proper semantics mean they might set errno; see 
bug 64101 for the example of erf.  One such function is fma.  But if you 
limit fma inline expansion (for calls to fma / __builtin_fma in the user's 
program; obviously this doesn't affect expansion via contraction of a * b 
+ c) to allow for the possibility of errno setting, you definitely want a 
way for user programs to get back the efficient inline expansion if they 
don't need errno set; for example, glibc uses __builtin_fma in various 
cases if _FP_FAST_FMA, and does not need errno setting in those cases, so 
would want to use __builtin_fma_noerrno in the event of any such change.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
  2015-08-19 12:35                     ` Richard Biener
  2015-08-19 13:16                       ` Richard Sandiford
  2015-08-19 15:32                       ` Joseph Myers
@ 2015-11-23  9:21                       ` David Sherwood
  2015-11-25 12:39                         ` Richard Biener
  2 siblings, 1 reply; 24+ messages in thread
From: David Sherwood @ 2015-11-23  9:21 UTC (permalink / raw)
  To: GCC Patches; +Cc: 'Richard Biener', Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 12628 bytes --]

Hi,

This is part 1 of a reworked version of a patch I originally submitted in
August, rebased after Richard Sandiford's recent work on the internal
functions. This first patch adds the internal function definitions and optabs
that provide support for IEEE fmax()/fmin() functions.

Later patches will add the appropriate aarch64/aarch32 vector instructions.

Tested:

x86_64-linux: no regressions
aarch64-none-elf: no regressions
arm-none-eabi: no regressions

Regards,
David Sherwood.

ChangeLog:

2015-11-19  David Sherwood  <david.sherwood@arm.com>

    gcc/
	* optabs.def: Add new optabs fmax_optab/fmin_optab.
	* internal-fn.def: Add new fmax/fmin internal functions.
	* config/aarch64/aarch64.md: New pattern.
	* config/aarch64/aarch64-simd.md: Likewise.
	* config/aarch64/iterators.md: New unspecs, iterators.
	* config/arm/iterators.md: New iterators.
	* config/arm/unspecs.md: New unspecs.
	* config/arm/neon.md: New pattern.
	* config/arm/vfp.md: Likewise.
        * doc/md.texi: Add fmin and fmax patterns.
    gcc/testsuite
        * gcc.target/aarch64/fmaxmin.c: New test.
        * gcc.target/arm/fmaxmin.c: New test.


> -----Original Message-----
> From: Richard Biener [mailto:richard.guenther@gmail.com]
> Sent: 19 August 2015 13:35
> To: Richard Biener; David Sherwood; GCC Patches; Richard Sandiford
> Subject: Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
> 
> On Wed, Aug 19, 2015 at 2:11 PM, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
> > Richard Biener <richard.guenther@gmail.com> writes:
> >> On Wed, Aug 19, 2015 at 11:54 AM, Richard Sandiford
> >> <richard.sandiford@arm.com> wrote:
> >>> Richard Biener <richard.guenther@gmail.com> writes:
> >>>> On Tue, Aug 18, 2015 at 4:15 PM, Richard Sandiford
> >>>> <richard.sandiford@arm.com> wrote:
> >>>>> Richard Biener <richard.guenther@gmail.com> writes:
> >>>>>> On Tue, Aug 18, 2015 at 1:07 PM, David Sherwood
> >>>>>> <david.sherwood@arm.com> wrote:
> >>>>>>>> On Mon, Aug 17, 2015 at 11:29 AM, David Sherwood
> >>>>>>>> <david.sherwood@arm.com> wrote:
> >>>>>>>> > Hi Richard,
> >>>>>>>> >
> >>>>>>>> > Thanks for the reply. I'd chosen to add new expressions as this
> >>>>>>>> > seemed more
> >>>>>>>> > consistent with the existing MAX_EXPR and MIN_EXPR tree codes. In
> >>>>>>>> > addition it
> >>>>>>>> > would seem to provide more opportunities for optimisation than a
> >>>>>>>> > target-specific
> >>>>>>>> > builtin implementation would. I accept that optimisation
> >>>>>>>> > opportunities will
> >>>>>>>> > be more limited for strict math compilation, but that it was still
> >>>>>>>> > worth having
> >>>>>>>> > them. Also, if we did map it to builtins then the scalar
> >>>>>>>> > version would go
> >>>>>>>> > through the optabs and the vector version would go through the
> >>>>>>>> > target's builtin
> >>>>>>>> > expansion, which doesn't seem very consistent.
> >>>>>>>>
> >>>>>>>> On another note ISTR you can't associate STRICT_MIN/MAX_EXPR and thus
> >>>>>>>> you can't vectorize anyway?  (strict IEEE behavior is about NaNs,
> >>>>>>>> correct?)
> >>>>>>> I thought for this particular case associativity wasn't an issue?
> >>>>>>> We're not doing any
> >>>>>>> reductions here, just simply performing max/min operations on each
> >>>>>>> pair of elements
> >>>>>>> in the vectors. I thought for IEEE-compliant behaviour we just need to
> >>>>>>> ensure that for
> >>>>>>> each pair of elements if one element is a NaN we return the other one.
> >>>>>>
> >>>>>> Hmm, true.  Ok, my comment still stands - I don't see that using a
> >>>>>> tree code is the best thing to do here.  You can add fmin/max optabs
> >>>>>> and special expansion of BUILT_IN_FMIN/MAX and you can use a target
> >>>>>> builtin for the vectorized variant.
> >>>>>>
> >>>>>> The reason I am pushing against a new tree code is that we'd have an
> >>>>>> awful lot of similar codes when pushing other flag related IL
> >>>>>> specialities to actual IL constructs.  And we still need to find a
> >>>>>> consistent way to do that.
> >>>>>
> >>>>> In this case though the new code is really the "native" min/max operation
> >>>>> for fp, rather than some weird flag-dependent behaviour.  Maybe it's
> >>>>> a bit unfortunate that the non-strict min/max fp operation got mapped
> >>>>> to the generic MIN_EXPR and MAX_EXPR when the non-strict version is really
> >>>>> the flag-related modification.  The STRICT_* prefix is forced by that and
> >>>>> might make it seem like more of a special case than it really is.
> >>>>
> >>>> In some sense.  But the "strict" version already has a builtin (just no
> >>>> special expander in builtins.c).  We usually don't add 1:1 tree codes
> >>>> for existing builtins (why have builtins at all then?).
> >>>
> >>> We still need the builtin to match the C function (and to allow direct
> >>> calls to __builtin_fmin, etc., which are occasionally useful).
> >>>
> >>>>> If you're still not convinced, how about an internal function instead
> >>>>> of a built-in function, so that we can continue to use optabs for all
> >>>>> cases?  I'd really like to avoid forcing such a generic concept down to
> >>>>> target-specific builtins with target-specific expansion code, especially
> >>>>> when the same concept is exposed by target-independent code for scalars.
> >>>>
> >>>> The target builtin is for the vectorized variant - not all targets might have
> >>>> that and we'd need to query the target about this.  So using a IFN would
> >>>> mean adding a target hook for that query.
> >>>
> >>> No, the idea is that if we have a tree code or an internal function, the
> >>> decision about whether we have target support is based on a query of the
> >>> optabs (just like it is for scalar, and for other vectorisable tree codes).
> >>> No new hooks are needed.
> >>>
> >>> The patch checked for target support that way.
> >>
> >> Fair enough.  Still this means we should have tree codes for all builtins
> >> that eventually are vectorized?  So why don't we have SIN_EXPR,
> >> POW_EXPR (ok, I did argue and have patches for that in the past),
> >> RINT_EXPR, SQRT_EXPR, etc?
> >
> > Yeah, it doesn't sound so bad to me :-)  The choice of what's a function
> > in C and what's inherent is pretty arbitrary.  E.g. % on doubles could
> > have implemented fmod() or remainder().  Casts from double to int could
> > have used the current rounding mode, but instead they truncate and
> > conversions using the rounding mode need to go through something like
> > (l)lrint().  Like you say, pow() could have been an operator (and is in
> > many languages), but instead it's a function.
> >
> >> This patch starts to go down that route which is why I ask for the
> >> whole picture to be considered and hinted at the alternative implementation
> >> which follows existing practice.  Add a expander in builtins.c, add an optab,
> >> and eventual support to vectorized_function.
> >>
> >> See for example ix86_builtin_vectorized_function which handles
> >> sqrt, floor, ceil, etc. and even FMA (we only fold FMA to FMA_EXPR
> >> if the target supports it for the scalar mode, so not sure if there is
> >> any x86 ISA where it has vectorized FMA but not scalar FMA).
> >
> > Yeah.  TBH I'm really against doing that unless (a) there's good reason
> > to believe that the concept really is specific to one target and
> > wouldn't be implemented on others or (b) there really is a function
> > rather than an instruction underneath (usually the case for sin, etc.).
> > But (b) could also be handled by the optab support library mechanism.
> >
> > Reasons against using target-specific builtins for operations that
> > have direct support in the ISA:
> >
> > 1. Like you say, in practice vector ops only tend to be supported if the
> >    associated scalar op is also supported.  Sticking to this approach
> >    means that vector ops follow a different path from scalar ops whereas
> >    (for example) division follows the same path for both.  It just seems
> >    confusing to have some floating-point optabs that support both scalar
> >    and vector operands and others that only support scalar operands.
> >
> > 2. Once converted to a target-specific function, the target-independent
> >    code has no idea what the function does or how expensive it is.
> >    We might start out with just one hook to convert a scalar operation
> >    to a target-dependent built-in function, but I bet over time we'll
> >    grow other hooks to query properties about the function, such as
> >    costs.
> >
> > 3. builtin_vectorized_function returns a decl rather than a call.
> >    If the target's vector API doesn't already have a built-in for the
> >    operation we need, with the exact types and arguments that we expect,
> >    the target needs to define one, presumably marked so that it isn't
> >    callable by input code.
> >
> >    E.g. on targets where FP conversion instructions allow an explicit
> >    rounding mode to be specified as an operand, it's reasonable for a
> >    target's vector API to expose that operand as a constant argument to
> >    the API function.  There'd then be one API function for all vector-
> >    float-to-vector-float integer rounding operations, rather than one
> >    for vector rint(), one for vector ceil(), etc.  (I'm thinking of
> >    System z instructions here, although I don't know offhand what the
> >    vector API is there.)  IMO it doesn't make sense to force the target
> >    to define "fake" built-in functions for all those possibilities
> >    purely for the sake of the target hook.  It's a lot of extra code,
> >    and it's extra code that would be duplicated on any target that needs
> >    to do this.
> >
> > IMO optabs are the best way for the target to tell the target-independent
> > code what it can do.  If it supports sqrt on df it defines sqrtdf and
> > if it supports vector sqrt on v2df it defines sqrtv2df.  These patterns
> > will often be a single define_expand or define_insn template -- the
> > vectorness often comes "for free" in terms of writing the pattern.
> >
> >>>> > TBH though I'm not sure why an internal_fn value (or a target-specific
> >>>> > builtin enum value) is worse than a tree-code value, unless the limit
> >>>> > of the tree_code bitfield is in sight (maybe it is).
> >>>>
> >>>> I think tree_code is 64bits now.
> >>>
> >>> Even better :-)
> >>
> >> Yes.
> >>
> >> I'm not against adding a corresponding tree code for all math builtin functions,
> >> we just have to decide whether this is the way to go (and of course support
> >> expanding those back to libcalls to libc/m rather than libgcc).  There are
> >> also constraints on what kind of STRICT_FMIN_EXPR the compiler may
> >> generate as the target may not be able to expand the long double variant
> >> directly but needs a libcall but libm might not be linked or may not
> >> have support
> >> for it.  That would be a new thing compared to libgcc providing a fallback
> >> for all other tree codes.
> >
> > True, but that doesn't seem too bad.  The constraints would be the same
> > if we're operating on built-in functions rather than codes.  I suppose
> > built-in functions make this more explicit, but at the end of the day
> > it's a costing decision.  We should no more be converting a cheap
> > operation into an expensive libgcc function than converting a cheap
> > operation into an expensive libm function, even if the libgcc conversion
> > links.
> >
> > There's certainly precedent for introducing calls to things that libgcc
> > doesn't define.  E.g. we already introduce calls to memcpy in things
> > like loop distribution, even though we don't provide a fallback memcpy
> > in libgcc.
> 
> As an additional point for many math functions we have to support errno
> which means, like, BUILT_IN_SQRT can be rewritten to SQRT_EXPR
> only if -fno-math-errno is in effect.  But then code has to handle
> both variants for things like constant folding and expression combining.
> That's very unfortunate and something we want to avoid (one reason
> the POW_EXPR thing didn't fly when I tried).  STRICT_FMIN/MAX_EXPR
> is an example where this doesn't apply, of course (but I detest the name,
> just use FMIN/FMAX_EXPR?).  Still you'd need to handle both,
> FMIN_EXPR and BUILT_IN_FMIN, in code doing analysis/transform.
> 
> Richard.
> 
> 
> > Thanks,
> > Richard
> >


[-- Attachment #2: fmaxmin_v1.patch --]
[-- Type: application/octet-stream, Size: 2092 bytes --]

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 8b2deaa..e5bc59e 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4957,6 +4957,15 @@ Signed minimum and maximum operations.  When used with floating point,
 if both operands are zeros, or if either operand is @code{NaN}, then
 it is unspecified which of the two operands is returned as the result.
 
+@cindex @code{fmin@var{m}3} instruction pattern
+@cindex @code{fmax@var{m}3} instruction pattern
+@item @samp{fmin@var{m}3}, @samp{fmax@var{m}3}
+IEEE-conformant minimum and maximum operations.  If one operand is a quiet
+@code{NaN}, then the other operand is returned.  If both operands are quiet
+@code{NaN}, then a quiet @code{NaN} is returned.  In the case when gcc supports
+signalling @code{NaN} (-fsignaling-nans) an invalid floating point exception is
+raised and a quiet @code{NaN} is returned.
+
 @cindex @code{reduc_smin_@var{m}} instruction pattern
 @cindex @code{reduc_smax_@var{m}} instruction pattern
 @item @samp{reduc_smin_@var{m}}, @samp{reduc_smax_@var{m}}
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 825dba1..dee9332 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -125,6 +125,8 @@ DEF_INTERNAL_FLT_FN (FMOD, ECF_CONST, fmod, binary)
 DEF_INTERNAL_FLT_FN (POW, ECF_CONST, pow, binary)
 DEF_INTERNAL_FLT_FN (REMAINDER, ECF_CONST, remainder, binary)
 DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary)
+DEF_INTERNAL_FLT_FN (FMIN, ECF_CONST, fmin, binary)
+DEF_INTERNAL_FLT_FN (FMAX, ECF_CONST, fmax, binary)
 
 /* FP scales.  */
 DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 0ca2333..fed757b 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -251,6 +251,10 @@ OPTAB_D (sin_optab, "sin$a2")
 OPTAB_D (sincos_optab, "sincos$a3")
 OPTAB_D (tan_optab, "tan$a2")
 
+/* C99 implementations of fmax/fmin.  */
+OPTAB_D (fmax_optab, "fmax$a3")
+OPTAB_D (fmin_optab, "fmin$a3")
+
 /* Vector reduction to a scalar.  */
 OPTAB_D (reduc_smax_scal_optab, "reduc_smax_scal_$a")
 OPTAB_D (reduc_smin_scal_optab, "reduc_smin_scal_$a")

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
  2015-11-23  9:21                       ` David Sherwood
@ 2015-11-25 12:39                         ` Richard Biener
  0 siblings, 0 replies; 24+ messages in thread
From: Richard Biener @ 2015-11-25 12:39 UTC (permalink / raw)
  To: David Sherwood; +Cc: GCC Patches, Richard Sandiford

On Mon, Nov 23, 2015 at 10:21 AM, David Sherwood <david.sherwood@arm.com> wrote:
> Hi,
>
> This is part 1 of a reworked version of a patch I originally submitted in
> August, rebased after Richard Sandiford's recent work on the internal
> functions. This first patch adds the internal function definitions and optabs
> that provide support for IEEE fmax()/fmin() functions.
>
> Later patches will add the appropriate aarch64/aarch32 vector instructions.

Ok.

Thanks,
Richard.

> Tested:
>
> x86_64-linux: no regressions
> aarch64-none-elf: no regressions
> arm-none-eabi: no regressions
>
> Regards,
> David Sherwood.
>
> ChangeLog:
>
> 2015-11-19  David Sherwood  <david.sherwood@arm.com>
>
>     gcc/
>         * optabs.def: Add new optabs fmax_optab/fmin_optab.
>         * internal-fn.def: Add new fmax/fmin internal functions.
>         * config/aarch64/aarch64.md: New pattern.
>         * config/aarch64/aarch64-simd.md: Likewise.
>         * config/aarch64/iterators.md: New unspecs, iterators.
>         * config/arm/iterators.md: New iterators.
>         * config/arm/unspecs.md: New unspecs.
>         * config/arm/neon.md: New pattern.
>         * config/arm/vfp.md: Likewise.
>         * doc/md.texi: Add fmin and fmax patterns.
>     gcc/testsuite
>         * gcc.target/aarch64/fmaxmin.c: New test.
>         * gcc.target/arm/fmaxmin.c: New test.
>
>
>> -----Original Message-----
>> From: Richard Biener [mailto:richard.guenther@gmail.com]
>> Sent: 19 August 2015 13:35
>> To: Richard Biener; David Sherwood; GCC Patches; Richard Sandiford
>> Subject: Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
>>
>> On Wed, Aug 19, 2015 at 2:11 PM, Richard Sandiford
>> <richard.sandiford@arm.com> wrote:
>> > Richard Biener <richard.guenther@gmail.com> writes:
>> >> On Wed, Aug 19, 2015 at 11:54 AM, Richard Sandiford
>> >> <richard.sandiford@arm.com> wrote:
>> >>> Richard Biener <richard.guenther@gmail.com> writes:
>> >>>> On Tue, Aug 18, 2015 at 4:15 PM, Richard Sandiford
>> >>>> <richard.sandiford@arm.com> wrote:
>> >>>>> Richard Biener <richard.guenther@gmail.com> writes:
>> >>>>>> On Tue, Aug 18, 2015 at 1:07 PM, David Sherwood
>> >>>>>> <david.sherwood@arm.com> wrote:
>> >>>>>>>> On Mon, Aug 17, 2015 at 11:29 AM, David Sherwood
>> >>>>>>>> <david.sherwood@arm.com> wrote:
>> >>>>>>>> > Hi Richard,
>> >>>>>>>> >
>> >>>>>>>> > Thanks for the reply. I'd chosen to add new expressions as this
>> >>>>>>>> > seemed more
>> >>>>>>>> > consistent with the existing MAX_EXPR and MIN_EXPR tree codes. In
>> >>>>>>>> > addition it
>> >>>>>>>> > would seem to provide more opportunities for optimisation than a
>> >>>>>>>> > target-specific
>> >>>>>>>> > builtin implementation would. I accept that optimisation
>> >>>>>>>> > opportunities will
>> >>>>>>>> > be more limited for strict math compilation, but that it was still
>> >>>>>>>> > worth having
>> >>>>>>>> > them. Also, if we did map it to builtins then the scalar
>> >>>>>>>> > version would go
>> >>>>>>>> > through the optabs and the vector version would go through the
>> >>>>>>>> > target's builtin
>> >>>>>>>> > expansion, which doesn't seem very consistent.
>> >>>>>>>>
>> >>>>>>>> On another note ISTR you can't associate STRICT_MIN/MAX_EXPR and thus
>> >>>>>>>> you can't vectorize anyway?  (strict IEEE behavior is about NaNs,
>> >>>>>>>> correct?)
>> >>>>>>> I thought for this particular case associativity wasn't an issue?
>> >>>>>>> We're not doing any
>> >>>>>>> reductions here, just simply performing max/min operations on each
>> >>>>>>> pair of elements
>> >>>>>>> in the vectors. I thought for IEEE-compliant behaviour we just need to
>> >>>>>>> ensure that for
>> >>>>>>> each pair of elements if one element is a NaN we return the other one.
>> >>>>>>
>> >>>>>> Hmm, true.  Ok, my comment still stands - I don't see that using a
>> >>>>>> tree code is the best thing to do here.  You can add fmin/max optabs
>> >>>>>> and special expansion of BUILT_IN_FMIN/MAX and you can use a target
>> >>>>>> builtin for the vectorized variant.
>> >>>>>>
>> >>>>>> The reason I am pushing against a new tree code is that we'd have an
>> >>>>>> awful lot of similar codes when pushing other flag related IL
>> >>>>>> specialities to actual IL constructs.  And we still need to find a
>> >>>>>> consistent way to do that.
>> >>>>>
>> >>>>> In this case though the new code is really the "native" min/max operation
>> >>>>> for fp, rather than some weird flag-dependent behaviour.  Maybe it's
>> >>>>> a bit unfortunate that the non-strict min/max fp operation got mapped
>> >>>>> to the generic MIN_EXPR and MAX_EXPR when the non-strict version is really
>> >>>>> the flag-related modification.  The STRICT_* prefix is forced by that and
>> >>>>> might make it seem like more of a special case than it really is.
>> >>>>
>> >>>> In some sense.  But the "strict" version already has a builtin (just no
>> >>>> special expander in builtins.c).  We usually don't add 1:1 tree codes
>> >>>> for existing builtins (why have builtins at all then?).
>> >>>
>> >>> We still need the builtin to match the C function (and to allow direct
>> >>> calls to __builtin_fmin, etc., which are occasionally useful).
>> >>>
>> >>>>> If you're still not convinced, how about an internal function instead
>> >>>>> of a built-in function, so that we can continue to use optabs for all
>> >>>>> cases?  I'd really like to avoid forcing such a generic concept down to
>> >>>>> target-specific builtins with target-specific expansion code, especially
>> >>>>> when the same concept is exposed by target-independent code for scalars.
>> >>>>
>> >>>> The target builtin is for the vectorized variant - not all targets might have
>> >>>> that and we'd need to query the target about this.  So using a IFN would
>> >>>> mean adding a target hook for that query.
>> >>>
>> >>> No, the idea is that if we have a tree code or an internal function, the
>> >>> decision about whether we have target support is based on a query of the
>> >>> optabs (just like it is for scalar, and for other vectorisable tree codes).
>> >>> No new hooks are needed.
>> >>>
>> >>> The patch checked for target support that way.
>> >>
>> >> Fair enough.  Still this means we should have tree codes for all builtins
>> >> that eventually are vectorized?  So why don't we have SIN_EXPR,
>> >> POW_EXPR (ok, I did argue and have patches for that in the past),
>> >> RINT_EXPR, SQRT_EXPR, etc?
>> >
>> > Yeah, it doesn't sound so bad to me :-)  The choice of what's a function
>> > in C and what's inherent is pretty arbitrary.  E.g. % on doubles could
>> > have implemented fmod() or remainder().  Casts from double to int could
>> > have used the current rounding mode, but instead they truncate and
>> > conversions using the rounding mode need to go through something like
>> > (l)lrint().  Like you say, pow() could have been an operator (and is in
>> > many languages), but instead it's a function.
>> >
>> >> This patch starts to go down that route which is why I ask for the
>> >> whole picture to be considered and hinted at the alternative implementation
>> >> which follows existing practice.  Add a expander in builtins.c, add an optab,
>> >> and eventual support to vectorized_function.
>> >>
>> >> See for example ix86_builtin_vectorized_function which handles
>> >> sqrt, floor, ceil, etc. and even FMA (we only fold FMA to FMA_EXPR
>> >> if the target supports it for the scalar mode, so not sure if there is
>> >> any x86 ISA where it has vectorized FMA but not scalar FMA).
>> >
>> > Yeah.  TBH I'm really against doing that unless (a) there's good reason
>> > to believe that the concept really is specific to one target and
>> > wouldn't be implemented on others or (b) there really is a function
>> > rather than an instruction underneath (usually the case for sin, etc.).
>> > But (b) could also be handled by the optab support library mechanism.
>> >
>> > Reasons against using target-specific builtins for operations that
>> > have direct support in the ISA:
>> >
>> > 1. Like you say, in practice vector ops only tend to be supported if the
>> >    associated scalar op is also supported.  Sticking to this approach
>> >    means that vector ops follow a different path from scalar ops whereas
>> >    (for example) division follows the same path for both.  It just seems
>> >    confusing to have some floating-point optabs that support both scalar
>> >    and vector operands and others that only support scalar operands.
>> >
>> > 2. Once converted to a target-specific function, the target-independent
>> >    code has no idea what the function does or how expensive it is.
>> >    We might start out with just one hook to convert a scalar operation
>> >    to a target-dependent built-in function, but I bet over time we'll
>> >    grow other hooks to query properties about the function, such as
>> >    costs.
>> >
>> > 3. builtin_vectorized_function returns a decl rather than a call.
>> >    If the target's vector API doesn't already have a built-in for the
>> >    operation we need, with the exact types and arguments that we expect,
>> >    the target needs to define one, presumably marked so that it isn't
>> >    callable by input code.
>> >
>> >    E.g. on targets where FP conversion instructions allow an explicit
>> >    rounding mode to be specified as an operand, it's reasonable for a
>> >    target's vector API to expose that operand as a constant argument to
>> >    the API function.  There'd then be one API function for all vector-
>> >    float-to-vector-float integer rounding operations, rather than one
>> >    for vector rint(), one for vector ceil(), etc.  (I'm thinking of
>> >    System z instructions here, although I don't know offhand what the
>> >    vector API is there.)  IMO it doesn't make sense to force the target
>> >    to define "fake" built-in functions for all those possibilities
>> >    purely for the sake of the target hook.  It's a lot of extra code,
>> >    and it's extra code that would be duplicated on any target that needs
>> >    to do this.
>> >
>> > IMO optabs are the best way for the target to tell the target-independent
>> > code what it can do.  If it supports sqrt on df it defines sqrtdf and
>> > if it supports vector sqrt on v2df it defines sqrtv2df.  These patterns
>> > will often be a single define_expand or define_insn template -- the
>> > vectorness often comes "for free" in terms of writing the pattern.
>> >
>> >>>> > TBH though I'm not sure why an internal_fn value (or a target-specific
>> >>>> > builtin enum value) is worse than a tree-code value, unless the limit
>> >>>> > of the tree_code bitfield is in sight (maybe it is).
>> >>>>
>> >>>> I think tree_code is 64bits now.
>> >>>
>> >>> Even better :-)
>> >>
>> >> Yes.
>> >>
>> >> I'm not against adding a corresponding tree code for all math builtin functions,
>> >> we just have to decide whether this is the way to go (and of course support
>> >> expanding those back to libcalls to libc/m rather than libgcc).  There are
>> >> also constraints on what kind of STRICT_FMIN_EXPR the compiler may
>> >> generate as the target may not be able to expand the long double variant
>> >> directly but needs a libcall but libm might not be linked or may not
>> >> have support
>> >> for it.  That would be a new thing compared to libgcc providing a fallback
>> >> for all other tree codes.
>> >
>> > True, but that doesn't seem too bad.  The constraints would be the same
>> > if we're operating on built-in functions rather than codes.  I suppose
>> > built-in functions make this more explicit, but at the end of the day
>> > it's a costing decision.  We should no more be converting a cheap
>> > operation into an expensive libgcc function than converting a cheap
>> > operation into an expensive libm function, even if the libgcc conversion
>> > links.
>> >
>> > There's certainly precedent for introducing calls to things that libgcc
>> > doesn't define.  E.g. we already introduce calls to memcpy in things
>> > like loop distribution, even though we don't provide a fallback memcpy
>> > in libgcc.
>>
>> As an additional point for many math functions we have to support errno
>> which means, like, BUILT_IN_SQRT can be rewritten to SQRT_EXPR
>> only if -fno-math-errno is in effect.  But then code has to handle
>> both variants for things like constant folding and expression combining.
>> That's very unfortunate and something we want to avoid (one reason
>> the POW_EXPR thing didn't fly when I tried).  STRICT_FMIN/MAX_EXPR
>> is an example where this doesn't apply, of course (but I detest the name,
>> just use FMIN/FMAX_EXPR?).  Still you'd need to handle both,
>> FMIN_EXPR and BUILT_IN_FMIN, in code doing analysis/transform.
>>
>> Richard.
>>
>>
>> > Thanks,
>> > Richard
>> >
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
  2015-08-19  9:48             ` Richard Biener
  2015-08-19 10:04               ` Richard Sandiford
@ 2015-08-19 15:07               ` Michael Matz
  2015-08-19 15:25                 ` Richard Biener
  1 sibling, 1 reply; 24+ messages in thread
From: Michael Matz @ 2015-08-19 15:07 UTC (permalink / raw)
  To: Richard Biener; +Cc: David Sherwood, GCC Patches, richard.sandiford

Hi,

On Wed, 19 Aug 2015, Richard Biener wrote:

> I think tree_code is 64bits now.

Huh?  No; it's 16 bit since 8 bit run out.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
  2015-08-19 15:07               ` Michael Matz
@ 2015-08-19 15:25                 ` Richard Biener
  2015-08-19 15:39                   ` Richard Sandiford
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Biener @ 2015-08-19 15:25 UTC (permalink / raw)
  To: Michael Matz; +Cc: David Sherwood, GCC Patches, richard.sandiford

On August 19, 2015 5:05:01 PM GMT+02:00, Michael Matz <matz@suse.de> wrote:
>Hi,
>
>On Wed, 19 Aug 2015, Richard Biener wrote:
>
>> I think tree_code is 64bits now.
>
>Huh?  No; it's 16 bit since 8 bit run out.

Err, that's what I was trying to say...
16bits, obviously.

BTW, in addition to errno math there is rounding math where we rely on virtual operands to not mess with ordering.

Richard.

>
>Ciao,
>Michael.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
  2015-08-19 15:25                 ` Richard Biener
@ 2015-08-19 15:39                   ` Richard Sandiford
  0 siblings, 0 replies; 24+ messages in thread
From: Richard Sandiford @ 2015-08-19 15:39 UTC (permalink / raw)
  To: Richard Biener; +Cc: Michael Matz, David Sherwood, GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> BTW, in addition to errno math there is rounding math where we rely on
> virtual operands to not mess with ordering.

But you know what I'm going to say to that.  Rounding affects arithmetic
just as much as things like pow().  (And also doesn't affect min/max.)

Richard

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2015-11-25 12:38 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-06  9:39 [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax* David Sherwood
2015-08-13 10:13 David Sherwood
2015-08-13 11:12 ` Richard Biener
2015-08-17  9:41   ` David Sherwood
2015-08-17 14:02     ` Richard Biener
2015-08-18 11:10       ` David Sherwood
2015-08-18 13:31         ` Richard Biener
2015-08-18 14:20           ` Richard Sandiford
2015-08-19  9:48             ` Richard Biener
2015-08-19 10:04               ` Richard Sandiford
2015-08-19 10:31                 ` Richard Biener
2015-08-19 12:23                   ` Richard Sandiford
2015-08-19 12:35                     ` Richard Biener
2015-08-19 13:16                       ` Richard Sandiford
2015-08-19 13:41                         ` Richard Biener
2015-09-14 10:47                           ` David Sherwood
2015-09-14 13:42                             ` Richard Biener
2015-09-14 20:38                               ` Joseph Myers
2015-08-19 15:32                       ` Joseph Myers
2015-11-23  9:21                       ` David Sherwood
2015-11-25 12:39                         ` Richard Biener
2015-08-19 15:07               ` Michael Matz
2015-08-19 15:25                 ` Richard Biener
2015-08-19 15:39                   ` Richard Sandiford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).