public inbox for gcc-patches@gcc.gnu.org
* FENV_ACCESS status
@ 2020-08-05 18:01 Marc Glisse
  2020-08-07  9:19 ` Richard Biener
  0 siblings, 1 reply; 5+ messages in thread
From: Marc Glisse @ 2020-08-05 18:01 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2909 bytes --]

Hello,

I updated the patch discussed in 
https://patchwork.ozlabs.org/project/gcc/patch/alpine.DEB.2.02.1906221743430.16432@grove.saclay.inria.fr/ 
and pushed it as something like refs/users/glisse/heads/fenv (first user 
branch in gcc's git, I hope it worked). I am also attaching the diff here.

I managed to compile and run real-world code with it, which is a good sign 
:-)

As should be obvious looking at the diff, there is a lot of work left. 
Experts may also find much better ways to rewrite several parts of the 
patch.

The option is called -ffenv-access so it doesn't interfere with 
-frounding-math, at least until we have something good enough to replace 
-frounding-math without too much performance regression.

I switched to hex float constants for DBL_MAX and the others in C99+
modes; I don't care about making fenv_access work in prehistoric modes.
On the other hand, since I haven't started on fenv_round, this is
probably useless for now.

Several changes, in particular the constexpr stuff, were needed to parse
standard headers; otherwise I would have delayed the implementation.

Currently the floating point environment is represented by "memory" in 
general. Refining it so the compiler knows that storing a float does not 
change the rounding mode (for instance) should wait, in my opinion.

Conversions look like

  .FENV_CONVERT (arg, (target_type*)0, 0)

The pointer is there so we know the target type even if the lhs
disappears at some point. The last 0 is the same as for all the other
operations: a place to store options about the operation (do we care
about rounding, about exceptions, etc.); it is just a placeholder for
now. I could rename it to .FENV_NOP since we usually seem to generate a
NOP, but that looked strange to me.

I did not do anything for templates in C++. As long as we have a
constant global flag, it doesn't matter, but as soon as we have a
pragma, things will get messy: we will need to remember what the mode
was when parsing so we can use it at instantiation time... (or just
declare that the pragma doesn't work with templates in a first version)

I am trying to have enough infrastructure in place so that the
functionality is useful, and also so that implementing the other pieces
(parsing the pragma, C front end, gimple optimizations, target hook for
the asm string, opcode and target optimization, SIMD, etc.) becomes
independent and can be done by different people. It is unlikely that I
can find the time to do everything. If other people want to contribute
or even take over (assuming the branch does not look hopelessly bad to
them), that would be great! That's also why I pushed it as a branch.

Apart from the obvious (making sure it bootstraps, running the testsuite, 
adding a few tests), what missing pieces do you consider a strict 
requirement for this to have a chance to reach master one day as an 
experimental option?

-- 
Marc Glisse

[-- Attachment #2: Type: TEXT/x-diff; name=fenv5.patch, Size: 19210 bytes --]

commit 4adb494e88323bf41ee2c0871caa2323fa2aca06
Author: Marc Glisse <marc.glisse@inria.fr>
Date:   Wed Aug 5 18:49:57 2020 +0200

    Introduce -ffenv-access

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 74ecca8de8e..4c4d4f3d563 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -1635,16 +1635,23 @@ lazy_hex_fp_value (cpp_reader *, cpp_macro *macro, unsigned num)
 {
   REAL_VALUE_TYPE real;
   char dec_str[64], buf1[256];
+  size_t len;
 
   gcc_checking_assert (num < lazy_hex_fp_value_count);
 
-  real_from_string (&real, lazy_hex_fp_values[num].hex_str);
-  real_to_decimal_for_mode (dec_str, &real, sizeof (dec_str),
-			    lazy_hex_fp_values[num].digits, 0,
-			    lazy_hex_fp_values[num].mode);
+  if (!flag_isoc99)
+    {
+      real_from_string (&real, lazy_hex_fp_values[num].hex_str);
+      real_to_decimal_for_mode (dec_str, &real, sizeof (dec_str),
+				lazy_hex_fp_values[num].digits, 0,
+				lazy_hex_fp_values[num].mode);
+
+      len = sprintf (buf1, "%s%s", dec_str, lazy_hex_fp_values[num].fp_suffix);
+    }
+  else
+    len = sprintf (buf1, "%s%s", lazy_hex_fp_values[num].hex_str,
+		   lazy_hex_fp_values[num].fp_suffix);
 
-  size_t len
-    = sprintf (buf1, "%s%s", dec_str, lazy_hex_fp_values[num].fp_suffix);
   gcc_assert (len < sizeof (buf1));
   for (unsigned idx = 0; idx < macro->count; idx++)
     if (macro->exp.tokens[idx].type == CPP_NUMBER)
@@ -1701,13 +1708,16 @@ builtin_define_with_hex_fp_value (const char *macro,
      it's easy to get the exact correct value), parse it as a real,
      then print it back out as decimal.  */
 
-  real_from_string (&real, hex_str);
-  real_to_decimal_for_mode (dec_str, &real, sizeof (dec_str), digits, 0,
-			    TYPE_MODE (type));
+  if (!flag_isoc99)
+    {
+      real_from_string (&real, hex_str);
+      real_to_decimal_for_mode (dec_str, &real, sizeof (dec_str), digits, 0,
+				TYPE_MODE (type));
+    }
 
   /* Assemble the macro in the following fashion
      macro = fp_cast [dec_str fp_suffix] */
-  sprintf (buf2, "%s%s", dec_str, fp_suffix);
+  sprintf (buf2, "%s%s", flag_isoc99 ? hex_str : dec_str, fp_suffix);
   sprintf (buf1, fp_cast, buf2);
   sprintf (buf, "%s=%s", macro, buf1);
 
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 2b1aca16eb4..10fac2acbfb 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1554,6 +1554,10 @@ finput-charset=
 C ObjC C++ ObjC++ Joined RejectNegative
 -finput-charset=<cset>	Specify the default character set for source files.
 
+ffenv-access
+C++ ObjC++ Var(flag_fenv_access)
+Assume #pragma STDC FENV_ACCESS ON at the beginning of the translation unit (experimental and incomplete support).
+
 fextern-tls-init
 C++ ObjC++ Var(flag_extern_tls_init) Init(-1)
 Support dynamic initialization of thread-local variables in a different translation unit.
diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index f164b211c9f..f80eb56b7ac 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -9634,7 +9634,20 @@ build_cxx_call (tree fn, int nargs, tree *argarray,
 
   /* Remember roughly where this call is.  */
   location_t loc = cp_expr_loc_or_input_loc (fn);
-  fn = build_call_a (fn, nargs, argarray);
+  if (flag_fenv_access && TREE_CODE (fn) == ADDR_EXPR
+      && TREE_CODE (TREE_OPERAND (fn, 0)) == FUNCTION_DECL
+      && (fndecl_built_in_p (TREE_OPERAND (fn, 0), BUILT_IN_SQRT)
+	  || fndecl_built_in_p (TREE_OPERAND (fn, 0), BUILT_IN_SQRTF)
+	  || fndecl_built_in_p (TREE_OPERAND (fn, 0), BUILT_IN_SQRTL)))
+    {
+      fn = build_call_expr_internal_loc (loc, IFN_FENV_SQRT,
+					 TREE_TYPE (argarray[0]), 2,
+					 argarray[0], integer_zero_node);
+      TREE_SIDE_EFFECTS (fn) = 1;
+    }
+  else
+    fn = build_call_a (fn, nargs, argarray);
+
   SET_EXPR_LOCATION (fn, loc);
 
   fndecl = get_callee_fndecl (fn);
diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index b1c1d249c6e..4e4290ec3c4 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -1708,6 +1708,43 @@ cxx_eval_internal_function (const constexpr_ctx *ctx, tree t,
 	  }
       }
 
+    case IFN_FENV_FLOAT:
+    case IFN_FENV_CONVERT:
+      {
+	tree arg = cxx_eval_constant_expression (ctx, CALL_EXPR_ARG (t, 0),
+						 false, non_constant_p,
+						 overflow_p);
+	if (CONSTANT_CLASS_P (arg))
+	  {
+	    combined_fn cfn = CFN_FENV_CONVERT;
+	    if (CALL_EXPR_IFN (t) == IFN_FENV_FLOAT)
+	      cfn = CFN_FENV_FLOAT;
+	    tree res = fold_const_call (cfn, TREE_TYPE (t), arg,
+					CALL_EXPR_ARG (t, 1),
+					CALL_EXPR_ARG (t, 2));
+	    if (res)
+	      return res;
+	  }
+	*non_constant_p = true;
+	return t;
+      }
+
+    case IFN_FENV_SQRT:
+      {
+	tree arg = cxx_eval_constant_expression (ctx, CALL_EXPR_ARG (t, 0),
+						 false, non_constant_p,
+						 overflow_p);
+	if (CONSTANT_CLASS_P (arg))
+	  {
+	    tree res = fold_const_call (CFN_FENV_SQRT, TREE_TYPE (t), arg,
+					CALL_EXPR_ARG (t, 1));
+	    if (res)
+	      return res;
+	  }
+	*non_constant_p = true;
+	return t;
+      }
+
     default:
       if (!ctx->quiet)
 	error_at (cp_expr_loc_or_input_loc (t),
@@ -7486,6 +7523,9 @@ potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool now,
 		case IFN_MUL_OVERFLOW:
 		case IFN_LAUNDER:
 		case IFN_VEC_CONVERT:
+		case IFN_FENV_FLOAT:
+		case IFN_FENV_CONVERT:
+		case IFN_FENV_SQRT:
 		  bail = false;
 		  break;
 
diff --git a/gcc/cp/cvt.c b/gcc/cp/cvt.c
index c9e7b1ff044..b47dfe6078f 100644
--- a/gcc/cp/cvt.c
+++ b/gcc/cp/cvt.c
@@ -910,7 +910,32 @@ ocp_convert (tree type, tree expr, int convtype, int flags,
 		      TREE_TYPE (e));
 	}
       if (code == REAL_TYPE)
-	return convert_to_real_maybe_fold (type, e, dofold);
+	{
+	  if (flag_fenv_access && SCALAR_FLOAT_TYPE_P (TREE_TYPE (e))
+	      && TREE_TYPE (e) != type)
+	    {
+	      // encode the type information, in case the lhs disappears
+	      tree z = build_zero_cst (build_pointer_type (type));
+	      tree result
+		= build_call_expr_internal_loc (loc, IFN_FENV_CONVERT,
+						type, 3, e, z,
+						integer_zero_node);
+	      TREE_SIDE_EFFECTS (result) = 1;
+	      return result;
+	    }
+	  if (flag_fenv_access && INTEGRAL_TYPE_P (TREE_TYPE (e)))
+	    {
+	      // encode the type information, in case the lhs disappears
+	      tree z = build_zero_cst (build_pointer_type (type));
+	      tree result
+		= build_call_expr_internal_loc (loc, IFN_FENV_FLOAT,
+						type, 3, e, z,
+						integer_zero_node);
+	      TREE_SIDE_EFFECTS (result) = 1;
+	      return result;
+	    }
+	  return convert_to_real_maybe_fold (type, e, dofold);
+	}
       else if (code == COMPLEX_TYPE)
 	return convert_to_complex_maybe_fold (type, e, dofold);
     }
diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index a557f3439a8..5685c7a01d5 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -5768,6 +5768,38 @@ cp_build_binary_op (const op_location_t &location,
 	instrument_expr = instrument_expr1;
     }
 
+  // FIXME: vectors (and complex?) as well
+  if (flag_fenv_access && SCALAR_FLOAT_TYPE_P (build_type))
+    {
+      bool do_fenv_subst = true;
+      internal_fn ifn;
+      switch (resultcode)
+	{
+	case PLUS_EXPR:
+	  ifn = IFN_FENV_PLUS;
+	  break;
+	case MINUS_EXPR:
+	  ifn = IFN_FENV_MINUS;
+	  break;
+	case MULT_EXPR:
+	  ifn = IFN_FENV_MULT;
+	  break;
+	case RDIV_EXPR:
+	  ifn = IFN_FENV_RDIV;
+	  break;
+	default:
+	  do_fenv_subst = false;
+	}
+      if (do_fenv_subst)
+	{
+	  result = build_call_expr_internal_loc (location, ifn, build_type,
+						 3, op0, op1,
+						 integer_zero_node);
+	  TREE_SIDE_EFFECTS (result) = 1;
+	  return result;
+	}
+    }
+
   result = build2_loc (location, resultcode, build_type, op0, op1);
   if (final_type != 0)
     result = cp_convert (final_type, result, complain);
diff --git a/gcc/fold-const-call.c b/gcc/fold-const-call.c
index c9e368db9d0..c77e0d70884 100644
--- a/gcc/fold-const-call.c
+++ b/gcc/fold-const-call.c
@@ -689,6 +689,35 @@ fold_const_vec_convert (tree ret_type, tree arg)
   return elts.build ();
 }
 
+/* Fold a call to IFN_FENV_CONVERT (ARG) returning TYPE.  */
+
+static tree
+fold_const_fenv_convert (tree ret_type, tree arg0, tree arg1, tree arg2)
+{
+  // FIXME!
+  return NULL_TREE;
+}
+
+/* Fold a call to IFN_FENV_FLOAT (ARG) returning TYPE.  */
+
+static tree
+fold_const_fenv_float (tree ret_type, tree arg0, tree arg1, tree arg2)
+{
+  // FIXME!
+  if (integer_zerop (arg0))
+    return build_zero_cst (ret_type);
+  return NULL_TREE;
+}
+
+/* Fold a call to IFN_FENV_SQRT (ARG) returning TYPE.  */
+
+static tree
+fold_const_fenv_sqrt (tree ret_type, tree arg0, tree arg1)
+{
+  // FIXME!
+  return NULL_TREE;
+}
+
 /* Try to evaluate:
 
       IFN_WHILE_ULT (ARG0, ARG1, (TYPE) { ... })
@@ -1675,6 +1704,9 @@ fold_const_call (combined_fn fn, tree type, tree arg0, tree arg1)
     case CFN_FOLD_LEFT_PLUS:
       return fold_const_fold_left (type, arg0, arg1, PLUS_EXPR);
 
+    case CFN_FENV_SQRT:
+      return fold_const_fenv_sqrt (type, arg0, arg1);
+
     default:
       return fold_const_call_1 (fn, type, arg0, arg1);
     }
@@ -1834,6 +1866,12 @@ fold_const_call (combined_fn fn, tree type, tree arg0, tree arg1, tree arg2)
 	return NULL_TREE;
       }
 
+    case CFN_FENV_CONVERT:
+      return fold_const_fenv_convert (type, arg0, arg1, arg2);
+
+    case CFN_FENV_FLOAT:
+      return fold_const_fenv_float (type, arg0, arg1, arg2);
+
     default:
       return fold_const_call_1 (fn, type, arg0, arg1, arg2);
     }
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 8efc77d986b..b92f343622b 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-vrp.h"
 #include "tree-ssanames.h"
 #include "expmed.h"
+#include "explow.h"
 #include "memmodel.h"
 #include "optabs.h"
 #include "emit-rtl.h"
@@ -3000,6 +3001,265 @@ expand_NOP (internal_fn, gcall *)
   /* Nothing.  But it shouldn't really prevail.  */
 }
 
+/* Checks if OP is already protected (in particular the result of some FENV
+   arithmetic) and thus can skip the inline asm.  */
+static bool
+already_protected (tree op)
+{
+  // TODO: function parameters/returns, volatile reads, etc may also be ok?
+  gimple *def_stmt;
+  if (TREE_CODE (op) != SSA_NAME
+      || !(def_stmt = SSA_NAME_DEF_STMT (op))
+      || gimple_code (def_stmt) != GIMPLE_CALL
+      || !gimple_call_internal_p (def_stmt))
+    return false;
+  switch (gimple_call_internal_fn (def_stmt))
+    {
+    case IFN_FENV_PLUS:
+    case IFN_FENV_MINUS:
+    case IFN_FENV_MULT:
+    case IFN_FENV_RDIV:
+    case IFN_FENV_SQRT:
+    case IFN_FENV_FLOAT:
+    case IFN_FENV_CONVERT:
+      return true;
+    default:
+      return false;
+    }
+}
+
+/* Emit asm volatile with inout memory parameter X.  */
+static void
+emit_asm_protection (rtx x, machine_mode mode, location_t loc)
+{
+  const char* constraint = "=m";
+  rtvec argvec = rtvec_alloc (1);
+  rtvec constraintvec = rtvec_alloc (1);
+  rtvec labelvec = rtvec_alloc (0);
+  rtx body = gen_rtx_ASM_OPERANDS (mode, "", "", 0, argvec, constraintvec,
+				   labelvec, loc);
+  MEM_VOLATILE_P (body) = true;
+  ASM_OPERANDS_INPUT (body, 0) = x;
+  ASM_OPERANDS_INPUT_CONSTRAINT_EXP (body, 0)
+    = gen_rtx_ASM_INPUT_loc (mode, "0", loc);
+  ASM_OPERANDS_OUTPUT_CONSTRAINT (body) = constraint;
+  emit_insn (gen_rtx_SET (x, body));
+}
+
+/* Expand FENV-protected arithmetic.  */
+static void
+expand_FENV_binop (gcall *stmt, tree_code code)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree op0 = gimple_call_arg (stmt, 0);
+  tree op1 = gimple_call_arg (stmt, 1);
+  tree ftype = TREE_TYPE (op0);
+  machine_mode fmode = TYPE_MODE (ftype);
+  location_t loc = gimple_location (stmt);
+  rtx x0, x1;
+  bool skip0 = false;
+
+  if (already_protected (op0))
+    {
+      skip0 = true;
+      x0 = expand_normal (op0);
+    }
+  else
+    {
+      x0 = validize_mem (assign_temp (ftype, true, true));
+      expand_assignment (make_tree (ftype, x0), op0, false);
+      emit_asm_protection (x0, fmode, loc);
+    }
+
+  /* Careful not to end up with something like X - X, which could get
+     simplified.  */
+  if (!skip0 && already_protected (op1))
+    {
+      x1 = expand_normal (op1);
+    }
+  else
+    {
+      x1 = validize_mem (assign_temp (ftype, true, true));
+      expand_assignment (make_tree (ftype, x1), op1, false);
+      emit_asm_protection (x1, fmode, loc);
+    }
+
+  rtx x2 = validize_mem (assign_temp (ftype, true, true));
+  // Using some intermediate registers for x0 and x1 could help x86 avoid
+  // reg+mem when it isn't necessary. It could also save a bit of stack.
+  optab o = optab_for_tree_code (code, ftype, optab_default);
+  rtx y2 = expand_binop (fmode, o, x0, x1, x2, false, OPTAB_DIRECT);
+  if (x2 != y2)
+    emit_move_insn (x2, y2);
+  emit_asm_protection (x2, fmode, loc);
+  if (lhs)
+    expand_assignment (lhs, make_tree (ftype, x2), false);
+  crtl->has_asm_statement = 1;
+}
+
+static void
+expand_FENV_PLUS (internal_fn, gcall *stmt)
+{
+  expand_FENV_binop (stmt, PLUS_EXPR);
+}
+
+static void
+expand_FENV_MINUS (internal_fn, gcall *stmt)
+{
+  expand_FENV_binop (stmt, MINUS_EXPR);
+}
+
+static void
+expand_FENV_MULT (internal_fn, gcall *stmt)
+{
+  expand_FENV_binop (stmt, MULT_EXPR);
+}
+
+static void
+expand_FENV_RDIV (internal_fn, gcall *stmt)
+{
+  expand_FENV_binop (stmt, RDIV_EXPR);
+}
+
+#if 0
+static void
+expand_FENV_NEGATE (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree op = gimple_call_arg (stmt, 0);
+  tree ftype = TREE_TYPE (op);
+  machine_mode fmode = TYPE_MODE (ftype);
+  location_t loc = gimple_location (stmt);
+  rtx x;
+
+  if (already_protected (op))
+    {
+      x = expand_normal (op);
+    }
+  else
+    {
+      x = validize_mem (assign_temp (ftype, true, true));
+      expand_assignment (make_tree (ftype, x), op, false);
+      emit_asm_protection (x, fmode, loc);
+    }
+
+  rtx x2 = validize_mem (assign_temp (ftype, true, true));
+  optab o = optab_for_tree_code (NEGATE_EXPR, ftype, optab_default);
+  rtx y2 = expand_unop (fmode, o, x, x2, false);
+  if (x2 != y2)
+    emit_move_insn (x2, y2);
+  emit_asm_protection (x2, fmode, loc);
+  if (lhs)
+    expand_assignment (lhs, make_tree (ftype, x2), false);
+  crtl->has_asm_statement = 1;
+}
+#endif
+
+static void
+expand_FENV_SQRT (internal_fn, gcall *stmt)
+{
+  // We cannot delegate to expand_call_stmt of a dummy stmt with BUILT_IN_SQRT
+  // because it does not always generate a call.
+  tree lhs = gimple_call_lhs (stmt);
+  tree op = gimple_call_arg (stmt, 0);
+  tree ftype = TREE_TYPE (op);
+  machine_mode fmode = TYPE_MODE (ftype);
+  location_t loc = gimple_location (stmt);
+  built_in_function f;
+  if (fmode == TYPE_MODE (float_type_node))
+    f = BUILT_IN_SQRTF;
+  else if (fmode == TYPE_MODE (double_type_node))
+    f = BUILT_IN_SQRT;
+  else if (fmode == TYPE_MODE (long_double_type_node))
+    f = BUILT_IN_SQRTL;
+  else
+    gcc_unreachable();
+  tree sqrtfn = mathfn_built_in (ftype, f);
+  tree exp = build_vl_exp (CALL_EXPR, 1 + 3);
+  CALL_EXPR_FN (exp)
+    = build1_loc (loc, ADDR_EXPR,
+		  build_pointer_type (TREE_TYPE (sqrtfn)), sqrtfn);
+  TREE_TYPE (exp) = ftype;
+  CALL_EXPR_STATIC_CHAIN (exp) = gimple_call_chain (stmt); // ???
+  TREE_SIDE_EFFECTS (exp) = 1;
+  TREE_NOTHROW (exp) = 1;
+  CALL_EXPR_TAILCALL (exp) = gimple_call_tail_p (stmt);
+  CALL_FROM_THUNK_P (exp) = gimple_call_from_thunk_p (stmt);
+  CALL_EXPR_ARG (exp, 0) = op;
+  SET_EXPR_LOCATION (exp, loc);
+  // debug?
+
+  rtx res = validize_mem (assign_temp (ftype, true, true));
+  tree tmp = make_tree (ftype, res);
+  expand_assignment (tmp, exp, false);
+  // Still needed to avoid DCE when the result is unused.
+  emit_asm_protection (res, fmode, loc);
+  if (lhs)
+    expand_assignment (lhs, tmp, false);
+}
+
+static void
+expand_FENV_FLOAT (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree op = gimple_call_arg (stmt, 0);
+  tree itype = TREE_TYPE (op);
+  tree otype = TREE_TYPE (TREE_TYPE (gimple_call_arg (stmt, 1)));
+  machine_mode imode = TYPE_MODE (itype);
+  machine_mode omode = TYPE_MODE (otype);
+  location_t loc = gimple_location (stmt);
+  rtx x;
+
+  if (already_protected (op))
+    {
+      x = expand_normal (op);
+    }
+  else
+    {
+      x = validize_mem (assign_temp (itype, true, true));
+      expand_assignment (make_tree (itype, x), op, false);
+      emit_asm_protection (x, imode, loc);
+    }
+
+  rtx x2 = validize_mem (assign_temp (otype, true, true));
+  expand_float (x2, x, TYPE_UNSIGNED (itype));
+  emit_asm_protection (x2, omode, loc);
+  if (lhs)
+    expand_assignment (lhs, make_tree (otype, x2), false);
+  crtl->has_asm_statement = 1;
+}
+
+static void
+expand_FENV_CONVERT (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree op = gimple_call_arg (stmt, 0);
+  tree itype = TREE_TYPE (op);
+  tree otype = TREE_TYPE (TREE_TYPE (gimple_call_arg (stmt, 1)));
+  machine_mode imode = TYPE_MODE (itype);
+  machine_mode omode = TYPE_MODE (otype);
+  location_t loc = gimple_location (stmt);
+  rtx x;
+
+  if (already_protected (op))
+    {
+      x = expand_normal (op);
+    }
+  else
+    {
+      x = validize_mem (assign_temp (itype, true, true));
+      expand_assignment (make_tree (itype, x), op, false);
+      emit_asm_protection (x, imode, loc);
+    }
+
+  rtx x2 = validize_mem (assign_temp (otype, true, true));
+  convert_move (x2, x, 0);
+  emit_asm_protection (x2, omode, loc);
+  if (lhs)
+    expand_assignment (lhs, make_tree (otype, x2), false);
+  crtl->has_asm_statement = 1;
+}
+
 /* Coroutines, all should have been processed at this stage.  */
 
 static void
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 13e60828fcf..014fc44f6e6 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -386,6 +386,15 @@ DEF_INTERNAL_FN (CO_FRAME, ECF_PURE | ECF_NOTHROW | ECF_LEAF, NULL)
 /* A NOP function with arbitrary arguments and return value.  */
 DEF_INTERNAL_FN (NOP, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 
+/* float operations with rounding / exception flags.  */
+DEF_INTERNAL_FN (FENV_PLUS, ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (FENV_MINUS, ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (FENV_MULT, ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (FENV_RDIV, ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (FENV_FLOAT, ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (FENV_CONVERT, ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (FENV_SQRT, ECF_LEAF | ECF_NOTHROW, NULL)
+
 #undef DEF_INTERNAL_INT_FN
 #undef DEF_INTERNAL_FLT_FN
 #undef DEF_INTERNAL_FLT_FLOATN_FN


* Re: FENV_ACCESS status
  2020-08-05 18:01 FENV_ACCESS status Marc Glisse
@ 2020-08-07  9:19 ` Richard Biener
  2020-08-07 10:45   ` Marc Glisse
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Biener @ 2020-08-07  9:19 UTC (permalink / raw)
  To: Marc Glisse; +Cc: GCC Patches

On Wed, Aug 5, 2020 at 8:02 PM Marc Glisse <marc.glisse@inria.fr> wrote:
>
> Hello,
>
> I updated the patch discussed in
> https://patchwork.ozlabs.org/project/gcc/patch/alpine.DEB.2.02.1906221743430.16432@grove.saclay.inria.fr/
> and pushed it as something like refs/users/glisse/heads/fenv (first user
> branch in gcc's git, I hope it worked). I am also attaching the diff here.
>
> I managed to compile and run real-world code with it, which is a good sign
> :-)
>
> As should be obvious looking at the diff, there is a lot of work left.
> Experts may also find much better ways to rewrite several parts of the
> patch.
>
> The option is called -ffenv-access so it doesn't interfere with
> -frounding-math, at least until we have something good enough to replace
> -frounding-math without too much performance regression.
>
> I switched to hex float constants for DBL_MAX and others for C99+, I don't
> care about making fenv_access work in prehistoric modes. On the other
> hand, since I haven't started on fenv_round, this is probably useless for
> now.
>
> Several changes, in particular the constexpr stuff, was needed to parse
> standard headers, otherwise I would have delayed the implementation.
>
> Currently the floating point environment is represented by "memory" in
> general. Refining it so the compiler knows that storing a float does not
> change the rounding mode (for instance) should wait, in my opinion.
>
> Conversions look like
> .FENV_CONVERT (arg, (target_type*)0, 0)
> the pointer is there so we know the target type, even if the lhs
> disappears at some point. The last 0 is the same as for all the others, a
> place to store options about the operation (do we care about rounding,
> about exceptions, etc), it is just a placeholder for now. I could rename
> it to .FENV_NOP since we seem to generate NOP usually, but it looked
> strange to me.

You could carry the info in the existing flags operand if you make that a
pointer ...

> I did not do anything for templates in C++. As long as we have a constant
> global flag, it doesn't matter, but as soon as we will have a pragma,
> things will get messy, we will need to remember what the mode was when
> parsing, so we can use it at instantiation time... (or just declare that
> the pragma doesn't work with templates in a first version)
>
> I am trying to have enough infrastructure in place so that the
> functionality is useful, and also so that implementing other pieces
> (parsing the pragma, C front-end, gimple optimizations, target hook for
> the asm string, opcode and target optimization, simd, etc) become
> independent and can be done by different people. It is unlikely that I can
> find the time to do everything. If other people want to contribute or even
> take over (assuming the branch does not look hopelessly bad to them), that
> would be great! That's also why I pushed it as a branch.
>
> Apart from the obvious (making sure it bootstraps, running the testsuite,
> adding a few tests), what missing pieces do you consider a strict
> requirement for this to have a chance to reach master one day as an
> experimental option?

Adding some info missing above from reading the patch.

The idea seems to be to turn FP operations like PLUS_EXPR and
FLOAT_EXPR, but also (only?) calls to BUILT_IN_SQRT, into internal
functions named IFN_FENV_*, where the internal function presumably
carries some extra information.

You have

+/* float operations with rounding / exception flags.  */
+DEF_INTERNAL_FN (FENV_PLUS, ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (FENV_MINUS, ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (FENV_MULT, ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (FENV_RDIV, ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (FENV_FLOAT, ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (FENV_CONVERT, ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (FENV_SQRT, ECF_LEAF | ECF_NOTHROW, NULL)

so with -fnon-call-exceptions they will not be throwing (but a regular
FP PLUS_EXPR would).  They will appear to alter memory state - that's
probably to have the extra dependence on FENV changing/querying
operations - but then why do you still need to emit asm()s?

I suppose the (currently unused) flags parameter could be populated with
some known FP ENV state and then limited optimization across stmts
with the same non-zero state could be done?

Using internal function calls paints us a bit into a corner since they are still
subject to the single-SSA def restriction in case we'd want to make FENV
dataflow more explicit.  What's the advantage of internal functions compared
to using asms for the operations themselves if we wrap this class into
a set of "nicer" helpers?

One complication with tracking data-flow is "unknown" stuff, I'd suggest
to invent a mediator between memory state and FP state which would
semantically be load and store operations of the FP state from/to memory.

That said, you're the one doing the work and going with internal functions
is reasonable - I'm not sure to what extent optimization for FENV access
code will ever be possible (or wanted/expected).  So going more precise
might not have any advantage.

You needed to guard SQRT - will you need to guard other math functions?
(round, etc.)

If we need to keep the IFNs using memory state, they will count towards
the walk limits of the alias oracle even if they can be disambiguated
against.  This will affect both compile time and optimization.

+  /* Careful not to end up with something like X - X, which could get
+     simplified.  */
+  if (!skip0 && already_protected (op1))

we're already relying on RTL not optimizing (x + 0.5) - 0.5, but since
that would involve association, the simple X - X case might indeed
be optimized (but wouldn't that be a bug if it is not correct?)

Thanks,
Richard.

> --
> Marc Glisse


* Re: FENV_ACCESS status
  2020-08-07  9:19 ` Richard Biener
@ 2020-08-07 10:45   ` Marc Glisse
  2020-08-07 12:01     ` Richard Biener
  0 siblings, 1 reply; 5+ messages in thread
From: Marc Glisse @ 2020-08-07 10:45 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Thank you for your comments.

On Fri, 7 Aug 2020, Richard Biener wrote:

>> Conversions look like
>> .FENV_CONVERT (arg, (target_type*)0, 0)
>> the pointer is there so we know the target type, even if the lhs
>> disappears at some point. The last 0 is the same as for all the others, a
>> place to store options about the operation (do we care about rounding,
>> about exceptions, etc), it is just a placeholder for now. I could rename
>> it to .FENV_NOP since we seem to generate NOP usually, but it looked
>> strange to me.
>
> You could carry the info in the existing flags operand if you make that a
> pointer ...

Ah, true, I forgot that some other trees already use this kind of trick.
Not super pretty, but probably better than an extra argument.

> Adding some info missing above from reading the patch.
>
> The idea seems to be to turn FP operations like PLUS_EXPR, FLOAT_EXPR
> but also (only?) calls to BUILT_IN_SQRT to internal functions named
> IFN_FENV_* where the internal function presumably has some extra
> information.

Sqrt does seem to have a special place in IEEE 754, and in practice some
targets have instructions (with rounding) for it.

> You have
>
> +/* float operations with rounding / exception flags.  */
> +DEF_INTERNAL_FN (FENV_PLUS, ECF_LEAF | ECF_NOTHROW, NULL)
> +DEF_INTERNAL_FN (FENV_MINUS, ECF_LEAF | ECF_NOTHROW, NULL)
> +DEF_INTERNAL_FN (FENV_MULT, ECF_LEAF | ECF_NOTHROW, NULL)
> +DEF_INTERNAL_FN (FENV_RDIV, ECF_LEAF | ECF_NOTHROW, NULL)
> +DEF_INTERNAL_FN (FENV_FLOAT, ECF_LEAF | ECF_NOTHROW, NULL)
> +DEF_INTERNAL_FN (FENV_CONVERT, ECF_LEAF | ECF_NOTHROW, NULL)
> +DEF_INTERNAL_FN (FENV_SQRT, ECF_LEAF | ECF_NOTHROW, NULL)
>
> so with -fnon-call-exceptions they will not be throwing (but regular
> FP PLUS_EXPR would).

Hmm, ok, I guess I should remove ECF_NOTHROW then; the priority should
be correctness, and we can carefully reintroduce optimizations later.

> They will appear to alter memory state - that's probably to have the
> extra dependence on FENV changing/querying operations but then why do you
> still need to emit asm()s?

The IFNs are for GIMPLE and represent the operations, while the asms are
simple passthroughs for RTL; I replace the former with the latter (plus
the regular operation) at expansion.

> I suppose the (currently unused) flags parameter could be populated with
> some known FP ENV state and then limited optimization across stmts
> with the same non-zero state could be done?

I was mostly thinking of storing information like:
* don't care about the rounding mode for this operation
* may drop exceptions produced by this operation
* may produce extra exceptions
* don't care about signed zero
* may contract into FMA
* don't care about errno (for sqrt?)
etc

With fenv_round, we would actually have to store the rounding mode of
the operation (upward, toward zero, dynamic, don't-care, etc.), which is
a bit less nice because 0 is not a safe fallback anymore. We could also
store it when we detect a call to fesetround beforehand, but we have to
be careful that this doesn't result in even more calls to fesetround at
expansion for targets that do not have statically rounded operations.

If there are other, better things to store there, great.

> Using internal function calls paints us a bit into a corner since they are still
> subject to the single-SSA def restriction in case we'd want to make FENV
> dataflow more explicit.  What's the advantage of internal functions compared
> to using asms for the operations themselves if we wrap this class into
> a set of "nicer" helpers?

I wanted the representation on gimple to look a bit nice so it would be 
both easy to read in the dumps, and not too hard to write optimizations 
for, and a function call looked good enough. Making FENV dataflow explicit 
would mean having PHIs for FENV, etc? At most I thought FENV would be 
represented by one specific memory region which would not alias user 
variables of type float or double, in particular.

I don't really see what it would look like with asms and helpers. In some 
sense, the IFNs are already wrappers, that we unwrap at expansion. Your 
asms would take some FENV as input and output, so we have to track what 
FENV to use where, similar to .MEM.

> One complication with tracking data-flow is "unknown" stuff, I'd suggest
> to invent a mediator between memory state and FP state which would
> semantically be load and store operations of the FP state from/to memory.

All I can think of is make FP state a particular variable in memory, and 
teach alias analysis that those functions only read/write to this 
variable. What do you have in mind, splitting operations as:

fenv0 = read_fenv()
(res, fenv1) = oper(arg0, arg1, fenv0)
store_fenv(fenv1)

so that "oper" itself is const? (and hopefully simplify consecutive 
read_fenv/store_fenv so there are fewer of them) I wonder if lying about 
the constness of the operation may be problematic.

(and asm would be abused as a way to return a pair, with hopefully some 
marker so we know it isn't a real asm)

> That said, you're the one doing the work and going with internal functions
> is reasonable - I'm not sure to what extent optimization for FENV access
> code will ever be possible (or wanted/expected).  So going more precise
> might not have any advantage.

I think some optimizations are expected. For instance, not having to 
re-read the same number from memory many times just because there was an 
addition in between (which could write to fenv but that's it). Some may 
still want FMA (with a consistent rounding direction). For those (like me) 
who usually only care about rounding and not exceptions, making the 
operations pure would be great, and nothing says we cannot vectorize those 
rounded operations!

I am trying to be realistic with what I can achieve, but if you think the 
IFNs would paint us into a corner, then we can drop this approach.

> You needed to guard SQRT - will you need to guard other math functions?
> (round, etc.)

Maybe, but probably not many. I thought I might have to guard all of them 
(sin, cos, etc), but IIRC Joseph's comment seemed to imply that this 
wouldn't be necessary. I am likely missing FMA now...

> If we need to keep the IFNs use memory state they will count towards
> walk limits of the alias oracle even if they can be disambiguated against.
> This will affect both compile-time and optimizations.

Yes...

> +  /* Careful not to end up with something like X - X, which could get
> +     simplified.  */
> +  if (!skip0 && already_protected (op1))
>
> we're already relying on RTL not optimizing (x + 0.5) - 0.5 but since
> that would involve association the simple X - X case might indeed
> be optimized (but wouldn't that be a bug if it is not correct?)

Indeed we do not currently simplify X-X without -ffinite-math-only. 
However, I am trying to be safe, and whether we can simplify or not is 
something that depends on each operation (what the pragma said at that 
point in the source code), while flag_finite_math_only is at best per 
function.

-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: FENV_ACCESS status
  2020-08-07 10:45   ` Marc Glisse
@ 2020-08-07 12:01     ` Richard Biener
  2020-08-07 16:45       ` Marc Glisse
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Biener @ 2020-08-07 12:01 UTC (permalink / raw)
  To: Marc Glisse; +Cc: GCC Patches

On Fri, Aug 7, 2020 at 12:45 PM Marc Glisse <marc.glisse@inria.fr> wrote:
>
> Thank you for your comments.
>
> On Fri, 7 Aug 2020, Richard Biener wrote:
>
> >> Conversions look like
> >> .FENV_CONVERT (arg, (target_type*)0, 0)
> >> the pointer is there so we know the target type, even if the lhs
> >> disappears at some point. The last 0 is the same as for all the others, a
> >> place to store options about the operation (do we care about rounding,
> >> about exceptions, etc), it is just a placeholder for now. I could rename
> >> it to .FENV_NOP since we seem to generate NOP usually, but it looked
> >> strange to me.
> >
> > You could carry the info in the existing flags operand if you make that a
> > pointer ...
>
> Ah, true, I forgot that some other trees already use this kind of trick.
> Not super pretty, but probably better than an extra argument.
>
> > Adding some info missing above from reading the patch.
> >
> > The idea seems to be to turn FP operations like PLUS_EXPR, FLOAT_EXPR
> > but also (only?) calls to BUILT_IN_SQRT to internal functions named
> > IFN_FENV_* where the internal function presumably has some extra
> > information.
>
> Sqrt does seem to have a special place in IEEE 754, and in practice some
> targets have instructions (with rounding) for it.
>
> > You have
> >
> > +/* float operations with rounding / exception flags.  */
> > +DEF_INTERNAL_FN (FENV_PLUS, ECF_LEAF | ECF_NOTHROW, NULL)
> > +DEF_INTERNAL_FN (FENV_MINUS, ECF_LEAF | ECF_NOTHROW, NULL)
> > +DEF_INTERNAL_FN (FENV_MULT, ECF_LEAF | ECF_NOTHROW, NULL)
> > +DEF_INTERNAL_FN (FENV_RDIV, ECF_LEAF | ECF_NOTHROW, NULL)
> > +DEF_INTERNAL_FN (FENV_FLOAT, ECF_LEAF | ECF_NOTHROW, NULL)
> > +DEF_INTERNAL_FN (FENV_CONVERT, ECF_LEAF | ECF_NOTHROW, NULL)
> > +DEF_INTERNAL_FN (FENV_SQRT, ECF_LEAF | ECF_NOTHROW, NULL)
> >
> > so with -fnon-call-exceptions they will not be throwing (but regular
> > FP PLUS_EXPR would).
>
> Hmm, ok, I guess I should remove ECF_NOTHROW then; the priority should
> be correctness, and we can carefully reintroduce optimizations later.
>
> > They will appear to alter memory state - that's probably to have the
> > extra dependence on FENV changing/querying operations but then why do you
> > still need to emit asm()s?
>
> The IFNs are for GIMPLE and represent the operations, while the asms are
> simple passthroughs for RTL; I replace the former with the latter (plus the
> regular operation) at expansion.

Ah, OK.

> > I suppose the (currently unused) flags parameter could be populated with
> > some known FP ENV state and then limited optimization across stmts
> > with the same non-zero state could be done?
>
> I was mostly thinking of storing information like:
> * don't care about the rounding mode for this operation
> * may drop exceptions produced by this operation
> * may produce extra exceptions
> * don't care about signed zero
> * may contract into FMA
> * don't care about errno (for sqrt?)
> etc

So we could leverage the same mechanism for inlining a non-ffast-math
function into a -ffast-math function, rewriting operations to IFNs?  Though
the resulting loss of optimization might offset any benefit we get from the
inlining...  At least the above list somewhat suggests it wants to
capture the various -f*-math options.

> With fenv_round, we would actually have to store the rounding mode of
> the operation (upward, towards-zero, dynamic, don't-care, etc), a bit
> less nice because 0 is not a safe fallback anymore. We could also store
> it when we detect a call to fesetround before, but we have to be careful
> that this doesn't result in even more calls to fesetround at expansion
> for targets that do not have statically rounded operations.
>
> If there are other, better things to store there, great.
>
> > Using internal function calls paints us a bit into a corner since they are still
> > subject to the single-SSA def restriction in case we'd want to make FENV
> > dataflow more explicit.  What's the advantage of internal functions compared
> > to using asms for the operations themselves if we wrap this class into
> > a set of "nicer" helpers?
>
> I wanted the representation on gimple to look a bit nice so it would be
> both easy to read in the dumps, and not too hard to write optimizations
> for, and a function call looked good enough. Making FENV dataflow explicit
> would mean having PHIs for FENV, etc? At most I thought FENV would be
> represented by one specific memory region which would not alias user
> variables of type float or double, in particular.

Uh, yeah - didn't think of PHIs.

> I don't really see what it would look like with asms and helpers. In some
> sense, the IFNs are already wrappers, that we unwrap at expansion. Your
> asms would take some FENV as input and output, so we have to track what
> FENV to use where, similar to .MEM.

True.  The main advantage (if there is any...) would be that it is not .MEM.

> > One complication with tracking data-flow is "unknown" stuff, I'd suggest
> > to invent a mediator between memory state and FP state which would
> > semantically be load and store operations of the FP state from/to memory.
>
> All I can think of is make FP state a particular variable in memory, and
> teach alias analysis that those functions only read/write to this
> variable. What do you have in mind, splitting operations as:
>
> fenv0 = read_fenv()
> (res, fenv1) = oper(arg0, arg1, fenv0)
> store_fenv(fenv1)
>
> so that "oper" itself is const? (and hopefully simplify consecutive
> read_fenv/store_fenv so there are fewer of them) I wonder if lying about
> the constness of the operation may be problematic.

Kind-of.  I thought to do this around "unknown" operations like function
calls only:

 store_fenv(fenv0);
 foo ();
 fenv0 = read_fenv();

because all functions can set/query the FP state.  You get this
implicitly by tracking FENV as memory itself.

> (and asm would be abused as a way to return a pair, with hopefully some
> marker so we know it isn't a real asm)

Yeah, I still have the idea to eventually do some of the RTL expansion on
GIMPLE by having GIMPLE asm()s with the actual assembly be a reference
to the define_insns or so (thus an enum).  For our case it would be just
PLUS_EXPR or maybe even addsf3.

We've used _Complex to have two-valued returns but obviously the
FP stuff should also work for _Complex FP types and having the FENV
state as 'float' is quite ugly.  Which means we would need to invent
some way to have a SSA name be a tuple of something.  Maybe useful
elsewhere as well.

> > That said, you're the one doing the work and going with internal functions
> > is reasonable - I'm not sure to what extent optimization for FENV access
> > code will ever be possible (or wanted/expected).  So going more precise
> > might not have any advantage.
>
> I think some optimizations are expected. For instance, not having to
> re-read the same number from memory many times just because there was an
> addition in between (which could write to fenv but that's it). Some may
> still want FMA (with a consistent rounding direction). For those (like me)
> who usually only care about rounding and not exceptions, making the
> operations pure would be great, and nothing says we cannot vectorize those
> rounded operations!
>
> I am trying to be realistic with what I can achieve, but if you think the
> IFNs would paint us into a corner, then we can drop this approach.

Well, I'm not sure - both asm()s and IFNs are a class apart from what passes
are used to seeing (assignments).  But indeed passes know how to handle
function calls, while no pass knows how to handle asms.

I guess there's nothing else but to try ...

Suppose for example you have

 _3 = .IFN_PLUS (_1, _2, 0);
 _4 = .IFN_PLUS (_1, _2, 0);

the first plus may alter FP state (set inexact) but since the second plus
computes the same value we'd want to elide it(?).  Now if there's a
feclearexcept() in between we can't elide it - and that works as proposed
because the memory state is inspected by feclearexcept().  But I can't
see how we can convince FRE that we can elide the second plus when
both are modifying memory.  There's currently no way to express that an
effect on memory state depends only on the arguments.  I _think_ we don't
have to say the mem out state depends on the mem in state (FP ENV),
well - it does, but the difference only depends on the actual arguments.

That said, tracking FENV together with memory will complicate things
but explicitly tracking an (or multiple?) extra FP ENV register input/output
makes the problem not go away (the second plus still has the mutated
FP ENV from the first plus as input).  Instead we'd have to separately
track the effect of a single operation and the overall FP state, like

(_3, flags_5) = .IFN_PLUS (_1, _2, 0);
fpexstate = merge (flags_5, fpexstate);
(_4, flags_6) = .IFN_PLUS (_1, _2, 0);
fpexstate = merge (flags_6, fpexstate);

or so and there we can CSE.  We have to track exception state separately
from the FP control word for rounding-mode for this to work.  Thus when
we're not interested in the exception state then .IFN_PLUS would be 'pure'
(only dependent on the FP CW)?

So I guess we should think of somehow separating rounding mode tracking
and exception state?  If we make the functions affect memory anyway
we can have the FP state reg(s) modeled explicitly with a fake decl(s) and pass
that by reference to the IFNs?  Then we can make use of the "fn spec" attribute
to tell which function reads/writes which reg.  Across unknown functions we'd
then have to use the store/load "trick" to merge them with the global
memory state, though.

> > You needed to guard SQRT - will you need to guard other math functions?
> > (round, etc.)
>
> Maybe, but probably not many. I thought I might have to guard all of them
> (sin, cos, etc), but IIRC Joseph's comment seemed to imply that this
> wouldn't be necessary. I am likely missing FMA now...

Ah, as far as correctness is concerned.  Yes, you're probably right.

> > If we need to keep the IFNs use memory state they will count towards
> > walk limits of the alias oracle even if they can be disambiguated against.
> > This will affect both compile-time and optimizations.
>
> Yes...
>
> > +  /* Careful not to end up with something like X - X, which could get
> > +     simplified.  */
> > +  if (!skip0 && already_protected (op1))
> >
> > we're already relying on RTL not optimizing (x + 0.5) - 0.5 but since
> > that would involve association the simple X - X case might indeed
> > be optimized (but wouldn't that be a bug if it is not correct?)
>
> Indeed we do not currently simplify X-X without -ffinite-math-only.
> However, I am trying to be safe, and whether we can simplify or not is
> something that depends on each operation (what the pragma said at that
> point in the source code), while flag_finite_math_only is at best per
> function.

OK, fair enough.

Sorry for the incoherent brain-dump above ;)
Richard.

>
> --
> Marc Glisse

* Re: FENV_ACCESS status
  2020-08-07 12:01     ` Richard Biener
@ 2020-08-07 16:45       ` Marc Glisse
  0 siblings, 0 replies; 5+ messages in thread
From: Marc Glisse @ 2020-08-07 16:45 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

On Fri, 7 Aug 2020, Richard Biener wrote:

>> I was mostly thinking of storing information like:
>> * don't care about the rounding mode for this operation
>> * may drop exceptions produced by this operation
>> * may produce extra exceptions
>> * don't care about signed zero
>> * may contract into FMA
>> * don't care about errno (for sqrt?)
>> etc
>
> So we could leverage the same mechanism for inlining a non-ffast-math
> function into a -ffast-math function, rewriting operations to IFNs?

Yes.

> Though the resulting loss of optimization might offset any benefit we get 
> from the inlining...

I was hoping enough optimizations would still be possible. With the right 
flags, the function could be marked pure or const, could be vectorized, 
etc. We could go through the transformations in match.pd and copy each one 
for the IFN, checking the relevant set of flags (although they might need 
to be more manual in forwprop if match.pd cannot handle them).

> At least the above list somewhat suggests it wants to capture the 
> various -f*-math options.

Originally I only wanted rounding and exceptions, but this looked like a 
sensible generalization after a previous discussion.

>>> One complication with tracking data-flow is "unknown" stuff, I'd suggest
>>> to invent a mediator between memory state and FP state which would
>>> semantically be load and store operations of the FP state from/to memory.
>>
>> All I can think of is make FP state a particular variable in memory, and
>> teach alias analysis that those functions only read/write to this
>> variable. What do you have in mind, splitting operations as:
>>
>> fenv0 = read_fenv()
>> (res, fenv1) = oper(arg0, arg1, fenv0)
>> store_fenv(fenv1)
>>
>> so that "oper" itself is const? (and hopefully simplify consecutive
>> read_fenv/store_fenv so there are fewer of them) I wonder if lying about
>> the constness of the operation may be problematic.
>
> Kind-of.  I thought to do this around "unknown" operations like function
> calls only:
>
> store_fenv(fenv0);
> foo ();
> fenv0 = read_fenv();

In what I described a few lines above, that's roughly what would remain 
after simplification, but instead you would generate it directly, saving 
some compile time if there are more floating-point operations than 
unknown calls. It may help to add them also for branch/join. And even then it 
may not be sufficient. If 2 branches start with read_fenv or end with 
store_fenv, we don't want an optimizer to move them into a single call 
outside of the branches, because then the operation itself, being const, 
could move outside of the branch. ISTR that there are ways to avoid this 
kind of transformation (mostly meant to avoid duplicating an inline asm 
containing a hardcoded label).

At expansion, I guess read_fenv/store_fenv would expand to nothing, they 
were mostly there to protect the true operation, and we could still expand
(res, fenv1) = oper(arg0, arg1, fenv0)
to
res=asm_hide(oper(asm_hide(arg0),asm_hide(arg1)))
if we don't want to also model things in RTL for every target (at least to 
begin with).

> I guess there's nothing else but to try ...
>
> Suppose for example you have
>
> _3 = .IFN_PLUS (_1, _2, 0);
> _4 = .IFN_PLUS (_1, _2, 0);
>
> the first plus may alter FP state (set inexact) but since the second plus
> computes the same value we'd want to elide it(?).

Assuming there is nothing in between, I think so, yes.

> Now if there's a feclearexcept() in between we can't elide it - and that 
> works as proposed because the memory state is inspected by 
> feclearexcept().

The exact effect of feclearexcept depends on how we model things. It could 
be considered write-only. If the argument is FE_ALL_EXCEPT, things may 
also be easier.

In some cases, with

_3 = .IFN_PLUS (_1, _2, 0);
feclearexcept (...);
_4 = .IFN_PLUS (_1, _2, 0);

we may want to elide the first IFN...

> But I can't see how we can convince FRE that we can elide the second 
> plus when both are modifying memory.

Yes, that's certainly harder.

Actually, for optimization purposes, I would distinguish the case where we 
care about exceptions and the case where we don't. The few times I've used 
exceptions, it was only for a single operation, and I didn't expect any 
optimization. On the other hand, I often use hundreds of rounded 
operations where I don't care about exceptions. Those can be marked as 
pure (I expect querying if .FENV_PLUS is pure to involve looking at a bit 
in its last argument), and would fit much more easily with the current 
optimizations. I can't claim that my uses are representative of all uses 
though, some people may do long, regular computations and trap on 
FE_INVALID...

I am not that interested in exceptions, but since just rounding does not 
match a standard feature, it seemed more sensible to handle both together. 
I did wonder about making 2 sets of functions, the ones with exceptions 
(much harder for optimization, although not completely hopeless if people 
are really motivated) and the pure ones without exceptions, so the first 
wouldn't hinder the second too much. But having the strictest version 
first looked reasonable.

> There's currently no way to express that an effect on memory state 
> depends only on the arguments.

This reminds me of the initialization of static/thread_local variables in 
functions, when Jason tried to add an attribute, but I don't think it was 
ever committed, and the semantics were likely too different.

> I _think_ we don't have to say the mem out state depends on the mem in 
> state (FP ENV), well - it does, but the difference only depends on the 
> actual arguments.

A different rounding mode could cause different exceptions I believe.

> That said, tracking FENV together with memory will complicate things
> but explicitly tracking an (or multiple?) extra FP ENV register input/output
> makes the problem not go away (the second plus still has the mutated
> FP ENV from the first plus as input).  Instead we'd have to separately
> track the effect of a single operation and the overall FP state, like
>
> (_3, flags_5) = .IFN_PLUS (_1, _2, 0);
> fpexstate = merge (flags_5, fpexstate);
> (_4, flags_6) = .IFN_PLUS (_1, _2, 0);
> fpexstate = merge (flags_6, fpexstate);

We would have to be careful that lines 2 and 3 cannot be swapped (unless 
we keep all the merges and key expansion on those and not on the IFN? 
But we may end up with a use of the sum before the merge).

> or so and there we can CSE.

And I guess we would have a transformation
merge(f, merge(f, state)) --> merge(f, state)

> We have to track exception state separately
> from the FP control word for rounding-mode for this to work.  Thus when
> we're not interested in the exception state then .IFN_PLUS would be 'pure'
> (only dependent on the FP CW)?
>
> So I guess we should think of somehow separating rounding mode tracking
> and exception state?  If we make the functions affect memory anyway
> we can have the FP state reg(s) modeled explicitly with a fake decl(s) and pass
> that by reference to the IFNs?  Then we can make use of the "fn spec" attribute
> to tell which function reads/writes which reg.  Across unknown functions we'd
> then have to use the store/load "trick" to merge them with the global
> memory state, though.

Splitting the rounding mode from the exceptions certainly makes sense, 
since they are used quite differently.


_3 = .FENV_PLUS (_1, _2, 0, &fenv_round, &fenv_except)
or just
_3 = .FENV_PLUS (_1, _2, 1, &fenv_round, 0)
or 
_3 = .FENV_PLUS (_1, _2, 2, 0, &fenv_except)
when we are not interested in everything.

with fake global decls for fenv_round and fenv_except (so "unknown" 
already possibly reads/writes it) and fn specs to say it doesn't look at 
other memory? I was more thinking of making that implicit, through magic 
in a couple of relevant functions (the value in flags says whether the global 
fenv_round or fenv_except is accessed), as a refinement of just "memory".

But IIUC, we would need something that does not use memory at all (not 
even one variable) if we wanted to avoid the big penalty in alias 
analysis, etc.

If we consider the case without exceptions:

round = get_fenv_round()
_3 = .FENV_PLUS (_1, _2, opts, round)

with .FENV_PLUS "const" and get_fenv_round "pure" (or even reading round 
from a fake global variable instead of a function call) would be tempting, 
but it doesn't work, since now .FENV_PLUS can migrate after a later call 
to fesetround. Even without exceptions we need some protection after, so 
it may be easier to keep the memory (fenv) read as part of .FENV_PLUS.

Also, caring only about rounding doesn't match any standard #pragma, so 
such an option may see very little use in practice...

> Sorry for the incoherent brain-dump above ;)

It is great to have someone to discuss this with!

-- 
Marc Glisse
