public inbox for gcc-patches@gcc.gnu.org
* [PATCH] -fftz-math: assume that denorms _must_ be flushed to zero optimizations
@ 2017-08-10 17:54 Pekka Jääskeläinen
  2017-08-14  9:28 ` Richard Biener
  0 siblings, 1 reply; 7+ messages in thread
From: Pekka Jääskeläinen @ 2017-08-10 17:54 UTC (permalink / raw)
  To: GCC Patches, Henry Linjamäki, Martin Jambor

[-- Attachment #1: Type: text/plain, Size: 692 bytes --]

Hi,

The attached patch adds a new switch, -fftz-math, which makes certain
optimizations assume that "flush to zero" behavior for denormal inputs
and outputs is not an optimization hint but behavior required for
semantic correctness.
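
A minimal software sketch of why this matters for a fold such as
X * 1 -> X. It models an 'ftz' multiply in portable C rather than using
any hardware mode; the helper names (float_bits, bits_float, flush,
ftz_mul, demo_mul_by_one) are hypothetical and appear neither in the
patch nor in GCC.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: move between a float and its IEEE-754 bit pattern. */
static uint32_t float_bits(float v) { uint32_t u; memcpy(&u, &v, sizeof u); return u; }
static float bits_float(uint32_t u) { float v; memcpy(&v, &u, sizeof v); return v; }

/* A binary32 value is subnormal when its exponent field is zero and its
   mantissa field is nonzero.  */
static int is_subnormal_bits(float v)
{
  uint32_t u = float_bits(v);
  return (u & 0x7f800000u) == 0 && (u & 0x007fffffu) != 0;
}

/* Replace a subnormal by a zero of the same sign; pass everything else.  */
static float flush(float v)
{
  return is_subnormal_bits(v) ? bits_float(float_bits(v) & 0x80000000u) : v;
}

/* Software model (an assumption, not the patch's mechanism) of a multiply
   with mandatory FTZ: flush both inputs, multiply, flush the result.  */
static float ftz_mul(float a, float b)
{
  return flush(flush(a) * flush(b));
}

static void demo_mul_by_one(void)
{
  float x = bits_float(0x00000800u);  /* 2048 * 2^-149, roughly 2.87e-42 */
  assert(is_subnormal_bits(x));
  /* Under the model, x * 1 is zero ...  */
  assert(float_bits(ftz_mul(x, 1.0f)) == 0u);
  /* ... so replacing x * 1 by plain x would leak the subnormal.  */
  assert(float_bits(x) != float_bits(ftz_mul(x, 1.0f)));
}
```

Calling demo_mul_by_one () from a test driver exercises the checks; with
real FTZ/DAZ hardware modes the flushing would happen inside the multiply
instruction instead of in software.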

The need for this arose from HSAIL (BRIG). With HSAIL, flush-to-zero
handling is required (not merely "allowed") when an HSAIL instruction
is marked with the 'ftz' modifier (all HSA Base profile instructions
are).

The patch is not complete and likely misses many optimizations.
However, it is a starting point that fixes a few cases exposed by the
HSAIL conformance suite. We plan to extend it as new cases come up.

OK for trunk?

BR,
Pekka

[-- Attachment #2: gcc-ftz-math-switch.patch --]
[-- Type: text/x-patch, Size: 3924 bytes --]

Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 251026)
+++ gcc/common.opt	(working copy)
@@ -2281,6 +2281,11 @@
 Common Report Var(flag_single_precision_constant) Optimization
 Convert floating point constants to single precision constants.
 
+fftz-math
+Common Report Var(flag_ftz_math) Optimization
+Optimizations handle floating-point operations as they must flush
+subnormal floating-point values to zero.
+
 fsplit-ivs-in-unroller
 Common Report Var(flag_split_ivs_in_unroller) Init(1) Optimization
 Split lifetimes of induction variables when loops are unrolled.
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 251026)
+++ gcc/doc/invoke.texi	(working copy)
@@ -9458,6 +9458,17 @@
 This option is experimental and does not currently guarantee to
 disable all GCC optimizations that affect signaling NaN behavior.
 
+@item -fftz-math
+@opindex ftz-math
+This option is experimental. With this flag on GCC treats
+floating-point operations (except abs, class, copysign and neg) as
+they must flush subnormal input operands and results to zero
+(FTZ). The FTZ rules are derived from HSA Programmers Reference Manual
+for the base profile. This alters optimizations that would break the
+rules, for example X * 1 -> X simplification. The option assumes the
+target supports FTZ in hardware and has it enabled - either by default
+or set by the user.
+
 @item -fno-fp-int-builtin-inexact
 @opindex fno-fp-int-builtin-inexact
 Do not allow the built-in functions @code{ceil}, @code{floor},
Index: gcc/fold-const-call.c
===================================================================
--- gcc/fold-const-call.c	(revision 251026)
+++ gcc/fold-const-call.c	(working copy)
@@ -697,7 +697,7 @@
 	      && do_mpfr_arg1 (result, mpfr_y1, arg, format));
 
     CASE_CFN_FLOOR:
-      if (!REAL_VALUE_ISNAN (*arg) || !flag_errno_math)
+      if ((!REAL_VALUE_ISNAN (*arg) || !flag_errno_math) && !flag_ftz_math)
 	{
 	  real_floor (result, format, arg);
 	  return true;
@@ -705,7 +705,7 @@
       return false;
 
     CASE_CFN_CEIL:
-      if (!REAL_VALUE_ISNAN (*arg) || !flag_errno_math)
+      if ((!REAL_VALUE_ISNAN (*arg) || !flag_errno_math) && !flag_ftz_math)
 	{
 	  real_ceil (result, format, arg);
 	  return true;
Index: gcc/match.pd
===================================================================
--- gcc/match.pd	(revision 251026)
+++ gcc/match.pd	(working copy)
@@ -143,6 +143,7 @@
 (simplify
  (mult @0 real_onep)
  (if (!HONOR_SNANS (type)
+      && !flag_ftz_math
       && (!HONOR_SIGNED_ZEROS (type)
           || !COMPLEX_FLOAT_TYPE_P (type)))
   (non_lvalue @0)))
@@ -151,6 +152,7 @@
 (simplify
  (mult @0 real_minus_onep)
   (if (!HONOR_SNANS (type)
+       && !flag_ftz_math
        && (!HONOR_SIGNED_ZEROS (type)
            || !COMPLEX_FLOAT_TYPE_P (type)))
    (negate @0)))
@@ -332,13 +334,13 @@
 /* In IEEE floating point, x/1 is not equivalent to x for snans.  */
 (simplify
  (rdiv @0 real_onep)
- (if (!HONOR_SNANS (type))
+ (if (!HONOR_SNANS (type) && !flag_ftz_math)
   (non_lvalue @0)))
 
 /* In IEEE floating point, x/-1 is not equivalent to -x for snans.  */
 (simplify
  (rdiv @0 real_minus_onep)
- (if (!HONOR_SNANS (type))
+ (if (!HONOR_SNANS (type) && !flag_ftz_math)
   (negate @0)))
 
 (if (flag_reciprocal_math)
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c	(revision 251026)
+++ gcc/simplify-rtx.c	(working copy)
@@ -2565,8 +2565,10 @@
 	return op1;
 
       /* In IEEE floating point, x*1 is not equivalent to x for
-	 signalling NaNs.  */
+	 signalling NaNs.
+	 For -fftz-math, x*1 is not equivalent to x for subnormals. */
       if (!HONOR_SNANS (mode)
+	  && (FLOAT_MODE_P (mode) && !flag_ftz_math)
 	  && trueop1 == CONST1_RTX (mode))
 	return op0;
 


* Re: [PATCH] -fftz-math: assume that denorms _must_ be flushed to zero optimizations
  2017-08-10 17:54 [PATCH] -fftz-math: assume that denorms _must_ be flushed to zero optimizations Pekka Jääskeläinen
@ 2017-08-14  9:28 ` Richard Biener
  2017-08-14 11:21   ` Pekka Jääskeläinen
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Biener @ 2017-08-14  9:28 UTC (permalink / raw)
  To: Pekka Jääskeläinen, Joseph S. Myers
  Cc: GCC Patches, Henry Linjamäki, Martin Jambor

On Thu, Aug 10, 2017 at 6:39 PM, Pekka Jääskeläinen <pekka@parmance.com> wrote:
> Hi,
>
> The attached patch adds a new switch -fftz-math which makes certain
> optimizations
> assume that "flush to zero" behavior of denormal inputs and outputs is
> not an optimization
> hint, but required behavior for semantical correctness.
>
> The need for this was initiated by HSAIL (BRIG). With HSAIL, flush to
> zero handling is required,
> (not only "allowed") in case an HSAIL instruction is marked with the
> 'ftz' modifier (all HSA Base
> profile instructions are).

This suggests only outputs are flushed to zero?  OTOH documentation
for X * 1 -> X suggests otherwise.  This simplification also suggests to
make FTZ operations explicit instead of adding a flag?  Thus the BRIG
FE would emit FTZ (X) * 1 which we can optimize to FTZ (X), and we
could eventually add a pass optimizing FTZ operations?

> The patch is not complete and likely misses many optimizations.
> However, it is a starting point
> that fixes a few cases brought out by the HSAIL conformance suite. We
> plan to extend this
> as new cases come up.
>
> OK for trunk?

Hmm.  I don't like testing flag_ftz_math too much here.

Are input denormals really required to be flushed to zero, or is it enough
to flush outputs to zero?  If the latter, then this is more like the modes
not having denormals (and much nicer for optimization, with only constant
folding being affected)?

Richard.

> BR,
> Pekka


* Re: [PATCH] -fftz-math: assume that denorms _must_ be flushed to zero optimizations
  2017-08-14  9:28 ` Richard Biener
@ 2017-08-14 11:21   ` Pekka Jääskeläinen
  2017-08-14 11:25     ` Richard Biener
  2017-08-14 13:17     ` Joseph Myers
  0 siblings, 2 replies; 7+ messages in thread
From: Pekka Jääskeläinen @ 2017-08-14 11:21 UTC (permalink / raw)
  To: Richard Biener
  Cc: Joseph S. Myers, GCC Patches, Henry Linjamäki, Martin Jambor

Hi Richard,

The basic idea of the patch is to optimize for the (common) situation
where FTZ/DAZ is controlled by a CPU-wide flag. We then only need to avoid
the compile-time optimizations that assume denormal handling is on, in
order to support the ‘forced FTZ/DAZ semantics’.

> This suggests only outputs are flushed to zero?  OTOH documentation
> for X * 1 -> X suggests otherwise.  This simplification also suggests to
> make FTZ operations explicit instead of adding a flag?  Thus the BRIG
> FE would emit FTZ (X) * 1 which we can optimize to FTZ (X), and we
> could eventually add a pass optimizing FTZ operations?

Both the inputs and outputs must be flushed to zero in HSAIL’s ‘ftz’
semantics. FTZ operations were previously always “explicit” in the BRIG FE
output, as you propose here: builtin calls were injected for all inputs and
the output of ‘ftz’-marked float HSAIL instructions. This is still provided
as a fallback for targets which do not support a CPU mode flag.

The problem with a special FTZ ‘operation’ of some kind in the generic
output is that the basic optimizations get confused by a new operation, and
we’d need to add knowledge of the ‘FTZ’ operation to a bunch of existing
optimizer code. That seems unnecessary to support this case, as the
optimizations typically also apply for the ‘FTZ semantics’ when the FTZ/DAZ
flag is on.

Thanks,
Pekka


* Re: [PATCH] -fftz-math: assume that denorms _must_ be flushed to zero optimizations
  2017-08-14 11:21   ` Pekka Jääskeläinen
@ 2017-08-14 11:25     ` Richard Biener
  2017-08-14 13:17     ` Joseph Myers
  1 sibling, 0 replies; 7+ messages in thread
From: Richard Biener @ 2017-08-14 11:25 UTC (permalink / raw)
  To: Pekka Jääskeläinen
  Cc: Joseph S. Myers, GCC Patches, Henry Linjamäki, Martin Jambor

On Mon, Aug 14, 2017 at 12:45 PM, Pekka Jääskeläinen <pekka@parmance.com> wrote:
> Hi Richard,
>
> The base idea of the patch is to optimize for the (common) situation
> where FTZ/DAZ
> is controlled by a CPU-wide flag and we then need to only avoid compile-time
> optimizations that assume semantics where denorm handling is on to support
> the ‘forced FTZ/DAZ semantics’.
>
>> This suggests only outputs are flushed to zero?  OTOH documentation
>> for X * 1 -> X suggests otherwise.  This simplification also suggests to
>> make FTZ operations explicit instead of adding a flag?  Thus the BRIG
>> FE would emit FTZ (X) * 1 which we can optimize to FTZ (X), and we
>> could eventually add a pass optimizing FTZ operations?
>
> Both the inputs and outputs must be flushed to zero in the HSAIL’s
> ‘ftz’ semantics.
> FTZ operations were previously always “explicit” in the BRIG FE output, like you
> propose here; there were builtin calls injected for all inputs and the
> output of ‘ftz’-marked
> float HSAIL instructions. This is still provided as a fallback for
> targets which do not
> support a CPU mode flag.

I see.  But how does making them implicit fix cases in the conformance
testsuite?  That is, isn't the error in the runtime implementation of
__hsail_ftz_*?  I'd have used a "simple"

  if (fpclassify (x) == FP_SUBNORMAL)
    return copysign (0, x);
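
The fragment above can be completed into a self-contained function along
the following lines. This is only a sketch: the name flush_subnormal_f32
is hypothetical (the thread mentions the __hsail_ftz_* family without
showing its code), and the demo function merely illustrates the explicit
form discussed in this thread, where FTZ (X) * 1 can be folded to FTZ (X).

```c
#include <assert.h>
#include <math.h>

/* Hypothetical explicit flush helper: a subnormal becomes a zero with the
   same sign, any other value is returned unchanged.  */
static float flush_subnormal_f32(float x)
{
  if (fpclassify(x) == FP_SUBNORMAL)
    return copysignf(0.0f, x);
  return x;
}

static void demo_explicit_ftz(void)
{
  volatile float tiny = 2.87e-42f;   /* a binary32 subnormal */
  float x = tiny;
  assert(fpclassify(x) == FP_SUBNORMAL);

  float flushed = flush_subnormal_f32(x);
  assert(flushed == 0.0f && !signbit(flushed));

  /* Once the flush is explicit, multiplying by one changes nothing, so
     folding FTZ (X) * 1 to FTZ (X) is safe.  */
  assert(flushed * 1.0f == flushed);

  /* The sign of a negative subnormal survives as -0.0.  */
  assert(signbit(flush_subnormal_f32(-x)));

  /* Normal values pass through untouched.  */
  assert(flush_subnormal_f32(1.5f) == 1.5f);
}
```

Depending on the toolchain, linking with -lm may be needed for copysignf.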

> The problem with a special FTZ ‘operation’ of some kind in the generic output is
> that the basic optimizations get confused by a new operation and we’d need to
> add knowledge of the ‘FTZ’ operation to a bunch of existing optimizer
> code, which
> seems unnecessary to support this case as the optimizations typically apply also
> for the ‘FTZ semantics’ when the FTZ/DAZ flag is on.

Apart from the exceptions you needed to guard ... do you have an example of
a transform that is confused by explicit FTZ and that would be valid if that FTZ
were implicit?  An explicit FTZ should be much safer.  I think the builtins
should also be CONST and not only PURE.

Richard.

> Thanks,
> Pekka


* Re: [PATCH] -fftz-math: assume that denorms _must_ be flushed to zero optimizations
  2017-08-14 11:21   ` Pekka Jääskeläinen
  2017-08-14 11:25     ` Richard Biener
@ 2017-08-14 13:17     ` Joseph Myers
  2017-08-22 14:10       ` Pekka Jääskeläinen
  1 sibling, 1 reply; 7+ messages in thread
From: Joseph Myers @ 2017-08-14 13:17 UTC (permalink / raw)
  To: Pekka Jääskeläinen
  Cc: Richard Biener, GCC Patches, Henry Linjamäki, Martin Jambor

[-- Attachment #1: Type: text/plain, Size: 690 bytes --]

On Mon, 14 Aug 2017, Pekka Jääskeläinen wrote:

> Both the inputs and outputs must be flushed to zero in the HSAIL’s
> ‘ftz’ semantics.

Presumably this means that constant folding needs to know about those 
semantics, both for operations with a subnormal floating-point argument 
(whether or not the output is floating point, or floating point in the 
same format), and those with such a result?

Can assignments copy subnormals without converting them to zero?  Should 
comparisons flush input subnormals to zero before comparing?  Should 
conversions e.g. from float to double convert a float subnormal input to 
zero?

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [PATCH] -fftz-math: assume that denorms _must_ be flushed to zero optimizations
  2017-08-14 13:17     ` Joseph Myers
@ 2017-08-22 14:10       ` Pekka Jääskeläinen
  2017-08-22 14:20         ` Richard Biener
  0 siblings, 1 reply; 7+ messages in thread
From: Pekka Jääskeläinen @ 2017-08-22 14:10 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Richard Biener, GCC Patches, Henry Linjamäki, Martin Jambor

[-- Attachment #1: Type: text/plain, Size: 4382 bytes --]

Hi Richard and Joseph,

Replies for both inline:

I wrote:
>> Both the inputs and outputs must be flushed to zero in the HSAIL’s
>> ‘ftz’ semantics.
>> FTZ operations were previously always “explicit” in the BRIG FE output, like you
>> propose here; there were builtin calls injected for all inputs and the
>> output of ‘ftz’-marked
>> float HSAIL instructions. This is still provided as a fallback for
>> targets which do not
>> support a CPU mode flag.

On Mon, Aug 14, 2017 at 1:17 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> I see.  But how does making them implicit fix cases in the conformance
> testsuite?  That is, isn't the error in the runtime implementation of
> __hsail_ftz_*?  I'd have used a "simple" [...]

There are two parts to the story here:

1) Making FTZ/DAZ “the default”, meaning that no builtin calls or similar
are used to flush the operands/results; instead we rely on the runtime
enabling the FTZ/DAZ CPU flags before executing this code. This is purely
a performance optimization, because those FTZ/DAZ builtin calls (three per
HSAIL instruction) ruin the performance for multiple reasons. We have
already implemented this optimization in our staging branch of the BRIG FE.

2) Ensuring GCC does not perform certain compile-time optimizations under
the assumption that FTZ/DAZ is optional, but instead assumes that ftz must
happen for correctness. The proposed patch addresses this part on the
compiler side by disabling the currently known optimizations that would
remove operations which should flush at runtime when “ftz denorm math” is
desired.

>> The problem with a special FTZ ‘operation’ of some kind in the generic output is
>> that the basic optimizations get confused by a new operation and we’d need to
>> add knowledge of the ‘FTZ’ operation to a bunch of existing optimizer
>> code, which
>> seems unnecessary to support this case as the optimizations typically apply also
>> for the ‘FTZ semantics’ when the FTZ/DAZ flag is on.
>
> Apart from the exceptions you needed to guard ... do you have an example of
> a transform that is confused by explicit FTZ and that would be valid if that FTZ
> were implicit?  An explicit FTZ should be much safer.  I think the builtins
> should also be CONST and not only PURE.

Explicit builtin calls ruin many optimizations, starting from simple
common subexpression elimination, if the optimizations don’t understand
what the builtin returns for any given operand. Thus, inlining the builtin
function’s code would be needed first. A lot of code would be inlined due
to the abundance of ftz calls required, and you cannot eliminate it all
(as at compile time you don’t know whether the operand is a denorm or not).
Another approach would be to introduce special cases to the affected
optimizations so that they understand the FTZ builtin and might be able to
remove the useless ones. This potentially touches _a lot_ of code. And in
the end, if the CPU can flush denorms efficiently in hardware (typically
it’s faster to do FTZ in HW than gradual underflow, so this is likely the
case), any builtin call doing it that cannot be optimized away adds
additional, possibly major, runtime overhead.

We tested whether a simple common subexpression elimination case works
with the ftz builtins, and it didn’t. CONST didn’t help here.

However, I understand your concern that there might be optimizations that
still break the FTZ semantics if there are no explicit builtin calls, but
we are prepared to fix them case by case if/when they appear. The attached
updated patch fixes a few additional cases we noticed, e.g. it disables
several constant folding cases.

On Mon, Aug 14, 2017 at 2:30 PM, Joseph Myers <joseph@codesourcery.com> wrote:
> Presumably this means that constant folding needs to know about those
> semantics, both for operations with a subnormal floating-point argument
> (whether or not the output is floating point, or floating point in the
> same format), and those with such a result?
> Can assignments copy subnormals without converting them to zero?  Should
> comparisons flush input subnormals to zero before comparing?  Should
> conversions e.g. from float to double convert a float subnormal input to
> zero?

I can answer yes to all of these questions.
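
A small software sketch of what those answers imply, assuming a model in
which each operation flushes its subnormal inputs: assignments copy
subnormals untouched, while comparisons and float-to-double conversions
see zero. The helper names (flush_f32, ftz_less, ftz_extend,
demo_compare_convert) are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: flush a subnormal to a same-signed zero.  */
static float flush_f32(float x)
{
  if (fpclassify(x) == FP_SUBNORMAL)
    return signbit(x) ? -0.0f : 0.0f;
  return x;
}

/* Model of a comparison that flushes its inputs before comparing.  */
static int ftz_less(float a, float b)
{
  return flush_f32(a) < flush_f32(b);
}

/* Model of a float -> double conversion that flushes a subnormal input.  */
static double ftz_extend(float a)
{
  return (double) flush_f32(a);
}

static void demo_compare_convert(void)
{
  volatile float source = 2.87e-42f;  /* a binary32 subnormal */
  float tiny = source;

  /* A plain assignment copies the subnormal without flushing it.  */
  float copy = tiny;
  assert(fpclassify(copy) == FP_SUBNORMAL);

  /* Without flushing, 0 < tiny holds; with flushed inputs it does not.  */
  assert(0.0f < tiny);
  assert(!ftz_less(0.0f, tiny));

  /* Without flushing, the conversion is exact and nonzero; with a flushed
     input the result is zero.  */
  assert((double) tiny != 0.0);
  assert(ftz_extend(tiny) == 0.0);
}
```

This matches the expectations in the attached test, where a store/load
keeps the subnormal but (double) x is expected to be zero.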

BR,
Pekka

[-- Attachment #2: ftz-math-v2.patch --]
[-- Type: text/x-patch, Size: 21524 bytes --]

From 0b97ccde3ec837329b4c551ccd7f98c074ca7a7b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= <henry.linjamaki@parmance.com>
Date: Mon, 24 Jul 2017 09:28:00 +0300
Subject: [PATCH] Added common -fftz-math flag

With the flag set, the compiler assumes that floating-point
operations must flush incoming and resulting subnormal floating-point
values to zero.
---
 gcc/common.opt                  |   5 +
 gcc/doc/invoke.texi             |  11 ++
 gcc/fold-const-call.c           |   9 +-
 gcc/fold-const.c                |  22 +++
 gcc/match.pd                    |  14 +-
 gcc/simplify-rtx.c              |  30 +++-
 gcc/testsuite/gcc.dg/ftz-math.c | 330 ++++++++++++++++++++++++++++++++++++++++
 7 files changed, 405 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ftz-math.c

diff --git a/gcc/common.opt b/gcc/common.opt
index 13305558d2d..fd77d00d814 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2266,6 +2266,11 @@ fsingle-precision-constant
 Common Report Var(flag_single_precision_constant) Optimization
 Convert floating point constants to single precision constants.
 
+fftz-math
+Common Report Var(flag_ftz_math) Optimization
+Optimizations handle floating-point operations as they must flush
+subnormal floating-point values to zero.
+
 fsplit-ivs-in-unroller
 Common Report Var(flag_split_ivs_in_unroller) Init(1) Optimization
 Split lifetimes of induction variables when loops are unrolled.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index a6ce483d890..c3da6c8ebe3 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9330,6 +9330,17 @@ The default is @option{-fno-signaling-nans}.
 This option is experimental and does not currently guarantee to
 disable all GCC optimizations that affect signaling NaN behavior.
 
+@item -fftz-math
+@opindex ftz-math
+This option is experimental. With this flag on GCC treats
+floating-point operations (except abs, classify, copysign and
+negation) as they must flush subnormal input operands and results to
+zero (FTZ). The FTZ rules are derived from HSA Programmers Reference
+Manual for the base profile. This alters optimizations that would
+break the rules, for example X * 1 -> X simplification. The option
+assumes the target supports FTZ in hardware and has it enabled -
+either by default or set by the user.
+
 @item -fno-fp-int-builtin-inexact
 @opindex fno-fp-int-builtin-inexact
 Do not allow the built-in functions @code{ceil}, @code{floor},
diff --git a/gcc/fold-const-call.c b/gcc/fold-const-call.c
index 381cb7fd290..21715f090da 100644
--- a/gcc/fold-const-call.c
+++ b/gcc/fold-const-call.c
@@ -1049,7 +1049,8 @@ fold_const_call_1 (combined_fn fn, tree type, tree arg)
   if (real_cst_p (arg))
     {
       gcc_checking_assert (SCALAR_FLOAT_MODE_P (arg_mode));
-      if (mode == arg_mode)
+      /* For -fftz-math subnormals are not folded correctly.  */
+      if (mode == arg_mode && !flag_ftz_math)
 	{
 	  /* real -> real.  */
 	  REAL_VALUE_TYPE result;
@@ -1299,7 +1300,8 @@ fold_const_call_1 (combined_fn fn, tree type, tree arg0, tree arg1)
       && real_cst_p (arg1))
     {
       gcc_checking_assert (SCALAR_FLOAT_MODE_P (arg0_mode));
-      if (mode == arg0_mode)
+      /* For -fftz-math subnormals are not folded correctly.  */
+      if (mode == arg0_mode && !flag_ftz_math)
 	{
 	  /* real, real -> real.  */
 	  REAL_VALUE_TYPE result;
@@ -1494,7 +1496,8 @@ fold_const_call_1 (combined_fn fn, tree type, tree arg0, tree arg1, tree arg2)
       && real_cst_p (arg2))
     {
       gcc_checking_assert (SCALAR_FLOAT_MODE_P (arg0_mode));
-      if (mode == arg0_mode)
+      /* For -fftz-math subnormals are not folded correctly.  */
+      if (mode == arg0_mode && !flag_ftz_math)
 	{
 	  /* real, real, real -> real.  */
 	  REAL_VALUE_TYPE result;
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index f6d5af43b33..1b19bc93248 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -1152,6 +1152,11 @@ const_binop (enum tree_code code, tree arg1, tree arg2)
       bool inexact;
       tree t, type;
 
+      /* For ftz-math disable all floating point constant folding for
+	 now.  */
+      if (flag_ftz_math)
+	return NULL_TREE;
+
       /* The following codes are handled by real_arithmetic.  */
       switch (code)
 	{
@@ -2000,6 +2005,10 @@ fold_convert_const_real_from_real (tree type, const_tree arg1)
       && REAL_VALUE_ISSIGNALING_NAN (TREE_REAL_CST (arg1)))
     return NULL_TREE; 
 
+  /* For ftz-math constant folding is disabled for now.  */
+  if (flag_ftz_math)
+    return NULL_TREE;
+
   real_convert (&value, TYPE_MODE (type), &TREE_REAL_CST (arg1));
   t = build_real (type, value);
 
@@ -6479,6 +6488,10 @@ fold_real_zero_addition_p (const_tree type, const_tree addend, int negate)
   if (!real_zerop (addend))
     return false;
 
+  /* X +/- 0 flushes subnormals to zero but plain X does not.  */
+  if (flag_ftz_math)
+    return false;
+
   /* Don't allow the fold with -fsignaling-nans.  */
   if (HONOR_SNANS (element_mode (type)))
     return false;
@@ -9117,6 +9130,11 @@ fold_binary_loc (location_t loc,
   arg0 = op0;
   arg1 = op1;
 
+  /* For ftz-math disable all floating point constant folding for
+     now.  */
+  if (flag_ftz_math && FLOAT_TYPE_P (type))
+    return NULL_TREE;
+
   /* Strip any conversions that don't change the mode.  This is
      safe for every expression, except for a comparison expression
      because its signedness is derived from its operands.  So, in
@@ -13831,6 +13849,10 @@ fold_relational_const (enum tree_code code, tree type, tree op0, tree op1)
 
   if (TREE_CODE (op0) == REAL_CST && TREE_CODE (op1) == REAL_CST)
     {
+      /* For ftz-math disable all constant folding for now.  */
+      if (flag_ftz_math)
+	return NULL_TREE;
+
       const REAL_VALUE_TYPE *c0 = TREE_REAL_CST_PTR (op0);
       const REAL_VALUE_TYPE *c1 = TREE_REAL_CST_PTR (op1);
 
diff --git a/gcc/match.pd b/gcc/match.pd
index 80a17ba3d23..c4e8eefe0c1 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -129,6 +129,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (simplify
  (mult @0 real_onep)
  (if (!HONOR_SNANS (type)
+      && !flag_ftz_math
       && (!HONOR_SIGNED_ZEROS (type)
           || !COMPLEX_FLOAT_TYPE_P (type)))
   (non_lvalue @0)))
@@ -137,6 +138,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (simplify
  (mult @0 real_minus_onep)
   (if (!HONOR_SNANS (type)
+       && !flag_ftz_math
        && (!HONOR_SIGNED_ZEROS (type)
            || !COMPLEX_FLOAT_TYPE_P (type)))
    (negate @0)))
@@ -240,13 +242,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* In IEEE floating point, x/1 is not equivalent to x for snans.  */
 (simplify
  (rdiv @0 real_onep)
- (if (!HONOR_SNANS (type))
+ (if (!HONOR_SNANS (type) && !flag_ftz_math)
   (non_lvalue @0)))
 
 /* In IEEE floating point, x/-1 is not equivalent to -x for snans.  */
 (simplify
  (rdiv @0 real_minus_onep)
- (if (!HONOR_SNANS (type))
+ (if (!HONOR_SNANS (type) && !flag_ftz_math)
   (negate @0)))
 
 (if (flag_reciprocal_math)
@@ -1394,7 +1396,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (for minmax (min max FMIN FMAX)
  (simplify
   (minmax @0 @0)
-  @0))
+   (if (FLOAT_TYPE_P (type) && !flag_ftz_math)
+    @0)))
 /* min(max(x,y),y) -> y.  */
 (simplify
  (min:c (max:c @0 @1) @1)
@@ -1853,7 +1856,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 	  || (GENERIC
 	      && TYPE_MAIN_VARIANT (type) == TYPE_MAIN_VARIANT (inside_type)))
 	 && (((inter_int || inter_ptr) && final_int)
-	     || (inter_float && final_float))
+	     || (inter_float && final_float && !flag_ftz_math))
 	 && inter_prec >= final_prec)
      (ocvt @0))
 
@@ -1862,7 +1865,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
        former is wider than the latter and doesn't change the signedness
        (for integers).  Avoid this if the final type is a pointer since
        then we sometimes need the middle conversion.  */
-    (if (((inter_int && inside_int) || (inter_float && inside_float))
+    (if (((inter_int && inside_int) || (inter_float && inside_float
+					&& !flag_ftz_math))
 	 && (final_int || final_float)
 	 && inter_prec >= inside_prec
 	 && (inter_float || inter_unsignedp == inside_unsignedp))
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 7cab26a0e34..3c904cdefd6 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -1240,8 +1240,12 @@ simplify_unary_operation_1 (enum rtx_code code, machine_mode mode, rtx op)
       if (DECIMAL_FLOAT_MODE_P (mode))
 	break;
 
-      /* (float_truncate:SF (float_extend:DF foo:SF)) = foo:SF.  */
-      if (GET_CODE (op) == FLOAT_EXTEND
+      /* (float_truncate:SF (float_extend:DF foo:SF)) = foo:SF except
+	 for -fftz-math with subnormal input. Simplifications like
+	 this must be prevented as they no longer perform
+	 flush-to-zero as required by the semantics of -fftz-math
+	 flag.  */
+      if (!flag_ftz_math && GET_CODE (op) == FLOAT_EXTEND
 	  && GET_MODE (XEXP (op, 0)) == mode)
 	return XEXP (op, 0);
 
@@ -1891,14 +1895,16 @@ simplify_const_unary_operation (enum rtx_code code, machine_mode mode,
 	case FLOAT_TRUNCATE:
 	  /* Don't perform the operation if flag_signaling_nans is on
 	     and the operand is a signaling NaN.  */
-	  if (HONOR_SNANS (mode) && REAL_VALUE_ISSIGNALING_NAN (d))
+	  if ((HONOR_SNANS (mode) && REAL_VALUE_ISSIGNALING_NAN (d))
+	      || flag_ftz_math)
 	    return NULL_RTX;
 	  d = real_value_truncate (mode, d);
 	  break;
 	case FLOAT_EXTEND:
 	  /* Don't perform the operation if flag_signaling_nans is on
 	     and the operand is a signaling NaN.  */
-	  if (HONOR_SNANS (mode) && REAL_VALUE_ISSIGNALING_NAN (d))
+	  if ((HONOR_SNANS (mode) && REAL_VALUE_ISSIGNALING_NAN (d))
+	       || flag_ftz_math)
 	    return NULL_RTX;
 	  /* All this does is change the mode, unless changing
 	     mode class.  */
@@ -2137,7 +2143,8 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
 	 when x is NaN, infinite, or finite and nonzero.  They aren't
 	 when x is -0 and the rounding mode is not towards -infinity,
 	 since (-0) + 0 is then 0.  */
-      if (!HONOR_SIGNED_ZEROS (mode) && trueop1 == CONST0_RTX (mode))
+      if (!HONOR_SIGNED_ZEROS (mode) && !flag_ftz_math
+	  && trueop1 == CONST0_RTX (mode))
 	return op0;
 
       /* ((-a) + b) -> (b - a) and similarly for (a + (-b)).  These
@@ -2342,8 +2349,9 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
       /* Subtracting 0 has no effect unless the mode has signed zeros
 	 and supports rounding towards -infinity.  In such a case,
 	 0 - 0 is -0.  */
-      if (!(HONOR_SIGNED_ZEROS (mode)
-	    && HONOR_SIGN_DEPENDENT_ROUNDING (mode))
+      if (!((HONOR_SIGNED_ZEROS (mode)
+	     && HONOR_SIGN_DEPENDENT_ROUNDING (mode))
+	    || flag_ftz_math)
 	  && trueop1 == CONST0_RTX (mode))
 	return op0;
 
@@ -2558,6 +2566,7 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
       /* In IEEE floating point, x*1 is not equivalent to x for
 	 signalling NaNs.  */
       if (!HONOR_SNANS (mode)
+	  && (FLOAT_MODE_P (mode) && !flag_ftz_math)
 	  && trueop1 == CONST1_RTX (mode))
 	return op0;
 
@@ -4001,6 +4010,10 @@ simplify_const_binary_operation (enum rtx_code code, machine_mode mode,
 	  const REAL_VALUE_TYPE *opr0, *opr1;
 	  bool inexact;
 
+	  /* Subnormals are not handled correctly with -fftz-math.  */
+	  if (flag_ftz_math)
+	    return 0;
+
 	  opr0 = CONST_DOUBLE_REAL_VALUE (op0);
 	  opr1 = CONST_DOUBLE_REAL_VALUE (op1);
 
@@ -5083,7 +5096,8 @@ simplify_const_relational_operation (enum rtx_code code,
 
   /* If the operands are floating-point constants, see if we can fold
      the result.  */
-  if (CONST_DOUBLE_AS_FLOAT_P (trueop0)
+  if (!flag_ftz_math
+      && CONST_DOUBLE_AS_FLOAT_P (trueop0)
       && CONST_DOUBLE_AS_FLOAT_P (trueop1)
       && SCALAR_FLOAT_MODE_P (GET_MODE (trueop0)))
     {
diff --git a/gcc/testsuite/gcc.dg/ftz-math.c b/gcc/testsuite/gcc.dg/ftz-math.c
new file mode 100644
index 00000000000..f782515a044
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ftz-math.c
@@ -0,0 +1,330 @@
+/* Tests -fftz-math flag */
+/* { dg-do run { target x86_64-*-* } } */
+/* { dg-options "-O2 -fftz-math" } */
+
+#include <math.h>
+
+/* #define DEBUG_TEST */
+#ifdef DEBUG_TEST
+#  include <stdio.h>
+#endif
+
+#include "xmmintrin.h"
+#include "pmmintrin.h"
+
+union uf
+{
+  unsigned int u;
+  float f;
+};
+
+union ud
+{
+  unsigned long long u;
+  double d;
+};
+
+static unsigned int
+f2u (float v)
+{
+  union uf u;
+  u.f = v;
+  return u.u;
+}
+
+static unsigned long long
+d2u (double v)
+{
+  union ud u;
+  u.d = v;
+  return u.u;
+}
+
+
+static void
+enable_ftz_mode ()
+{
+  _MM_SET_FLUSH_ZERO_MODE (_MM_FLUSH_ZERO_ON);
+  _MM_SET_DENORMALS_ZERO_MODE (_MM_DENORMALS_ZERO_ON);
+}
+
+static int
+test_sf_is_zero (float x)
+{
+  /* FTZ mode is on, must do bitwise ops for zero test. */
+  return ((f2u (x) & 0x7fffffffu) == 0u);
+}
+
+static int
+test_df_is_zero (double x)
+{
+  /* FTZ mode is on, must do bitwise ops for zero test. */
+  return ((d2u (x) & 0x7fffffffffffffffull) == 0ull);
+}
+
+static int
+test_sf_is_subnormal (float x) {
+  unsigned int u = f2u (x);
+  if (u & 0x7f800000u)
+    return 0;
+  return (u & 0x007fffffu);
+}
+
+static int
+test_df_is_subnormal (double x) {
+  unsigned long long u = d2u (x);
+  if (u & 0x7ff0000000000000ull)
+    return 0;
+  return (u & 0x000fffffffffffffull);
+}
+
+#ifdef DEBUG_TEST
+void err_print (unsigned line, const char* expr)
+{
+  printf ("Line %d: FAIL: %s\n", line, expr);
+  abort ();
+}
+#  define TEST_SF_IS_ZERO(expr) \
+  if (!test_sf_is_zero (expr)) err_print (__LINE__, #expr)
+#  define TEST_SF_IS_SUBNORMAL(expr) \
+  if (!test_sf_is_subnormal (expr)) err_print (__LINE__, #expr)
+#  define TEST_DF_IS_ZERO(expr) \
+  if (!test_df_is_zero (expr)) err_print (__LINE__, #expr)
+#  define TEST_DF_IS_SUBNORMAL(expr) \
+  if (!test_df_is_subnormal (expr)) err_print (__LINE__, #expr)
+#  define TEST_TRUE(expr) if (!(expr)) err_print (__LINE__, #expr)
+#else
+#  define TEST_SF_IS_ZERO(expr) if (!test_sf_is_zero (expr)) abort ()
+#  define TEST_SF_IS_SUBNORMAL(expr) if (!test_sf_is_subnormal (expr)) abort ()
+#  define TEST_DF_IS_ZERO(expr) if (!test_df_is_zero (expr)) abort ()
+#  define TEST_DF_IS_SUBNORMAL(expr) if (!test_df_is_subnormal (expr)) abort ()
+#  define TEST_TRUE(expr) if (!(expr)) abort ()
+#endif
+
+volatile float sf;
+volatile double df;
+
+int
+main ()
+{
+  enable_ftz_mode ();
+
+  /* Circulate through volatile to avoid constant folding. */
+  sf = 2.87E-42f; /* = subnormal */
+  float x = sf;
+
+  TEST_SF_IS_SUBNORMAL (x); /* Store/load should not flush. */
+  TEST_TRUE (!isnormal (x));
+  TEST_TRUE (fpclassify (x) == FP_SUBNORMAL);
+
+  TEST_DF_IS_ZERO ((double) x);
+
+  /* Test the expression is not simplified to plain x, thus, leaking the
+     subnormal. */
+  TEST_SF_IS_ZERO (x * 1);
+  TEST_SF_IS_ZERO (x * -1);
+  TEST_SF_IS_ZERO (x * 0.5);
+
+  TEST_SF_IS_ZERO (x / 1);
+  TEST_SF_IS_ZERO (x / -1);
+  TEST_SF_IS_ZERO (x / 2);
+
+  TEST_SF_IS_ZERO (fminf (x, x));
+  TEST_SF_IS_ZERO (fminf (x, -x));
+  TEST_SF_IS_ZERO (fmaxf (x, x));
+  TEST_SF_IS_ZERO (fmaxf (x, -x));
+
+  TEST_SF_IS_ZERO (x + 0);
+  TEST_SF_IS_ZERO (0 - x);
+
+  TEST_SF_IS_ZERO (x - 0);
+  TEST_SF_IS_ZERO (x + x);
+
+  TEST_SF_IS_ZERO (x + 0.0f);
+  TEST_SF_IS_ZERO (0.0f - x);
+  TEST_SF_IS_ZERO (x - 0.0f);
+  TEST_SF_IS_ZERO (x + x);
+
+  TEST_SF_IS_ZERO (x * copysignf (1.0f, x));
+  TEST_SF_IS_ZERO (x * copysignf (1.0f, -x));
+
+  float y = sf;
+  TEST_SF_IS_ZERO (fminf (fmaxf (x, y), y));
+
+  TEST_SF_IS_SUBNORMAL (x == y ? x : y);
+  TEST_SF_IS_SUBNORMAL (x != y ? x : y);
+  TEST_SF_IS_SUBNORMAL (x >= y ? x : y);
+  TEST_SF_IS_SUBNORMAL (x > y ? x : y);
+  TEST_SF_IS_SUBNORMAL (x <= y ? x : y);
+  TEST_SF_IS_SUBNORMAL (x < y ? x : y);
+
+  /* FP ops that should not flush. */
+  TEST_SF_IS_SUBNORMAL (fabsf (x));
+  TEST_SF_IS_SUBNORMAL (x < 0 ? -x : x);
+  TEST_SF_IS_SUBNORMAL (-x);
+  TEST_SF_IS_SUBNORMAL (copysignf (x, -1.0));
+
+  /* Test constant folding with subnormal values. */
+  TEST_TRUE (!isnormal (2.87E-42f));
+
+  TEST_SF_IS_SUBNORMAL (-(2.87E-42f));
+  TEST_SF_IS_SUBNORMAL (fabsf (-2.87E-42f));
+  TEST_SF_IS_SUBNORMAL (copysignf (2.87E-42f, -1.0));
+
+  TEST_SF_IS_ZERO (fminf (2.87E-42f, 2.87E-42f));
+  TEST_SF_IS_ZERO (fminf (2.87E-42f, -5.74E-42f));
+  TEST_SF_IS_ZERO (fmaxf (2.87E-42f, 2.87E-42f));
+  TEST_SF_IS_ZERO (fmaxf (2.87E-42f, -5.74E-42f));
+
+  TEST_SF_IS_ZERO (floorf (-2.87E-42f));
+  TEST_SF_IS_ZERO (ceilf (2.87E-42f));
+
+  TEST_SF_IS_ZERO (sqrtf (2.82E-42f));
+
+  TEST_SF_IS_ZERO (2.87E-42f + 0.0f);
+  TEST_SF_IS_ZERO (2.87E-42f + 5.74E-42f);
+  TEST_SF_IS_ZERO (2.87E-42f - 0.0f);
+  TEST_SF_IS_ZERO (0.0f - 2.87E-42f);
+  TEST_SF_IS_ZERO (2.87E-42f * 1.0f);
+  TEST_SF_IS_ZERO (2.87E-42f * -1.0f);
+  TEST_SF_IS_ZERO (2.87E-42f * 12.3f);
+  TEST_SF_IS_ZERO (2.87E-42f / 1.0f);
+  TEST_SF_IS_ZERO (2.87E-42f / 12.3f);
+
+  TEST_TRUE (2.87E-42f == -5.74E-42f);
+  TEST_TRUE (2.87E-42f == -5.74E-42f ? 1 : 0);
+
+  TEST_TRUE (2.87E-42f == -5.74E-42f ? 1 : 0);
+  TEST_TRUE (2.87E-42f != -5.74E-42f ? 0 : 1);
+  TEST_TRUE (2.87E-42f >= -5.74E-42f ? 1 : 0);
+  TEST_TRUE (2.87E-42f >= 5.74E-42f ? 1 : 0);
+  TEST_TRUE (2.87E-42f > -5.74E-42f ? 0 : 1);
+  TEST_TRUE (2.87E-42f > 5.74E-42f ? 0 : 1);
+  TEST_TRUE (2.87E-42f <= -5.74E-42f ? 1 : 0);
+  TEST_TRUE (2.87E-42f <= 5.74E-42f ? 1 : 0);
+  TEST_TRUE (2.87E-42f < -5.74E-42f ? 0 : 1);
+  TEST_TRUE (2.87E-42f < 5.74E-42f ? 0 : 1);
+
+  /* A < B ? A : B -> min (B, A) must not happen (min flushes to zero).  */
+  TEST_SF_IS_SUBNORMAL (2.87E-42f < -5.74E-42f ? 2.87E-42f : -5.74E-42f);
+
+  /* Normal and subnormal input. */
+  TEST_TRUE ((2.87E-42f + 1.1754944E-38f) == 1.1754944E-38f);
+  TEST_TRUE ((1.1754944E-38f - 2.87E-42f) == 1.1754944E-38f);
+
+  /* Expression with normal numbers. Result of the subtraction is
+     subnormal. */
+  float sf_tmp = (1.469368E-38f - 1.1754944E-38f) + 1.1754944E-38f;
+  TEST_TRUE (sf_tmp == 1.1754944E-38f);
+
+
+  /*** Test with double precision. ***/
+  df = 5.06E-321;
+  double dx = df;
+
+  TEST_DF_IS_SUBNORMAL (dx);
+  TEST_TRUE (!isnormal (dx));
+  TEST_TRUE (fpclassify (dx) == FP_SUBNORMAL);
+
+  TEST_SF_IS_ZERO ((float)dx);
+
+  TEST_DF_IS_ZERO (dx * 1);
+  TEST_DF_IS_ZERO (dx * -1);
+  TEST_DF_IS_ZERO (dx * 0.5);
+
+  TEST_DF_IS_ZERO (dx / 1);
+  TEST_DF_IS_ZERO (dx / -1);
+  TEST_DF_IS_ZERO (dx / 2);
+
+  TEST_DF_IS_ZERO (fmin (dx, dx));
+  TEST_DF_IS_ZERO (fmin (dx, -dx));
+  TEST_DF_IS_ZERO (fmax (dx, dx));
+  TEST_DF_IS_ZERO (fmax (dx, -dx));
+
+  TEST_DF_IS_ZERO (dx + 0);
+  TEST_DF_IS_ZERO (0 - dx);
+  TEST_DF_IS_ZERO (dx - 0);
+  TEST_DF_IS_ZERO (dx + dx);
+
+  TEST_DF_IS_ZERO (dx + 0.0);
+  TEST_DF_IS_ZERO (0.0 - dx);
+  TEST_DF_IS_ZERO (dx - 0.0);
+
+  TEST_DF_IS_ZERO (dx * copysign (1.0, dx));
+  TEST_DF_IS_ZERO (dx * copysign (1.0, -dx));
+
+  df = -1.61895E-319;
+  double dy = df;
+  TEST_DF_IS_ZERO (fmin (fmax (dx, dy), dy));
+
+  TEST_DF_IS_SUBNORMAL (dx == dy ? dx : dy);
+  TEST_DF_IS_SUBNORMAL (dx != dy ? dx : dy);
+  TEST_DF_IS_SUBNORMAL (dx >= dy ? dx : dy);
+  TEST_DF_IS_SUBNORMAL (dx > dy ? dx : dy);
+  TEST_DF_IS_SUBNORMAL (dx <= dy ? dx : dy);
+  TEST_DF_IS_SUBNORMAL (dx < dy ? dx : dy);
+
+  /* FP ops that should not flush. */
+  TEST_DF_IS_SUBNORMAL (fabs (dx));
+  TEST_DF_IS_SUBNORMAL (dx < 0 ? -dx : dx);
+  TEST_DF_IS_SUBNORMAL (-dx);
+  TEST_DF_IS_SUBNORMAL (copysign (dx, -1.0));
+
+  /* Test constant folding with subnormal values. */
+
+  TEST_TRUE (!isnormal (5.06E-321));
+  TEST_TRUE (fpclassify (5.06E-321) == FP_SUBNORMAL);
+
+  TEST_DF_IS_SUBNORMAL (-(5.06E-321));
+  TEST_DF_IS_SUBNORMAL (fabs (-5.06E-321));
+  TEST_DF_IS_SUBNORMAL (copysign (5.06E-321, -1.0));
+
+  TEST_DF_IS_ZERO (fmin (5.06E-321, 5.06E-321));
+  TEST_DF_IS_ZERO (fmin (5.06E-321, -1.61895E-319));
+  TEST_DF_IS_ZERO (fmax (5.06E-321, 5.06E-321));
+  TEST_DF_IS_ZERO (fmax (5.06E-321, -1.61895E-319));
+
+  TEST_DF_IS_ZERO (floor (-5.06E-321));
+  TEST_DF_IS_ZERO (ceil (5.06E-321));
+  TEST_DF_IS_ZERO (sqrt (2.82E-42f));
+
+  TEST_DF_IS_ZERO (5.06E-321 + 0.0);
+  TEST_DF_IS_ZERO (5.06E-321 + 1.61895E-319);
+  TEST_DF_IS_ZERO (5.06E-321 - 0.0);
+  TEST_DF_IS_ZERO (0.0 - 5.06E-321);
+  TEST_DF_IS_ZERO (5.06E-321 * 1.0);
+  TEST_DF_IS_ZERO (5.06E-321 * -1.0);
+  TEST_DF_IS_ZERO (5.06E-321 * 12.3);
+  TEST_DF_IS_ZERO (5.06E-321 / 1.0);
+  TEST_DF_IS_ZERO (5.06E-321 / 12.3);
+
+  TEST_TRUE (5.06E-321 == -1.61895E-319);
+
+  TEST_TRUE (5.06E-321 == -1.61895E-319 ? 1 : 0);
+  TEST_TRUE (5.06E-321 == -1.61895E-319 ? 1 : 0);
+  TEST_TRUE (5.06E-321 != -1.61895E-319 ? 0 : 1);
+  TEST_TRUE (5.06E-321 >= -1.61895E-319 ? 1 : 0);
+  TEST_TRUE (5.06E-321 >= 1.61895E-319 ? 1 : 0);
+  TEST_TRUE (5.06E-321 > -1.61895E-319 ? 0 : 1);
+  TEST_TRUE (5.06E-321 > 1.61895E-319 ? 0 : 1);
+  TEST_TRUE (5.06E-321 <= -1.61895E-319 ? 1 : 0);
+  TEST_TRUE (5.06E-321 <= 1.61895E-319 ? 1 : 0);
+  TEST_TRUE (5.06E-321 < -1.61895E-319 ? 0 : 1);
+  TEST_TRUE (5.06E-321 < 1.61895E-319 ? 0 : 1);
+
+  /* A < B ? A : B -> min (B, A) must not happen (min flushes to zero).  */
+  TEST_DF_IS_SUBNORMAL (5.06E-321 < -1.61895E-319 ? 5.06E-321 : -1.61895E-319);
+
+  /* Normal and subnormal input. */
+  TEST_TRUE ((5.06E-321 + 3.33E-308) == 3.33E-308);
+  TEST_TRUE ((3.33E-308 - 5.06E-321) == 3.33E-308);
+
+  /* Expression with normal numbers. Result of the subtraction is
+     subnormal. */
+  double df_a = 3.33E-308;
+  double df_b = 2.78E-308;
+  double df_tmp = (df_a - df_b) + df_b;
+  TEST_TRUE (df_tmp == df_b);
+
+  return 0;
+}


* Re: [PATCH] -fftz-math: assume that denorms _must_ be flushed to zero optimizations
  2017-08-22 14:10       ` Pekka Jääskeläinen
@ 2017-08-22 14:20         ` Richard Biener
  0 siblings, 0 replies; 7+ messages in thread
From: Richard Biener @ 2017-08-22 14:20 UTC (permalink / raw)
  To: Pekka Jääskeläinen
  Cc: Joseph Myers, GCC Patches, Henry Linjamäki, Martin Jambor

On Tue, Aug 22, 2017 at 3:28 PM, Pekka Jääskeläinen <pekka@parmance.com> wrote:
> Hi Richard and Joseph,
>
> Replies for both inline:
>
> I wrote:
>>> Both the inputs and outputs must be flushed to zero in the HSAIL’s
>>> ‘ftz’ semantics. FTZ operations were previously always “explicit” in
>>> the BRIG FE output, like you propose here; there were builtin calls
>>> injected for all inputs and the output of ‘ftz’-marked float HSAIL
>>> instructions. This is still provided as a fallback for targets which
>>> do not support a CPU mode flag.
>
> On Mon, Aug 14, 2017 at 1:17 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> I see.  But how does making them implicit fix cases in the conformance
>> testsuite?  That is, isn't the error in the runtime implementation of
>> __hsail_ftz_*?  I'd have used a "simple" [...]
>
> There are two parts in the story here:
>
> 1) Making FTZ/DAZ “the default”: no builtin calls or similar are used to
> flush the operands/results; instead we rely on the runtime switching on
> the FTZ/DAZ CPU flags before executing this code. This is purely a
> performance optimization, because those FTZ/DAZ builtin calls (three per
> HSAIL instruction) ruin the performance for multiple reasons. We have
> already implemented this optimization in our staging branch of the
> BRIG FE.
>
> 2) Ensuring GCC does not perform certain compile-time optimizations on
> the assumption that FTZ/DAZ is optional, but instead assumes that the
> flush must happen for correctness. The proposed patch addresses this
> part on the compiler side by disabling the currently known optimizations
> that would bypass a flush which should happen at runtime when “ftz
> denorm math” is desired.
>
>>> The problem with a special FTZ ‘operation’ of some kind in the generic
>>> output is that the basic optimizations get confused by a new operation
>>> and we’d need to add knowledge of the ‘FTZ’ operation to a bunch of
>>> existing optimizer code, which seems unnecessary to support this case
>>> as the optimizations typically apply also for the ‘FTZ semantics’ when
>>> the FTZ/DAZ flag is on.
>>
>> Apart from the exceptions you needed to guard ... do you have an example of
>> a transform that is confused by explicit FTZ and that would be valid if that FTZ
>> were implicit?  An explicit FTZ should be much safer.  I think the builtins
>> should also be CONST and not only PURE.
>
> Explicit builtin calls ruin many optimizations, starting with simple
> common subexpression elimination, if the optimizations don’t understand
> what the builtin returns for a given operand.

Calls to const functions are CSEd just fine (if they are passed the same
argument, that is).

int __attribute__((const)) foo (int i);

int main()
{
  return foo(1) + foo(1);
}

results in 2 * foo (1).

Note that I expected FTZ to be a tree code and not a builtin.  The target
can then choose to simply elide all FTZ.  Constant folding can then
also correctly handle FTZ in the places where it is relevant.
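
As a rough illustration of what an explicit, side-effect-free FTZ operation
computes, here is a minimal C sketch. It is not code from the patch or from
the HSAIL runtime; the names f32_bits, ftz_f32, ftz_mul_f32 and
ftz_self_check are hypothetical. It assumes IEEE binary32 floats and shows
the "flush both inputs, then flush the result" wrapping of one operation:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit pattern of V, for sign-exact checks that do no FP arithmetic.  */
static uint32_t
f32_bits (float v)
{
  uint32_t bits;
  memcpy (&bits, &v, sizeof bits);
  return bits;
}

/* Hypothetical helper (an assumption, not from the patch): software
   flush-to-zero.  An all-zero exponent field means zero or subnormal;
   only the sign bit is kept in that case.  */
static float __attribute__ ((const))
ftz_f32 (float v)
{
  uint32_t bits = f32_bits (v);
  if ((bits & 0x7f800000u) == 0)
    bits &= 0x80000000u;
  memcpy (&v, &bits, sizeof bits);
  return v;
}

/* Explicit form of one 'ftz' multiply: flush both inputs, then the
   result, i.e. three flush operations for a single instruction.  */
static float
ftz_mul_f32 (float a, float b)
{
  return ftz_f32 (ftz_f32 (a) * ftz_f32 (b));
}

/* Small self-check; volatile keeps the inputs out of constant folding.  */
static void
ftz_self_check (void)
{
  volatile float sub = 2.87E-42f;             /* subnormal */
  volatile float min_normal = 1.1754944E-38f; /* smallest normal float */
  float x = sub, n = min_normal;

  assert (f32_bits (ftz_f32 (-x)) == 0x80000000u);          /* -0.0f */
  assert (f32_bits (ftz_mul_f32 (x, 1.0f)) == 0u);          /* input flushed */
  assert (f32_bits (ftz_mul_f32 (n, 0.5f)) == 0u);          /* result flushed */
  assert (f32_bits (ftz_mul_f32 (n, 2.0f)) == 0x01000000u); /* normal kept */
}
```

Because ftz_f32 depends only on its argument, it can carry the const
attribute, and a target with a hardware FTZ/DAZ mode could in principle drop
every such flush. Whether the middle end would represent this as a tree code
or as a builtin is exactly the open question in this thread.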

> Thus, the builtin function’s code would need to be inlined first, and a
> lot of code would be inlined due to the abundance of ftz calls required,
> and you cannot eliminate it all (at compile time you don’t know whether
> the operand is a denorm or not). Another approach would be to introduce
> special cases to the affected optimizations so that they understand the
> FTZ builtin and might be able to remove the useless ones. This
> potentially touches _a lot_ of code. And in the end, if the CPU can flush
> denorms efficiently in hardware (typically it’s faster to do FTZ in HW
> than gradual underflow, so this is likely the case), any builtin call to
> do it that cannot be optimized away adds possibly major runtime overhead.

Understood.

> We tested whether a simple common subexpression elimination case works
> with the ftz builtins, and it didn’t. CONST didn’t help here.
>
> However, I understand your concern that there might be optimizations that
> still break the FTZ semantics if there are no explicit builtin calls, but
> we are prepared to fix them case by case if/when they appear. The attached
> updated patch fixes a few additional cases we noticed, e.g. it disables
> several constant folding cases.
>
> On Mon, Aug 14, 2017 at 2:30 PM, Joseph Myers <joseph@codesourcery.com> wrote:
>> Presumably this means that constant folding needs to know about those
>> semantics, both for operations with a subnormal floating-point argument
>> (whether or not the output is floating point, or floating point in the
>> same format), and those with such a result?
>> Can assignments copy subnormals without converting them to zero?  Should
>> comparisons flush input subnormals to zero before comparing?  Should
>> conversions e.g. from float to double convert a float subnormal input to
>> zero?
>
> I can answer yes to all of these questions.
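
To make those answers concrete, here is a small C model of the semantics
being described: comparisons flush their inputs first, and a float to double
conversion flushes a subnormal float input to zero. This is only a sketch
under the assumption of IEEE binary32/binary64 formats; ftz_f32, ftz_eq_f32,
ftz_lt_f32, ftz_f32_to_f64 and ftz_cmp_self_check are hypothetical names,
not functions from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical software flush-to-zero: zero or subnormal input (exponent
   field all zero) becomes a zero of the same sign.  */
static float
ftz_f32 (float v)
{
  uint32_t bits;
  memcpy (&bits, &v, sizeof bits);
  if ((bits & 0x7f800000u) == 0)
    bits &= 0x80000000u;
  memcpy (&v, &bits, sizeof bits);
  return v;
}

/* Equality under 'ftz' semantics: both inputs are flushed, then compared.  */
static int
ftz_eq_f32 (float a, float b)
{
  return ftz_f32 (a) == ftz_f32 (b);
}

/* Less-than under 'ftz' semantics.  */
static int
ftz_lt_f32 (float a, float b)
{
  return ftz_f32 (a) < ftz_f32 (b);
}

/* float -> double: the subnormal float input is flushed before widening,
   although the value would be a normal number in double.  */
static double
ftz_f32_to_f64 (float v)
{
  return (double) ftz_f32 (v);
}

/* Self-check using the subnormal constants from the testcase.  */
static void
ftz_cmp_self_check (void)
{
  volatile float va = 2.87E-42f, vb = -5.74E-42f;
  float a = va, b = vb;

  assert (ftz_eq_f32 (a, b));          /* +0 == -0 after flushing */
  assert (!ftz_lt_f32 (b, a));         /* -0 < +0 is false */
  assert (ftz_f32_to_f64 (a) == 0.0);  /* conversion yields zero */
  assert (ftz_eq_f32 (1.0f, 1.0f) && !ftz_eq_f32 (1.0f, 2.0f));
}
```

Plain assignment is deliberately not modelled: per the answers above, a
store/load copies a subnormal bit pattern unchanged, which is what the
TEST_SF_IS_SUBNORMAL (x) check in the testcase relies on.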

I think the flag approach isn't good here.  If we had a mode that doesn't
have denormals we could represent that, but it's the language frontend that
requires a certain semantic and thus it should impose those as IL details.
These days I'd rather not introduce global flags for semantic details of the
IL, as we are trying to get rid of the ones that already exist.

Richard.

> BR,
> Pekka


end of thread, other threads:[~2017-08-22 13:52 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-10 17:54 [PATCH] -fftz-math: assume that denorms _must_ be flushed to zero optimizations Pekka Jääskeläinen
2017-08-14  9:28 ` Richard Biener
2017-08-14 11:21   ` Pekka Jääskeläinen
2017-08-14 11:25     ` Richard Biener
2017-08-14 13:17     ` Joseph Myers
2017-08-22 14:10       ` Pekka Jääskeläinen
2017-08-22 14:20         ` Richard Biener
