x87 float truncation/accuracy (gcc vs. icc/msvc)

public inbox for gcc@gcc.gnu.org

* x87 float truncation/accuracy (gcc vs. icc/msvc)
@ 2004-03-18 17:49 Roger Sayle
  2004-03-18 18:40 ` Jan Hubicka
  2004-03-19 11:30 ` Thomas Kunert
  0 siblings, 2 replies; 7+ messages in thread
From: Roger Sayle @ 2004-03-18 17:49 UTC
  To: gcc

My boss recently alerted me to an anomalous performance change in
a piece of code he was working on.  The reduced test case is shown
below:

float foo(float *x)
{
  int i;
  float y = 0.0;
  for (i=0; i<10; i++)
    y += 2.0*x[i];
  return y;
}

which on x87 runs at less than half the speed of the same code with
"2.0" changed to "2.0f"!?  The cause can be seen in the assembly output:

foo:    subl    $4, %esp
        fldz
        movl    8(%esp), %edx
        xorl    %eax, %eax
.L6:    flds    (%edx,%eax,4)
        incl    %eax
        cmpl    $9, %eax
        fadd    %st(0), %st
        faddp   %st, %st(1)
        fstps   (%esp)           <--- here
        flds    (%esp)           <--- here
        jle     .L6
        popl    %eax
        ret

The two instructions marked "here" are removed by using the float
constant, but are present with a double constant, even with -ffast-math.
The problem is that these instructions round %st(0) to a "float" by
storing it to memory and reading it back in again.  Changing the type
of "y" to double also resolves the problem (even though the addition
is actually XFmode, we don't attempt to correctly round down to DFmode).

In the original code, changing the coefficient's type actually produced
significantly different results.  Particularly interesting is that both
the Microsoft Visual C/C++ compiler and Intel's icc *by default*
completely optimized away this "float_truncate", producing incorrectly
rounded results.
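To make the difference concrete, here is a portable C sketch of the two
behaviours.  It is an emulation, not GCC's or icc's actual code generation,
and the function names are hypothetical.  sum_rounded_each_step follows the
C semantics of the loop above, converting the running sum back to float on
every iteration; the volatile qualifier stands in for the fstps/flds round
trip (much as -ffloat-store would).  sum_wide_accumulator models treating
the truncation as a no-op: the sum stays in a wider format and is rounded
once at the end.  Using double in place of the x87 80-bit register is an
assumption; for the input in the usage note every partial sum is exact in
both formats, so it does not change the result.

```c
#include <stddef.h>

/* Hypothetical helper: same semantics as foo() with the constant 2.0.
   Each iteration computes in double and converts back to float, as C
   requires for the assignment to y.  Because y is volatile it must be
   stored to memory as a float each time, so the rounding happens even
   when the arithmetic itself is done in x87 registers. */
float sum_rounded_each_step(const float *x, size_t n)
{
    volatile float y = 0.0f;
    for (size_t i = 0; i < n; i++)
        y = (float)((double)y + 2.0 * (double)x[i]);
    return y;
}

/* Hypothetical helper: what results if float_truncate is treated as a
   no-op.  The running sum stays in a wider format (double here) and is
   rounded to float only once.  The volatile store makes sure the
   returned value really is a float even on x87. */
float sum_wide_accumulator(const float *x, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += 2.0 * (double)x[i];
    volatile float r = (float)acc;
    return r;
}
```

With x[0] = 8388608 (2^23) and the other nine elements equal to 0.5, every
term 2.0*x[i] is exact and the exact sum is 16777225.  Once the running sum
reaches 2^24, each further +1.0 is a tie that rounds back to even, so the
per-step version returns 16777216; the wide accumulator rounds 16777225 once
and returns 16777224.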

The same problem also hurts GCC on the popular "mflop" benchmark.

My interest now is how best to provide this transformation/optimization
under flag_unsafe_math_optimizations.  My analysis so far is that
this is an i386.md-specific transformation.  On many machines "float"
operations are faster than "double", and their hardware often supports
efficient "double->float" conversion.  The IA-32 architecture, on the
other hand, seems unique in that commercial compilers are free to treat
"truncdfsf2" as a no-op, in the same way as "extendsfdf2".

Do any of the x86 backend gurus have any suggestions as to how best
to implement "truncdfsf2" as a move between x87 registers, but as a
regular "fst*s" instruction for memory targets?  My initial attempt
was to simply guard the following splitter with !flag_unsafe_math_...

(define_split
  [(set (match_operand:SF 0 "register_operand" "")
        (float_truncate:SF
         (match_operand:DF 1 "fp_register_operand" "")))
   (clobber (match_operand:SF 2 "memory_operand" ""))]
  "TARGET_80387 && reload_completed"
  [(set (match_dup 2) (float_truncate:SF (match_dup 1)))
   (set (match_dup 0) (match_dup 2))]
  "")

Alas this failed miserably.

Any advice would be much appreciated.  I've confirmed that GCC performs
the related "safe" constant folding optimizations, such as converting
"(float)((double)f1 op (double)f2)" into "f1 op f2" for floating point
values f1 and f2, and operation op one of add, sub or mul.  For "mul",
for example, the product of two 24-bit IEEE "float" significands has at
most 48 bits, which fits exactly in the 53-bit significand of an IEEE
double.  The double multiplication is therefore exact, there's no double
rounding, and the final conversion to float returns the same (correctly
rounded) result as a float multiplication.  These don't help the code
above, however, which is fundamentally unsafe and not normally a win
except on Intel.
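The argument for "mul" can be written out as a short derivation.  Here
fl_s and fl_d denote round-to-nearest conversion to float and double; that
notation is introduced only for this sketch.  It uses the fact that every
finite float can be written as plus or minus M * 2^e with integers
0 <= M < 2^24 and -149 <= e <= 104, which also covers subnormals.

```latex
\begin{aligned}
a &= \pm M_a\, 2^{e_a}, \quad b = \pm M_b\, 2^{e_b},
   \qquad 0 \le M_a, M_b < 2^{24},\ -149 \le e_a, e_b \le 104 \\
ab &= \pm (M_a M_b)\, 2^{e_a + e_b},
   \qquad M_a M_b \le (2^{24}-1)^2 < 2^{48} \le 2^{53} \\
2^{-298} &\le |ab| < 2^{256} \quad \text{for } ab \ne 0,
   \quad \text{inside the normal double range } [2^{-1022}, 2^{1024}) \\
\Rightarrow\quad \mathrm{fl}_d(ab) &= ab
   \quad\Rightarrow\quad \mathrm{fl}_s\bigl(\mathrm{fl}_d(ab)\bigr) = \mathrm{fl}_s(ab)
\end{aligned}
```

Add and sub need a different argument, because the double sum of two floats
is not always exact (for example 2^100 + 2^-100); the usual one is that
53 >= 2*24 + 2, which is enough to make the double rounding harmless.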

Roger
--


* Re: x87 float truncation/accuracy (gcc vs. icc/msvc)
  2004-03-18 17:49 x87 float truncation/accuracy (gcc vs. icc/msvc) Roger Sayle
@ 2004-03-18 18:40 ` Jan Hubicka
  2004-03-18 19:18   ` Roger Sayle
  2004-03-19 11:30 ` Thomas Kunert
  1 sibling, 1 reply; 7+ messages in thread
From: Jan Hubicka @ 2004-03-18 18:40 UTC
  To: Roger Sayle; +Cc: gcc

> Do any of the x86 backend gurus have any suggestions as to how best
> to implement "truncdfsf2" as a move between x87 registers, but as a
> regular "fst*s" instruction for memory targets?  My initial attempt
> was to simply guard the following splitter with !flag_unsafe_math_...
> 
> (define_split
>   [(set (match_operand:SF 0 "register_operand" "")
>         (float_truncate:SF
>          (match_operand:DF 1 "fp_register_operand" "")))
>    (clobber (match_operand:SF 2 "memory_operand" ""))]
>   "TARGET_80387 && reload_completed"
>   [(set (match_dup 2) (float_truncate:SF (match_dup 1)))
>    (set (match_dup 0) (match_dup 2))]
>   "")
> 
> Alas this failed miserably.

You can just cut&paste the extendsfdf implementation.  Basically it
imitates the move pattern for x87 but does proper conversions for SSE.
I was very tempted to do this for a while (and sent a patch back in 1998
or so), but there appeared to be a consensus that the truncations are
very important, although that does not seem to be the practice.
If you want to get really good at eliminating the truncations, you
will need to play games with combiner patterns containing truncates,
similarly to what we do for extensions, but this is tricky (you will face
pattern explosion).

In the past we tried to use a match_operand predicate that accepts the
extensions, but that approach failed because reload handles unary
expressions in operands by passing them to move patterns, and this
behaviour is needed by some other targets.

Honza
> 
> Any advice would be much appreciated.  I've confirmed that GCC performs
> the related "safe" constant folding optimizations, such as converting
> "(float)((double)f1 op (double)f2)" into "f1 op f2" for floating point
> values f1 and f2, and operation op one of add, sub or mul.  For "mul",
> for example, the product of two 24-bit IEEE "float" significands has at
> most 48 bits, which fits exactly in the 53-bit significand of an IEEE
> double.  The double multiplication is therefore exact, there's no double
> rounding, and the final conversion to float returns the same (correctly
> rounded) result as a float multiplication.  These don't help the code
> above, however, which is fundamentally unsafe and not normally a win
> except on Intel.
> 
> Roger
> --


* Re: x87 float truncation/accuracy (gcc vs. icc/msvc)
  2004-03-18 18:40 ` Jan Hubicka
@ 2004-03-18 19:18   ` Roger Sayle
  2004-03-20 22:07     ` Jan Hubicka
  0 siblings, 1 reply; 7+ messages in thread
From: Roger Sayle @ 2004-03-18 19:18 UTC
  To: Jan Hubicka; +Cc: gcc

On Thu, 18 Mar 2004, Jan Hubicka wrote:
> You can just cut&paste the extendsfdf implementation.  Basically it
> imitates the move pattern for x87 but does proper conversions for SSE.
> I was very tempted to do this for a while (and sent a patch back in 1998
> or so), but there appeared to be a consensus that the truncations are
> very important, although that does not seem to be the practice.

If you can find the original posting and/or refresh this patch against
mainline, I'll be happy to review it for you, provided any change in
current functionality is guarded by flag_unsafe_math_optimizations.

> If you want to get really good at eliminating the truncations, you
> will need to play games with combiner patterns containing truncates,
> similarly to what we do for extensions, but this is tricky (you will face
> pattern explosion).

Indeed.  This was my major concern about modelling truncation like
extension.  i386.md already contains a large number of patterns intended
purely to provide "*ext" variants of floating point operations.  I
wasn't sure if these were required by the i386 backend's extend?f?f2
implementation.

Thanks for your help,

Roger
--


* Re: x87 float truncation/accuracy (gcc vs. icc/msvc)
  2004-03-18 17:49 x87 float truncation/accuracy (gcc vs. icc/msvc) Roger Sayle
  2004-03-18 18:40 ` Jan Hubicka
@ 2004-03-19 11:30 ` Thomas Kunert
  2004-03-19 12:28   ` Jakub Jelinek
  2004-03-21  2:04   ` Robert Dewar
  1 sibling, 2 replies; 7+ messages in thread
From: Thomas Kunert @ 2004-03-19 11:30 UTC
  To: Roger Sayle; +Cc: gcc

Roger Sayle wrote:

> interesting is that both the Microsoft Visual C/C++ compiler and Intel's
> icc both *by default* completely optimized away this "float_truncate",
> producing incorrectly rounded results.
> 

Could you please explain what's wrong with incorrectly rounded results?
I was under the impression accuracy and performance are more important to
most people. And for the ones who actually care about rounding
there is -ffloat-store. What is this option for, if the truncation happens
anyway?

The current behavior is very surprising, at least to me. Excess precision is
documented and known to most people writing numerical applications. On the
other hand, a performance penalty for the sake of correct rounding is
surprising and very annoying here, especially since one cannot turn it off.
And please don't expect everyone to use a flag named unsafe-math-optimizations.

Thanks,
Thomas Kunert


* Re: x87 float truncation/accuracy (gcc vs. icc/msvc)
  2004-03-19 11:30 ` Thomas Kunert
@ 2004-03-19 12:28   ` Jakub Jelinek
  2004-03-21  2:04   ` Robert Dewar
  1 sibling, 0 replies; 7+ messages in thread
From: Jakub Jelinek @ 2004-03-19 12:28 UTC
  To: Thomas Kunert; +Cc: Roger Sayle, gcc

On Fri, Mar 19, 2004 at 09:17:27AM +0100, Thomas Kunert wrote:
> Roger Sayle wrote:
> 
> >interesting is that both the Microsoft Visual C/C++ compiler and Intel's
> >icc both *by default* completely optimized away this "float_truncate",
> >producing incorrectly rounded results.
> >
> 
> Could you please explain what's wrong with incorrectly rounded results?
> I was under the impression accuracy and performance are more important to
> most people. And for the ones who actually care about rounding
> there is -ffloat-store. What is this option for, if the truncation happens
> anyway?

OTOH ISO C99 requires certain handling of float/double which
GCC on IA-32 doesn't meet even with -ffloat-store.
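For reference, the relevant C99 rule is in 6.3.1.8: floating expressions may
be evaluated with more range and precision than their type requires, but
(per the footnote there) casts and assignments must still perform their
conversions.  A translation unit can query its evaluation method through the
FLT_EVAL_METHOD macro from <float.h>.  The sketch below, with hypothetical
function names, reports that setting and performs a float conversion through
a volatile store so the rounding happens regardless of register allocation.
That x87 code generation typically reports 2 and SSE code 0 is a general
observation about common configurations, not a guarantee.

```c
#include <float.h>

/* Hypothetical helper: describes the C99 FLT_EVAL_METHOD setting in
   effect for this translation unit. */
const char *flt_eval_method_description(void)
{
#if FLT_EVAL_METHOD == 0
    return "float and double expressions are evaluated in their own type";
#elif FLT_EVAL_METHOD == 1
    return "float and double expressions are evaluated as double";
#elif FLT_EVAL_METHOD == 2
    return "all floating expressions are evaluated as long double";
#else
    return "evaluation method is indeterminable or implementation-defined";
#endif
}

/* Hypothetical helper: converts to float the way a cast or assignment
   must, discarding any excess precision.  The volatile store forces the
   value through a float-sized memory slot, as -ffloat-store does for
   variables. */
float to_float(double d)
{
    volatile float f = (float)d;
    return f;
}
```

For example, to_float(1.0 + 0x1p-30) yields exactly 1.0f, because 2^-30 is
less than half a float ulp at 1.0.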

	Jakub


* Re: x87 float truncation/accuracy (gcc vs. icc/msvc)
  2004-03-18 19:18   ` Roger Sayle
@ 2004-03-20 22:07     ` Jan Hubicka
  0 siblings, 0 replies; 7+ messages in thread
From: Jan Hubicka @ 2004-03-20 22:07 UTC
  To: Roger Sayle; +Cc: Jan Hubicka, gcc

> 
> On Thu, 18 Mar 2004, Jan Hubicka wrote:
> > You can just cut&paste the extendsfdf implementation.  Basically it
> > imitates the move pattern for x87 but does proper conversions for SSE.
> > I was very tempted to do this for a while (and sent a patch back in 1998
> > or so), but there appeared to be a consensus that the truncations are
> > very important, although that does not seem to be the practice.
> 
> If you can find the original posting and/or refresh this patch against
> mainline, I'll be happy to review it for you, provided any change in
> current functionality is guarded by flag_unsafe_math_optimizations.

It will probably need a complete rewrite, as the backend has been
rewritten since then too.  I will try to re-implement it, but I won't be
able to do much work until 20th April, so I will probably have to delay
it until after that.

Honza
> 
> 
> > If you want to get really good at eliminating the truncations, you
> > will need to play games with combiner patterns containing truncates,
> > similarly to what we do for extensions, but this is tricky (you will face
> > pattern explosion).
> 
> Indeed.  This was my major concern about modelling truncation like
> extension.  i386.md already contains a large number of patterns intended
> purely to provide "*ext" variants of floating point operations.  I
> wasn't sure if these were required by the i386 backend's extend?f?f2
> implementation.
> 
> Thanks for your help,
> 
> Roger
> --


* Re: x87 float truncation/accuracy (gcc vs. icc/msvc)
  2004-03-19 11:30 ` Thomas Kunert
  2004-03-19 12:28   ` Jakub Jelinek
@ 2004-03-21  2:04   ` Robert Dewar
  1 sibling, 0 replies; 7+ messages in thread
From: Robert Dewar @ 2004-03-21  2:04 UTC
  To: Thomas Kunert; +Cc: Roger Sayle, gcc

> Could you please explain what's wrong with incorrectly rounded results?

What's wrong with them is that they are incorrectly rounded!

> I was under the impression accuracy and performance are more important to
> most people.

Incorrect rounding is precisely a case of giving inaccurate results. Now
there are those who are willing to tolerate inaccurate results if they
can get them faster. After all, lots of Fortran programmers tolerated
Cray's fast junk arithmetic (in which 81.0/3.0 was not 27.0) for
years! But this kind of behavior should not occur by default in
a compiler that advertises itself as a C compiler!

> And please don't expect everyone to use a flag
> named unsafe-math-optimizations.

Why not? That's exactly what you are talking about, optimizations
that are not safe, in the sense that they give wrong results.



Thread overview: 7+ messages
2004-03-18 17:49 x87 float truncation/accuracy (gcc vs. icc/msvc) Roger Sayle
2004-03-18 18:40 ` Jan Hubicka
2004-03-18 19:18   ` Roger Sayle
2004-03-20 22:07     ` Jan Hubicka
2004-03-19 11:30 ` Thomas Kunert
2004-03-19 12:28   ` Jakub Jelinek
2004-03-21  2:04   ` Robert Dewar
