x87 float truncation/accuracy (gcc vs. icc/msvc)

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

From: Roger Sayle <roger@eyesopen.com>
To: gcc@gcc.gnu.org
Subject: x87 float truncation/accuracy (gcc vs. icc/msvc)
Date: Thu, 18 Mar 2004 17:49:00 -0000	[thread overview]
Message-ID: <Pine.LNX.4.44.0403180818300.16213-100000@www.eyesopen.com> (raw)

My boss recently alerted me to an anomalous performance change in
a piece of code he was working on.  The reduced test case is shown
below:

float foo(float *x)
{
  int i;
  float y = 0.0;
  for (i=0; i<10; i++)
    y += 2.0*x[i];
  return y;
}

which on x87 runs at less than half of the speed of changing the
"2.0" to "2.0f"!?.  The cause can be seen in the assembly output:

foo:    subl    $4, %esp
        fldz
        movl    8(%esp), %edx
        xorl    %eax, %eax
.L6:    flds    (%edx,%eax,4)
        incl    %eax
        cmpl    $9, %eax
        fadd    %st(0), %st
        faddp   %st, %st(1)
        fstps   (%esp)           <--- here
        flds    (%esp)           <--- here
        jle     .L6
        popl    %eax
        ret

The two instructions marked "here" are removed by using the float
constant, but are present with a double constant, even with -ffast-math.
The problem is that these instructions round %st(0) to a "float" by
storing it to memory and reading it back in again.  Changing the type
of "y" to double also resolves the problem (even though the addition
is actually XFmode, we don't attempt to correctly round down to DFmode).

In the original code, these two variants actually produced significantly
different results by changing the coefficient's type.  Particularly,
interesting is that both the Microsoft Visual C/C++ compiler and Intel's
icc both *by default* completely optimized away this "float_truncate",
producing incorrectly rounded results.

The same problem also hurts GCC on the popular "mflop" benchmark.

My interest now is how best to catch this transformation/optimization
using flag_unsafe_math_optimizations.  My analysis so far is that
this is an i386.md specific transformation.  On many machines "float"
operations are faster than "double", and their hardware often supports
efficient "double->float" conversion.  The IA-32 architecture on the
other hand seems unique in that commercial compilers are free to consider
"truncdfsf2" as a no-op, in the same way as "extendsfdf2".

Do any of the x86 backend gurus have any suggestions as to how best
to implement "truncdfsf2" as a move between x87 registers, but as a
regular "fst*s" instruction for memory targets?  My initial attempt
was to simply guard the following splitter with !flag_unsafe_math_...

(define_split
  [(set (match_operand:SF 0 "register_operand" "")
        (float_truncate:SF
         (match_operand:DF 1 "fp_register_operand" "")))
   (clobber (match_operand:SF 2 "memory_operand" ""))]
  "TARGET_80387 && reload_completed"
  [(set (match_dup 2) (float_truncate:SF (match_dup 1)))
   (set (match_dup 0) (match_dup 2))]
  "")

Alas this failed miserably.

Any advice would be much appreciated.  I've confirmed that GCC performs
the related "safe" constant folding optimizations, such as converting
"(float)((double)f1 op (double)f2)" into "f1 op f2" for floating point
values f1 and f2, and operation op one of add, sub or mul.  For "mul",
for example, the two 24 bit mantissas of an IEEE "float" can't overflow
the 53 bit mantissa of an IEEE double, so there's no double rounding and
so a floating point multiplication returns the same (perfectly rounded)
result.  These don't help the code above however, which is fundamentally
unsafe and not normally a win except on Intel.

Roger
--

next             reply	other threads:[~2004-03-18 17:30 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-03-18 17:49 Roger Sayle [this message]
2004-03-18 18:40 ` Jan Hubicka
2004-03-18 19:18   ` Roger Sayle
2004-03-20 22:07     ` Jan Hubicka
2004-03-19 11:30 ` Thomas Kunert
2004-03-19 12:28   ` Jakub Jelinek
2004-03-21  2:04   ` Robert Dewar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.44.0403180818300.16213-100000@www.eyesopen.com \
    --to=roger@eyesopen.com \
    --cc=gcc@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).