* [Bug c/19988] New: [4.0 Regression] pessimizes fp multiply-add/subtract combo
From: gcc-bugzilla at gcc dot gnu dot org @ 2005-02-16  4:49 UTC
  To: gcc-bugs


When I compile the following code with 'gcc -O3 --save-temps -c':

double foo(double x, double y)
{
     return ((x + 0.1234 * y) * (x - 0.1234 * y));
}

gcc 3.x produces one load of the constant 0.1234, one multiplication
(0.1234 * y), one addition, one subtraction, and the final
multiplication: a total of one constant load and four fp operations.
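
The gcc 3.x output amounts to computing the product 0.1234 * y once
and reusing it for both the addition and the subtraction. As a minimal
sketch (the names foo_cse, K and t are illustrative, not from the
original report), that common subexpression written out by hand is:

```c
/* Hypothetical hand-CSE'd version of foo(): one constant, one multiply
   for the shared product, one add, one subtract, one final multiply. */
double foo_cse(double x, double y)
{
     const double K = 0.1234;  /* the single constant */
     double t = K * y;         /* shared product K * y */
     return (x + t) * (x - t);
}
```

On a target without fused multiply-add this is the same operation count
as the gcc 3.x code described above.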

gcc 4.0 (20050213 snapshot), on the other hand, compiles (x - 0.1234 *
y) as (x + (-0.1234) * y), and therefore does not recognize that the
constant is the same as in the other expression.  As a result, it
produces *two* constants (two loads) and *five* fp operations (three
multiplications):

foo:
        pushl   %ebp
        movl    %esp, %ebp
        fldl    16(%ebp)
        fld     %st(0)
        fldl    8(%ebp)
        fxch    %st(1)
        fmull   .LC0
        fxch    %st(2)
        popl    %ebp
        fmull   .LC1
        fxch    %st(2)
        fadd    %st(1), %st
        fxch    %st(1)
        faddp   %st, %st(2)
        fmulp   %st, %st(1)
        ret
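
The rewrite itself is value-preserving: negating a floating-point
constant is exact, so in the default round-to-nearest mode
x + (-0.1234) * y and x - 0.1234 * y round to the same result. A
minimal sketch of the two forms (the function names are hypothetical):

```c
/* The form gcc 4.0 produces: uses the constant -0.1234. */
double sub_as_negated_add(double x, double y)
{
     return x + (-0.1234) * y;
}

/* The equivalent subtraction: reuses the constant 0.1234. */
double sub_direct(double x, double y)
{
     return x - 0.1234 * y;
}
```

Since both return the same value (assuming round-to-nearest), a
compiler could canonicalize the negated-constant form back into a
subtraction and share the single constant 0.1234.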

As you can imagine, this pessimization can cause a major slowdown in
code that has lots of multiply-add and multiply-subtract combinations;
in particular, any FFT (such as our FFTW, www.fftw.org) could suffer a
lot.

Thanks for your efforts,
Steven G Johnson

PS. When you fix this, please don't reintroduce a different optimizer
bug that appears in gcc 3.x.  In particular, when compiling for a
PowerPC target, it *should* produce one constant load, one fused
multiply-add, one fused multiply-subtract, and one multiplication.
gcc 3.x, on the other hand, pulls out the (0.1234 * y) in CSE, and thus
does not exploit the fma.  gcc 4.0 on PowerPC (MacOS 10.3) produces:

_foo:
        mflr r0
        bcl 20,31,"L00000000001$pb"
"L00000000001$pb":
        stw r31,-4(r1)
        fmr f13,f1
        mflr r31
        stw r0,8(r1)
        lwz r0,8(r1)
        addis r2,r31,ha16(LC0-"L00000000001$pb")
        lfd f1,lo16(LC0-"L00000000001$pb")(r2)
        addis r2,r31,ha16(LC1-"L00000000001$pb")
        lfd f0,lo16(LC1-"L00000000001$pb")(r2)
        mtlr r0
        fmadd f1,f2,f1,f13
        lwz r31,-4(r1)
        fmadd f2,f2,f0,f13
        fmul f1,f1,f2
        blr

which utilizes the fma, but loads the constant twice (as 0.1234 and
-0.1234) instead of using fmadd and fmsub.

PPS. In general, turning negative constants into positive ones by
changing additions into subtractions can lead to substantial speedups
in certain kinds of code, because it reduces the number of fp
constants.  For example, doing this "manually" in FFTW gained us 10-15%
in speed; YMMV.  Something to think about.
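
As an illustration of this on FFT-style code, here is a hypothetical
twiddle-factor multiply t = w * b with w = cos(pi/8) - i sin(pi/8); the
struct, function names and the angle are illustrative choices, not
taken from FFTW. Written with a negated constant, the source uses three
distinct constants (KC, KS, -KS); written as a subtraction, it needs
only KC and KS. Whether the generated code actually loads fewer
constants depends on the compiler.

```c
typedef struct { double re, im; } cplx;

static const double KC = 0.92387953251128675613;  /* cos(pi/8) */
static const double KS = 0.38268343236508977173;  /* sin(pi/8) */

/* t = (KC - i*KS) * b, imaginary part written with the constant -KS. */
cplx twiddle_neg_const(cplx b)
{
     cplx t;
     t.re = KC * b.re + KS * b.im;
     t.im = KC * b.im + (-KS) * b.re;
     return t;
}

/* Same product, with the addition of -KS turned into a subtraction. */
cplx twiddle_sub(cplx b)
{
     cplx t;
     t.re = KC * b.re + KS * b.im;
     t.im = KC * b.im - KS * b.re;
     return t;
}
```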

Environment:
System: Linux fftw.org 2.6.3-1-686-smp #2 SMP Tue Feb 24 20:29:08 EST 2004 i686 GNU/Linux
Architecture: i686

	
host: i686-pc-linux-gnu
build: i686-pc-linux-gnu
target: i686-pc-linux-gnu
configured with: ../configure --prefix=/home/stevenj/gcc4

How-To-Repeat:
Compile the foo() function above with gcc -O3 -c --save-temps and look
at the assembler output.

-- 
           Summary: [4.0 Regression] pessimizes fp multiply-add/subtract
                    combo
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: stevenj at fftw dot org
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19988

