[Bug c/19988] New: [4.0 Regression] pessimizes fp multiply-add/subtract combo


* [Bug c/19988] New: [4.0 Regression] pessimizes fp multiply-add/subtract combo
@ 2005-02-16  4:49 gcc-bugzilla at gcc dot gnu dot org
  2005-02-16  5:15 ` [Bug middle-end/19988] " pinskia at gcc dot gnu dot org
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: gcc-bugzilla at gcc dot gnu dot org @ 2005-02-16  4:49 UTC (permalink / raw)
  To: gcc-bugs

When I compile the following code with 'gcc -O3 --save-temps -c':

double foo(double x, double y)
{
     return ((x + 0.1234 * y) * (x - 0.1234 * y));
}

gcc 3.x gives one load of the constant 0.1234, one multiplication
0.1234 * y, one addition, one subtraction, and the final
multiplication: total = one constant (load) and four fp operations.

gcc 4.0 (20050213 snapshot), on the other hand, compiles (x - 0.1234 *
y) as (x + (-0.1234) * y), and thus doesn't recognize that it is the
same constant as in the other expression.  Thus, it produces *two*
constants (2 loads), and *five* fp operations (3 multiplications):

foo:
        pushl   %ebp
        movl    %esp, %ebp
        fldl    16(%ebp)
        fld     %st(0)
        fldl    8(%ebp)
        fxch    %st(1)
        fmull   .LC0
        fxch    %st(2)
        popl    %ebp
        fmull   .LC1
        fxch    %st(2)
        fadd    %st(1), %st
        fxch    %st(1)
        faddp   %st, %st(2)
        fmulp   %st, %st(1)
        ret
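
The two code shapes can be sketched at the source level.  This is only an
illustration: the function names foo_shared and foo_negated are hypothetical,
and the second function is a reading of the assembly above, not compiler
output.  Both return the same value, since negating a constant and then
multiplying gives exactly the negated product in the default rounding mode.

```c
/* Shape gcc 3.x produces: one constant, the product 0.1234 * y is shared. */
double foo_shared(double x, double y)
{
    double t = 0.1234 * y;          /* one multiplication */
    return (x + t) * (x - t);       /* add, subtract, final multiply */
}

/* Shape the 4.0 snapshot appears to produce: x - c*y becomes x + (-c)*y,
   so two constants are needed and y is multiplied twice. */
double foo_negated(double x, double y)
{
    double a = x + 0.1234 * y;
    double b = x + (-0.1234) * y;   /* second constant, second multiply */
    return a * b;
}
```

The cost difference is in the operation count (4 vs. 5 fp operations, 1 vs. 2
constant loads), not in the result.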

As you can imagine, this leads to a major slowdown in code that has
lots of multiply-add and multiply-subtract combinations...in
particular any FFT (such as our FFTW, www.fftw.org) could
suffer a lot.

Thanks for your efforts,
Steven G Johnson

PS. When you fix this, please don't re-introduce another optimizer bug
that appears in gcc 3.x.  In particular, when compiling for a PowerPC
target, it *should* produce one constant load, one fused multiply-add,
one fused multiply-subtract, and one multiplication.  gcc 3.x, on the
other hand, pulls out the (0.1234 * y) in CSE, and thus does not
exploit the fma.  gcc 4.0 on PowerPC (MacOS 10.3) produces:

_foo:
        mflr r0
        bcl 20,31,"L00000000001$pb"
"L00000000001$pb":
        stw r31,-4(r1)
        fmr f13,f1
        mflr r31
        stw r0,8(r1)
        lwz r0,8(r1)
        addis r2,r31,ha16(LC0-"L00000000001$pb")
        lfd f1,lo16(LC0-"L00000000001$pb")(r2)
        addis r2,r31,ha16(LC1-"L00000000001$pb")
        lfd f0,lo16(LC1-"L00000000001$pb")(r2)
        mtlr r0
        fmadd f1,f2,f1,f13
        lwz r31,-4(r1)
        fmadd f2,f2,f0,f13
        fmul f1,f1,f2
        blr

which utilizes the fma, but loads the constant twice (as 0.1234 and
-0.1234) instead of using fmadd and fmsub.

PPS. In general, turning negative constants into positive constants by
changing additions into subtractions can lead to substantial speedups
by reducing the number of fp constants in certain kinds of code.
e.g. "manually" doing this in FFTW gained us 10-15% in speed; YMMV.
Something to think about.

Environment:
System: Linux fftw.org 2.6.3-1-686-smp #2 SMP Tue Feb 24 20:29:08 EST 2004 i686 GNU/Linux
Architecture: i686

host: i686-pc-linux-gnu
build: i686-pc-linux-gnu
target: i686-pc-linux-gnu
configured with: ../configure --prefix=/home/stevenj/gcc4

How-To-Repeat:
Compile the above foo() subroutine with gcc -O3 -c --save-temps and look
at the assembler output.

-- 
           Summary: [4.0 Regression] pessimizes fp multiply-add/subtract
                    combo
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: stevenj at fftw dot org
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19988

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug middle-end/19988] [4.0 Regression] pessimizes fp multiply-add/subtract combo
  2005-02-16  4:49 [Bug c/19988] New: [4.0 Regression] pessimizes fp multiply-add/subtract combo gcc-bugzilla at gcc dot gnu dot org
@ 2005-02-16  5:15 ` pinskia at gcc dot gnu dot org
  2005-02-16  6:33 ` pinskia at gcc dot gnu dot org
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-02-16  5:15 UTC (permalink / raw)
  To: gcc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|c                           |middle-end
           Keywords|                            |missed-optimization
   Target Milestone|---                         |4.0.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19988


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug middle-end/19988] [4.0 Regression] pessimizes fp multiply-add/subtract combo
  2005-02-16  4:49 [Bug c/19988] New: [4.0 Regression] pessimizes fp multiply-add/subtract combo gcc-bugzilla at gcc dot gnu dot org
  2005-02-16  5:15 ` [Bug middle-end/19988] " pinskia at gcc dot gnu dot org
@ 2005-02-16  6:33 ` pinskia at gcc dot gnu dot org
  2005-02-16 19:00 ` pinskia at gcc dot gnu dot org
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-02-16  6:33 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2005-02-15 23:42 -------
Almost want to say this is caused by:
2004-07-11  Roger Sayle  <roger@eyesopen.com>

        * fold-const.c (fold) <PLUS_EXPR>: Canonicalize X + -C as X - C for
        floating point additions, to keep real immediate constant positive.
        <MINUS_EXPR>:  For floating point subtractions, only transform X - -C
        into X + C, and leave positive real constants as X - C.

http://gcc.gnu.org/ml/gcc-patches/2004-07/msg01155.html

Confirmed.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |roger at eyesopen dot com
           Severity|normal                      |minor
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|                            |1
   Last reconfirmed|0000-00-00 00:00:00         |2005-02-15 23:42:25
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19988


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug middle-end/19988] [4.0 Regression] pessimizes fp multiply-add/subtract combo
  2005-02-16  4:49 [Bug c/19988] New: [4.0 Regression] pessimizes fp multiply-add/subtract combo gcc-bugzilla at gcc dot gnu dot org
  2005-02-16  5:15 ` [Bug middle-end/19988] " pinskia at gcc dot gnu dot org
  2005-02-16  6:33 ` pinskia at gcc dot gnu dot org
@ 2005-02-16 19:00 ` pinskia at gcc dot gnu dot org
  2005-02-16 19:01 ` pinskia at gcc dot gnu dot org
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-02-16 19:00 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2005-02-16 14:49 -------
Actually this is a much older regression than what I had originally thought.
The asm produced changed between 20040201 and 20040301.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19988


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug middle-end/19988] [4.0 Regression] pessimizes fp multiply-add/subtract combo
  2005-02-16  4:49 [Bug c/19988] New: [4.0 Regression] pessimizes fp multiply-add/subtract combo gcc-bugzilla at gcc dot gnu dot org
                   ` (2 preceding siblings ...)
  2005-02-16 19:00 ` pinskia at gcc dot gnu dot org
@ 2005-02-16 19:01 ` pinskia at gcc dot gnu dot org
  2005-02-16 23:47 ` roger at eyesopen dot com
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-02-16 19:01 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2005-02-16 14:51 -------
Ok, this is the patch which caused this code generation regression:
2004-02-07  Roger Sayle  <roger@eyesopen.com>

        * fold-const.c (negate_expr_p, negate_expr): Optimize -(A+B) into
        either (-A)-B or (-B)-A, if A or B is easily negated respectively.
        (fold) <MINUS_EXPR>: Optimize (A*C) - (B*C) -> (A-B)*C for both
        integer types and floating point with unsafe_math_optimizations.
        Add similar optimization for (A*C1) - (A*C2) -> A*(C1-C2).
        Optimize A - B as A + (-B), if B is easily negated.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19988


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug middle-end/19988] [4.0 Regression] pessimizes fp multiply-add/subtract combo
  2005-02-16  4:49 [Bug c/19988] New: [4.0 Regression] pessimizes fp multiply-add/subtract combo gcc-bugzilla at gcc dot gnu dot org
                   ` (3 preceding siblings ...)
  2005-02-16 19:01 ` pinskia at gcc dot gnu dot org
@ 2005-02-16 23:47 ` roger at eyesopen dot com
  2005-02-17  3:35 ` athena at fftw dot org
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: roger at eyesopen dot com @ 2005-02-16 23:47 UTC (permalink / raw)
  To: gcc-bugs

------- Additional Comments From roger at eyesopen dot com  2005-02-16 19:17 -------
Hmm.  I don't think the problem in this case is at the tree-level, where I think
keeping X-(Y*C) and -(Y*C) as a more canonical X + (Y*C') and Y*C' should help
with reassociation and other tree-ssa optimizations.  Indeed, it's these types
of transformations that have enabled the use of fmadd on the PowerPC for mainline.

The regression however comes from the (rare) interaction when a floating point
constant and its negative now need to be stored in the constant pool.  It's only
when X and -X are required in a function (potentially in short succession) that
this is a problem, and then only on machines that need to load floating point
constants from memory (AVR and other platforms with immediate floating point
constants, for example, are unaffected).

Some aspects of keeping X and -X in the constant pool were addressed by my
patch quoted in comment #1, which attempts to keep floating point constants
positive *when* this doesn't interfere with GCC's other optimizations.

I think the correct solution to this regression is to improve CSE/GCSE to
recognize that X*C can be synthesized from a previously available X*(-C) at
the cost of a negation, which is presumably cheaper than a multiplication on
most platforms.  Indeed, there's probably a set of targets for which loading
a positive from a constant pool and then negating it, is cheaper than loading
both a positive constant and then loading a negative constant.
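
The identity this relies on can be sketched in C.  The helper name
product_from_negated is hypothetical, and the sketch assumes the default
round-to-nearest mode, where rounding is symmetric about zero, so negating
y*(-C) gives exactly y*C.

```c
/* Recover y * c from the product y * (-c) with a negation instead of a
   second multiplication.  The local 'neg' stands in for a value that CSE
   would already have available. */
double product_from_negated(double y, double c)
{
    double neg = y * (-c);   /* the previously computed product */
    return -neg;             /* sign flip only, no rounding involved */
}
```

Under directed rounding modes the two products can differ in the last bit, so
a real implementation would have to account for that.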

Unfortunately, I doubt whether it'll be possible to simultaneously address
this performance regression without reintroducing the 3.x issue mentioned in
the original "PS".  I doubt that on many platforms two multiply-adds are much
faster than a single floating point multiplication whose result is shared by
two additions.  Though again it might be possible to do something at the RTL
level, especially if duplicating the multiplication is a win with -Os.

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19988

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug middle-end/19988] [4.0 Regression] pessimizes fp multiply-add/subtract combo
  2005-02-16  4:49 [Bug c/19988] New: [4.0 Regression] pessimizes fp multiply-add/subtract combo gcc-bugzilla at gcc dot gnu dot org
                   ` (4 preceding siblings ...)
  2005-02-16 23:47 ` roger at eyesopen dot com
@ 2005-02-17  3:35 ` athena at fftw dot org
  2005-02-17  3:38 ` athena at fftw dot org
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: athena at fftw dot org @ 2005-02-17  3:35 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From athena at fftw dot org  2005-02-16 20:37 -------
(In reply to comment #3)
> Ok, this is the patch which caused this code generation regression:
> 2004-02-07  Roger Sayle  <roger@eyesopen.com>
> 
>         * fold-const.c (negate_expr_p, negate_expr): Optimize -(A+B) into
>         either (-A)-B or (-B)-A, if A or B is easily negated respectively.
>         (fold) <MINUS_EXPR>: Optimize (A*C) - (B*C) -> (A-B)*C for both
>         integer types and floating point with unsafe_math_optimizations.
>         Add similar optimization for (A*C1) - (A*C2) -> A*(C1-C2).
>         Optimize A - B as A + (-B), if B is easily negated.

Note that the straightforward transformation -(A+B) => (-A)-B is illegal
according to the IEEE 754 standard.  For A=-0 and B=+0, -(A+B) = -(+0) = -0,
whereas (-A)-B = +0 - +0 = +0 + -0 = +0.
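
The signed-zero case can be checked with a short C sketch.  The function
names are hypothetical, and the sketch assumes compilation without
-ffast-math, so that the compiler keeps IEEE 754 signed-zero semantics.

```c
#include <math.h>

/* -(a + b): the original expression. */
double negate_sum(double a, double b)
{
    return -(a + b);
}

/* (-a) - b: the rewritten expression. */
double negate_then_subtract(double a, double b)
{
    return (-a) - b;
}

/* Returns 1 when the two forms disagree in the sign bit of the result. */
int signs_differ(double a, double b)
{
    return (signbit(negate_sum(a, b)) != 0)
        != (signbit(negate_then_subtract(a, b)) != 0);
}
```

For a = -0.0 and b = +0.0 the first form yields -0 and the second +0; for
ordinary nonzero inputs such as 1.0 and 2.0 the two forms agree.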

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19988


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug middle-end/19988] [4.0 Regression] pessimizes fp multiply-add/subtract combo
  2005-02-16  4:49 [Bug c/19988] New: [4.0 Regression] pessimizes fp multiply-add/subtract combo gcc-bugzilla at gcc dot gnu dot org
                   ` (5 preceding siblings ...)
  2005-02-17  3:35 ` athena at fftw dot org
@ 2005-02-17  3:38 ` athena at fftw dot org
  2005-02-19 16:58 ` roger at eyesopen dot com
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: athena at fftw dot org @ 2005-02-17  3:38 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From athena at fftw dot org  2005-02-16 20:44 -------
> Unfortunately, I doubt whether it'll be possible to simultaneously address
> this performance regression without reintroducing the 3.x issue mentioned in
> the original "PS".  I doubt that on many platforms two multiply-adds are much
> faster than a single floating point multiplication whose result is shared by
> two additions.  Though again it might be possible to do something at the RTL
> level, especially if duplicating the multiplication is a win with -Os.

PowerPC is indeed a platform where an addition costs the same as a
multiplication and the same as a fused multiply-add.  The ia64 FPU does FMAs
only; you code A*B as A*B+(-0), and A+B as A*1+B.  (On a related
matter, AltiVec has FMA but not multiplication, and the same trick
applies.)
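
A minimal C sketch of the two encodings follows.  It models the fused
instruction with a separate multiply and add, which is an assumption made to
keep the example portable; the results are the same here because adding a
zero and multiplying by 1.0 are both exact.  All function names are
hypothetical.

```c
#include <math.h>

/* A*B expressed in multiply-add form with an addend of -0.0. */
double mul_as_fma(double a, double b)
{
    return a * b + (-0.0);
}

/* A+B expressed in multiply-add form with a multiplier of 1.0. */
double add_as_fma(double a, double b)
{
    return a * 1.0 + b;
}

/* Same as mul_as_fma but with +0.0, to show why the addend must be -0.0. */
double mul_as_fma_poszero(double a, double b)
{
    return a * b + 0.0;
}

/* Returns nonzero when v is a zero with the sign bit set. */
int is_negative_zero(double v)
{
    return v == 0.0 && signbit(v);
}
```

The addend has to be -0 rather than +0 because (-0) + (+0) rounds to +0,
which would lose the sign of a negative-zero product such as -1.0 * 0.0.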

Bottom line: gcc should make an effort to respect FMAs, at least when
they appear explicitly in the source code.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19988


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug middle-end/19988] [4.0 Regression] pessimizes fp multiply-add/subtract combo
  2005-02-16  4:49 [Bug c/19988] New: [4.0 Regression] pessimizes fp multiply-add/subtract combo gcc-bugzilla at gcc dot gnu dot org
                   ` (6 preceding siblings ...)
  2005-02-17  3:38 ` athena at fftw dot org
@ 2005-02-19 16:58 ` roger at eyesopen dot com
  2005-04-21  5:03 ` [Bug middle-end/19988] [4.0/4.1 " mmitchel at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: roger at eyesopen dot com @ 2005-02-19 16:58 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From roger at eyesopen dot com  2005-02-19 05:41 -------
Re: comment #5
For floating point expressions, -(A+B) is only transformed into (-A)-B or
(-B)-A when the user explicitly specifies -ffast-math, i.e. only when
flag_unsafe_math_optimizations is true.

Re: comment #6
Interesting.  Although on a handful of rs6000 cores (mpccore, 601 and 603),
a fused-multiply-add is more expensive than an addition, it's always a win
to perform two fmas rather than a mult and two adds.  It might be possible
(with some work) to teach combine to un-CSE the following:

double x;
double y;

void foo(double p, double q, double r, double s)
{
  double t = p * q;
  x = t + r;
  y = t + s;
}
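
What "un-CSE" would mean here can be sketched at the source level.  The
struct and function names are hypothetical, and the results are returned in a
struct rather than stored to the globals x and y only to keep the sketch
self-contained.  An optimizing compiler is free to re-share the product, so
this shows the intended shape, not a way to force it.

```c
/* Hypothetical result holder standing in for the globals x and y. */
struct sums { double x; double y; };

/* CSE'd form: one multiplication feeding two additions. */
struct sums foo_cse(double p, double q, double r, double s)
{
    double t = p * q;
    struct sums out = { t + r, t + s };
    return out;
}

/* Un-CSE'd form: the multiplication is duplicated so that each
   expression is a candidate for a single multiply-add. */
struct sums foo_uncse(double p, double q, double r, double s)
{
    struct sums out = { p * q + r, p * q + s };
    return out;
}
```

Note that a true fused multiply-add rounds once, so when p * q is not exactly
representable the fused results may differ from the CSE'd ones in the last
bit.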


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19988


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug middle-end/19988] [4.0/4.1 Regression] pessimizes fp multiply-add/subtract combo
  2005-02-16  4:49 [Bug c/19988] New: [4.0 Regression] pessimizes fp multiply-add/subtract combo gcc-bugzilla at gcc dot gnu dot org
                   ` (7 preceding siblings ...)
  2005-02-19 16:58 ` roger at eyesopen dot com
@ 2005-04-21  5:03 ` mmitchel at gcc dot gnu dot org
  2005-07-08  1:42 ` mmitchel at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2005-04-21  5:03 UTC (permalink / raw)
  To: gcc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.0.0                       |4.0.1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19988


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug middle-end/19988] [4.0/4.1 Regression] pessimizes fp multiply-add/subtract combo
  2005-02-16  4:49 [Bug c/19988] New: [4.0 Regression] pessimizes fp multiply-add/subtract combo gcc-bugzilla at gcc dot gnu dot org
                   ` (8 preceding siblings ...)
  2005-04-21  5:03 ` [Bug middle-end/19988] [4.0/4.1 " mmitchel at gcc dot gnu dot org
@ 2005-07-08  1:42 ` mmitchel at gcc dot gnu dot org
  2005-08-17 12:45 ` pinskia at gcc dot gnu dot org
  2005-09-27 16:23 ` mmitchel at gcc dot gnu dot org
  11 siblings, 0 replies; 13+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2005-07-08  1:42 UTC (permalink / raw)
  To: gcc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.0.1                       |4.0.2


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19988


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug middle-end/19988] [4.0/4.1 Regression] pessimizes fp multiply-add/subtract combo
  2005-02-16  4:49 [Bug c/19988] New: [4.0 Regression] pessimizes fp multiply-add/subtract combo gcc-bugzilla at gcc dot gnu dot org
                   ` (9 preceding siblings ...)
  2005-07-08  1:42 ` mmitchel at gcc dot gnu dot org
@ 2005-08-17 12:45 ` pinskia at gcc dot gnu dot org
  2005-09-27 16:23 ` mmitchel at gcc dot gnu dot org
  11 siblings, 0 replies; 13+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-08-17 12:45 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2005-08-17 12:43 -------
For the last one, we don't have an un-CSE just yet.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2005-02-15 23:42:25         |2005-08-17 12:43:49
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19988


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug middle-end/19988] [4.0/4.1 Regression] pessimizes fp multiply-add/subtract combo
  2005-02-16  4:49 [Bug c/19988] New: [4.0 Regression] pessimizes fp multiply-add/subtract combo gcc-bugzilla at gcc dot gnu dot org
                   ` (10 preceding siblings ...)
  2005-08-17 12:45 ` pinskia at gcc dot gnu dot org
@ 2005-09-27 16:23 ` mmitchel at gcc dot gnu dot org
  11 siblings, 0 replies; 13+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2005-09-27 16:23 UTC (permalink / raw)
  To: gcc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.0.2                       |4.0.3


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19988


^ permalink raw reply	[flat|nested] 13+ messages in thread
