From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 29771 invoked by alias); 15 Feb 2005 22:54:00 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 29724 invoked by alias); 15 Feb 2005 22:53:54 -0000
Date: Wed, 16 Feb 2005 04:49:00 -0000
From: "gcc-bugzilla at gcc dot gnu dot org"
To: gcc-bugs@gcc.gnu.org
Message-ID: <20050215225352.19988.stevenj@fftw.org>
Reply-To: gcc-bugzilla@gcc.gnu.org
Subject: [Bug c/19988] New: [4.0 Regression] pessimizes fp multiply-add/subtract combo
X-Bugzilla-Reason: CC
X-SW-Source: 2005-02/txt/msg01676.txt.bz2
List-Id:

When I compile the following code with 'gcc -O3 --save-temps -c':

double foo(double x, double y)
{
     return ((x + 0.1234 * y) * (x - 0.1234 * y));
}

gcc 3.x gives one load of the constant 0.1234, one multiplication
0.1234 * y, one addition, one subtraction, and the final multiplication:
total = one constant (load) and four fp operations.

gcc 4.0 (20050213 snapshot), on the other hand, compiles
(x - 0.1234 * y) as (x + (-0.1234) * y), and thus doesn't recognize
that it is the same constant as in the other expression. Thus, it
produces *two* constants (2 loads), and *five* fp operations
(3 multiplications):

foo:
        pushl   %ebp
        movl    %esp, %ebp
        fldl    16(%ebp)
        fld     %st(0)
        fldl    8(%ebp)
        fxch    %st(1)
        fmull   .LC0
        fxch    %st(2)
        popl    %ebp
        fmull   .LC1
        fxch    %st(2)
        fadd    %st(1), %st
        fxch    %st(1)
        faddp   %st, %st(2)
        fmulp   %st, %st(1)
        ret

As you can imagine, this leads to a major slowdown in code that has
lots of multiply-add and multiply-subtract combinations...in particular
any FFT (such as our FFTW, www.fftw.org) could suffer a lot.

Thanks for your efforts,
Steven G Johnson

PS. When you fix this, please don't re-introduce another optimizer bug
that appears in gcc 3.x.
In particular, when compiling for a PowerPC target, it *should*
produce one constant load, one fused multiply-add, one fused
multiply-subtract, and one multiplication. gcc 3.x, on the other hand,
pulls out the (0.1234 * y) in CSE, and thus does not exploit the fma.
gcc 4.0 on PowerPC (MacOS 10.3) produces:

_foo:
        mflr r0
        bcl 20,31,"L00000000001$pb"
"L00000000001$pb":
        stw r31,-4(r1)
        fmr f13,f1
        mflr r31
        stw r0,8(r1)
        lwz r0,8(r1)
        addis r2,r31,ha16(LC0-"L00000000001$pb")
        lfd f1,lo16(LC0-"L00000000001$pb")(r2)
        addis r2,r31,ha16(LC1-"L00000000001$pb")
        lfd f0,lo16(LC1-"L00000000001$pb")(r2)
        mtlr r0
        fmadd f1,f2,f1,f13
        lwz r31,-4(r1)
        fmadd f2,f2,f0,f13
        fmul f1,f1,f2
        blr

which utilizes the fma, but loads the constant twice (as 0.1234 and
-0.1234) instead of using fmadd and fmsub.

PPS. In general, turning negative constants into positive constants by
changing additions into subtractions can lead to substantial speedups
by reducing the number of fp constants in certain kinds of code.
e.g. "manually" doing this in FFTW gained us 10-15% in speed; YMMV.
Something to think about.

Environment:
System: Linux fftw.org 2.6.3-1-686-smp #2 SMP Tue Feb 24 20:29:08 EST 2004 i686 GNU/Linux
Architecture: i686
host: i686-pc-linux-gnu
build: i686-pc-linux-gnu
target: i686-pc-linux-gnu
configured with: ../configure --prefix=/home/stevenj/gcc4

How-To-Repeat:
Compile above foo() subroutine with gcc -O3 -c --save-temps
and look at assembler output.

-- 
           Summary: [4.0 Regression] pessimizes fp multiply-add/subtract combo
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: stevenj at fftw dot org
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19988
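
For reference, the three forms discussed in the report can be written out side by side in C. This is only an illustrative sketch: the names K, foo_shared and foo_negated are hypothetical and do not come from the report or from GCC, and the comments describe the operation counts the report attributes to each form rather than what any particular compiler version will emit.

```c
/* Constant taken from the foo() test case in the report. */
static const double K = 0.1234;

/* The test case as reported: the same constant appears in both factors. */
double foo(double x, double y)
{
    return (x + K * y) * (x - K * y);
}

/* Hypothetical hand-shared form: one constant and one product K * y,
 * then an add, a subtract and the final multiply (the count the report
 * gives for gcc 3.x on x86). */
double foo_shared(double x, double y)
{
    double t = K * y;
    return (x + t) * (x - t);
}

/* Hypothetical source-level picture of the rewrite the report describes
 * for the 4.0 snapshot: x - K*y becomes x + (-K)*y, so two distinct
 * constants (K and -K) and two multiplications by a constant appear. */
double foo_negated(double x, double y)
{
    return (x + K * y) * (x + (-K) * y);
}
```

All three functions compute the same value up to floating-point rounding; they differ only in how many constants and multiplications the source exposes. Exact bitwise equality between them is not guaranteed, since excess precision or fused multiply-add contraction can change the rounding.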