Date: Wed, 31 May 2006 10:57:00 -0000
Message-ID: <20060531105632.20260.qmail@sourceware.org>
Subject: [Bug target/27827] gcc 4 produces worse x87 code on all platforms than gcc 3
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "uros at kss-loka dot si"


------- Comment #7 from uros at kss-loka dot si  2006-05-31 10:56 -------
IMO the fact that gcc 3.x beats 4.x on this code could be attributed to pure
luck. Looking into the 3.x RTL dumps, the following can be observed.

The instruction that multiplies pA0 and rB0 is described as:

__.20.combine:

(insn 75 73 76 2 (set (reg:DF 84)
        (mult:DF (mem:DF (reg/v/f:DI 70 [ pA0 ]) [0 S8 A64])
            (reg/v:DF 78 [ rB0 ]))) 551 {*fop_df_comm_nosse} (insn_list 65 (nil))
    (nil))

At this point, the first input operand does not satisfy the operand
constraint, so the register allocator pushes the memory operand into a
register:

__.25.greg:

(insn 703 73 75 2 (set (reg:DF 8 st [84])
        (mem:DF (reg/v/f:DI 0 ax [orig:70 pA0 ] [70]) [0 S8 A64])) 96 {*movdf_integer} (nil)
    (nil))

(insn 75 703 76 2 (set (reg:DF 8 st [84])
        (mult:DF (reg:DF 8 st [84])
            (reg/v:DF 9 st(1) [orig:78 rB0 ] [78]))) 551 {*fop_df_comm_nosse} (insn_list 65 (nil))
    (nil))

This RTL produces the following asm sequence:

        fldl    (%rax)  #* pA0
        fmul    %st(1), %st     #

In the 4.x case, we have:

__.127r.combine:

(insn 60 58 61 4 (set (reg:DF 207)
        (mult:DF (reg/v:DF 187 [ rB0 ])
            (mem:DF (plus:DI (reg/v/f:DI 178 [ pA0.161 ])
                    (const_int 960 [0x3c0])) [0 S8 A64]))) 591 {*fop_df_comm_i387} (nil)
    (nil))

This instruction almost satisfies the operand constraints, and the register
allocator produces:

__.138r.greg:

(insn 470 58 60 5 (set (reg:DF 12 st(4) [207])
        (reg/v:DF 8 st [orig:187 rB0 ] [187])) 94 {*movdf_integer} (nil)
    (nil))

(insn 60 470 61 5 (set (reg:DF 12 st(4) [207])
        (mult:DF (reg:DF 12 st(4) [207])
            (mem:DF (plus:DI (reg/v/f:DI 0 ax [orig:178 pA0.161 ] [178])
                    (const_int 960 [0x3c0])) [0 S8 A64]))) 591 {*fop_df_comm_i387} (nil)
    (nil))

Stack handling then fixes this RTL to:

__.151r.stack:

(insn 470 58 60 4 (set (reg:DF 8 st)
        (reg:DF 8 st)) 94 {*movdf_integer} (nil)
    (nil))

(insn 60 470 61 4 (set (reg:DF 8 st)
        (mult:DF (reg:DF 8 st)
            (mem:DF (plus:DI (reg/v/f:DI 0 ax [orig:178 pA0.161 ] [178])
                    (const_int 960 [0x3c0])) [0 S8 A64]))) 591 {*fop_df_comm_i387} (nil)
    (nil))

From your measurements, it looks like instead of:

        fld     %st(0)  #
        fmull   (%rax)  #* pA0.161

it is faster to emit:

        fldl    (%rax)  #* pA0
        fmul    %st(1), %st     #,


-- 

uros at kss-loka dot si changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |uros at kss-loka dot si


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
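The two x87 sequences compared in comment #7 (gcc 3.x: fldl/fmul %st(1); gcc 4.x: fld %st(0)/fmull) can be checked for equivalence with a toy Python model of the x87 register stack, with st(0) at index 0. This is a sketch under stated assumptions. The helper names are hypothetical. The model tracks only values, not x87 precision, tag bits or timing. It also assumes, as the dumps suggest, that rB0 sits in st(0) before either sequence runs.

```python
# Toy x87 register stack: list index 0 is st(0), the top of the stack.
# Hypothetical helpers; only data flow is modelled (no precision, tags, timing).

def fld_mem(stack, mem):
    """fldl (%rax): push a double loaded from memory."""
    return [mem] + stack

def fld_reg(stack, i):
    """fld %st(i): push a copy of st(i)."""
    return [stack[i]] + stack

def fmul_st_i(stack, i):
    """fmul %st(i), %st: st(0) = st(0) * st(i), no push or pop."""
    return [stack[0] * stack[i]] + stack[1:]

def fmul_mem(stack, mem):
    """fmull (%rax): st(0) = st(0) * mem, no push or pop."""
    return [stack[0] * mem] + stack[1:]

def gcc3_seq(stack, mem):
    # gcc 3.x: fldl (%rax); fmul %st(1), %st
    return fmul_st_i(fld_mem(stack, mem), 1)

def gcc4_seq(stack, mem):
    # gcc 4.x: fld %st(0); fmull (%rax)
    return fmul_mem(fld_reg(stack, 0), mem)

rB0 = 3.0      # assumed to be in st(0) before either sequence
a = 2.5        # hypothetical value stored at *pA0
print(gcc3_seq([rB0], a))  # [7.5, 3.0]
print(gcc4_seq([rB0], a))  # [7.5, 3.0]
```

In this model, both sequences leave pA0 * rB0 in st(0) and rB0 in st(1). The difference is therefore one of instruction selection only. Which form runs faster is the measured question discussed above, and the model does not capture it.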