From: "venkataramanan.kumar at amd dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/44141] Redundant loads and stores generated for AMD bdver1 target
Date: Thu, 22 Mar 2012 13:24:00 -0000

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44141

--- Comment #4 from Venkataramanan 2012-03-22 13:17:34 UTC ---
I don't have permission to confirm this bug. Here is my analysis of the cause.

#(insn:TI 4886 4885 4888 132 (set (reg:V2DF 25 xmm4 [8797])
#        (mult:V2DF (reg:V2DF 25 xmm4 [8795])
#            (reg:V2DF 22 xmm1 [8758]))) ac.f90:499 1138 {*mulv2df3}
#     (nil))
        vmulpd  %xmm1, %xmm4, %xmm4     # 4886  *mulv2df3/2     [length = 4]

We are forcing a conversion from V2DF to V4SF mode here for unaligned moves
when TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL is set.
(-----Snip ix86_expand_vector_move_misalign-----)
    case V2DFmode:
      if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
        {
          op0 = gen_lowpart (V4SFmode, op0);
          op1 = gen_lowpart (V4SFmode, op1);
          emit_insn (gen_sse_movups (op0, op1));
          return;
        }
(-----Snip-----)

This conversion generates RTL as shown below.

#(insn:TI 4888 4886 4890 132 (set (mem/c:V4SF (plus:DI (reg/f:DI 7 sp)
#                (const_int 6136 [0x17f8])) [3 MEM[(real(kind=8)[26] *)&dclroo + 152B]+0 S16 A64])
#        (unspec:V4SF [
#                (reg:V4SF 25 xmm4 [8797])
#            ] UNSPEC_MOVU)) ac.f90:499 1104 {*sse_movups}
#     (expr_list:REG_DEAD (reg:V4SF 25 xmm4 [8797])
#        (nil)))
        vmovups %xmm4, 6136(%rsp)       # 4888  *sse_movups/2   [length = 9]

Now GCC does not know how to get back to V2DF mode. As Uros said, it
reloads through memory: the value is loaded as V4SF (insn 4930), stored to a
stack slot as V4SF (insn 8259), and immediately loaded again from the same
slot as V2DF (insn 8261) before the divide (insn 4931) uses it.

#(insn 4930 4929 8259 132 (set (reg:V4SF 23 xmm2)
#        (unspec:V4SF [
#                (mem/c:V4SF (plus:DI (reg/f:DI 7 sp)
#                        (const_int 6136 [0x17f8])) [3 MEM[(real(kind=8)[26] *)&dclroo + 152B]+0 S16 A64])
#            ] UNSPEC_MOVU)) ac.f90:503 1104 {*sse_movups}
#     (nil))
        vmovups 6136(%rsp), %xmm2       # 4930  *sse_movups/1   [length = 9]

#(insn:TI 8259 4930 8261 132 (set (mem/c:V4SF (plus:DI (reg/f:DI 7 sp)
#                (const_int 240 [0xf0])) [12 %sfp+-11184 S16 A128])
#        (reg:V4SF 23 xmm2)) ac.f90:503 1098 {*movv4sf_internal}
#     (expr_list:REG_DEAD (reg:V4SF 23 xmm2)
#        (nil)))
        vmovaps %xmm2, 240(%rsp)        # 8259  *movv4sf_internal/3     [length = 9]

#(insn 8261 8259 4931 132 (set (reg:V2DF 23 xmm2)
#        (mem/c:V2DF (plus:DI (reg/f:DI 7 sp)
#                (const_int 240 [0xf0])) [12 %sfp+-11184 S16 A128])) ac.f90:503 1100 {*movv2df_internal}
#     (nil))
        vmovaps 240(%rsp), %xmm2        # 8261  *movv2df_internal/2     [length = 9]

#(insn:TI 4931 8261 8260 132 (set (reg:V2DF 23 xmm2)
#        (div:V2DF (reg:V2DF 23 xmm2)
#            (mem/c:V2DF (plus:DI (reg/f:DI 7 sp)
#                    (const_int 6128 [0x17f0])) [3 MEM[(real(kind=8)[26] *)&dclroo + 144B]+0 S16 A128]))) ac.f90:503 1144 {sse2_divv2df3}
#     (nil))
        vdivpd  6128(%rsp), %xmm2, %xmm2        # 4931  sse2_divv2df3/2 [length = 9]
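
For illustration only: the V2DF/V4SF mode change carries no data conversion, since both modes are views of the same 128 bits. That is why the store and reload through the stack slot look redundant. Below is a minimal C sketch of that point. The type and function names (v2df, v4sf, v2df_to_v4sf, v4sf_to_v2df, roundtrip_preserves_bits) are hypothetical and are not GCC internals; the sketch models a register as 16 bytes of memory and assumes the usual 8-byte double and 4-byte float.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for one 128-bit vector value seen in two modes. */
typedef struct { double d[2]; } v2df;   /* models V2DFmode: 2 x double */
typedef struct { float  f[4]; } v4sf;   /* models V4SFmode: 4 x float  */

/* Reinterpret the bytes of a V2DF value as V4SF; no value conversion. */
void v2df_to_v4sf(const v2df *src, v4sf *dst)
{
    assert(sizeof *src == sizeof *dst);   /* both views cover the same bytes */
    memcpy(dst, src, sizeof *dst);
}

/* Reinterpret the bytes of a V4SF value as V2DF; no value conversion. */
void v4sf_to_v2df(const v4sf *src, v2df *dst)
{
    assert(sizeof *src == sizeof *dst);
    memcpy(dst, src, sizeof *dst);
}

/* Returns 1 if V2DF -> V4SF -> V2DF leaves every byte unchanged. */
int roundtrip_preserves_bits(double a, double b)
{
    v2df in = { { a, b } };
    v4sf as_single;
    v2df out;

    v2df_to_v4sf(&in, &as_single);
    v4sf_to_v2df(&as_single, &out);
    return memcmp(&in, &out, sizeof in) == 0;
}
```

Calling roundtrip_preserves_bits(1.5, -2.25) returns 1: the round trip changes only how the bytes are labelled, not the bytes themselves.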