From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (qmail 18147 invoked by alias); 7 Feb 2014 08:52:35 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 18099 invoked by uid 48); 7 Feb 2014 08:52:31 -0000
From: "abel at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/60086] suboptimal asm generated for a loop (store/load false aliasing)
Date: Fri, 07 Feb 2014 08:52:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Version: 4.7.3
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: abel at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2014-02/txt/msg00697.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60086

--- Comment #5 from Andrey Belevantsev ---
(In reply to Jakub Jelinek from comment #1)
> ...
> doesn't reorder those is that RA allocates the same register.
> With -O3 -mavx -fselective-scheduling2 the stores are also changed, but we end up
> with a weird:
> .L9:
>         movq    -136(%rbp), %rdx
>         vmovapd (%r9,%rax), %ymm0
>         addq    $1, %rdi
>         vmovapd (%r10,%rax), %ymm8
>         vaddpd  (%rdx,%rax), %ymm0, %ymm0
>         movq    -144(%rbp), %rdx
>         vaddpd  (%rdx,%rax), %ymm8, %ymm9
>         vmovapd %ymm0, (%r9,%rax)
>         vmovapd %ymm8, %ymm0
>         vmovapd %ymm9, %ymm0
>         vmovapd %ymm0, (%r10,%rax)
>         addq    $32, %rax
>         cmpq    %rdi, -152(%rbp)
>         ja      .L9
> Why there is the vmovapd %ymm8, %ymm0 is a mystery, and vmovapd %ymm9, %ymm0
> could be very well merged with the store into vmovapd %ymm9, (%r10,%rax).

That's because we do a renaming and a substitution.  We have (in the middle of
scheduling, just scheduled insn 78):

  262: dx:DI=[bp:DI-0x88]
   72: xmm0:V4DF=[r9:DI+ax:DI]
   78: {di:DI=di:DI+0x1;clobber flags:CC;}   <--- we are here
   73: xmm0:V4DF=xmm0:V4DF+[dx:DI+ax:DI]
   74: [r9:DI+ax:DI]=xmm0:V4DF
   75: xmm0:V4DF=[r10:DI+ax:DI]
  263: dx:DI=[bp:DI-0x90]
   76: xmm0:V4DF=xmm0:V4DF+[dx:DI+ax:DI]
   77: [r10:DI+ax:DI]=xmm0:V4DF

Now we want to schedule insn 75 but xmm0 is busy in 74 and 73, so we rename it
to xmm8 and have:

  262: dx:DI=[bp:DI-0x88]
   72: xmm0:V4DF=[r9:DI+ax:DI]
   78: {di:DI=di:DI+0x1;clobber flags:CC;}
  459: xmm8:V4DF=[r10:DI+ax:DI]              <--- we are here
   73: xmm0:V4DF=xmm0:V4DF+[dx:DI+ax:DI]
   74: [r9:DI+ax:DI]=xmm0:V4DF
  461: xmm0:V4DF=xmm8:V4DF                   <--- copy after renaming
  263: dx:DI=[bp:DI-0x90]
   76: xmm0:V4DF=xmm0:V4DF+[dx:DI+ax:DI]
   77: [r10:DI+ax:DI]=xmm0:V4DF

Then after scheduling insns 73 and 263 we have

  262: dx:DI=[bp:DI-0x88]
   72: xmm0:V4DF=[r9:DI+ax:DI]
   78: {di:DI=di:DI+0x1;clobber flags:CC;}
  459: xmm8:V4DF=[r10:DI+ax:DI]
   73: xmm0:V4DF=xmm0:V4DF+[dx:DI+ax:DI]
  263: dx:DI=[bp:DI-0x90]                    <--- we are here
   74: [r9:DI+ax:DI]=xmm0:V4DF
  461: xmm0:V4DF=xmm8:V4DF
   76: xmm0:V4DF=xmm0:V4DF+[dx:DI+ax:DI]
   77: [r10:DI+ax:DI]=xmm0:V4DF

and now we want to schedule insn 76.
We substitute its rhs through the copy 461, but then xmm0 is busy again, so we
rename the target register to xmm9 and get:

  262: dx:DI=[bp:DI-0x88]
   72: xmm0:V4DF=[r9:DI+ax:DI]
   78: {di:DI=di:DI+0x1;clobber flags:CC;}
  459: xmm8:V4DF=[r10:DI+ax:DI]
   73: xmm0:V4DF=xmm0:V4DF+[dx:DI+ax:DI]
  263: dx:DI=[bp:DI-0x90]
  464: xmm9:V4DF=xmm8:V4DF+[dx:DI+ax:DI]     <--- new renamed insn
   74: [r9:DI+ax:DI]=xmm0:V4DF
  461: xmm0:V4DF=xmm8:V4DF
  466: xmm0:V4DF=xmm9:V4DF                   <--- copy after renaming
   77: [r10:DI+ax:DI]=xmm0:V4DF

At this point insn 461 is dead (466 overwrites xmm0 before anything reads it),
but we do not notice, and detecting this doesn't look easy.  I think there was
some suggestion in the original research for killing dead insn copies left
after renaming, but I don't remember it offhand.