From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (qmail 24819 invoked by alias); 22 Feb 2013 23:41:02 -0000
Received: (qmail 24766 invoked by uid 48); 22 Feb 2013 23:40:38 -0000
From: "steven at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/54000] [4.6/4.7/4.8 Regression] Performance breakdown for gcc-4.{6,7} vs. gcc-4.5 using std::vector in matrix vector multiplication (IVopts / inliner)
Date: Fri, 22 Feb 2013 23:41:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: steven at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 4.6.4
X-Bugzilla-Changed-Fields: CC Summary
Message-ID:
In-Reply-To:
References:
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2013-02/txt/msg02283.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54000

Steven Bosscher changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
            Summary|[4.6/4.7/4.8                |[4.6/4.7/4.8 Regression]
                   |Regression][IVOPTS]         |Performance breakdown for
                   |Performance breakdown for   |gcc-4.{6,7} vs. gcc-4.5
                   |gcc-4.{6,7} vs. gcc-4.5     |using std::vector in matrix
                   |using std::vector in matrix |vector multiplication
                   |vector multiplication       |(IVopts / inliner)

--- Comment #9 from Steven Bosscher 2013-02-22 23:40:35 UTC ---
(In reply to comment #8)
> Thanks for the reduced testcase.
> The innermost loops compare as follows:
>
> 4.5:
>
> .L7:
>         movsd   (%rbx,%rcx), %xmm0
>         addq    $8, %rcx
>         mulsd   0(%rbp,%rdx), %xmm0
>         addq    $8, %rdx
>         cmpq    $24, %rdx
>         addsd   %xmm0, %xmm1
>         movsd   %xmm1, (%rsi)
>         jne     .L7

4.8 r196182 with "--param early-inlining-insns=2" (2 x the default value):

.L13:
        movsd   (%rdx), %xmm0
        addq    $8, %rdx
        mulsd   (%rsi,%rax), %xmm0
        addq    $8, %rax
        cmpq    $24, %rax
        addsd   %xmm0, %xmm1
        movsd   %xmm1, 8(%rdi,%rcx)
        jne     .L13

>
> 4.7:
>
> .L13:
>         movq    64(%rsp), %rdi
>         movq    80(%rsp), %rdx
>         addq    %rcx, %rdi
>         addq    %r8, %rdx
>         movsd   -8(%rax,%rdi), %xmm0
>         mulsd   (%rsi,%rax), %xmm0
>         addq    $8, %rax
>         cmpq    $24, %rax
>         addsd   (%rdx), %xmm0
>         movsd   %xmm0, (%rdx)
>         jne     .L13

This is similar to what 4.8 r196182 produces without inliner tweaks:

.L18:
        movq    %rcx, %rdi
        addq    64(%rsp), %rdi
        movq    %r8, %rdx
        addq    80(%rsp), %rdx
        movsd   -8(%rax,%rdi), %xmm0
        mulsd   (%rsi,%rax), %xmm0
        addq    $8, %rax
        cmpq    $24, %rax
        addsd   (%rdx), %xmm0
        movsd   %xmm0, (%rdx)
        jne     .L18

> so we seem to have a register allocation / spilling issue here as well
> as a bad induction variable choice.  GCC 4.8 is not any better here.

All true, but in the end it looks like an inliner heuristics issue first (as
also suggested by comment #3).
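
For readers without the testcase at hand, here is a rough sketch of the kind of kernel the listings above would come from. It is an assumption for illustration only, NOT the reduced testcase attached to the bug: the names Matrix, matvec and N are made up, and N = 3 is chosen only because three doubles span 24 bytes, which would match the "cmpq $24" loop bound. Whether the inner loop stays tight presumably depends on the small accessor functions being inlined early enough.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration, not the testcase from the bug report.
// 3 doubles = 24 bytes, which would match the "cmpq $24" loop bound.
constexpr std::size_t N = 3;

// Dense N x N matrix stored row-major in a std::vector<double>.
struct Matrix {
    std::vector<double> data;

    Matrix() : data(N * N, 0.0) {}

    // Small accessors like these are the calls the early inliner
    // has to inline before the loop optimizers see plain array accesses.
    double& operator()(std::size_t i, std::size_t j) { return data[i * N + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * N + j]; }
};

// Computes y += A * x, accumulating directly into y[i] in the inner loop.
void matvec(const Matrix& A, const std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == N && y.size() == N);
    for (std::size_t i = 0; i < N; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            y[i] += A(i, j) * x[j];
        }
    }
}
```

To compare the generated inner loops, such a file could be compiled with "g++ -O2 -S", once as is and once with a larger "--param early-inlining-insns=<value>". I have not checked that this sketch reproduces the regression.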