From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 17 Nov 2008 18:13:00 -0000
Message-ID: <20081117181149.4646.qmail@sourceware.org>
Subject: [Bug target/38134] [4.4 Regression] speed regression with inline-asm sse code
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "ubizjak at gmail dot com" <gcc-bugzilla@gcc.gnu.org>
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
X-SW-Source: 2008-11/txt/msg01396.txt.bz2

------- Comment #6 from ubizjak at gmail dot com  2008-11-17 18:11 -------
I think that

        addps   .LC10(%rip), %xmm0
        mulps   %xmm1, %xmm0
        addps   .LC11(%rip), %xmm0
        mulps   %xmm1, %xmm0
        addps   .LC12(%rip), %xmm0
        mulps   %xmm1, %xmm0
        addps   .LC13(%rip), %xmm0
        mulps   %xmm1, %xmm0
        addps   .LC14(%rip), %xmm0
        mulps   %xmm1, %xmm0

is the bottleneck. Perhaps we should split implicit memory operands out of
the insn by some generic peephole (if a register is available) and schedule
the loads appropriately. OTOH, the loop optimizer should detect invariant
loads and move them out of the loop.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38134