public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/55342] New: [LRA,x86] Non-optimal code for simple loop with LRA
From: ysrumyan at gmail dot com @ 2012-11-15 15:25 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
Bug #: 55342
Summary: [LRA,x86] Non-optimal code for simple loop with LRA
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: ysrumyan@gmail.com
Target: x86
For a simple test case we see a 15% performance regression with LRA on x86 in
32-bit mode. The test case is:
#define byte unsigned char
#define MIN(a, b) ((a) > (b)?(b):(a))
void convert_image(byte *in, byte *out, int size) {
    int i;
    byte * read = in,
         * write = out;
    for(i = 0; i < size; i++) {
        byte r = *read++;
        byte g = *read++;
        byte b = *read++;
        byte c, m, y, k, tmp;
        c = 255 - r;
        m = 255 - g;
        y = 255 - b;
        if (c < m)
            k = MIN (c, y);
        else
            k = MIN (m, y);
        *write++ = c - k;
        *write++ = m - k;
        *write++ = y - k;
        *write++ = k;
    }
}
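When re-running the comparison, a scalar reference is handy for checking that code built with and without LRA still computes the same result. The sketch below assumes C99; `rgb_to_cmyk` and `convert_image_ref` are hypothetical names, not part of the original test case. Both branches of the if/else above select min(c, m, y), so the helper uses a plain three-way minimum.

```c
#include <assert.h>

/* Hypothetical helper: converts one RGB pixel to CMYK as the loop body
   of convert_image() does. Both branches of the original if/else pick
   min(c, m, y), so a plain three-way minimum is equivalent. */
static void rgb_to_cmyk(unsigned char r, unsigned char g, unsigned char b,
                        unsigned char out[4])
{
    unsigned char c = 255 - r, m = 255 - g, y = 255 - b;
    unsigned char k = c < m ? c : m;
    if (y < k)
        k = y;
    out[0] = c - k;
    out[1] = m - k;
    out[2] = y - k;
    out[3] = k;
}

/* Same contract as convert_image(): 'in' holds size RGB triples and
   'out' receives size CMYK quadruples. */
static void convert_image_ref(const unsigned char *in, unsigned char *out,
                              int size)
{
    assert(size >= 0);
    for (int i = 0; i < size; i++)
        rgb_to_cmyk(in[3 * i], in[3 * i + 1], in[3 * i + 2], out + 4 * i);
}
```

For example, converting the single pixel {10, 20, 30} gives {20, 10, 0, 225}.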
The essential part of the assembly (corresponding to the store part of the
loop) is:
Without LRA:
.L4:
movl %esi, %ecx
addl $4, %eax
subl %ecx, %ebx
movzbl 3(%esp), %ecx
movb %bl, -4(%eax)
movl %esi, %ebx
subl %ebx, %edx
movb %dl, -2(%eax)
subl %ebx, %ecx
movb %cl, -3(%eax)
cmpl %ebp, 4(%esp)
movb %bl, -1(%eax)
je .L1
With LRA:
.L4:
movl %esi, %eax
subl %eax, %ebx
movl 28(%esp), %eax
movb %bl, (%eax)
movl %esi, %eax
subl %eax, %ecx
movl 28(%esp), %eax
movb %cl, 1(%eax)
movl %esi, %eax
subl %eax, %edx
movl 28(%esp), %eax
movb %dl, 2(%eax)
addl $4, %eax
movl %eax, 28(%esp)
movl 28(%esp), %ecx
movl %esi, %eax
cmpl %ebp, (%esp)
movb %al, -1(%ecx)
je .L1
I also wonder why an additional move is required to perform the subtraction:
movl %esi, %eax
subl %eax, %ebx
whereas a single instruction would suffice:
subl %esi, %ebx
I assume this part is not related to LRA.
* [Bug rtl-optimization/55342] [LRA,x86] Non-optimal code for simple loop with LRA
From: hjl.tools at gmail dot com @ 2012-11-15 16:47 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
H.J. Lu <hjl.tools at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |hjl.tools at gmail dot com,
| |ubizjak at gmail dot com,
| |vmakarov at redhat dot com
Target Milestone|--- |4.8.0
* [Bug rtl-optimization/55342] [4.8 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
From: vmakarov at gcc dot gnu.org @ 2012-11-17 18:01 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
--- Comment #1 from Vladimir Makarov <vmakarov at gcc dot gnu.org> 2012-11-17 17:59:41 UTC ---
Author: vmakarov
Date: Sat Nov 17 17:59:35 2012
New Revision: 193588
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=193588
Log:
2012-11-17 Vladimir Makarov <vmakarov@redhat.com>
PR rtl-optimization/55342
* lra-assigns.c (spill_for): Try to allocate other reload pseudos
before and after spilling.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/lra-assigns.c
* [Bug rtl-optimization/55342] [4.8 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
From: ysrumyan at gmail dot com @ 2012-11-19 12:06 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
--- Comment #2 from Yuri Rumyantsev <ysrumyan at gmail dot com> 2012-11-19 12:06:20 UTC ---
The patched compiler produces better binaries, but we still see a 6%
performance degradation on Core i7. The main cause is that the LRA compiler
spills the 'pure' byte 'g', whereas the old compiler spills 'm', which is the
negation of 'g':
GCC without LRA (assembly for the head of the loop):
.L7:
movzbl 1(%edi), %edx
leal 3(%edi), %ebp
movzbl (%edi), %ebx
movl %ebp, %edi
notl %edx // perform negation in a register
movb %dl, 3(%esp)
GCC with LRA:
.L7:
movzbl (%edi), %ebx
leal 3(%edi), %ebp
movzbl 1(%edi), %ecx
movl %ebp, %edi
movzbl -1(%ebp), %edx
notl %ebx
notl %ecx
movb %dl, (%esp)
cmpb %cl, %bl
notb (%esp) // perform negation in memory
i.e. we have a redundant load from and store to the stack.
I assume this should be fixed as well.
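The "negation" discussed here is a bitwise complement: for an 8-bit value, 255 - x equals ~x truncated to 8 bits, which is why c = 255 - r compiles to notl/notb rather than a subtraction. A minimal exhaustive check, assuming C99 (`not_byte`, `sub_from_255` and `check_not_equals_sub` are hypothetical helper names):

```c
#include <assert.h>

/* Complement of an 8-bit value: what notl/notb compute on the low byte. */
static unsigned char not_byte(unsigned char x)
{
    return (unsigned char)~x;
}

/* What the test case writes: c = 255 - r. */
static unsigned char sub_from_255(unsigned char x)
{
    return (unsigned char)(255 - x);
}

/* Exhaustively checks that the two agree for every byte value. */
static void check_not_equals_sub(void)
{
    for (unsigned x = 0; x <= 255; x++)
        assert(not_byte((unsigned char)x) == sub_from_255((unsigned char)x));
}
```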
* [Bug rtl-optimization/55342] [4.8 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
From: rguenth at gcc dot gnu.org @ 2012-12-07 10:13 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Priority|P3 |P1
* [Bug rtl-optimization/55342] [4.8 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
From: jakub at gcc dot gnu.org @ 2013-01-28 18:53 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Priority|P1 |P2
CC| |jakub at gcc dot gnu.org
--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> 2013-01-28 18:53:25 UTC ---
I'm downgrading this to P2, it is unfortunate we generate slightly worse code
on this testcase, but it is more important how LRA vs. reload behaves on
average on various benchmarks etc. This doesn't mean this isn't important to
look at, but I think it isn't a release blocker at this point.
* [Bug rtl-optimization/55342] [4.8 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
From: rguenth at gcc dot gnu.org @ 2013-02-20 14:11 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization, ra
Target|x86 |i?86-*-*
Status|UNCONFIRMED |NEW
Last reconfirmed| |2013-02-20
Ever Confirmed|0 |1
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-20 14:09:59 UTC ---
At least it seems to be confirmed.
* [Bug rtl-optimization/55342] [4.8/4.9 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
From: jakub at gcc dot gnu.org @ 2013-03-22 14:48 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.8.0 |4.8.1
--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> 2013-03-22 14:44:31 UTC ---
GCC 4.8.0 is being released, adjusting target milestone.
* [Bug rtl-optimization/55342] [4.8/4.9 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
From: jakub at gcc dot gnu.org @ 2013-05-31 10:59 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.8.1 |4.8.2
--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 4.8.1 has been released.
* [Bug rtl-optimization/55342] [4.8/4.9 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
From: vmakarov at gcc dot gnu.org @ 2013-06-06 15:20 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
Vladimir Makarov <vmakarov at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |vmakarov at gcc dot gnu.org
--- Comment #7 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
(In reply to Yuri Rumyantsev from comment #2)
Fixing the problem with notl requires implementing new functionality in LRA:
reloads that are kept only if the reload pseudo gets a hard register and is
inherited (in which case the reload is profitable). Otherwise the current code
should be generated (the reloads and reload pseudos should be removed and the
old code restored). I've started working on this, but it will not be fixed
quickly, as implementing this functionality is not a trivial task.
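The keep-or-undo rule described above can be modeled as a simple decision per tentative reload. This is a toy sketch only, assuming each reload is summarized by its assigned hard register and an inheritance flag; `struct tentative_reload`, `reload_profitable` and `finalize_reloads` are hypothetical names, not LRA's data structures or API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a tentative reload (e.g. one that would let
   notl operate on a register instead of memory). */
struct tentative_reload {
    int hard_reg;    /* hard register assigned, or -1 if it was spilled */
    bool inherited;  /* whether a later use inherited the reloaded value */
    bool kept;       /* decision: keep the reload or restore the old insn */
};

/* Keep the reload only when it is profitable: the reload pseudo got a
   hard register and its value was inherited. */
static bool reload_profitable(const struct tentative_reload *r)
{
    return r->hard_reg >= 0 && r->inherited;
}

/* Decides for every tentative reload; returns how many were undone,
   i.e. for how many the original instruction would be restored. */
static size_t finalize_reloads(struct tentative_reload *rs, size_t n)
{
    assert(rs != NULL || n == 0);
    size_t undone = 0;
    for (size_t i = 0; i < n; i++) {
        rs[i].kept = reload_profitable(&rs[i]);
        if (!rs[i].kept)
            undone++;
    }
    return undone;
}
```

Undoing is what keeps the change safe: when the tentative register assignment does not pay off, the code falls back to what LRA generates today.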
* [Bug rtl-optimization/55342] [4.8/4.9 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
From: ysrumyan at gmail dot com @ 2013-09-05 14:44 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
--- Comment #8 from Yuri Rumyantsev <ysrumyan at gmail dot com> ---
Created attachment 30751
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30751&action=edit
modified test-case
Modified test case to reproduce the suboptimal register allocation.
* [Bug rtl-optimization/55342] [4.8/4.9 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
From: ysrumyan at gmail dot com @ 2013-09-05 14:51 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
--- Comment #9 from Yuri Rumyantsev <ysrumyan at gmail dot com> ---
The issue still exists in the 4.9 compiler, and we got a further 30%
degradation after the r202165 fix. It can be reproduced with the attached
modified test case using any 4.9 compiler; the code produced for the inner
loop looks like:
.L8:
movl %esi, %ecx
movl %esi, %edi
movzbl 3(%esp), %edx
cmpb %cl, %dl
movl %edx, %ecx
cmovbe %ecx, %edi
.L4:
movl %esi, %edx
movl %edi, %ecx
subl %ecx, %edx
movl 28(%esp), %ecx
movl 28(%esp), %esi
addl $4, 28(%esp)
movb %dl, (%ecx)
movl %edi, %ecx
subl %ecx, %ebx
movl %edi, %edx
movzbl 3(%esp), %ecx
movb %bl, 1(%esi)
subl %edx, %ecx
movl %edi, %ebx
movb %cl, 2(%esi)
movl 28(%esp), %esi
cmpl %ebp, %eax
movb %bl, -1(%esi)
je .L1
.L5:
movzbl (%eax), %esi
leal 3(%eax), %eax
movzbl -2(%eax), %ebx
notl %esi
notl %ebx
movl %esi, %edx
movzbl -1(%eax), %ecx
cmpb %bl, %dl
movb %cl, 3(%esp)
notb 3(%esp)
jb .L8
movzbl 3(%esp), %edx
movl %ebx, %edi
cmpb %bl, %dl
cmovbe %edx, %edi
jmp .L4
As you can see, (1) there are 2 additional moves at the top of the blocks
labeled .L4 and .L8, and (2) there are redundant spills/fills of the 'write'
base (28(%esp)) in the block labeled .L4.
To reproduce, it is sufficient to compile the modified test case with the
'-m32 -march=atom' options.
* [Bug rtl-optimization/55342] [4.8/4.9 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
From: ysrumyan at gmail dot com @ 2013-09-13 13:00 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
--- Comment #10 from Yuri Rumyantsev <ysrumyan at gmail dot com> ---
After the r202468 fix the assembly looks slightly better, but we hit another
RA inefficiency, illustrated by the attached test (t1.c) compiled with
"-march=atom -mtune=atom -m32 -O2": the upper bound of the loop check is in a
register, but the base register for "write" is on the stack:
.L8:
movzbl 3(%esp), %edx
movl %esi, %ecx
cmpb %cl, %dl
movl %esi, %edi
cmovbe %edx, %edi
.L4:
movl %esi, %edx
movl 28(%esp), %esi <-- why is 'write' on the stack?
movl %edi, %ecx
addl $4, 28(%esp) <-- 'write' is incremented on the stack
subl %ecx, %edx
subl %ecx, %ebx
movzbl 3(%esp), %ecx
movb %dl, (%esi)
movl %edi, %edx
subl %edx, %ecx
movb %bl, 1(%esi)
movb %cl, 2(%esi)
movl 28(%esp), %esi
cmpl %ebp, %eax <-- why is the upper bound in a register?
movb %dl, -1(%esi)
je .L1
.L5:
movzbl (%eax), %esi
leal 3(%eax), %eax
movzbl -2(%eax), %ebx
notl %esi
notl %ebx
movl %esi, %edx
movzbl -1(%eax), %ecx
cmpb %bl, %dl
notl %ecx
movb %cl, 3(%esp)
jb .L8
movzbl 3(%esp), %edx
movl %ebx, %edi
cmpb %bl, %dl
cmovbe %edx, %edi
jmp .L4
Is something wrong in the Atom cost model? In any case, I assume that keeping
the upper bound on the stack is much cheaper than loading the 'write' base
from the stack and incrementing it there.
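The intuition above matches a textbook spill-cost heuristic: spilling a value costs roughly one memory access per use and per definition, weighted by how often the block executes. The sketch below is that heuristic, not GCC's IRA/LRA cost model; `struct spill_candidate`, `spill_cost` and `cheaper_to_spill` are hypothetical names, and the per-iteration counts used with it (five reads and one update of the 'write' base, assuming the four stores share one base register and one increment, versus a single read of the loop bound) are estimates from the C source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-candidate summary for one loop iteration. */
struct spill_candidate {
    const char *name;
    unsigned uses;   /* reads per iteration */
    unsigned defs;   /* writes per iteration */
};

/* Textbook spill cost: once the value lives on the stack, every use
   needs a load and every def a store, weighted by block frequency. */
static unsigned long spill_cost(const struct spill_candidate *c,
                                unsigned long freq)
{
    return (unsigned long)(c->uses + c->defs) * freq;
}

/* Returns the candidate that is cheaper to keep on the stack. */
static const struct spill_candidate *
cheaper_to_spill(const struct spill_candidate *a,
                 const struct spill_candidate *b, unsigned long freq)
{
    assert(a != NULL && b != NULL);
    return spill_cost(a, freq) <= spill_cost(b, freq) ? a : b;
}
```

With these counts, spilling the loop bound costs 1 access per iteration versus 6 for the 'write' base, so under this model the bound is the better spill candidate.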
* [Bug rtl-optimization/55342] [4.8/4.9 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
From: ysrumyan at gmail dot com @ 2013-09-13 13:03 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
--- Comment #11 from Yuri Rumyantsev <ysrumyan at gmail dot com> ---
Created attachment 30816
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30816&action=edit
test-case to reproduce
t1.c must be compiled on x86 with options:
-O2 -march=atom -mtune=atom -mfpmath=sse -m32
* [Bug rtl-optimization/55342] [4.8/4.9 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
From: jakub at gcc dot gnu.org @ 2013-10-16 9:50 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.8.2 |4.8.3
--- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 4.8.2 has been released.
* [Bug rtl-optimization/55342] [4.8/4.9/4.10 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
From: rguenth at gcc dot gnu.org @ 2014-05-22 9:07 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.8.3 |4.8.4
--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 4.8.3 is being released, adjusting target milestone.
* [Bug rtl-optimization/55342] [4.8/4.9/5 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
From: jakub at gcc dot gnu.org @ 2014-12-19 13:36 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.8.4 |4.8.5
--- Comment #14 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 4.8.4 has been released.
* [Bug rtl-optimization/55342] [4.8/4.9 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
From: law at redhat dot com @ 2015-02-12 7:19 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
Jeffrey A. Law <law at redhat dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|[4.8/4.9/5 Regression] |[4.8/4.9 Regression]
|[LRA,x86] Non-optimal code |[LRA,x86] Non-optimal code
|for simple loop with LRA |for simple loop with LRA
--- Comment #15 from Jeffrey A. Law <law at redhat dot com> ---
I've examined the various testcases and the complaints about the poor register
allocation in this BZ with a trunk compiler.
I'm happy to report that I'm seeing none of the issues raised in this BZ.
For c#0 (store-back part of the loop):
.L5:
movl %edi, %ecx
addl $4, %esi
subl %ecx, %eax
subl %ecx, %edx
movzbl 3(%esp), %ecx
movb %al, -3(%esi)
movl %edi, %eax
movb %dl, -4(%esi)
subl %eax, %ecx
movb %cl, -2(%esi)
cmpl %ebp, %ebx
movb %al, -1(%esi)
je .L1
In c#2, the negation sequence is pointed out. We now get:
.L9:
movzbl (%ebx), %edx
movzbl 1(%ebx), %eax
addl $3, %ebx
movzbl -1(%ebx), %ecx
notl %edx
notl %eax
notl %ecx
cmpb %al, %dl
movb %cl, 3(%esp)
jb .L13
cmpb 3(%esp), %al
movzbl %al, %edi
jbe .L5
movzbl 3(%esp), %edi
jmp .L5
For the first modified testcase (-O2 -mcpu=atom -m32):
.L11:
movzbl %al, %edi
cmpb %al, %cl
cmovbe %ecx, %edi
.L4:
movl %edi, %eax
leal 4(%esi), %esi
subl %eax, %edx
subl %eax, %ecx
movb %dl, -3(%esi)
movb %cl, -4(%esi)
movzbl 3(%esp), %edx
subl %eax, %edx
movl %edi, %eax
movb %dl, -2(%esi)
cmpl %ebx, %ebp
movb %al, -1(%esi)
je .L1
.L7:
movzbl (%ebx), %ecx
leal 3(%ebx), %ebx
movzbl -2(%ebx), %edx
notl %ecx
movzbl -1(%ebx), %eax
notl %edx
notl %eax
cmpb %dl, %cl
movb %al, 3(%esp)
jb .L11
movzbl 3(%esp), %eax
movzbl %al, %edi
cmpb %al, %dl
cmovbe %edx, %edi
jmp .L4
Then in c#10 (t1 testcase):
.L11:
movzbl %al, %edi
cmpb %al, %cl
cmovbe %ecx, %edi
.L4:
movl %edi, %eax
leal 4(%esi), %esi
subl %eax, %edx
subl %eax, %ecx
movb %dl, -3(%esi)
movb %cl, -4(%esi)
movzbl 3(%esp), %edx
subl %eax, %edx
movl %edi, %eax
movb %dl, -2(%esi)
cmpl %ebp, %ebx
movb %al, -1(%esi)
je .L1
.L7:
movzbl (%ebx), %ecx
leal 3(%ebx), %ebx
movzbl -2(%ebx), %edx
notl %ecx
movzbl -1(%ebx), %eax
notl %edx
notl %eax
cmpb %dl, %cl
movb %al, 3(%esp)
jb .L11
movzbl 3(%esp), %eax
movzbl %al, %edi
cmpb %al, %dl
cmovbe %edx, %edi
jmp .L4
Across the board we're not seeing objects spilled to the stack. The code
looks quite tight to me.
Clearing the regression marker for GCC 5. I didn't do any bisection work to
identify which changes fixed things.
* [Bug rtl-optimization/55342] [4.8/4.9 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
From: jakub at gcc dot gnu.org @ 2015-02-12 13:47 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
--- Comment #16 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
The #c10 issue went away with r204212 I believe.
* [Bug rtl-optimization/55342] [4.8/4.9 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
From: rguenth at gcc dot gnu.org @ 2015-06-23 8:21 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.8.5 |4.9.3
--- Comment #17 from Richard Biener <rguenth at gcc dot gnu.org> ---
The gcc-4_8-branch is being closed, re-targeting regressions to 4.9.3.
* [Bug rtl-optimization/55342] [4.9 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
From: jakub at gcc dot gnu.org @ 2015-06-26 20:09 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
--- Comment #18 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 4.9.3 has been released.
* [Bug rtl-optimization/55342] [4.9 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
From: jakub at gcc dot gnu.org @ 2015-06-26 20:36 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.9.3 |4.9.4