public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/55342] New: [LRA,x86] Non-optimal code for simple loop with LRA
@ 2012-11-15 15:25 ysrumyan at gmail dot com
  2012-11-15 16:47 ` [Bug rtl-optimization/55342] " hjl.tools at gmail dot com
                   ` (20 more replies)
  0 siblings, 21 replies; 22+ messages in thread
From: ysrumyan at gmail dot com @ 2012-11-15 15:25 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342

             Bug #: 55342
           Summary: [LRA,x86] Non-optimal code for simple loop with LRA
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: ysrumyan@gmail.com
            Target: x86


For a simple test case we see a 15% performance regression with LRA on x86 in
32-bit mode. The test case is:

#define byte unsigned char
#define MIN(a, b) ((a) > (b)?(b):(a))

void convert_image(byte *in, byte *out, int size) {
    int i;
    byte * read = in,
     * write = out;
    for(i = 0; i < size; i++) {
        byte r = *read++;
        byte g = *read++;
        byte b = *read++;
        byte c, m, y, k, tmp;
        c = 255 - r;
        m = 255 - g;
        y = 255 - b;
        if (c < m)
            k = MIN (c, y);
        else
            k = MIN (m, y);
        *write++ = c - k;
        *write++ = m - k;
        *write++ = y - k;
        *write++ = k;
    }
}
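
As a sanity check that is not part of the original report: the if/else in the
loop selects k = min(c, m, y). The sketch below repeats the loop and compares
it against a direct reference. The names convert_pixel_ref and pixel_matches
are hypothetical helpers introduced only for illustration.

```c
#define byte unsigned char
#define MIN(a, b) ((a) > (b) ? (b) : (a))

/* The loop from the report, with the same behavior. */
void convert_image(byte *in, byte *out, int size) {
    int i;
    byte *read = in, *write = out;
    for (i = 0; i < size; i++) {
        byte r = *read++;
        byte g = *read++;
        byte b = *read++;
        byte c, m, y, k;
        c = 255 - r;
        m = 255 - g;
        y = 255 - b;
        if (c < m)
            k = MIN(c, y);
        else
            k = MIN(m, y);
        *write++ = c - k;
        *write++ = m - k;
        *write++ = y - k;
        *write++ = k;
    }
}

/* Hypothetical reference: compute k directly as the minimum of c, m and y. */
void convert_pixel_ref(const byte *in, byte *out) {
    byte c = 255 - in[0];
    byte m = 255 - in[1];
    byte y = 255 - in[2];
    byte k = MIN(MIN(c, m), y);
    out[0] = c - k;
    out[1] = m - k;
    out[2] = y - k;
    out[3] = k;
}

/* Hypothetical helper: returns 1 if both versions agree on one RGB triple. */
int pixel_matches(byte r, byte g, byte b) {
    byte in[3] = { r, g, b };
    byte got[4], want[4];
    int i;
    convert_image(in, got, 1);
    convert_pixel_ref(in, want);
    for (i = 0; i < 4; i++)
        if (got[i] != want[i])
            return 0;
    return 1;
}
```

For the input bytes 10, 20, 30 the loop writes 20, 10, 0, 225, and
pixel_matches(10, 20, 30) returns 1.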

The essential part of the assembly (corresponding to the write part of the
loop) is:

without LRA
.L4:
    movl    %esi, %ecx
    addl    $4, %eax
    subl    %ecx, %ebx
    movzbl    3(%esp), %ecx
    movb    %bl, -4(%eax)
    movl    %esi, %ebx
    subl    %ebx, %edx
    movb    %dl, -2(%eax)
    subl    %ebx, %ecx
    movb    %cl, -3(%eax)
    cmpl    %ebp, 4(%esp)
    movb    %bl, -1(%eax)
    je    .L1

with LRA

.L4:
    movl    %esi, %eax
    subl    %eax, %ebx
    movl    28(%esp), %eax
    movb    %bl, (%eax)
    movl    %esi, %eax
    subl    %eax, %ecx
    movl    28(%esp), %eax
    movb    %cl, 1(%eax)
    movl    %esi, %eax
    subl    %eax, %edx
    movl    28(%esp), %eax
    movb    %dl, 2(%eax)
    addl    $4, %eax
    movl    %eax, 28(%esp)
    movl    28(%esp), %ecx
    movl    %esi, %eax
    cmpl    %ebp, (%esp)
    movb    %al, -1(%ecx)
    je    .L1

I also wonder why an additional move is needed to perform each subtraction:

    movl  %esi, %eax
    subl  %eax, %ebx

whereas a single instruction would suffice:

    subl  %esi, %ebx

I assume that this part is not related to LRA.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug rtl-optimization/55342] [LRA,x86] Non-optimal code for simple loop with LRA
  2012-11-15 15:25 [Bug rtl-optimization/55342] New: [LRA,x86] Non-optimal code for simple loop with LRA ysrumyan at gmail dot com
@ 2012-11-15 16:47 ` hjl.tools at gmail dot com
  2012-11-17 18:01 ` [Bug rtl-optimization/55342] [4.8 Regression] " vmakarov at gcc dot gnu.org
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: hjl.tools at gmail dot com @ 2012-11-15 16:47 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hjl.tools at gmail dot com,
                   |                            |ubizjak at gmail dot com,
                   |                            |vmakarov at redhat dot com
   Target Milestone|---                         |4.8.0


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug rtl-optimization/55342] [4.8 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
  2012-11-15 15:25 [Bug rtl-optimization/55342] New: [LRA,x86] Non-optimal code for simple loop with LRA ysrumyan at gmail dot com
  2012-11-15 16:47 ` [Bug rtl-optimization/55342] " hjl.tools at gmail dot com
@ 2012-11-17 18:01 ` vmakarov at gcc dot gnu.org
  2012-11-19 12:06 ` ysrumyan at gmail dot com
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: vmakarov at gcc dot gnu.org @ 2012-11-17 18:01 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342

--- Comment #1 from Vladimir Makarov <vmakarov at gcc dot gnu.org> 2012-11-17 17:59:41 UTC ---
Author: vmakarov
Date: Sat Nov 17 17:59:35 2012
New Revision: 193588

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=193588
Log:
2012-11-17  Vladimir Makarov  <vmakarov@redhat.com>

    PR rtl-optimization/55342
    * lra-assigns.c (spill_for): Try to allocate other reload pseudos
    before and after spilling.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/lra-assigns.c


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug rtl-optimization/55342] [4.8 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
  2012-11-15 15:25 [Bug rtl-optimization/55342] New: [LRA,x86] Non-optimal code for simple loop with LRA ysrumyan at gmail dot com
  2012-11-15 16:47 ` [Bug rtl-optimization/55342] " hjl.tools at gmail dot com
  2012-11-17 18:01 ` [Bug rtl-optimization/55342] [4.8 Regression] " vmakarov at gcc dot gnu.org
@ 2012-11-19 12:06 ` ysrumyan at gmail dot com
  2012-12-07 10:13 ` rguenth at gcc dot gnu.org
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: ysrumyan at gmail dot com @ 2012-11-19 12:06 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342

--- Comment #2 from Yuri Rumyantsev <ysrumyan at gmail dot com> 2012-11-19 12:06:20 UTC ---
The patched compiler produces better binaries, but we still see a 6%
performance degradation on Core i7. The main cause is that the LRA compiler
spills the 'pure' byte 'g', whereas the old compiler spills 'm', which is
the negation of 'g':

gcc without LRA (assembly for the head of the loop)

.L7:
    movzbl    1(%edi), %edx
    leal    3(%edi), %ebp
    movzbl    (%edi), %ebx
    movl    %ebp, %edi
    notl    %edx   // perform negation on register
    movb    %dl, 3(%esp)

gcc with LRA

.L7:
    movzbl    (%edi), %ebx
    leal    3(%edi), %ebp
    movzbl    1(%edi), %ecx
    movl    %ebp, %edi
    movzbl    -1(%ebp), %edx
    notl    %ebx
    notl    %ecx
    movb    %dl, (%esp)
    cmpb    %cl, %bl
    notb    (%esp) // perform negation in memory

i.e. we have a redundant load and store from/to the stack.

I assume that this should be fixed also.
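
For context, the notl/notb instructions in both listings come from the
"255 - x" expressions in the test case: for an 8-bit unsigned value,
subtracting from 255 gives the same result as a bitwise NOT. A small check of
that identity follows; the helper names are illustrative only and do not
appear in the test case.

```c
/* Hypothetical helper: inversion written as in the test case. */
unsigned char invert_by_subtraction(unsigned char x) {
    return (unsigned char)(255 - x);
}

/* Hypothetical helper: inversion as a bitwise NOT, truncated to 8 bits. */
unsigned char invert_by_not(unsigned char x) {
    return (unsigned char)~x;
}

/* Returns 1 if both forms agree for every possible byte value. */
int inversions_agree(void) {
    int x;
    for (x = 0; x < 256; x++)
        if (invert_by_subtraction((unsigned char)x) !=
            invert_by_not((unsigned char)x))
            return 0;
    return 1;
}
```

This is why spilling 'g' instead of 'm' moves the NOT from a register to a
stack slot: the inversion still has to happen somewhere before 'm' is used.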


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug rtl-optimization/55342] [4.8 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
  2012-11-15 15:25 [Bug rtl-optimization/55342] New: [LRA,x86] Non-optimal code for simple loop with LRA ysrumyan at gmail dot com
                   ` (2 preceding siblings ...)
  2012-11-19 12:06 ` ysrumyan at gmail dot com
@ 2012-12-07 10:13 ` rguenth at gcc dot gnu.org
  2013-01-28 18:53 ` jakub at gcc dot gnu.org
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-12-07 10:13 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug rtl-optimization/55342] [4.8 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
  2012-11-15 15:25 [Bug rtl-optimization/55342] New: [LRA,x86] Non-optimal code for simple loop with LRA ysrumyan at gmail dot com
                   ` (3 preceding siblings ...)
  2012-12-07 10:13 ` rguenth at gcc dot gnu.org
@ 2013-01-28 18:53 ` jakub at gcc dot gnu.org
  2013-02-20 14:11 ` rguenth at gcc dot gnu.org
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-01-28 18:53 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P1                          |P2
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> 2013-01-28 18:53:25 UTC ---
I'm downgrading this to P2, it is unfortunate we generate slightly worse code
on this testcase, but it is more important how LRA vs. reload behaves on
average on various benchmarks etc.  This doesn't mean this isn't important to
look at, but I think it isn't a release blocker at this point.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug rtl-optimization/55342] [4.8 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
  2012-11-15 15:25 [Bug rtl-optimization/55342] New: [LRA,x86] Non-optimal code for simple loop with LRA ysrumyan at gmail dot com
                   ` (4 preceding siblings ...)
  2013-01-28 18:53 ` jakub at gcc dot gnu.org
@ 2013-02-20 14:11 ` rguenth at gcc dot gnu.org
  2013-03-22 14:48 ` [Bug rtl-optimization/55342] [4.8/4.9 " jakub at gcc dot gnu.org
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-02-20 14:11 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization, ra
             Target|x86                         |i?86-*-*
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-02-20
     Ever Confirmed|0                           |1

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-20 14:09:59 UTC ---
At least it seems to be confirmed.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug rtl-optimization/55342] [4.8/4.9 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
  2012-11-15 15:25 [Bug rtl-optimization/55342] New: [LRA,x86] Non-optimal code for simple loop with LRA ysrumyan at gmail dot com
                   ` (5 preceding siblings ...)
  2013-02-20 14:11 ` rguenth at gcc dot gnu.org
@ 2013-03-22 14:48 ` jakub at gcc dot gnu.org
  2013-05-31 10:59 ` jakub at gcc dot gnu.org
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-03-22 14:48 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.8.0                       |4.8.1

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> 2013-03-22 14:44:31 UTC ---
GCC 4.8.0 is being released, adjusting target milestone.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug rtl-optimization/55342] [4.8/4.9 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
  2012-11-15 15:25 [Bug rtl-optimization/55342] New: [LRA,x86] Non-optimal code for simple loop with LRA ysrumyan at gmail dot com
                   ` (6 preceding siblings ...)
  2013-03-22 14:48 ` [Bug rtl-optimization/55342] [4.8/4.9 " jakub at gcc dot gnu.org
@ 2013-05-31 10:59 ` jakub at gcc dot gnu.org
  2013-06-06 15:20 ` vmakarov at gcc dot gnu.org
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-05-31 10:59 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.8.1                       |4.8.2

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 4.8.1 has been released.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug rtl-optimization/55342] [4.8/4.9 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
  2012-11-15 15:25 [Bug rtl-optimization/55342] New: [LRA,x86] Non-optimal code for simple loop with LRA ysrumyan at gmail dot com
                   ` (7 preceding siblings ...)
  2013-05-31 10:59 ` jakub at gcc dot gnu.org
@ 2013-06-06 15:20 ` vmakarov at gcc dot gnu.org
  2013-09-05 14:44 ` ysrumyan at gmail dot com
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: vmakarov at gcc dot gnu.org @ 2013-06-06 15:20 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342

Vladimir Makarov <vmakarov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at gcc dot gnu.org

--- Comment #7 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
(In reply to Yuri Rumyantsev from comment #2)
> The patched compiler produces better binaries, but we still see a 6%
> performance degradation on Core i7. The main cause is that the LRA compiler
> spills the 'pure' byte 'g', whereas the old compiler spills 'm', which is
> the negation of 'g':
> 
> gcc without LRA (assembly for the head of the loop)
> 
> .L7:
> 	movzbl	1(%edi), %edx
> 	leal	3(%edi), %ebp
> 	movzbl	(%edi), %ebx
> 	movl	%ebp, %edi
> 	notl	%edx   // perform negation on register
> 	movb	%dl, 3(%esp)
> 
> gcc with LRA
> 
> .L7:
> 	movzbl	(%edi), %ebx
> 	leal	3(%edi), %ebp
> 	movzbl	1(%edi), %ecx
> 	movl	%ebp, %edi
> 	movzbl	-1(%ebp), %edx
> 	notl	%ebx
> 	notl	%ecx
> 	movb	%dl, (%esp)
> 	cmpb	%cl, %bl
> 	notb	(%esp) // perform negation in memory
> 
> i.e. we have a redundant load and store from/to the stack.
> 
> I assume that this should be fixed also.

Fixing the problem with notl requires implementing new functionality in LRA:
making reloads that stay only if the reload pseudo got a hard register and was
inherited (in which case the reload is profitable).  Otherwise the current code
should be generated (the reloads and reload pseudos should be removed and the
old code restored).  I've started work on this, but it will not be fixed
quickly, as implementing the new functionality is not a trivial task.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug rtl-optimization/55342] [4.8/4.9 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
  2012-11-15 15:25 [Bug rtl-optimization/55342] New: [LRA,x86] Non-optimal code for simple loop with LRA ysrumyan at gmail dot com
                   ` (8 preceding siblings ...)
  2013-06-06 15:20 ` vmakarov at gcc dot gnu.org
@ 2013-09-05 14:44 ` ysrumyan at gmail dot com
  2013-09-05 14:51 ` ysrumyan at gmail dot com
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: ysrumyan at gmail dot com @ 2013-09-05 14:44 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342

--- Comment #8 from Yuri Rumyantsev <ysrumyan at gmail dot com> ---
Created attachment 30751
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30751&action=edit
modified test-case

Modified test-case to reproduce sub-optimal register allocation.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug rtl-optimization/55342] [4.8/4.9 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
  2012-11-15 15:25 [Bug rtl-optimization/55342] New: [LRA,x86] Non-optimal code for simple loop with LRA ysrumyan at gmail dot com
                   ` (9 preceding siblings ...)
  2013-09-05 14:44 ` ysrumyan at gmail dot com
@ 2013-09-05 14:51 ` ysrumyan at gmail dot com
  2013-09-13 13:00 ` ysrumyan at gmail dot com
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: ysrumyan at gmail dot com @ 2013-09-05 14:51 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342

--- Comment #9 from Yuri Rumyantsev <ysrumyan at gmail dot com> ---
The issue still exists in the 4.9 compiler, and we got a further 30%
degradation after the r202165 fix. It can be reproduced with any 4.9 compiler
using the attached modified test case; the code produced for the inner loop
looks like:

.L8:
    movl    %esi, %ecx
    movl    %esi, %edi
    movzbl    3(%esp), %edx
    cmpb    %cl, %dl
    movl    %edx, %ecx
    cmovbe    %ecx, %edi
.L4:
    movl    %esi, %edx
    movl    %edi, %ecx
    subl    %ecx, %edx
    movl    28(%esp), %ecx
    movl    28(%esp), %esi
    addl    $4, 28(%esp)
    movb    %dl, (%ecx)
    movl    %edi, %ecx
    subl    %ecx, %ebx
    movl    %edi, %edx
    movzbl    3(%esp), %ecx
    movb    %bl, 1(%esi)
    subl    %edx, %ecx
    movl    %edi, %ebx
    movb    %cl, 2(%esi)
    movl    28(%esp), %esi
    cmpl    %ebp, %eax
    movb    %bl, -1(%esi)
    je    .L1
.L5:
    movzbl    (%eax), %esi
    leal    3(%eax), %eax
    movzbl    -2(%eax), %ebx
    notl    %esi
    notl    %ebx
    movl    %esi, %edx
    movzbl    -1(%eax), %ecx
    cmpb    %bl, %dl
    movb    %cl, 3(%esp)
    notb    3(%esp)
    jb    .L8
    movzbl    3(%esp), %edx
    movl    %ebx, %edi
    cmpb    %bl, %dl
    cmovbe    %edx, %edi
    jmp    .L4

You can see that (1) there are 2 additional moves at the top of the blocks
labeled .L4 and .L8; (2) there are redundant spills/fills of the 'write' base
(28(%esp)) in the block labeled .L4.
To reproduce, it is sufficient to compile the modified test case with the
'-m32 -march=atom' options.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug rtl-optimization/55342] [4.8/4.9 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
  2012-11-15 15:25 [Bug rtl-optimization/55342] New: [LRA,x86] Non-optimal code for simple loop with LRA ysrumyan at gmail dot com
                   ` (10 preceding siblings ...)
  2013-09-05 14:51 ` ysrumyan at gmail dot com
@ 2013-09-13 13:00 ` ysrumyan at gmail dot com
  2013-09-13 13:03 ` ysrumyan at gmail dot com
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: ysrumyan at gmail dot com @ 2013-09-13 13:00 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342

--- Comment #10 from Yuri Rumyantsev <ysrumyan at gmail dot com> ---
After the rev. 202468 fix the assembly looks slightly better, but we hit
another RA inefficiency, which can be illustrated with the attached test (t1.c)
compiled with the options "-march=atom -mtune=atom -m32 -O2": the upper bound
of the loop check is in a register, but the base register for "write" is on
the stack:

.L8:
    movzbl    3(%esp), %edx
    movl    %esi, %ecx
    cmpb    %cl, %dl
    movl    %esi, %edi
    cmovbe    %edx, %edi
.L4:
    movl    %esi, %edx
    movl    28(%esp), %esi  <-- why is write on the stack?
    movl    %edi, %ecx
    addl    $4, 28(%esp)  <-- write is incremented on the stack
    subl    %ecx, %edx
    subl    %ecx, %ebx
    movzbl    3(%esp), %ecx
    movb    %dl, (%esi)
    movl    %edi, %edx
    subl    %edx, %ecx
    movb    %bl, 1(%esi)
    movb    %cl, 2(%esi)
    movl    28(%esp), %esi
    cmpl    %ebp, %eax  <-- why upper bound is in register?
    movb    %dl, -1(%esi)
    je    .L1
.L5:
    movzbl    (%eax), %esi
    leal    3(%eax), %eax
    movzbl    -2(%eax), %ebx
    notl    %esi
    notl    %ebx
    movl    %esi, %edx
    movzbl    -1(%eax), %ecx
    cmpb    %bl, %dl
    notl    %ecx
    movb    %cl, 3(%esp)
    jb    .L8
    movzbl    3(%esp), %edx
    movl    %ebx, %edi
    cmpb    %bl, %dl
    cmovbe    %edx, %edi
    jmp    .L4

Is something wrong in the Atom cost model? In any case, I assume that keeping
the upper bound on the stack is much cheaper than loading and incrementing the
base from the stack.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug rtl-optimization/55342] [4.8/4.9 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
  2012-11-15 15:25 [Bug rtl-optimization/55342] New: [LRA,x86] Non-optimal code for simple loop with LRA ysrumyan at gmail dot com
                   ` (11 preceding siblings ...)
  2013-09-13 13:00 ` ysrumyan at gmail dot com
@ 2013-09-13 13:03 ` ysrumyan at gmail dot com
  2013-10-16  9:50 ` jakub at gcc dot gnu.org
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: ysrumyan at gmail dot com @ 2013-09-13 13:03 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342

--- Comment #11 from Yuri Rumyantsev <ysrumyan at gmail dot com> ---
Created attachment 30816
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30816&action=edit
test-case to reproduce

t1.c must be compiled on x86 with options:

-O2 -march=atom -mtune=atom -mfpmath=sse -m32


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug rtl-optimization/55342] [4.8/4.9 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
  2012-11-15 15:25 [Bug rtl-optimization/55342] New: [LRA,x86] Non-optimal code for simple loop with LRA ysrumyan at gmail dot com
                   ` (12 preceding siblings ...)
  2013-09-13 13:03 ` ysrumyan at gmail dot com
@ 2013-10-16  9:50 ` jakub at gcc dot gnu.org
  2014-05-22  9:07 ` [Bug rtl-optimization/55342] [4.8/4.9/4.10 " rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-10-16  9:50 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.8.2                       |4.8.3

--- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 4.8.2 has been released.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug rtl-optimization/55342] [4.8/4.9/4.10 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
  2012-11-15 15:25 [Bug rtl-optimization/55342] New: [LRA,x86] Non-optimal code for simple loop with LRA ysrumyan at gmail dot com
                   ` (13 preceding siblings ...)
  2013-10-16  9:50 ` jakub at gcc dot gnu.org
@ 2014-05-22  9:07 ` rguenth at gcc dot gnu.org
  2014-12-19 13:36 ` [Bug rtl-optimization/55342] [4.8/4.9/5 " jakub at gcc dot gnu.org
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-05-22  9:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.8.3                       |4.8.4

--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 4.8.3 is being released, adjusting target milestone.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug rtl-optimization/55342] [4.8/4.9/5 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
  2012-11-15 15:25 [Bug rtl-optimization/55342] New: [LRA,x86] Non-optimal code for simple loop with LRA ysrumyan at gmail dot com
                   ` (14 preceding siblings ...)
  2014-05-22  9:07 ` [Bug rtl-optimization/55342] [4.8/4.9/4.10 " rguenth at gcc dot gnu.org
@ 2014-12-19 13:36 ` jakub at gcc dot gnu.org
  2015-02-12  7:19 ` [Bug rtl-optimization/55342] [4.8/4.9 " law at redhat dot com
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: jakub at gcc dot gnu.org @ 2014-12-19 13:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.8.4                       |4.8.5

--- Comment #14 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 4.8.4 has been released.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug rtl-optimization/55342] [4.8/4.9 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
  2012-11-15 15:25 [Bug rtl-optimization/55342] New: [LRA,x86] Non-optimal code for simple loop with LRA ysrumyan at gmail dot com
                   ` (15 preceding siblings ...)
  2014-12-19 13:36 ` [Bug rtl-optimization/55342] [4.8/4.9/5 " jakub at gcc dot gnu.org
@ 2015-02-12  7:19 ` law at redhat dot com
  2015-02-12 13:47 ` jakub at gcc dot gnu.org
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: law at redhat dot com @ 2015-02-12  7:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342

Jeffrey A. Law <law at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[4.8/4.9/5 Regression]      |[4.8/4.9 Regression]
                   |[LRA,x86] Non-optimal code  |[LRA,x86] Non-optimal code
                   |for simple loop with LRA    |for simple loop with LRA

--- Comment #15 from Jeffrey A. Law <law at redhat dot com> ---
I've examined the various testcases and the complaints about the poor register
allocation in this BZ with a trunk compiler.

I'm happy to report that I'm seeing none of the issues raised in this BZ.  

For c#0 (store-back part of the loop):
.L5:
        movl    %edi, %ecx
        addl    $4, %esi
        subl    %ecx, %eax
        subl    %ecx, %edx
        movzbl  3(%esp), %ecx
        movb    %al, -3(%esi)
        movl    %edi, %eax
        movb    %dl, -4(%esi)
        subl    %eax, %ecx
        movb    %cl, -2(%esi)
        cmpl    %ebp, %ebx
        movb    %al, -1(%esi)
        je      .L1

In c#2, the negation sequence is pointed out.  We now get:

.L9:
        movzbl  (%ebx), %edx
        movzbl  1(%ebx), %eax
        addl    $3, %ebx
        movzbl  -1(%ebx), %ecx
        notl    %edx
        notl    %eax
        notl    %ecx
        cmpb    %al, %dl
        movb    %cl, 3(%esp)
        jb      .L13
        cmpb    3(%esp), %al
        movzbl  %al, %edi
        jbe     .L5
        movzbl  3(%esp), %edi
        jmp     .L5

For the 1st modified testcase -O2 -mcpu=atom -m32:

.L11:
        movzbl  %al, %edi
        cmpb    %al, %cl
        cmovbe  %ecx, %edi
.L4:
        movl    %edi, %eax
        leal    4(%esi), %esi
        subl    %eax, %edx
        subl    %eax, %ecx
        movb    %dl, -3(%esi)
        movb    %cl, -4(%esi)
        movzbl  3(%esp), %edx
        subl    %eax, %edx
        movl    %edi, %eax
        movb    %dl, -2(%esi)
        cmpl    %ebx, %ebp
        movb    %al, -1(%esi)
        je      .L1
.L7:
        movzbl  (%ebx), %ecx
        leal    3(%ebx), %ebx
        movzbl  -2(%ebx), %edx
        notl    %ecx
        movzbl  -1(%ebx), %eax
        notl    %edx
        notl    %eax
        cmpb    %dl, %cl
        movb    %al, 3(%esp)
        jb      .L11
        movzbl  3(%esp), %eax
        movzbl  %al, %edi
        cmpb    %al, %dl
        cmovbe  %edx, %edi
        jmp     .L4

Then in c#10 (t1 testcase):

.L11:
        movzbl  %al, %edi
        cmpb    %al, %cl
        cmovbe  %ecx, %edi
.L4:
        movl    %edi, %eax
        leal    4(%esi), %esi
        subl    %eax, %edx
        subl    %eax, %ecx
        movb    %dl, -3(%esi)
        movb    %cl, -4(%esi)
        movzbl  3(%esp), %edx
        subl    %eax, %edx
        movl    %edi, %eax
        movb    %dl, -2(%esi)
        cmpl    %ebp, %ebx
        movb    %al, -1(%esi)
        je      .L1
.L7:
        movzbl  (%ebx), %ecx
        leal    3(%ebx), %ebx
        movzbl  -2(%ebx), %edx
        notl    %ecx
        movzbl  -1(%ebx), %eax
        notl    %edx
        notl    %eax
        cmpb    %dl, %cl
        movb    %al, 3(%esp)
        jb      .L11
        movzbl  3(%esp), %eax
        movzbl  %al, %edi
        cmpb    %al, %dl
        cmovbe  %edx, %edi
        jmp     .L4


Across the board we're not seeing objects spilled into the stack.  The code
looks quite tight to me.

Clearing the regression marker for GCC 5.  I didn't do any bisection work to
identify what changes fixed things.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug rtl-optimization/55342] [4.8/4.9 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
  2012-11-15 15:25 [Bug rtl-optimization/55342] New: [LRA,x86] Non-optimal code for simple loop with LRA ysrumyan at gmail dot com
                   ` (16 preceding siblings ...)
  2015-02-12  7:19 ` [Bug rtl-optimization/55342] [4.8/4.9 " law at redhat dot com
@ 2015-02-12 13:47 ` jakub at gcc dot gnu.org
  2015-06-23  8:21 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-02-12 13:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342

--- Comment #16 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
The #c10 issue went away with r204212 I believe.



* [Bug rtl-optimization/55342] [4.8/4.9 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
  2012-11-15 15:25 [Bug rtl-optimization/55342] New: [LRA,x86] Non-optimal code for simple loop with LRA ysrumyan at gmail dot com
                   ` (17 preceding siblings ...)
  2015-02-12 13:47 ` jakub at gcc dot gnu.org
@ 2015-06-23  8:21 ` rguenth at gcc dot gnu.org
  2015-06-26 20:09 ` [Bug rtl-optimization/55342] [4.9 " jakub at gcc dot gnu.org
  2015-06-26 20:36 ` jakub at gcc dot gnu.org
  20 siblings, 0 replies; 22+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-06-23  8:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.8.5                       |4.9.3

--- Comment #17 from Richard Biener <rguenth at gcc dot gnu.org> ---
The gcc-4_8-branch is being closed, re-targeting regressions to 4.9.3.



* [Bug rtl-optimization/55342] [4.9 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
  2012-11-15 15:25 [Bug rtl-optimization/55342] New: [LRA,x86] Non-optimal code for simple loop with LRA ysrumyan at gmail dot com
                   ` (18 preceding siblings ...)
  2015-06-23  8:21 ` rguenth at gcc dot gnu.org
@ 2015-06-26 20:09 ` jakub at gcc dot gnu.org
  2015-06-26 20:36 ` jakub at gcc dot gnu.org
  20 siblings, 0 replies; 22+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-06-26 20:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342

--- Comment #18 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 4.9.3 has been released.



* [Bug rtl-optimization/55342] [4.9 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
  2012-11-15 15:25 [Bug rtl-optimization/55342] New: [LRA,x86] Non-optimal code for simple loop with LRA ysrumyan at gmail dot com
                   ` (19 preceding siblings ...)
  2015-06-26 20:09 ` [Bug rtl-optimization/55342] [4.9 " jakub at gcc dot gnu.org
@ 2015-06-26 20:36 ` jakub at gcc dot gnu.org
  20 siblings, 0 replies; 22+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-06-26 20:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.9.3                       |4.9.4



Thread overview: 22+ messages
2012-11-15 15:25 [Bug rtl-optimization/55342] New: [LRA,x86] Non-optimal code for simple loop with LRA ysrumyan at gmail dot com
2012-11-15 16:47 ` [Bug rtl-optimization/55342] " hjl.tools at gmail dot com
2012-11-17 18:01 ` [Bug rtl-optimization/55342] [4.8 Regression] " vmakarov at gcc dot gnu.org
2012-11-19 12:06 ` ysrumyan at gmail dot com
2012-12-07 10:13 ` rguenth at gcc dot gnu.org
2013-01-28 18:53 ` jakub at gcc dot gnu.org
2013-02-20 14:11 ` rguenth at gcc dot gnu.org
2013-03-22 14:48 ` [Bug rtl-optimization/55342] [4.8/4.9 " jakub at gcc dot gnu.org
2013-05-31 10:59 ` jakub at gcc dot gnu.org
2013-06-06 15:20 ` vmakarov at gcc dot gnu.org
2013-09-05 14:44 ` ysrumyan at gmail dot com
2013-09-05 14:51 ` ysrumyan at gmail dot com
2013-09-13 13:00 ` ysrumyan at gmail dot com
2013-09-13 13:03 ` ysrumyan at gmail dot com
2013-10-16  9:50 ` jakub at gcc dot gnu.org
2014-05-22  9:07 ` [Bug rtl-optimization/55342] [4.8/4.9/4.10 " rguenth at gcc dot gnu.org
2014-12-19 13:36 ` [Bug rtl-optimization/55342] [4.8/4.9/5 " jakub at gcc dot gnu.org
2015-02-12  7:19 ` [Bug rtl-optimization/55342] [4.8/4.9 " law at redhat dot com
2015-02-12 13:47 ` jakub at gcc dot gnu.org
2015-06-23  8:21 ` rguenth at gcc dot gnu.org
2015-06-26 20:09 ` [Bug rtl-optimization/55342] [4.9 " jakub at gcc dot gnu.org
2015-06-26 20:36 ` jakub at gcc dot gnu.org
