[Bug regression/39836] New: [4.4 regression] unoptimal code generated

public inbox for gcc-bugs@sourceware.org

* [Bug regression/39836]  New: [4.4 regression] unoptimal code generated
@ 2009-04-21 15:40 alexvod at google dot com
  2009-04-21 15:43 ` [Bug rtl-optimization/39836] [4.4/4.5 " pinskia at gcc dot gnu dot org
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: alexvod at google dot com @ 2009-04-21 15:40 UTC (permalink / raw)
  To: gcc-bugs

Very simple code:

int* func();
int func2(long long);
void test (int unused, int idx, char tag, long long value)
{
  int *p = func() + idx;
  switch (tag) {
    case 1:
      *p = (int) value;
    case 2:
      *p = func2(value);
    }
}

is compiled to 46 bytes by GCC 4.3.1 and to 48 bytes by GCC 4.4.0. Bisection
shows that it was changed by
http://gcc.gnu.org/viewcvs?view=rev&revision=139949:

Code generated by 139948:
test:
        push    {r3, r4, r5, r6, r7, lr}
        mov     r4, r1
        mov     r5, r2
        ldr     r6, [sp, #24]
        ldr     r7, [sp, #28]
        bl      func
        lsl     r4, r4, #2
        add     r4, r0, r4
        cmp     r5, #1
        beq     .L3
        cmp     r5, #2
        bne     .L5
        b       .L4
.L3:
        str     r6, [r4]
.L4:
        mov     r0, r6
        mov     r1, r7
        bl      func2
        str     r0, [r4]
.L5:
        @ sp needed for prologue
        pop     {r3, r4, r5, r6, r7, pc}

Code generated by 139949:
test:
        push    {r4, r5, r6, r7, lr}
        sub     sp, sp, #12
        mov     r5, r1
        ldr     r1, [sp, #36]
        mov     r6, r2
        ldr     r7, [sp, #32]
        str     r1, [sp, #4]
        bl      func
        lsl     r4, r5, #2
        add     r4, r0, r4
        ldr     r1, [sp, #4]
        cmp     r6, #1
        beq     .L3
        cmp     r6, #2
        bne     .L5
        b       .L4
.L3:
        str     r7, [r4]
.L4:
        mov     r0, r7
        bl      func2
        str     r0, [r4]
.L5:
        add     sp, sp, #12
        @ sp needed for prologue
        pop     {r4, r5, r6, r7, pc}

A temporary variable is now spilled to the stack at [sp, #4].

BTW, this function is compiled by GCC 4.2.1 to 42 bytes (which is even better!).


-- 
           Summary: [4.4 regression] unoptimal code generated
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: regression
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: alexvod at google dot com
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: arm-eabi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39836



* [Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated
  2009-04-21 15:40 [Bug regression/39836] New: [4.4 regression] unoptimal code generated alexvod at google dot com
@ 2009-04-21 15:43 ` pinskia at gcc dot gnu dot org
  2009-04-21 16:08 ` alexvod at google dot com
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2009-04-21 15:43 UTC (permalink / raw)
  To: gcc-bugs



-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|regression                  |rtl-optimization
           Keywords|                            |missed-optimization, ra
            Summary|[4.4 regression] unoptimal  |[4.4/4.5 regression]
                   |code generated              |unoptimal code generated
   Target Milestone|---                         |4.4.1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39836



* [Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated
  2009-04-21 15:40 [Bug regression/39836] New: [4.4 regression] unoptimal code generated alexvod at google dot com
  2009-04-21 15:43 ` [Bug rtl-optimization/39836] [4.4/4.5 " pinskia at gcc dot gnu dot org
@ 2009-04-21 16:08 ` alexvod at google dot com
  2009-04-21 17:07 ` pinskia at gcc dot gnu dot org
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: alexvod at google dot com @ 2009-04-21 16:08 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from alexvod at google dot com  2009-04-21 16:08 -------
Compilation options: -march=armv5te -fpic -mthumb-interwork -Os -mthumb


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39836



* [Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated
  2009-04-21 15:40 [Bug regression/39836] New: [4.4 regression] unoptimal code generated alexvod at google dot com
  2009-04-21 15:43 ` [Bug rtl-optimization/39836] [4.4/4.5 " pinskia at gcc dot gnu dot org
  2009-04-21 16:08 ` alexvod at google dot com
@ 2009-04-21 17:07 ` pinskia at gcc dot gnu dot org
  2009-04-23 16:49 ` alexvod at google dot com
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2009-04-21 17:07 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from pinskia at gcc dot gnu dot org  2009-04-21 17:07 -------
So it is using stack space instead of r3 for something that is alive across the
function call.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39836



* [Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated
  2009-04-21 15:40 [Bug regression/39836] New: [4.4 regression] unoptimal code generated alexvod at google dot com
                   ` (2 preceding siblings ...)
  2009-04-21 17:07 ` pinskia at gcc dot gnu dot org
@ 2009-04-23 16:49 ` alexvod at google dot com
  2009-04-26 17:38 ` vmakarov at redhat dot com
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: alexvod at google dot com @ 2009-04-23 16:49 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from alexvod at google dot com  2009-04-23 16:49 -------
Another example of sub-optimal register allocation on ARM/Thumb with IRA (not
sure if this is the same bug or a different one).

int func(char*);
void func2(const char*, int);

void test(char **pSignature)
{
  int clazz = 0;
  char *signature = *pSignature;
  if (*signature == '[') {
    char savedChar;
    savedChar = *++signature;
    clazz = func(*pSignature);
    *signature = savedChar;
  }
  if (clazz == 0) {
    func2("abc", 0);
  }
  *pSignature = signature;
}

It was changed by http://gcc.gnu.org/viewcvs?view=rev&revision=139590:

GCC rev139589:
test:
        push    {lr}
        sub     sp, sp, #12
        mov     r3, #0
        str     r3, [sp, #4]
.L2:
        add     r0, sp, #4
        bl      func
        ldr     r3, [sp, #4]
        cmp     r3, #12
        ble     .L2
        add     sp, sp, #12
        @ sp needed for prologue
        pop     {pc}

GCC rev139590:
test:
        push    {r4, lr}
        sub     sp, sp, #8
        mov     r3, #0
        add     r4, sp, #4   // why put sp+4 in r4 if we can use sp+4 directly?
        str     r3, [sp, #4]
.L2:
        mov     r0, r4
        bl      func
        ldr     r3, [sp, #4]
        cmp     r3, #12
        ble     .L2
        add     sp, sp, #8
        @ sp needed for prologue
        pop     {r4, pc}


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39836



* [Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated
  2009-04-21 15:40 [Bug regression/39836] New: [4.4 regression] unoptimal code generated alexvod at google dot com
                   ` (3 preceding siblings ...)
  2009-04-23 16:49 ` alexvod at google dot com
@ 2009-04-26 17:38 ` vmakarov at redhat dot com
  2009-04-27  9:06 ` alexvod at google dot com
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: vmakarov at redhat dot com @ 2009-04-26 17:38 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from vmakarov at redhat dot com  2009-04-26 17:38 -------
The first test case simply illustrates that register allocation (RA)
is a heuristic.  Even a heuristic algorithm that does worse on average
can sometimes produce a better result than one that does better on
average.

Here is the IRA log of pseudo assignments.  Where the old RA made a
different assignment, it is shown in parentheses:

      Popping a0(r133,l0)  -- assign reg 4 
      Popping a4(r138,l0)  -- assign reg 5
      Popping a3(r145,l0)  -- assign reg 6
      Popping a7(r137,l0)  -- assign reg 7  (4) <-- key point
      Popping a2(r146,l0)  -- spill         (7)
      Popping a9(r139,l0)  -- assign reg 2
      Popping a10(r147,l0)  -- assign reg 6 (1)
      Popping a1(r134,l0)  -- assign reg 0
      Popping a5(r143,l0)  -- assign reg 7  (4)
      Popping a6(r135,l0)  -- assign reg 0
      Popping a8(r140,l0)  -- assign reg 5  (2)

If IRA assigned hard reg 4 instead of 7 to pseudo 137, all pseudos
would get registers, including r146.  The old RA assigns 4 to r137
because it prefers the first hard reg in register allocation order if
all other conditions are equal.  IRA assigns 7 because it has a smaller
cost than 4.  The cost is smaller because IRA hopes that r135 could
get 4 later (which does not happen: r135 ends up in reg 0).  That hope
comes from a copy connecting the two allocnos r133 and r135:

  cp4:a0(r133)<->a6(r135)@125:shuffle

This copy (called a shuffle) originates from a situation like the
following:

r2 = op (..., r1) and r1 is becoming dead.

Assigning the hard register of r1 to r2 in this situation usually
results in better RA (please see the literature on RA).
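
As a rough illustration of the two tie-breaking behaviours described
above, here is a toy model in C.  It is only a sketch and not IRA's
actual code: the names pick_reg and NUM_REGS are hypothetical, and the
cost values are made up.  With equal costs the lowest-numbered free
register wins, as in the old RA.  Once reg 4 carries a small penalty
(standing in for the hope that r135 will later take the register of
r133 through the shuffle copy), the same function picks reg 7.

```c
#include <limits.h>

#define NUM_REGS 8  /* hypothetical: hard regs 0..7 of the toy model */

/* Return the cheapest available hard register, or -1 if none is free.
   The strict '<' keeps the first register in allocation order when
   costs are equal, which mimics the old RA's tie-break.  */
static int
pick_reg (const int cost[NUM_REGS], const int available[NUM_REGS])
{
  int best = -1;
  int best_cost = INT_MAX;

  for (int r = 0; r < NUM_REGS; r++)
    if (available[r] && cost[r] < best_cost)
      {
        best = r;
        best_cost = cost[r];
      }
  return best;
}
```

With only regs 4 and 7 free and all costs zero, pick_reg returns 4.
Adding 1 to the cost of reg 4 makes it return 7, which matches the
assignment of r137 in the log above.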

The following patch, which skips the conflict-cost update for shuffle
copies, results in the same RA as the old allocator for this test:

Index: gcc/ira-color.c
===================================================================
--- gcc/ira-color.c     (revision 146788)
+++ gcc/ira-color.c     (working copy)
@@ -298,16 +298,18 @@ update_copy_costs (ira_allocno_t allocno
            (&ALLOCNO_UPDATED_HARD_REG_COSTS (another_allocno), cover_class,
             ALLOCNO_UPDATED_COVER_CLASS_COST (another_allocno),
             ALLOCNO_HARD_REG_COSTS (another_allocno));
-         ira_allocate_and_set_or_copy_costs
-           (&ALLOCNO_UPDATED_CONFLICT_HARD_REG_COSTS (another_allocno),
-            cover_class, 0,
-            ALLOCNO_CONFLICT_HARD_REG_COSTS (another_allocno));
          i = ira_class_hard_reg_index[cover_class][hard_regno];
          ira_assert (i >= 0);
          ALLOCNO_UPDATED_HARD_REG_COSTS (another_allocno)[i] += update_cost;
-         ALLOCNO_UPDATED_CONFLICT_HARD_REG_COSTS (another_allocno)[i]
-           += update_cost;
-
+         if (cp->insn != NULL_RTX || cp->constraint_p)
+           {
+             ira_allocate_and_set_or_copy_costs
+               (&ALLOCNO_UPDATED_CONFLICT_HARD_REG_COSTS (another_allocno),
+                cover_class, 0,
+                ALLOCNO_CONFLICT_HARD_REG_COSTS (another_allocno));
+             ALLOCNO_UPDATED_CONFLICT_HARD_REG_COSTS (another_allocno)[i]
+               += update_cost;
+           }
          queue_update_cost (another_allocno, divisor * COST_HOP_DIVISOR);
        }
     }

But this patch does not improve RA on average (today I checked code
size and performance of SPEC2000 on Core i7 in 32-bit mode), so I
see no sense in committing it.

As for the second test I did not find a difference to worry about (I
checked the mainline and gcc-4.3.1 using -march=armv5te
-mthumb-interwork -Os -mthumb):

--- a0.s        2009-04-24 17:40:54.000000000 -0400
+++ a1.s        2009-04-24 17:40:36.000000000 -0400
@@ -1,8 +1,5 @@
        .code   16
        .file   "a.i"
-       .section        .rodata
-.LC0:
-       .ascii  "abc\000"
        .text
        .align  2
        .global test
@@ -11,31 +8,35 @@
        .type   test, %function
 test:
        push    {r4, r5, r6, r7, lr}
-       ldr     r5, [r0]
-       mov     r7, r0
-       ldrb    r3, [r5]
+       ldr     r4, [r0]
+       mov     r5, r0
+       ldrb    r3, [r4]
        cmp     r3, #91
        bne     .L2
-       mov     r0, r5
-       ldrb    r4, [r5, #1]
+       mov     r0, r4
+       ldrb    r7, [r4, #1]
        bl      func
-       add     r6, r5, #1
-       strb    r4, [r5, #1]
+       add     r6, r4, #1
+       strb    r7, [r4, #1]
        cmp     r0, #0
        bne     .L3
-       mov     r5, r6
+       mov     r4, r6
 .L2:
        ldr     r0, .L5
        mov     r1, #0
        bl      func2
-       mov     r6, r5
+       mov     r6, r4
 .L3:
-       str     r6, [r7]
+       str     r6, [r5]
        @ sp needed for prologue
        pop     {r4, r5, r6, r7, pc}
 .L6:
        .align  2
 .L5:
-       .word   .LC0
+       .word   .LANCHOR0
        .size   test, .-test
-       .ident  "GCC: (GNU) 4.3.4 20090424 (prerelease)"
+       .section        .rodata
+       .set    .LANCHOR0,. + 0
+.LC0:
+       .ascii  "abc\000"
+       .ident  "GCC: (GNU) 4.5.0 20090423 (experimental)"

--------------------------------------------------------------------

Once again, RA is a heuristic algorithm (optimal solutions for some RA
models, e.g. those based on ILP or on algorithms for quadratic
assignment problems, are not useful in practice).  It is possible to
find a lot of tests where the old RA works better than IRA.  Analyzing
such tests takes a lot of time (although I have analyzed many of them
over the last 2 years working on IRA).  Alexander, I'd really
appreciate it if you did more analysis and proposed solutions to the
problems you find instead of just posting the tests.  Posting tests is
useful too, but doing only that will make such PRs a lower and lower
priority for me.  I have checked several times that IRA generates
smaller code on bigger tests for ARM, so there is already progress
over the old RA.  But of course, there is always room for improvement.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39836



* [Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated
  2009-04-21 15:40 [Bug regression/39836] New: [4.4 regression] unoptimal code generated alexvod at google dot com
                   ` (4 preceding siblings ...)
  2009-04-26 17:38 ` vmakarov at redhat dot com
@ 2009-04-27  9:06 ` alexvod at google dot com
  2009-04-27  9:22 ` jakub at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: alexvod at google dot com @ 2009-04-27  9:06 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from alexvod at google dot com  2009-04-27 09:06 -------
Vladimir, many thanks for your analysis! Next time I will try to do the
analysis myself and compare on larger real-world examples. Lowering severity
for now.


-- 

alexvod at google dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |minor


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39836



* [Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated
  2009-04-21 15:40 [Bug regression/39836] New: [4.4 regression] unoptimal code generated alexvod at google dot com
                   ` (5 preceding siblings ...)
  2009-04-27  9:06 ` alexvod at google dot com
@ 2009-04-27  9:22 ` jakub at gcc dot gnu dot org
  2009-05-13 16:47 ` ramana at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: jakub at gcc dot gnu dot org @ 2009-04-27  9:22 UTC (permalink / raw)
  To: gcc-bugs



-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P4


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39836



* [Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated
  2009-04-21 15:40 [Bug regression/39836] New: [4.4 regression] unoptimal code generated alexvod at google dot com
                   ` (6 preceding siblings ...)
  2009-04-27  9:22 ` jakub at gcc dot gnu dot org
@ 2009-05-13 16:47 ` ramana at gcc dot gnu dot org
  2009-06-22 16:25 ` steven at gcc dot gnu dot org
  2009-07-13 16:05 ` ramana at gcc dot gnu dot org
  9 siblings, 0 replies; 11+ messages in thread
From: ramana at gcc dot gnu dot org @ 2009-05-13 16:47 UTC (permalink / raw)
  To: gcc-bugs



-- 

ramana at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2009-05-13 16:47:27
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39836



* [Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated
  2009-04-21 15:40 [Bug regression/39836] New: [4.4 regression] unoptimal code generated alexvod at google dot com
                   ` (7 preceding siblings ...)
  2009-05-13 16:47 ` ramana at gcc dot gnu dot org
@ 2009-06-22 16:25 ` steven at gcc dot gnu dot org
  2009-07-13 16:05 ` ramana at gcc dot gnu dot org
  9 siblings, 0 replies; 11+ messages in thread
From: steven at gcc dot gnu dot org @ 2009-06-22 16:25 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from steven at gcc dot gnu dot org  2009-06-22 16:25 -------
Since this is inherently a heuristics issue, and the IRA heuristics result in
overall better code size according to Vlad, I would like to propose we close
this PR as WONTFIX. Would anyone object to that?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39836



* [Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated
  2009-04-21 15:40 [Bug regression/39836] New: [4.4 regression] unoptimal code generated alexvod at google dot com
                   ` (8 preceding siblings ...)
  2009-06-22 16:25 ` steven at gcc dot gnu dot org
@ 2009-07-13 16:05 ` ramana at gcc dot gnu dot org
  9 siblings, 0 replies; 11+ messages in thread
From: ramana at gcc dot gnu dot org @ 2009-07-13 16:05 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from ramana at gcc dot gnu dot org  2009-07-13 16:05 -------
Since there are no objections to comment #6, I am closing this as WONTFIX.


-- 

ramana at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39836



