public inbox for gcc-bugs@sourceware.org
* [Bug regression/39836] New: [4.4 regression] unoptimal code generated
From: alexvod at google dot com @ 2009-04-21 15:40 UTC
To: gcc-bugs

Very simple code:

    int* func();
    int func2(long long);
    void test (int unused, int idx, char tag, long long value)
    {
      int *p = func() + idx;
      switch (tag) {
        case 1:
          *p = (int) value;
        case 2:
          *p = func2(value);
      }
    }

is compiled to 46 bytes by GCC 4.3.1 and to 48 bytes by GCC 4.4.0.

Bisection shows that it was changed by
http://gcc.gnu.org/viewcvs?view=rev&revision=139949:

Code generated by 139948:

test:
        push    {r3, r4, r5, r6, r7, lr}
        mov     r4, r1
        mov     r5, r2
        ldr     r6, [sp, #24]
        ldr     r7, [sp, #28]
        bl      func
        lsl     r4, r4, #2
        add     r4, r0, r4
        cmp     r5, #1
        beq     .L3
        cmp     r5, #2
        bne     .L5
        b       .L4
.L3:
        str     r6, [r4]
.L4:
        mov     r0, r6
        mov     r1, r7
        bl      func2
        str     r0, [r4]
.L5:
        @ sp needed for prologue
        pop     {r3, r4, r5, r6, r7, pc}

Code generated by 139949:

test:
        push    {r4, r5, r6, r7, lr}
        sub     sp, sp, #12
        mov     r5, r1
        ldr     r1, [sp, #36]
        mov     r6, r2
        ldr     r7, [sp, #32]
        str     r1, [sp, #4]
        bl      func
        lsl     r4, r5, #2
        add     r4, r0, r4
        ldr     r1, [sp, #4]
        cmp     r6, #1
        beq     .L3
        cmp     r6, #2
        bne     .L5
        b       .L4
.L3:
        str     r7, [r4]
.L4:
        mov     r0, r7
        bl      func2
        str     r0, [r4]
.L5:
        add     sp, sp, #12
        @ sp needed for prologue
        pop     {r4, r5, r6, r7, pc}

A temporary variable was spilled to the stack at [sp, #4].

BTW, GCC 4.2.1 compiles this function to 42 bytes (which is even better!).
--
           Summary: [4.4 regression] unoptimal code generated
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: regression
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: alexvod at google dot com
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: arm-eabi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39836
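The switch in the reported test case has no break after case 1, which is why both assembly listings let .L3 run straight into .L4. The following is a minimal harness sketching that behaviour on a host compiler. The stub bodies of func and func2, the storage array, the call counter and the demo function are all assumptions made for illustration; the report only declares func and func2 and says nothing about what they do.

```c
#include <assert.h>

/* Hypothetical stubs: the report only declares func and func2. */
static int storage[4];
static int calls_to_func2;

int *func(void) { return storage; }

int func2(long long value)
{
  calls_to_func2++;
  return (int) (value * 2);
}

/* The test case from the report; only the fall-through comment is new. */
void test(int unused, int idx, char tag, long long value)
{
  int *p = func() + idx;
  switch (tag) {
    case 1:
      *p = (int) value;
      /* falls through */
    case 2:
      *p = func2(value);
  }
}

/* Hypothetical driver: exercises tag 1, tag 2 and an unmatched tag. */
static void demo(void)
{
  test(0, 1, 1, 21);          /* tag 1: stores 21, then falls into case 2 */
  assert(storage[1] == 42);   /* overwritten by the stub func2 (21 * 2) */

  test(0, 2, 2, 5);           /* tag 2: only the func2 store */
  assert(storage[2] == 10);

  test(0, 3, 3, 7);           /* any other tag: no store at all */
  assert(storage[3] == 0);
}
```

Calling demo() from a main function runs the three cases. With these stubs, the store made under tag 1 is always overwritten by the func2 result, which matches the str / bl func2 / str sequence in both listings.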
* [Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated
From: pinskia at gcc dot gnu dot org @ 2009-04-21 15:43 UTC
To: gcc-bugs

--

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|regression                  |rtl-optimization
           Keywords|                            |missed-optimization, ra
            Summary|[4.4 regression] unoptimal  |[4.4/4.5 regression]
                   |code generated              |unoptimal code generated
   Target Milestone|---                         |4.4.1
* [Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated
From: alexvod at google dot com @ 2009-04-21 16:08 UTC
To: gcc-bugs

------- Comment #1 from alexvod at google dot com  2009-04-21 16:08 -------
Compilation options: -march=armv5te -fpic -mthumb-interwork -Os -mthumb
* [Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated
From: pinskia at gcc dot gnu dot org @ 2009-04-21 17:07 UTC
To: gcc-bugs

------- Comment #2 from pinskia at gcc dot gnu dot org  2009-04-21 17:07 -------
So it is using stack space instead of r3 for something that is live across
the function call.
* [Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated
From: alexvod at google dot com @ 2009-04-23 16:49 UTC
To: gcc-bugs

------- Comment #3 from alexvod at google dot com  2009-04-23 16:49 -------
Another example of sub-optimal register allocation on ARM/Thumb with IRA
(not sure if this is the same bug or a different one).

    int func(char*);
    void func2(const char*, int);
    void test(char **pSignature) {
      int clazz = 0;
      char *signature = *pSignature;
      if (*signature == '[') {
        char savedChar;
        savedChar = *++signature;
        clazz = func(*pSignature);
        *signature = savedChar;
      }
      if (clazz == 0) {
        func2("abc", 0);
      }
      *pSignature = signature;
    }

It was changed by http://gcc.gnu.org/viewcvs?view=rev&revision=139590:

GCC rev139589:

test:
        push    {lr}
        sub     sp, sp, #12
        mov     r3, #0
        str     r3, [sp, #4]
.L2:
        add     r0, sp, #4
        bl      func
        ldr     r3, [sp, #4]
        cmp     r3, #12
        ble     .L2
        add     sp, sp, #12
        @ sp needed for prologue
        pop     {pc}

GCC rev139590:

test:
        push    {r4, lr}
        sub     sp, sp, #8
        mov     r3, #0
        add     r4, sp, #4   // why put sp+4 in r4 if we can use sp+4 directly?
        str     r3, [sp, #4]
.L2:
        mov     r0, r4
        bl      func
        ldr     r3, [sp, #4]
        cmp     r3, #12
        ble     .L2
        add     sp, sp, #8
        @ sp needed for prologue
        pop     {r4, pc}
* [Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated
From: vmakarov at redhat dot com @ 2009-04-26 17:38 UTC
To: gcc-bugs

------- Comment #4 from vmakarov at redhat dot com  2009-04-26 17:38 -------
The first test case is just an example showing that RA is a heuristic
solution. Even a heuristic algorithm that works worse on average can
sometimes generate a better solution than ones that work better on average.

Here is the IRA log of pseudo assignments. Where the old RA made a different
assignment, it is shown in parentheses:

      Popping a0(r133,l0)  -- assign reg 4
      Popping a4(r138,l0)  -- assign reg 5
      Popping a3(r145,l0)  -- assign reg 6
      Popping a7(r137,l0)  -- assign reg 7 (4)   <-- key point
      Popping a2(r146,l0)  -- spill (7)
      Popping a9(r139,l0)  -- assign reg 2
      Popping a10(r147,l0) -- assign reg 6 (1)
      Popping a1(r134,l0)  -- assign reg 0
      Popping a5(r143,l0)  -- assign reg 7 (4)
      Popping a6(r135,l0)  -- assign reg 0
      Popping a8(r140,l0)  -- assign reg 5 (2)

If IRA assigned hard reg 4 instead of 7 to pseudo 137, all pseudos would get
registers including r139.

The old RA assigns 4 to r137 because it prefers the first hard reg in
register allocation order if all other conditions are equal. IRA assigns 7
because it has a smaller cost than 4. The cost is smaller because IRA hopes
that r135 could get 4 later (which did not happen). That is because there is
a copy connecting the two allocnos:

      cp4:a0(r133)<->a6(r135)@125:shuffle

This copy (called a shuffle) originates from a situation like the following:

      r2 = op (..., r1)

where r1 becomes dead.
Assigning the hard register of r1 to r2 in this situation usually results in
better RA (please see the literature on RA).

The following patch, which ignores the shuffle copies, results in the same
allocation as the old RA for this test:

Index: gcc/ira-color.c
===================================================================
--- gcc/ira-color.c     (revision 146788)
+++ gcc/ira-color.c     (working copy)
@@ -298,16 +298,18 @@ update_copy_costs (ira_allocno_t allocno
            (&ALLOCNO_UPDATED_HARD_REG_COSTS (another_allocno), cover_class,
             ALLOCNO_UPDATED_COVER_CLASS_COST (another_allocno),
             ALLOCNO_HARD_REG_COSTS (another_allocno));
-         ira_allocate_and_set_or_copy_costs
-           (&ALLOCNO_UPDATED_CONFLICT_HARD_REG_COSTS (another_allocno),
-            cover_class, 0,
-            ALLOCNO_CONFLICT_HARD_REG_COSTS (another_allocno));
          i = ira_class_hard_reg_index[cover_class][hard_regno];
          ira_assert (i >= 0);
          ALLOCNO_UPDATED_HARD_REG_COSTS (another_allocno)[i] += update_cost;
-         ALLOCNO_UPDATED_CONFLICT_HARD_REG_COSTS (another_allocno)[i]
-           += update_cost;
-
+         if (cp->insn != NULL_RTX || cp->constraint_p)
+           {
+             ira_allocate_and_set_or_copy_costs
+               (&ALLOCNO_UPDATED_CONFLICT_HARD_REG_COSTS (another_allocno),
+                cover_class, 0,
+                ALLOCNO_CONFLICT_HARD_REG_COSTS (another_allocno));
+             ALLOCNO_UPDATED_CONFLICT_HARD_REG_COSTS (another_allocno)[i]
+               += update_cost;
+           }
          queue_update_cost (another_allocno, divisor * COST_HOP_DIVISOR);
        }
     }

But this patch does not improve RA on average (today I checked code size and
performance of SPEC2000 on a Core i7 in 32-bit mode), so I see no sense in
committing the patch.
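The choice between hard reg 4 and hard reg 7 for r137 described in this comment can be pictured with a toy cost model. This is only a sketch under stated assumptions and not GCC's implementation: pick_hard_reg, choose_for_r137, NUM_REGS and PENALTY are hypothetical names, the penalty value is arbitrary, and it is assumed that 4 and 7 are the only candidates left for r137. With equal costs the first register in order wins, as the comment says of the old RA; when the copy to r135 makes reg 4 look more expensive, reg 7 wins, as the comment says of IRA.

```c
#include <assert.h>
#include <limits.h>

#define NUM_REGS 8   /* hypothetical: Thumb low registers r0..r7 */

/* Return the cheapest allowed hard register.  On equal cost the
   lowest-numbered register wins.  Returns -1 (spill) if none is allowed. */
static int pick_hard_reg(const int cost[NUM_REGS], const int allowed[NUM_REGS])
{
  assert(cost && allowed);
  int best = -1;
  int best_cost = INT_MAX;
  for (int r = 0; r < NUM_REGS; r++)
    if (allowed[r] && cost[r] < best_cost)
      {
        best = r;
        best_cost = cost[r];
      }
  return best;
}

/* Toy version of the decision for pseudo r137.  If count_shuffle_copy is
   nonzero, reg 4 is penalised because a copy-connected neighbour (r135 in
   the log) would like to have it.  PENALTY is an arbitrary value. */
static int choose_for_r137(int count_shuffle_copy)
{
  enum { PENALTY = 1 };
  int cost[NUM_REGS] = { 0 };
  int allowed[NUM_REGS] = { 0 };

  allowed[4] = 1;   /* assumption: only regs 4 and 7 remain as candidates */
  allowed[7] = 1;
  if (count_shuffle_copy)
    cost[4] += PENALTY;
  return pick_hard_reg(cost, allowed);
}
```

In this model choose_for_r137(0) yields 4 and choose_for_r137(1) yields 7. The real allocator tracks costs per allocno and propagates them along copies, so the sketch only illustrates why a small cost difference is enough to flip the tie.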
As for the second test, I did not find a difference to worry about (I checked
the mainline and gcc-4.3.1 using -march=armv5te -mthumb-interwork -Os
-mthumb):

--- a0.s        2009-04-24 17:40:54.000000000 -0400
+++ a1.s        2009-04-24 17:40:36.000000000 -0400
@@ -1,8 +1,5 @@
        .code   16
        .file   "a.i"
-       .section        .rodata
-.LC0:
-       .ascii  "abc\000"
        .text
        .align  2
        .global test
@@ -11,31 +8,35 @@
        .type   test, %function
 test:
        push    {r4, r5, r6, r7, lr}
-       ldr     r5, [r0]
-       mov     r7, r0
-       ldrb    r3, [r5]
+       ldr     r4, [r0]
+       mov     r5, r0
+       ldrb    r3, [r4]
        cmp     r3, #91
        bne     .L2
-       mov     r0, r5
-       ldrb    r4, [r5, #1]
+       mov     r0, r4
+       ldrb    r7, [r4, #1]
        bl      func
-       add     r6, r5, #1
-       strb    r4, [r5, #1]
+       add     r6, r4, #1
+       strb    r7, [r4, #1]
        cmp     r0, #0
        bne     .L3
-       mov     r5, r6
+       mov     r4, r6
 .L2:
        ldr     r0, .L5
        mov     r1, #0
        bl      func2
-       mov     r6, r5
+       mov     r6, r4
 .L3:
-       str     r6, [r7]
+       str     r6, [r5]
        @ sp needed for prologue
        pop     {r4, r5, r6, r7, pc}
 .L6:
        .align  2
 .L5:
-       .word   .LC0
+       .word   .LANCHOR0
        .size   test, .-test
-       .ident  "GCC: (GNU) 4.3.4 20090424 (prerelease)"
+       .section        .rodata
+       .set    .LANCHOR0,. + 0
+.LC0:
+       .ascii  "abc\000"
+       .ident  "GCC: (GNU) 4.5.0 20090423 (experimental)"

--------------------------------------------------------------------

Once again, RA is a heuristic algorithm (optimal solutions for some RA
models, e.g. those based on ILP or on algorithms for quadratic assignment
problems, are not useful in practice). It is possible to find a lot of tests
where the old RA works better than IRA. Analysis of such tests takes a lot of
time (although I have done a lot of it over the last 2 years working on IRA).

Alexander, I'd really appreciate it if you did more analysis and proposed
solutions to the problems found instead of just posting the tests. Posting
the tests is useful too, but doing just this will make such PRs lower and
lower priority for me.

I have checked several times that IRA generates smaller code on bigger tests
for ARM, so there is already progress in RA in comparison with the old RA.
But of course, there is no limit to perfection.
* [Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated
From: alexvod at google dot com @ 2009-04-27 9:06 UTC
To: gcc-bugs

------- Comment #5 from alexvod at google dot com  2009-04-27 09:06 -------
Vladimir, many thanks for your analysis! Next time I will try to do the
analysis myself and make comparisons on larger real-world examples. Lowering
severity for now.

--

alexvod at google dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |minor
* [Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated
From: jakub at gcc dot gnu dot org @ 2009-04-27 9:22 UTC
To: gcc-bugs

--

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P4
* [Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated
From: ramana at gcc dot gnu dot org @ 2009-05-13 16:47 UTC
To: gcc-bugs

--

ramana at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2009-05-13 16:47:27
               date|                            |
* [Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated
From: steven at gcc dot gnu dot org @ 2009-06-22 16:25 UTC
To: gcc-bugs

------- Comment #6 from steven at gcc dot gnu dot org  2009-06-22 16:25 -------
Since this is inherently a heuristics issue, and the IRA heuristics result in
overall better code size according to Vlad, I would like to propose we close
this PR as WONTFIX. Would anyone object to that?
* [Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated
From: ramana at gcc dot gnu dot org @ 2009-07-13 16:05 UTC
To: gcc-bugs

------- Comment #7 from ramana at gcc dot gnu dot org  2009-07-13 16:05 -------
Since there are no objections to comment #6, I am closing this as WONTFIX.

--

ramana at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX