public inbox for gcc-bugs@sourceware.org
* [Bug regression/39836] New: [4.4 regression] unoptimal code generated
From: alexvod at google dot com @ 2009-04-21 15:40 UTC
To: gcc-bugs
Very simple code:
int* func();
int func2(long long);
void test (int unused, int idx, char tag, long long value)
{
  int *p = func() + idx;
  switch (tag) {
    case 1:
      *p = (int) value;
      /* falls through */
    case 2:
      *p = func2(value);
  }
}
is compiled to 46 bytes by GCC 4.3.1 and to 48 bytes by GCC 4.4.0. Bisection
shows that the size increase was introduced by
http://gcc.gnu.org/viewcvs?view=rev&revision=139949:
Code generated by 139948:
test:
        push {r3, r4, r5, r6, r7, lr}
        mov r4, r1
        mov r5, r2
        ldr r6, [sp, #24]
        ldr r7, [sp, #28]
        bl func
        lsl r4, r4, #2
        add r4, r0, r4
        cmp r5, #1
        beq .L3
        cmp r5, #2
        bne .L5
        b .L4
.L3:
        str r6, [r4]
.L4:
        mov r0, r6
        mov r1, r7
        bl func2
        str r0, [r4]
.L5:
        @ sp needed for prologue
        pop {r3, r4, r5, r6, r7, pc}
Code generated by 139949:
test:
        push {r4, r5, r6, r7, lr}
        sub sp, sp, #12
        mov r5, r1
        ldr r1, [sp, #36]
        mov r6, r2
        ldr r7, [sp, #32]
        str r1, [sp, #4]
        bl func
        lsl r4, r5, #2
        add r4, r0, r4
        ldr r1, [sp, #4]
        cmp r6, #1
        beq .L3
        cmp r6, #2
        bne .L5
        b .L4
.L3:
        str r7, [r4]
.L4:
        mov r0, r7
        bl func2
        str r0, [r4]
.L5:
        add sp, sp, #12
        @ sp needed for prologue
        pop {r4, r5, r6, r7, pc}
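The high word of value (loaded from [sp, #36]) is spilled to a stack slot
at [sp, #4] across the call to func, instead of being kept in a callee-saved
register as rev 139948 does (r7).
BTW, GCC 4.2.1 compiles this function to 42 bytes (which is even better!).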
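The stack offsets in the two listings can be checked with a little arithmetic. This is a minimal sketch assuming the AAPCS rules for little-endian arm-eabi: r0-r2 carry unused, idx and tag; a 64-bit argument must start in an even-numbered core register, so r3 is skipped and value is passed on the stack, low word first, at the caller's sp. The helpers frame_bytes and value_word_offset are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Assumption: AAPCS, little-endian arm-eabi.  For
   test (int unused, int idx, char tag, long long value), r0-r2 hold
   the first three arguments, r3 is skipped because a 64-bit argument
   must start in an even register, and value lives on the caller's
   stack (low word at the caller's sp, high word 4 bytes above).  */

/* Hypothetical helper: bytes the prologue moves sp down by,
   4 per pushed register plus any explicit "sub sp, sp, #N".  */
static int
frame_bytes (int pushed_regs, int sub_bytes)
{
  assert (pushed_regs >= 0 && sub_bytes >= 0);
  return 4 * pushed_regs + sub_bytes;
}

/* Hypothetical helper: offset from the post-prologue sp of word WORD
   (0 = low, 1 = high) of the stack-passed value.  */
static int
value_word_offset (int pushed_regs, int sub_bytes, int word)
{
  assert (word == 0 || word == 1);
  return frame_bytes (pushed_regs, sub_bytes) + 4 * word;
}
```

For rev 139948 (six registers pushed, no sub) value_word_offset gives 24 and 28, matching ldr r6, [sp, #24] and ldr r7, [sp, #28]. For rev 139949 (five registers pushed plus sub sp, sp, #12) it gives 32 and 36, matching ldr r7, [sp, #32] and ldr r1, [sp, #36]; so the word parked at [sp, #4] is the high half of value, which rev 139948 kept in r7.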
--
Summary: [4.4 regression] unoptimal code generated
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: regression
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: alexvod at google dot com
GCC build triplet: x86_64-unknown-linux-gnu
GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: arm-eabi
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39836
* [Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated
From: pinskia at gcc dot gnu dot org @ 2009-04-21 15:43 UTC
To: gcc-bugs
--
pinskia at gcc dot gnu dot org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|regression                  |rtl-optimization
           Keywords|                            |missed-optimization, ra
            Summary|[4.4 regression] unoptimal  |[4.4/4.5 regression]
                   |code generated              |unoptimal code generated
   Target Milestone|---                         |4.4.1
* [Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated
From: alexvod at google dot com @ 2009-04-21 16:08 UTC
To: gcc-bugs
------- Comment #1 from alexvod at google dot com 2009-04-21 16:08 -------
Compilation options: -march=armv5te -fpic -mthumb-interwork -Os -mthumb
* [Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated
From: pinskia at gcc dot gnu dot org @ 2009-04-21 17:07 UTC
To: gcc-bugs
------- Comment #2 from pinskia at gcc dot gnu dot org 2009-04-21 17:07 -------
So it is using stack space instead of r3 for something that is alive across the
function call.
* [Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated
From: alexvod at google dot com @ 2009-04-23 16:49 UTC
To: gcc-bugs
------- Comment #3 from alexvod at google dot com 2009-04-23 16:49 -------
Another example of sub-optimal register allocation on ARM/Thumb with IRA (not
sure if this is the same bug or a different one).
int func(char*);
void func2(const char*, int);
void test(char **pSignature)
{
  int clazz = 0;
  char *signature = *pSignature;
  if (*signature == '[') {
    char savedChar;
    savedChar = *++signature;
    clazz = func(*pSignature);
    *signature = savedChar;
  }
  if (clazz == 0) {
    func2("abc", 0);
  }
  *pSignature = signature;
}
It was changed by http://gcc.gnu.org/viewcvs?view=rev&revision=139590:
GCC rev139589:
test:
        push {lr}
        sub sp, sp, #12
        mov r3, #0
        str r3, [sp, #4]
.L2:
        add r0, sp, #4
        bl func
        ldr r3, [sp, #4]
        cmp r3, #12
        ble .L2
        add sp, sp, #12
        @ sp needed for prologue
        pop {pc}
GCC rev139590:
test:
        push {r4, lr}
        sub sp, sp, #8
        mov r3, #0
        add r4, sp, #4 // why put sp+4 in r4 if we can use sp+4 directly?
        str r3, [sp, #4]
.L2:
        mov r0, r4
        bl func
        ldr r3, [sp, #4]
        cmp r3, #12
        ble .L2
        add sp, sp, #8
        @ sp needed for prologue
        pop {r4, pc}
* [Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated
From: vmakarov at redhat dot com @ 2009-04-26 17:38 UTC
To: gcc-bugs
------- Comment #4 from vmakarov at redhat dot com 2009-04-26 17:38 -------
The first test case is just an example of RA being a heuristic
solution. A heuristic algorithm that works worse on average can
sometimes generate a better solution than one that works better on
average.
Here is the IRA log of pseudo assignments. Where the old RA made a
different assignment, it is shown in parentheses:
Popping a0(r133,l0) -- assign reg 4
Popping a4(r138,l0) -- assign reg 5
Popping a3(r145,l0) -- assign reg 6
Popping a7(r137,l0) -- assign reg 7 (4) <-- key point
Popping a2(r146,l0) -- spill (7)
Popping a9(r139,l0) -- assign reg 2
Popping a10(r147,l0) -- assign reg 6 (1)
Popping a1(r134,l0) -- assign reg 0
Popping a5(r143,l0) -- assign reg 7 (4)
Popping a6(r135,l0) -- assign reg 0
Popping a8(r140,l0) -- assign reg 5 (2)
If IRA assigned hard reg 4 instead of 7 to pseudo 137, all pseudos
would get registers, including r146. The old RA assigns 4 to r137
because it prefers the first hard reg in register allocation order if
all other conditions are equal. IRA assigns 7 because its cost is
smaller than the cost of 4. The cost is smaller because IRA hopes that
r135 could get 4 later (which did not happen). That is because there is
a copy connecting r133 (which already holds reg 4) and r135:
cp4:a0(r133)<->a6(r135)@125:shuffle
This copy (called a shuffle copy) originates from situations like the
following:
  r2 = op (..., r1), where r1 dies in this insn.
Assigning the hard register of r1 to r2 in this situation usually
results in better RA (see the RA literature).
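The choice for r137 described above can be illustrated with a toy model. This is a minimal sketch, not IRA's code: the register numbers mirror the log, but the candidate order, the costs and the copy weight are made-up values, and pick_hard_reg and add_conflict_copy_penalty are hypothetical names. Both allocators are modelled as "cheapest register, ties broken by allocation order"; the only difference is whether the copy-derived penalty has been added.

```c
#include <assert.h>

/* Toy model only: hard registers r0-r7 (the Thumb-1 low registers). */
#define N_HARD_REGS 8

/* Pick the cheapest available hard register.  Candidates are scanned
   in ORDER (a stand-in for the register allocation order) and a later
   candidate wins only if it is strictly cheaper, so ties go to the
   register that comes first in ORDER.  Returns -1 (spill) if none
   is available.  */
static int
pick_hard_reg (const int *order, int n_order,
               const int available[N_HARD_REGS],
               const int costs[N_HARD_REGS])
{
  int best = -1;
  for (int k = 0; k < n_order; k++)
    {
      int r = order[k];
      assert (r >= 0 && r < N_HARD_REGS);
      if (available[r] && (best < 0 || costs[r] < costs[best]))
        best = r;
    }
  return best;
}

/* A conflicting allocno (r135 in the log) is copy-connected to an
   allocno that already holds HARD_REG (r133 holds reg 4).  Raise our
   cost for HARD_REG so that the conflicting allocno can still get it
   later.  COPY_WEIGHT stands in for the frequency-scaled copy cost.  */
static void
add_conflict_copy_penalty (int costs[N_HARD_REGS], int hard_reg,
                           int copy_weight)
{
  assert (hard_reg >= 0 && hard_reg < N_HARD_REGS);
  costs[hard_reg] += copy_weight;
}
```

With equal costs, pick_hard_reg returns reg 4 for candidates {4, 7}, as the old RA did for r137; after add_conflict_copy_penalty raises the cost of reg 4, the same call returns reg 7, which is the IRA choice in the log. In this test r135 then ended up in reg 0 anyway, so the penalty bought nothing and r146 was spilled.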
The following patch, which ignores shuffle copies when updating conflict
costs, results in the same RA as the old allocator for this test:
Index: gcc/ira-color.c
===================================================================
--- gcc/ira-color.c (revision 146788)
+++ gcc/ira-color.c (working copy)
@@ -298,16 +298,18 @@ update_copy_costs (ira_allocno_t allocno
             (&ALLOCNO_UPDATED_HARD_REG_COSTS (another_allocno), cover_class,
              ALLOCNO_UPDATED_COVER_CLASS_COST (another_allocno),
              ALLOCNO_HARD_REG_COSTS (another_allocno));
-          ira_allocate_and_set_or_copy_costs
-            (&ALLOCNO_UPDATED_CONFLICT_HARD_REG_COSTS (another_allocno),
-             cover_class, 0,
-             ALLOCNO_CONFLICT_HARD_REG_COSTS (another_allocno));
           i = ira_class_hard_reg_index[cover_class][hard_regno];
           ira_assert (i >= 0);
           ALLOCNO_UPDATED_HARD_REG_COSTS (another_allocno)[i] += update_cost;
-          ALLOCNO_UPDATED_CONFLICT_HARD_REG_COSTS (another_allocno)[i]
-            += update_cost;
-
+          if (cp->insn != NULL_RTX || cp->constraint_p)
+            {
+              ira_allocate_and_set_or_copy_costs
+                (&ALLOCNO_UPDATED_CONFLICT_HARD_REG_COSTS (another_allocno),
+                 cover_class, 0,
+                 ALLOCNO_CONFLICT_HARD_REG_COSTS (another_allocno));
+              ALLOCNO_UPDATED_CONFLICT_HARD_REG_COSTS (another_allocno)[i]
+                += update_cost;
+            }
           queue_update_cost (another_allocno, divisor * COST_HOP_DIVISOR);
         }
     }
But this patch does not improve RA on average (today I checked code
size and performance on SPEC2000 for Core i7 in 32-bit mode), so I
see no point in committing it.
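The effect of the patch can be restated as a small C sketch. It is a simplified model, not the real ira-color.c code: struct toy_copy, struct toy_allocno and toy_update_copy_costs are hypothetical names, and the lazy allocation done by ira_allocate_and_set_or_copy_costs is omitted. Going by the patch condition and the :shuffle tag in the copy dump above, a shuffle copy is taken here to be one with neither an insn nor a constraint.

```c
#include <assert.h>
#include <stdbool.h>

#define N_HARD_REGS 8  /* toy model: r0-r7 */

/* Hypothetical, simplified copy record.  */
struct toy_copy
{
  bool from_insn;      /* stands in for cp->insn != NULL_RTX */
  bool constraint_p;   /* stands in for cp->constraint_p */
};

/* Hypothetical, simplified allocno with the two cost vectors
   that update_copy_costs touches.  */
struct toy_allocno
{
  int hard_reg_costs[N_HARD_REGS];           /* own preferences */
  int conflict_hard_reg_costs[N_HARD_REGS];  /* seen by conflicting allocnos */
};

/* Propagate a copy preference for HARD_REGNO to ANOTHER.  The partner's
   own cost is always updated; as in the patch, the conflict cost is
   updated only for copies that are not shuffle copies.  */
static void
toy_update_copy_costs (struct toy_allocno *another, const struct toy_copy *cp,
                       int hard_regno, int update_cost)
{
  assert (hard_regno >= 0 && hard_regno < N_HARD_REGS);
  another->hard_reg_costs[hard_regno] += update_cost;
  if (cp->from_insn || cp->constraint_p)
    another->conflict_hard_reg_costs[hard_regno] += update_cost;
}
```

With this filter, a shuffle copy such as cp4 still makes reg 4 more attractive to r135 itself, but no longer pushes conflicting allocnos such as r137 away from reg 4.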
As for the second test, I did not find a difference worth worrying about
(I checked the mainline and gcc-4.3.1 using -march=armv5te
-mthumb-interwork -Os -mthumb):
--- a0.s 2009-04-24 17:40:54.000000000 -0400
+++ a1.s 2009-04-24 17:40:36.000000000 -0400
@@ -1,8 +1,5 @@
         .code 16
         .file "a.i"
-        .section .rodata
-.LC0:
-        .ascii "abc\000"
         .text
         .align 2
         .global test
@@ -11,31 +8,35 @@
         .type test, %function
 test:
         push {r4, r5, r6, r7, lr}
-        ldr r5, [r0]
-        mov r7, r0
-        ldrb r3, [r5]
+        ldr r4, [r0]
+        mov r5, r0
+        ldrb r3, [r4]
         cmp r3, #91
         bne .L2
-        mov r0, r5
-        ldrb r4, [r5, #1]
+        mov r0, r4
+        ldrb r7, [r4, #1]
         bl func
-        add r6, r5, #1
-        strb r4, [r5, #1]
+        add r6, r4, #1
+        strb r7, [r4, #1]
         cmp r0, #0
         bne .L3
-        mov r5, r6
+        mov r4, r6
 .L2:
         ldr r0, .L5
         mov r1, #0
         bl func2
-        mov r6, r5
+        mov r6, r4
 .L3:
-        str r6, [r7]
+        str r6, [r5]
         @ sp needed for prologue
         pop {r4, r5, r6, r7, pc}
 .L6:
         .align 2
 .L5:
-        .word .LC0
+        .word .LANCHOR0
         .size test, .-test
-        .ident "GCC: (GNU) 4.3.4 20090424 (prerelease)"
+        .section .rodata
+        .set .LANCHOR0,. + 0
+.LC0:
+        .ascii "abc\000"
+        .ident "GCC: (GNU) 4.5.0 20090423 (experimental)"
--------------------------------------------------------------------
Once again, RA is a heuristic algorithm (optimal solutions for some RA
models, e.g. those based on ILP or on algorithms for quadratic
assignment problems, are not useful in practice). It is possible to
find a lot of tests where the old RA works better than IRA. Analysis
of such tests takes a lot of time (although I did a lot of it during
the last 2 years of working on IRA). Alexander, I'd really appreciate
it if you did more analysis and proposed solutions to the problems you
find instead of just posting the tests. Posting the tests is useful
too, but doing only that will make such PRs lower and lower priority
for me. I have checked several times that IRA generates smaller code
than the old RA on bigger tests for ARM, so RA has already made
progress. But of course, there is no limit to perfection.
* [Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated
From: alexvod at google dot com @ 2009-04-27 9:06 UTC
To: gcc-bugs
------- Comment #5 from alexvod at google dot com 2009-04-27 09:06 -------
Vladimir, many thanks for your analysis! Next time I will try to do the
analysis myself and compare on larger real-world examples. Lowering
severity for now.
--
alexvod at google dot com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |minor
* [Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated
From: jakub at gcc dot gnu dot org @ 2009-04-27 9:22 UTC
To: gcc-bugs
--
jakub at gcc dot gnu dot org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P4
* [Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated
From: ramana at gcc dot gnu dot org @ 2009-05-13 16:47 UTC
To: gcc-bugs
--
ramana at gcc dot gnu dot org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2009-05-13 16:47:27
               date|                            |
* [Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated
From: steven at gcc dot gnu dot org @ 2009-06-22 16:25 UTC
To: gcc-bugs
------- Comment #6 from steven at gcc dot gnu dot org 2009-06-22 16:25 -------
Since this is inherently a heuristics issue, and the IRA heuristics result in
overall better code size according to Vlad, I would like to propose we close
this PR as WONTFIX. Would anyone object to that?
* [Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated
From: ramana at gcc dot gnu dot org @ 2009-07-13 16:05 UTC
To: gcc-bugs
------- Comment #7 from ramana at gcc dot gnu dot org 2009-07-13 16:05 -------
Since there were no objections to comment #6, I am closing this as WONTFIX.
--
ramana at gcc dot gnu dot org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX