public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/42499] New: Bad register allocation in multiplication code by constant
From: sliao at google dot com @ 2009-12-25 10:12 UTC (permalink / raw)
To: gcc-bugs
It seems that GCC 4.2.1 generates better code than GCC 4.4.0 in this case:
The following code (extracted from Android's
Dalvik_java_lang_System_currentTimeMillis in native/java_lang_System.c):
// compilation options: -march=armv5te -mthumb -Os
struct timeval
{
    long tv_sec;
    long tv_usec;
};
extern void get_time(struct timeval*);
void test(long long *res)
{
    struct timeval tv;
    get_time(&tv);
    *res = tv.tv_sec * 1000LL + tv.tv_usec / 1000;
}
is compiled by gcc-4.4.0 in a sub-optimal way: it takes 110 bytes (vs. 74 bytes
when compiled by gcc-4.2.1). The assembly shows that it spills some registers
to the stack because the code that multiplies by 1000LL uses more registers than
it needs (more than gcc-4.2.1 uses). The multiplication code is similar, but
gcc 4.4 emits several additional MOVs that could easily be eliminated.
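For reference, a host-side sketch of what `test` computes: whole seconds scaled to milliseconds in 64-bit arithmetic, plus microseconds truncated to milliseconds. `struct timeval32` and `get_time_stub` are hypothetical stand-ins (the report only declares `get_time` as `extern`), and `int32_t` is assumed to model the 32-bit `long` of the arm-eabi target.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as the reduced struct in the report; int32_t models the
   32-bit `long` of arm-eabi. Hypothetical name to avoid clashing with
   the system's struct timeval. */
struct timeval32 {
    int32_t tv_sec;
    int32_t tv_usec;
};

/* Hypothetical stand-in for the report's extern get_time(): fills in
   fixed values so the arithmetic can be checked on any host. */
static void get_time_stub(struct timeval32 *tv)
{
    tv->tv_sec = 1262000000;   /* seconds */
    tv->tv_usec = 123456;      /* microseconds */
}

/* Same expression as test(): tv_sec is widened to 64 bits by the 1000LL
   operand before multiplying; tv_usec / 1000 is a 32-bit division whose
   result is widened for the addition. */
static void test_ms(long long *res)
{
    struct timeval32 tv;
    get_time_stub(&tv);
    assert(tv.tv_usec >= 0 && tv.tv_usec < 1000000);
    *res = tv.tv_sec * 1000LL + tv.tv_usec / 1000;
}
```

With the stub values, the result is 1262000000 * 1000 + 123456 / 1000, i.e. the seconds in milliseconds plus 123.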
This bug can be more easily demonstrated with multiplication of tv_sec by 10
and tv_usec/ 1000 removed.
gcc 4.2.1:
push {r4, r5, lr}
sub sp, sp, #12
mov r5, r0
mov r0, sp
bl get_time
ldr r2, [sp]
add sp, sp, #12
@ sp needed for prologue
asr r4, r2, #31
mov r3, r2
lsr r0, r2, #30
lsl r2, r4, #2
orr r2, r2, r0
lsl r1, r3, #2
add r1, r1, r3
adc r2, r2, r4
lsr r0, r1, #31
lsl r4, r2, #1
orr r4, r4, r0
lsl r3, r1, #1
str r3, [r5]
str r4, [r5, #4]
pop {r4, r5, pc}
gcc 4.4.0:
push {r4, r5, r6, r7, lr} // note that gcc 4.2.1 uses only {r4, r5, lr}
sub sp, sp, #12
mov r4, r0
mov r0, sp
bl get_time
ldr r6, [sp]
add sp, sp, #12
@ sp needed for prologue
mov r0, r6
asr r6, r6, #31
lsr r7, r0, #30
lsl r3, r6, #2
orr r3, r3, r7
mov r1, r6 // not needed actually, r6 can be used directly
lsl r2, r0, #2
add r0, r0, r2
adc r1, r1, r3
lsr r2, r0, #31
lsl r3, r1, #1
orr r3, r3, r2
lsl r0, r0, #1
str r0, [r4]
str r3, [r4, #4]
pop {r4, r5, r6, r7, pc}
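Both listings use the same instruction pattern for the multiply; what differs is register assignment, which in gcc 4.4.0 costs extra MOVs and the callee-saved r6/r7. Read off the listings, the arithmetic is x * 10 = ((x << 2) + x) << 1 on a 64-bit value held in two 32-bit registers, with `asr #31` supplying the high (sign) word. The C sketch below mirrors that sequence; `struct pair`, `shl_pair`, `add_pair` and `mul10_pair` are illustrative names, not anything from GCC.

```c
#include <assert.h>
#include <stdint.h>

/* The listings rely on 10 == (4 + 1) * 2. */
static_assert((4 + 1) * 2 == 10, "shift/add decomposition of 10");

/* A 64-bit value held in two 32-bit registers, as Thumb-1 code does. */
struct pair {
    uint32_t lo;
    uint32_t hi;
};

/* Left shift of a register pair by n (0 < n < 32): the top n bits of the
   low word move into the bottom of the high word (lsl, lsr, orr). */
static struct pair shl_pair(struct pair v, unsigned n)
{
    assert(n > 0 && n < 32);
    struct pair r;
    r.hi = (v.hi << n) | (v.lo >> (32 - n));
    r.lo = v.lo << n;
    return r;
}

/* 64-bit add: low words first, then high words plus the carry out of the
   low addition (add, adc). */
static struct pair add_pair(struct pair a, struct pair b)
{
    struct pair r;
    r.lo = a.lo + b.lo;
    uint32_t carry = r.lo < a.lo;   /* the low sum wrapped around */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* x * 10 computed as ((x << 2) + x) << 1 on a sign-extended pair. */
static int64_t mul10_pair(int32_t x)
{
    struct pair v;
    v.lo = (uint32_t)x;
    v.hi = x < 0 ? 0xFFFFFFFFu : 0u;   /* asr #31: sign extension */
    struct pair r = shl_pair(add_pair(shl_pair(v, 2), v), 1);
    /* Reassemble; the uint64_t -> int64_t conversion keeps the
       two's-complement bits on GCC and Clang. */
    return (int64_t)(((uint64_t)r.hi << 32) | r.lo);
}
```

Because the high word carries the sign, the sketch gives the full 64-bit product even for INT32_MIN, where a 32-bit multiply would overflow.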
--
Summary: Bad register allocation in multiplication code by constant
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: sliao at google dot com
GCC build triplet: i686-linux
GCC host triplet: i686-linux
GCC target triplet: arm-eabi
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42499
* [Bug rtl-optimization/42499] Bad register allocation in multiplication code by constant
From: rguenth at gcc dot gnu dot org @ 2009-12-31 15:30 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from rguenth at gcc dot gnu dot org 2009-12-31 15:29 -------
Please try with trunk.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization, ra
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42499
* [Bug rtl-optimization/42499] Bad register allocation in multiplication code by constant
From: ramana at gcc dot gnu dot org @ 2010-01-05 18:28 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from ramana at gcc dot gnu dot org 2010-01-05 18:28 -------
Why is there no load of tv.tv_usec in the generated code you've pasted?
Are you sure you've pasted this right?
With a 4.4 arm-eabi snapshot from 17/12/2009, I see the following code
for -march=armv5te -mthumb -Os. Are you sure you have given the right options
here?
.arch armv5te
.fpu softvfp
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 4
.eabi_attribute 18, 4
.code 16
.file "t.c"
.global __aeabi_idiv
.text
.align 2
.global test
.code 16
.thumb_func
.type test, %function
test:
push {r4, r5, r6, r7, lr}
sub sp, sp, #28
add r4, sp, #16
str r0, [sp, #12]
mov r0, r4
bl get_time
mov r1, #250
ldr r0, [r4, #4]
lsl r1, r1, #2
bl __aeabi_idiv
ldr r7, [sp, #16]
mov r3, r7
asr r7, r7, #31
mov r4, r7
lsl r7, r7, #5
lsr r1, r3, #27
mov r2, r7
orr r2, r2, r1
str r2, [sp, #4]
mov r5, r0
asr r6, r0, #31
lsl r0, r3, #5
str r0, [sp]
ldr r0, [sp]
ldr r1, [sp, #4]
sub r0, r0, r3
sbc r1, r1, r4
str r0, [sp]
str r1, [sp, #4]
ldr r1, [sp]
ldr r7, [sp, #4]
lsr r0, r1, #30
lsl r2, r7, #2
orr r2, r2, r0
ldr r0, [sp]
lsl r1, r0, #2
mov r0, r1
mov r1, r2
add r0, r0, r3
adc r1, r1, r4
lsr r4, r0, #29
lsl r3, r1, #3
orr r3, r3, r4
ldr r1, [sp, #12]
lsl r2, r0, #3
add r5, r5, r2
adc r6, r6, r3
add sp, sp, #28
str r5, [r1]
str r6, [r1, #4]
@ sp needed for prologue
pop {r4, r5, r6, r7, pc}
.size test, .-test
.ident "GCC: (GNU) 4.4.3 20091217 (prerelease)"
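Reading this listing, the multiply by 1000 is expanded into shifts, one subtract and one add on a register pair rather than a multiply instruction: 32x - x = 31x, then 4 * 31x + x = 125x, then 8 * 125x = 1000x (the division by 1000 goes to `__aeabi_idiv`). The sketch below reproduces that order of operations on a 64-bit value; `mul1000_shift_add` is an illustrative name, and unsigned arithmetic is used only so that left-shifting a negative value stays well-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* The constant factors this way: ((32 - 1) * 4 + 1) * 8 == 1000. */
static_assert(((32 - 1) * 4 + 1) * 8 == 1000, "shift/add decomposition of 1000");

/* 1000 * x following the order of operations in the listing. The work is
   done on uint64_t so shifts of negative values are well-defined; the
   bit pattern matches the signed result. */
static int64_t mul1000_shift_add(int32_t x)
{
    uint64_t v = (uint64_t)(int64_t)x;   /* sign-extend (asr #31) */
    uint64_t t = (v << 5) - v;           /* lsl #5, sub/sbc: 31 * x   */
    t = (t << 2) + v;                    /* lsl #2, add/adc: 125 * x  */
    t <<= 3;                             /* lsl #3:          1000 * x */
    /* Converting back to int64_t is implementation-defined for values
       above INT64_MAX; GCC and Clang keep the two's-complement bits. */
    return (int64_t)t;
}
```

Each step needs the value in a register pair, which is why the listing keeps spilling and reloading the intermediate words through [sp] and [sp, #4].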
--
ramana at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |WAITING
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42499
* [Bug rtl-optimization/42499] Bad register allocation in multiplication code by constant
From: sliao at google dot com @ 2010-01-07 12:44 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from sliao at google dot com 2010-01-07 12:43 -------
Because, as stated in the original report,
"This bug can be more easily demonstrated with multiplication of tv_sec by 10
and tv_usec/ 1000 removed",
the input program is:
#include <sys/time.h>
extern void get_time(struct timeval*);
void test(long long *res)
{
    struct timeval tv;
    get_time(&tv);
    *res = tv.tv_sec * 10;
}
As a result, there is no load of tv.tv_usec in the generated code. Sorry for
the confusion.
--
sliao at google dot com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jingyu at google dot com,
| |dougkwan at google dot com,
| |carrot at google dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42499
* [Bug rtl-optimization/42499] Bad register allocation in multiplication code by constant
From: sliao at google dot com @ 2010-01-07 12:55 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from sliao at google dot com 2010-01-07 12:55 -------
Compilation flags: -march=armv5te -mthumb -Os
gcc 4.2.1: (code size 0x1e bytes)
push {r4, lr}
sub sp, #8
adds r4, r0, #0
mov r0, sp
bl 0 <get_time>
ldr r2, [sp, #0]
add sp, #8
lsls r3, r2, #2
adds r3, r3, r2
lsls r3, r3, #1
str r3, [r4, #0]
asrs r3, r3, #31
str r3, [r4, #4]
pop {r4, pc}
nop ; (mov r8, r8) // why is this NOP not optimized away?
gcc 4.5.0: (code size 0x1c bytes)
push {r4, lr}
sub sp, #8
adds r4, r0, #0
mov r0, sp
bl 0 <get_time>
ldr r3, [sp, #0]
add sp, #8
lsls r2, r3, #2
adds r3, r2, r3
lsls r3, r3, #1
str r3, [r4, #0]
asrs r3, r3, #31
str r3, [r4, #4]
pop {r4, pc}
BTW, again, the input program is now:
#include <sys/time.h>
extern void get_time(struct timeval*);
void test(long long *res)
{
    struct timeval tv;
    get_time(&tv);
    *res = tv.tv_sec * 10;
}
1. I apologize for the confusion about the source code. Originally I used less
simplified code from Dalvik_java_lang_System_currentTimeMillis; now I use the
code above, which is why the generated code looks different.
2. In any case, the code generated by GCC 4.2.1 and 4.5.0 is essentially the
same (except for the NOP at the end of the 4.2.1 output). I think this bug is
resolved in trunk.
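One detail that may matter when comparing listings across this thread: the listings in the original report sign-extend tv_sec (`asr ..., #31`) before multiplying and carry a 64-bit product through add/adc, which matches a widening multiply such as `tv.tv_sec * 10LL`. The listings in this comment multiply in 32 bits and only then sign-extend the product (`asrs r3, r3, #31`), which is what `tv.tv_sec * 10` means when `long` is 32 bits, as on arm-eabi. So the two sets of listings may not be directly comparable. A small sketch, with `int32_t` modelling the 32-bit `long` and illustrative function names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* `*res = tv.tv_sec * 10LL;` : widen first, then multiply in 64 bits. */
static int64_t mul10_widen_first(int32_t sec)
{
    return (int64_t)sec * 10;
}

/* True when sec * 10 fits in 32 bits, i.e. when the 32-bit multiply in
   `*res = tv.tv_sec * 10;` is well-defined (signed overflow is undefined). */
static bool mul10_fits_32(int32_t sec)
{
    return sec >= INT32_MIN / 10 && sec <= INT32_MAX / 10;
}

/* `*res = tv.tv_sec * 10;` : multiply in 32 bits, then sign-extend the
   product (the asrs #31 in the listings). Only valid when it fits. */
static int64_t mul10_narrow_first(int32_t sec)
{
    assert(mul10_fits_32(sec));
    int32_t product = sec * 10;
    return (int64_t)product;
}
```

The two forms agree whenever `mul10_fits_32` holds; for a late-2009 `tv_sec` of about 1.26e9, however, `tv_sec * 10` already exceeds the 32-bit range, and only the widening form is defined.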
--
sliao at google dot com changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|WAITING |RESOLVED
Resolution| |FIXED
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42499