[Bug rtl-optimization/42499] New: Bad register allocation in multiplication code by constant


* [Bug rtl-optimization/42499]  New: Bad register allocation in multiplication code by constant
@ 2009-12-25 10:12 sliao at google dot com
  2009-12-31 15:30 ` [Bug rtl-optimization/42499] " rguenth at gcc dot gnu dot org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: sliao at google dot com @ 2009-12-25 10:12 UTC (permalink / raw)
  To: gcc-bugs

It seems that GCC 4.2.1 generates better code than GCC 4.4.0 in this case:

The following code (extracted from Android's
Dalvik_java_lang_System_currentTimeMillis in native/java_lang_System.c):

// compilation options: -march=armv5te -mthumb -Os

struct timeval
{
 long tv_sec;
 long tv_usec;
};

extern void get_time(struct timeval*);

void test(long long *res)
{
 struct timeval tv;
 get_time(&tv);
 *res = tv.tv_sec * 1000LL + tv.tv_usec / 1000;
}

is compiled by gcc-4.4.0 in a sub-optimal way: it takes 110 bytes (vs. 74 bytes
when compiled by gcc-4.2.1). The assembly shows that it spills some registers
to the stack because the code that multiplies by 1000LL uses more registers than
it needs (more than gcc-4.2.1 uses). The multiplication code is similar, but
gcc 4.4 emits several additional MOVs that could easily be eliminated.

This bug can be more easily demonstrated with multiplication of tv_sec by 10
and tv_usec/ 1000 removed.

gcc 4.2.1:
       push    {r4, r5, lr}
       sub     sp, sp, #12
       mov     r5, r0
       mov     r0, sp
       bl      get_time
       ldr     r2, [sp]
       add     sp, sp, #12
       @ sp needed for prologue
       asr     r4, r2, #31
       mov     r3, r2
       lsr     r0, r2, #30
       lsl     r2, r4, #2
       orr     r2, r2, r0
       lsl     r1, r3, #2
       add     r1, r1, r3
       adc     r2, r2, r4
       lsr     r0, r1, #31
       lsl     r4, r2, #1
       orr     r4, r4, r0
       lsl     r3, r1, #1
       str     r3, [r5]
       str     r4, [r5, #4]
       pop     {r4, r5, pc}

gcc 4.4.0:
       push    {r4, r5, r6, r7, lr}   // note that gcc 4.2.1 uses only {r4, r5, lr}
       sub     sp, sp, #12
       mov     r4, r0
       mov     r0, sp
       bl      get_time
       ldr     r6, [sp]
       add     sp, sp, #12
       @ sp needed for prologue
       mov     r0, r6
       asr     r6, r6, #31
       lsr     r7, r0, #30
       lsl     r3, r6, #2
       orr     r3, r3, r7
       mov     r1, r6   // not needed actually, r6 can be used directly
       lsl     r2, r0, #2
       add     r0, r0, r2
       adc     r1, r1, r3
       lsr     r2, r0, #31
       lsl     r3, r1, #1
       orr     r3, r3, r2
       lsl     r0, r0, #1
       str     r0, [r4]
       str     r3, [r4, #4]
       pop     {r4, r5, r6, r7, pc}
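Both listings appear to implement the 64-bit multiply by 10 the same way, as ((x << 2) + x) << 1 on a lo/hi register pair: the high word comes from asr #31, bits shifted out of the low word move into the high word via lsr/orr, and the carry of the add is propagated with adc. What differs is only how many registers hold the intermediates. A minimal C sketch of that sequence, assuming 32-bit halves; mul10_pair and check_mul10 are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the Thumb sequence above:
   (int64_t)x * 10 computed on a lo/hi pair of 32-bit words. */
static uint64_t mul10_pair(int32_t x)
{
    uint32_t lo = (uint32_t)x;
    uint32_t hi = (x < 0) ? 0xFFFFFFFFu : 0u;    /* asr #31: sign word */

    /* (hi:lo) << 2: bits shifted out of lo move into hi (lsr #30, orr) */
    uint32_t hi4 = (hi << 2) | (lo >> 30);
    uint32_t lo4 = lo << 2;

    /* (hi4:lo4) + (hi:lo) = 5x: add, then adc with the carry */
    uint32_t lo5 = lo4 + lo;
    uint32_t carry = (lo5 < lo4) ? 1u : 0u;
    uint32_t hi5 = hi4 + hi + carry;

    /* (hi5:lo5) << 1 = 10x: lsr #31, lsl #1, orr */
    uint32_t hi10 = (hi5 << 1) | (lo5 >> 31);
    uint32_t lo10 = lo5 << 1;

    return ((uint64_t)hi10 << 32) | lo10;        /* bit pattern of the 64-bit product */
}

/* Compare against the compiler's own 64-bit multiply. */
static void check_mul10(int32_t x)
{
    assert(mul10_pair(x) == (uint64_t)((int64_t)x * 10));
}
```

check_mul10 simply compares the hand-expanded sequence with a plain 64-bit multiply for a given input.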


-- 
           Summary: Bad register allocation in multiplication code by
                    constant
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: sliao at google dot com
 GCC build triplet: i686-linux
  GCC host triplet: i686-linux
GCC target triplet: arm-eabi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42499



* [Bug rtl-optimization/42499] Bad register allocation in multiplication code by constant
From: rguenth at gcc dot gnu dot org @ 2009-12-31 15:30 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from rguenth at gcc dot gnu dot org  2009-12-31 15:29 -------
Please try with trunk.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization, ra





* [Bug rtl-optimization/42499] Bad register allocation in multiplication code by constant
From: ramana at gcc dot gnu dot org @ 2010-01-05 18:28 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from ramana at gcc dot gnu dot org  2010-01-05 18:28 -------
Why is there no load of tv.tv_usec in the generated code you've pasted?
Are you sure you've pasted it correctly?

With a 4.4 arm-eabi snapshot (17/12/2009) and -march=armv5te -mthumb -Os, I see
the following code. Are you sure you've given the right options here?



        .arch armv5te
        .fpu softvfp
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .eabi_attribute 26, 1
        .eabi_attribute 30, 4
        .eabi_attribute 18, 4
        .code   16
        .file   "t.c"
        .global __aeabi_idiv
        .text
        .align  2
        .global test
        .code   16
        .thumb_func
        .type   test, %function
test:
        push    {r4, r5, r6, r7, lr}
        sub     sp, sp, #28
        add     r4, sp, #16
        str     r0, [sp, #12]
        mov     r0, r4
        bl      get_time
        mov     r1, #250
        ldr     r0, [r4, #4]
        lsl     r1, r1, #2
        bl      __aeabi_idiv
        ldr     r7, [sp, #16]
        mov     r3, r7
        asr     r7, r7, #31
        mov     r4, r7
        lsl     r7, r7, #5
        lsr     r1, r3, #27
        mov     r2, r7
        orr     r2, r2, r1
        str     r2, [sp, #4]
        mov     r5, r0
        asr     r6, r0, #31
        lsl     r0, r3, #5
        str     r0, [sp]
        ldr     r0, [sp]
        ldr     r1, [sp, #4]
        sub     r0, r0, r3
        sbc     r1, r1, r4
        str     r0, [sp]
        str     r1, [sp, #4]
        ldr     r1, [sp]
        ldr     r7, [sp, #4]
        lsr     r0, r1, #30
        lsl     r2, r7, #2
        orr     r2, r2, r0
        ldr     r0, [sp]
        lsl     r1, r0, #2
        mov     r0, r1
        mov     r1, r2
        add     r0, r0, r3
        adc     r1, r1, r4
        lsr     r4, r0, #29
        lsl     r3, r1, #3
        orr     r3, r3, r4
        ldr     r1, [sp, #12]
        lsl     r2, r0, #3
        add     r5, r5, r2
        adc     r6, r6, r3
        add     sp, sp, #28
        str     r5, [r1]
        str     r6, [r1, #4]
        @ sp needed for prologue
        pop     {r4, r5, r6, r7, pc}
        .size   test, .-test
        .ident  "GCC: (GNU) 4.4.3 20091217 (prerelease)"
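In this listing the multiply by 1000 also appears to be expanded into shifts and adds on a lo/hi pair: lsl #5 followed by sub/sbc gives 31x, lsl #2 followed by add/adc gives 125x, and a final lsl #3 gives 1000x. The quotient tv_usec / 1000 (the __aeabi_idiv call, with 250 << 2 = 1000 as the divisor) is then added with add/adc. A minimal C sketch of the same decomposition, using uint64_t so that the shifts and wraparound are well defined; mul1000_shift_add and check_mul1000 are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 1000 * x as ((x * 32 - x) * 4 + x) * 8, i.e.
   31x -> 125x -> 1000x, mirroring the shift/sub/add sequence above.
   Unsigned arithmetic wraps mod 2^64, so the result is the bit pattern
   of the signed 64-bit product. */
static uint64_t mul1000_shift_add(int32_t x)
{
    uint64_t v = (uint64_t)(int64_t)x;  /* sign-extend to 64 bits (asr #31) */
    uint64_t t = (v << 5) - v;          /* 31x:   lsl #5, then sub/sbc */
    t = (t << 2) + v;                   /* 125x:  lsl #2, then add/adc */
    return t << 3;                      /* 1000x: lsl #3 */
}

/* Compare against the compiler's own 64-bit multiply. */
static void check_mul1000(int32_t x)
{
    assert(mul1000_shift_add(x) == (uint64_t)((int64_t)x * 1000));
}
```

The identity behind it is (31 * 4 + 1) * 8 = 125 * 8 = 1000.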


-- 

ramana at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING





* [Bug rtl-optimization/42499] Bad register allocation in multiplication code by constant
From: sliao at google dot com @ 2010-01-07 12:44 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from sliao at google dot com  2010-01-07 12:43 -------
As the original report says, "This bug can be more easily demonstrated with
multiplication of tv_sec by 10 and tv_usec/ 1000 removed", so the input
program is:
#include <sys/time.h>

extern void get_time(struct timeval*);

void test(long long *res)
{
    struct timeval tv;
    get_time(&tv);
    *res = tv.tv_sec * 10;
}

As a result, the generated code contains no load of tv.tv_usec. Sorry for the
confusion.


-- 

sliao at google dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jingyu at google dot com,
                   |                            |dougkwan at google dot com,
                   |                            |carrot at google dot com





* [Bug rtl-optimization/42499] Bad register allocation in multiplication code by constant
From: sliao at google dot com @ 2010-01-07 12:55 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from sliao at google dot com  2010-01-07 12:55 -------
Compilation flags: -march=armv5te -mthumb -Os

gcc 4.2.1 (code size 0x1e = 30 bytes):
 push {r4, lr} 
 sub sp, #8
 adds r4, r0, #0
 mov r0, sp
 bl 0 <get_time>
 ldr r2, [sp, #0]
 add sp, #8
 lsls r3, r2, #2
 adds r3, r3, r2
 lsls r3, r3, #1
 str r3, [r4, #0]
 asrs r3, r3, #31
 str r3, [r4, #4]
 pop {r4, pc}
 nop ; (mov r8, r8)   // why is this NOP not optimized away?

gcc 4.5.0 (code size 0x1c = 28 bytes):
 push {r4, lr}
 sub sp, #8
 adds r4, r0, #0
 mov r0, sp
 bl 0 <get_time>
 ldr r3, [sp, #0]
 add sp, #8
 lsls r2, r3, #2
 adds r3, r2, r3
 lsls r3, r3, #1
 str r3, [r4, #0]
 asrs r3, r3, #31
 str r3, [r4, #4]
 pop {r4, pc}
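Because long is 32 bits on arm-eabi, tv.tv_sec * 10 in this test case is a 32-bit multiply: both listings compute ((x << 2) + x) << 1 in a single register and then sign-extend the product into the high word with asrs #31, so no register pair is needed for the multiply itself. A minimal C sketch of that sequence; mul10_then_extend is a hypothetical name, and uint32_t is used so the wraparound matches the instructions (in the C source itself, a signed overflow of tv.tv_sec * 10 would be undefined behavior):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the two listings above: the product is formed
   in 32 bits (lsls #2; adds; lsls #1) and then sign-extended (asrs #31) to
   produce the high word stored at [r4, #4]. */
static uint64_t mul10_then_extend(int32_t x)
{
    uint32_t u = (uint32_t)x;
    uint32_t lo = ((u << 2) + u) << 1;                     /* 10x mod 2^32 */
    uint32_t hi = (lo & 0x80000000u) ? 0xFFFFFFFFu : 0u;   /* asrs r3, r3, #31 */
    return ((uint64_t)hi << 32) | lo;
}
```

For example, with x = 300000000 the 32-bit product wraps to -1294967296, whereas a 64-bit multiply would give 3000000000.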

BTW, again, the input program is now 

#include <sys/time.h>

extern void get_time(struct timeval*);

void test(long long *res)
{
    struct timeval tv;
    get_time(&tv);
    *res = tv.tv_sec * 10;
}

1. I apologize for the confusion about the source code. Originally I used a
less simplified version of Dalvik_java_lang_System_currentTimeMillis; now I use
the code above, so the generated code looks different.

2. In any case, the code generated by GCC 4.2.1 and 4.5.0 is essentially the
same (except for the NOP at the end of 4.2.1's output). I think this bug is
resolved on trunk.


-- 

sliao at google dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|                            |FIXED



