[Bug c/27016] New: ARM optimizer produces severely suboptimal code

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug c/27016]  New: ARM optimizer produces severely suboptimal code
@ 2006-04-04  8:12 Eric dot Doenges at betty-tv dot com
  2006-04-04  8:16 ` [Bug middle-end/27016] " pinskia at gcc dot gnu dot org
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: Eric dot Doenges at betty-tv dot com @ 2006-04-04  8:12 UTC (permalink / raw)
  To: gcc-bugs

The compiler creates extremely bad code for the ARM target.
Consider the following source file:

--- SNIP ---
unsigned int code_in_ram[100];

void testme(void)
{
  unsigned int *p_rom, *p_ram, *p_end, len;

  extern unsigned int _ram_erase_sector_start;
  extern unsigned int _ram_erase_sector_end;


  p_ram = code_in_ram;
  p_rom = &_ram_erase_sector_start;
  len = ((unsigned int)&_ram_erase_sector_end 
         - (unsigned int)&_ram_erase_sector_start) / sizeof(unsigned int);

  for (p_rom = &_ram_erase_sector_start, p_end = &_ram_erase_sector_end;
       p_rom < p_end;) {
    *p_ram++ = *p_rom++;
  }
}
--- SNIP ---

Compiled with arm-elf-gcc -mcpu=arm7tdmi -S -Os testme.c, we get the following
code:

--- SNIP ---
        .file   "testme.c"
        .text
        .align  2
        .global testme
        .type   testme, %function
testme:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        ldr     r1, .L6
        ldr     r2, .L6+4
        @ lr needed for prologue
        b       .L2
.L3:
        ldr     r3, [r1], #4
        str     r3, [r2, #-4]
.L2:
        ldr     r3, .L6+8
        cmp     r1, r3
        add     r2, r2, #4
        bcc     .L3
        bx      lr
.L7:
        .align  2
.L6:
        .word   _ram_erase_sector_start
        .word   code_in_ram
        .word   _ram_erase_sector_end
        .size   testme, .-testme
        .comm   code_in_ram,400,4
        .ident  "GCC: (GNU) 4.1.0"
--- SNIP ---

Even a cursory examination reveals that it would be a lot better to write:

    ldr r1, .L6
    ldr r2, .L6+4
    ldr r0, .L6+8
    b   .L2

.L3:
    ldr r3, [r1], #4
    str r3, [r2], #4
.L2:
    cmp r1, r0
    bcc .L3
    bx  lr

This code would be one instruction shorter overall , and two instructions less
in the loop. The way
gcc-4.1.0 refuses to use post-indexed addressing for the store is especially
bizzare, since it does use
post-indexed addressing for the preceeding load. Gcc 3.4.3 does not exhibit
this behaviour; it compiles
the above code to:

   ldr r2, .L6
   ldr r0, .L6+4
   cmp r2,r0
   ldr r1, .L6
   movcs pc,lr

.L4:
   ldr r2,[r2],#4
   cmp r2, r0
   str r3,[r1],#4
   bcc .L4
   mov pc,lr

While not perfect either, this also only has 4 instructions in the loop.


-- 
           Summary: ARM optimizer produces severely suboptimal code
           Product: gcc
           Version: 4.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: Eric dot Doenges at betty-tv dot com
  GCC host triplet: powerpc-apple-darwin8.5.0
GCC target triplet: arm-elf-unknown


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27016


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug middle-end/27016] ARM optimizer produces severely suboptimal code
  2006-04-04  8:12 [Bug c/27016] New: ARM optimizer produces severely suboptimal code Eric dot Doenges at betty-tv dot com
@ 2006-04-04  8:16 ` pinskia at gcc dot gnu dot org
  2006-04-04  9:14 ` Eric dot Doenges at betty-tv dot com
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2006-04-04  8:16 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from pinskia at gcc dot gnu dot org  2006-04-04 08:16 -------
This code is undefined:
  len = ((unsigned int)&_ram_erase_sector_end 
         - (unsigned int)&_ram_erase_sector_start) / sizeof(unsigned int);

That is obviously undefined as taking the difference between two pointers which
are not in the same array is undefined code.

Even the comparision:
p_rom < p_end;
is undefined.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27016


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug middle-end/27016] ARM optimizer produces severely suboptimal code
  2006-04-04  8:12 [Bug c/27016] New: ARM optimizer produces severely suboptimal code Eric dot Doenges at betty-tv dot com
  2006-04-04  8:16 ` [Bug middle-end/27016] " pinskia at gcc dot gnu dot org
@ 2006-04-04  9:14 ` Eric dot Doenges at betty-tv dot com
  2009-06-30 11:35 ` steven at gcc dot gnu dot org
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Eric dot Doenges at betty-tv dot com @ 2006-04-04  9:14 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from Eric dot Doenges at betty-tv dot com  2006-04-04 09:13 -------
(In reply to comment #1)
> This code is undefined:
>   len = ((unsigned int)&_ram_erase_sector_end 
>          - (unsigned int)&_ram_erase_sector_start) / sizeof(unsigned int);
> 
> That is obviously undefined as taking the difference between two pointers which
> are not in the same array is undefined code.
> 
> Even the comparision:
> p_rom < p_end;
> is undefined.
> 
In the code I took this snippet from, _ram_erase_sector_start and
_ram_erase_sector_end are symbols
generated by the linker at the start and the end of a special segment which I
need to copy to ram,
so I would argue that these pointers do in fact refer to the same "array" (in
this case, the "array" is the
entire flash memory). 

However, none of this should affect the decision to use (or not to use) the
post-indexed addressing
mode. If I replace the for loop with a for (len = 100; len > 0; --len), the
quality of the generated code
actually degrades even further:

        ldr     r2, .L7
        ldr     r1, .L7+4
        @ lr needed for prologue
.L2:
        ldr     r3, [r1, #-4]
        str     r3, [r2, #-4]
        ldr     r3, .L7+8
        add     r2, r2, #4
        cmp     r2, r3
        add     r1, r1, #4
        bne     .L2
        bx      lr

While I thinks it's nifty that gcc recognizes that it doesn't need to keep the
len variable, but instead
uses p_ram to determine when the loop is finished, I also think it's pretty
brain-dead that it won't
use post-indexed addressing for either the ldr or str in the loop. And why it
thinks it needs to load
the constant end address to compare against every time inside the loop instead
of once into a scratch
register outside the loop is anyone's guess.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27016


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug middle-end/27016] ARM optimizer produces severely suboptimal code
  2006-04-04  8:12 [Bug c/27016] New: ARM optimizer produces severely suboptimal code Eric dot Doenges at betty-tv dot com
  2006-04-04  8:16 ` [Bug middle-end/27016] " pinskia at gcc dot gnu dot org
  2006-04-04  9:14 ` Eric dot Doenges at betty-tv dot com
@ 2009-06-30 11:35 ` steven at gcc dot gnu dot org
  2009-06-30 13:21 ` steven at gcc dot gnu dot org
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: steven at gcc dot gnu dot org @ 2009-06-30 11:35 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from steven at gcc dot gnu dot org  2009-06-30 11:35 -------
For this test case:

unsigned int code_in_ram[100];

void testme(void)
{
  unsigned int *p_rom, *p_ram, *p_end, len;

  extern unsigned int _ram_erase_sector_start;
  extern unsigned int _ram_erase_sector_end;

  p_ram = code_in_ram;
  p_rom = &_ram_erase_sector_start;
  len = ((unsigned int)&_ram_erase_sector_end
         - (unsigned int)&_ram_erase_sector_start) / sizeof(unsigned int);

  for (len = 100; len > 0; --len)
    {
      *p_ram++ = *p_rom++;
    }
}


I get the following code with a 4.5.0 checkout from a few days ago (see
.ident):

        .cpu arm7tdmi
        .fpu softvfp
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .eabi_attribute 26, 1
        .eabi_attribute 30, 4
        .eabi_attribute 18, 4
        .file   "t.c"
        .text
        .align  2
        .global testme
        .type   testme, %function
testme:
        @ Function supports interworking.
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        ldr     r1, .L5
        ldr     r2, .L5+4
        mov     r3, #0
.L2:
        ldr     r0, [r2, r3]
        str     r0, [r1, r3]
        add     r3, r3, #4
        cmp     r3, #400
        bne     .L2
        bx      lr
.L6:
        .align  2
.L5:
        .word   code_in_ram
        .word   _ram_erase_sector_start
        .size   testme, .-testme
        .comm   code_in_ram,400,4
        .ident  "GCC: (GNU) 4.5.0 20090626 (experimental) [trunk revision
148960]"


This looks marginally better than the code of comment #2.



With the original test case of comment #0 I get the following code:

        .cpu arm7tdmi
        .fpu softvfp
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .eabi_attribute 26, 1
        .eabi_attribute 30, 4
        .eabi_attribute 18, 4
        .file   "t.c"
        .text
        .align  2
        .global testme
        .type   testme, %function
testme:
        @ Function supports interworking.
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        ldr     r2, .L5
        ldr     r3, .L5+4
        ldr     r1, .L5+8
        b       .L2
.L3:
        ldr     r0, [r3], #4
        str     r0, [r2, #-4]
.L2:
        cmp     r3, r1
        add     r2, r2, #4
        bcc     .L3
        bx      lr
.L6:
        .align  2
.L5:
        .word   code_in_ram
        .word   _ram_erase_sector_start
        .word   _ram_erase_sector_end
        .size   testme, .-testme
        .comm   code_in_ram,400,4
        .ident  "GCC: (GNU) 4.5.0 20090626 (experimental) [trunk revision
148960]"


The "ldr     r3, .L6+8" line of comment #0 is now hoisted from the loop into r1
in "ldr     r1, .L5+8".  GCC still does not use a post-increment for the store
in the loop (see use of r2), even though it uses a post-increment for the load.

So, bug still is there.  Might be a dup of one of those many other bugs about
poor use of auto-increment instructions by GCC... ;-)


-- 

steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2009-06-30 11:35:26
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27016


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug middle-end/27016] ARM optimizer produces severely suboptimal code
  2006-04-04  8:12 [Bug c/27016] New: ARM optimizer produces severely suboptimal code Eric dot Doenges at betty-tv dot com
                   ` (2 preceding siblings ...)
  2009-06-30 11:35 ` steven at gcc dot gnu dot org
@ 2009-06-30 13:21 ` steven at gcc dot gnu dot org
  2009-06-30 13:27 ` steven at gcc dot gnu dot org
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: steven at gcc dot gnu dot org @ 2009-06-30 13:21 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from steven at gcc dot gnu dot org  2009-06-30 13:21 -------
The auto-inc-dec pass fails because the store and the reg increment are not in
the same basic block.  The dump of the pass before auto-inc-dec (reginfo) looks
like this:

;; Function testme (testme)

   74 NOTE_INSN_BASIC_BLOCK
   72 NOTE_INSN_FUNCTION_BEG
   76 r205:SI=`code_in_ram'
   73 r203:SI=`_ram_erase_sector_start'
   87 r208:SI=`_ram_erase_sector_end'
L86:
   79 NOTE_INSN_BASIC_BLOCK
   80 r206:SI=[r203:SI]
   81 [r205:SI-0x4]=r206:SI
      REG_DEAD: r206:SI
   82 r203:SI=r203:SI+0x4
L83:
   84 NOTE_INSN_BASIC_BLOCK
   85 r205:SI=r205:SI+0x4
   88 cc:CC=cmp(r203:SI,r208:SI)
      REG_EQUAL: cmp(r203:SI,`_ram_erase_sector_end')
   89 pc={(ltu(cc:CC,0x0))?L86:pc}
      REG_DEAD: cc:CC
      REG_BR_PROB: 0x238c
   92 NOTE_INSN_BASIC_BLOCK


Then auto-inc-dec comes along and notices the opportunity to merge the load and
increment in insns 80 and 82:

starting bb 3
   82 r203:SI=r203:SI+0x4
   81 [r205:SI-0x4]=r206:SI
      REG_DEAD: r206:SI
   81 [r205:SI-0x4]=r206:SI
      REG_DEAD: r206:SI
found mem(81) *(r[205]+-4)
   80 r206:SI=[r203:SI]
   80 r206:SI=[r203:SI]
found mem(80) *(r[203]+0)
   82 r203:SI=r203:SI+0x4
found post inc(82) r[203]+=4
trying SIMPLE_POST_INC
rescanning insn with uid = 80.
deleting insn with uid = 80.
deleting insn with uid = 82.
****success    80 r206:SI=[r203:SI++]
      REG_INC: r203:SI


Merging the store and increment of insns 81 and 85 is never tried because the
insns are not in the same basic block.

Bernd Schmidt has patches that probably address this case of insns in different
basic blocks.  I will give his patches a try to see if it helps for ARM (the
patches were written for Blackfin).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27016


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug middle-end/27016] ARM optimizer produces severely suboptimal code
  2006-04-04  8:12 [Bug c/27016] New: ARM optimizer produces severely suboptimal code Eric dot Doenges at betty-tv dot com
                   ` (3 preceding siblings ...)
  2009-06-30 13:21 ` steven at gcc dot gnu dot org
@ 2009-06-30 13:27 ` steven at gcc dot gnu dot org
  2010-01-02  1:06 ` steven at gcc dot gnu dot org
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: steven at gcc dot gnu dot org @ 2009-06-30 13:27 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from steven at gcc dot gnu dot org  2009-06-30 13:27 -------
Compiling with "./cc1 -Os t.c -fno-ivopts" I get the following code:

        .global testme
        .type   testme, %function
testme:
        @ Function supports interworking.
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        ldr     r2, .L5
        ldr     r3, .L5+4
        ldr     r1, .L5+8
        b       .L2
.L3:
        ldr     r0, [r3], #4
        str     r0, [r2], #4
.L2:
        cmp     r3, r1
        bcc     .L3
        bx      lr


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27016


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug middle-end/27016] ARM optimizer produces severely suboptimal code
  2006-04-04  8:12 [Bug c/27016] New: ARM optimizer produces severely suboptimal code Eric dot Doenges at betty-tv dot com
                   ` (4 preceding siblings ...)
  2009-06-30 13:27 ` steven at gcc dot gnu dot org
@ 2010-01-02  1:06 ` steven at gcc dot gnu dot org
  2010-02-12 22:52 ` [Bug target/27016] [4.3/4.4/4.5 Regression] " steven at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: steven at gcc dot gnu dot org @ 2010-01-02  1:06 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from steven at gcc dot gnu dot org  2010-01-02 01:05 -------
Comment #3 and comment #5 still true with "GCC: (GNU) 4.5.0 20090808
(experimental) [trunk revision 150579]" -> reconfirmed.


-- 

steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2009-06-30 11:35:26         |2010-01-02 01:05:52
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27016


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/27016] [4.3/4.4/4.5 Regression] ARM optimizer produces severely suboptimal code
  2006-04-04  8:12 [Bug c/27016] New: ARM optimizer produces severely suboptimal code Eric dot Doenges at betty-tv dot com
                   ` (5 preceding siblings ...)
  2010-01-02  1:06 ` steven at gcc dot gnu dot org
@ 2010-02-12 22:52 ` steven at gcc dot gnu dot org
  2010-02-16 10:27 ` rguenth at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: steven at gcc dot gnu dot org @ 2010-02-12 22:52 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from steven at gcc dot gnu dot org  2010-02-12 22:52 -------
If arm_arm_address_cost is "fixed" to return 1 unconditionally, the expected
code of comment #5 comes out at -Os, with the bonus of one less branch:

testme:
        ldr     r2, .L4
        ldr     r1, .L4+4
        sub     r3, r2, #400
.L2:
        ldr     r0, [r1], #4
        str     r0, [r3], #4
        cmp     r3, r2
        bne     .L2
        bx      lr

This is actually a regression from GCC 3.4.6.


-- 

steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rearnsha at gcc dot gnu dot
                   |                            |org
          Component|middle-end                  |target
 GCC target triplet|arm-elf-unknown             |arm-elf, arm-eabi
      Known to fail|                            |4.4.0 4.5.0
      Known to work|                            |3.4.6
            Summary|ARM optimizer produces      |[4.3/4.4/4.5 Regression] ARM
                   |severely suboptimal code    |optimizer produces severely
                   |                            |suboptimal code


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27016


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/27016] [4.3/4.4/4.5 Regression] ARM optimizer produces severely suboptimal code
  2006-04-04  8:12 [Bug c/27016] New: ARM optimizer produces severely suboptimal code Eric dot Doenges at betty-tv dot com
                   ` (6 preceding siblings ...)
  2010-02-12 22:52 ` [Bug target/27016] [4.3/4.4/4.5 Regression] " steven at gcc dot gnu dot org
@ 2010-02-16 10:27 ` rguenth at gcc dot gnu dot org
  2010-02-17 17:09 ` mmitchel at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-02-16 10:27 UTC (permalink / raw)
  To: gcc-bugs



-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.3.5


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27016


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/27016] [4.3/4.4/4.5 Regression] ARM optimizer produces severely suboptimal code
  2006-04-04  8:12 [Bug c/27016] New: ARM optimizer produces severely suboptimal code Eric dot Doenges at betty-tv dot com
                   ` (7 preceding siblings ...)
  2010-02-16 10:27 ` rguenth at gcc dot gnu dot org
@ 2010-02-17 17:09 ` mmitchel at gcc dot gnu dot org
  2010-02-21 14:32 ` steven at gcc dot gnu dot org
  2010-02-26  1:23 ` law at redhat dot com
  10 siblings, 0 replies; 12+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2010-02-17 17:09 UTC (permalink / raw)
  To: gcc-bugs



-- 

mmitchel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27016


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/27016] [4.3/4.4/4.5 Regression] ARM optimizer produces severely suboptimal code
  2006-04-04  8:12 [Bug c/27016] New: ARM optimizer produces severely suboptimal code Eric dot Doenges at betty-tv dot com
                   ` (8 preceding siblings ...)
  2010-02-17 17:09 ` mmitchel at gcc dot gnu dot org
@ 2010-02-21 14:32 ` steven at gcc dot gnu dot org
  2010-02-26  1:23 ` law at redhat dot com
  10 siblings, 0 replies; 12+ messages in thread
From: steven at gcc dot gnu dot org @ 2010-02-21 14:32 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from steven at gcc dot gnu dot org  2010-02-21 14:32 -------
I have played with CSiBE with this patch:

-------------------------------------------------- 8< -------------
--- ../../gcc/gcc/config/arm/arm.c      2010-02-12 21:45:29.000000000 +0100
+++ ../../combined/gcc/config/arm/arm.c 2010-02-21 14:54:23.000000000 +0100
@@ -7448,7 +7448,7 @@
 arm_arm_address_cost (rtx x)
 {
   enum rtx_code c  = GET_CODE (x);
-
+if (1) return 1;
   if (c == PRE_INC || c == PRE_DEC || c == POST_INC || c == POST_DEC)
     return 0;
   if (c == MEM || c == LABEL_REF || c == SYMBOL_REF)
-------------------------------------------------- 8< -------------

I tried CSiBE on arm-eabi with "-Os -mabi=aapcs-linux" (latter flag is required
for mpeg2dec-0.3.1), but I only looked at size because I have no ARM board to
test speed on.

Overall the patch results in smaller code:

   text    data     bss     dec     hex filename
3226845  316581  561788 4105214  3ea3fe (TOTALS)            unpatched
3226269  316581  561788 4104638  3ea1be (TOTALS)            patched

For the individual files the results vary. A more detailed investigation is
necessary to figure out when a patch like the above is a win, and when it is
not.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27016


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/27016] [4.3/4.4/4.5 Regression] ARM optimizer produces severely suboptimal code
  2006-04-04  8:12 [Bug c/27016] New: ARM optimizer produces severely suboptimal code Eric dot Doenges at betty-tv dot com
                   ` (9 preceding siblings ...)
  2010-02-21 14:32 ` steven at gcc dot gnu dot org
@ 2010-02-26  1:23 ` law at redhat dot com
  10 siblings, 0 replies; 12+ messages in thread
From: law at redhat dot com @ 2010-02-26  1:23 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from law at redhat dot com  2010-02-26 01:23 -------
The trunk now generates:

        ldr     r3, .L4
        ldr     r2, .L4+4
        ldr     r1, .L4+8
        b       .L2
.L3:
        ldr     r0, [r3], #4
        str     r0, [r2], #4
.L2:
        cmp     r3, r1
        bcc     .L3
        bx      lr
.L5:

This is probably a direct result of Bernd's work to improve auto-inc addressing
modes, particularly the ability to discover them when the memref and increment
are in different basic blocks.


-- 

law at redhat dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27016


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2010-02-26  1:23 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-04-04  8:12 [Bug c/27016] New: ARM optimizer produces severely suboptimal code Eric dot Doenges at betty-tv dot com
2006-04-04  8:16 ` [Bug middle-end/27016] " pinskia at gcc dot gnu dot org
2006-04-04  9:14 ` Eric dot Doenges at betty-tv dot com
2009-06-30 11:35 ` steven at gcc dot gnu dot org
2009-06-30 13:21 ` steven at gcc dot gnu dot org
2009-06-30 13:27 ` steven at gcc dot gnu dot org
2010-01-02  1:06 ` steven at gcc dot gnu dot org
2010-02-12 22:52 ` [Bug target/27016] [4.3/4.4/4.5 Regression] " steven at gcc dot gnu dot org
2010-02-16 10:27 ` rguenth at gcc dot gnu dot org
2010-02-17 17:09 ` mmitchel at gcc dot gnu dot org
2010-02-21 14:32 ` steven at gcc dot gnu dot org
2010-02-26  1:23 ` law at redhat dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).