public inbox for gcc-bugs@sourceware.org
* [Bug target/53938] New: ARM target generates sub-optimal code (extra instructions) on load from memory
@ 2012-07-12 10:58 gregpsmith at live dot co.uk
  2012-07-12 11:27 ` [Bug target/53938] " rguenth at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: gregpsmith at live dot co.uk @ 2012-07-12 10:58 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53938

             Bug #: 53938
           Summary: ARM target generates sub-optimal code (extra
                    instructions) on load from memory
    Classification: Unclassified
           Product: gcc
           Version: 4.6.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: gregpsmith@live.co.uk


Created attachment 27781
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27781
Example C source code

We are targeting an embedded device and we do a lot of work accessing an FPGA
(but this applies just as well to memory access). It has annoyed me for years
that GCC emits unnecessary code, wasting memory and cycles, when reading 8- and
16-bit values.

The attached source file shows opportunities to generate better code. When
compiled with:

gcc -c -O3 -mcpu=arm946e-s codegen.c

It compiles to (I have added comments):

<DeviceAccess>
    mov   r2, #0xE0000000  // base address of the device
    ldrb  r1, [r2]         // load an unsigned byte, 0 extend
    ldrb  r12, [r2]        // load signed byte - WHY NOT ldrsb?
    and   r1, r1, #0xFF    // WHAT IS THIS FOR
    ldrh  r3, [r2]         // load unsigned short
    tst   r1, #0x80        // if (i & 0x80)
    movne r1, #0           //     i = 0
    lsl   r12, r12, #24    // sign extend j (but could be avoided)
    tst   r3, #0x80        // if (k & 0x80)
    ldrh  r0, [r2]         // load signed short - WHY NOT ldrsh?
    movne r3, #0           //     k = 0
    add   r1, r1, r12, asr #24 // add sign extended
    add   r3, r1, r3
    lsl   r0, r0, #16      // sign extend l
    add   r0, r3, r0, asr #16
    bx lr
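The attachment itself is not reproduced in this thread. As a rough guide only, the listing above is consistent with source of the following shape. This is a hypothetical reconstruction inferred from the comments in the listing, not the attached file: the names union io and uch come from the RTL dump later in the thread, the other member names are invented, and the device is passed as a pointer so that the sketch also compiles and runs on a host.

```c
#include <stdint.h>

/* Hypothetical register layout. Only "union io" and ".uch" appear in the
   RTL dump in this thread; the other member names are made up. */
union io {
    uint8_t  uch;   /* read with LDRB */
    int8_t   sch;   /* ideally read with LDRSB */
    uint16_t ush;   /* read with LDRH */
    int16_t  ssh;   /* ideally read with LDRSH */
};

/* On the ARM946E-S target this would be called as
   DeviceAccess((volatile union io *)0xE0000000). Every read goes through a
   volatile lvalue, so the compiler must perform four separate loads. */
int DeviceAccess(volatile union io *dev)
{
    int i = dev->uch;   /* zero-extended byte */
    int j = dev->sch;   /* sign-extended byte */
    int k = dev->ush;   /* zero-extended halfword */
    int l = dev->ssh;   /* sign-extended halfword */

    if (i & 0x80)
        i = 0;
    if (k & 0x80)
        k = 0;
    return i + j + k + l;
}
```

On a host, the function can be exercised by filling an ordinary union io and passing its address; on the target, the volatile qualifier is what keeps each access as a separate load.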

There are two issues:

1) There is a completely redundant and r1,r1,#0xff. This does not occur when
loading the unsigned short (which is why I included the equivalent code for an
unsigned short, for comparison).
2) There is unnecessary sign extension taking place. ARM has supported signed
loads of 8- and 16-bit values (ldrsb/ldrsh) since ARMv4. Spotting this has to
be opportunistic, as there are offset restrictions: in ARM state these loads
only accept an 8-bit immediate offset (up to +/-255 bytes), compared with 12
bits for ldrb/ldr.
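The second point can be checked on a host: the lsl #24 / asr #24 pair that GCC emits after ldrb yields exactly the value a single ldrsb would have loaded, and likewise lsl #16 / asr #16 after ldrh versus ldrsh. The sketch below models each sequence in C. The function names are illustrative, and the shift-based versions rely on GCC's documented implementation-defined behaviour (conversion to a signed type wraps modulo 2^N, and >> on a negative value is an arithmetic shift).

```c
#include <stdint.h>

/* Models "ldrb" followed by "lsl #24" and "asr #24": the byte arrives
   zero-extended, is shifted to the top, then shifted back arithmetically. */
static int32_t sext8_via_shifts(uint32_t zext_byte)
{
    int32_t shifted = (int32_t)(zext_byte << 24);  /* wraps per GCC */
    return shifted >> 24;                          /* arithmetic shift */
}

/* Models what a single "ldrsb" leaves in the register: the byte read as
   two's complement, computed portably without shifts. */
static int32_t sext8_like_ldrsb(uint8_t byte)
{
    return (int32_t)(byte ^ 0x80u) - 0x80;
}

/* Same pair for halfwords: "ldrh" + "lsl #16"/"asr #16" versus "ldrsh". */
static int32_t sext16_via_shifts(uint32_t zext_half)
{
    int32_t shifted = (int32_t)(zext_half << 16);
    return shifted >> 16;
}

static int32_t sext16_like_ldrsh(uint16_t half)
{
    return (int32_t)(half ^ 0x8000u) - 0x8000;
}
```

Since each pair agrees for every input, the extra lsl per value (the asr is folded into the following add) is pure overhead once a signed load is used.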

Ideally the code would look like:
    mov   r2, #0xE0000000 // base address of the device
    ldrb  r1, [r2]        // load an unsigned byte, 0 extend
    ldrsb r12, [r2]       // load signed byte
    ldrh  r3, [r2]        // load unsigned short
    tst   r1, #0x80       // if (i & 0x80)
    movne r1, #0          //     i = 0
    tst   r3, #0x80       // if (k & 0x80)
    ldrsh r0, [r2]        // load signed short, extend to 32-bits
    movne r3, #0          //     k = 0
    add   r1, r1, r12     // add sign extended
    add   r3, r1, r3
    add   r0, r3, r0
    bx lr



* [Bug target/53938] ARM target generates sub-optimal code (extra instructions) on load from memory
From: rguenth at gcc dot gnu.org @ 2012-07-12 11:27 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53938

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |arm*-*-*
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2012-07-12
     Ever Confirmed|0                           |1

--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-07-12 11:27:07 UTC ---
Can you verify if the situation improves with GCC 4.7 or current development
trunk?



* [Bug target/53938] ARM target generates sub-optimal code (extra instructions) on load from memory
From: pinskia at gcc dot gnu.org @ 2012-07-12 16:10 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53938

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-07-12 16:10:09 UTC ---
I think this is the standard volatile vs combine issue.



* [Bug target/53938] ARM target generates sub-optimal code (extra instructions) on load from memory
From: gregpsmith at live dot co.uk @ 2012-07-12 19:09 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53938

--- Comment #3 from Greg Smith <gregpsmith at live dot co.uk> 2012-07-12 19:09:41 UTC ---
(In reply to comment #1)
> Can you verify if the situation improves with GCC 4.7 or current development
> trunk?

I am an end user of the Rowley CrossWorks system, and they have not moved to
4.7 yet. I am not really set up to conveniently build my own ARM cross compiler
(from Windows), though it is not impossible.

However, I see nothing in the 4.7.0 and 4.7.1 release notes to suggest that any
changes have been made in this area. I recall seeing this type of code
generation for several years...



* [Bug target/53938] ARM target generates sub-optimal code (extra instructions) on load from memory
From: rearnsha at gcc dot gnu.org @ 2013-08-05 21:20 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53938

Richard Earnshaw <rearnsha at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW

--- Comment #4 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
Trunk as of this weekend still generates:

        mov     r3, #-536870912
        ldrb    r1, [r3]        @ zero_extendqisi2
        ldrb    ip, [r3]        @ zero_extendqisi2
        and     r1, r1, #255
        ldrh    r2, [r3]
        tst     r1, #128
        movne   r1, #0
        tst     r2, #128
        movne   r2, #0
        mov     ip, ip, asl #24
        ldrh    r0, [r3]
        add     r1, r1, ip, asr #24
        add     r2, r1, r2
        mov     r0, r0, asl #16
        add     r0, r2, r0, asr #16
        bx      lr

The real problem is that the RTL expansion passes never generate zero- or
sign-extended values directly.  They expect combine to pick this up. 
Unfortunately, combine won't touch a memory access that is volatile.  

What does still surprise me is that we fail to eliminate the zero-extend
operation.  After expand we have:

(insn 8 7 9 (set (reg:SI 126)
        (zero_extend:SI (mem/v:QI (reg/f:SI 124) [0 MEM[(union io *)3758096384B].uch+0 S1 A64]))) test.c:30 -1
     (nil))

(insn 9 8 10 (set (reg:QI 125)
        (subreg:QI (reg:SI 126) 0)) test.c:30 -1
     (nil))

(insn 10 9 0 (set (reg/v:SI 111 [ i ])
        (and:SI (subreg:SI (reg:QI 125) 0)
            (const_int 255 [0xff]))) test.c:30 -1
     (nil))

I would have expected at the very least that some pass would have worked out
that regs 126 and 111 are equivalent.
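To see why regs 126 and 111 must hold the same value, the three insns can be modelled in C. This is only an illustration with hypothetical helper names: insn 9 keeps the low byte of reg 126, and insn 10 widens reg 125 through a paradoxical subreg, whose upper 24 bits are treated here as unknown (passed in as junk) before the AND clears them. Because reg 126 came from a zero_extend, its upper 24 bits are already zero, so the AND reproduces reg 126 exactly.

```c
#include <stdint.h>

/* insn 8: (set (reg:SI 126) (zero_extend:SI (mem/v:QI ...))) */
static uint32_t insn8_zero_extend(uint8_t mem_byte)
{
    return (uint32_t)mem_byte;            /* upper 24 bits are zero */
}

/* insn 9: (set (reg:QI 125) (subreg:QI (reg:SI 126) 0)), the low byte */
static uint8_t insn9_lowpart(uint32_t reg126)
{
    return (uint8_t)reg126;
}

/* insn 10: (set (reg:SI 111) (and:SI (subreg:SI (reg:QI 125) 0) 255)).
   The paradoxical subreg is modelled with arbitrary upper bits ("junk"),
   which the AND then discards. */
static uint32_t insn10_and(uint8_t reg125, uint32_t junk)
{
    uint32_t widened = (junk & 0xFFFFFF00u) | reg125;
    return widened & 0xFFu;
}

/* Runs insns 8-10 in order; the return value is reg 111. */
static uint32_t expand_sequence(uint8_t mem_byte, uint32_t junk)
{
    uint32_t reg126 = insn8_zero_extend(mem_byte);
    uint8_t  reg125 = insn9_lowpart(reg126);
    return insn10_and(reg125, junk);
}
```

Whatever the junk bits are, reg 111 equals reg 126 for every byte value, which is the equivalence one would expect a later RTL pass to discover.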



* [Bug target/53938] ARM target generates sub-optimal code (extra instructions) on load from memory
From: pinskia at gcc dot gnu.org @ 2024-01-16 22:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53938

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsaxvc at gmail dot com

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
*** Bug 113432 has been marked as a duplicate of this bug. ***


* [Bug target/53938] ARM target generates sub-optimal code (extra instructions) on load from memory
From: rsaxvc at gmail dot com @ 2024-04-24 13:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53938

--- Comment #6 from rsaxvc at gmail dot com ---
This also impacts Cortex-M0 and M23 on GCC 13.2.0, just with the newer
extension instructions.

Oddly, loading a volatile u8 or u16 on Cortex-M3/4/7 does not generate extra
zero-extension instructions. But these cores still use a separate ldrb/ldrh
followed by sxtab/sxtah for sign extension instead of LDRSB/LDRSH.

