public inbox for gcc-prs@sourceware.org
* optimization/3977: arm peephole for loading two consecutive memory locations generates suboptimal code (on arm7tdmi)
@ 2001-08-09 12:46 segher
From: segher @ 2001-08-09 12:46 UTC (permalink / raw)
  To: gcc-gnats

>Number:         3977
>Category:       optimization
>Synopsis:       arm peephole for loading two consecutive memory locations generates suboptimal code (on arm7tdmi)
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    unassigned
>State:          open
>Class:          pessimizes-code
>Submitter-Id:   net
>Arrival-Date:   Thu Aug 09 12:46:01 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator:     segher@chello.nl
>Release:        3.0
>Organization:
>Environment:

>Description:
I often see generated code like

add   temp, pointer, #offset
ldmia temp, {regA, regB}

(after which temp is dead)

which is slower than just

ldr regA, [pointer, #offset]
ldr regB, [pointer, #offset+4]

and wastes a register as well
>How-To-Repeat:
A lot of code will do this.
>Fix:
Change the peephole, I think.
>Release-Note:
>Audit-Trail:
>Unformatted:



* Re: optimization/3977: arm peephole for loading two consecutive memory locations generates suboptimal code (on arm7tdmi)
@ 2001-08-10  3:20 rearnsha
From: rearnsha @ 2001-08-10  3:20 UTC (permalink / raw)
  To: gcc-bugs, gcc-prs, nobody, rearnsha, segher

Synopsis: arm peephole for loading two consecutive memory locations generates suboptimal code (on arm7tdmi)

Responsible-Changed-From-To: unassigned->rearnsha
Responsible-Changed-By: rearnsha
Responsible-Changed-When: Fri Aug 10 03:20:57 2001
Responsible-Changed-Why:
    Mine
State-Changed-From-To: open->closed
State-Changed-By: rearnsha
State-Changed-When: Fri Aug 10 03:20:57 2001
State-Changed-Why:
    Your analysis is incorrect (at least for the ARM7TDMI).
    An LDR instruction takes 3 cycles (2 N-cycles, i.e. non-sequential
    memory accesses, and 1 I-cycle, i.e. idle).
    An ADD instruction takes 1 cycle (normally an S-cycle, i.e. sequential).
    A k-word LDM instruction takes 2+k cycles, of which 2 are N-cycles,
    k-1 are S-cycles and 1 is an I-cycle, giving 2N+1S+1I for the 2-word
    example in this case.
    
    So for the code generated we have 1S + (2N + 1I + 1S) = 2N+2S+I
    and for the two LDR instructions we have 2x(2N+I) = 4N+2I
    
    On most memory systems I-cycles and S-cycles have the same
    duration, but N-cycles are typically twice as long as S-cycles,
    so you can easily see that the LDM sequence will in fact execute
    more quickly.
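
The arithmetic above can be checked with a small sketch. It only encodes the per-instruction cycle mixes quoted in this reply; the helper names (`ldm`, `total`, `cost`) and the default weights (N twice as long as S, I equal to S) are assumptions made for illustration, not measurements of any particular memory system.

```python
# Cycle mix per instruction, as stated in the reply (ARM7TDMI).
# N = non-sequential cycle, S = sequential cycle, I = idle cycle.
ADD = {"N": 0, "S": 1, "I": 0}
LDR = {"N": 2, "S": 0, "I": 1}

def ldm(k):
    """Cycle mix for a k-word LDM as given in the reply: 2N + (k-1)S + 1I."""
    return {"N": 2, "S": k - 1, "I": 1}

def total(instrs):
    """Sum the cycle mixes of a sequence of instructions."""
    return {kind: sum(i[kind] for i in instrs) for kind in "NSI"}

def cost(mix, n=2, s=1, i=1):
    """Weighted duration. Defaults assume N = 2 x S and I = S (hypothetical)."""
    return mix["N"] * n + mix["S"] * s + mix["I"] * i

add_ldm = total([ADD, ldm(2)])   # add + 2-word ldmia
two_ldr = total([LDR, LDR])      # two separate ldr instructions

print(add_ldm, cost(add_ldm))
print(two_ldr, cost(two_ldr))
```

Under these weights the ADD+LDM sequence costs 7 units against 10 for the two LDRs. Even with all three cycle types weighted equally (`cost(mix, 1, 1, 1)`), it is 5 cycles against 6.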
    
    It is also incorrect that this requires an additional scratch register
    -- we can always use one of the registers we are about to load
    as the scratch.
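
The scratch-register point can be illustrated with a toy model of the register file. This is a sketch, not an ARM simulator: `add_imm` and `ldmia` are hypothetical helpers, the addresses and values are made up, and it assumes the non-writeback form of LDM with the base register included in the load list, with the base address read before any register is loaded. It also ignores the real constraint that LDM loads registers in ascending register-number order.

```python
def add_imm(regs, rd, rn, imm):
    """Model of: add rd, rn, #imm"""
    regs[rd] = regs[rn] + imm

def ldmia(regs, mem, base, reglist):
    """Model of: ldmia base, {reglist} (no writeback)."""
    addr = regs[base]                  # base address is captured first
    for n, r in enumerate(reglist):
        regs[r] = mem[addr + 4 * n]    # consecutive words, 4 bytes apart

# Made-up memory contents at pointer+8 and pointer+12.
mem = {0x1008: 111, 0x100C: 222}
regs = {"pointer": 0x1000, "regA": 0, "regB": 0}

# regA doubles as the scratch: it holds the address, then is overwritten
# by the first loaded word.
add_imm(regs, "regA", "pointer", 8)
ldmia(regs, mem, "regA", ["regA", "regB"])

print(regs)
```

In this model no third register is touched: `pointer` is unchanged and `regA`/`regB` end up holding the two loaded words.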

http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view&pr=3977&database=gcc


