public inbox for gcc-prs@sourceware.org
* optimization/3977: arm peephole for loading two consecutive memory locations generates suboptimal code (on arm7tdmi)
@ 2001-08-09 12:46 segher
From: segher @ 2001-08-09 12:46 UTC (permalink / raw)
To: gcc-gnats
>Number: 3977
>Category: optimization
>Synopsis: arm peephole for loading two consecutive memory locations generates suboptimal code (on arm7tdmi)
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: unassigned
>State: open
>Class: pessimizes-code
>Submitter-Id: net
>Arrival-Date: Thu Aug 09 12:46:01 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator: segher@chello.nl
>Release: 3.0
>Organization:
>Environment:
>Description:
I often see generated code like

    add   temp, pointer, #offset
    ldmia temp, {regA, regB}

(after which temp is dead), which is slower than just

    ldr regA, [pointer, #offset]
    ldr regB, [pointer, #offset+4]

and wastes a register as well.
>How-To-Repeat:
a lot of code will do this
>Fix:
change the peephole, i think
>Release-Note:
>Audit-Trail:
>Unformatted:
* Re: optimization/3977: arm peephole for loading two consecutive memory locations generates suboptimal code (on arm7tdmi)
@ 2001-08-10 3:20 rearnsha
From: rearnsha @ 2001-08-10 3:20 UTC (permalink / raw)
To: gcc-bugs, gcc-prs, nobody, rearnsha, segher
Synopsis: arm peephole for loading two consecutive memory locations generates suboptimal code (on arm7tdmi)
Responsible-Changed-From-To: unassigned->rearnsha
Responsible-Changed-By: rearnsha
Responsible-Changed-When: Fri Aug 10 03:20:57 2001
Responsible-Changed-Why:
Mine
State-Changed-From-To: open->closed
State-Changed-By: rearnsha
State-Changed-When: Fri Aug 10 03:20:57 2001
State-Changed-Why:
Your analysis is incorrect (at least for the ARM7TDMI).
An LDR instruction takes 3 cycles (2 N-cycles, i.e. non-sequential
memory accesses, and 1 I-cycle, i.e. idle).
An ADD instruction takes 1 cycle (normally an S-cycle, i.e. sequential).
A k-word LDM instruction takes 2+k cycles, of which 2 are N-cycles,
k-1 are S-cycles and 1 is an I-cycle, giving 2N+1S+1I for the 2-word
example in this case.
So for the code generated we have 1S + (2N + 1I + 1S) = 2N+2S+1I,
and for the two LDR instructions we have 2x(2N+1I) = 4N+2I.
On most memory systems I-cycles and S-cycles have the same
duration, but N-cycles are typically twice as long as S-cycles,
so the LDM sequence in fact executes more quickly.
It is also incorrect that this requires an additional scratch register
-- we can always use one of the registers we are about to load
as the scratch.
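
The cycle arithmetic above can be checked with a small sketch. The per-instruction cycle mixes are the ones stated in this reply. The relative durations (N = 2, S = 1, I = 1) encode the "N is typically twice S" rule of thumb and are an assumption, not measured values. All function names are hypothetical and exist only for illustration.

```python
# Relative cycle durations. ASSUMPTION: I and S cycles take the same time,
# N cycles take twice as long (the rule of thumb from the reply).
CYCLE_COST = {"N": 2, "S": 1, "I": 1}


def ldr():
    """Cycle mix of one LDR as stated in the reply: 2N + 1I."""
    return {"N": 2, "I": 1}


def add():
    """Cycle mix of one ADD as stated in the reply: 1S."""
    return {"S": 1}


def ldm(k):
    """Cycle mix of a k-word LDM as stated in the reply: 2N + (k-1)S + 1I."""
    return {"N": 2, "S": k - 1, "I": 1}


def combine(*mixes):
    """Sum several cycle mixes into one {cycle_type: count} dictionary."""
    total = {}
    for mix in mixes:
        for kind, count in mix.items():
            total[kind] = total.get(kind, 0) + count
    return total


def cost(mix):
    """Weighted duration of a cycle mix, in units of one S-cycle."""
    return sum(CYCLE_COST[kind] * count for kind, count in mix.items())


add_ldm = combine(add(), ldm(2))   # add temp, ...; ldmia temp, {regA, regB}
two_ldr = combine(ldr(), ldr())    # ldr regA, ...; ldr regB, ...

print("add + ldm:", add_ldm, "cost", cost(add_ldm))
print("ldr + ldr:", two_ldr, "cost", cost(two_ldr))
```

Under these assumed weights the ADD/LDM pair costs 7 S-cycle units against 10 for the two LDRs. Changing CYCLE_COST shows how the comparison shifts on a memory system with different N/S ratios.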
http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view&pr=3977&database=gcc