public inbox for gcc-bugs@sourceware.org
* [Bug target/102135] New: (ARM Cortex-M3 and newer) changing operation order may reduce number of instructions needed
@ 2021-08-30 20:24 jankowski938 at gmail dot com
From: jankowski938 at gmail dot com @ 2021-08-30 20:24 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102135
Bug ID: 102135
Summary: (ARM Cortex-M3 and newer) changing operation order
may reduce number of instructions needed
Product: gcc
Version: 10.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jankowski938 at gmail dot com
Target Milestone: ---
uint64_t foo64(const uint8_t *rData1)
{
    uint64_t buffer;
    buffer = (((uint64_t)rData1[7]) << 56) | (((uint64_t)rData1[6]) << 48) |
             (((uint64_t)rData1[5]) << 40) | (((uint64_t)rData1[4]) << 32) |
             (((uint64_t)rData1[3]) << 24) | (((uint64_t)rData1[2]) << 16) |
             (((uint64_t)rData1[1]) << 8) | rData1[0];
    return buffer;
}
----
foo64:
        mov     r3, r0
        ldr     r0, [r0]        @ unaligned
        ldr     r1, [r3, #4]    @ unaligned
        bx      lr
Only 3 instructions are needed if the high word is loaded first:
        ldr     r1, [r0, #4]    @ unaligned
        ldr     r0, [r0]        @ unaligned
        bx      lr
Options:
-O3 -mthumb -mcpu=cortex-m4
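For anyone wanting to reproduce this, a self-contained version of the testcase might look like the sketch below. The function body is the one from the report. The header includes and the foo64_selfcheck helper (a hypothetical name, not part of the report) are additions that only confirm the expression is a little-endian 64-bit load, which is why two 32-bit loads are enough on a little-endian target.

```c
#include <assert.h>
#include <stdint.h>

/* Testcase from the report: assemble a 64-bit value from 8 bytes,
   least significant byte first. */
uint64_t foo64(const uint8_t *rData1)
{
    uint64_t buffer;
    buffer = (((uint64_t)rData1[7]) << 56) | (((uint64_t)rData1[6]) << 48) |
             (((uint64_t)rData1[5]) << 40) | (((uint64_t)rData1[4]) << 32) |
             (((uint64_t)rData1[3]) << 24) | (((uint64_t)rData1[2]) << 16) |
             (((uint64_t)rData1[1]) << 8) | rData1[0];
    return buffer;
}

/* Hypothetical helper (not in the report): checks that foo64 behaves
   like a little-endian load of the sample bytes. */
void foo64_selfcheck(void)
{
    static const uint8_t bytes[8] = {
        0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08
    };
    assert(foo64(bytes) == 0x0807060504030201ULL);
}
```

Assuming an arm-none-eabi toolchain, compiling this file with -S and the options above should show the instruction sequence under discussion. The exact output may differ between GCC versions.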
* [Bug target/102135] (ARM Cortex-M3 and newer) changing operation order may reduce number of instructions needed
From: rearnsha at gcc dot gnu.org @ 2021-08-31 13:27 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102135
--- Comment #1 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
A small change to the testcase shows that this is highly dependent on the
registers that the calling convention fixes for arguments and return values.
uint64_t foo64(int dummy, const uint8_t *rData1)
{
    uint64_t buffer;
    buffer = (((uint64_t)rData1[7]) << 56) | (((uint64_t)rData1[6]) << 48) |
             (((uint64_t)rData1[5]) << 40) | (((uint64_t)rData1[4]) << 32) |
             (((uint64_t)rData1[3]) << 24) | (((uint64_t)rData1[2]) << 16) |
             (((uint64_t)rData1[1]) << 8) | rData1[0];
    return buffer;
}
Register allocation does not reorder code to reduce such conflicts, so
this is not easy to fix.
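The conflict can be illustrated with a toy model. Under the 32-bit Arm procedure call standard, the first argument arrives in r0, the second in r1, and a 64-bit result is returned in r0 (low word) and r1 (high word) on a little-endian target. The sketch below is not GCC output and not a real simulator. The register array, the ldr helper and the three sequence functions are all hypothetical names used only to show which load orders overwrite the pointer before it has been fully used.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: four "registers" and a small byte-addressed memory. */
enum { MEM_SIZE = 16 };
#define DATA_ADDR 4u           /* where the 8 data bytes live in mem[] */
#define BAD_LOAD  0xDEADBEEFu  /* marker for a load from a bogus address */

static const uint8_t mem[MEM_SIZE] = {
    0, 0, 0, 0,
    0x01, 0x02, 0x03, 0x04,    /* low word  */
    0x05, 0x06, 0x07, 0x08,    /* high word */
    0, 0, 0, 0
};

/* Little-endian 32-bit load with a bounds check. */
static uint32_t load32(uint32_t addr)
{
    if (addr > (uint32_t)(MEM_SIZE - 4))
        return BAD_LOAD;
    return (uint32_t)mem[addr] | ((uint32_t)mem[addr + 1] << 8) |
           ((uint32_t)mem[addr + 2] << 16) | ((uint32_t)mem[addr + 3] << 24);
}

/* Models "ldr rd, [rn, #off]". */
static void ldr(uint32_t r[4], int rd, int rn, uint32_t off)
{
    r[rd] = load32(r[rn] + off);
}

/* The 64-bit return value lives in r0 (low) and r1 (high). */
static uint64_t result(const uint32_t r[4])
{
    return (uint64_t)r[0] | ((uint64_t)r[1] << 32);
}

/* Pointer in r0, low word first: the first load destroys the pointer,
   which is why the compiled code needs the extra mov. */
uint64_t base_r0_low_first(void)
{
    uint32_t r[4] = { DATA_ADDR, 0, 0, 0 };
    ldr(r, 0, 0, 0);
    ldr(r, 1, 0, 4);           /* r0 no longer holds the pointer */
    return result(r);
}

/* Pointer in r0, high word first: r0 is overwritten last, no mov needed. */
uint64_t base_r0_high_first(void)
{
    uint32_t r[4] = { DATA_ADDR, 0, 0, 0 };
    ldr(r, 1, 0, 4);
    ldr(r, 0, 0, 0);
    return result(r);
}

/* Pointer in r1 (dummy argument in r0), low word first: r1 is
   overwritten last, so this order has no conflict either. */
uint64_t base_r1_low_first(void)
{
    uint32_t r[4] = { 0, DATA_ADDR, 0, 0 };
    ldr(r, 0, 1, 0);
    ldr(r, 1, 1, 4);
    return result(r);
}
```

In this model only the first sequence returns a wrong value. The other two are correct with two loads, which matches the point that the outcome depends on where the calling convention places the pointer relative to the result registers.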
This problem is also more visible in micro-testcases such as this example.
In real code the register allocator usually has more freedom and can avoid
issues like this. If your programming style is to write functions like this,
you would likely get better code overall by marking these very small
functions as inline, so that they do not incur the call setup and call/return
overhead, which can be significant once you take into account the number of
registers that must be saved across a function call.
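As a minimal sketch of that suggestion, the helper can be declared static inline so the compiler is free to expand it at each call site. The names load_le64 and sum_le64 are hypothetical and chosen for illustration. Whether inlining actually happens still depends on the optimization level and the compiler's heuristics.

```c
#include <assert.h>
#include <stdint.h>

/* Same expression as the testcase, marked static inline so that a call
   can be replaced by the loads themselves. */
static inline uint64_t load_le64(const uint8_t *p)
{
    return (((uint64_t)p[7]) << 56) | (((uint64_t)p[6]) << 48) |
           (((uint64_t)p[5]) << 40) | (((uint64_t)p[4]) << 32) |
           (((uint64_t)p[3]) << 24) | (((uint64_t)p[2]) << 16) |
           (((uint64_t)p[1]) << 8) | p[0];
}

/* Hypothetical caller: sums `count` consecutive 64-bit little-endian
   values. Once load_le64 is inlined, the register allocator is no longer
   tied to the argument and return registers of a separate function. */
uint64_t sum_le64(const uint8_t *buf, unsigned count)
{
    uint64_t total = 0;
    for (unsigned i = 0; i < count; i++)
        total += load_le64(buf + 8u * i);
    return total;
}
```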