public inbox for gcc-bugs@sourceware.org
* [Bug c/39819] New: Missed optimisation when setting 4-byte values
@ 2009-04-19 21:09 david dot brown at hesbynett dot no
2009-04-19 22:26 ` [Bug target/39819] [avr] " eric dot weddington at atmel dot com
2009-08-21 19:28 ` eric dot weddington at atmel dot com
0 siblings, 2 replies; 4+ messages in thread
From: david dot brown at hesbynett dot no @ 2009-04-19 21:09 UTC
To: gcc-bugs
avr-gcc misses a number of optimisations when copying 4-byte values or
assigning a single-byte value to 4-byte values. The issue applies to values of
other sizes as well, but since 4-byte values are common (32-bit ints and
floats, for example) it is especially relevant here.
In summary, the compiler tends to produce either a series of direct memory
accesses or a loop of indirect accesses through Z. A better choice would often
be to set up Z as a pointer and then unroll the indirect-access loop.
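At the source level, the suggested shape corresponds roughly to the sketch below: take the address once, then store through a post-incremented pointer four times. The helper name `fill4` is hypothetical and not part of the test case, and nothing guarantees that a particular avr-gcc version maps it to `st Z+`; it only illustrates the intended access pattern.

```c
#include <stdint.h>

/* Hypothetical helper (not from the report): write the same byte to four
 * consecutive locations through one post-incremented pointer, with the
 * loop already unrolled by hand. */
void fill4(uint8_t *dst, uint8_t value)
{
    *dst++ = value;
    *dst++ = value;
    *dst++ = value;
    *dst++ = value;
}
```

A call such as `fill4(bs, as[1])` would then have the same effect as the loops in the test case.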
All code was compiled using avr-gcc 4.3.2 from winavr-20090313, using -Os.
Look at the code:
typedef unsigned char uint8_t;
typedef unsigned long int uint32_t;
static uint8_t as[4];
static uint8_t bs[4];
static const uint8_t sz = 4;
void foo1(void) {
    for (uint8_t i = 0; i < sz; i++) {
        bs[i] = as[1];
    }
}
void foo2(void) {
    for (uint8_t i = 0; i < sz; i++) {
        *(bs + i) = *(as + 1);
    }
}
foo1 compiles to:
lds r24, as+1
sts bs, r24
sts bs+1, r24
sts bs+2, r24
sts bs+3, r24
ret
Excluding the "ret", this is 10 words and 10 cycles.
foo2 is logically identical (array access and pointer access are the same
thing), but compiles to:
lds r24, as+1
ldi r30, lo8(bs)
ldi r31, hi8(bs)
.L1:
st Z+, r24
ldi r25, hi8(bs+4)
cpi r30, lo8(bs+4)
cpc r31, r25
brne .L1
ret
Excluding the "ret", this is 9 words and 31 cycles (27 on the XMega). Hoisting
the "ldi r25, hi8(bs+4)" above the label would save three cycles, since it
would then run once instead of four times.
An implementation that is smaller than both of these, slightly slower than
foo1 on the Mega and slightly faster on the XMega, is:
lds r24, as+1
ldi r30, lo8(bs)
ldi r31, hi8(bs)
st Z+, r24
st Z+, r24
st Z+, r24
st Z+, r24
ret
Excluding the "ret" this is 8 words, and 12 cycles (8 on the XMega).
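The word and cycle counts for the three listings can be re-derived with a small tally. This is only a sketch: the per-instruction costs below are assumptions chosen to agree with the totals quoted above (in particular, only `st Z+` is assumed to differ between Mega and XMega), and the function names are made up for illustration.

```c
/* Instruction costs ASSUMED for this sketch. They are chosen to agree with
 * the totals in the report and should be checked against the instruction
 * set manual for a real device. */
enum {
    LDS_WORDS = 2, LDS_CYCLES = 2,      /* lds */
    STS_WORDS = 2, STS_CYCLES = 2,      /* sts */
    ALU_WORDS = 1, ALU_CYCLES = 1,      /* ldi, cpi, cpc */
    ST_WORDS = 1,                       /* st Z+ */
    ST_CYCLES_MEGA = 2, ST_CYCLES_XMEGA = 1,
    BR_WORDS = 1,                       /* brne */
    BR_TAKEN = 2, BR_NOT_TAKEN = 1
};

struct cost { unsigned words, mega, xmega; };

/* lds, then n direct sts (the foo1 listing). */
struct cost cost_direct(unsigned n)
{
    struct cost c;
    c.words = LDS_WORDS + n * STS_WORDS;
    c.mega = LDS_CYCLES + n * STS_CYCLES;
    c.xmega = c.mega;
    return c;
}

/* lds, two ldi, then a loop of st Z+, ldi, cpi, cpc, brne (the foo2
 * listing). Requires n >= 1; the branch is taken n - 1 times. */
struct cost cost_loop(unsigned n)
{
    unsigned setup = LDS_CYCLES + 2 * ALU_CYCLES;
    unsigned branches = (n - 1) * BR_TAKEN + BR_NOT_TAKEN;
    struct cost c;
    c.words = LDS_WORDS + 2 * ALU_WORDS + ST_WORDS + 3 * ALU_WORDS + BR_WORDS;
    c.mega = setup + n * (ST_CYCLES_MEGA + 3 * ALU_CYCLES) + branches;
    c.xmega = setup + n * (ST_CYCLES_XMEGA + 3 * ALU_CYCLES) + branches;
    return c;
}

/* lds, two ldi, then n unrolled st Z+ (the proposed listing). */
struct cost cost_unrolled(unsigned n)
{
    unsigned setup = LDS_CYCLES + 2 * ALU_CYCLES;
    struct cost c;
    c.words = LDS_WORDS + 2 * ALU_WORDS + n * ST_WORDS;
    c.mega = setup + n * ST_CYCLES_MEGA;
    c.xmega = setup + n * ST_CYCLES_XMEGA;
    return c;
}
```

With n = 4 these give 10 words and 10 cycles for the direct stores, 9 words and 31 (27) cycles for the loop, and 8 words and 12 (8) cycles for the unrolled form, matching the figures in the report.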
For the code:
static uint32_t al, bl;
static float af;
void foo3(void) {
    al = 0;
}
void foo4(void) {
    af = 0;
}
we get:
foo3:
sts al, __zero_reg__
sts (al)+1, __zero_reg__
sts (al)+2, __zero_reg__
sts (al)+3, __zero_reg__
ret
That's 8 words and 8 cycles (plus "ret"). Using
ldi r30, lo8(al)
ldi r31, hi8(al)
st Z+, __zero_reg__
st Z+, __zero_reg__
st Z+, __zero_reg__
st Z+, __zero_reg__
ret
gives 6 words and 10 cycles, or 6 cycles on the XMega (plus "ret").
Function foo4() should of course give the same code, but instead compiles to
the very inefficient:
foo4:
ldi r24, lo8(0x00)
ldi r25, hi8(0x00)
ldi r26, hlo8(0x00)
ldi r27, hhi8(0x00)
sts af, __zero_reg__
sts (af)+1, __zero_reg__
sts (af)+2, __zero_reg__
sts (af)+3, __zero_reg__
ret
That's 12 words and 12 cycles, and uses 4 registers unnecessarily.
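The expectation that foo4() should match foo3() rests on the float value 0 being stored as four zero bytes, which holds if float uses the IEEE 754 single-precision format. That format is an assumption of this sketch, and the helper name `float_zero_is_all_zero_bytes` is made up for the check:

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the object representation of 0.0f is four zero bytes,
 * which is the case when float is IEEE 754 binary32 (assumed here). */
int float_zero_is_all_zero_bytes(void)
{
    const float zero = 0.0f;
    const uint8_t expected[4] = {0, 0, 0, 0};

    if (sizeof zero != sizeof expected)
        return 0;
    return memcmp(&zero, expected, sizeof expected) == 0;
}
```

When this returns 1, storing 0 to a float and storing 0 to a 32-bit integer write the same bytes, so the same four stores would suffice for both.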
Similar code is produced when copying values:
void foo5(void) {
    al = bl;
}
compiles to:
foo5:
lds r24, bl
lds r25, (bl) + 1
lds r26, (bl) + 2
lds r27, (bl) + 3
sts al, r24
sts (al) + 1, r25
sts (al) + 2, r26
sts (al) + 3, r27
ret
Using the Z and either X or Y pointers would make this code slightly smaller
but marginally slower on the Mega (and marginally faster on the XMega). Even
without that, re-arranging the code would allow a single register to be used
rather than four.
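The single-register variant amounts to interleaving each load with its store instead of loading all four bytes first. In C terms this is a byte-wise copy through one temporary. The name `copy4_bytewise` is hypothetical, and whether a given compiler keeps the interleaved order is not guaranteed:

```c
#include <stdint.h>

/* Hypothetical helper: copy four bytes one at a time so that only a single
 * temporary is live at any point (load, store, load, store, ...). */
void copy4_bytewise(uint8_t *dst, const uint8_t *src)
{
    uint8_t tmp;

    tmp = src[0]; dst[0] = tmp;
    tmp = src[1]; dst[1] = tmp;
    tmp = src[2]; dst[2] = tmp;
    tmp = src[3]; dst[3] = tmp;
}
```

The instruction count stays the same as in the listing for foo5; only the number of registers in use at once changes.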
--
Summary: Missed optimisation when setting 4-byte values
Product: gcc
Version: 4.3.2
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: david dot brown at hesbynett dot no
GCC host triplet: mingw
GCC target triplet: avr-gcc
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39819
* [Bug target/39819] [avr] Missed optimisation when setting 4-byte values
From: eric dot weddington at atmel dot com @ 2009-04-19 22:26 UTC
To: gcc-bugs
--
eric dot weddington at atmel dot com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |eric dot weddington at atmel
| |dot com
Severity|enhancement |normal
Component|c |target
GCC host triplet|mingw |
GCC target triplet|avr-gcc |avr-*-*
Keywords| |missed-optimization
Summary|Missed optimisation when |[avr] Missed optimisation
|setting 4-byte values |when setting 4-byte values
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39819
* [Bug target/39819] [avr] Missed optimisation when setting 4-byte values
From: eric dot weddington at atmel dot com @ 2009-08-21 19:28 UTC
To: gcc-bugs
------- Comment #1 from eric dot weddington at atmel dot com 2009-08-21 19:28 -------
Confirmed on 4.3.2.
--
eric dot weddington at atmel dot com changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever Confirmed|0 |1
Known to fail| |4.3.2
Last reconfirmed|0000-00-00 00:00:00 |2009-08-21 19:28:03
date| |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39819
* [Bug target/39819] [avr] Missed optimisation when setting 4-byte values
From: gjl at gcc dot gnu.org @ 2011-07-02 21:25 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39819
Georg-Johann Lay <gjl at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
CC| |gjl at gcc dot gnu.org
Known to work| |4.6.1
Resolution| |WONTFIX
--- Comment #2 from Georg-Johann Lay <gjl at gcc dot gnu.org> 2011-07-02 21:23:59 UTC ---
Closed as WONTFIX.
Compiled the following code with 4.6.1, using -std=c99 -mmcu=atmega88 -Os:
typedef unsigned char uint8_t;
typedef unsigned long int uint32_t;
uint8_t as[4];
uint8_t bs[4];
static const uint8_t sz = 4;
void foo1(void) {
    for (uint8_t i = 0; i < sz; i++) {
        bs[i] = as[1];
    }
}
void foo2(void) {
    for (uint8_t i = 0; i < sz; i++) {
        *(bs + i) = *(as + 1);
    }
}
The result is:
foo1:
lds r24,as+1
sts bs,r24
sts bs+1,r24
sts bs+2,r24
sts bs+3,r24
ret
foo2:
lds r24,as+1
sts bs,r24
sts bs+1,r24
sts bs+2,r24
sts bs+3,r24
ret
So the difference has gone.
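Identical output for the two functions is what one would expect, since in C `bs[i]` is defined as `*(bs + i)`. A small host-side sketch that runs both forms and compares the results; the names `fill_index`, `fill_pointer` and `same_result` are made up for this check and do not appear in the test case:

```c
#include <stdint.h>
#include <string.h>

enum { SZ = 4 };

/* Array-subscript form, as in foo1. */
static void fill_index(uint8_t *bs, const uint8_t *as)
{
    for (uint8_t i = 0; i < SZ; i++) {
        bs[i] = as[1];
    }
}

/* Pointer-arithmetic form, as in foo2. */
static void fill_pointer(uint8_t *bs, const uint8_t *as)
{
    for (uint8_t i = 0; i < SZ; i++) {
        *(bs + i) = *(as + 1);
    }
}

/* Returns 1 if both forms leave the destination in the same state.
 * `as` must point to at least two bytes. */
int same_result(const uint8_t *as)
{
    uint8_t b1[SZ] = {0};
    uint8_t b2[SZ] = {0};

    fill_index(b1, as);
    fill_pointer(b2, as);
    return memcmp(b1, b2, SZ) == 0;
}
```

This only checks observable behaviour on the host; it says nothing about which instructions a particular avr-gcc release selects.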