public inbox for gcc-bugs@sourceware.org
* [Bug c/39819]  New: Missed optimisation when setting 4-byte values
@ 2009-04-19 21:09 david dot brown at hesbynett dot no
  2009-04-19 22:26 ` [Bug target/39819] [avr] " eric dot weddington at atmel dot com
  2009-08-21 19:28 ` eric dot weddington at atmel dot com
  0 siblings, 2 replies; 4+ messages in thread
From: david dot brown at hesbynett dot no @ 2009-04-19 21:09 UTC (permalink / raw)
  To: gcc-bugs

avr-gcc misses a number of optimisations when copying 4-byte values or
assigning a single-byte value to 4-byte values.  The issue also applies to
values of other sizes, but since 4-byte values are common (such as 32-bit
ints and floats) the issue is especially relevant.

In summary, the compiler tends to produce code that is either a series of
direct memory accesses, or uses indirect access (through Z) in a loop.  A
better choice would often be to set up Z as a pointer, then unroll the indirect
pointer loop.
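
At the C level, the suggested shape (one pointer set up once, then an
unrolled run of stores through it) can be sketched roughly as below.  This is
only an illustration: the helper name fill4_unrolled is hypothetical and not
part of the test case, and nothing guarantees that a given avr-gcc version
turns each store into "st Z+".

```c
#include <stdint.h>

/* Hypothetical helper, not from the report: store one byte value into four
   consecutive bytes through a single pointer, with the loop unrolled by
   hand.  On AVR the hope is that the pointer lives in Z and each store
   becomes a post-increment "st Z+, rN". */
void fill4_unrolled(uint8_t *dst, uint8_t value)
{
    uint8_t *p = dst;   /* corresponds to loading Z once */
    *p++ = value;
    *p++ = value;
    *p++ = value;
    *p++ = value;
}
```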

All code was compiled using avr-gcc 4.3.2 from winavr-20090313, using -Os.

Look at the code:

typedef unsigned char uint8_t;
typedef unsigned long int uint32_t;

static uint8_t as[4];
static uint8_t bs[4];
static const uint8_t sz = 4;

void foo1(void) {
        for (uint8_t i = 0; i < sz; i++) {
                bs[i] = as[1];
        }
}

void foo2(void) {
        for (uint8_t i = 0; i < sz; i++) {
                *(bs + i) = *(as + 1);
        }
}

foo1 compiles to:

lds r24, as+1
sts bs, r24
sts bs+1, r24
sts bs+2, r24
sts bs+3, r24
ret

Excluding the "ret", this is 10 words and 10 cycles.

foo2 is logically identical (array access and pointer access are the same
thing), but compiles to:

lds r24, as+1
ldi r30, lo8(bs)
ldi r31, hi8(bs)
.L1:
st Z+, r24
ldi r25, hi8(bs+4)
cpi r30, lo8(bs+4)
cpc r31, r25
brne .L1
ret

Excluding the "ret", this is 9 words and 31 cycles (27 on the XMega).  Hoisting
the "ldi r25, hi8(bs+4)" above the label would save three cycles, since it
would then run once instead of four times.
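
The claim that the two source forms are logically identical can be checked on
any host compiler with a small sketch like the following.  The function names
and the SZ macro are made up for this illustration; the loop bodies mirror
foo1 and foo2 but take their arrays as parameters so they can be compared.

```c
#include <stdint.h>
#include <string.h>

#define SZ 4   /* stands in for the element count used in the report */

/* Array-index form, as in foo1. */
void fill_indexed(uint8_t *dst, const uint8_t *src)
{
    for (uint8_t i = 0; i < SZ; i++) {
        dst[i] = src[1];
    }
}

/* Pointer-arithmetic form, as in foo2. */
void fill_pointer(uint8_t *dst, const uint8_t *src)
{
    for (uint8_t i = 0; i < SZ; i++) {
        *(dst + i) = *(src + 1);
    }
}

/* Returns 1 if both forms leave identical bytes in their destinations. */
int forms_agree(const uint8_t *src)
{
    uint8_t a[SZ] = {0};
    uint8_t b[SZ] = {0};
    fill_indexed(a, src);
    fill_pointer(b, src);
    return memcmp(a, b, SZ) == 0;
}
```

Since both forms have the same meaning in C, any difference in the generated
code comes from the compiler and not from the source.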

An implementation that is smaller than both of these, and slightly slower on
the Mega and slightly faster on the XMega, is:

lds r24, as+1
ldi r30, lo8(bs)
ldi r31, hi8(bs)
st Z+, r24
st Z+, r24
st Z+, r24
st Z+, r24
ret

Excluding the "ret" this is 8 words, and 12 cycles (8 on the XMega).
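
The word and cycle counts quoted for the three variants can be reproduced
with a small tally such as the one below.  The per-instruction costs are
assumptions chosen to match the figures in this report (lds/sts: 2 words and
2 cycles; ldi/cpi/cpc: 1 word and 1 cycle; "st Z+": 1 word, 2 cycles on the
Mega and 1 on the XMega; brne: 1 word, 2 cycles when taken and 1 otherwise).
They should be checked against the instruction set manual for a specific
device.  All names are hypothetical.

```c
/* Totals exclude the final "ret", as in the report. */
struct total { int words; int mega; int xmega; };

/* foo1 style: one lds, then n direct sts stores. */
struct total cost_direct(int n)
{
    struct total t = { 2 + 2 * n, 2 + 2 * n, 2 + 2 * n };
    return t;
}

/* foo2 style: lds, two ldi to set up Z, then a loop of
   "st Z+; ldi; cpi; cpc; brne" executed n times. */
struct total cost_loop(int n)
{
    int setup  = 2 + 1 + 1;         /* lds + ldi + ldi */
    int branch = 2 * (n - 1) + 1;   /* brne taken n-1 times, then falls through */
    struct total t;
    t.words = 9;                    /* code size does not depend on n */
    t.mega  = setup + n * (2 + 3) + branch;
    t.xmega = setup + n * (1 + 3) + branch;
    return t;
}

/* Suggested style: lds, two ldi to set up Z, then n unrolled "st Z+". */
struct total cost_unrolled(int n)
{
    struct total t = { 4 + n, 4 + 2 * n, 4 + n };
    return t;
}
```

With n = 4, and under the assumed costs, this gives 10/10/10 for the direct
stores, 9/31/27 for the loop and 8/12/8 for the unrolled Z version (words,
Mega cycles, XMega cycles).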


For the code:

static uint32_t al, bl;
static float af;

void foo3(void) {
        al = 0;
}

void foo4(void) {
        af = 0;
}

we get:

foo3:
sts al, __zero_reg__
sts (al)+1, __zero_reg__
sts (al)+2, __zero_reg__
sts (al)+3, __zero_reg__
ret

That's 8 words and 8 cycles (plus "ret").  Using

ldi r30, lo8(al)
ldi r31, hi8(al)
st Z+, __zero_reg__
st Z+, __zero_reg__
st Z+, __zero_reg__
st Z+, __zero_reg__
ret

gives 6 words and 10 cycles, or 6 cycles on the XMega (plus "ret").

Function foo4() should of course give the same code, but instead compiles to
the very inefficient:

foo4:
ldi r24, lo8(0x00)
ldi r25, hi8(0x00)
ldi r26, hlo8(0x00)
ldi r27, hhi8(0x00)
sts af, __zero_reg__
sts (af)+1, __zero_reg__
sts (af)+2, __zero_reg__
sts (af)+3, __zero_reg__
ret

That's 12 words and 12 cycles, and uses 4 registers unnecessarily.
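
The expectation that foo4 should match foo3 rests on a float zero having the
same all-zero byte pattern as an integer zero.  A minimal host-side check,
assuming an IEEE 754 single-precision float (the function name is
hypothetical):

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the object representation of a float holding 0 consists only
   of zero bytes.  This holds for IEEE 754 binary32, which is ASSUMED here. */
int float_zero_is_all_zero_bytes(void)
{
    float f = 0;
    uint8_t bytes[sizeof f];
    memcpy(bytes, &f, sizeof f);
    for (size_t i = 0; i < sizeof f; i++) {
        if (bytes[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

If that holds on the target, storing __zero_reg__ four times is enough and
no constant needs to be loaded into r24..r27 first.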


Similar code is produced when copying values:

void foo5(void) {
        al = bl;
}

compiles to:

foo5:
lds r24, bl
lds r25, (bl) + 1
lds r26, (bl) + 2
lds r27, (bl) + 3
sts al, r24
sts (al) + 1, r25
sts (al) + 2, r26
sts (al) + 3, r27
ret

Using the Z and either X or Y pointers would make this code slightly smaller
but marginally slower on the Mega (and marginally faster on the XMega).  Even
without that, re-arranging the code would allow a single register to be used
rather than four.
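
The single-register re-arrangement mentioned for foo5 amounts to
interleaving each load with its store, so that only one temporary is live at
a time.  A rough C-level sketch follows; the helper name copy4_bytewise is
hypothetical, and the compiler remains free to schedule the accesses
differently.

```c
#include <stdint.h>

/* Copy four bytes one at a time.  Each byte is loaded and stored before the
   next one is touched, so a single temporary (one register) is enough. */
void copy4_bytewise(uint8_t *dst, const uint8_t *src)
{
    uint8_t tmp;
    tmp = src[0]; dst[0] = tmp;
    tmp = src[1]; dst[1] = tmp;
    tmp = src[2]; dst[2] = tmp;
    tmp = src[3]; dst[3] = tmp;
}
```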


-- 
           Summary: Missed optimisation when setting 4-byte values
           Product: gcc
           Version: 4.3.2
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: david dot brown at hesbynett dot no
  GCC host triplet: mingw
GCC target triplet: avr-gcc


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39819



* [Bug target/39819] [avr] Missed optimisation when setting 4-byte values
  2009-04-19 21:09 [Bug c/39819] New: Missed optimisation when setting 4-byte values david dot brown at hesbynett dot no
@ 2009-04-19 22:26 ` eric dot weddington at atmel dot com
  2009-08-21 19:28 ` eric dot weddington at atmel dot com
  1 sibling, 0 replies; 4+ messages in thread
From: eric dot weddington at atmel dot com @ 2009-04-19 22:26 UTC (permalink / raw)
  To: gcc-bugs



-- 

eric dot weddington at atmel dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |eric dot weddington at atmel
                   |                            |dot com
           Severity|enhancement                 |normal
          Component|c                           |target
   GCC host triplet|mingw                       |
 GCC target triplet|avr-gcc                     |avr-*-*
           Keywords|                            |missed-optimization
            Summary|Missed optimisation when    |[avr] Missed optimisation
                   |setting 4-byte values       |when setting 4-byte values


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39819



* [Bug target/39819] [avr] Missed optimisation when setting 4-byte values
  2009-04-19 21:09 [Bug c/39819] New: Missed optimisation when setting 4-byte values david dot brown at hesbynett dot no
  2009-04-19 22:26 ` [Bug target/39819] [avr] " eric dot weddington at atmel dot com
@ 2009-08-21 19:28 ` eric dot weddington at atmel dot com
  1 sibling, 0 replies; 4+ messages in thread
From: eric dot weddington at atmel dot com @ 2009-08-21 19:28 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from eric dot weddington at atmel dot com  2009-08-21 19:28 -------
Confirmed on 4.3.2.


-- 

eric dot weddington at atmel dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
      Known to fail|                            |4.3.2
   Last reconfirmed|0000-00-00 00:00:00         |2009-08-21 19:28:03
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39819



* [Bug target/39819] [avr] Missed optimisation when setting 4-byte values
       [not found] <bug-39819-4@http.gcc.gnu.org/bugzilla/>
@ 2011-07-02 21:25 ` gjl at gcc dot gnu.org
  0 siblings, 0 replies; 4+ messages in thread
From: gjl at gcc dot gnu.org @ 2011-07-02 21:25 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39819

Georg-Johann Lay <gjl at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |gjl at gcc dot gnu.org
      Known to work|                            |4.6.1
         Resolution|                            |WONTFIX

--- Comment #2 from Georg-Johann Lay <gjl at gcc dot gnu.org> 2011-07-02 21:23:59 UTC ---
Closed as WONTFIX.

Compiled the following code with 4.6.1, using -std=c99 -mmcu=atmega88 -Os:

typedef unsigned char uint8_t;
typedef unsigned long int uint32_t;

uint8_t as[4];
uint8_t bs[4];
static const uint8_t sz = 4;

void foo1(void) {
    for (uint8_t i = 0; i < sz; i++) {
        bs[i] = as[1];
    }
}

void foo2(void) {
    for (uint8_t i = 0; i < sz; i++) {
        *(bs + i) = *(as + 1);
    }
}

The result is:

foo1:
    lds r24,as+1
    sts bs,r24
    sts bs+1,r24
    sts bs+2,r24
    sts bs+3,r24
    ret

foo2:
    lds r24,as+1
    sts bs,r24
    sts bs+1,r24
    sts bs+2,r24
    sts bs+3,r24
    ret

So the difference has gone.

