public inbox for gcc-bugs@sourceware.org
* [Bug c/34737]  New: missed optimization, foo(p); p++ is better then foo(p++)
@ 2008-01-11  8:58 wvangulik at xs4all dot nl
  2008-01-11  8:59 ` [Bug c/34737] " wvangulik at xs4all dot nl
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: wvangulik at xs4all dot nl @ 2008-01-11  8:58 UTC
  To: gcc-bugs

Consider the following:

char *x;
volatile int y;

void foo(char *p)
{
    y += *p;
}

void main(void)
{
    char *p1 = x;
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
}

For the AVR target this generates poor code: the pointer is kept in two
call-saved register pairs (r14:r15 and r16:r17), alternating between them.

/* prologue: frame size=0 */
    push r14
    push r15
    push r16
    push r17
/* prologue end (size=4) */
    lds r24,x
    lds r25,(x)+1
    movw r16,r24
    subi r16,lo8(-(1))
    sbci r17,hi8(-(1))
    call foo
    movw r14,r16
    sec
    adc r14,__zero_reg__
    adc r15,__zero_reg__
    movw r24,r16
    call foo
    movw r16,r14
    subi r16,lo8(-(1))
    sbci r17,hi8(-(1))
    movw r24,r14
    call foo
etc.

The result gets much better when it is written as "foo(p); p++;":

/* prologue: frame size=0 */
        push r16
        push r17
/* prologue end (size=2) */
        movw r16,r24
        call foo
        subi r16,lo8(-(1))
        sbci r17,hi8(-(1))
        movw r24,r16
        call foo
        subi r16,lo8(-(1))
        sbci r17,hi8(-(1))
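
For reference, a minimal sketch of the two source forms being compared. The
exact source behind the listing above is not shown in this report, so the
wrapper functions walk_postinc and walk_separate are hypothetical names, and
the ten calls are shortened to three. Both forms do the same work; only the
placement of the increment differs.

```c
/* Sketch only: y and foo follow the test case in this report;
   walk_postinc and walk_separate are hypothetical names. */
volatile int y;

void foo(char *p)
{
    y += *p;    /* volatile access keeps each call's effect observable */
}

/* Increment inside the argument: the form reported to give poor code. */
void walk_postinc(char *p)
{
    foo(p++);
    foo(p++);
    foo(p++);
}

/* Increment as a separate statement: the form reported to give better code. */
void walk_separate(char *p)
{
    foo(p); p++;
    foo(p); p++;
    foo(p); p++;
}
```

Compiling both functions with the same options and comparing the assembly
should show whether a given GCC version treats the two forms differently.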

The result gets near optimal when the increment is larger than the target
can add as an immediate (>64). The compiler then adds the cumulative offset to
the saved base pointer, which would be the best outcome if it were also done
for smaller increments.

        movw r16,r24
        call foo
        movw r24,r16
        subi r24,lo8(-(65))
        sbci r25,hi8(-(65))
        call foo
        movw r24,r16
        subi r24,lo8(-(130))
        sbci r25,hi8(-(130))
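
The cumulative-offset form can also be written by hand, which may make the
intent of the listing above clearer. This is only a sketch: STEP,
walk_running and walk_cumulative are hypothetical names, and the value 65 is
taken from the offsets visible in the listing. In the second function every
argument is computed from the unchanged base pointer, so only one value has
to stay live across the calls.

```c
/* Sketch only: y and foo follow the test case in this report;
   STEP, walk_running and walk_cumulative are hypothetical names. */
volatile int y;

void foo(char *p)
{
    y += *p;
}

enum { STEP = 65 };    /* assumed from the 65 and 130 in the listing */

/* Running pointer: each argument depends on the previous increment. */
void walk_running(char *p)
{
    foo(p); p += STEP;
    foo(p); p += STEP;
    foo(p);
}

/* Cumulative offsets: each argument is base plus a constant. */
void walk_cumulative(char *base)
{
    foo(base);
    foo(base + STEP);
    foo(base + 2 * STEP);
}
```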

The worst behaviour is shown by 4.1.2, 4.2.2 and 4.3.0.
3.4.6 and 3.3.6 give better (still non-optimal) results,
but 4.0.4 produces the best code for the original foo(p++).

Poor code is also seen for arm/thumb and pdp-11,
but good code for arm/arm.

So it is a multi-target problem, not just an AVR one.


-- 
           Summary: missed optimization, foo(p); p++ is better then foo(p++)
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: wvangulik at xs4all dot nl
GCC target triplet: multiple-none-none


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34737



end of thread, other threads:[~2010-09-13 11:38 UTC | newest]

Thread overview: 7+ messages
2008-01-11  8:58 [Bug c/34737] New: missed optimization, foo(p); p++ is better then foo(p++) wvangulik at xs4all dot nl
2008-01-11  8:59 ` [Bug c/34737] " wvangulik at xs4all dot nl
2008-01-11 10:19 ` [Bug tree-optimization/34737] Inefficient gimplification of post-modified function arguments, TER doesn't do its work rguenth at gcc dot gnu dot org
2008-01-11 11:48 ` [Bug tree-optimization/34737] Scheduling of post-modified function arguments is not good pinskia at gcc dot gnu dot org
2009-06-24  7:42 ` steven at gcc dot gnu dot org
2009-06-24  9:08 ` rguenther at suse dot de
2010-09-13 11:38 ` abnikant dot singh at atmel dot com
