public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c/34737]  New: missed optimization, foo(p); p++ is better then foo(p++)
@ 2008-01-11  8:58 wvangulik at xs4all dot nl
  2008-01-11  8:59 ` [Bug c/34737] " wvangulik at xs4all dot nl
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: wvangulik at xs4all dot nl @ 2008-01-11  8:58 UTC (permalink / raw)
  To: gcc-bugs

Consider the following:

char *x;
volatile int y;

void foo(char *p)
{
    y += *p;
}

void main(void)
{
    char *p1 = x;
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
}

For the AVR target this will generate ugly code. Having a double saved variable
etc.

/* prologue: frame size=0 */
    push r14
    push r15
    push r16
    push r17
/* prologue end (size=4) */
    lds r24,x
    lds r25,(x)+1
    movw r16,r24
    subi r16,lo8(-(1))
    sbci r17,hi8(-(1))
    call foo
    movw r14,r16
    sec
    adc r14,__zero_reg__
    adc r15,__zero_reg__
    movw r24,r16
    call foo
    movw r16,r14
    subi r16,lo8(-(1))
    sbci r17,hi8(-(1))
    movw r24,r14
    call foo
etc..

The results gets much better when writing it like "foo(p); p++;"

/* prologue: frame size=0 */
        push r16
        push r17
/* prologue end (size=2) */
        movw r16,r24
        call foo
        subi r16,lo8(-(1))
        sbci r17,hi8(-(1))
        movw r24,r16
        call foo
        subi r16,lo8(-(1))
        sbci r17,hi8(-(1))

And the results get near optimal when using larger increments then the target
can add immediately ( >64). The compiler then adds the cumulative offset. Which
would be the most optimal case if also done for lower increments.

        movw r16,r24
        call foo
        movw r24,r16
        subi r24,lo8(-(65))
        sbci r25,hi8(-(65))
        call foo
        movw r24,r16
        subi r24,lo8(-(130))
        sbci r25,hi8(-(130))

This worst behaviour is shown for 4.1.2, 4.2.2, 4.3.0
Better results (still non-optimal) are with 3.4.6 and 3.3.6.
But 4.0.4 is producing the most optimal code for the original foo(p++)

Ugly code is also being seen for arm/thumb and pdp-11.
But good code for arm/arm

So it's a multi-target problem, not just the avr!


-- 
           Summary: missed optimization, foo(p); p++ is better then foo(p++)
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: wvangulik at xs4all dot nl
GCC target triplet: multiple-none-none


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34737


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug c/34737] missed optimization, foo(p); p++ is better then foo(p++)
  2008-01-11  8:58 [Bug c/34737] New: missed optimization, foo(p); p++ is better then foo(p++) wvangulik at xs4all dot nl
@ 2008-01-11  8:59 ` wvangulik at xs4all dot nl
  2008-01-11 10:19 ` [Bug tree-optimization/34737] Inefficient gimplification of post-modified function arguments, TER doesn't do its work rguenth at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: wvangulik at xs4all dot nl @ 2008-01-11  8:59 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from wvangulik at xs4all dot nl  2008-01-11 08:17 -------
Created an attachment (id=14920)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14920&action=view)
Test case showing the three cases

Compile using -fno-line.
For the AVR I used: avr-gcc -Wall -Os -fno-inline -mmcu=avr5 --save-temps
main.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34737


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/34737] Inefficient gimplification of post-modified function arguments, TER doesn't do its work
  2008-01-11  8:58 [Bug c/34737] New: missed optimization, foo(p); p++ is better then foo(p++) wvangulik at xs4all dot nl
  2008-01-11  8:59 ` [Bug c/34737] " wvangulik at xs4all dot nl
@ 2008-01-11 10:19 ` rguenth at gcc dot gnu dot org
  2008-01-11 11:48 ` [Bug tree-optimization/34737] Scheduling of post-modified function arguments is not good pinskia at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-01-11 10:19 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from rguenth at gcc dot gnu dot org  2008-01-11 09:42 -------
Confirmed.

void foo(char *p);

void test1(char * p)
{
    foo(p++);
    foo(p++);
    foo(p++);
    foo(p++);
}

void test2(char * p)
{
    foo(p); p++;
    foo(p); p++;
    foo(p); p++;
    foo(p); p++;
}

The problem is with the first variant we have two registers life over each
function call, while with the second variant only one.  This can be seen
from the optimized tree-dump already:

test1 (p)
{
<bb 2>: 
  p_3 = p_1(D) + 1;
  foo (p_1(D));
  p_5 = p_3 + 1;
  foo (p_3);
  p_7 = p_5 + 1;
  foo (p_5);
  foo (p_7) [tail call];
  return;

}

test2 (p)
{
<bb 2>:
  foo (p_1(D));
  p_2 = p_1(D) + 1;
  foo (p_2);
  p_3 = p_2 + 1;
  foo (p_3);
  p_4 = p_3 + 1;
  foo (p_4) [tail call];
  return;

}

and is initially caused by gimplification which produces

  p.0 = p;
  p = p + 1;
  foo (p.0);

from

  foo (p++ );

no further pass undos this transformation.

With GCC 4.0 TER produced

  foo (p);
  foo (p + 1B);
  foo (p + 2B);
...

where we can generate good code from.  From 4.1 on this is no longer done.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu dot
                   |                            |org
           Severity|normal                      |enhancement
             Status|UNCONFIRMED                 |NEW
          Component|c                           |tree-optimization
     Ever Confirmed|0                           |1
 GCC target triplet|multiple-none-none          |
           Keywords|                            |missed-optimization
   Last reconfirmed|0000-00-00 00:00:00         |2008-01-11 09:42:38
               date|                            |
            Summary|missed optimization, foo(p);|Inefficient gimplification
                   |p++ is better then foo(p++) |of post-modified function
                   |                            |arguments, TER doesn't do
                   |                            |its work


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34737


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/34737] Scheduling of post-modified function arguments is not good
  2008-01-11  8:58 [Bug c/34737] New: missed optimization, foo(p); p++ is better then foo(p++) wvangulik at xs4all dot nl
  2008-01-11  8:59 ` [Bug c/34737] " wvangulik at xs4all dot nl
  2008-01-11 10:19 ` [Bug tree-optimization/34737] Inefficient gimplification of post-modified function arguments, TER doesn't do its work rguenth at gcc dot gnu dot org
@ 2008-01-11 11:48 ` pinskia at gcc dot gnu dot org
  2009-06-24  7:42 ` steven at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2008-01-11 11:48 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from pinskia at gcc dot gnu dot org  2008-01-11 11:33 -------
No what happened with 4.0 is rather DOM would prop x+1 for each x.

Really this comes down to scheduling of instructions and moving them closer to
their usage.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|4.0.4                       |
            Summary|Inefficient gimplification  |Scheduling of post-modified
                   |of post-modified function   |function arguments is not
                   |arguments, TER doesn't do   |good
                   |its work                    |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34737


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/34737] Scheduling of post-modified function arguments is not good
  2008-01-11  8:58 [Bug c/34737] New: missed optimization, foo(p); p++ is better then foo(p++) wvangulik at xs4all dot nl
                   ` (2 preceding siblings ...)
  2008-01-11 11:48 ` [Bug tree-optimization/34737] Scheduling of post-modified function arguments is not good pinskia at gcc dot gnu dot org
@ 2009-06-24  7:42 ` steven at gcc dot gnu dot org
  2009-06-24  9:08 ` rguenther at suse dot de
  2010-09-13 11:38 ` abnikant dot singh at atmel dot com
  5 siblings, 0 replies; 7+ messages in thread
From: steven at gcc dot gnu dot org @ 2009-06-24  7:42 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from steven at gcc dot gnu dot org  2009-06-24 07:42 -------
Couldn't this be fixed also by changing the initial gimplification from:

  p.0 = p;
  p = p + 1;
  foo (p.0);

to:

  p.0 = p;
  foo (p.0);
  p = p + 1;

?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34737


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/34737] Scheduling of post-modified function arguments is not good
  2008-01-11  8:58 [Bug c/34737] New: missed optimization, foo(p); p++ is better then foo(p++) wvangulik at xs4all dot nl
                   ` (3 preceding siblings ...)
  2009-06-24  7:42 ` steven at gcc dot gnu dot org
@ 2009-06-24  9:08 ` rguenther at suse dot de
  2010-09-13 11:38 ` abnikant dot singh at atmel dot com
  5 siblings, 0 replies; 7+ messages in thread
From: rguenther at suse dot de @ 2009-06-24  9:08 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from rguenther at suse dot de  2009-06-24 09:07 -------
Subject: Re:  Scheduling of post-modified function
 arguments is not good

On Wed, 24 Jun 2009, steven at gcc dot gnu dot org wrote:

> ------- Comment #4 from steven at gcc dot gnu dot org  2009-06-24 07:42 -------
> Couldn't this be fixed also by changing the initial gimplification from:
> 
>   p.0 = p;
>   p = p + 1;
>   foo (p.0);
> 
> to:
> 
>   p.0 = p;
>   foo (p.0);
>   p = p + 1;

Probably yes.

Richard.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34737


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/34737] Scheduling of post-modified function arguments is not good
  2008-01-11  8:58 [Bug c/34737] New: missed optimization, foo(p); p++ is better then foo(p++) wvangulik at xs4all dot nl
                   ` (4 preceding siblings ...)
  2009-06-24  9:08 ` rguenther at suse dot de
@ 2010-09-13 11:38 ` abnikant dot singh at atmel dot com
  5 siblings, 0 replies; 7+ messages in thread
From: abnikant dot singh at atmel dot com @ 2010-09-13 11:38 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from abnikant dot singh at atmel dot com  2010-09-13 11:38 -------
we get better code in the head. Both the cases [test1 and test2] produce the
same piece of code:
i.e for the following test case:

void foo(char *p);

void test1(char * p)
{
    foo(p++);
    foo(p++);
    foo(p++);
    foo(p++);
}

void test2(char * p)
{
    foo(p); p++;
    foo(p); p++;
    foo(p); p++;
    foo(p); p++;
}

we get:
test1:
        push r28
        push r29
/* prologue: function */
/* frame size = 0 */
/* stack size = 2 */
.L__stack_usage = 2
        mov r28,r24
        mov r29,r25
        rcall foo
        mov r24,r28
        mov r25,r29
        adiw r24,1
        rcall foo
        mov r24,r28
        mov r25,r29
        adiw r24,2
        rcall foo
        mov r24,r28
        mov r25,r29
        adiw r24,3
        rcall foo
/* epilogue start */
        pop r29
        pop r28
        ret
        .size   test1, .-test1
.global test2
        .type   test2, @function
test2:
        push r28
        push r29
/* prologue: function */
/* frame size = 0 */
/* stack size = 2 */
.L__stack_usage = 2
        mov r28,r24
        mov r29,r25
        rcall foo
        mov r24,r28
        mov r25,r29
        adiw r24,1
        rcall foo
        mov r24,r28
        mov r25,r29
        adiw r24,2
        rcall foo
        mov r24,r28
        mov r25,r29
        adiw r24,3
        rcall foo
/* epilogue start */
        pop r29
        pop r28
        ret
        .size   test2, .-test2


-- 

abnikant dot singh at atmel dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |abnikant dot singh at atmel
                   |                            |dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34737


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2010-09-13 11:38 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-01-11  8:58 [Bug c/34737] New: missed optimization, foo(p); p++ is better then foo(p++) wvangulik at xs4all dot nl
2008-01-11  8:59 ` [Bug c/34737] " wvangulik at xs4all dot nl
2008-01-11 10:19 ` [Bug tree-optimization/34737] Inefficient gimplification of post-modified function arguments, TER doesn't do its work rguenth at gcc dot gnu dot org
2008-01-11 11:48 ` [Bug tree-optimization/34737] Scheduling of post-modified function arguments is not good pinskia at gcc dot gnu dot org
2009-06-24  7:42 ` steven at gcc dot gnu dot org
2009-06-24  9:08 ` rguenther at suse dot de
2010-09-13 11:38 ` abnikant dot singh at atmel dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).