public inbox for gcc-bugs@sourceware.org
* [Bug c/34737] New: missed optimization, foo(p); p++ is better then foo(p++)
From: wvangulik at xs4all dot nl @ 2008-01-11 8:58 UTC (permalink / raw)
To: gcc-bugs
Consider the following:
char *x;
volatile int y;

void foo(char *p)
{
    y += *p;
}

int main(void)
{
    char *p1 = x;

    /* ten calls, each passing p1 and then advancing it by one */
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    return 0;
}
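For the AVR target this generates poor code: the pointer is kept in two
call-saved register pairs (r14:r15 and r16:r17) at the same time, so four
registers have to be saved in the prologue: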
/* prologue: frame size=0 */
push r14
push r15
push r16
push r17
/* prologue end (size=4) */
lds r24,x
lds r25,(x)+1
movw r16,r24
subi r16,lo8(-(1))
sbci r17,hi8(-(1))
call foo
movw r14,r16
sec
adc r14,__zero_reg__
adc r15,__zero_reg__
movw r24,r16
call foo
movw r16,r14
subi r16,lo8(-(1))
sbci r17,hi8(-(1))
movw r24,r14
call foo
etc.
The result gets much better when the calls are written as "foo(p); p++;":
/* prologue: frame size=0 */
push r16
push r17
/* prologue end (size=2) */
movw r16,r24
call foo
subi r16,lo8(-(1))
sbci r17,hi8(-(1))
movw r24,r16
call foo
subi r16,lo8(-(1))
sbci r17,hi8(-(1))
The result gets near-optimal when the increment is larger than the target can
add as an immediate (> 64; the listing below uses 65). The compiler then keeps
the base pointer in r16:r17 and adds the cumulative offset for each call. Doing
the same for smaller increments would give the best code.
movw r16,r24
call foo
movw r24,r16
subi r24,lo8(-(65))
sbci r25,hi8(-(65))
call foo
movw r24,r16
subi r24,lo8(-(130))
sbci r25,hi8(-(130))
The worst code is produced by 4.1.2, 4.2.2 and 4.3.0.
Versions 3.4.6 and 3.3.6 give better (though still non-optimal) results,
and 4.0.4 produces the best code for the original foo(p++) form.
Poor code is also seen for ARM/Thumb and PDP-11, but the code for ARM (ARM
mode) is good, so this is a multi-target problem, not just an AVR one.
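To make the claim in the title concrete, here is a minimal C sketch comparing the two source forms. It assumes a hypothetical recording foo() in place of the report's volatile-summing one, and uses const char * instead of char *. It only checks that both forms pass foo() the same pointers and leave p in the same place, i.e. that they are semantically equivalent, so nothing stops the compiler from emitting the better code for both. It does not reproduce the AVR code generation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the report's foo(): instead of adding *p
   to a volatile int, it records every pointer it is passed. */
static const char *seen[16];
static size_t nseen;

static void foo(const char *p)
{
    assert(nseen < sizeof seen / sizeof seen[0]);
    seen[nseen++] = p;
}

/* Form from the original report: post-increment in the argument. */
static const char *post_increment_form(const char *p)
{
    foo(p++);
    foo(p++);
    foo(p++);
    foo(p++);
    return p;               /* p after four increments */
}

/* Workaround form: call first, then increment. */
static const char *call_then_increment_form(const char *p)
{
    foo(p); p++;
    foo(p); p++;
    foo(p); p++;
    foo(p); p++;
    return p;
}

/* Returns 1 if both forms pass foo() the same four pointers
   (buf, buf+1, buf+2, buf+3) and leave p at buf+4. */
static int forms_agree(void)
{
    static const char buf[8] = "abcdefg";
    const char *first[4];
    const char *end1, *end2;
    size_t i;

    nseen = 0;
    end1 = post_increment_form(buf);
    for (i = 0; i < 4; i++)
        first[i] = seen[i];

    nseen = 0;
    end2 = call_then_increment_form(buf);

    if (nseen != 4 || end1 != buf + 4 || end2 != buf + 4)
        return 0;
    for (i = 0; i < 4; i++)
        if (first[i] != buf + i || seen[i] != buf + i)
            return 0;
    return 1;
}
```

Calling forms_agree() returns 1.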
--
Summary: missed optimization, foo(p); p++ is better then foo(p++)
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: wvangulik at xs4all dot nl
GCC target triplet: multiple-none-none
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34737
* [Bug c/34737] missed optimization, foo(p); p++ is better then foo(p++)
From: wvangulik at xs4all dot nl @ 2008-01-11 8:59 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from wvangulik at xs4all dot nl 2008-01-11 08:17 -------
Created an attachment (id=14920)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14920&action=view)
Test case showing the three cases
Compile using -fno-inline.
For the AVR I used: avr-gcc -Wall -Os -fno-inline -mmcu=avr5 --save-temps main.c
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34737
* [Bug tree-optimization/34737] Inefficient gimplification of post-modified function arguments, TER doesn't do its work
From: rguenth at gcc dot gnu dot org @ 2008-01-11 10:19 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from rguenth at gcc dot gnu dot org 2008-01-11 09:42 -------
Confirmed.
void foo(char *p);

void test1(char *p)
{
    foo(p++);
    foo(p++);
    foo(p++);
    foo(p++);
}

void test2(char *p)
{
    foo(p); p++;
    foo(p); p++;
    foo(p); p++;
    foo(p); p++;
}
The problem is that with the first variant two registers are live across each
function call, while with the second variant only one is. This can already be
seen in the optimized tree dump:
test1 (p)
{
<bb 2>:
  p_3 = p_1(D) + 1;
  foo (p_1(D));
  p_5 = p_3 + 1;
  foo (p_3);
  p_7 = p_5 + 1;
  foo (p_5);
  foo (p_7) [tail call];
  return;
}

test2 (p)
{
<bb 2>:
  foo (p_1(D));
  p_2 = p_1(D) + 1;
  foo (p_2);
  p_3 = p_2 + 1;
  foo (p_3);
  p_4 = p_3 + 1;
  foo (p_4) [tail call];
  return;
}
The problem is initially caused by gimplification, which produces
p.0 = p;
p = p + 1;
foo (p.0);
from
foo (p++);
and no later pass undoes this transformation.
With GCC 4.0, TER produced
foo (p);
foo (p + 1B);
foo (p + 2B);
...
from which good code can be generated. From 4.1 on, this is no longer done.
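The shape TER produced with 4.0 can be written directly in C. This is a minimal sketch, again assuming a hypothetical recording foo(): every argument is the unmodified incoming pointer plus a constant, which is also the cumulative-offset pattern the reporter saw for large increments. It shows the source-level shape only; it says nothing about which GCC versions generate good code from it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical recording stand-in for foo(). */
static const char *seen[16];
static size_t nseen;

static void foo(const char *p)
{
    assert(nseen < sizeof seen / sizeof seen[0]);
    seen[nseen++] = p;
}

/* Base-plus-constant form: each argument is p plus a fixed offset,
   computed right before its call, so only p itself has to be kept
   across the calls. The return value is where four p++ would have
   left p. */
static const char *base_plus_offset_form(const char *p)
{
    foo(p);
    foo(p + 1);
    foo(p + 2);
    foo(p + 3);
    return p + 4;
}
```

Calling base_plus_offset_form(s) passes s, s + 1, s + 2 and s + 3 to foo() and returns s + 4, the same as the sequence of foo(p++) calls.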
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rguenth at gcc dot gnu dot
| |org
Severity|normal |enhancement
Status|UNCONFIRMED |NEW
Component|c |tree-optimization
Ever Confirmed|0 |1
GCC target triplet|multiple-none-none |
Keywords| |missed-optimization
Last reconfirmed|0000-00-00 00:00:00 |2008-01-11 09:42:38
date| |
Summary|missed optimization, foo(p);|Inefficient gimplification
|p++ is better then foo(p++) |of post-modified function
| |arguments, TER doesn't do
| |its work
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34737
* [Bug tree-optimization/34737] Scheduling of post-modified function arguments is not good
From: pinskia at gcc dot gnu dot org @ 2008-01-11 11:48 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from pinskia at gcc dot gnu dot org 2008-01-11 11:33 -------
No, what happened with 4.0 is rather that DOM propagated x+1 for each x.
Really, this comes down to scheduling instructions and moving them closer to
their uses.
--
pinskia at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Known to work|4.0.4 |
Summary|Inefficient gimplification |Scheduling of post-modified
|of post-modified function |function arguments is not
|arguments, TER doesn't do |good
|its work |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34737
* [Bug tree-optimization/34737] Scheduling of post-modified function arguments is not good
From: steven at gcc dot gnu dot org @ 2009-06-24 7:42 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from steven at gcc dot gnu dot org 2009-06-24 07:42 -------
Couldn't this be fixed also by changing the initial gimplification from:
p.0 = p;
p = p + 1;
foo (p.0);
to:
p.0 = p;
foo (p.0);
p = p + 1;
?
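The two lowerings can be spelled out in C, with p0 standing in for the gimplifier's p.0 (which is not a valid C identifier). This is a hypothetical sketch with a recording foo(): both orders pass the same pointer and leave p incremented, so the reordering preserves semantics. The difference lies only in what has to be live at the call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical recording stand-in for foo(). */
static const char *seen[16];
static size_t nseen;

static void foo(const char *p)
{
    assert(nseen < sizeof seen / sizeof seen[0]);
    seen[nseen++] = p;
}

/* foo (p++); as currently gimplified: the incremented p is computed
   before the call, so at the call both p0 (the argument) and the new
   p are live, and the new p must survive the call. */
static const char *lowered_current(const char *p)
{
    const char *p0;
    p0 = p;
    p = p + 1;
    foo(p0);
    return p;
}

/* Proposed order: the increment follows the call, so at the call only
   p0 (a copy of p) and p itself, holding the same value, are needed. */
static const char *lowered_proposed(const char *p)
{
    const char *p0;
    p0 = p;
    foo(p0);
    p = p + 1;
    return p;
}
```

Both functions return s + 1 for an input s, and both pass s to foo().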
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34737
* [Bug tree-optimization/34737] Scheduling of post-modified function arguments is not good
From: rguenther at suse dot de @ 2009-06-24 9:08 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from rguenther at suse dot de 2009-06-24 09:07 -------
Subject: Re: Scheduling of post-modified function arguments is not good
On Wed, 24 Jun 2009, steven at gcc dot gnu dot org wrote:
> ------- Comment #4 from steven at gcc dot gnu dot org 2009-06-24 07:42 -------
> Couldn't this be fixed also by changing the initial gimplification from:
>
> p.0 = p;
> p = p + 1;
> foo (p.0);
>
> to:
>
> p.0 = p;
> foo (p.0);
> p = p + 1;
Probably yes.
Richard.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34737
* [Bug tree-optimization/34737] Scheduling of post-modified function arguments is not good
From: abnikant dot singh at atmel dot com @ 2010-09-13 11:38 UTC (permalink / raw)
To: gcc-bugs
------- Comment #6 from abnikant dot singh at atmel dot com 2010-09-13 11:38 -------
We get better code on the current head (trunk): both cases, test1 and test2,
produce the same code for the following test case:
void foo(char *p);

void test1(char *p)
{
    foo(p++);
    foo(p++);
    foo(p++);
    foo(p++);
}

void test2(char *p)
{
    foo(p); p++;
    foo(p); p++;
    foo(p); p++;
    foo(p); p++;
}
We get:
test1:
push r28
push r29
/* prologue: function */
/* frame size = 0 */
/* stack size = 2 */
.L__stack_usage = 2
mov r28,r24
mov r29,r25
rcall foo
mov r24,r28
mov r25,r29
adiw r24,1
rcall foo
mov r24,r28
mov r25,r29
adiw r24,2
rcall foo
mov r24,r28
mov r25,r29
adiw r24,3
rcall foo
/* epilogue start */
pop r29
pop r28
ret
.size test1, .-test1
.global test2
.type test2, @function
test2:
push r28
push r29
/* prologue: function */
/* frame size = 0 */
/* stack size = 2 */
.L__stack_usage = 2
mov r28,r24
mov r29,r25
rcall foo
mov r24,r28
mov r25,r29
adiw r24,1
rcall foo
mov r24,r28
mov r25,r29
adiw r24,2
rcall foo
mov r24,r28
mov r25,r29
adiw r24,3
rcall foo
/* epilogue start */
pop r29
pop r28
ret
.size test2, .-test2
--
abnikant dot singh at atmel dot com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |abnikant dot singh at atmel
| |dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34737