public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/32698] New: [4.3 regression] inefficient pointer expression
From: zippel at gcc dot gnu dot org @ 2007-07-09 13:25 UTC (permalink / raw)
To: gcc-bugs
Taking this example:
int foo(int *p, unsigned int i)
{
return p[i + 1] + p[i + 2] + p[i + 3];
}
produces inefficient code. The problem already starts at tree level, where for
"p[i+1]" an expression like *(p + (i + 1) * 4) is generated, which is not a
common pointer expression.
Also, since this is different from the other generated pointer expressions, the
common index expression isn't completely replaced.
An initial discussion about this can be found here:
http://gcc.gnu.org/ml/gcc-patches/2007-07/msg00418.html
--
Summary: [4.3 regression] inefficient pointer expression
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: zippel at gcc dot gnu dot org
GCC host triplet: i686-pc-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32698
* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: rguenth at gcc dot gnu dot org @ 2007-07-09 13:40 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from rguenth at gcc dot gnu dot org 2007-07-09 13:39 -------
With 2.95, 3.3.6 and 3.4.6 I get (for 32bit):
foo:
pushl %ebp
movl %esp,%ebp
movl 8(%ebp),%edx
movl 12(%ebp),%eax
movl %ebp,%esp
popl %ebp
movl 8(%edx,%eax,4),%ecx
addl 4(%edx,%eax,4),%ecx
addl 12(%edx,%eax,4),%ecx
movl %ecx,%eax
ret
4.0.3 and 4.1.2 generate:
foo:
pushl %ebp
movl %esp, %ebp
movl 12(%ebp), %edx
sall $2, %edx
addl 8(%ebp), %edx
movl 8(%edx), %eax
addl 4(%edx), %eax
addl 12(%edx), %eax
leave
ret
4.2.0 generates
foo:
pushl %ebp
movl %esp, %ebp
movl 12(%ebp), %ecx
movl 8(%ebp), %edx
popl %ebp
sall $2, %ecx
movl 4(%ecx,%edx), %eax
addl 8(%ecx,%edx), %eax
addl 12(%ecx,%edx), %eax
ret
while mainline generates
foo:
pushl %ebp
movl %esp, %ebp
movl 12(%ebp), %ecx
movl 8(%ebp), %edx
pushl %ebx
leal 0(,%ecx,4), %ebx
movl 8(%ebx,%edx), %eax
addl 4(%edx,%ecx,4), %eax
addl 12(%ebx,%edx), %eax
popl %ebx
popl %ebp
ret
64bit variants of all of the above create
foo:
.LFB2:
leal 1(%rsi), %edx
leal 2(%rsi), %eax
addl $3, %esi
movl (%rdi,%rax,4), %eax
addl (%rdi,%rdx,4), %eax
addl (%rdi,%rsi,4), %eax
ret
Tree dumps (of mainline) 64bit vs. 32bit are:
return (*(p + (long unsigned int) (i + 2) * 4) + *(p + (long unsigned int) (i + 1) * 4)) + *(p + (long unsigned int) (i + 3) * 4);
unsigned int D.1660;
D.1660 = i * 4;
return (*(p + (D.1660 + 8)) + *(p + (i + 1) * 4)) + *(p + (D.1660 + 12));
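One possible reading of why the two dumps differ (an assumption, not something stated in the thread): on the 64-bit target the unsigned index is widened to pointer width only after i + 1 has been computed in 32 bits, so (i + 1) * 4 cannot be rewritten as i * 4 + 4 without changing the result when i + 1 wraps. On the 32-bit target index and offset have the same width, and the two forms agree modulo 2^32. A small C sketch of that arithmetic, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of p[i + 1] when the 32-bit sum is widened to 64 bits
   afterwards, mirroring the 64-bit dump. */
static uint64_t offset_widened(uint32_t i)
{
    return (uint64_t)(uint32_t)(i + 1u) * 4u;
}

/* Hypothetical distributed form i * 4 + 4, computed in 64 bits. */
static uint64_t offset_distributed64(uint32_t i)
{
    return (uint64_t)i * 4u + 4u;
}

/* The same two forms when offsets are only 32 bits wide. */
static uint32_t offset_factored32(uint32_t i)    { return (i + 1u) * 4u; }
static uint32_t offset_distributed32(uint32_t i) { return i * 4u + 4u; }

static void check_offsets(void)
{
    uint32_t wrap = UINT32_MAX;  /* i + 1 wraps to 0 in 32 bits */

    /* 32-bit offsets: both forms agree even when the index wraps. */
    assert(offset_factored32(wrap) == offset_distributed32(wrap));
    /* 64-bit offsets: the forms differ once i + 1 wraps. */
    assert(offset_widened(wrap) != offset_distributed64(wrap));
    /* Without wrap-around they agree. */
    assert(offset_widened(5) == offset_distributed64(5));
}
```

Calling check_offsets() from a main() exercises the three cases.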
--
rguenth at gcc dot gnu dot org changed:
What                 |Removed             |Added
----------------------------------------------------------------------------
Status               |UNCONFIRMED         |NEW
Ever Confirmed       |0                   |1
GCC host triplet     |i686-pc-linux-gnu   |
GCC target triplet   |                    |i686-*-*
Keywords             |                    |missed-optimization
Last reconfirmed date|0000-00-00 00:00:00 |2007-07-09 13:39:51
Target Milestone     |---                 |4.3.0
* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: rguenth at gcc dot gnu dot org @ 2007-07-09 13:41 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from rguenth at gcc dot gnu dot org 2007-07-09 13:41 -------
Created an attachment (id=13875)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13875&action=view)
patch
With the proposed patch mainline generates again
foo:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %ecx
movl 12(%ebp), %edx
popl %ebp
movl 8(%ecx,%edx,4), %eax
addl 4(%ecx,%edx,4), %eax
addl 12(%ecx,%edx,4), %eax
ret
and
return (*(p + (i + 2) * 4) + *(p + (i + 1) * 4)) + *(p + (i + 3) * 4);
* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: rguenth at gcc dot gnu dot org @ 2007-07-09 13:42 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from rguenth at gcc dot gnu dot org 2007-07-09 13:42 -------
Mine.
--
rguenth at gcc dot gnu dot org changed:
What                 |Removed                          |Added
----------------------------------------------------------------------------
AssignedTo           |unassigned at gcc dot gnu dot org|rguenth at gcc dot gnu dot org
Status               |NEW                              |ASSIGNED
Last reconfirmed date|2007-07-09 13:39:51              |2007-07-09 13:42:24
* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: zippel at gcc dot gnu dot org @ 2007-07-09 14:40 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from zippel at gcc dot gnu dot org 2007-07-09 14:40 -------
IMHO something like this should be generated:
p2 = p + (i * 4);
return (*(p2 + 4) + *(p2 + 8) + *(p2 + 12));
Right now not even the (i*4) expression is removed from the last instruction
anymore.
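The form suggested above can be written by hand in C roughly as follows. This is only a sketch: foo_hoisted and p2 are illustrative names, the constant displacements assume that i + 3 does not wrap, and it says nothing about what any GCC version emits for it.

```c
#include <assert.h>
#include <stddef.h>

/* Original test case from the report. */
static int foo(int *p, unsigned int i)
{
    return p[i + 1] + p[i + 2] + p[i + 3];
}

/* Hand-hoisted variant: compute p + i * sizeof(int) once, then add
   constant byte displacements (4, 8 and 12 when int is 4 bytes). */
static int foo_hoisted(int *p, unsigned int i)
{
    const char *p2 = (const char *)p + (size_t)i * sizeof(int);

    return *(const int *)(p2 + 1 * sizeof(int))
         + *(const int *)(p2 + 2 * sizeof(int))
         + *(const int *)(p2 + 3 * sizeof(int));
}

static void check_hoisted(void)
{
    int data[8] = {10, 20, 30, 40, 50, 60, 70, 80};

    /* Compare both versions for every index that stays in bounds. */
    for (unsigned int i = 0; i + 3 < 8; i++)
        assert(foo(data, i) == foo_hoisted(data, i));
}
```

Both functions read the same three elements; the only difference is where the scaled index is computed.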
* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: rguenth at gcc dot gnu dot org @ 2007-07-09 15:01 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from rguenth at gcc dot gnu dot org 2007-07-09 15:00 -------
Note that we don't hoist the i*4 or the addition to p to help addressing mode
selection which likes to see the "whole" address.
Of course it shouldn't matter in which form we see the addresses and the same
code should be generated for all, still canonicalization to the same form
makes a difference.
In fact, canonicalizing to
unsigned int D.1656;
<bb 2>:
D.1656 = i * 4;
return (*(p + (D.1656 + 8)) + *(p + (D.1656 + 4))) + *(p + (D.1656 + 12));
as you suggest creates worse assembly (look at the extra shift)
foo:
pushl %ebp
movl %esp, %ebp
movl 12(%ebp), %ecx
movl 8(%ebp), %edx
popl %ebp
sall $2, %ecx
movl 8(%ecx,%edx), %eax
addl 4(%ecx,%edx), %eax
addl 12(%ecx,%edx), %eax
ret
in fact the above shows that the proper fix would be in the backend (if
there is anything to fix) and making sure we consistently canonicalize
is good enough. Consider the related testcase
int foo(int *p, short *q, char *r, unsigned int i)
{
return p[i + 1] + q[i + 1] + r[i + 1];
}
which in one case is canonicalized to
unsigned int D.1658;
<bb 2>:
D.1658 = i + 1;
return ((int) *(r + D.1658) + *(p + D.1658 * 4)) + (int) *(q + D.1658 * 2);
in the other to
return ((int) *(r + (i + 1)) + *(p + (i * 4 + 4))) + (int) *(q + (i * 2 + 2));
so there is no form that is clearly better to canonicalize to. But the
correct "form" depends on the context (whether it is profitable to either
CSE i * 4 or i + 1).
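The two canonical forms of the mixed-type test case can be spelled out by hand in C. The sketch below uses made-up function names and explicit byte offsets; it assumes i + 1 does not wrap and only shows that both forms read the same elements, not which one a given target prefers.

```c
#include <assert.h>

/* Form 1: share the index t = i + 1 and scale it per element size. */
static int sum_shared_index(int *p, short *q, char *r, unsigned int i)
{
    unsigned int t = i + 1;

    return (int)*(r + t)
         + *(int *)((char *)p + t * sizeof(int))
         + (int)*(short *)((char *)q + t * sizeof(short));
}

/* Form 2: distribute the multiplication, i * size + size. */
static int sum_distributed(int *p, short *q, char *r, unsigned int i)
{
    return (int)*(r + (i + 1))
         + *(int *)((char *)p + (i * sizeof(int) + sizeof(int)))
         + (int)*(short *)((char *)q + (i * sizeof(short) + sizeof(short)));
}

static void check_mixed(void)
{
    int   p[4] = {1, 2, 3, 4};
    short q[4] = {10, 20, 30, 40};
    char  r[4] = {5, 6, 7, 8};

    for (unsigned int i = 0; i + 1 < 4; i++)
        assert(sum_shared_index(p, q, r, i) == sum_distributed(p, q, r, i));
}
```

Form 1 has one shared addition and three differently scaled uses; form 2 has no shared subexpression at all, since the scale differs per array.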
* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: zippel at gcc dot gnu dot org @ 2007-07-09 15:28 UTC (permalink / raw)
To: gcc-bugs
------- Comment #6 from zippel at gcc dot gnu dot org 2007-07-09 15:27 -------
(In reply to comment #5)
> as you suggest creates worse assembly (look at the extra shift)
>
> foo:
> pushl %ebp
> movl %esp, %ebp
> movl 12(%ebp), %ecx
> movl 8(%ebp), %edx
> popl %ebp
> sall $2, %ecx
> movl 8(%ecx,%edx), %eax
> addl 4(%ecx,%edx), %eax
> addl 12(%ecx,%edx), %eax
> ret
The cost of this is dependent on the target, so IMO the shift could be
propagated back into the address at RTL level.
> so there is no form that is clearly better to canonicalize to.
Your example is rather artificial and depends on (i + x) * y being completely
eliminated. My main point is still that such expressions are far more difficult
to translate into proper address operations.
For generating addresses, targeting a form of (i * x) + y is clearly better.
* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: rguenther at suse dot de @ 2007-07-09 15:37 UTC (permalink / raw)
To: gcc-bugs
------- Comment #7 from rguenther at suse dot de 2007-07-09 15:37 -------
Subject: Re: [4.3 regression] inefficient pointer expression
On Mon, 9 Jul 2007, zippel at gcc dot gnu dot org wrote:
> (In reply to comment #5)
> > as you suggest creates worse assembly (look at the extra shift)
> >
> > foo:
> > pushl %ebp
> > movl %esp, %ebp
> > movl 12(%ebp), %ecx
> > movl 8(%ebp), %edx
> > popl %ebp
> > sall $2, %ecx
> > movl 8(%ecx,%edx), %eax
> > addl 4(%ecx,%edx), %eax
> > addl 12(%ecx,%edx), %eax
> > ret
>
> The cost of this is dependent on the target, so IMO the shift could be
> propagated back into the address at RTL level.
Or the other way around.
> > so there is no form that is clearly better to canonicalize to.
>
> Your example is rather artificial and depends on (i + x) * y being completely
> eliminated. My main point is still that such expressions are far more difficult
> to translate into proper address operations.
> For generating addresses, targeting a form of (i * x) + y is clearly better.
My example is not artificial. It is quite common to have loops that
stream data from one type to another. Also it's not 'clearly' better to
me.
Anyway, fold () is certainly not able to decide this and both value
numbering and re-association can improve the IL by taking into account
context.
On the backend side we have the fwprop pass which is supposed to do
addressing mode selection and the backend which is supposed to provide
accurate costs for them.
Richard.
* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: zippel at gcc dot gnu dot org @ 2007-07-09 17:42 UTC (permalink / raw)
To: gcc-bugs
------- Comment #8 from zippel at gcc dot gnu dot org 2007-07-09 17:42 -------
(In reply to comment #7)
> On the backend side we have the fwprop pass which is supposed to do
> addressing mode selection and the backend which is supposed to provide
> accurate costs for them.
Let's take your proposed form:
return (*(p + (i + 2) * 4) + *(p + (i + 1) * 4)) + *(p + (i + 3) * 4);
After initial RTL expansion something like this is generated:
t1 = i + 2;
r = *(p + t1 * 4);
t2 = i + 1;
r += *(p + t2 * 4);
t3 = i + 3;
r += *(p + t3 * 4);
The problem now is that this is not generated until combine:
r = *(p + i * 4 + 8);
r += *(p + i * 4 + 4);
r += *(p + i * 4 + 12);
and at this point it's too late. It's easy to blame the back end, but at this
point it's IMHO unrealistic to undo this mess at RTL level. The proper address
form should be generated as early as possible, so that at the time RTL is
generated, it's as close as possible to the final form.
* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: rguenth at gcc dot gnu dot org @ 2007-07-09 19:42 UTC (permalink / raw)
To: gcc-bugs
------- Comment #9 from rguenth at gcc dot gnu dot org 2007-07-09 19:42 -------
Subject: Bug 32698
Author: rguenth
Date: Mon Jul 9 19:41:54 2007
New Revision: 126494
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126494
Log:
2007-07-09 Richard Guenther <rguenther@suse.de>
PR middle-end/32698
* fold-const.c (fold_plusminus_mult_expr): Move constant
arguments second to allow decomposing.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/fold-const.c
* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: pinskia at gcc dot gnu dot org @ 2007-07-18 6:00 UTC (permalink / raw)
To: gcc-bugs
------- Comment #10 from pinskia at gcc dot gnu dot org 2007-07-18 06:00 -------
Fixed.
--
pinskia at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: zippel at gcc dot gnu dot org @ 2007-07-18 12:56 UTC (permalink / raw)
To: gcc-bugs
------- Comment #11 from zippel at gcc dot gnu dot org 2007-07-18 12:56 -------
This bug is not fixed yet.
Current gcc still generates:
return (*(p + (i + 2) * 4) + *(p + (i + 1) * 4)) + *(p + (i + 3) * 4);
1. it still fails to extract the common expression at tree level.
2. it generates inefficient initial RTL, so later optimizers have little chance
to do something useful with it.
--
zippel at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |REOPENED
Resolution|FIXED |
* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: rguenth at gcc dot gnu dot org @ 2007-07-19 16:50 UTC (permalink / raw)
To: gcc-bugs
------- Comment #12 from rguenth at gcc dot gnu dot org 2007-07-19 16:50 -------
The IL representation is not a thing to complain about. Do you have a testcase
that shows a missed optimization instead of one that has IL that is different
from what you expect?
--
rguenth at gcc dot gnu dot org changed:
What                 |Removed                          |Added
----------------------------------------------------------------------------
AssignedTo           |rguenth at gcc dot gnu dot org   |unassigned at gcc dot gnu dot org
Status               |REOPENED                         |NEW
* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: zippel at gcc dot gnu dot org @ 2007-07-19 18:27 UTC (permalink / raw)
To: gcc-bugs
------- Comment #13 from zippel at gcc dot gnu dot org 2007-07-19 18:27 -------
The initial test case is part of the missed optimization. For example current
stable Debian gcc (4.1.2 20061115) produces code like this:
movl 4(%esp), %eax
movl 8(%esp), %edx
leal (%eax,%edx,4), %edx
movl 4(%edx), %ecx
movl 8(%edx), %eax
addl %ecx, %eax
movl 12(%edx), %ecx
addl %ecx, %eax
ret
This has some unnecessary moves, but it shows the basic idea, so with the
moves eliminated it would be:
movl 4(%esp), %eax
movl 8(%esp), %edx
leal (%eax,%edx,4), %edx
movl 4(%edx), %eax
addl 8(%edx), %eax
addl 12(%edx), %eax
ret
In terms of code size this is identical to:
movl 4(%esp), %ecx
movl 8(%esp), %edx
movl 8(%ecx,%edx,4), %eax
addl 4(%ecx,%edx,4), %eax
addl 12(%ecx,%edx,4), %eax
ret
But it depends now on the target which instruction sequence is better.
The problem with the new canonical form is that, AFAICT, it has become
practically very difficult to generate the optimal sequence based on
instruction costs.
The older gcc produces this IL before RTL generation:
D.1283 = (int *) (i * 4) + p;
return *(D.1283 + 4B) + *(D.1283 + 8B) + *(D.1283 + 12B);
which produces far better RTL for the optimizers to work with.
BTW this problem is not limited to pointer expression, since the lea
instruction is used in other expressions as well.
Let's take this example:
void f(unsigned int *p, unsigned int a)
{
p[0] = a * 4 + 4;
p[1] = a * 4 + 8;
p[2] = a * 4 + 12;
}
gcc 4.1 (the version used above) produces this:
D.1281 = a * 4;
*p = D.1281 + 4;
*(p + 4B) = D.1281 + 8;
*(p + 8B) = D.1281 + 12;
movl 8(%esp), %eax
movl 4(%esp), %ecx
sall $2, %eax
leal 4(%eax), %edx
movl %edx, (%ecx)
leal 8(%eax), %edx
addl $12, %eax
movl %edx, 4(%ecx)
movl %eax, 8(%ecx)
ret
gcc 4.2 produces this:
*p = (a + 1) * 4;
D.1545 = a * 4;
*(p + 4B) = D.1545 + 8;
*(p + 8B) = D.1545 + 12;
movl 8(%esp), %eax
movl 4(%esp), %ecx
leal 4(,%eax,4), %edx
sall $2, %eax
movl %edx, (%ecx)
leal 8(%eax), %edx
addl $12, %eax
movl %edx, 4(%ecx)
movl %eax, 8(%ecx)
ret
So 4.2 already produces slightly worse code.
Current gcc finally produces:
*p = (a + 1) * 4;
*(p + 4) = (a + 2) * 4;
*(p + 8) = (a + 3) * 4;
movl 8(%esp), %eax
movl 4(%esp), %ecx
leal 4(,%eax,4), %edx
movl %edx, (%ecx)
leal 8(,%eax,4), %edx
leal 12(,%eax,4), %eax
movl %edx, 4(%ecx)
movl %eax, 8(%ecx)
ret
This has now the largest code size of all versions.
This new canonical form IMHO clearly conflicts with what is expected at RTL
level, so I don't understand why it's so important to use this one. Could you
maybe explain the reason behind this choice?
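The two forms of f discussed above compute identical values, because unsigned arithmetic in C wraps modulo 2^N; the disagreement is only about which form leaves later passes a shareable subexpression. A minimal sketch, with function names made up for illustration and no claim about what any GCC version emits:

```c
#include <assert.h>
#include <limits.h>

/* Distributed form: the three stores share the product a * 4. */
static void f_distributed(unsigned int *p, unsigned int a)
{
    unsigned int t = a * 4;

    p[0] = t + 4;
    p[1] = t + 8;
    p[2] = t + 12;
}

/* Factored form: each store has its own sum and its own multiply. */
static void f_factored(unsigned int *p, unsigned int a)
{
    p[0] = (a + 1) * 4;
    p[1] = (a + 2) * 4;
    p[2] = (a + 3) * 4;
}

static void check_f(void)
{
    const unsigned int samples[] = {
        0u, 1u, 7u, UINT_MAX / 4, UINT_MAX - 1, UINT_MAX
    };

    for (unsigned int k = 0; k < sizeof samples / sizeof samples[0]; k++) {
        unsigned int x[3], y[3];

        f_distributed(x, samples[k]);
        f_factored(y, samples[k]);
        /* Equal for every input, including ones that wrap. */
        for (int j = 0; j < 3; j++)
            assert(x[j] == y[j]);
    }
}
```

On a target where a scaled index plus displacement fits in one lea, the factored form costs one instruction per store; the distributed form needs the shift once and then plain additions.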
* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: rguenth at gcc dot gnu dot org @ 2007-07-20 11:48 UTC (permalink / raw)
To: gcc-bugs
------- Comment #14 from rguenth at gcc dot gnu dot org 2007-07-20 11:48 -------
For current mainline I get (-O2)
foo:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %ecx
movl 12(%ebp), %edx
popl %ebp
movl 8(%ecx,%edx,4), %eax
addl 4(%ecx,%edx,4), %eax
addl 12(%ecx,%edx,4), %eax
ret
Can you be more specific about what processor tuning you are using? That is,
can you provide the output of adding -v to the gcc commandline that produces
the mainline results in the last comment?
* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: zippel at gcc dot gnu dot org @ 2007-07-20 11:58 UTC (permalink / raw)
To: gcc-bugs
------- Comment #15 from zippel at gcc dot gnu dot org 2007-07-20 11:58 -------
In the examples I used -fomit-frame-pointer.
* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: rguenth at gcc dot gnu dot org @ 2007-07-20 16:06 UTC (permalink / raw)
To: gcc-bugs
------- Comment #16 from rguenth at gcc dot gnu dot org 2007-07-20 16:05 -------
That makes it
foo:
movl 4(%esp), %ecx
movl 8(%esp), %edx
movl 8(%ecx,%edx,4), %eax
addl 4(%ecx,%edx,4), %eax
addl 12(%ecx,%edx,4), %eax
ret
for me. Still different from what you claim.
* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: zippel at gcc dot gnu dot org @ 2007-07-20 16:21 UTC (permalink / raw)
To: gcc-bugs
------- Comment #17 from zippel at gcc dot gnu dot org 2007-07-20 16:21 -------
Which claim?
It's exactly the third code example in comment #13
* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: rguenth at gcc dot gnu dot org @ 2007-07-20 16:35 UTC (permalink / raw)
To: gcc-bugs
------- Comment #18 from rguenth at gcc dot gnu dot org 2007-07-20 16:35 -------
I mean
<cite>
Current gcc finally produces:
*p = (a + 1) * 4;
*(p + 4) = (a + 2) * 4;
*(p + 8) = (a + 3) * 4;
movl 8(%esp), %eax
movl 4(%esp), %ecx
leal 4(,%eax,4), %edx
movl %edx, (%ecx)
leal 8(,%eax,4), %edx
leal 12(,%eax,4), %eax
movl %edx, 4(%ecx)
movl %eax, 8(%ecx)
ret
This has now the largest code size of all versions.
This new canonical form IMHO clearly conflicts with what is expected at RTL
level, so I don't understand why it's so important to use this one. Could you
maybe explain the reason behind this choice?
</cite>
which suggests that current trunk is worse with the patch. Or am I confused
and you are happy with the code generated by current trunk for the original
testcase?
* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: zippel at gcc dot gnu dot org @ 2007-07-20 17:06 UTC (permalink / raw)
To: gcc-bugs
------- Comment #19 from zippel at gcc dot gnu dot org 2007-07-20 17:06 -------
There is another small source example in between, which is used to produce all
code examples following it. :)
* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: rguenth at gcc dot gnu dot org @ 2007-07-20 17:22 UTC (permalink / raw)
To: gcc-bugs
------- Comment #20 from rguenth at gcc dot gnu dot org 2007-07-20 17:22 -------
Whoops ;) I missed that.
I have a counter-example that is better with the patch in the same way yours
is worse with it.
void f(unsigned int *p, unsigned int a)
{
p[0] = a * 4 + 4;
p[1] = a * 8 + 8;
p[2] = a * 12 + 12;
}
As I said, the fold canonicalization is just canonicalization, the code
generation has to be fixed elsewhere.
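For this counter-example the factored form exposes a common a + 1, whereas the distributed form as written shares nothing between the three stores. A hand-written sketch with hypothetical function names; that the IL after the patch looks like g_factored is an assumption, not something shown in the thread:

```c
#include <assert.h>
#include <limits.h>

/* As written in the counter-example: no common subexpression. */
static void g_as_written(unsigned int *p, unsigned int a)
{
    p[0] = a * 4 + 4;
    p[1] = a * 8 + 8;
    p[2] = a * 12 + 12;
}

/* Factored form: all three stores share t = a + 1. */
static void g_factored(unsigned int *p, unsigned int a)
{
    unsigned int t = a + 1;

    p[0] = t * 4;
    p[1] = t * 8;
    p[2] = t * 12;
}

static void check_g(void)
{
    const unsigned int samples[] = {0u, 1u, 5u, UINT_MAX - 1, UINT_MAX};

    for (unsigned int k = 0; k < sizeof samples / sizeof samples[0]; k++) {
        unsigned int x[3], y[3];

        g_as_written(x, samples[k]);
        g_factored(y, samples[k]);
        /* Unsigned wrap-around keeps both forms equal for every input. */
        for (int j = 0; j < 3; j++)
            assert(x[j] == y[j]);
    }
}
```

Which of the two is cheaper depends on the surrounding code, which is the point being made about context.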
* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: mmitchel at gcc dot gnu dot org @ 2007-08-10 0:44 UTC (permalink / raw)
To: gcc-bugs
------- Comment #21 from mmitchel at gcc dot gnu dot org 2007-08-10 00:44 -------
I'm not convinced that there's anything to fix here; it sounds like we've just
traded which of two examples is better. If there is a bug here, please add a
note explaining, and upgrade back to P3.
--
mmitchel at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Priority|P3 |P4
* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: steven at gcc dot gnu dot org @ 2007-11-19 9:01 UTC (permalink / raw)
To: gcc-bugs
------- Comment #22 from steven at gcc dot gnu dot org 2007-11-19 09:01 -------
"...and then he said: ``well, that's nice and all, but, ehm, where's the
bug?''"
--
steven at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |WAITING
* [Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
From: rguenth at gcc dot gnu dot org @ 2008-01-13 15:18 UTC (permalink / raw)
To: gcc-bugs
------- Comment #23 from rguenth at gcc dot gnu dot org 2008-01-13 14:56 -------
Closing as fixed.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|WAITING |RESOLVED
Resolution| |FIXED