* [Bug c/29256] [4.2.0 performance regression]
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
@ 2006-09-27 18:30 ` edmar at freescale dot com
2006-09-27 18:30 ` edmar at freescale dot com
` (41 subsequent siblings)
42 siblings, 0 replies; 44+ messages in thread
From: edmar at freescale dot com @ 2006-09-27 18:30 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from edmar at freescale dot com 2006-09-27 18:30 -------
Created an attachment (id=12340)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=12340&action=view)
Result of 4.1 compilation
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
^ permalink raw reply [flat|nested] 44+ messages in thread
* [Bug c/29256] [4.2.0 performance regression]
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
2006-09-27 18:30 ` [Bug c/29256] " edmar at freescale dot com
@ 2006-09-27 18:30 ` edmar at freescale dot com
2006-09-28 3:00 ` [Bug middle-end/29256] [4.2 regression] loop unrolling performance regression pinskia at gcc dot gnu dot org
` (40 subsequent siblings)
42 siblings, 0 replies; 44+ messages in thread
From: edmar at freescale dot com @ 2006-09-27 18:30 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from edmar at freescale dot com 2006-09-27 18:30 -------
Created an attachment (id=12341)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=12341&action=view)
Result of 4.2 compilation
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.2 regression] loop unrolling performance regression
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
2006-09-27 18:30 ` [Bug c/29256] " edmar at freescale dot com
2006-09-27 18:30 ` edmar at freescale dot com
@ 2006-09-28 3:00 ` pinskia at gcc dot gnu dot org
2006-09-28 11:08 ` rguenth at gcc dot gnu dot org
` (39 subsequent siblings)
42 siblings, 0 replies; 44+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2006-09-28 3:00 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from pinskia at gcc dot gnu dot org 2006-09-28 02:59 -------
This is a generic regression; x86 has the same problem with the code. Even
with -Ddouble=int, we have the same problem.
--
pinskia at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |pinskia at gcc dot gnu dot
| |org
Status|UNCONFIRMED |NEW
Ever Confirmed|0 |1
GCC host triplet|x86_64-unknown-linux-gnu |
GCC target triplet|powerpc-unknown-linux-gnuspe|
Keywords| |missed-optimization
Known to fail| |4.2.0
Known to work| |4.1.2
Last reconfirmed|0000-00-00 00:00:00 |2006-09-28 02:59:57
date| |
Summary|[4.2 regression] performance|[4.2 regression] loop
|regression with double on |unrolling performance
|SPE2 |regression
Target Milestone|--- |4.2.0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.2 regression] loop unrolling performance regression
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
` (2 preceding siblings ...)
2006-09-28 3:00 ` [Bug middle-end/29256] [4.2 regression] loop unrolling performance regression pinskia at gcc dot gnu dot org
@ 2006-09-28 11:08 ` rguenth at gcc dot gnu dot org
2006-09-28 11:34 ` rakdver at gcc dot gnu dot org
` (38 subsequent siblings)
42 siblings, 0 replies; 44+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2006-09-28 11:08 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from rguenth at gcc dot gnu dot org 2006-09-28 11:08 -------
On x86_64 4.2 decides to unroll 9 times while on 4.1 it unrolls 8 times. This
is a code-size regression, but other than that? The 4.2 version runs slightly
faster than the 4.1 version, though the difference may be in the noise.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rguenth at gcc dot gnu dot
| |org, rakdver at gcc dot gnu
| |dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.2 regression] loop unrolling performance regression
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
` (3 preceding siblings ...)
2006-09-28 11:08 ` rguenth at gcc dot gnu dot org
@ 2006-09-28 11:34 ` rakdver at gcc dot gnu dot org
2006-09-28 13:47 ` pinskia at gcc dot gnu dot org
` (37 subsequent siblings)
42 siblings, 0 replies; 44+ messages in thread
From: rakdver at gcc dot gnu dot org @ 2006-09-28 11:34 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from rakdver at gcc dot gnu dot org 2006-09-28 11:34 -------
(In reply to comment #4)
> On x86_64 4.2 decides to unroll 9 times while on 4.1 it unrolls 8 times. This
> is a code-size regression, but other than that? The 4.2 version runs slightly
> faster than the 4.1 version, though the difference may be in the noise.
Choosing 9 instead of 8 looks weird, though :-). The reason is the following:
jump threading in the vrp2 pass peels one iteration of the loop. With this
change, unrolling by a factor of 9 creates smaller code (only one extra
iteration needs to be peeled to make the number of iterations divisible by 9,
while one would need to peel 7 more iterations to make it divisible by 8).
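The arithmetic behind that choice can be checked with a small sketch. The
trip count of 55 used in the usage note is an assumption, picked only because
it matches the remainders described above (1 modulo 9, 7 modulo 8); the thread
does not state the real count. peel_count is a hypothetical helper, not a GCC
function.

```c
/* Sketch only: peel_count is a hypothetical helper, not part of GCC. */

/* Number of iterations that have to run outside the unrolled body so that
   the remaining trip count is a multiple of the unroll factor. */
unsigned peel_count(unsigned trip_count, unsigned factor)
{
    return trip_count % factor;
}
```

With an assumed remaining trip count of 55, peel_count(55, 9) gives 1 and
peel_count(55, 8) gives 7, which is the 1-versus-7 trade-off described above.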
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.2 regression] loop unrolling performance regression
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
` (4 preceding siblings ...)
2006-09-28 11:34 ` rakdver at gcc dot gnu dot org
@ 2006-09-28 13:47 ` pinskia at gcc dot gnu dot org
2006-09-28 14:03 ` rguenth at gcc dot gnu dot org
` (36 subsequent siblings)
42 siblings, 0 replies; 44+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2006-09-28 13:47 UTC (permalink / raw)
To: gcc-bugs
------- Comment #6 from pinskia at gcc dot gnu dot org 2006-09-28 13:47 -------
(In reply to comment #4)
> On x86_64 4.2 decides to unroll 9 times while on 4.1 it unrolls 8 times. This
> is a code-size regression, but other than that? The 4.2 version runs slightly
> faster than the 4.1 version, though the difference may be in the noise.
No, no, no, Edmar and I are not complaining about how many times it unrolled,
but about the use of the index addressing mode instead of the offset addressing
mode for the stores, and about the extra adds.
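To make the complaint concrete, here is a sketch in C of what offset
addressing buys in an unrolled loop. The function copy_unrolled4 is a
hypothetical stand-in (the attached test case is not reproduced in this
thread), unrolled by hand by a factor of 4. Every access in the body is one
base pointer plus a compile-time constant, which is what a register plus
displacement ("offset") addressing mode encodes directly, with no extra add
per access.

```c
#include <stddef.h>

/* Hypothetical stand-in for the reported loop, unrolled by 4 by hand.
   Assumes n is a multiple of 4. */
void copy_unrolled4(int *dst, const int *src, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        const int *s = src + i;   /* one base per array per iteration */
        int *d = dst + i;
        d[0] = s[0];              /* base + 0 */
        d[1] = s[1];              /* base + 1 element (constant offset) */
        d[2] = s[2];              /* base + 2 elements */
        d[3] = s[3];              /* base + 3 elements */
    }
}
```

Whether a given compiler version actually emits offset addressing for such a
loop is exactly what this report is about, so the sketch shows the intent, not
a guaranteed code shape.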
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.2 regression] loop unrolling performance regression
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
` (5 preceding siblings ...)
2006-09-28 13:47 ` pinskia at gcc dot gnu dot org
@ 2006-09-28 14:03 ` rguenth at gcc dot gnu dot org
2006-09-28 14:08 ` pinskia at gcc dot gnu dot org
` (35 subsequent siblings)
42 siblings, 0 replies; 44+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2006-09-28 14:03 UTC (permalink / raw)
To: gcc-bugs
------- Comment #7 from rguenth at gcc dot gnu dot org 2006-09-28 14:02 -------
Oh, but those do not happen on x86_64. So this is a target issue really.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.2 regression] loop unrolling performance regression
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
` (6 preceding siblings ...)
2006-09-28 14:03 ` rguenth at gcc dot gnu dot org
@ 2006-09-28 14:08 ` pinskia at gcc dot gnu dot org
2006-09-28 14:11 ` rguenth at gcc dot gnu dot org
` (34 subsequent siblings)
42 siblings, 0 replies; 44+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2006-09-28 14:08 UTC (permalink / raw)
To: gcc-bugs
------- Comment #8 from pinskia at gcc dot gnu dot org 2006-09-28 14:08 -------
D.1563 = -&a;
MEM[base: (int *) D.1563 + &c, index: D.1562] = MEM[base: D.1562];
WTFFFFFFF
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.2 regression] loop unrolling performance regression
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
` (7 preceding siblings ...)
2006-09-28 14:08 ` pinskia at gcc dot gnu dot org
@ 2006-09-28 14:11 ` rguenth at gcc dot gnu dot org
2006-09-28 14:15 ` rakdver at gcc dot gnu dot org
` (33 subsequent siblings)
42 siblings, 0 replies; 44+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2006-09-28 14:11 UTC (permalink / raw)
To: gcc-bugs
------- Comment #9 from rguenth at gcc dot gnu dot org 2006-09-28 14:11 -------
Oh, didn't I fix this? See PR26726.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.2 regression] loop unrolling performance regression
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
` (8 preceding siblings ...)
2006-09-28 14:11 ` rguenth at gcc dot gnu dot org
@ 2006-09-28 14:15 ` rakdver at gcc dot gnu dot org
2006-09-28 14:16 ` [Bug middle-end/29256] [4.2 regression] loop " pinskia at gcc dot gnu dot org
` (32 subsequent siblings)
42 siblings, 0 replies; 44+ messages in thread
From: rakdver at gcc dot gnu dot org @ 2006-09-28 14:15 UTC (permalink / raw)
To: gcc-bugs
------- Comment #10 from rakdver at gcc dot gnu dot org 2006-09-28 14:15 -------
(In reply to comment #8)
> D.1563 = -&a;
> MEM[base: (int *) D.1563 + &c, index: D.1562] = MEM[base: D.1562];
>
> WTFFFFFFF
ivopts are having fun :-) On the other hand, this is one of several equally
cheap ways to express the code, and it should not affect the creation of
offsetted modes on RTL. So although this is indeed somewhat curious (in fact a
bug, for reasons unrelated to the problem covered by this PR), it is not
the cause of this problem.
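For readers puzzled by the MEM expression in comment #8, the following C
sketch shows roughly what a single-induction-variable form means: the store
address is the load pointer plus the loop-invariant distance between the two
arrays. The name copy_one_iv is hypothetical, and the integer round trip
through uintptr_t is an assumption that holds on flat-address-space targets;
it is a source-level illustration, not what ivopts literally generates.

```c
#include <stddef.h>
#include <stdint.h>

/* One induction variable (p). The store address is computed as
   p + (dst - src), with the difference hoisted out of the loop. */
void copy_one_iv(int *dst, const int *src, size_t n)
{
    uintptr_t delta = (uintptr_t)dst - (uintptr_t)src;  /* loop invariant */
    const int *end = src + n;

    for (const int *p = src; p != end; p++)
        *(int *)((uintptr_t)p + delta) = *p;            /* base + index store */
}
```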
On x86, tree optimizers seem to do just fine, producing
MEM[symbol: c, index: D.1569, step: 8B] = MEM[symbol: a, index: D.1569, step: 8B];
However, on RTL, we fail to create an offsetted version of this addressing mode
after unrolling.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.2 regression] loop performance regression
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
` (9 preceding siblings ...)
2006-09-28 14:15 ` rakdver at gcc dot gnu dot org
@ 2006-09-28 14:16 ` pinskia at gcc dot gnu dot org
2006-09-28 14:21 ` rakdver at gcc dot gnu dot org
` (31 subsequent siblings)
42 siblings, 0 replies; 44+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2006-09-28 14:16 UTC (permalink / raw)
To: gcc-bugs
------- Comment #11 from pinskia at gcc dot gnu dot org 2006-09-28 14:16 -------
(In reply to comment #9)
> Oh, didn't I fix this? See PR26726.
This is unrelated to that, as the trees produced are defined but just look
weird, and really the IV selection is messed up. It should have chosen two IVs
for this loop instead of just one.
Actually, unrolling is not needed to produce the bad code:
.L2:
lwz 0,0(9)
stwx 0,11,9
addi 9,9,4
bdnz .L2
I bet a beer that loop.c actually fixed this crap up before.
--
pinskia at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|[4.2 regression] loop |[4.2 regression] loop
|unrolling performance |performance regression
|regression |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.2 regression] loop performance regression
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
` (10 preceding siblings ...)
2006-09-28 14:16 ` [Bug middle-end/29256] [4.2 regression] loop " pinskia at gcc dot gnu dot org
@ 2006-09-28 14:21 ` rakdver at gcc dot gnu dot org
2006-09-28 14:35 ` pinskia at gcc dot gnu dot org
` (30 subsequent siblings)
42 siblings, 0 replies; 44+ messages in thread
From: rakdver at gcc dot gnu dot org @ 2006-09-28 14:21 UTC (permalink / raw)
To: gcc-bugs
------- Comment #12 from rakdver at gcc dot gnu dot org 2006-09-28 14:21 -------
(In reply to comment #11)
> (In reply to comment #9)
> > Oh, didn't I fix this? See PR26726.
> This is unrelated to that, as the trees produced are defined but just look
> weird, and really the IV selection is messed up. It should have chosen two IVs
> for this loop instead of just one.
> Actually, unrolling is not needed to produce the bad code:
> .L2:
> lwz 0,0(9)
> stwx 0,11,9
> addi 9,9,4
> bdnz .L2
> I bet a beer that loop.c actually fixed this crap up before.
I am bad at reading ppc assembler; could you please explain what exactly is
wrong with the code you present?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.2 regression] loop performance regression
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
` (11 preceding siblings ...)
2006-09-28 14:21 ` rakdver at gcc dot gnu dot org
@ 2006-09-28 14:35 ` pinskia at gcc dot gnu dot org
2006-09-28 14:40 ` rakdver at gcc dot gnu dot org
` (29 subsequent siblings)
42 siblings, 0 replies; 44+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2006-09-28 14:35 UTC (permalink / raw)
To: gcc-bugs
------- Comment #13 from pinskia at gcc dot gnu dot org 2006-09-28 14:34 -------
(In reply to comment #12)
> (In reply to comment #11)
> > (In reply to comment #9)
> > > Oh, didn't I fix this? See PR26726.
> > This is unrelated to that, as the trees produced are defined but just look
> > weird, and really the IV selection is messed up. It should have chosen two IVs
> > for this loop instead of just one.
> > Actually, unrolling is not needed to produce the bad code:
> > .L2:
> > lwz 0,0(9)
> > stwx 0,11,9
> > addi 9,9,4
> > bdnz .L2
> > I bet a beer that loop.c actually fixed this crap up before.
>
> I am bad at reading ppc assembler; could you please explain what exactly is
> wrong with the code you present?
For one, there are still two adds there (one of them is just implicit),
so why not do the loop as:
.L2:
lwz r0,0(r9)
stw r0,0(r11)
addi r9,r9,4
addi r11,r11,4
bdnz .L2
Or:
.L2:
lwzx r0,r9,r12
stwx r0,r11,r12
addi r12,r12,4
bdnz .L2
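For readers less familiar with PowerPC assembly, the two alternatives above
correspond roughly to the following C loops. The function names are
hypothetical, and a compiler is free to lower either source form to a
different instruction sequence; the comments only describe the induction
variables each form suggests.

```c
#include <stddef.h>

/* First alternative: two pointer induction variables, each advanced by
   one element per iteration (two adds, both accesses at offset 0). */
void copy_two_pointers(int *dst, const int *src, size_t n)
{
    for (; n != 0; n--)
        *dst++ = *src++;
}

/* Second alternative: one shared index added to two loop-invariant
   bases (one add, both accesses are base + index). */
void copy_shared_index(int *dst, const int *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```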
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.2 regression] loop performance regression
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
` (12 preceding siblings ...)
2006-09-28 14:35 ` pinskia at gcc dot gnu dot org
@ 2006-09-28 14:40 ` rakdver at gcc dot gnu dot org
2006-09-28 14:44 ` rakdver at gcc dot gnu dot org
` (28 subsequent siblings)
42 siblings, 0 replies; 44+ messages in thread
From: rakdver at gcc dot gnu dot org @ 2006-09-28 14:40 UTC (permalink / raw)
To: gcc-bugs
------- Comment #14 from rakdver at gcc dot gnu dot org 2006-09-28 14:40 -------
> > > for this loop instead of just one.
> > > Actually, unrolling is not needed to produce the bad code:
> > > .L2:
> > > lwz 0,0(9)
> > > stwx 0,11,9
> > > addi 9,9,4
> > > bdnz .L2
> > > I bet a beer that loop.c actually fixed this crap up before.
> >
> > I am bad at reading ppc assembler; could you please explain what exactly is
> > wrong with the code you present?
>
> For one, there are still two adds there (one of them is just implicit),
> so why not do the loop as:
there is only one add, as far as I can see.
> .L2:
> lwz r0,0(r9)
> stw r0,0(r11)
> addi r9,r9,4
> addi r11,r11,4
> bdnz .L2
Otoh, this seems worse to me (one more add).
> Or:
> .L2:
> lwzx r0,r9,r12
> stwx r0,r11,r12
> addi r12,r12,4
> bdnz .L2
Yes, this would be about the same. Still, ivopts chose one of the best
possible ways, so I do not see what you are complaining about so much.
The unrolled case is something different -- of course we should use offsetted
modes there.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.2 regression] loop performance regression
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
` (13 preceding siblings ...)
2006-09-28 14:40 ` rakdver at gcc dot gnu dot org
@ 2006-09-28 14:44 ` rakdver at gcc dot gnu dot org
2006-09-28 14:50 ` rakdver at gcc dot gnu dot org
` (27 subsequent siblings)
42 siblings, 0 replies; 44+ messages in thread
From: rakdver at gcc dot gnu dot org @ 2006-09-28 14:44 UTC (permalink / raw)
To: gcc-bugs
--
rakdver at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
AssignedTo|unassigned at gcc dot gnu |rakdver at gcc dot gnu dot
|dot org |org
Status|NEW |ASSIGNED
Last reconfirmed|2006-09-28 02:59:57 |2006-09-28 14:44:02
date| |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.2 regression] loop performance regression
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
` (14 preceding siblings ...)
2006-09-28 14:44 ` rakdver at gcc dot gnu dot org
@ 2006-09-28 14:50 ` rakdver at gcc dot gnu dot org
2006-09-28 23:48 ` rakdver at gcc dot gnu dot org
` (26 subsequent siblings)
42 siblings, 0 replies; 44+ messages in thread
From: rakdver at gcc dot gnu dot org @ 2006-09-28 14:50 UTC (permalink / raw)
To: gcc-bugs
------- Comment #15 from rakdver at gcc dot gnu dot org 2006-09-28 14:50 -------
(In reply to comment #8)
> D.1563 = -&a;
> MEM[base: (int *) D.1563 + &c, index: D.1562] = MEM[base: D.1562];
>
> WTFFFFFFF
This is caused by my change to ivopts in
http://gcc.gnu.org/ml/gcc-patches/2006-08/msg00198.html.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.2 regression] loop performance regression
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
` (15 preceding siblings ...)
2006-09-28 14:50 ` rakdver at gcc dot gnu dot org
@ 2006-09-28 23:48 ` rakdver at gcc dot gnu dot org
2006-10-01 23:04 ` mmitchel at gcc dot gnu dot org
` (25 subsequent siblings)
42 siblings, 0 replies; 44+ messages in thread
From: rakdver at gcc dot gnu dot org @ 2006-09-28 23:48 UTC (permalink / raw)
To: gcc-bugs
------- Comment #16 from rakdver at gcc dot gnu dot org 2006-09-28 23:48 -------
Patch for the induction variable selection (that however does not fix the
problem with offsetted addressing modes not being created after unrolling):
http://gcc.gnu.org/ml/gcc-patches/2006-09/msg01308.html
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.2 regression] loop performance regression
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
` (16 preceding siblings ...)
2006-09-28 23:48 ` rakdver at gcc dot gnu dot org
@ 2006-10-01 23:04 ` mmitchel at gcc dot gnu dot org
2006-10-06 19:32 ` rakdver at gcc dot gnu dot org
` (24 subsequent siblings)
42 siblings, 0 replies; 44+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2006-10-01 23:04 UTC (permalink / raw)
To: gcc-bugs
--
mmitchel at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Priority|P3 |P2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.2 regression] loop performance regression
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
` (17 preceding siblings ...)
2006-10-01 23:04 ` mmitchel at gcc dot gnu dot org
@ 2006-10-06 19:32 ` rakdver at gcc dot gnu dot org
2007-05-14 21:37 ` [Bug middle-end/29256] [4.2/4.3 " mmitchel at gcc dot gnu dot org
` (23 subsequent siblings)
42 siblings, 0 replies; 44+ messages in thread
From: rakdver at gcc dot gnu dot org @ 2006-10-06 19:32 UTC (permalink / raw)
To: gcc-bugs
------- Comment #17 from rakdver at gcc dot gnu dot org 2006-10-06 19:32 -------
Subject: Bug 29256
Author: rakdver
Date: Fri Oct 6 19:32:04 2006
New Revision: 117513
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=117513
Log:
PR middle-end/29256
* tree-ssa-loop-ivopts.c (determine_base_object): Handle pointers
casted to integer type.
(get_address_cost): Decrease cost of [symbol + index] addressing modes
if they are significantly more expensive than [reg + index] ones.
* gcc.dg/tree-ssa/loop-19.c: New test.
Added:
trunk/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-ssa-loop-ivopts.c
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.2/4.3 regression] loop performance regression
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
` (18 preceding siblings ...)
2006-10-06 19:32 ` rakdver at gcc dot gnu dot org
@ 2007-05-14 21:37 ` mmitchel at gcc dot gnu dot org
2007-07-20 3:50 ` mmitchel at gcc dot gnu dot org
` (22 subsequent siblings)
42 siblings, 0 replies; 44+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2007-05-14 21:37 UTC (permalink / raw)
To: gcc-bugs
------- Comment #18 from mmitchel at gcc dot gnu dot org 2007-05-14 22:26 -------
Will not be fixed in 4.2.0; retargeting at 4.2.1.
--
mmitchel at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.2.0 |4.2.1
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.2/4.3 regression] loop performance regression
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
` (19 preceding siblings ...)
2007-05-14 21:37 ` [Bug middle-end/29256] [4.2/4.3 " mmitchel at gcc dot gnu dot org
@ 2007-07-20 3:50 ` mmitchel at gcc dot gnu dot org
2007-10-09 19:25 ` mmitchel at gcc dot gnu dot org
` (21 subsequent siblings)
42 siblings, 0 replies; 44+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2007-07-20 3:50 UTC (permalink / raw)
To: gcc-bugs
--
mmitchel at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.2.1 |4.2.2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.2/4.3 regression] loop performance regression
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
` (20 preceding siblings ...)
2007-07-20 3:50 ` mmitchel at gcc dot gnu dot org
@ 2007-10-09 19:25 ` mmitchel at gcc dot gnu dot org
2008-01-11 5:16 ` ghazi at gcc dot gnu dot org
` (20 subsequent siblings)
42 siblings, 0 replies; 44+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2007-10-09 19:25 UTC (permalink / raw)
To: gcc-bugs
------- Comment #19 from mmitchel at gcc dot gnu dot org 2007-10-09 19:21 -------
Change target milestone to 4.2.3, as 4.2.2 has been released.
--
mmitchel at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.2.2 |4.2.3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.2/4.3 regression] loop performance regression
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
` (21 preceding siblings ...)
2007-10-09 19:25 ` mmitchel at gcc dot gnu dot org
@ 2008-01-11 5:16 ` ghazi at gcc dot gnu dot org
2008-01-11 6:04 ` rakdver at kam dot mff dot cuni dot cz
` (19 subsequent siblings)
42 siblings, 0 replies; 44+ messages in thread
From: ghazi at gcc dot gnu dot org @ 2008-01-11 5:16 UTC (permalink / raw)
To: gcc-bugs
------- Comment #20 from ghazi at gcc dot gnu dot org 2008-01-11 04:21 -------
Is the testcase gcc.dg/tree-ssa/loop-19.c supposed to work with -fpic/-fPIC?
I'm getting failures on mainline and 4.2 with x86_64, and only on 4.2 with
i686. Mainline i686 seems to work though.
Fails:
http://gcc.gnu.org/ml/gcc-testresults/2008-01/msg00383.html
http://gcc.gnu.org/ml/gcc-testresults/2008-01/msg00365.html
http://gcc.gnu.org/ml/gcc-testresults/2008-01/msg00410.html
works:
http://gcc.gnu.org/ml/gcc-testresults/2008-01/msg00366.html
Thanks,
--Kaveh
--
ghazi at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |ghazi at gcc dot gnu dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.2/4.3 regression] loop performance regression
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
` (22 preceding siblings ...)
2008-01-11 5:16 ` ghazi at gcc dot gnu dot org
@ 2008-01-11 6:04 ` rakdver at kam dot mff dot cuni dot cz
2008-01-12 8:43 ` ghazi at gcc dot gnu dot org
` (18 subsequent siblings)
42 siblings, 0 replies; 44+ messages in thread
From: rakdver at kam dot mff dot cuni dot cz @ 2008-01-11 6:04 UTC (permalink / raw)
To: gcc-bugs
------- Comment #21 from rakdver at kam dot mff dot cuni dot cz 2008-01-11 04:44 -------
Subject: Re: [4.2/4.3 regression] loop performance regression
> Is the testcase gcc.dg/tree-ssa/loop-19.c supposed to work with -fpic/-fPIC?
not necessarily; with -fpic, both memory accesses are fully
strength-reduced, which seems to be the correct thing to do; however,
> I'm getting failures on mainline and 4.2 with x86_64, and only on 4.2 with
> i686. Mainline i686 seems to work though.
the difference in the costs of the two variants is so small that you
will basically get one of them at random. This test is not intended to
be run with -fpic.
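One way to keep a test from running under -fpic/-fPIC in the GCC testsuite is
a dg-skip-if directive. The fragment below is only a sketch of that mechanism
and is not necessarily what the eventual testsuite patch does; the function
body is a hypothetical placeholder, not the contents of loop-19.c.

```c
/* { dg-do compile } */
/* { dg-skip-if "outcome depends on near-equal costs" { *-*-* } { "-fpic" "-fPIC" } { "" } } */

/* Placeholder body (assumption): not the real gcc.dg/tree-ssa/loop-19.c. */
void copy_ints(int *dst, const int *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[i];
}
```

The directives are ordinary C comments, so the file still compiles outside
the testsuite harness; only DejaGnu interprets them.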
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.2/4.3 regression] loop performance regression
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
` (23 preceding siblings ...)
2008-01-11 6:04 ` rakdver at kam dot mff dot cuni dot cz
@ 2008-01-12 8:43 ` ghazi at gcc dot gnu dot org
2008-02-01 17:00 ` jsm28 at gcc dot gnu dot org
` (17 subsequent siblings)
42 siblings, 0 replies; 44+ messages in thread
From: ghazi at gcc dot gnu dot org @ 2008-01-12 8:43 UTC (permalink / raw)
To: gcc-bugs
------- Comment #22 from ghazi at gcc dot gnu dot org 2008-01-12 08:35 -------
Thanks, testsuite patch posted here:
http://gcc.gnu.org/ml/gcc-patches/2008-01/msg00530.html
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.2/4.3 regression] loop performance regression
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
` (24 preceding siblings ...)
2008-01-12 8:43 ` ghazi at gcc dot gnu dot org
@ 2008-02-01 17:00 ` jsm28 at gcc dot gnu dot org
2008-05-19 20:35 ` [Bug middle-end/29256] [4.2/4.3/4.4 " jsm28 at gcc dot gnu dot org
` (16 subsequent siblings)
42 siblings, 0 replies; 44+ messages in thread
From: jsm28 at gcc dot gnu dot org @ 2008-02-01 17:00 UTC (permalink / raw)
To: gcc-bugs
------- Comment #23 from jsm28 at gcc dot gnu dot org 2008-02-01 16:53 -------
4.2.3 is being released now, changing milestones of open bugs to 4.2.4.
--
jsm28 at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.2.3 |4.2.4
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.2/4.3/4.4 regression] loop performance regression
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
` (25 preceding siblings ...)
2008-02-01 17:00 ` jsm28 at gcc dot gnu dot org
@ 2008-05-19 20:35 ` jsm28 at gcc dot gnu dot org
2008-08-06 6:58 ` cnstar9988 at gmail dot com
` (15 subsequent siblings)
42 siblings, 0 replies; 44+ messages in thread
From: jsm28 at gcc dot gnu dot org @ 2008-05-19 20:35 UTC (permalink / raw)
To: gcc-bugs
------- Comment #24 from jsm28 at gcc dot gnu dot org 2008-05-19 20:22 -------
4.2.4 is being released, changing milestones to 4.2.5.
--
jsm28 at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.2.4 |4.2.5
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.2/4.3/4.4 regression] loop performance regression
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
` (26 preceding siblings ...)
2008-05-19 20:35 ` [Bug middle-end/29256] [4.2/4.3/4.4 " jsm28 at gcc dot gnu dot org
@ 2008-08-06 6:58 ` cnstar9988 at gmail dot com
2008-08-06 21:52 ` rakdver at gcc dot gnu dot org
` (14 subsequent siblings)
42 siblings, 0 replies; 44+ messages in thread
From: cnstar9988 at gmail dot com @ 2008-08-06 6:58 UTC (permalink / raw)
To: gcc-bugs
------- Comment #25 from cnstar9988 at gmail dot com 2008-08-06 06:57 -------
ping...
Can this be fixed before 4.3.2? thanks.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.2/4.3/4.4 regression] loop performance regression
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
` (27 preceding siblings ...)
2008-08-06 6:58 ` cnstar9988 at gmail dot com
@ 2008-08-06 21:52 ` rakdver at gcc dot gnu dot org
2008-08-06 21:55 ` rakdver at gcc dot gnu dot org
` (13 subsequent siblings)
42 siblings, 0 replies; 44+ messages in thread
From: rakdver at gcc dot gnu dot org @ 2008-08-06 21:52 UTC (permalink / raw)
To: gcc-bugs
------- Comment #26 from rakdver at gcc dot gnu dot org 2008-08-06 21:51 -------
Created an attachment (id=16036)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16036&action=view)
possible fix
One place where this can be fixed is fwprop (something like the attached
patch). I am not sure whether it is the right place, though; maybe cse should
be handling this?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.2/4.3/4.4 regression] loop performance regression
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
` (28 preceding siblings ...)
2008-08-06 21:52 ` rakdver at gcc dot gnu dot org
@ 2008-08-06 21:55 ` rakdver at gcc dot gnu dot org
2008-08-06 21:57 ` rakdver at gcc dot gnu dot org
` (12 subsequent siblings)
42 siblings, 0 replies; 44+ messages in thread
From: rakdver at gcc dot gnu dot org @ 2008-08-06 21:55 UTC (permalink / raw)
To: gcc-bugs
--
rakdver at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |bonzini at gnu dot org
AssignedTo|rakdver at gcc dot gnu dot |unassigned at gcc dot gnu
|org |dot org
Status|ASSIGNED |NEW
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.2/4.3/4.4 regression] loop performance regression
2006-09-27 18:29 [Bug c/29256] New: [4.2.0 performance regression] edmar at freescale dot com
` (29 preceding siblings ...)
2008-08-06 21:55 ` rakdver at gcc dot gnu dot org
@ 2008-08-06 21:57 ` rakdver at gcc dot gnu dot org
2008-08-07 5:03 ` bonzini at gnu dot org
` (11 subsequent siblings)
42 siblings, 0 replies; 44+ messages in thread
From: rakdver at gcc dot gnu dot org @ 2008-08-06 21:57 UTC (permalink / raw)
To: gcc-bugs
------- Comment #27 from rakdver at gcc dot gnu dot org 2008-08-06 21:56 -------
(In reply to comment #26)
> Created an attachment (id=16036)
> --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16036&action=view)
> possible fix
>
> One place where this can be fixed is fwprop (something like the attached
> patch). I am not sure whether it is the right place, though; maybe cse should
> be handling this?
Also, I only checked the problem on x86; most likely, something different is
happening on ppc.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.2/4.3/4.4 regression] loop performance regression
From: bonzini at gnu dot org @ 2008-08-07 5:03 UTC (permalink / raw)
To: gcc-bugs
------- Comment #28 from bonzini at gnu dot org 2008-08-07 05:01 -------
fwprop seems the right place to do that indeed.
The only thing is, I wonder whether you need to "find a location to add the
constant": it could be enough to do
*x = simplify_gen_binary (PLUS, Pmode, *x, cst_to_add);
because simplify_plus_minus should have machinery to do what you are doing
already. Indeed I wonder if this code shouldn't go in simplify_plus_minus so
that propagate_rtx would call it automatically.
Also when you compute cst_to_add you can use the _const_ version of the
simplification routine.
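
To illustrate the kind of folding under discussion, here is a toy model in C of
combining an address of the form (base + c1) with an extra constant c2 into
base + (c1 + c2). It is only a sketch: the `toy_addr` type and the `toy_*`
helpers are hypothetical and do not use or mirror GCC's RTL representation or
the real behaviour of simplify_gen_binary.

```c
#include <assert.h>

/* Hypothetical toy address: base register number plus constant offset.
   This is NOT GCC's rtx; it only models reg+const addressing. */
struct toy_addr {
    int base_reg;   /* register holding the base */
    long offset;    /* constant displacement */
};

/* Fold an extra constant into the displacement, so that
   (base + c1) + c2 becomes base + (c1 + c2) with no separate add. */
static struct toy_addr toy_add_const(struct toy_addr a, long cst_to_add)
{
    a.offset += cst_to_add;
    return a;
}

/* Apply the fold in two steps and check it matches one combined fold. */
static int toy_fold_is_associative(struct toy_addr a, long c1, long c2)
{
    struct toy_addr two_steps = toy_add_const(toy_add_const(a, c1), c2);
    struct toy_addr one_step = toy_add_const(a, c1 + c2);
    assert(two_steps.base_reg == one_step.base_reg);
    return two_steps.offset == one_step.offset;
}
```

In this model, folding 8 into the address 0(9) yields 8(9), so the access needs
no separate add instruction for the constant.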
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.2/4.3/4.4 regression] loop performance regression
From: janis at gcc dot gnu dot org @ 2008-10-29 17:05 UTC (permalink / raw)
To: gcc-bugs
------- Comment #29 from janis at gcc dot gnu dot org 2008-10-29 17:05 -------
On powerpc-linux the submitter's testcase gets better code with the patch from
comment #17, but the same testcase with the loop starting with 1 instead of
zero gets worse code. From the 4.1 branch with -O2:
.L2:
lfd 0,0(9)
addi 9,9,8
stfd 0,0(11)
addi 11,11,8
bdnz .L2
From the 4.2 branch:
.L2:
add 9,10,0
add 11,10,8
addi 10,10,8
lfd 0,8(9)
stfd 0,8(11)
bdnz .L2
Code from current mainline is the same except for the order of the addi and lfd
instructions.
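
For readers without the attachment, the two loop shapes being compared can be
sketched in C roughly as follows. This is a hypothetical reconstruction that
assumes the testcase is a plain array copy, like the loop visible in the
gcc.dg/tree-ssa/loop-19.c dump elsewhere in this thread; the function name
`copy_from` and the size `N` are illustrative and not taken from the
attachment.

```c
#include <assert.h>
#include <stddef.h>

#define N 1024  /* illustrative size; an assumption, not from the report */

static double a[N], c[N];

/* Copy a[start..N-1] into c[start..N-1].
   start == 0 models the original loop; start == 1 models the variant
   whose first iteration touches element 1 instead of element 0. */
static void copy_from(size_t start)
{
    assert(start <= N);
    for (size_t i = start; i < N; i++)
        c[i] = a[i];
}
```

To reproduce the comparison one would presumably write the start value as a
literal constant in the loop header and inspect the -O2 -S output; the runtime
parameter here only keeps the sketch short.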
--
janis at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |janis at gcc dot gnu dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.3/4.4/4.5 regression] loop performance regression
From: jsm28 at gcc dot gnu dot org @ 2009-03-31 19:46 UTC (permalink / raw)
To: gcc-bugs
------- Comment #30 from jsm28 at gcc dot gnu dot org 2009-03-31 19:45 -------
Closing 4.2 branch.
--
jsm28 at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|[4.2/4.3/4.4/4.5 regression]|[4.3/4.4/4.5 regression]
|loop performance regression |loop performance regression
Target Milestone|4.2.5 |4.3.4
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.3/4.4/4.5 regression] loop performance regression
From: rguenth at gcc dot gnu dot org @ 2009-08-04 12:35 UTC (permalink / raw)
To: gcc-bugs
------- Comment #31 from rguenth at gcc dot gnu dot org 2009-08-04 12:27 -------
GCC 4.3.4 is being released, adjusting target milestone.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.3.4 |4.3.5
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.3/4.4/4.5/4.6 regression] loop performance regression
From: rguenth at gcc dot gnu dot org @ 2010-05-22 18:23 UTC (permalink / raw)
To: gcc-bugs
------- Comment #32 from rguenth at gcc dot gnu dot org 2010-05-22 18:11 -------
GCC 4.3.5 is being released, adjusting target milestone.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.3.5 |4.3.6
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.3/4.4/4.5/4.6 regression] loop performance regression
From: pthaugen at gcc dot gnu dot org @ 2010-07-16 19:14 UTC (permalink / raw)
To: gcc-bugs
------- Comment #33 from pthaugen at gcc dot gnu dot org 2010-07-16 19:14 -------
gcc.dg/tree-ssa/loop-19.c started failing on powerpc with -m64 between 7/5 and
7/7. The tree dump now looks like the following:
<bb 2>:
ivtmp.10_12 = (long unsigned int) &a[-1];
ivtmp.16_15 = (long unsigned int) &c[-1];
a.21_18 = (long unsigned int) &a;
D.2035_19 = a.21_18 + 15999992;
<bb 3>:
# ivtmp.10_9 = PHI <ivtmp.10_5(3), ivtmp.10_12(2)>
# ivtmp.16_13 = PHI <ivtmp.16_14(3), ivtmp.16_15(2)>
ivtmp.10_5 = ivtmp.10_9 + 8;
D.2032_16 = (void *) ivtmp.10_5;
D.2007_3 = MEM[(double[2000000] *)D.2032_16];
ivtmp.16_14 = ivtmp.16_13 + 8;
D.2033_17 = (void *) ivtmp.16_14;
MEM[(double[2000000] *)D.2033_17] = D.2007_3;
if (ivtmp.10_5 != D.2035_19)
goto <bb 3>;
else
goto <bb 4>;
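
Read as C, the dump above corresponds roughly to the sketch below: both
induction variables are kept as integers that start one element before the
arrays and are bumped by 8 before each access. The function name
`copy_like_dump` and its parameters are hypothetical, `uintptr_t` stands in for
`long unsigned int`, and the sketch assumes a flat address space with
sizeof(double) == 8. Integer arithmetic is used on purpose, since forming
&a[-1] as a pointer would be undefined in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy n doubles from a to c the way the ivopts dump does it. */
static void copy_like_dump(const double *a, double *c, size_t n)
{
    assert(n > 0);  /* the dumped loop body always runs at least once */

    uintptr_t iv_a = (uintptr_t)a - sizeof(double);            /* ivtmp.10_12 */
    uintptr_t iv_c = (uintptr_t)c - sizeof(double);            /* ivtmp.16_15 */
    uintptr_t end  = (uintptr_t)a + (n - 1) * sizeof(double);  /* D.2035_19 */

    do {
        iv_a += sizeof(double);                /* ivtmp.10_5 */
        double tmp = *(const double *)iv_a;    /* D.2007_3 */
        iv_c += sizeof(double);                /* ivtmp.16_14 */
        *(double *)iv_c = tmp;
    } while (iv_a != end);
}
```

With n = 2000000 the `end` value is the start of `a` plus 15999992 bytes, which
matches the constant in D.2035_19.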
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.3/4.4/4.5/4.6 regression] loop performance regression
From: rguenth at gcc dot gnu dot org @ 2010-07-18 17:49 UTC (permalink / raw)
To: gcc-bugs
------- Comment #34 from rguenth at gcc dot gnu dot org 2010-07-18 17:49 -------
In particular we are now back to generating the very bogus
ivtmp.10_12 = (long unsigned int) &a[-1];
ivtmp.16_15 = (long unsigned int) &c[-1];
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |sandra at codesourcery dot
| |com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.3/4.4/4.5/4.6 regression] loop performance regression
From: sandra at codesourcery dot com @ 2010-07-21 4:16 UTC (permalink / raw)
To: gcc-bugs
------- Comment #35 from sandra at codesourcery dot com 2010-07-21 04:16 -------
Created an attachment (id=21274)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21274&action=view)
-fdump-tree-ivopts-details output from r161843
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.3/4.4/4.5/4.6 regression] loop performance regression
From: sandra at codesourcery dot com @ 2010-07-21 4:17 UTC (permalink / raw)
To: gcc-bugs
------- Comment #36 from sandra at codesourcery dot com 2010-07-21 04:16 -------
Created an attachment (id=21275)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21275&action=view)
-fdump-tree-ivopts-details output from r161844
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.3/4.4/4.5/4.6 regression] loop performance regression
From: sandra at codesourcery dot com @ 2010-07-21 4:21 UTC (permalink / raw)
To: gcc-bugs
------- Comment #37 from sandra at codesourcery dot com 2010-07-21 04:21 -------
It seems like the change was introduced by my patch for PR42505 in r161844.
But, it is correctly choosing the lower-cost candidate set -- the problem is in
the cost model, which was unchanged from r161843. Take a look at the
"Use-candidate costs" section of the dump. Those costs with negative values
(like -7) look very suspicious to me.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.3/4.4/4.5/4.6 regression] loop performance regression
From: sandra at codesourcery dot com @ 2010-07-21 16:10 UTC (permalink / raw)
To: gcc-bugs
------- Comment #38 from sandra at codesourcery dot com 2010-07-21 16:08 -------
On reading the code again, I think the -7 is coming from the can_autoinc case
in determine_use_iv_cost_address. I also think it is correct to prefer
autoinc. E.g., here's the generated code for the loop in r161843:
.L2:
addi 11,8,9216
ldx 0,10,9
stdx 0,11,9
addi 9,9,8
bdnz .L2
and in r161844:
.L2:
ldu 0,8(11)
stdu 0,8(9)
bdnz .L2
I'm no expert on powerpc architecture, but 3 instructions versus 5 looks like a
win to me. Bit-rotten test case?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
* [Bug middle-end/29256] [4.3/4.4/4.5/4.6 regression] loop performance regression
From: pthaugen at gcc dot gnu dot org @ 2010-07-21 21:51 UTC (permalink / raw)
To: gcc-bugs
------- Comment #39 from pthaugen at gcc dot gnu dot org 2010-07-21 21:51 -------
(In reply to comment #38)
>
> .L2:
> addi 11,8,9216
> ldx 0,10,9
> stdx 0,11,9
> addi 9,9,8
> bdnz .L2
>
> and in r161844:
>
> .L2:
> ldu 0,8(11)
> stdu 0,8(9)
> bdnz .L2
>
> I'm no expert on powerpc architecture, but 3 instructions versus 5 looks like a
> win to me. Bit-rotten test case?
>
The 'addi 11,8,9216' in the first loop is invariant and should be hoisted out
of the loop. Separate issue?
As for the issue of indexed ld/st+addi vs. update-form ld/st: the update forms
are cracked into ld/st+addi, which imposes a scheduling restriction on them
(cracked insns start a dispatch group). It may not make any difference in this
simple loop, but indexed ld/st+addi may have better scheduling opportunities
if there were more insns in the loop.
This testcase also appears to depend on the -mcpu value. With -mcpu=power7
the testcase passes (although there is still the issue of the invariant addi in
the loop). And if I change to -m32, then it only fails for -mcpu=power6.
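
The hoisting point can be pictured at the source level with the sketch below.
The register roles are inferred from the r161843 assembly quoted above (r8 plus
9216 gives the store base in r11, r10 is the load base, r9 is the shared index);
the function names are hypothetical, and the sketch only shows what "hoisting
the invariant" means, not how any GCC pass performs it.

```c
#include <assert.h>
#include <stddef.h>

#define INVARIANT_BYTES 9216  /* the constant from 'addi 11,8,9216' */
#define INVARIANT_ELEMS (INVARIANT_BYTES / sizeof(double))

/* Shape of the r161843 loop: the store base is recomputed every iteration. */
static void copy_recompute_base(double *r8, const double *r10, size_t n)
{
    assert(r8 != NULL && r10 != NULL);
    for (size_t i = 0; i < n; i++) {
        double *r11 = r8 + INVARIANT_ELEMS;  /* loop-invariant: addi 11,8,9216 */
        r11[i] = r10[i];                     /* indexed load and store */
    }
}

/* Same loop with the invariant address computed once before the loop. */
static void copy_hoisted_base(double *r8, const double *r10, size_t n)
{
    assert(r8 != NULL && r10 != NULL);
    double *r11 = r8 + INVARIANT_ELEMS;      /* hoisted out of the loop */
    for (size_t i = 0; i < n; i++)
        r11[i] = r10[i];
}
```

Both functions store the same values to the same locations; the second simply
removes one address computation from each iteration.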
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256