public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/43174] New: Teaching SCEV about ADDR_EXPR causes regression
@ 2010-02-25 14:43 amonakov at gcc dot gnu dot org
2010-02-25 15:26 ` [Bug tree-optimization/43174] " steven at gcc dot gnu dot org
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: amonakov at gcc dot gnu dot org @ 2010-02-25 14:43 UTC (permalink / raw)
To: gcc-bugs
With the patch from http://gcc.gnu.org/ml/gcc-patches/2010-02/msg00668.html
IVopts begins to create IVs for expressions like &a0[i][j][0]. This may cause
regressions in stack usage and code size (and possibly speed). Test case:
/* ---8<--- */
enum {N=123};
int a0[N][N][N], a1[N][N][N], a2[N][N][N], a3[N][N][N],
    a4[N][N][N], a5[N][N][N], a6[N][N][N], a7[N][N][N];
int foo() {
  int i, j, k, s = 0;
  for (i = 0; i < N; i++)
    for (j = 0; j < N; j++)
      for (k = 0; k < N; k++) {
        s += a0[i][j][k]; s += a1[i][j][k]; s += a2[i][j][k]; s += a3[i][j][k];
        s += a4[i][j][k]; s += a5[i][j][k]; s += a6[i][j][k]; s += a7[i][j][k];
      }
  return s;
}
/* ---8<--- */
Without the patch, IVopts produces one IV for the j loop and 8 IVs for the k
loop. With the patch, IVopts additionally produces 8 IVs for the j loop (with a
123*4 increment), 4 of which live on the stack (on x86-64, -O2).
Creation of IVs that live on the stack is likely due to inexact register
pressure estimation in IVopts.
However, it would be nice if IVopts could notice that it is cheaper to take the
final value of the inner loop IVs (e.g. &a0[i][j][k]) instead of incrementing
the IV holding &a0[i][j][0] by 123*4. That would decrease register pressure and
allow perfect code to be generated for the test case.
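To illustrate the two strategies by hand, here is a minimal C sketch. It is not
IVopts output. The names flat, fill, row_stride_bytes, sum_separate_iv and
sum_final_value are made up for the illustration, a single flat array stands in
for one of the a0..a7 arrays so that the pointer walk stays within one object,
and N is shrunk to 8 to keep it small.

```c
#include <stddef.h>

enum { N = 8 };                 /* hypothetical small size; the report uses 123 */
static int flat[N * N * N];     /* stands in for a0[N][N][N], stored flat */

/* Fill the array with 0, 1, 2, ... so the sum is easy to check. */
static void fill(void)
{
    for (int i = 0; i < N * N * N; i++)
        flat[i] = i;
}

/* Byte distance between &a[i][j][0] and &a[i][j+1][0]; 123*4 in the report. */
static size_t row_stride_bytes(void)
{
    return N * sizeof(int);
}

/* Variant 1: the j loop keeps its own pointer IV and steps it by one row.
   'row' must stay live across the whole k loop. */
static long sum_separate_iv(void)
{
    long s = 0;
    for (int i = 0; i < N; i++) {
        const int *row = flat + (size_t)i * N * N;  /* plays &a[i][0][0] */
        for (int j = 0; j < N; j++) {
            const int *p = row;                     /* inner (k loop) IV */
            for (int k = 0; k < N; k++)
                s += *p++;
            row += N;                               /* + row_stride_bytes() */
        }
    }
    return s;
}

/* Variant 2: reuse the final value of the inner IV.  After the k loop,
   p already equals the address of the next row, so no second pointer
   has to be kept live. */
static long sum_final_value(void)
{
    long s = 0;
    const int *p = flat;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                s += *p++;
    return s;
}
```

Both variants visit the elements in the same order and return the same sum;
the second one needs one pointer per array instead of two.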
--
Summary: Teaching SCEV about ADDR_EXPR causes regression
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: amonakov at gcc dot gnu dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43174
* [Bug tree-optimization/43174] Teaching SCEV about ADDR_EXPR causes regression
From: steven at gcc dot gnu dot org @ 2010-02-25 15:26 UTC (permalink / raw)
To: gcc-bugs
--
steven at gcc dot gnu dot org changed:
                What|Removed                     |Added
----------------------------------------------------------------------------
                  CC|                            |rakdver at gcc dot gnu dot org
              Status|UNCONFIRMED                 |NEW
      Ever Confirmed|0                           |1
Last reconfirmed date|0000-00-00 00:00:00        |2010-02-25 15:26:20
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43174
* [Bug tree-optimization/43174] Teaching SCEV about ADDR_EXPR causes regression
From: amonakov at gcc dot gnu dot org @ 2010-03-01 17:44 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from amonakov at gcc dot gnu dot org 2010-03-01 17:43 -------
Created an attachment (id=20001)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20001&action=view)
Simplify increments in IVopts using final values of inner loop IVs
A quick & dirty attempt to implement register pressure reduction in outer loops
by using final values of inner loop IVs. Currently, given
for (i = 0; i < N; i++)
for (j = 0; j < N; j++)
s += a[i][j];
we generate something like
<bb1>
L1:
  s.0 = PHI(0, s.2)
  i.0 = PHI(0, i.1)
  ivtmp.0 = &a[i.0][0]
<bb2>
L2:
  s.1 = PHI(s.0, s.2)
  j.0 = PHI(122, j.1)
  ivtmp.1 = PHI(ivtmp.0, ivtmp.2)
  s.2 = s.1 + MEM(ivtmp.1)
  ivtmp.2 = ivtmp.1 + 4
  j.1 = j.0 - 1
  if (j.1 >= 0) goto L2
<bb3>
  i.1 = i.0 + 1
  if (i.1 <= 122) goto L1
This, together with the patch mentioned in the previous comment, makes it
possible to generate:
  ivtmp.0 = &a[0][0]
<bb1>
L1:
  s.0 = PHI(0, s.2)
  i.0 = PHI(122, i.1)
  ivtmp.1 = PHI(ivtmp.0, ivtmp.4)
<bb2>
L2:
  s.1 = PHI(s.0, s.2)
  j.0 = PHI(122, j.1)
  ivtmp.2 = PHI(ivtmp.1, ivtmp.3)
  s.2 = s.1 + MEM(ivtmp.2)
  ivtmp.3 = ivtmp.2 + 4
  j.1 = j.0 - 1
  if (j.1 >= 0) goto L2
<bb3>
  ivtmp.4 = ivtmp.3 // would be ivtmp.4 = ivtmp.1 + stride
  i.1 = i.0 - 1
  if (i.1 >= 0) goto L1
The improvement is that ivtmp.1 is not live across the inner loop.
The approach is to store the final values of IVs in a hashtable, mapping the
SSA_NAME of the initial value in the preheader to an aff_tree with the final
value, and then to try to replace increments of new IVs with uses of IVs from
inner loops (currently I have only implemented a brute-force loop over all IV
uses to find a useful entry in that hashtable).
Does this make sense and sound acceptable?
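For readers who want the lookup spelled out, here is a rough, self-contained C
sketch of the idea. It does not use GCC internals: struct affine is a
hypothetical stand-in for aff_tree (a base name plus a byte offset), strings
stand in for SSA_NAMEs, a small linear table stands in for the hashtable, and
all function names are invented for the illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an affine expression: base + constant offset. */
struct affine {
    const char *base;    /* name the expression is relative to */
    long offset;         /* constant byte offset from that name */
};

/* One entry: initial value of an inner-loop IV -> its final value. */
struct final_entry {
    const char *init_name;
    struct affine final_value;
};

enum { MAX_ENTRIES = 16 };

/* Linear table standing in for the hashtable described above. */
struct final_table {
    struct final_entry entries[MAX_ENTRIES];
    size_t count;
};

/* Record the final value of an inner IV; returns 0 if the table is full. */
static int table_record(struct final_table *t, const char *init_name,
                        struct affine final_value)
{
    if (t->count == MAX_ENTRIES)
        return 0;
    t->entries[t->count].init_name = init_name;
    t->entries[t->count].final_value = final_value;
    t->count++;
    return 1;
}

/* Brute-force scan: can the outer IV's increment (outer_name + step) be
   replaced by the final value of an inner IV that started at outer_name?
   Returns 1 if such an entry exists, 0 otherwise. */
static int can_reuse_final_value(const struct final_table *t,
                                 const char *outer_name, long step)
{
    for (size_t i = 0; i < t->count; i++) {
        const struct final_entry *e = &t->entries[i];
        if (strcmp(e->init_name, outer_name) == 0
            && strcmp(e->final_value.base, outer_name) == 0
            && e->final_value.offset == step)
            return 1;
    }
    return 0;
}
```

With one entry recorded for an inner IV that starts at "ivtmp.1" and ends at
ivtmp.1 + 123*4, a query for an outer step of 123*4 succeeds, while a query for
any other step or any other name fails.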
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43174
* [Bug tree-optimization/43174] Teaching SCEV about ADDR_EXPR causes regression
From: rakdver at kam dot mff dot cuni dot cz @ 2010-03-03 9:21 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from rakdver at kam dot mff dot cuni dot cz 2010-03-03 09:21 -------
Subject: Re: Teaching SCEV about ADDR_EXPR causes regression
> This together with the patch mentioned in the previous comment allows to
> generate:
> ivtmp.0 = &a[0][0]
> <bb1>
> L1:
> s.0 = PHI(0, s.2)
> i.0 = PHI(122, i.1)
> ivtmp.1 = PHI(ivtmp.0, ivtmp.4)
> <bb2>
> L2:
> s.1 = PHI(s.0, s.2)
> j.0 = PHI(122, j.1)
> ivtmp.2 = PHI(ivtmp.1, ivtmp.3)
> s.2 = s.1 + MEM(ivtmp.2)
> ivtmp.3 = ivtmp.2 + 4
> j.1 = j.0 - 1
> if (j.1 >= 0) goto L2
> <bb3>
> ivtmp.4 = ivtmp.3 // would be ivtmp.4 = ivtmp.1 + stride
> i.1 = i.0 - 1
> if (i.1 >= 0) goto L1
>
> The improvement is that ivtmp.1 is not live across the inner loop.
>
> The approach is to store final values of IVs in a hashtable, mapping SSA_NAME
> of initial value in the preheader to aff_tree with final value, and then try to
> replace increments of new IVs with uses of IVs from inner loops (currently I
> just implemented a brute force loop over all IV uses to find a useful entry in
> that hashtable).
> Does this make sense and sound acceptable?
The approach seems OK. However, it is not immediately clear that performing the
replacement is a good idea: it trades off register pressure against creating
new dependences, i.e., it makes register allocation easier but scheduling
harder. So some performance testing is necessary to check this.
Zdenek
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43174