mips64 gcc 3.3.6 problem

public inbox for gcc@gcc.gnu.org

* mips64 gcc 3.3.6 problem
@ 2009-08-19 11:53 Sergey Anosov
  2009-08-19 11:55 ` Paolo Carlini
  2009-08-19 12:13 ` complete_unrolli / complete_unroll Albert Cohen
  0 siblings, 2 replies; 16+ messages in thread
From: Sergey Anosov @ 2009-08-19 11:53 UTC (permalink / raw)
  To: gcc

Hi all!

I've made a toolchain for mips64el - binutils 2.17.90 + gcc 3.3.6 + glibc 2.3.6.
I've successfully compiled a linux 2.16.62 kernel and run it in qemu.
But when I try to execute a dynamically linked "Hello world" program, I get a SIGSEGV:

do_page_fault() #2: sending SIGSEGV to a.out for invalid write access to
000000555556a7a7 (epc == 0000005555558010, ra == 0000005555557cd0)
@@@ do_page_fault
Segmentation fault

Statically linked executables are running without any problem!

If I use gcc 4.1.3 + glibc 2.7 + binutils 2.19, everything is OK.
I also tried installing binutils 2.19 instead of binutils 2.17 (binutils 2.19, gcc 3.3.6 and glibc 2.3.6), and dynamically linked programs run.

So, I think it is a compiler problem, isn't it?
Has anybody else seen this problem? Or are the mips64 platform and gcc 3.3.6 not compatible?

Thanks!


* Re: mips64 gcc 3.3.6 problem
  2009-08-19 11:53 mips64 gcc 3.3.6 problem Sergey Anosov
@ 2009-08-19 11:55 ` Paolo Carlini
  2009-08-19 12:13 ` complete_unrolli / complete_unroll Albert Cohen
  1 sibling, 0 replies; 16+ messages in thread
From: Paolo Carlini @ 2009-08-19 11:55 UTC (permalink / raw)
  To: Sergey Anosov; +Cc: gcc

Sergey Anosov wrote:
> Has anybody else seen this problem? Or are the mips64 platform and gcc 3.3.6 not compatible?
>   
If I were you, considering how old and currently completely unmaintained
it is, I would leave gcc3.3.x alone...

Paolo.


* complete_unrolli / complete_unroll
  2009-08-19 11:53 mips64 gcc 3.3.6 problem Sergey Anosov
  2009-08-19 11:55 ` Paolo Carlini
@ 2009-08-19 12:13 ` Albert Cohen
  2009-08-19 12:33   ` Richard Guenther
  1 sibling, 1 reply; 16+ messages in thread
From: Albert Cohen @ 2009-08-19 12:13 UTC (permalink / raw)
  To: gcc

When debugging graphite, we ran into code bloat issues due to
pass_complete_unrolli being called very early in the non-ipa
optimization sequence. Much later, the full-blown pass_complete_unroll
is scheduled, and this one does not do any harm.

Strangely, this early unrolling pass (tuned to only unroll inner loops)
is only enabled at -O3, independently of the -funroll-loops flag.

Does anyone remember why it is there, for which platform it is useful,
and what are the perf regressions if we remove it?

My guess is that it may only harm... disabling or damaging the
effectiveness of the (loop-level) vectorizer and increasing compilation
time.

Thanks,
Albert

PS: When this question is solved, it will also be interesting to start a
serious discussion on how to improve the flexibility in customizing pass
ordering and parameterization of passes depending on the target. Grigori
Fursin's work shows the strong benefits and already provides a working
prototype. This question is independent of whether the customization is
done by experts or machine-learning/statistical techniques.


* Re: complete_unrolli / complete_unroll
  2009-08-19 12:13 ` complete_unrolli / complete_unroll Albert Cohen
@ 2009-08-19 12:33   ` Richard Guenther
  2009-08-19 12:35     ` Richard Guenther
  2009-08-19 13:56     ` Albert Cohen
  0 siblings, 2 replies; 16+ messages in thread
From: Richard Guenther @ 2009-08-19 12:33 UTC (permalink / raw)
  To: Albert Cohen; +Cc: gcc

2009/8/19 Albert Cohen <Albert.Cohen@inria.fr>:
> When debugging graphite, we ran into code bloat issues due to
> pass_complete_unrolli being called very early in the non-ipa
> optimization sequence. Much later, the full-blown pass_complete_unroll
> is scheduled, and this one does not do any harm.
>
> Strangely, this early unrolling pass (tuned to only unroll inner loops)
> is only enabled at -O3, independently of the -funroll-loops flag.
>
> Does anyone remember why it is there, for which platform it is useful,
> and what are the perf regressions if we remove it?

The early loop unrolling pass is very important for removing the
abstraction penalty in C++ programs that choose not to implement manual
unrolling and instead rely on the inliner and template metaprogramming.

In tramp3d, for example, you see (very much simplified, intermediate
state after some inlining):

 foo (int i, int j, int k)
 {
   double a[][][];
   int index[3];
   const int dX[3] = { 1, 0, 0 };
   ...
   for (m=0; m<3; ++m)
     index[m] = 0;
   index[0] = i;
   index[1] = j;
   index[2] = k;
   ... a[index[0]][index[1]][index[2]];   /* first access: a[i][j][k] */
   for (m=0; m<3; ++m)
     index[m] += dX[m];
   ... a[index[0]][index[1]][index[2]];   /* second access: a[i+1][j][k] */

etc. to access a[i][j][k] and a[i+1][j][k].

There is an absolute need to unroll these simple loops before
CSE; otherwise loop optimizations have no chance of optimizing
anything here.
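
To make the point concrete, here is a minimal C sketch of the pattern above. It is an illustration only, not code from tramp3d: the function names, the fixed extent N, and the choice to return the difference of the two accesses are all assumptions made for the example. The first function goes through index[] and the small loops; the second is the form that becomes visible once those loops are fully unrolled and the constants are propagated.

```c
enum { N = 4 };  /* assumed array extent, for illustration only */

/* Loop form: subscripts go through index[], which is updated in
   small constant-trip-count loops (hypothetical helper). */
double access_with_loops(double a[N][N][N], int i, int j, int k)
{
    static const int dX[3] = { 1, 0, 0 };
    int index[3];
    int m;

    for (m = 0; m < 3; ++m)
        index[m] = 0;
    index[0] = i;
    index[1] = j;
    index[2] = k;
    double first = a[index[0]][index[1]][index[2]];   /* a[i][j][k] */

    for (m = 0; m < 3; ++m)
        index[m] += dX[m];
    double second = a[index[0]][index[1]][index[2]];  /* a[i+1][j][k] */

    return second - first;
}

/* The same computation once the loops are fully unrolled and
   index[] has been replaced by its known contents. */
double access_unrolled(double a[N][N][N], int i, int j, int k)
{
    return a[i + 1][j][k] - a[i][j][k];
}
```

Both functions return the same value for any in-range i, j, k; only in the second form are the subscripts plain expressions in i, j and k that later loop optimizations can analyze.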

Another benchmark that degrades considerably without early
unrolling is 454.calculix (in fact that one was the reason to
add this pass).

> My guess is that it may only harm... disabling or damaging the
> effectiveness of the (loop-level) vectorizer and increasing compilation
> time.

No, it definitely does not.  But it has one small issue: it sometimes
also unrolls an outermost loop, IIRC; that could be fixed.

Richard.

>
> Thanks,
> Albert
>
> PS: When this question is solved, it will also be interesting to start a
> serious discussion on how to improve the flexibility in customizing pass
> ordering and parameterization of passes depending on the target. Grigori
> Fursin's work shows the strong benefits and already provides a working
> prototype. This question is independent of whether the customization is
> done by experts or machine-learning/statistical techniques.
>


* Re: complete_unrolli / complete_unroll
  2009-08-19 12:33   ` Richard Guenther
@ 2009-08-19 12:35     ` Richard Guenther
  2009-08-19 13:56     ` Albert Cohen
  1 sibling, 0 replies; 16+ messages in thread
From: Richard Guenther @ 2009-08-19 12:35 UTC (permalink / raw)
  To: Albert Cohen; +Cc: gcc

On Wed, Aug 19, 2009 at 2:07 PM, Richard
Guenther<richard.guenther@gmail.com> wrote:
> 2009/8/19 Albert Cohen <Albert.Cohen@inria.fr>:
>> When debugging graphite, we ran into code bloat issues due to
>> pass_complete_unrolli being called very early in the non-ipa
>> optimization sequence. Much later, the full-blown pass_complete_unroll
>> is scheduled, and this one does not do any harm.
>>
>> Strangely, this early unrolling pass (tuned to only unroll inner loops)
>> is only enabled at -O3, independently of the -funroll-loops flag.

Note it is also enabled at -O2 independently of -funroll-loops just
with the restriction to the heuristics that it may not increase code
size.  This is consistent with the other unroll pass (which even runs
unconditionally!).

Richard.


* Re: complete_unrolli / complete_unroll
  2009-08-19 12:33   ` Richard Guenther
  2009-08-19 12:35     ` Richard Guenther
@ 2009-08-19 13:56     ` Albert Cohen
  2009-08-19 14:44       ` Albert Cohen
  1 sibling, 1 reply; 16+ messages in thread
From: Albert Cohen @ 2009-08-19 13:56 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc

Richard Guenther wrote:
> 2009/8/19 Albert Cohen <Albert.Cohen@inria.fr>:
>> When debugging graphite, we ran into code bloat issues due to
>> pass_complete_unrolli being called very early in the non-ipa
>> optimization sequence. Much later, the full-blown pass_complete_unroll
>> is scheduled, and this one does not do any harm.
>>
>> Strangely, this early unrolling pass (tuned to only unroll inner loops)
>> is only enabled at -O3, independently of the -funroll-loops flag.
>>
>> Does anyone remember why it is there, for which platform it is useful,
>> and what are the perf regressions if we remove it?
> 
> The early loop unrolling pass is very important to remove abstraction
> penalty for C++ programs that chose not to implement manual
> unrolling by relying on the inliner and template metaprogramming.
> 
> In tramp3d you for example see (very much simplified, intermediate
> state after some inlining):
> 
>  foo (int i, int j, int k)
>  {
>    double a[][][];
>    int index[3];
>    const int dX[3] = { 1, 0, 0 };
>    ...
>    for (m=0; m<3; ++m)
>      index[m] = 0;
>    index[0] = i;
>    index[1] = j;
>    index[2] = k;
>    ... a[index[0]][index[1]][index[2]];
>    for (m=0; m<3; ++m)
>      index[m] += dX[m];
>    ... a[index[0]][index[1]][index[2]];
> 
> etc. to access a[i][j][k] and a[i+1][j][k].
> 
> There is an absolute need to unroll these simple loops before
> CSE; otherwise loop optimizations have no chance of optimizing
> anything here.
> 
> Another benchmark that degrades considerably without early
> unrolling is 454.calculix (in fact that one was the reason to
> add this pass).
> 
>> My guess is that it may only harm... disabling or damaging the
>> effectiveness of the (loop-level) vectorizer and increasing compilation
>> time.
> 
> No it definitely does not.  But it has one small issue in that it sometimes
> also unrolls an outermost loop IIRC, that could be fixed.

Thanks a lot for the quick and detailed response.

It is more difficult than I thought, then :-( We'll think more, and
maybe come up with yet another pass ordering proposal, but definitely
this tramp3d code deserves to be processed by graphite AFTER
unrolling+cse has done its specialization trick.

Albert


* Re: complete_unrolli / complete_unroll
  2009-08-19 13:56     ` Albert Cohen
@ 2009-08-19 14:44       ` Albert Cohen
  2009-08-19 15:09         ` Richard Guenther
  0 siblings, 1 reply; 16+ messages in thread
From: Albert Cohen @ 2009-08-19 14:44 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc

Albert Cohen wrote:
> Thanks a lot for the quick and detailed response.
> 
> It is more difficult than I thought, then :-( We'll think more, and
> maybe come up with yet another pass ordering proposal, but definitely
> this tramp3d code deserves to be processed by graphite AFTER
> unrolling+cse has done its specialization trick.

One way out would be to make the unrolli pass a little more careful. As you
suggest, the heuristic is already not quite satisfactory as it sometimes
unrolls outermost loops.

A better heuristic would be to run through the different cases where
unrolling helps specialization (e.g., the subscripts of subscripts in
the tramp3d example), and check for these patterns explicitly. But this
is not easy to implement (or to make robust, and not too
syntax-dependent).

Albert


* Re: complete_unrolli / complete_unroll
  2009-08-19 14:44       ` Albert Cohen
@ 2009-08-19 15:09         ` Richard Guenther
  2009-08-19 20:30           ` Richard Guenther
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Guenther @ 2009-08-19 15:09 UTC (permalink / raw)
  To: Albert Cohen; +Cc: gcc

On Wed, Aug 19, 2009 at 3:54 PM, Albert Cohen<Albert.Cohen@inria.fr> wrote:
> Albert Cohen wrote:
>> Thanks a lot for the quick and detailed response.
>>
>> It is more difficult than I thought, then :-( We'll think more, and
>> maybe come up with yet another pass ordering proposal, but definitely
>> this tramp3d code deserves to be processed by graphite AFTER
>> unrolling+cse has done its specialization trick.
>
> One way out would be to make the unrolli pass a little more careful. As you
> suggest, the heuristic is already not quite satisfactory as it sometimes
> unrolls outermost loops.
>
> A better heuristic would be to run through the different cases where
> unrolling helps specialization (e.g., the subscripts of subscripts in
> the tramp3d example), and check for these patterns explicitly. But this
> is not easy to implement (or to make robust, and not too
> syntax-dependent).

Well, one thing is to simply adjust the maximal size increase or see
if Honza's improved size heuristics allow unrolling only if the estimated
size does not increase (though for calculix we depend on a very
large estimated size increase IIRC, with the old heuristics at least - there
is a testcase in the testsuite for it).

Richard.

> Albert
>


* Re: complete_unrolli / complete_unroll
  2009-08-19 15:09         ` Richard Guenther
@ 2009-08-19 20:30           ` Richard Guenther
  2009-08-20  8:45             ` Albert Cohen
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Guenther @ 2009-08-19 20:30 UTC (permalink / raw)
  To: Albert Cohen; +Cc: gcc

On Wed, Aug 19, 2009 at 3:56 PM, Richard
Guenther<richard.guenther@gmail.com> wrote:
> On Wed, Aug 19, 2009 at 3:54 PM, Albert Cohen<Albert.Cohen@inria.fr> wrote:
>> Albert Cohen wrote:
>>> Thanks a lot for the quick and detailed response.
>>>
>>> It is more difficult than I thought, then :-( We'll think more, and
>>> maybe come up with yet another pass ordering proposal, but definitely
>>> this tramp3d code deserves to be processed by graphite AFTER
>>> unrolling+cse has done its specialization trick.
>>
>> One way out would be to make the unrolli pass a little more careful. As you
>> suggest, the heuristic is already not quite satisfactory as it sometimes
>> unrolls outermost loops.
>>
>> A better heuristic would be to run through the different cases where
>> unrolling helps specialization (e.g., the subscripts of subscripts in
>> the tramp3d example), and check for these patterns explicitly. But this
>> is not easy to implement (or to make robust, and not too
>> syntax-dependent).
>
> Well, one thing is to simply adjust the maximal size increase or see
> if Honza's improved size heuristics allow unrolling only if the estimated
> size does not increase (though for calculix we depend on a very
> large estimated size increase IIRC, with the old heuristics at least - there
> is a testcase in the testsuite for it).

gfortran.dg/reassoc_4.f, the hottest loop from calculix.

Richard.


* Re: complete_unrolli / complete_unroll
  2009-08-19 20:30           ` Richard Guenther
@ 2009-08-20  8:45             ` Albert Cohen
  2009-08-20  9:57               ` Richard Guenther
  0 siblings, 1 reply; 16+ messages in thread
From: Albert Cohen @ 2009-08-20  8:45 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc

Richard Guenther wrote:
> gfortran.dg/reassoc_4.f, the hottest loop from calculix.

Thanks.

This example is slightly different. Graphite should be able to handle it 
with loop fusion rather than pre-unrolling + cse. But I agree that the 
unrolling + cse approach also makes sense (and does not depend on the 
same legality constraints as loop fusion).
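
For readers unfamiliar with the term, loop fusion can be sketched as follows. This is a generic C illustration, not the loop from reassoc_4.f; the function names and the arithmetic are made up for the example, and the three arrays are assumed not to overlap.

```c
/* Two separate loops over the same range: tmp[] is written by the
   first loop and read back by the second (hypothetical example). */
void two_loops(const double *x, double *tmp, double *y, int n)
{
    for (int i = 0; i < n; ++i)
        tmp[i] = 2.0 * x[i];
    for (int i = 0; i < n; ++i)
        y[i] = tmp[i] + 1.0;
}

/* Fused form: one traversal.  This is legal here because iteration i
   of the second loop only reads tmp[i], which iteration i of the
   first loop has already produced (assuming no aliasing). */
void fused(const double *x, double *tmp, double *y, int n)
{
    for (int i = 0; i < n; ++i) {
        tmp[i] = 2.0 * x[i];
        y[i] = tmp[i] + 1.0;
    }
}
```

The legality argument in the comment is the kind of constraint fusion depends on; full unrolling followed by CSE needs no such dependence check, which is the trade-off mentioned above.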

This makes me think of a simple, general criterion to detect cases where 
pre-unrolling of inner loop helps further cse and loop optimizations.
The idea is to unroll only when we can see some evidence of array 
references that are not presently loop-invariant that would be made 
(outer)-loop invariant via full unrolling of some inner loop.
This can be implemented by complementing the current heuristic (or its 
complementary enhancements by Honza) with an additional condition, only 
enabled when running it with the "i" (inner) flag (which should probably 
be renamed if we do implement this...).

The simplest, weakest condition I can think of would be to traverse all
array references in the region enclosed by the loop-to-be-unrolled,
compute the SCEV for each one, instantiate it in the loop's context, and
check if it only depends on the loop counter, as well as outer loop
counters or parameters.
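
A toy model of this weak condition might look like the C sketch below. Nothing in it is GCC code: the dep_kind flags, struct array_ref, and both function names are hypothetical, and the real check would work on instantiated SCEVs rather than on a precomputed bitmask. The sketch also assumes one reading of the condition, namely that a single qualifying reference is enough evidence to unroll.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what an array subscript depends on. */
enum dep_kind {
    DEP_INNER_IV = 1 << 0,  /* counter of the loop considered for unrolling */
    DEP_OUTER_IV = 1 << 1,  /* counters of enclosing loops */
    DEP_PARAM    = 1 << 2,  /* parameters invariant in the whole region */
    DEP_UNKNOWN  = 1 << 3   /* anything else, e.g. a value loaded from memory */
};

/* Hypothetical stand-in for an analyzed array reference. */
struct array_ref {
    unsigned deps;  /* bitwise OR of dep_kind flags */
};

/* A reference qualifies if its subscript depends on the inner loop
   counter and otherwise only on outer counters or parameters. */
static bool ref_qualifies(const struct array_ref *r)
{
    if (r->deps & DEP_UNKNOWN)
        return false;
    return (r->deps & DEP_INNER_IV) != 0;
}

/* Weak condition: at least one reference in the loop body qualifies. */
bool worth_early_unrolling(const struct array_ref *refs, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (ref_qualifies(&refs[i]))
            return true;
    return false;
}
```

In the tramp3d fragment, index[m] inside the small loops would carry only DEP_INNER_IV, so the loops would pass this test.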

This condition would a priori pass on the tramp3d and reassoc_4 cases. 
Yet it is probably too weak and will still pass on many codes where 
unrolling would probably not help at all... and probably harm.
If this is the case, we should consider multiple loops to be unrolled, 
and the combined effect of unrolling ALL of these, resulting in complete 
instantiation of the array subscripts with constants. This is a very
special case, again satisfied by our two motivating examples. Maybe it
will be too specific and we'll have performance regressions... It
remains to be investigated whether we have to go through a stricter
condition than the first, weak one I proposed.

If this is not clear, I can write some pseudo-code to clarify :-).

Albert


* Re: complete_unrolli / complete_unroll
  2009-08-20  8:45             ` Albert Cohen
@ 2009-08-20  9:57               ` Richard Guenther
  2009-08-20 13:34                 ` Albert Cohen
  2009-09-29 18:54                 ` David Edelsohn
  0 siblings, 2 replies; 16+ messages in thread
From: Richard Guenther @ 2009-08-20  9:57 UTC (permalink / raw)
  To: Albert Cohen; +Cc: gcc

On Thu, Aug 20, 2009 at 3:19 AM, Albert Cohen<Albert.Cohen@inria.fr> wrote:
> Richard Guenther wrote:
>>
>> gfortran.dg/reassoc_4.f, the hottest loop from calculix.
>
> Thanks.
>
> This example is slightly different. Graphite should be able to handle it
> with loop fusion rather than pre-unrolling + cse. But I agree that the
> unrolling + cse approach also makes sense (and does not depend on the same
> legality constraints as loop fusion).
>
> This makes me think of a simple, general criterion to detect cases where
> pre-unrolling of inner loop helps further cse and loop optimizations.
> The idea is to unroll only when we can see some evidence of array references
> that are not presently loop-invariant that would be made (outer)-loop
> invariant via full unrolling of some inner loop.
> This can be implemented by complementing the current heuristic (or its
> complementary enhancements by Honza) with an additional condition, only
> enabled when running it with the "i" (inner) flag (which should probably be
> renamed if we do implement this...).
>
> The simplest, weakest condition I can think of would be to traverse all
> array references in the region enclosed by the loop-to-be-unrolled, compute
> the SCEV for each one, instantiate it in the loop's context, and check if
> it only depends on the loop counter, as well as outer loop counters or
> parameters.
>
> This condition would a priori pass on the tramp3d and reassoc_4 cases. Yet
> it is probably too weak and will still pass on many codes where unrolling
> would probably not help at all... and probably harm.
> If this is the case, we should consider multiple loops to be unrolled, and
> the combined effect of unrolling ALL of these, resulting in complete
> instantiation of the array subscripts with constants. This is a very special
> case, again satisfied by our two motivating examples. Maybe it will be too
> specific and we'll have performance regressions... It remains to be
> investigated whether we have to go through a stricter condition than the first,
> weak one I proposed.
>
> If this is not clear, I can write some pseudo-code to clarify :-).

Can't we use graphite to re-roll loops?  That is, compress the
polyhedron by introducing a new parameter?  But maybe I am
not good at guessing what your initial bloat issue looks like.

The reason I'm asking is that there is enough code out in the
wild that has loops with manually unrolled bodies - I have seen
up to 16 times here.
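
As an illustration of what such code looks like, here is a small C sketch of a loop unrolled by hand four times, next to the rolled loop a re-rolling pass would ideally recover. The function names and the scaling operation are invented for the example; real instances in the wild are usually larger and less regular.

```c
/* Rolled form: what a re-rolling pass would ideally recover. */
void scale_rolled(float *dst, const float *src, float f, int n)
{
    for (int i = 0; i < n; ++i)
        dst[i] = f * src[i];
}

/* Manually unrolled four times, with a remainder loop for the
   iterations left over when n is not a multiple of 4. */
void scale_unrolled4(float *dst, const float *src, float f, int n)
{
    int i = 0;
    for (; i + 3 < n; i += 4) {
        dst[i]     = f * src[i];
        dst[i + 1] = f * src[i + 1];
        dst[i + 2] = f * src[i + 2];
        dst[i + 3] = f * src[i + 3];
    }
    for (; i < n; ++i)
        dst[i] = f * src[i];
}
```

Both functions write the same values to dst; the unrolled body simply repeats one statement with shifted subscripts, which is the regularity a re-rolling transformation would have to detect.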

Richard.
> Albert
>


* Re: complete_unrolli / complete_unroll
  2009-08-20  9:57               ` Richard Guenther
@ 2009-08-20 13:34                 ` Albert Cohen
  2009-09-29 18:54                 ` David Edelsohn
  1 sibling, 0 replies; 16+ messages in thread
From: Albert Cohen @ 2009-08-20 13:34 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc

Richard Guenther wrote:
>> If this is not clear, I can write some pseudo-code to clarify :-).
> 
> Can't we use graphite to re-roll loops?  That is, compress the
> polyhedron by introducing a new parameter?  But maybe I am
> not good at guessing what your initial bloat issue looks like.
> 
> The reason I'm asking is that there is enough code out in the
> wild that has loops with manually unrolled bodies - I have seen
> up to 16 times here.
> 

I agree that the conditions I propose are not as reliable as unrolling
and checking if it helps. At some point, this kind of sandboxing of the
IR to explore a tree of optimizations would be interesting... except for
compilation time and memory usage :-( Great for iterative and machine
learning optimization anyway.

Regarding your rerolling question, the answer is currently not known.
There is indeed a nice parallel between rerolling and the code
generation algorithms in CLooG. But this is only for rerolling right
after unrolling. The real problem is rerolling after a sequence of
optimizations that took advantage of prior unrolling. In this case, we
have an algorithm equivalence problem, very general and very nasty.
Polyhedral approaches do help but so far did not do much more than very
theoretical papers and toy prototypes (I can give references if you are
interested).

Clearly, it would be nice to have a rerolling pass, it would also help
the intra-block vectorization (there are specific papers on this, and
preliminary support in the vectorizer), but it is not something people
understand well.

We'll wait a little, but more feedback on conditions to tighten the
application of the early unrolling pass will be helpful; then one of the
Graphite developers may get his or her hands dirty on it.

Albert


* Re: complete_unrolli / complete_unroll
  2009-08-20  9:57               ` Richard Guenther
  2009-08-20 13:34                 ` Albert Cohen
@ 2009-09-29 18:54                 ` David Edelsohn
  2009-09-30 12:36                   ` Richard Guenther
  1 sibling, 1 reply; 16+ messages in thread
From: David Edelsohn @ 2009-09-29 18:54 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Albert Cohen, gcc

On Thu, Aug 20, 2009 at 4:48 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:

> Can't we use graphite to re-roll loops?  That is, compress the
> polyhedron by introducing a new parameter?  But maybe I am
> not good at guessing what your initial bloat issue looks like.
>
> The reason I'm asking is that there is enough code out in the
> wild that has loops with manually unrolled bodies - I have seen
> up to 16 times here.

Do we want to try to address this partially in GCC 4.5?  Providing
some way to disable early unrolling either explicitly or implicitly
when Graphite is enabled?

Early unrolling can cause two problems:

1) Increase the size of SCoPs, which increases memory consumption and
analysis time.

2) Confusing SCoP analysis.

Separate from re-rolling and other long-term solutions, it would be
helpful for Graphite if there was some explicit control over early
unrolling to help with experimentation.

Thanks, David


* Re: complete_unrolli / complete_unroll
  2009-09-29 18:54                 ` David Edelsohn
@ 2009-09-30 12:36                   ` Richard Guenther
  0 siblings, 0 replies; 16+ messages in thread
From: Richard Guenther @ 2009-09-30 12:36 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Albert Cohen, gcc

On Tue, Sep 29, 2009 at 8:23 PM, David Edelsohn <dje.gcc@gmail.com> wrote:
> On Thu, Aug 20, 2009 at 4:48 AM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>
>> Can't we use graphite to re-roll loops?  That is, compress the
>> polyhedron by introducing a new parameter?  But maybe I am
>> not good at guessing what your initial bloat issue looks like.
>>
>> The reason I'm asking is that there is enough code out in the
>> wild that has loops with manually unrolled bodies - I have seen
>> up to 16 times here.
>
> Do we want to try to address this partially in GCC 4.5?  Providing
> some way to disable early unrolling either explicitly or implicitly
> when Graphite is enabled?
>
> Early unrolling can cause two problems:
>
> 1) Increase the size of SCoPs, which increases memory consumption and
> analysis time.
>
> 2) Confusing SCoP analysis.
>
> Separate from re-rolling and other long-term solutions, it would be
> helpful for Graphite if there was some explicit control over early
> unrolling to help with experimentation.

I can definitely look into that - can someone open a bugreport
and assign it to me please?

Thanks,
Richard.

> Thanks, David
>


* Re: complete_unrolli / complete_unroll
  2009-08-20 11:20 Dominique Dhumieres
@ 2009-08-20 12:25 ` Richard Guenther
  0 siblings, 0 replies; 16+ messages in thread
From: Richard Guenther @ 2009-08-20 12:25 UTC (permalink / raw)
  To: Dominique Dhumieres; +Cc: gcc, Albert.Cohen

On Thu, Aug 20, 2009 at 11:48 AM, Dominique Dhumieres<dominiq@lps.ens.fr> wrote:
> IIRC another code that is "improved" by complete_unrolli is the polyhedron
> test induct.f90.  However it gives worse results for some variants
> (see pr34265: induct_v2/3).
>
>> Can't we use graphite to re-roll loops? ...
>
> Is doing and undoing always some kind of work?

Yes it is, but you can only reliably find out if unrolling will enable
further optimizations if you do the unrolling.  If it didn't enable anything,
re-rolling the loop should be possible.  And within the polyhedral
representation it should even be easy (I hope).

Richard.

> Cheers
>
> Dominique
>


* Re: complete_unrolli / complete_unroll
@ 2009-08-20 11:20 Dominique Dhumieres
  2009-08-20 12:25 ` Richard Guenther
  0 siblings, 1 reply; 16+ messages in thread
From: Dominique Dhumieres @ 2009-08-20 11:20 UTC (permalink / raw)
  To: gcc; +Cc: richard.guenther, Albert.Cohen

IIRC another code that is "improved" by complete_unrolli is the polyhedron
test induct.f90.  However it gives worse results for some variants
(see pr34265: induct_v2/3).

> Can't we use graphite to re-roll loops? ...

Is doing and undoing always some kind of work?

Cheers

Dominique

