Interesting paper from Perdue

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* Interesting paper from Perdue
@ 2004-09-20 13:31 Steven Bosscher
  2004-09-21  7:21 ` tm_gccmail
  2004-09-21 16:01 ` Vladimir Makarov
  0 siblings, 2 replies; 5+ messages in thread
From: Steven Bosscher @ 2004-09-20 13:31 UTC (permalink / raw)
  To: gcc

I don't know if anyone has ever seen/read/mentioned this paper
before, I might have missed it.  Otherwise, interesting reading:
https://engineering.purdue.edu/ECE/Research/TR/2004pdfs/TR-ECE-04-01.pdf

Gr.
Steven

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Interesting paper from Perdue
  2004-09-20 13:31 Interesting paper from Perdue Steven Bosscher
@ 2004-09-21  7:21 ` tm_gccmail
  2004-09-21 17:59   ` Vladimir Makarov
  2004-09-21 16:01 ` Vladimir Makarov
  1 sibling, 1 reply; 5+ messages in thread
From: tm_gccmail @ 2004-09-21  7:21 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: gcc

On Mon, 20 Sep 2004, Steven Bosscher wrote:

> I don't know if anyone has ever seen/read/mentioned this paper
> before, I might have missed it.  Otherwise, interesting reading:
> https://engineering.purdue.edu/ECE/Research/TR/2004pdfs/TR-ECE-04-01.pdf
> 
> Gr.
> Steven

I'll digress and rant a bit; apologizes in advance.

This is just the tip of the iceberg, really. There are many other 
instances where various optimizations are improved in isolation and 
degrade performance because they don't consider the effects on the other
optimization passes.

For example, Some of the recent work on alias analysis really worries me
because I believe this will result in a medium-term net performance
decrease on many targets.

Consider:

1. Improved alias analysis allows better disambiguation of memory 
   references.

2. The current scheduler is overly aggressive about hoisting loads, and
   is only restrained by the inadequacy of the current alias analysis.

   When alias analysis is improved, the first scheduling pass will
   greatly increase register pressure.

3. The register allocator inserts code suboptimially (in particular,
   restores are too early) and lacks basic fatures such as live-range
   splitting and rematerialization.

   Therefore, it exhibits increasingly bad behavior as register pressure
   increases.

I think the following will occur:

1. Targets with the first instruction scheduling pass enabled will exhibit
   a net decrease in performance due to increased register pressure. This
   will be exacerbated if the target has fewer registers (e.g. slightly
   worse on IA64, much worse on PPC). The SH is unlikely to be affected
   due to scheduler modifications already implemented.

1. Targets without the first scheduling pass enabled will exhibit a net
   decrease in performance only if the register set is very small
   (fewer than 16 registers). This includes the x86 and most embedded
   processors such as the H8/300, M68HC11, 8051, etc.

As I see it, the register allocator and the instruction scheduler are
really the base of the foundations for GCC optimization.

We keep adding improvements which:

1. Allow more intermediate values to be kept in registers which increase
   register pressure

2. Allow memory to be retained in registers longer, which increases
   register pressure

3. Create larger basic blocks, which increases register pressure

4. Allow more loop unrolling, which increases register pressure

5. etc

...and the register allocator doesn't handle the increased register
pressure well, so the net result is very little improvement.

We really spend some time improving the foundation of GCC instead of
piling more and more optimizations on top of it.

Toshi

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Interesting paper from Perdue
  2004-09-21  7:21 ` tm_gccmail
@ 2004-09-21 17:59   ` Vladimir Makarov
  2004-09-21 18:39     ` Daniel Berlin
  0 siblings, 1 reply; 5+ messages in thread
From: Vladimir Makarov @ 2004-09-21 17:59 UTC (permalink / raw)
  To: tm_gccmail; +Cc: Steven Bosscher, gcc

tm_gccmail@kloo.net wrote:

>On Mon, 20 Sep 2004, Steven Bosscher wrote:
>
>  
>
>>I don't know if anyone has ever seen/read/mentioned this paper
>>before, I might have missed it.  Otherwise, interesting reading:
>>https://engineering.purdue.edu/ECE/Research/TR/2004pdfs/TR-ECE-04-01.pdf
>>
>>Gr.
>>Steven
>>    
>>
>
>I'll digress and rant a bit; apologizes in advance.
>
>This is just the tip of the iceberg, really. There are many other 
>instances where various optimizations are improved in isolation and 
>degrade performance because they don't consider the effects on the other
>optimization passes.
>
>  
>
I think the approach mentioned in article has a merit for any compiler. 
 Any optimized compiler is bunch of pass because of complexity task. 
 Many passes are trying to solve subproblem not taking other passes into 
account.  It creates unpredictable compiler behaviour for given program 
when an optimization is on or off.

>...and the register allocator doesn't handle the increased register
>pressure well, so the net result is very little improvement.
>
>We really spend some time improving the foundation of GCC instead of
>piling more and more optimizations on top of it.
>
>  
>
I agree with this.  Gcc probably is a compiler with very upredictabe 
behaviour because inadequate register allocator/scheduler.  But writing 
a good register allocator is not easy task in gcc because of very rich 
register file model and a lot of machine-dependent macros used by gcc 
ports.  The colour-based register allocator project is an example of 
this.  I know about this because I worked on register allocator 
improvements and I am still working on it.  I think the key component is 
reload pass.  Tasks solved by reload should be combined with the 
register allocator.  We should rid off reload.  But it is an eneormous task.

Vlad


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Interesting paper from Perdue
  2004-09-21 17:59   ` Vladimir Makarov
@ 2004-09-21 18:39     ` Daniel Berlin
  0 siblings, 0 replies; 5+ messages in thread
From: Daniel Berlin @ 2004-09-21 18:39 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: gcc, tm_gccmail, Steven Bosscher


On Sep 21, 2004, at 12:01 PM, Vladimir Makarov wrote:

> tm_gccmail@kloo.net wrote:
>
>> On Mon, 20 Sep 2004, Steven Bosscher wrote:
>>
>>
>>> I don't know if anyone has ever seen/read/mentioned this paper
>>> before, I might have missed it.  Otherwise, interesting reading:
>>> https://engineering.purdue.edu/ECE/Research/TR/2004pdfs/TR-ECE-04 
>>> -01.pdf
>>>
>>> Gr.
>>> Steven
>>>
>>
>> I'll digress and rant a bit; apologizes in advance.
>>
>> This is just the tip of the iceberg, really. There are many other  
>> instances where various optimizations are improved in isolation and  
>> degrade performance because they don't consider the effects on the  
>> other
>> optimization passes.
>>
>>
> I think the approach mentioned in article has a merit for any  
> compiler. Any optimized compiler is bunch of pass because of  
> complexity task. Many passes are trying to solve subproblem not taking  
> other passes into account.  It creates unpredictable compiler  
> behaviour for given program when an optimization is on or off.
>
>> ...and the register allocator doesn't handle the increased register
>> pressure well, so the net result is very little improvement.
>>
>> We really spend some time improving the foundation of GCC instead of
>> piling more and more optimizations on top of it.
>>
>>
> I agree with this.  Gcc probably is a compiler with very upredictabe  
> behaviour because inadequate register allocator/scheduler.  But  
> writing a good register allocator is not easy task in gcc because of  
> very rich register file model and a lot of machine-dependent macros  
> used by gcc ports.


We have a register file model so rich that no single architecture is  
described well enough to get good results.
There is something ironic about this.
:)
> The colour-based register allocator project is an example of this.  I  
> know about this because I worked on register allocator improvements  
> and I am still working on it.  I think the key component is reload  
> pass.  Tasks solved by reload should be combined with the register  
> allocator.  We should rid off reload.  But it is an eneormous task.

Yes, which is one of a myriad of reasons new-ra never succeeded.
The goals were too ambitious
Getting rid of reload is a project all itself.

>
> Vlad
>
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Interesting paper from Perdue
  2004-09-20 13:31 Interesting paper from Perdue Steven Bosscher
  2004-09-21  7:21 ` tm_gccmail
@ 2004-09-21 16:01 ` Vladimir Makarov
  1 sibling, 0 replies; 5+ messages in thread
From: Vladimir Makarov @ 2004-09-21 16:01 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: gcc

Steven Bosscher wrote:

>I don't know if anyone has ever seen/read/mentioned this paper
>before, I might have missed it.  Otherwise, interesting reading:
>https://engineering.purdue.edu/ECE/Research/TR/2004pdfs/TR-ECE-04-01.pdf
>
>  
>
The most interesting thing about the article is that they spent a lot of 
machine time (which I have no in my disposal) to investigate individual 
options to get a better SPECInt2000 results.

But I see they used a black box approach because they don't know gcc 
internals at all (they tried -fschedule-insns for p4 which does nothing, 
they also did not use -mtune=pentium4, etc).

Their most complex algorithm (3rd algorithm) to choose better option 
combination is just oversimplified taboo search algorithm (with list of 
taboo moves which never expire).  I think that an algorithm based on 
taboo metaheuristic would achieve better results for the same number of 
tries.  Imho the taboo algorithm is the best fit approach for solution 
of the task (genetic apporach used by Scott Ladd or more random 
semulated annealing approach would work much worse on my opinion).

In any case, the approach is not practical (on my evaluation it needs 
about 15 hours to choose options by the 3rd algorithm for one 
SPECInt2000 test -- three 3 minutes runs, 20 options, 4 iteration as 
they reported).  Alhough it could be used to get a better (peak) 
SPECInt2000 report.

Vlad

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2004-09-21 17:59 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-09-20 13:31 Interesting paper from Perdue Steven Bosscher
2004-09-21  7:21 ` tm_gccmail
2004-09-21 17:59   ` Vladimir Makarov
2004-09-21 18:39     ` Daniel Berlin
2004-09-21 16:01 ` Vladimir Makarov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).