public inbox for gcc@gcc.gnu.org
* Short Displacement Problems.
@ 2002-05-29  6:49 Naveen Sharma, Noida
  2002-05-29  9:13 ` Joern Rennecke
  0 siblings, 1 reply; 14+ messages in thread
From: Naveen Sharma, Noida @ 2002-05-29  6:49 UTC (permalink / raw)
  To: law, Joern Rennecke, Alexandre Oliva, bernds, gniibe, Richard Henderson
  Cc: gcc

Hi Everyone,

This is in continuation of my earlier postings

1. http://gcc.gnu.org/ml/gcc/2002-05/msg02662.html
2. http://gcc.gnu.org/ml/gcc/2002-04/msg00379.html
 
I would like you to please have a look at the 
problem and comment whether

1. Solving the problem would bring a significant gain on
   architectures that have short (4-bit or 6-bit) displacements.
2. You see any obvious issues in the solutions (described below)
   that I am considering.

I have studied the problem for the SH architecture, but other architectures
(mips16, hppa, etc.) have similar problems.
Your comments are important so that I can take the proper direction.

Now let me describe the problem and solution in detail.  

For a sample code like this

void func(void)
{
        float fla[16];
        int l,m,n;
 
        putval(&l,&m,&n);
        l=m+n;
        func1(l,m,n);
}

GCC produces this code (sh-elf) for the statement "l=m+n":

        add     #72,r6
        mov     r14,r1   ! moving frame pointer r14 --> r1
        add     #68,r1   ! reaching "m"
        mov.l   @r1+,r5  ! r5 <-- m and reaching "n"
        mov.l   @r1,r6   ! r6 <-- n
        mov     r5,r4    ! r4 <-- m
        add     r6,r4    ! r4 <-- m+n (the value of l)
        add     #-8,r1

The code looks like this because
  1) SH has a 64-byte limit on displacement, and
  2) the stack layout is "fla,l,m,n".

Ideally, if the stack were laid out as "l,m,n,fla" instead,
we would get code something like

        mov.l   @(4,r14),r5
        mov.l   @(8,r14),r6
        mov     r5,r4
        add     r6,r4

which has two advantages:

       1. Reduced code size.
       2. Register r1 is free. In larger programs, a register being
          free at register allocation time means fewer spills and
          better code overall.

As I understand it, we need to do two things:

1. Reorder the stack by increasing variable size.
2. For variables with equal size, their layout on the stack
   should be based on locality of accesses.
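
The first objective (ordering by increasing size) can be sketched in C as
below. This is only an illustration, not GCC code: `struct stack_var`,
`assign_offsets` and the `window_bytes` parameter are hypothetical names,
alignment is ignored, and the order among equally sized variables is left
unspecified (that is what objective 2 would decide).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical description of one stack variable. */
struct stack_var {
    const char *name;
    int size;    /* size in bytes */
    int offset;  /* filled in by assign_offsets() */
};

/* qsort comparator: smaller variables first. */
static int by_size(const void *a, const void *b)
{
    const struct stack_var *x = a, *y = b;
    return (x->size > y->size) - (x->size < y->size);
}

/* Sort by increasing size, then lay the variables out back to back
   starting at offset 0 from the base register.  Returns how many
   variables start inside the short-displacement window.  */
int assign_offsets(struct stack_var *v, int n, int window_bytes)
{
    int off = 0, reachable = 0;

    assert(n >= 0 && window_bytes >= 0);
    qsort(v, (size_t)n, sizeof *v, by_size);
    for (int i = 0; i < n; i++) {
        v[i].offset = off;
        if (off < window_bytes)
            reachable++;
        off += v[i].size;
    }
    return reachable;
}
```

For the example function, with a 64-byte window, the source-order layout
"fla,l,m,n" puts l, m and n at offsets 64, 68 and 72, all outside the window;
after sorting, the three ints occupy offsets 0, 4 and 8 and fla starts at 12.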

POSSIBLE SOLUTIONS.

 1. We introduce a pass for achieving objectives #1 and #2.
    For addressing the locality issue we do the following 

    a. Create an access sequence of data items in the insn stream. 
       This would give information of usage and frequency of reference 
       of variables e.g. for code like    

             c=a+d;       
             f=d+e;  

      The sequence in which the variables are accessed is "a d c d e f".

    b. Then we construct an access graph recording the number of times
       two (or more) variables are accessed adjacently (or nearby).
       E.g. in the access graph constructed from the above sequence we
       would have an edge between <a, d> weighted by how often they
       occur adjacent.
       Ideally all adjacent references should be at a SHORT displacement.
    c. From this information, we can determine the placement of variables
       on the stack to minimize large displacements (we would be spanning
       this graph to maximize accesses that occur nearby).
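
Steps (a) and (b) can be sketched as follows. The sketch assumes variables
are single lowercase letters and that "adjacent" means strictly back to back
in the access sequence; `build_access_graph` and `MAX_VARS` are hypothetical
names, and step (c), choosing a layout from the graph, is not shown.

```c
#include <assert.h>
#include <string.h>

#define MAX_VARS 26   /* variables are the letters 'a'..'z' in this sketch */

/* graph[i][j] counts how often variables i and j are accessed
   back to back in the access sequence seq.  */
void build_access_graph(const char *seq, int graph[MAX_VARS][MAX_VARS])
{
    memset(graph, 0, sizeof(int) * MAX_VARS * MAX_VARS);
    for (size_t i = 0; seq[i] != '\0' && seq[i + 1] != '\0'; i++) {
        int u = seq[i] - 'a';
        int v = seq[i + 1] - 'a';

        assert(u >= 0 && u < MAX_VARS && v >= 0 && v < MAX_VARS);
        if (u == v)
            continue;        /* same variable twice in a row: no edge */
        graph[u][v]++;
        graph[v][u]++;       /* the graph is undirected */
    }
}
```

For the sequence "adcdef" this gives weight 1 on the edges <a,d>, <d,e> and
<e,f>, and weight 2 on <c,d>, since c and d are adjacent twice.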

2.   A second option is to view this in a spirit similar
     to register allocation. The problem is to allocate M fast-access slots
     (within an N-bit displacement window with respect to a base register)
     among all references with respect to that base register.
     Variables live at the same time must be allocated at different
     locations.
     If it is possible to allocate them in the fast-access window, we do so;
     otherwise the variable has to be allocated in the slow-access window
     (technically, spilled to the slow-access window), and a spill means
     adding extra code to access the desired variable.
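
A very rough sketch of the second option, under strong simplifying
assumptions: every candidate needs its own slot (liveness and slot sharing
are ignored), all slots have the same size, and the cost of a "spill" is just
the candidate's static reference count. `struct slot_cand` and
`allocate_fast_slots` are hypothetical names; a real allocator would use
interference information instead of a plain greedy sort.

```c
#include <assert.h>
#include <stdlib.h>

struct slot_cand {
    const char *name;
    int refs;   /* static reference count */
    int fast;   /* set to 1 if placed in the fast-access window */
};

/* qsort comparator: most referenced candidates first. */
static int by_refs_desc(const void *a, const void *b)
{
    const struct slot_cand *x = a, *y = b;
    return (y->refs > x->refs) - (y->refs < x->refs);
}

/* Give the m fast slots to the most referenced candidates; everything
   else is "spilled" to the slow window.  Returns the number of
   references that end up needing the long-displacement sequence.  */
int allocate_fast_slots(struct slot_cand *c, int n, int m)
{
    int slow_refs = 0;

    assert(n >= 0 && m >= 0);
    qsort(c, (size_t)n, sizeof *c, by_refs_desc);
    for (int i = 0; i < n; i++) {
        c[i].fast = (i < m);
        if (!c[i].fast)
            slow_refs += c[i].refs;
    }
    return slow_refs;
}
```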


An obvious problem (as I mentioned in my previous mail)
is that most stack allocations are called from reload.
While stack offset assignment would be most beneficial before register
allocation, the picture of the stack isn't clear until reload.
I want register allocation to benefit from offset assignment.

If I do stack offset assignment after register allocation,
I might get a reduction in code size, but that would not be as good,
as it won't reduce register pressure during register allocation/reload.

Thoughts and ideas ??

Regards,
  Naveen Sharma. 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Short Displacement Problems.
  2002-05-29  6:49 Short Displacement Problems Naveen Sharma, Noida
@ 2002-05-29  9:13 ` Joern Rennecke
  2002-05-30 13:45   ` Nix
  0 siblings, 1 reply; 14+ messages in thread
From: Joern Rennecke @ 2002-05-29  9:13 UTC (permalink / raw)
  To: Naveen Sharma, Noida
  Cc: law, Alexandre Oliva, bernds, gniibe, Richard Henderson, gcc

"Naveen Sharma, Noida" wrote:
> POSSIBLE SOLUTIONS.
> 
>  1. We introduce a pass for achieving objectives #1 and #2.
>     For addressing the locality issue we do the following
> 
>     a. Create an access sequence of data items in the insn stream.
>        This would give information of usage and frequency of reference
>        of variables e.g. for code like
> 
>              c=a+d;
>              f=d+e;
> 
>       Sequence in which they variables are accessed is  "a d c d e f"
> 
>     b. Then we construct an Access Graph telling number of times
>        two ( or greater) variables   are accessed adjacently (or nearby)
>        e.g. in access graph constructed from above sequence we
>        would have an edge between <a, d> with frequency they occur adjacent.
>        Ideally all adjacent references should be at a  SHORT displacement.
>     c. From this information, we can determine placement of variables on the
>        stack to minimize large displacements.(we would be spanning this graph
>        to maximize accesses that occur nearby)
> 
> 2.   A second option is to possibly view this in spirit similar
>      to register allocation. The problem is to allocate M fast access slots
>      (within N bit displacement window with  respect to a base register)
>      among total references with respect to that base register.
>      Variables live at the  same time must be allocated at different
>      locations.
>      If it  is possible to allocate  them in fast access window, we do so;
>      otherwise the variable has to be allocated  on slow access window.
>      (Technically spilled to slow access window) and spill would mean
>      addition of extra code to access the desired variable.
> 
> An obvious problem ( as I mentioned in my previous mail)
> I could see is that  most of stack allocations are called from reload.
> While stack offset assignments would be most   beneficial  before register
> allocation, the picture of the stack isn't clear untill reload.
> I want that register allocation should benefit from  offset assignments.
> 
> If I do stack offset assignment after register allocation,
> I might get reduction in code size but that would be not be as good
> as it won't reduce register pressure during register allocation/reload.
> 
> Thoughts and ideas ??

In general, the most important subproblem is to find the best stack slots for
register spills.  -O2 is notoriously bad at this, because the registers that
are spilled last get the largest offsets.  -O2 -fomit-frame-pointer, OTOH,
gives the last spilled registers the smallest offsets.
If you want to do any better than that, your optimization has to run inside,
alongside, or instead of reload.
	
-- 
--------------------------
SuperH
2430 Aztec West / Almondsbury / BRISTOL / BS32 4AQ
T:+44 1454 462330

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Short Displacement Problems.
  2002-05-29  9:13 ` Joern Rennecke
@ 2002-05-30 13:45   ` Nix
  0 siblings, 0 replies; 14+ messages in thread
From: Nix @ 2002-05-30 13:45 UTC (permalink / raw)
  To: Joern Rennecke
  Cc: Naveen Sharma, Noida, law, Alexandre Oliva, bernds, gniibe,
	Richard Henderson, gcc

On Wed, 29 May 2002, Joern Rennecke said:
> In general, the most important subproblem is to find the best stack slots for
> register spills.  -O2 is notoriously bad at this, because the registers that
> are spilled last get the largest offsets.  -O2 -fomit-frame-pointer, OTOH,
> gives the last stilled registers the smallest offset.

But -fomit-frame-pointer has other nasty bugs (but maybe this is only on
IA32).

It miscompiles at least GiNaC (-> segfaults) and GCC itself on i586 with
-O2 (-> all exception throws/catches in code compiled by a compiler
built with this flag dump core.) I'd report this to GNATS if I could get
a decent minimal testcase, but I don't have a *clue* how to debug a
problem with -fomit-frame-pointer, because the switch by its very nature
makes debugging so damn hard.

Anyone know how to debug this sort of problem? I tried for some time in
the GCC-3.0.x days and eventually gave up (3.0.x had the same bug).

-- 
`What happened?'
                 `Nick shipped buggy code!'
                                             `Oh, no dinner for him...'

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: Short Displacement Problems.
@ 2002-06-04  0:49 Naveen Sharma, Noida
  0 siblings, 0 replies; 14+ messages in thread
From: Naveen Sharma, Noida @ 2002-06-04  0:49 UTC (permalink / raw)
  To: gcc

Hi Everybody,

From the discussion thread regarding this problem, I 
think

1. Optimizing stack offsets is definitely desirable. Everybody
   would like to have this feature (especially from the embedded-targets
   viewpoint).
   It was also desired that this change work optimally with
   structs and arrays on the stack.

2. The issues raised concerned only the implementation method.
   The following implementation approaches were suggested:

First approach
--------------
> > On May 30, 2002, Zack Weinberg <zack@codesourcery.com> wrote:
> >
> > > Only if we stick with the existing, lame, way of representing stack
> > > slots.  (mem:MODE (plus:P (reg:P sp) (const_int offset))) is a pain to
> > > work with, yes.  Consider (stack:MODE slot) instead -- with slot being
> > > akin to a pseudo-register number, and only one instance of any given
> > > stack RTX.

This approach requires a lot of changes. However, if it is worth the effort
and is a clean solution, then we can go for it. The following two issues were
highlighted about this approach:

        a) CSE-able expressions arising while computing offsets after
           register allocation (partly addressable by reload_cse).
        b) Loop optimizations (these will need to be addressed in a
           post-register-allocation optimization pass).

Second Approach
---------------
To minimize changes, I am thinking of abstracting the stack
in a slightly different way.
Initially we take the ASSUMPTION that all pseudos will go onto the stack,
and pseudos are placed on the "big virtual stack" based on locality.
So stack assignment of all pseudos on the "virtual stack" is done before
register allocation begins. The reload pass would get rtx's from this
virtual stack.
But finally, as not all pseudos will go onto the stack, in a post register
allocation pass this virtual stack is "collapsed" to an actual realistic stack.

Issues (a & b) mentioned above also have to be addressed in this approach.
The only advantage might be a smaller number of changes.
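
The "collapse" step of the second approach might look roughly like the C
sketch below. It is an assumption-laden illustration rather than a design:
`struct pseudo`, its fields and `collapse_virtual_stack` are hypothetical, the
array is assumed to be ordered by virtual offset already, and alignment and
slot sharing are ignored.

```c
#include <assert.h>

struct pseudo {
    int size;         /* size in bytes */
    int virt_offset;  /* offset on the "big virtual stack" */
    int in_hard_reg;  /* 1 if register allocation gave it a hard register */
    int real_offset;  /* final offset, or -1 if no stack slot is needed */
};

/* p[] is ordered by virt_offset.  Walk it once, drop the pseudos that
   ended up in hard registers, and pack the remaining ones together in
   the same relative order.  Returns the size of the real frame.  */
int collapse_virtual_stack(struct pseudo *p, int n)
{
    int off = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (p[i].in_hard_reg) {
            p[i].real_offset = -1;
            continue;
        }
        p[i].real_offset = off;
        off += p[i].size;
    }
    return off;
}
```

Because the relative order is preserved, whatever locality the virtual layout
had survives the collapse; the offsets can only shrink.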

Also, does it make sense to integrate this with the
new register allocator branch, if that is going to be the register allocator
some time in the future?
Please let me know your thoughts on these.

Best Regards,
  Naveen Sharma. 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: Short Displacement Problems.
  2002-05-31 11:11   ` Alexandre Oliva
@ 2002-05-31 12:34     ` Gary Funck
  0 siblings, 0 replies; 14+ messages in thread
From: Gary Funck @ 2002-05-31 12:34 UTC (permalink / raw)
  To: gcc



> -----Original Message-----
> From: gcc-owner@gcc.gnu.org [mailto:gcc-owner@gcc.gnu.org]On Behalf Of
> Alexandre Oliva
> Sent: Friday, May 31, 2002 10:14 AM
> To: Zack Weinberg
> Cc: Naveen Sharma, Noida; Joern Rennecke; law@redhat.com;
> bernds@redhat.com; gniibe@m17n.org; Richard Henderson; gcc@gcc.gnu.org
> Subject: Re: Short Displacement Problems.
>
>
> On May 30, 2002, Zack Weinberg <zack@codesourcery.com> wrote:
>
> > Only if we stick with the existing, lame, way of representing stack
> > slots.  (mem:MODE (plus:P (reg:P sp) (const_int offset))) is a pain to
> > work with, yes.  Consider (stack:MODE slot) instead -- with slot being
> > akin to a pseudo-register number, and only one instance of any given
> > stack RTX.
>
> How is this different from plain pseudos?  At some point, we have to
> turn them into (mem (plus (reg sp) (const_int offset))), and reload is
> probably the best point.  It would be nice to defer the computation of
> the offset, but optimizing that can be very tricky if you consider
> that, by changing the location of a very commonly used pseudo to a
> stack slot with a smaller offset, and moving whatever was in that
> location to a stack slot with a larger offset, you end up needing more
> registers to hold the address, so you have to spill them, but you
> can't use yet another register to hold the spill address, so you
> lose.  This is just a simplified view of the complications.

If we assume that there are some targets which can efficiently encode small
offsets, might it not make sense to have the spill logic attempt to optimize
the most frequent references (or some other cost-weighted function) to the
smallest offsets, and then on a target-specific basis, run a second pass if
and only if at least one offset was encountered which is above some
target-defined maximum? In this case, the register allocator might need to
dedicate an additional register in order to avoid the spill difficulties
outlined above, before making the second pass. Thus, a second pass is only
required for larger, more complex programs, and is invoked in a
machine-dependent set of circumstances.
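
The first pass of this suggestion, together with the test that decides
whether a second pass is needed, could be sketched like this. Everything here
is hypothetical: `struct spill_slot`, the `weight` metric and
`target_max_disp` stand in for whatever cost function and target hook a real
implementation would use, and the second pass itself is not shown.

```c
#include <assert.h>
#include <stdlib.h>

struct spill_slot {
    int weight;  /* cost-weighted reference count (hypothetical metric) */
    int size;    /* size in bytes */
    int offset;  /* assigned by the first pass */
};

/* qsort comparator: heaviest (most frequently referenced) slots first. */
static int by_weight_desc(const void *a, const void *b)
{
    const struct spill_slot *x = a, *y = b;
    return (y->weight > x->weight) - (y->weight < x->weight);
}

/* First pass: the hottest slots get the smallest offsets.  Returns 1 if
   some offset exceeds target_max_disp, i.e. if the target-specific
   second pass would be triggered, and 0 otherwise.  */
int first_pass_needs_second(struct spill_slot *s, int n, int target_max_disp)
{
    int off = 0, overflow = 0;

    assert(n >= 0);
    qsort(s, (size_t)n, sizeof *s, by_weight_desc);
    for (int i = 0; i < n; i++) {
        s[i].offset = off;
        if (off > target_max_disp)
            overflow = 1;
        off += s[i].size;
    }
    return overflow;
}
```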

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Short Displacement Problems.
  2002-05-30 10:32 ` Zack Weinberg
  2002-05-30 10:32   ` Jan Hubicka
@ 2002-05-31 11:11   ` Alexandre Oliva
  2002-05-31 12:34     ` Gary Funck
  1 sibling, 1 reply; 14+ messages in thread
From: Alexandre Oliva @ 2002-05-31 11:11 UTC (permalink / raw)
  To: Zack Weinberg
  Cc: Naveen Sharma, Noida, Joern Rennecke, law, bernds, gniibe,
	Richard Henderson, gcc

On May 30, 2002, Zack Weinberg <zack@codesourcery.com> wrote:

> Only if we stick with the existing, lame, way of representing stack
> slots.  (mem:MODE (plus:P (reg:P sp) (const_int offset))) is a pain to
> work with, yes.  Consider (stack:MODE slot) instead -- with slot being
> akin to a pseudo-register number, and only one instance of any given
> stack RTX.

How is this different from plain pseudos?  At some point, we have to
turn them into (mem (plus (reg sp) (const_int offset))), and reload is
probably the best point.  It would be nice to defer the computation of
the offset, but optimizing that can be very tricky if you consider
that, by changing the location of a very commonly used pseudo to a
stack slot with a smaller offset, and moving whatever was in that
location to a stack slot with a larger offset, you end up needing more
registers to hold the address, so you have to spill them, but you
can't use yet another register to hold the spill address, so you
lose.  This is just a simplified view of the complications.

> We could assign memory locations to these with just one linear scan
> over the RTL to replace them with MEM expressions at the end.

We already do this in reload, except we're replacing pseudos with mem
slots.  We don't have to introduce one more scan, we just have to be
smarter as to which slots we assign to each pseudo.

-- 
Alexandre Oliva   Enjoy Guarana', see http://www.ic.unicamp.br/~oliva/
Red Hat GCC Developer                  aoliva@{cygnus.com, redhat.com}
CS PhD student at IC-Unicamp        oliva@{lsd.ic.unicamp.br, gnu.org}
Free Software Evangelist                Professional serial bug killer

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: Short Displacement Problems.
@ 2002-05-31  9:41 Naveen Sharma, Noida
  0 siblings, 0 replies; 14+ messages in thread
From: Naveen Sharma, Noida @ 2002-05-31  9:41 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Joern Rennecke, jh, Joe Buck, kenner, gcc

> Zack Weinberg  wrote:
> > 
> > One problem with this strategy is that
> > reshuffling stack during reload is expensive because
> > 
> > 1. There would be lot of stack variable references that we will have to
> >    fix up in insn chain while allocating  "each" stack slot.
> >    The entire logic will increase compilation  time.
> 
> Only if we stick with the existing, lame, way of representing stack
> slots.  (mem:MODE (plus:P (reg:P sp) (const_int offset))) is a pain to
> work with, yes.  Consider (stack:MODE slot) instead -- with slot being
> akin to a pseudo-register number, and only one instance of any given
> stack RTX.  We could assign memory locations to these with just one
> linear scan over the RTL to replace them with MEM expressions at the
> end.

I agree with you on this. The present stack representation is a
bottleneck, and if we settle on a better one, it would be
beneficial in the long run.
 
> In fact, if we used this representation from RTL creation time all the
> way to after register allocation, we could make some significant
> improvements to code generation, independent of the
> stack-slot-assignment issue.
> 
> Yeah, this would require changes to every machine description and
> every RTL optimizer pass, but it would be worth it.

To minimize changes, I am thinking of abstracting the stack
in a slightly different way.
Initially we take the ASSUMPTION that all pseudos will go onto the stack,
and pseudos are placed on the "big virtual stack" based on locality.
So stack assignment of all pseudos on the "virtual stack" is complete before
register allocation begins. The reload pass would get rtx's from this
virtual stack.
But in the end, not all pseudos will go onto the stack. In a post register
allocation pass, this virtual stack is "collapsed" to an actual realistic stack.

Initially, I am thinking of enabling this virtual stack for SH only; other
ports might move to it once I have a complete infrastructure
in place.

Please let me know if that looks promising enough.

Best Regards,
  Naveen Sharma.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Short Displacement Problems.
  2002-05-30 12:20     ` Joe Buck
@ 2002-05-30 12:29       ` Jan Hubicka
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Hubicka @ 2002-05-30 12:29 UTC (permalink / raw)
  To: Joe Buck
  Cc: Jan Hubicka, Zack Weinberg, Naveen Sharma Noida, Joern Rennecke,
	law, Alexandre Oliva, bernds, gniibe, Richard Henderson, gcc

> 
> > In my experiments I've been simply using ADRESSOF as kind of "ticket for
> > stack slot" and it appears to work pretty well up to the reload.  Major
> > problem is that lowering the code may result in CSEable expressions.
> 
> Right, but the problem is that ADDRESSOF doesn't work in general for
> structs.  The result is a huge penalty in C++ for temporary objects
> with more than one data member, compared to with one.

In my simple-minded patch it did - I simply assigned ADDRESSOF for a
struct (and each object on the stack) and purge_addressof homed the
structure on the stack when an addressof reference survived.  At least we
didn't allocate stack slots for structs never used.
But what we really need is scalar replacement, I guess, and that would
take more work.

Honza
> 
> Stack slot assignment probably isn't worth the bother unless it works
> for structs, not just objects that have a mode that GCC knows about.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Short Displacement Problems.
  2002-05-30 10:32   ` Jan Hubicka
@ 2002-05-30 12:20     ` Joe Buck
  2002-05-30 12:29       ` Jan Hubicka
  0 siblings, 1 reply; 14+ messages in thread
From: Joe Buck @ 2002-05-30 12:20 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: Zack Weinberg, Naveen Sharma Noida, Joern Rennecke, law,
	Alexandre Oliva, bernds, gniibe, Richard Henderson, gcc


> In my experiments I've been simply using ADRESSOF as kind of "ticket for
> stack slot" and it appears to work pretty well up to the reload.  Major
> problem is that lowering the code may result in CSEable expressions.

Right, but the problem is that ADDRESSOF doesn't work in general for
structs.  The result is a huge penalty in C++ for temporary objects
with more than one data member, compared to with one.

Stack slot assignment probably isn't worth the bother unless it works
for structs, not just objects that have a mode that GCC knows about.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Short Displacement Problems.
  2002-05-30 10:43 Richard Kenner
@ 2002-05-30 11:34 ` Zack Weinberg
  0 siblings, 0 replies; 14+ messages in thread
From: Zack Weinberg @ 2002-05-30 11:34 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

On Thu, May 30, 2002 at 01:05:05PM -0400, Richard Kenner wrote:
>     Consider (stack:MODE slot) instead -- with slot being akin to a
>     pseudo-register number, and only one instance of any given stack RTX.
>     We could assign memory locations to these with just one linear scan
>     over the RTL to replace them with MEM expressions at the end.
> 
> The downside of this is that if the offset is too large for the access
> to be a single insn (always the case in IA64, for example), you have to
> be careful to do this replacement early enough to apply CSE and loop
> optimization to those address computations.

True.  We do already do some CSE after register allocation.  I'm not
sure what it would take to do loop optimizations there, but it might
help independent of better stack slot assignment.  (In general it
seems to me that most optimizations can constructively be run both
before and after register allocation.)

zw

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Short Displacement Problems.
@ 2002-05-30 10:43 Richard Kenner
  2002-05-30 11:34 ` Zack Weinberg
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Kenner @ 2002-05-30 10:43 UTC (permalink / raw)
  To: zack; +Cc: gcc

    Consider (stack:MODE slot) instead -- with slot being akin to a
    pseudo-register number, and only one instance of any given stack RTX.
    We could assign memory locations to these with just one linear scan
    over the RTL to replace them with MEM expressions at the end.

The downside of this is that if the offset is too large for the access
to be a single insn (always the case in IA64, for example), you have to
be careful to do this replacement early enough to apply CSE and loop
optimization to those address computations.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Short Displacement Problems.
  2002-05-30  3:25 Naveen Sharma, Noida
@ 2002-05-30 10:32 ` Zack Weinberg
  2002-05-30 10:32   ` Jan Hubicka
  2002-05-31 11:11   ` Alexandre Oliva
  0 siblings, 2 replies; 14+ messages in thread
From: Zack Weinberg @ 2002-05-30 10:32 UTC (permalink / raw)
  To: Naveen Sharma, Noida
  Cc: Joern Rennecke, law, Alexandre Oliva, bernds, gniibe,
	Richard Henderson, gcc

On Thu, May 30, 2002 at 01:09:19PM +0530, Naveen Sharma, Noida wrote:
> 
> One problem with this strategy is that
> reshuffling stack during reload is expensive because
> 
> 1. There would be lot of stack variable references that we will have to
>    fix up in insn chain while allocating  "each" stack slot.
>    The entire logic will increase compilation  time.

Only if we stick with the existing, lame, way of representing stack
slots.  (mem:MODE (plus:P (reg:P sp) (const_int offset))) is a pain to
work with, yes.  Consider (stack:MODE slot) instead -- with slot being
akin to a pseudo-register number, and only one instance of any given
stack RTX.  We could assign memory locations to these with just one
linear scan over the RTL to replace them with MEM expressions at the
end.
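
The "one linear scan" idea can be illustrated with a toy model in C. This is
not RTL and not a GCC API: `struct opnd`, `enum opnd_kind` and
`resolve_stack_slots` are hypothetical stand-ins for the proposed
(stack:MODE slot) expression and the final replacement pass, and the slot to
offset table is assumed to have been decided after register allocation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy operand: either a symbolic stack slot, standing in for the
   proposed (stack:MODE slot), or an already resolved sp+offset MEM.  */
enum opnd_kind { OPND_STACK_SLOT, OPND_MEM_SP_OFFSET };

struct opnd {
    enum opnd_kind kind;
    int value;  /* slot number, or byte offset once resolved */
};

/* One linear scan over all operands: replace every symbolic slot by its
   final offset taken from slot_offset[].  Returns the number of
   operands rewritten.  */
size_t resolve_stack_slots(struct opnd *ops, size_t n_ops,
                           const int *slot_offset, int n_slots)
{
    size_t rewritten = 0;

    for (size_t i = 0; i < n_ops; i++) {
        if (ops[i].kind != OPND_STACK_SLOT)
            continue;
        assert(ops[i].value >= 0 && ops[i].value < n_slots);
        ops[i].value = slot_offset[ops[i].value];
        ops[i].kind = OPND_MEM_SP_OFFSET;
        rewritten++;
    }
    return rewritten;
}
```

Changing the layout then only means changing slot_offset[] before the scan;
no reference has to be patched earlier.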

In fact, if we used this representation from RTL creation time all the
way to after register allocation, we could make some significant
improvements to code generation, independent of the
stack-slot-assignment issue.

Yeah, this would require changes to every machine description and
every RTL optimizer pass, but it would be worth it.

zw

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Short Displacement Problems.
  2002-05-30 10:32 ` Zack Weinberg
@ 2002-05-30 10:32   ` Jan Hubicka
  2002-05-30 12:20     ` Joe Buck
  2002-05-31 11:11   ` Alexandre Oliva
  1 sibling, 1 reply; 14+ messages in thread
From: Jan Hubicka @ 2002-05-30 10:32 UTC (permalink / raw)
  To: Zack Weinberg
  Cc: Naveen Sharma, Noida, Joern Rennecke, law, Alexandre Oliva,
	bernds, gniibe, Richard Henderson, gcc

> On Thu, May 30, 2002 at 01:09:19PM +0530, Naveen Sharma, Noida wrote:
> > 
> > One problem with this strategy is that
> > reshuffling stack during reload is expensive because
> > 
> > 1. There would be lot of stack variable references that we will have to
> >    fix up in insn chain while allocating  "each" stack slot.
> >    The entire logic will increase compilation  time.
> 
> Only if we stick with the existing, lame, way of representing stack
> slots.  (mem:MODE (plus:P (reg:P sp) (const_int offset))) is a pain to
> work with, yes.  Consider (stack:MODE slot) instead -- with slot being
> akin to a pseudo-register number, and only one instance of any given
> stack RTX.  We could assign memory locations to these with just one
> linear scan over the RTL to replace them with MEM expressions at the
> end.

In my experiments I've been simply using ADDRESSOF as a kind of "ticket for
stack slot" and it appears to work pretty well up to reload.  The major
problem is that lowering the code may result in CSEable expressions.

Honza
> 
> In fact, if we used this representation from RTL creation time all the
> way to after register allocation, we could make some significant
> improvements to code generation, independent of the
> stack-slot-assignment issue.
> 
> Yeah, this would require changes to every machine description and
> every RTL optimizer pass, but it would be worth it.
> 
> zw

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: Short Displacement Problems.
@ 2002-05-30  3:25 Naveen Sharma, Noida
  2002-05-30 10:32 ` Zack Weinberg
  0 siblings, 1 reply; 14+ messages in thread
From: Naveen Sharma, Noida @ 2002-05-30  3:25 UTC (permalink / raw)
  To: Joern Rennecke
  Cc: law, Alexandre Oliva, bernds, gniibe, Richard Henderson, gcc

Hi,

> > An obvious problem ( as I mentioned in my previous mail)
> > I could see is that  most of stack allocations are called from reload.
> > While stack offset assignments would be most   beneficial  before register
> > allocation, the picture of the stack isn't clear untill reload.
> > I want that register allocation should benefit from  offset assignments.
> > 
> > If I do stack offset assignment after register allocation,
> > I might get reduction in code size but that would be not be as good
> > as it won't reduce register pressure during register allocation/reload.
> > 
> > Thoughts and ideas ??
> 
> In general, the most important subproblem is to find the best stack slots for
> register spills.  -O2 is notoriously bad at this, because the registers that
> are spilled last get the largest offsets.  -O2 -fomit-frame-pointer, OTOH,
> gives the last stilled registers the smallest offset.
> If you want to get any better than that, your optimization has to run inside
> of or alongside to or instead of reload.

One problem with this strategy is that
reshuffling the stack during reload is expensive because

1. There would be a lot of stack variable references that we would have to
   fix up in the insn chain while allocating "each" stack slot.
   The entire logic would increase compilation time.

E.g. consider this scenario: we have some "n" variables on the stack.
    The reload phase calls assign_stack_local for some pseudo, and from the
    access data of that pseudo, we decide to put it somewhere between 1 .. n,
    at position "i". The offsets of variables i+1 .. n are affected and
    all references to them must be fixed up.
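
The cost in this scenario can be made concrete with a small C sketch. It is
only a model: `struct slot` and `insert_slot` are hypothetical (this is not
what assign_stack_local does), positions are 0-based, and the caller is
assumed to provide room for one more element in the array.

```c
#include <assert.h>

struct slot {
    int size;    /* size in bytes */
    int offset;  /* current stack offset */
    int refs;    /* number of references to this slot in the insn chain */
};

/* Insert new_slot at position pos among the n existing slots in s[]
   (kept in offset order; s[] must have room for n + 1 elements).
   Every slot from pos onward moves up by the size of the new slot.
   Returns the number of existing references that must be fixed up.  */
int insert_slot(struct slot *s, int n, int pos, struct slot new_slot)
{
    int fixups = 0;

    assert(n >= 0 && pos >= 0 && pos <= n);
    new_slot.offset = (pos < n) ? s[pos].offset
                    : (n > 0 ? s[n - 1].offset + s[n - 1].size : 0);
    for (int i = n; i > pos; i--) {
        s[i] = s[i - 1];
        s[i].offset += new_slot.size;
        fixups += s[i].refs;     /* each reference needs a new offset */
    }
    s[pos] = new_slot;
    return fixups;
}
```

Inserting near the start of a large frame touches almost every reference,
which is the compile-time concern raised above.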

But it seems there might be a tradeoff involved here. Let me see
how people view this problem.

Would the performance increase justify the increase in
compilation time?

Best regards,
  Naveen Sharma.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2002-06-04  7:09 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
2002-05-29  6:49 Short Displacement Problems Naveen Sharma, Noida
2002-05-29  9:13 ` Joern Rennecke
2002-05-30 13:45   ` Nix
2002-05-30  3:25 Naveen Sharma, Noida
2002-05-30 10:32 ` Zack Weinberg
2002-05-30 10:32   ` Jan Hubicka
2002-05-30 12:20     ` Joe Buck
2002-05-30 12:29       ` Jan Hubicka
2002-05-31 11:11   ` Alexandre Oliva
2002-05-31 12:34     ` Gary Funck
2002-05-30 10:43 Richard Kenner
2002-05-30 11:34 ` Zack Weinberg
2002-05-31  9:41 Naveen Sharma, Noida
2002-06-04  0:49 Naveen Sharma, Noida
