public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: [tree-ssa] Out of SSA status and issues
@ 2003-05-13 15:23 Richard Kenner
  2003-05-13 18:50 ` Geoff Keating
  2003-05-17 17:19 ` Michael S. Zick
  0 siblings, 2 replies; 36+ messages in thread
From: Richard Kenner @ 2003-05-13 15:23 UTC (permalink / raw)
  To: mszick; +Cc: gcc

    Consider instead that '*p' is just the name given a register's
    contents.  Neither 'i' nor '9' are processor internal.  Both are
    external to the cpu.  The memory reference for 'i' is source code
    implicit, the memory reference for '*p' is source code explicit.  

True, but given caching effects, it's hard to see how *p could be less
expensive than 'i'.

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [tree-ssa] Out of SSA status and issues
@ 2003-05-13 13:42 Richard Kenner
  0 siblings, 0 replies; 36+ messages in thread
From: Richard Kenner @ 2003-05-13 13:42 UTC (permalink / raw)
  To: matz; +Cc: gcc

    Suppose there is register pressure and we are in the register allocator
    and decide that p1 doesn't get a hardreg.  If you now don't allow the use
    of p1 to be replaced with *p2, you are forced to emit such code:

Of course, but we weren't talking about the register allocator.  Instead,
we were talking about a high-level optimization.

The register allocator indeed needs to know what a psuedo corresponds to
so that it can undo the optimization is there aren't enough hard registers.
We do some of this, but not enough.

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [tree-ssa] Out of SSA status and issues
@ 2003-05-13 13:17 Richard Kenner
  2003-05-13 13:27 ` Diego Novillo
                   ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Richard Kenner @ 2003-05-13 13:17 UTC (permalink / raw)
  To: dnovillo; +Cc: gcc

    I see nothing wrong in replacing 'i + 9' with '*p + 9'.  It would
    probably not be efficient, but I can't see it being wrong.

If an optimizer pessimizes the code, I'd consider that "wrong".

This isn't a machine-dependent issue: with CPU speeds the way they are,
a memory reference is *always* many times more expensive than an addition.

^ permalink raw reply	[flat|nested] 36+ messages in thread
* [tree-ssa] Out of SSA status and issues
@ 2003-05-12 14:42 Andrew MacLeod
  2003-05-12 15:38 ` Diego Novillo
  2003-05-12 18:57 ` Andrew MacLeod
  0 siblings, 2 replies; 36+ messages in thread
From: Andrew MacLeod @ 2003-05-12 14:42 UTC (permalink / raw)
  To: gcc mailing list



Most of the out of ssa pass is ready to go, except for a couple of
issues. One is large, one is not.

First, the smaller issue. When overlapping live ranges are allowed, we
will sometimes have to issue copies on edges in order to satify PHI
assignments. Since it is impossible to issue a copy on an abnormal
critical edge, we must coalesce all the variables which could
potentially cause a copy across one of these edges. This prevents the
need for a copy on that edge.

The worst case scenario happens when you have 2 abnormal critical edges
feeding a PHI node in which the 2 values coming in conflict. This means
you *have* to issue a copy on one of the edges, and that can't be done.

Copy propagation can currently cause this to happen in C++ code. EH
generates lots of abnormal critical edges, and copy propagation can
cause this as the following example shows:

void f();

int val();
void g();

int *q;
int i;
void
a()
{
 int *p;

  p = &i;
  try {
    f();
    *p++ = val();
  }
  catch (...)
    {
    }

   *p =  20;
}

With copy propagation we end up with:

   #   (*T.2)_10 = VDEF <(*T.2)_7>;
    #   .GLOBAL_VAR_11 = VDEF <.GLOBAL_VAR_8>;
    #   VUSE <(*T.2)_7>;
    #   VUSE <.GLOBAL_VAR_8>;
    p_9 = &i;
    try
      {

        # BLOCK 1 (a.C:16).  PRED: 0.  SUCC: 2 3.
        {

          #   .GLOBAL_VAR_12 = VDEF <.GLOBAL_VAR_11>;
          f ();

          # BLOCK 2.  PRED: 1.  SUCC: 7 3.
          (void)0;

          #   (*T.2)_15 = VDEF <(*T.2)_10>;
          #   .GLOBAL_VAR_16 = VDEF <.GLOBAL_VAR_12>;
          p_14 = p_9 + 4B;

          #   (*T.2)_18 = VDEF <(*T.2)_15>;
          #   .GLOBAL_VAR_19 = VDEF <.GLOBAL_VAR_16>;
          T.2_17 = p_9;

          #   .GLOBAL_VAR_20 = VDEF <.GLOBAL_VAR_19>;
          #   (*T.2)_21 = VDEF <(*T.2)_18>;
          #   VUSE <T.2_17>;
          *T.2 = val ()
        }
      }
    catch
      {

        # BLOCK 3 (a.C:20).  PRED: 2 1.  SUCC: 7 4.
        #   p_1 = PHI <p_9(1), p_14(2)>;
        #   .GLOBAL_VAR_3 = PHI <.GLOBAL_VAR_12(1), .GLOBAL_VAR_20(2)>;
        #   (*T.2)_5 = PHI <(*T.2)_10(1), (*T.2)_21(2)>;
        catch ()
          {

            # BLOCK 4 (a.C:20).  PRED: 3.  SUCC: 7 5.
            {

              #   .GLOBAL_VAR_22 = VDEF <.GLOBAL_VAR_3>;
              __cxa_begin_catch (<<<exception object>>>);
              try
                {

                  # BLOCK 5 (a.C:21).  PRED: 4.  SUCC: 6 7.
                  {
                    (void)0
                  }
                }
              finally
                {

                  # BLOCK 6 (a.C:20).  PRED: 5.  SUCC: 7.

                  #   .GLOBAL_VAR_23 = VDEF <.GLOBAL_VAR_22>;
                  __cxa_end_catch ()
                }
            }
          }
      };

    # BLOCK 7 (a.C:24).  PRED: 6 5 4 3 2 0.  SUCC: -2.
    #   p_2 = PHI <p_9(0), p_14(2), p_1(3), p_1(4), p_1(5), p_1(6)>;
    #   .GLOBAL_VAR_4 = PHI <.GLOBAL_VAR_11(0), .GLOBAL_VAR_20(2),
.GLOBAL_VAR_3(3), .GLOBAL_VAR_22(4), .GLOBAL_VAR_22(5),
.GLOBAL_VAR_23(6)>;
    #   (*T.2)_6 = PHI <(*T.2)_10(0), (*T.2)_21(2), (*T.2)_5(3),
(*T.2)_5(4), (*T.2)_5(5), (*T.2)_5(6)>;

    #   (*T.2)_24 = VDEF <(*T.2)_6>;
    #   .GLOBAL_VAR_25 = VDEF <.GLOBAL_VAR_4>;
    #   VUSE <p_2>;
    *p = 20
  }
}

p_9 and p_14 overlap in block 2, and both are elements of a PHI node in
block 3. Both come into the PHI node on abnormal critical edges (block 3
is the start of the catch), and cannot occupy the same memory location,
so we can't coalesce them .

This occurs in libstdc++, and prevent us from compiling the library when
we enable overlapping live ranges.

So this problem needs to be resolved before we can turn on overlapping
ranges. We must not propagate copies which are going to cause conflicts
with other registers that are used across abnormal critical edges.  

                -------------------------

The second, and more serious problem, has to do with the relationship
between pointers and dereferenced pointers in our SSA implementation.

Given something simple, say:

int a[100];

int
b() 
{
  int *p= a;
  int y;

  *p = 20;
  p += 10;
  y = *p;
  if (y > 20)
    *p++ = 30;

  return *p;

}

we see something like:

  #   (*p)_5 = VDEF <(*p)_3>;
  #   VUSE <(*p)_3>;
  p_4 = &a;
  
  #   (*p)_6 = VDEF <(*p)_5>;
  #   VUSE <p_4>;
  *p = 20;
  
  #   (*p)_8 = VDEF <(*p)_6>;
  p_7 = p_4 + 40B;
  
  #   VUSE <(*p)_8>;
  #   VUSE <p_7>;
  y_9 = *p;
  if (y_9 > 20)
    {
      
      #   (*p)_10 = VDEF <(*p)_8>;
      #   VUSE <p_7>;
      *p = 30;
      
      #   (*p)_12 = VDEF <(*p)_10>;
      p_11 = p_7 + 4B
    }


if none of the versions of p overlap, then p_1, p_4, p_7 and p_11 all
coalesce together, and are assigned to 'p'. And this program works. 

If one or more of them can't be coalesced toegther, we need to create a
new variable to represent the ones which overlap. Anywhere where that
pointer is used, we have to replace the 'p' with the new variable.

Our SSA representation doesn't include these in the operands of a stmt. 
Take for example:

  #   VUSE <(*p)_8>;
  #   VUSE <p_7>;
  y_9 = *p;

the dereference of p actually uses p_7, but the use is a virtual use
instead of a real use.  If p_7 didn't coalesce with 'p', it would be
asigned to a new variable, say P.33. This stmt would  then need to be
rewritten as:

  y = *p.33

But since the use of p_7 is virtual instead of a real use, we don't see
this when we go to rewrite.

I have created a hack which Ive been using which will look for
derefernces of variables and try to match them up with a VUSE, and if
the VUSE has been coalesced to a different variable, then it rewrites
the variable in the stmt. So it will take care of the above situation.

However, its not flawless. There are 2 situations which can occur under
which it fails. If 2 different versions of p, say p_66 and p_77 are both
dereferenced on the same stmt, there is no way to know which one needs
to be rewritten, all we know is one of them needs it.

The second case occurs when a derefernce is copy propagated into a PHI
node:
 
   if (T.1_5 != 0B)
    {

      #   VUSE <(*map)_3>;
      #   VUSE <map_4>;
      T.2_8 = map->compact_to_partition;
      i.3_9 = (unsigned int)i_6;
      T.4_10 = i.3_9 * 4;
      T.5_11 = (int *)T.4_10;

      #   (*T.6)_13 = VDEF <(*T.6)_7>;
      T.6_12 = T.2_8 + T.5_11;

      #   VUSE <T.6_12>;
      i_14 = (*T.6)_13
    };
  #   i_1 = PHI <i_6(0), (*T.6)_13(1)>;
  #   (*T.6)_2 = PHI <(*T.6)_7(0), (*T.6)_13(1)>;

  #   (*T.7)_17 = VDEF <(*T.7)_16>;
  #   VUSE <(*map)_3>;
  #   VUSE <map_4>;
     
The value of i_14 has been propagated into the PHI node. DCE the deletes
the stmt 
   i_14 = (*T.6)_13

When we go to rewrite this, all we know is that its a derefernce of T.6.
There is no VUSE now to look at to figure out what the correct pointer
it.  The original def has it as T.6_12. The information could be found
by looking for the def of (*T.6)_13 (which is virtual), and looking at
the real def, which is T.6_12. I am about to try that in my hack and see
if it works.


The real solution to this is to use the real pointer in the stmt, and
make it a a real use instead of a virtual use.  Diego is currently
investigating this. Then rewrite would simply be replacing p_7 with the
correct variable and not need to go hunting through virtual operands to 
see if it needs to do something or not.

The orignal program example I gave would then look something like:

  #   (*p_4)_5 = VDEF <(*p)_3>;
  #   VUSE <(*p)_3>;
  p_4 = &a;
  
  #   (*p_4)_6 = VDEF <(*p_4)_5>;
  #   VUSE <p_4>;
  *p_4 = 20;
  
  #   (*p_7)_8 = VDEF <(*p_4)_6>;
  p_7 = p_4 + 40B;
  
  #   VUSE <(*p_7)_8>;
  y_9 = *p_7;
  if (y_9 > 20)
    {
      
      #   (*p_7)_10 = VDEF <(*p_7)_8>;
      *p_7 = 30;
      
      #   (*p_11)_12 = VDEF <(*p_7)_10>;
      p_11 = p_7 + 4B
    }
 

Once we resolve these issues, we should be able to at least attempt to
turn on overlapping live ranges, and give ourselves lots of other bugs
to look at :-)

Andrew


^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2003-05-17 17:09 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-05-13 15:23 [tree-ssa] Out of SSA status and issues Richard Kenner
2003-05-13 18:50 ` Geoff Keating
2003-05-13 23:28   ` Michael S. Zick
2003-05-17 17:19 ` Michael S. Zick
  -- strict thread matches above, loose matches on Subject: below --
2003-05-13 13:42 Richard Kenner
2003-05-13 13:17 Richard Kenner
2003-05-13 13:27 ` Diego Novillo
2003-05-13 13:40 ` Michael Matz
2003-05-13 15:08 ` Michael S. Zick
2003-05-12 14:42 Andrew MacLeod
2003-05-12 15:38 ` Diego Novillo
2003-05-12 15:57   ` Andrew MacLeod
2003-05-12 16:05     ` Michael Matz
2003-05-12 16:10       ` Andrew MacLeod
2003-05-12 16:16       ` law
2003-05-12 17:08     ` law
2003-05-12 17:12       ` Andrew MacLeod
2003-05-12 17:26         ` law
2003-05-12 18:57 ` Andrew MacLeod
2003-05-13  9:07   ` Michael Matz
2003-05-13 12:42     ` Diego Novillo
2003-05-13 12:50       ` Andrew MacLeod
2003-05-13 13:05         ` Diego Novillo
2003-05-13 13:29           ` Andrew MacLeod
2003-05-13 13:57             ` Diego Novillo
2003-05-13 12:57       ` Michael Matz
2003-05-13 13:11         ` Diego Novillo
2003-05-13 13:18           ` Andrew MacLeod
2003-05-14 17:19             ` Jan Vroonhof
2003-05-14 18:05               ` Andrew MacLeod
2003-05-14 18:33               ` Diego Novillo
2003-05-14 19:11                 ` Daniel Berlin
2003-05-13 15:01         ` Daniel Berlin
2003-05-13 12:33   ` Diego Novillo
2003-05-13 12:49     ` Andrew MacLeod
2003-05-13 12:58       ` Diego Novillo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).