public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* [tree-ssa] copy propagation and the abstraction penalty
@ 2003-05-14 22:45 Joe Buck
  2003-05-15  3:15 ` Andrew Pinski
  2003-05-15 14:10 ` Diego Novillo
  0 siblings, 2 replies; 9+ messages in thread
From: Joe Buck @ 2003-05-14 22:45 UTC (permalink / raw)
  To: gcc

Consider the code

--------------------------------
struct complex {
    double re, im;
    complex(double r, double i) : re(r), im(i) {}
};

inline complex operator+(const complex& a, const complex& b) {
    return complex(a.re+b.re, a.im+b.im);
}

complex addone(const complex& arg) {
    return arg + complex(1,0);
}
-------------------------------

We get really lousy code for this, in all gcc versions, including tree-ssa.
The reason is that we build a temporary struct to hold the 1, 0 and don't
get rid of it, so we take no advantage of the zero.

Calling this foo.C, foo.C.t09.ssa gives


;; Function complex addone(const complex&) (_Z6addoneRK7complex)

complex addone(const complex&) (arg)
{
  struct complex retval.12;
  struct complex <UVa150>;
  struct complex * T.8;
  struct complex * T.9;
  struct complex & T.10;
  struct complex T.11;

  {
    T.8_2 = &<UVa150>;
    {
      double i;
      double r;
      struct complex * const this;

      this_3 = (struct complex * const)T.8_2;
      r_5 = 1.0e+0;
      i_6 = 0.0;
      {
        this->re = 1.0e+0;
        this->im = 0.0;
        {
          (void)0
        }
      }
    };
    T.9_9 = &<UVa150>;
    T.10_10 = (struct complex &)T.9_9;
    {
      struct complex & b;
      struct complex <UVa770>;

      b_11 = T.10_10;
      {
        struct complex * T.1;
        double T.2;
        double T.3;
        double T.4;
        double T.5;
        double T.6;
        double T.7;

        {
          T.1_13 = &<UVa770>;
          T.2_15 = arg->re;
          T.3_16 = b->re;
          T.4_17 = T.2_15 + T.3_16;
          T.5_18 = arg->im;
          T.6_19 = b->im;
          T.7_20 = T.5_18 + T.6_19;
          {
            double i;
            double r;
            struct complex * const this;

            this_21 = (struct complex * const)T.1_13;
            r_23 = T.4_17;
            i_24 = T.7_20;
            {
              this->re = T.4_17;
              this->im = T.7_20;
              {
                (void)0
              }
            }
          };
          {
            (void)0;
            goto <ULa700>;
          }
        }
      };
      <ULa700>:;;
      retval.12_27 = <UVa770>
    };
    T.11_28 = retval.12_27;
    return retval.12_27;
  }
}

------------------------------------------------------------

It would seem simple enough to eliminate all uses of the temporary struct
<UVa150> by doing copy propagation: what we would have left is only the
initialization of the struct itself; all uses of the struct would get the
1.0 and 0.0 values from its re and im fields.  At this point the temporary
struct should be eligible for killing.

If we could do this alone, we would greatly improve C++ performance, especially
on things like the Boost graph library.  It seems that we have most of what we
need in place, right?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [tree-ssa] copy propagation and the abstraction penalty
  2003-05-14 22:45 [tree-ssa] copy propagation and the abstraction penalty Joe Buck
@ 2003-05-15  3:15 ` Andrew Pinski
  2003-05-15 14:10 ` Diego Novillo
  1 sibling, 0 replies; 9+ messages in thread
From: Andrew Pinski @ 2003-05-15  3:15 UTC (permalink / raw)
  To: Joe Buck; +Cc: Andrew Pinski, gcc

Also it will fix PR9566, which is the same problem.
It will also speed up mostly likely any C++ problem ever made.

Thanks,
Andrew Pinski

On Wednesday, May 14, 2003, at 18:44 US/Eastern, Joe Buck wrote:

> Consider the code
>
> --------------------------------
> struct complex {
>     double re, im;
>     complex(double r, double i) : re(r), im(i) {}
> };
>
> inline complex operator+(const complex& a, const complex& b) {
>     return complex(a.re+b.re, a.im+b.im);
> }
>
> complex addone(const complex& arg) {
>     return arg + complex(1,0);
> }
> -------------------------------
>
> We get really lousy code for this, in all gcc versions, including 
> tree-ssa.
> The reason is that we build a temporary struct to hold the 1, 0 and 
> don't
> get rid of it, so we take no advantage of the zero.
>
> Calling this foo.C, foo.C.t09.ssa gives
>
>
> ;; Function complex addone(const complex&) (_Z6addoneRK7complex)
>
> complex addone(const complex&) (arg)
> {
>   struct complex retval.12;
>   struct complex <UVa150>;
>   struct complex * T.8;
>   struct complex * T.9;
>   struct complex & T.10;
>   struct complex T.11;
>
>   {
>     T.8_2 = &<UVa150>;
>     {
>       double i;
>       double r;
>       struct complex * const this;
>
>       this_3 = (struct complex * const)T.8_2;
>       r_5 = 1.0e+0;
>       i_6 = 0.0;
>       {
>         this->re = 1.0e+0;
>         this->im = 0.0;
>         {
>           (void)0
>         }
>       }
>     };
>     T.9_9 = &<UVa150>;
>     T.10_10 = (struct complex &)T.9_9;
>     {
>       struct complex & b;
>       struct complex <UVa770>;
>
>       b_11 = T.10_10;
>       {
>         struct complex * T.1;
>         double T.2;
>         double T.3;
>         double T.4;
>         double T.5;
>         double T.6;
>         double T.7;
>
>         {
>           T.1_13 = &<UVa770>;
>           T.2_15 = arg->re;
>           T.3_16 = b->re;
>           T.4_17 = T.2_15 + T.3_16;
>           T.5_18 = arg->im;
>           T.6_19 = b->im;
>           T.7_20 = T.5_18 + T.6_19;
>           {
>             double i;
>             double r;
>             struct complex * const this;
>
>             this_21 = (struct complex * const)T.1_13;
>             r_23 = T.4_17;
>             i_24 = T.7_20;
>             {
>               this->re = T.4_17;
>               this->im = T.7_20;
>               {
>                 (void)0
>               }
>             }
>           };
>           {
>             (void)0;
>             goto <ULa700>;
>           }
>         }
>       };
>       <ULa700>:;;
>       retval.12_27 = <UVa770>
>     };
>     T.11_28 = retval.12_27;
>     return retval.12_27;
>   }
> }
>
> ------------------------------------------------------------
>
> It would seem simple enough to eliminate all uses of the temporary 
> struct
> <UVa150> by doing copy propagation: what we would have left is only the
> initialization of the struct itself; all uses of the struct would get 
> the
> 1.0 and 0.0 values from its re and im fields.  At this point the 
> temporary
> struct should be eligible for killing.
>
> If we could do this alone, we would greatly improve C++ performance, 
> especially
> on things like the Boost graph library.  It seems that we have most of 
> what we
> need in place, right?
>
>
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [tree-ssa] copy propagation and the abstraction penalty
  2003-05-14 22:45 [tree-ssa] copy propagation and the abstraction penalty Joe Buck
  2003-05-15  3:15 ` Andrew Pinski
@ 2003-05-15 14:10 ` Diego Novillo
  2003-05-15 17:09   ` Joe Buck
  2003-05-15 22:20   ` Richard Henderson
  1 sibling, 2 replies; 9+ messages in thread
From: Diego Novillo @ 2003-05-15 14:10 UTC (permalink / raw)
  To: Joe Buck; +Cc: gcc

On Wed, 2003-05-14 at 18:44, Joe Buck wrote:

> If we could do this alone, we would greatly improve C++ performance, especially
> on things like the Boost graph library.  It seems that we have most of what we
> need in place, right?
>
There's a few bits still missing, but we should be getting there.  In
this particular testcase I see:

      * Structures.  We only build enough SSA links to prove liveness. 
        Two alternatives to dealing with this is using ESSA from the
        SSAPRE engine and/or rely on a scalarization pass.

      * Aliasing.  Our default type-based aliasing mechanism gets all
        tangled up (it's really naive):

        Alias information for complex addone(const complex&): 1 sets
        
        Alias set #0:
          Tag: *this, may aliases: *this, may alias global memory, call clobbered
          Aliases objects: { <UV2150> *this <UV2770> *arg *b *this }
        
	
      * The long term plan is to use PTA, which disambiguates this
        program just fine.  PTA still needs some work before we can
        enable it by default.  I'm also wondering if we could change the
        may-alias between this and UV2150 to a must-alias, which would
        completely free this program from aliasing problems:

        Alias information for complex addone(const complex&): 1 sets
        
        Alias set #0:
          Tag: *this, may aliases: *this, .GLOBAL_VAR, call clobbered
          Aliases objects: { <UV2150> *this }
        

Diego.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [tree-ssa] copy propagation and the abstraction penalty
  2003-05-15 14:10 ` Diego Novillo
@ 2003-05-15 17:09   ` Joe Buck
  2003-05-15 17:28     ` Joe Buck
  2003-05-15 18:01     ` Daniel Berlin
  2003-05-15 22:20   ` Richard Henderson
  1 sibling, 2 replies; 9+ messages in thread
From: Joe Buck @ 2003-05-15 17:09 UTC (permalink / raw)
  To: Diego Novillo; +Cc: gcc

On Thu, May 15, 2003 at 10:10:22AM -0400, Diego Novillo wrote:

> There's a few bits still missing, but we should be getting there.  In
> this particular testcase I see:
> 
>       * Structures.  We only build enough SSA links to prove liveness. 
>         Two alternatives to dealing with this is using ESSA from the
>         SSAPRE engine and/or rely on a scalarization pass.

Since this is the absolutely most common defect in our current C++
compiler, it seems worth trying to tune optimization to solve it directly,
rather than trying to solve a harder, more general problem.

If copy propagation can store the content of individual structure fields,
then it seems that you don't need the rest of scalarization.

>       * Aliasing.  Our default type-based aliasing mechanism gets all
>         tangled up (it's really naive):

I think that the aliasing problem can be finessed.

Again, generalized copy propagation would appear to solve this without
solving any aliasing problem: replace pointers to known addresses by the
addresses.  Even stupid aliasing, that kills everything on the first
pointer write, still can win if this is done.

Let me step through this to show how it might work.

;; Function complex addone(const complex&) (_Z6addoneRK7complex)

complex addone(const complex&) (arg)
{
  struct complex retval.12;
  struct complex <UVa150>;
  struct complex * T.8;
  struct complex * T.9;
  struct complex & T.10;
  struct complex T.11;

  {
    T.8_2 = &<UVa150>;
    {
      double i;
      double r;
      struct complex * const this;

      this_3 = (struct complex * const)T.8_2;

Here this_3 is &<UVa150> (copy prop).

      r_5 = 1.0e+0;
      i_6 = 0.0;
      {
        this->re = 1.0e+0;

This is presumably this_3.  Now we know that UVa150.re is 1.0e+0.
(That is, we track fields of local structs, as part of copy propagation).

        this->im = 0.0;

Now we know that Uva150.im is 0.0.  We do need to care about aliasing, but
that means that we might have invalidate the value of these fields if we
get a write through a pointer that might alias UVa150.  This program has
no pointer writes at all, though.

I think that even a very simple-minded approach that discards all stored
structure field values on the first indirect write would still show huge
gains, because these temporary structs are often live for an extremely
short distance (they just pass information across an inlined call).

      }
    };
    T.9_9 = &<UVa150>;
    T.10_10 = (struct complex &)T.9_9;

T.10_10 is &<Uva150>.

    {
      struct complex & b;
      struct complex <UVa770>;

      b_11 = T.10_10;

b_11 is &<Uva150>.

      {
        struct complex * T.1;
        double T.2;
        double T.3;
        double T.4;
        double T.5;
        double T.6;
        double T.7;

        {
          T.1_13 = &<UVa770>;
          T.2_15 = arg->re;
          T.3_16 = b->re;

T.3_16 is (&<UVa150>)->re, or UVa150.re, or 1.0.

          T.4_17 = T.2_15 + T.3_16;
          T.5_18 = arg->im;

T.5_18 is (&<UVa150>)->im, or UVa150.im, or 1.0.
After doing this substitution, UVa150 is dead, and we can kill it.

          T.6_19 = b->im;
          T.7_20 = T.5_18 + T.6_19;
          {
            double i;
            double r;
            struct complex * const this;

            this_21 = (struct complex * const)T.1_13;
            r_23 = T.4_17;
            i_24 = T.7_20;
            {
              this->re = T.4_17;
              this->im = T.7_20;
            }
          };
        }
      };
      retval.12_27 = <UVa770>
    };
    T.11_28 = retval.12_27;
    return retval.12_27;
  }
}

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [tree-ssa] copy propagation and the abstraction penalty
  2003-05-15 17:09   ` Joe Buck
@ 2003-05-15 17:28     ` Joe Buck
  2003-05-15 18:01     ` Daniel Berlin
  1 sibling, 0 replies; 9+ messages in thread
From: Joe Buck @ 2003-05-15 17:28 UTC (permalink / raw)
  To: Diego Novillo; +Cc: gcc

I wrote:
>         this->im = 0.0;
> 
> Now we know that Uva150.im is 0.0.  We do need to care about aliasing, but
> that means that we might have invalidate the value of these fields if we
> get a write through a pointer that might alias UVa150.  This program has
> no pointer writes at all, though.

This may puzzle some folks: isn't "this->im = 0" a pointer write?  It is
not, because we know the exact value of "this", so it is a write to a
known structure field.  Aliasing isn't an issue when we know exactly where
the pointer points.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [tree-ssa] copy propagation and the abstraction penalty
  2003-05-15 17:09   ` Joe Buck
  2003-05-15 17:28     ` Joe Buck
@ 2003-05-15 18:01     ` Daniel Berlin
  1 sibling, 0 replies; 9+ messages in thread
From: Daniel Berlin @ 2003-05-15 18:01 UTC (permalink / raw)
  To: Joe Buck; +Cc: Diego Novillo, gcc


On Thursday, May 15, 2003, at 01:09  PM, Joe Buck wrote:

> On Thu, May 15, 2003 at 10:10:22AM -0400, Diego Novillo wrote:
>
>> There's a few bits still missing, but we should be getting there.  In
>> this particular testcase I see:
>>
>>       * Structures.  We only build enough SSA links to prove liveness.
>>         Two alternatives to dealing with this is using ESSA from the
>>         SSAPRE engine and/or rely on a scalarization pass.
>
> Since this is the absolutely most common defect in our current C++
> compiler, it seems worth trying to tune optimization to solve it 
> directly,
> rather than trying to solve a harder, more general problem.
>
> If copy propagation can store the content of individual structure 
> fields,
> then it seems that you don't need the rest of scalarization

But the amount of memory necessary to do SSA on component_refs gets 
*really* large unless you share the component_refs for the SSA 
variables where possible.
Diego and I went through this back when we had Factored Use-Def chains.

I really think we *should* have structure accesses renamed, and it's 
easier than doing pointer renaming (since unions are the only thing you 
have to really worry about, and you can calculate the offsets).
It's just something that we had punted on until later (after the great 
FUD chain -> renaming switchover).
I guess now = later.
--Dan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [tree-ssa] copy propagation and the abstraction penalty
  2003-05-15 14:10 ` Diego Novillo
  2003-05-15 17:09   ` Joe Buck
@ 2003-05-15 22:20   ` Richard Henderson
  2003-05-15 22:48     ` Joe Buck
  1 sibling, 1 reply; 9+ messages in thread
From: Richard Henderson @ 2003-05-15 22:20 UTC (permalink / raw)
  To: Diego Novillo; +Cc: Joe Buck, gcc

On Thu, May 15, 2003 at 10:10:22AM -0400, Diego Novillo wrote:
>         I'm also wondering if we could change the
>         may-alias between this and UV2150 to a must-alias, which would
>         completely free this program from aliasing problems:

Indeed.  And for this case I definitely think it's the right thing to do.

IMO constant propagation should be able to take

    T.8_2 = &<UVa150>;
    {
      struct complex * const this;

      this_3 = (struct complex * const)T.8_2;
      {
        this->re = 1.0e+0;

and turn it into

	(&<UVa150>)->re = 1.0e+0

which folds to

	<UVa150>.re = 1.0e+0

At which point we have no aliasing problem, and a subsequent round
of constant propagation ought to be able to send 1.0e+0 to its 
destination.


r~

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [tree-ssa] copy propagation and the abstraction penalty
  2003-05-15 22:20   ` Richard Henderson
@ 2003-05-15 22:48     ` Joe Buck
  2003-05-15 22:51       ` Joe Buck
  0 siblings, 1 reply; 9+ messages in thread
From: Joe Buck @ 2003-05-15 22:48 UTC (permalink / raw)
  To: Richard Henderson, Diego Novillo, gcc

On Thu, May 15, 2003 at 03:17:48PM -0700, Richard Henderson wrote:
> On Thu, May 15, 2003 at 10:10:22AM -0400, Diego Novillo wrote:
> >         I'm also wondering if we could change the
> >         may-alias between this and UV2150 to a must-alias, which would
> >         completely free this program from aliasing problems:
> 
> Indeed.  And for this case I definitely think it's the right thing to do.
> 
> IMO constant propagation should be able to take
> 
>     T.8_2 = &<UVa150>;
>     {
>       struct complex * const this;
> 
>       this_3 = (struct complex * const)T.8_2;
>       {
>         this->re = 1.0e+0;
> 
> and turn it into
> 
> 	(&<UVa150>)->re = 1.0e+0
> 
> which folds to
> 
> 	<UVa150>.re = 1.0e+0
> 
> At which point we have no aliasing problem, and a subsequent round
> of constant propagation ought to be able to send 1.0e+0 to its 
> destination.

That helps this case, but in many other cases the content of the temporary
struct's field will be a variable, and we would still want to copy-propagate.
For example, consider bit vector classes.  Typically the [] operator will
be overloaded to return a "bitref" object, which is a struct that has two
fields: a reference to the bit vector, and a bit offset.  The compiler
should be able to eliminate the temporary object.  We would have

struct bitref;

class bitvec {
public:
    void set_bit(unsigned pos, bool value);
    bool get_bit(unsigned pos) const;
    inline bitref operator[](unsigned pos);
};

struct bitref {
    bitref(bitvec& o, unsigned p) : obj(o), pos(p) {}
    bitvec& obj;
    unsigned pos;
    operator bool() const { return obj.get_bit(pos);}
    void operator=(bool value) { obj.set_bit(pos, value);}
    void operator=(const bitref& src) { obj.set_bit(pos, src);}
};

inline bitref bitvec::operator[](unsigned pos) { return bitref(*this, pos);}

and we want to compile

void assign(bitvec& dest, bitvec& src, unsigned i, unsigned j) {
    dest[i] = src[j];
}

Ideally, we should be able to do copy propagation good enough to turn this
into

    dest.set_bit(i, src.get_bit(j));

which means that we can kill the two generated bitref objects.  Note,
though, that there are no constants.  We have {&dest,i} and {&src,j}.

This kind of thing occurs throughout common C++ codes, and really kills us
on the Boost graph library.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [tree-ssa] copy propagation and the abstraction penalty
  2003-05-15 22:48     ` Joe Buck
@ 2003-05-15 22:51       ` Joe Buck
  0 siblings, 0 replies; 9+ messages in thread
From: Joe Buck @ 2003-05-15 22:51 UTC (permalink / raw)
  To: Richard Henderson, Diego Novillo, gcc

On Thu, May 15, 2003 at 03:48:29PM -0700, Joe Buck wrote:
> On Thu, May 15, 2003 at 03:17:48PM -0700, Richard Henderson wrote:
> > Indeed.  And for this case I definitely think it's the right thing to do.
> > 
> > IMO constant propagation should be able to take
> > 
> >     T.8_2 = &<UVa150>;
> >     {
> >       struct complex * const this;
> > 
> >       this_3 = (struct complex * const)T.8_2;
> >       {
> >         this->re = 1.0e+0;
> > 
> > and turn it into
> > 
> > 	(&<UVa150>)->re = 1.0e+0
> > 
> > which folds to
> > 
> > 	<UVa150>.re = 1.0e+0
> > 
> > At which point we have no aliasing problem, and a subsequent round
> > of constant propagation ought to be able to send 1.0e+0 to its 
> > destination.
> 
> That helps this case, but in many other cases the content of the temporary
> struct's field will be a variable, and we would still want to copy-propagate.

Sigh.  Sorry Richard: of course the addresses are constant; I was focusing
on the 1.0e0 constant.  As a Gilda Radner character used to say, never
mind.


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2003-05-15 22:51 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-05-14 22:45 [tree-ssa] copy propagation and the abstraction penalty Joe Buck
2003-05-15  3:15 ` Andrew Pinski
2003-05-15 14:10 ` Diego Novillo
2003-05-15 17:09   ` Joe Buck
2003-05-15 17:28     ` Joe Buck
2003-05-15 18:01     ` Daniel Berlin
2003-05-15 22:20   ` Richard Henderson
2003-05-15 22:48     ` Joe Buck
2003-05-15 22:51       ` Joe Buck

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).