* [tree-ssa] copy propagation and the abstraction penalty @ 2003-05-14 22:45 Joe Buck 2003-05-15 3:15 ` Andrew Pinski 2003-05-15 14:10 ` Diego Novillo 0 siblings, 2 replies; 9+ messages in thread From: Joe Buck @ 2003-05-14 22:45 UTC (permalink / raw) To: gcc Consider the code -------------------------------- struct complex { double re, im; complex(double r, double i) : re(r), im(i) {} }; inline complex operator+(const complex& a, const complex& b) { return complex(a.re+b.re, a.im+b.im); } complex addone(const complex& arg) { return arg + complex(1,0); } ------------------------------- We get really lousy code for this, in all gcc versions, including tree-ssa. The reason is that we build a temporary struct to hold the 1, 0 and don't get rid of it, so we take no advantage of the zero. Calling this foo.C, foo.C.t09.ssa gives ;; Function complex addone(const complex&) (_Z6addoneRK7complex) complex addone(const complex&) (arg) { struct complex retval.12; struct complex <UVa150>; struct complex * T.8; struct complex * T.9; struct complex & T.10; struct complex T.11; { T.8_2 = &<UVa150>; { double i; double r; struct complex * const this; this_3 = (struct complex * const)T.8_2; r_5 = 1.0e+0; i_6 = 0.0; { this->re = 1.0e+0; this->im = 0.0; { (void)0 } } }; T.9_9 = &<UVa150>; T.10_10 = (struct complex &)T.9_9; { struct complex & b; struct complex <UVa770>; b_11 = T.10_10; { struct complex * T.1; double T.2; double T.3; double T.4; double T.5; double T.6; double T.7; { T.1_13 = &<UVa770>; T.2_15 = arg->re; T.3_16 = b->re; T.4_17 = T.2_15 + T.3_16; T.5_18 = arg->im; T.6_19 = b->im; T.7_20 = T.5_18 + T.6_19; { double i; double r; struct complex * const this; this_21 = (struct complex * const)T.1_13; r_23 = T.4_17; i_24 = T.7_20; { this->re = T.4_17; this->im = T.7_20; { (void)0 } } }; { (void)0; goto <ULa700>; } } }; <ULa700>:;; retval.12_27 = <UVa770> }; T.11_28 = retval.12_27; return retval.12_27; } } ------------------------------------------------------------ It would seem simple enough to eliminate all uses of the temporary struct <UVa150> by doing copy propagation: what we would have left is only the initialization of the struct itself; all uses of the struct would get the 1.0 and 0.0 values from its re and im fields. At this point the temporary struct should be eligible for killing. If we could do this alone, we would greatly improve C++ performance, especially on things like the Boost graph library. It seems that we have most of what we need in place, right? ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [tree-ssa] copy propagation and the abstraction penalty 2003-05-14 22:45 [tree-ssa] copy propagation and the abstraction penalty Joe Buck @ 2003-05-15 3:15 ` Andrew Pinski 2003-05-15 14:10 ` Diego Novillo 1 sibling, 0 replies; 9+ messages in thread From: Andrew Pinski @ 2003-05-15 3:15 UTC (permalink / raw) To: Joe Buck; +Cc: Andrew Pinski, gcc Also it will fix PR9566, which is the same problem. It will also speed up mostly likely any C++ problem ever made. Thanks, Andrew Pinski On Wednesday, May 14, 2003, at 18:44 US/Eastern, Joe Buck wrote: > Consider the code > > -------------------------------- > struct complex { > double re, im; > complex(double r, double i) : re(r), im(i) {} > }; > > inline complex operator+(const complex& a, const complex& b) { > return complex(a.re+b.re, a.im+b.im); > } > > complex addone(const complex& arg) { > return arg + complex(1,0); > } > ------------------------------- > > We get really lousy code for this, in all gcc versions, including > tree-ssa. > The reason is that we build a temporary struct to hold the 1, 0 and > don't > get rid of it, so we take no advantage of the zero. > > Calling this foo.C, foo.C.t09.ssa gives > > > ;; Function complex addone(const complex&) (_Z6addoneRK7complex) > > complex addone(const complex&) (arg) > { > struct complex retval.12; > struct complex <UVa150>; > struct complex * T.8; > struct complex * T.9; > struct complex & T.10; > struct complex T.11; > > { > T.8_2 = &<UVa150>; > { > double i; > double r; > struct complex * const this; > > this_3 = (struct complex * const)T.8_2; > r_5 = 1.0e+0; > i_6 = 0.0; > { > this->re = 1.0e+0; > this->im = 0.0; > { > (void)0 > } > } > }; > T.9_9 = &<UVa150>; > T.10_10 = (struct complex &)T.9_9; > { > struct complex & b; > struct complex <UVa770>; > > b_11 = T.10_10; > { > struct complex * T.1; > double T.2; > double T.3; > double T.4; > double T.5; > double T.6; > double T.7; > > { > T.1_13 = &<UVa770>; > T.2_15 = arg->re; > T.3_16 = b->re; > T.4_17 = T.2_15 + T.3_16; > T.5_18 = arg->im; > T.6_19 = b->im; > T.7_20 = T.5_18 + T.6_19; > { > double i; > double r; > struct complex * const this; > > this_21 = (struct complex * const)T.1_13; > r_23 = T.4_17; > i_24 = T.7_20; > { > this->re = T.4_17; > this->im = T.7_20; > { > (void)0 > } > } > }; > { > (void)0; > goto <ULa700>; > } > } > }; > <ULa700>:;; > retval.12_27 = <UVa770> > }; > T.11_28 = retval.12_27; > return retval.12_27; > } > } > > ------------------------------------------------------------ > > It would seem simple enough to eliminate all uses of the temporary > struct > <UVa150> by doing copy propagation: what we would have left is only the > initialization of the struct itself; all uses of the struct would get > the > 1.0 and 0.0 values from its re and im fields. At this point the > temporary > struct should be eligible for killing. > > If we could do this alone, we would greatly improve C++ performance, > especially > on things like the Boost graph library. It seems that we have most of > what we > need in place, right? > > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [tree-ssa] copy propagation and the abstraction penalty 2003-05-14 22:45 [tree-ssa] copy propagation and the abstraction penalty Joe Buck 2003-05-15 3:15 ` Andrew Pinski @ 2003-05-15 14:10 ` Diego Novillo 2003-05-15 17:09 ` Joe Buck 2003-05-15 22:20 ` Richard Henderson 1 sibling, 2 replies; 9+ messages in thread From: Diego Novillo @ 2003-05-15 14:10 UTC (permalink / raw) To: Joe Buck; +Cc: gcc On Wed, 2003-05-14 at 18:44, Joe Buck wrote: > If we could do this alone, we would greatly improve C++ performance, especially > on things like the Boost graph library. It seems that we have most of what we > need in place, right? > There's a few bits still missing, but we should be getting there. In this particular testcase I see: * Structures. We only build enough SSA links to prove liveness. Two alternatives to dealing with this is using ESSA from the SSAPRE engine and/or rely on a scalarization pass. * Aliasing. Our default type-based aliasing mechanism gets all tangled up (it's really naive): Alias information for complex addone(const complex&): 1 sets Alias set #0: Tag: *this, may aliases: *this, may alias global memory, call clobbered Aliases objects: { <UV2150> *this <UV2770> *arg *b *this } * The long term plan is to use PTA, which disambiguates this program just fine. PTA still needs some work before we can enable it by default. I'm also wondering if we could change the may-alias between this and UV2150 to a must-alias, which would completely free this program from aliasing problems: Alias information for complex addone(const complex&): 1 sets Alias set #0: Tag: *this, may aliases: *this, .GLOBAL_VAR, call clobbered Aliases objects: { <UV2150> *this } Diego. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [tree-ssa] copy propagation and the abstraction penalty 2003-05-15 14:10 ` Diego Novillo @ 2003-05-15 17:09 ` Joe Buck 2003-05-15 17:28 ` Joe Buck 2003-05-15 18:01 ` Daniel Berlin 2003-05-15 22:20 ` Richard Henderson 1 sibling, 2 replies; 9+ messages in thread From: Joe Buck @ 2003-05-15 17:09 UTC (permalink / raw) To: Diego Novillo; +Cc: gcc On Thu, May 15, 2003 at 10:10:22AM -0400, Diego Novillo wrote: > There's a few bits still missing, but we should be getting there. In > this particular testcase I see: > > * Structures. We only build enough SSA links to prove liveness. > Two alternatives to dealing with this is using ESSA from the > SSAPRE engine and/or rely on a scalarization pass. Since this is the absolutely most common defect in our current C++ compiler, it seems worth trying to tune optimization to solve it directly, rather than trying to solve a harder, more general problem. If copy propagation can store the content of individual structure fields, then it seems that you don't need the rest of scalarization. > * Aliasing. Our default type-based aliasing mechanism gets all > tangled up (it's really naive): I think that the aliasing problem can be finessed. Again, generalized copy propagation would appear to solve this without solving any aliasing problem: replace pointers to known addresses by the addresses. Even stupid aliasing, that kills everything on the first pointer write, still can win if this is done. Let me step through this to show how it might work. ;; Function complex addone(const complex&) (_Z6addoneRK7complex) complex addone(const complex&) (arg) { struct complex retval.12; struct complex <UVa150>; struct complex * T.8; struct complex * T.9; struct complex & T.10; struct complex T.11; { T.8_2 = &<UVa150>; { double i; double r; struct complex * const this; this_3 = (struct complex * const)T.8_2; Here this_3 is &<UVa150> (copy prop). r_5 = 1.0e+0; i_6 = 0.0; { this->re = 1.0e+0; This is presumably this_3. Now we know that UVa150.re is 1.0e+0. (That is, we track fields of local structs, as part of copy propagation). this->im = 0.0; Now we know that Uva150.im is 0.0. We do need to care about aliasing, but that means that we might have invalidate the value of these fields if we get a write through a pointer that might alias UVa150. This program has no pointer writes at all, though. I think that even a very simple-minded approach that discards all stored structure field values on the first indirect write would still show huge gains, because these temporary structs are often live for an extremely short distance (they just pass information across an inlined call). } }; T.9_9 = &<UVa150>; T.10_10 = (struct complex &)T.9_9; T.10_10 is &<Uva150>. { struct complex & b; struct complex <UVa770>; b_11 = T.10_10; b_11 is &<Uva150>. { struct complex * T.1; double T.2; double T.3; double T.4; double T.5; double T.6; double T.7; { T.1_13 = &<UVa770>; T.2_15 = arg->re; T.3_16 = b->re; T.3_16 is (&<UVa150>)->re, or UVa150.re, or 1.0. T.4_17 = T.2_15 + T.3_16; T.5_18 = arg->im; T.5_18 is (&<UVa150>)->im, or UVa150.im, or 1.0. After doing this substitution, UVa150 is dead, and we can kill it. T.6_19 = b->im; T.7_20 = T.5_18 + T.6_19; { double i; double r; struct complex * const this; this_21 = (struct complex * const)T.1_13; r_23 = T.4_17; i_24 = T.7_20; { this->re = T.4_17; this->im = T.7_20; } }; } }; retval.12_27 = <UVa770> }; T.11_28 = retval.12_27; return retval.12_27; } } ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [tree-ssa] copy propagation and the abstraction penalty 2003-05-15 17:09 ` Joe Buck @ 2003-05-15 17:28 ` Joe Buck 2003-05-15 18:01 ` Daniel Berlin 1 sibling, 0 replies; 9+ messages in thread From: Joe Buck @ 2003-05-15 17:28 UTC (permalink / raw) To: Diego Novillo; +Cc: gcc I wrote: > this->im = 0.0; > > Now we know that Uva150.im is 0.0. We do need to care about aliasing, but > that means that we might have invalidate the value of these fields if we > get a write through a pointer that might alias UVa150. This program has > no pointer writes at all, though. This may puzzle some folks: isn't "this->im = 0" a pointer write? It is not, because we know the exact value of "this", so it is a write to a known structure field. Aliasing isn't an issue when we know exactly where the pointer points. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [tree-ssa] copy propagation and the abstraction penalty 2003-05-15 17:09 ` Joe Buck 2003-05-15 17:28 ` Joe Buck @ 2003-05-15 18:01 ` Daniel Berlin 1 sibling, 0 replies; 9+ messages in thread From: Daniel Berlin @ 2003-05-15 18:01 UTC (permalink / raw) To: Joe Buck; +Cc: Diego Novillo, gcc On Thursday, May 15, 2003, at 01:09 PM, Joe Buck wrote: > On Thu, May 15, 2003 at 10:10:22AM -0400, Diego Novillo wrote: > >> There's a few bits still missing, but we should be getting there. In >> this particular testcase I see: >> >> * Structures. We only build enough SSA links to prove liveness. >> Two alternatives to dealing with this is using ESSA from the >> SSAPRE engine and/or rely on a scalarization pass. > > Since this is the absolutely most common defect in our current C++ > compiler, it seems worth trying to tune optimization to solve it > directly, > rather than trying to solve a harder, more general problem. > > If copy propagation can store the content of individual structure > fields, > then it seems that you don't need the rest of scalarization But the amount of memory necessary to do SSA on component_refs gets *really* large unless you share the component_refs for the SSA variables where possible. Diego and I went through this back when we had Factored Use-Def chains. I really think we *should* have structure accesses renamed, and it's easier than doing pointer renaming (since unions are the only thing you have to really worry about, and you can calculate the offsets). It's just something that we had punted on until later (after the great FUD chain -> renaming switchover). I guess now = later. --Dan ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [tree-ssa] copy propagation and the abstraction penalty 2003-05-15 14:10 ` Diego Novillo 2003-05-15 17:09 ` Joe Buck @ 2003-05-15 22:20 ` Richard Henderson 2003-05-15 22:48 ` Joe Buck 1 sibling, 1 reply; 9+ messages in thread From: Richard Henderson @ 2003-05-15 22:20 UTC (permalink / raw) To: Diego Novillo; +Cc: Joe Buck, gcc On Thu, May 15, 2003 at 10:10:22AM -0400, Diego Novillo wrote: > I'm also wondering if we could change the > may-alias between this and UV2150 to a must-alias, which would > completely free this program from aliasing problems: Indeed. And for this case I definitely think it's the right thing to do. IMO constant propagation should be able to take T.8_2 = &<UVa150>; { struct complex * const this; this_3 = (struct complex * const)T.8_2; { this->re = 1.0e+0; and turn it into (&<UVa150>)->re = 1.0e+0 which folds to <UVa150>.re = 1.0e+0 At which point we have no aliasing problem, and a subsequent round of constant propagation ought to be able to send 1.0e+0 to its destination. r~ ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [tree-ssa] copy propagation and the abstraction penalty 2003-05-15 22:20 ` Richard Henderson @ 2003-05-15 22:48 ` Joe Buck 2003-05-15 22:51 ` Joe Buck 0 siblings, 1 reply; 9+ messages in thread From: Joe Buck @ 2003-05-15 22:48 UTC (permalink / raw) To: Richard Henderson, Diego Novillo, gcc On Thu, May 15, 2003 at 03:17:48PM -0700, Richard Henderson wrote: > On Thu, May 15, 2003 at 10:10:22AM -0400, Diego Novillo wrote: > > I'm also wondering if we could change the > > may-alias between this and UV2150 to a must-alias, which would > > completely free this program from aliasing problems: > > Indeed. And for this case I definitely think it's the right thing to do. > > IMO constant propagation should be able to take > > T.8_2 = &<UVa150>; > { > struct complex * const this; > > this_3 = (struct complex * const)T.8_2; > { > this->re = 1.0e+0; > > and turn it into > > (&<UVa150>)->re = 1.0e+0 > > which folds to > > <UVa150>.re = 1.0e+0 > > At which point we have no aliasing problem, and a subsequent round > of constant propagation ought to be able to send 1.0e+0 to its > destination. That helps this case, but in many other cases the content of the temporary struct's field will be a variable, and we would still want to copy-propagate. For example, consider bit vector classes. Typically the [] operator will be overloaded to return a "bitref" object, which is a struct that has two fields: a reference to the bit vector, and a bit offset. The compiler should be able to eliminate the temporary object. We would have struct bitref; class bitvec { public: void set_bit(unsigned pos, bool value); bool get_bit(unsigned pos) const; inline bitref operator[](unsigned pos); }; struct bitref { bitref(bitvec& o, unsigned p) : obj(o), pos(p) {} bitvec& obj; unsigned pos; operator bool() const { return obj.get_bit(pos);} void operator=(bool value) { obj.set_bit(pos, value);} void operator=(const bitref& src) { obj.set_bit(pos, src);} }; inline bitref bitvec::operator[](unsigned pos) { return bitref(*this, pos);} and we want to compile void assign(bitvec& dest, bitvec& src, unsigned i, unsigned j) { dest[i] = src[j]; } Ideally, we should be able to do copy propagation good enough to turn this into dest.set_bit(i, src.get_bit(j)); which means that we can kill the two generated bitref objects. Note, though, that there are no constants. We have {&dest,i} and {&src,j}. This kind of thing occurs throughout common C++ codes, and really kills us on the Boost graph library. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [tree-ssa] copy propagation and the abstraction penalty 2003-05-15 22:48 ` Joe Buck @ 2003-05-15 22:51 ` Joe Buck 0 siblings, 0 replies; 9+ messages in thread From: Joe Buck @ 2003-05-15 22:51 UTC (permalink / raw) To: Richard Henderson, Diego Novillo, gcc On Thu, May 15, 2003 at 03:48:29PM -0700, Joe Buck wrote: > On Thu, May 15, 2003 at 03:17:48PM -0700, Richard Henderson wrote: > > Indeed. And for this case I definitely think it's the right thing to do. > > > > IMO constant propagation should be able to take > > > > T.8_2 = &<UVa150>; > > { > > struct complex * const this; > > > > this_3 = (struct complex * const)T.8_2; > > { > > this->re = 1.0e+0; > > > > and turn it into > > > > (&<UVa150>)->re = 1.0e+0 > > > > which folds to > > > > <UVa150>.re = 1.0e+0 > > > > At which point we have no aliasing problem, and a subsequent round > > of constant propagation ought to be able to send 1.0e+0 to its > > destination. > > That helps this case, but in many other cases the content of the temporary > struct's field will be a variable, and we would still want to copy-propagate. Sigh. Sorry Richard: of course the addresses are constant; I was focusing on the 1.0e0 constant. As a Gilda Radner character used to say, never mind. ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2003-05-15 22:51 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2003-05-14 22:45 [tree-ssa] copy propagation and the abstraction penalty Joe Buck 2003-05-15 3:15 ` Andrew Pinski 2003-05-15 14:10 ` Diego Novillo 2003-05-15 17:09 ` Joe Buck 2003-05-15 17:28 ` Joe Buck 2003-05-15 18:01 ` Daniel Berlin 2003-05-15 22:20 ` Richard Henderson 2003-05-15 22:48 ` Joe Buck 2003-05-15 22:51 ` Joe Buck
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).