* [tree-ssa] alias analysis @ 2003-02-11 17:29 law 2003-02-11 18:44 ` Diego Novillo 0 siblings, 1 reply; 23+ messages in thread From: law @ 2003-02-11 17:29 UTC (permalink / raw) To: dnovillo; +Cc: gcc I've finally wrapped my head around how your scheme speeds up alias analysis and it's not going to be trivial to merge your ideas with mine for separating loads and stores, though it may be possible. Your scheme depends on having every object which has the same alias set being represented by a single tag. Then you just need to check things against that representative tag. In my scheme for separating loads and stores, we can have different tags for stores to addressable objects with the same underlying tag. ie int x = 5; int y = 10; foo (&x, &y); We'd have distinct tags for x & y, even though they have the same alias set. This is important when we check aliased loads -- we have to look through *all* the tags rather than quitting when we find a alias. The number of tags is proportional to the number of stores to unique addressable variables and the number of pointer stores using unique alias sets. FWIW, this scheme completes my components of cc1 test in 698 seconds. Note I haven't really tested this scheme thoroughly, so it may have bugs which could affect the timing. Another approach is to go ahead and assume that X & Y may alias each other so that they're globbed into a single alias set. This scheme will lead to less accurate information, but does play reasonably well with your approach. This scheme completes my components of cc1 test in 701 seconds. This scheme has been beat on pretty good. Compare that to the current state of the branch which clocks in at 708 seconds and produces less accurate information than either of the approaches mentioned above. I haven't compared the two schemes to see in general how they perform in terms of disambiguating memory references, but they both manage to disambiguate all the memory references in 20001226-1.c. Jeff ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tree-ssa] alias analysis 2003-02-11 17:29 [tree-ssa] alias analysis law @ 2003-02-11 18:44 ` Diego Novillo 2003-02-11 18:51 ` law 0 siblings, 1 reply; 23+ messages in thread From: Diego Novillo @ 2003-02-11 18:44 UTC (permalink / raw) To: law; +Cc: gcc On Tue, 11 Feb 2003, Jeff Law wrote: > Your scheme depends on having every object which has the same alias set > being represented by a single tag. Then you just need to check things > against that representative tag. > Yes. > In my scheme for separating loads and stores, we can have different > tags for stores to addressable objects with the same underlying tag. > > ie > > int x = 5; > int y = 10; > > foo (&x, &y); > > > We'd have distinct tags for x & y, even though they have the same alias > set. > Now you lost me. I thought you were separating load from store aliases? Aren't x and y store aliases here? Or is it load aliases? I think I need a step-by-step description again :) > This is important when we check aliased loads -- we have to look through > *all* the tags rather than quitting when we find a alias. The number of > tags is proportional to the number of stores to unique addressable variables > and the number of pointer stores using unique alias sets. > Maybe we could add an attribute to each alias set to determine if it's a load or a store alias? Diego. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tree-ssa] alias analysis 2003-02-11 18:44 ` Diego Novillo @ 2003-02-11 18:51 ` law 2003-02-11 20:35 ` Diego Novillo 0 siblings, 1 reply; 23+ messages in thread From: law @ 2003-02-11 18:51 UTC (permalink / raw) To: Diego Novillo; +Cc: gcc In message <20030211183344.GA5611@tornado.toronto.redhat.com>, Diego Novillo wr ites: >On Tue, 11 Feb 2003, Jeff Law wrote: > >> Your scheme depends on having every object which has the same alias set >> being represented by a single tag. Then you just need to check things >> against that representative tag. >> >Yes. > >> In my scheme for separating loads and stores, we can have different >> tags for stores to addressable objects with the same underlying tag. >> >> ie >> >> int x = 5; >> int y = 10; >> >> foo (&x, &y); >> >> >> We'd have distinct tags for x & y, even though they have the same alias >> set. >> >Now you lost me. I thought you were separating load from store >aliases? Aren't x and y store aliases here? Or is it load >aliases? I think I need a step-by-step description again :) They are stores, but they are stores to distinct VAR_DECLs. There's no way they can actually alias each other as the underlying objects will always be distinct. Ie, there is no way that x = foo will every modify y. So while they have the same alias set number, they do not alias and we can keep their tags distinct. The only time we have to join them into a superset is if we find something like *p = blah. Jeff ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tree-ssa] alias analysis 2003-02-11 18:51 ` law @ 2003-02-11 20:35 ` Diego Novillo 2003-02-11 20:35 ` Diego Novillo ` (3 more replies) 0 siblings, 4 replies; 23+ messages in thread From: Diego Novillo @ 2003-02-11 20:35 UTC (permalink / raw) To: law; +Cc: gcc On Tue, 11 Feb 2003, Jeff Law wrote: > >> int x = 5; > >> int y = 10; > >> > >> foo (&x, &y); > >> > >> > >> We'd have distinct tags for x & y, even though they have the same alias > >> set. > >> > >Now you lost me. I thought you were separating load from store > >aliases? Aren't x and y store aliases here? Or is it load > >aliases? I think I need a step-by-step description again :) > They are stores, but they are stores to distinct VAR_DECLs. There's no > way they can actually alias each other as the underlying objects will > always be distinct. Ie, there is no way that x = foo will every modify > y. > Oh, I see. Do you want to be able to rewrite X and Y in this case? The problem is that since X and Y are aliased, regardless of what their alias is, SSA will refuse to rewrite them. What we will re-write are the VDEFs to their alias tag. In this case we would want to split alias sets whose members cannot alias each other. But this could quickly take us back to quadratic behaviour. Say we have this initial alias set for X and Y: *p: { x y } What you can do is traverse this alias set and make X and Y alias only those members of the set that can actually point to them. That is, any other INDIRECT_REFs in the set and themselves. In the case above, X and Y only alias themselves and *P. I know it's a bit absurd to think that an addressable aliases itself, but it's a tautology and happens to fit in the existing framework :) At this point we can introduce the asymmetry of loads and stores. For each variable V, we distinguish between V's store-aliases and load-aliases: (1) We create a VDEF for every store-alias of V everytime V is assigned a new value. (2) We create a VUSE for every load-alias of V everytime V is used. In this case we will have: x: store-alias: *p, load-alias: x y: store-alias: *p, load-alias: y *p: store-alias: x, y, load-alias: x, y Normal Form SSA Form { { p = &x or &y p = &x or &y ... ... # (*p)_2 = VDEF<(*p)_1> x = 5; x_1 = 5; # (*p)_3 = VDEF<(*p)_2> y = 3; y_2 = 3; # x_2 = VDEF<x_1> # y_3 = VDEF<y_2> *p = *p + 8; (*p)_5 = (*p)_3 + 8; ... = x + y; ... = x_2 + y_3; } } Is this something along the lines you had in mind. Apologies if it makes little sense. I'm on my way out and I'll probably have little connectivity until next week. I'll think about it some more in the meantime. Diego. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tree-ssa] alias analysis 2003-02-11 20:35 ` Diego Novillo @ 2003-02-11 20:35 ` Diego Novillo 2003-02-11 22:39 ` [tree-ssa]: Always compute sets (Was Re: [tree-ssa] alias analysis) Daniel Berlin ` (2 subsequent siblings) 3 siblings, 0 replies; 23+ messages in thread From: Diego Novillo @ 2003-02-11 20:35 UTC (permalink / raw) To: law; +Cc: gcc On Tue, 11 Feb 2003, Diego Novillo wrote: > On Tue, 11 Feb 2003, Jeff Law wrote: > > > >> int x = 5; > > >> int y = 10; > > >> > > >> foo (&x, &y); > > >> > > >> > > >> We'd have distinct tags for x & y, even though they have the same alias > > >> set. > > >> > > >Now you lost me. I thought you were separating load from store > > >aliases? Aren't x and y store aliases here? Or is it load > > >aliases? I think I need a step-by-step description again :) > > They are stores, but they are stores to distinct VAR_DECLs. There's no > > way they can actually alias each other as the underlying objects will > > always be distinct. Ie, there is no way that x = foo will every modify > > y. > > > Oh, I see. Do you want to be able to rewrite X and Y in this > case? The problem is that since X and Y are aliased, regardless > of what their alias is, SSA will refuse to rewrite them. What we > will re-write are the VDEFs to their alias tag. > > In this case we would want to split alias sets whose members > cannot alias each other. But this could quickly take us back to > quadratic behaviour. Say we have this initial alias set for X > and Y: > > *p: { x y } > > What you can do is traverse this alias set and make X and Y alias > only those members of the set that can actually point to them. > That is, any other INDIRECT_REFs in the set and themselves. > > In the case above, X and Y only alias themselves and *P. I know > it's a bit absurd to think that an addressable aliases itself, > but it's a tautology and happens to fit in the existing framework :) > > At this point we can introduce the asymmetry of loads and stores. > For each variable V, we distinguish between V's store-aliases and > load-aliases: > > (1) We create a VDEF for every store-alias of V everytime V is > assigned a new value. > > (2) We create a VUSE for every load-alias of V everytime V is > used. > > In this case we will have: > > x: store-alias: *p, load-alias: x > y: store-alias: *p, load-alias: y > *p: store-alias: x, y, load-alias: x, y > > Normal Form SSA Form > > { { > p = &x or &y p = &x or &y > ... ... > > # (*p)_2 = VDEF<(*p)_1> > x = 5; x_1 = 5; > > # (*p)_3 = VDEF<(*p)_2> > y = 3; y_2 = 3; > > # x_2 = VDEF<x_1> > # y_3 = VDEF<y_2> > *p = *p + 8; (*p)_5 = (*p)_3 + 8; > ... = x + y; ... = x_2 + y_3; > } } > > > Is this something along the lines you had in mind. Apologies if > it makes little sense. I'm on my way out and I'll probably have > little connectivity until next week. I'll think about it some > more in the meantime. > One thing I forgot. This has the potential of being a memory and performance pig. These alias sets tend to have many variables clustered together. Diego. ^ permalink raw reply [flat|nested] 23+ messages in thread
* [tree-ssa]: Always compute sets (Was Re: [tree-ssa] alias analysis) 2003-02-11 20:35 ` Diego Novillo 2003-02-11 20:35 ` Diego Novillo @ 2003-02-11 22:39 ` Daniel Berlin 2003-02-12 4:37 ` [tree-ssa] alias analysis law 2003-02-12 6:30 ` law 3 siblings, 0 replies; 23+ messages in thread From: Daniel Berlin @ 2003-02-11 22:39 UTC (permalink / raw) To: Diego Novillo; +Cc: law, gcc > > > Is this something along the lines you had in mind. Apologies if > it makes little sense. I'm on my way out and I'll probably have > little connectivity until next week. I'll think about it some > more in the meantime. > FYI: I'm going to remove the specific casing for PTA (since it's broken right now, the find_may_aliases_for routine it calls doesn't generate aliases right), and let it just be a disambiguator. IE it'll just always call compute_alias_sets in tree-dfa.c It seems to do okay on tests i have in terms of alias sets not globbing together unrelated things that PTA says are non-aliasing. And the statics-only interprocedural mode does get rid of all the global var and parameter passing related aliasing sets when approriate, it seems. IE in static int bob (int **b1, int *c1, int *d1) { *b1 = c1; } int main(void) { int *a; int b2; int c2; bob (&a, &b2, &c2); printf ("%d\n", *a); } With -ftree-points-to=andersen -fip, we come up with: Alias information for main: 2 sets Alias set #0: Tag: *.GLOBAL_VAR, may aliases: *.GLOBAL_VAR, may alias global memory Aliases: { *.GLOBAL_VAR } Alias set #1: Tag: *a, may aliases: *a Aliases: { c2 *a } which is perfect information. And without -fip, for: int main(void) { int *a; int *b; int c; c = 5; b = &c; a = b; *a = *b; } we come up with: Without Andersen: Alias information for main: 1 sets Alias set #0: Tag: *a, may aliases: *a Aliases: { c *a *b } With Andersen: Alias information for main: 2 sets Alias set #0: Tag: *a, may aliases: *a Aliases: { c *a } Alias set #1: Tag: *b, may aliases: *b Aliases: { *b } (we never ask whether b can alias c, presumably because b is never dereferenced) Patch i'll commit attach. --Dan 2003-02-12 Daniel Berlin <dberlin@dberlin.org> * tree-dfa.c (find_may_aliases_for): Remove (compute_may_aliases): Always all compute_alias_sets. Index: tree-dfa.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/Attic/tree-dfa.c,v retrieving revision 1.1.4.76 diff -u -3 -p -r1.1.4.76 tree-dfa.c --- tree-dfa.c 10 Feb 2003 12:27:03 -0000 1.1.4.76 +++ tree-dfa.c 11 Feb 2003 22:37:41 -0000 @@ -100,7 +100,6 @@ static void register_alias_set PARAMS ( static void find_alias_for PARAMS ((tree, tree)); static bool may_alias_p PARAMS ((tree, tree, HOST_WIDE_INT, tree, tree, HOST_WIDE_INT)); -static void find_may_aliases_for PARAMS ((int)); static bool may_access_global_mem_p PARAMS ((tree, tree)); static void set_def PARAMS ((tree *, tree)); static void add_use PARAMS ((tree *, tree)); @@ -1449,7 +1448,6 @@ clobber_vars_r (tp, walk_subtrees, data) void compute_may_aliases () { - unsigned long i; static htab_t vars_found; static htab_t indirect_refs_found; static htab_t addressable_vars_found; @@ -1507,13 +1505,7 @@ compute_may_aliases () timevar_pop (TV_TREE_PTA); } - if (flag_tree_points_to == PTA_NONE) - compute_alias_sets (); - else - { - for (i = 0; i < num_indirect_refs; i++) - find_may_aliases_for (i); - } + compute_alias_sets (); num_indirect_refs = 0; indirect_refs = 0; @@ -1870,58 +1862,6 @@ may_alias_p (v1, v1_base, v1_alias_set, return true; } - -/* Find variables that may be aliased by the variable (V1) at - index INDIRECT_REF_INDEX in the INDIRECT_REFS varray. */ - -static void -find_may_aliases_for (indirect_ref_index) - int indirect_ref_index; -{ - unsigned long i; - tree v1 = VARRAY_TREE (indirect_refs, indirect_ref_index); - tree v1_base = VARRAY_TREE (indirect_refs_base, indirect_ref_index); - HOST_WIDE_INT v1_alias_set - = VARRAY_INT (indirect_refs_alias_set, indirect_ref_index); - -#if defined ENABLE_CHECKING - if (TREE_CODE (v1) != INDIRECT_REF) - abort (); -#endif - - /* Note that our aliasing properties are symmetric, so we can - start this loop at INDIRECT_REF_INDEX + 1 to cut down on the - runtime for this routine. */ - for (i = indirect_ref_index + 1; i < num_indirect_refs; i++) - { - tree v2 = VARRAY_TREE (indirect_refs, i); - tree v2_base = VARRAY_TREE (indirect_refs_base, i); - HOST_WIDE_INT v2_alias_set = VARRAY_INT (indirect_refs_alias_set, i); - - if (may_alias_p (v1, v1_base, v1_alias_set, v2, v2_base, v2_alias_set)) - { - add_may_alias (v2, v2_base, v1, v1_base); - add_may_alias (v1, v1_base, v2, v2_base); - } - } - - /* Now check if V1 may alias any of the addressable variables. */ - for (i = 0; i < num_addressable_vars; i++) - { - tree v2 = VARRAY_TREE (addressable_vars, i); - tree v2_base = VARRAY_TREE (addressable_vars_base, i); - HOST_WIDE_INT v2_alias_set = VARRAY_INT (addressable_vars_alias_set, i); - - if (v1 == v2) - continue; - - if (may_alias_p (v1, v1_base, v1_alias_set, v2, v2_base, v2_alias_set)) - { - add_may_alias (v2, v2_base, v1, v1_base); - add_may_alias (v1, v1_base, v2, v2_base); - } - } -} /* Add ALIAS to the set of variables that may alias VAR. VAR_SYM and ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tree-ssa] alias analysis 2003-02-11 20:35 ` Diego Novillo 2003-02-11 20:35 ` Diego Novillo 2003-02-11 22:39 ` [tree-ssa]: Always compute sets (Was Re: [tree-ssa] alias analysis) Daniel Berlin @ 2003-02-12 4:37 ` law 2003-02-12 7:33 ` Daniel Berlin 2003-02-13 15:32 ` Diego Novillo 2003-02-12 6:30 ` law 3 siblings, 2 replies; 23+ messages in thread From: law @ 2003-02-12 4:37 UTC (permalink / raw) To: Diego Novillo; +Cc: gcc In message <20030211203337.GA6202@tornado.toronto.redhat.com>, Diego Novillo wr ites: >Oh, I see. Do you want to be able to rewrite X and Y in this >case? The problem is that since X and Y are aliased, regardless >of what their alias is, SSA will refuse to rewrite them. What we >will re-write are the VDEFs to their alias tag. Well, in that example we certainly wouldn't want to rewrite them -- they're addressable *and* they would be aliased by the virtual store at the call site. We would want to continue to rewrite their VDEFs the same way we do now. All we're effectively doing is pruning the set of objects that are considered aliasing each other by noting that an alias created by a load-load is totally uninteresting. I don't see how these changes necessarily have to interact with what variables we rewrite. From the SSA builder's standpoint nothing has to change in this case as far as I can tell. >In this case we would want to split alias sets whose members >cannot alias each other. But this could quickly take us back to >quadratic behaviour. Say we have this initial alias set for X >and Y: No more so than the current algorithm. We can make the current algorithm blow up by having a different type for every variable and indirection for each of those types. Suddenly you're in the "oh god this sucks" situation again. If we make the same simplifying assumption you do for the current code, namely that the number of types is significantly smaller than the number of variables & indirections and we make the assumption that most variables are not addressable, then my code will be roughly equivalent in time complexity. Of course that also shows precisely when and how my code would blow up -- if you have code where every variable is a different type with lots of indirections, then it loses. Similarly it would lose if we had an ungodly number of addressable variables and no pointers which aliased them (if we have a pointer which aliases the addressables, then the pointer becomes the tag for the set which includes all the addressables it may alias). The reason my code tends to produce faster overall alias analysis is because we avoid computing any may aliasing information for load-load situations. Also note that my code can be twiddled to degenerate more gracefully by having addressable VAR_DECLs with the same alias set use the same tag. That was slightly slower than keeping them separate. Yes, splitting the sets would work too, but that seems to be avoiding the real issue -- creating too many useless alias relationships to begin with. Jeff ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tree-ssa] alias analysis 2003-02-12 4:37 ` [tree-ssa] alias analysis law @ 2003-02-12 7:33 ` Daniel Berlin 2003-02-13 15:55 ` Diego Novillo 2003-02-13 15:32 ` Diego Novillo 1 sibling, 1 reply; 23+ messages in thread From: Daniel Berlin @ 2003-02-12 7:33 UTC (permalink / raw) To: law; +Cc: Diego Novillo, gcc > > Yes, splitting the sets would work too, but that seems to be avoiding > the real issue -- creating too many useless alias relationships to > begin > with. It's funny you mention creating too many useless alias relationships vs splitting sets. Splitting sets doesn't help make it faster. Turning on PTA makes it split sets a lot more right now (since we can disambiguate so much better). On 20001226-1.c, it takes a little over a second and a half (I can actually make it take well under a second, but it's not a priority for me since i think 1.68 seconds is an okay number here) to compute the points-to results tree PTA : 1.68 ( 1%) usr 0.11 ( 8%) sys 1.87 ( 1%) wall However, because of the extra set splitting that happens, phi insertion time goes through the roof: tree PHI insertion : 50.27 (18%) usr 0.05 ( 3%) sys 52.36 (18%) wall This is with *always* using alias sets, and just using PTA to disambiguate. (In case someone tries this, tree alias analysis time goes through the roof, but this is just a red herring. I've already fixed it in my tree) So be careful with extra set splitting, it'll currently drive your phi insertion time up (whether it should or not is another matter) On a random note, do either of you plan on making calls to may_alias_p from outside of tree-dfa.c anymore? I had moved the freeing of PTA memory to SSA deletion time when Diego was doing alias tags, and using may_alias_p in tree-ssa.c, but if it's not going to be called during SSA building anymore, i'll move the freeing of PTA memory back to where it used to be. > > Jeff > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tree-ssa] alias analysis 2003-02-12 7:33 ` Daniel Berlin @ 2003-02-13 15:55 ` Diego Novillo 2003-02-13 16:00 ` Daniel Berlin 2003-02-13 16:00 ` Daniel Berlin 0 siblings, 2 replies; 23+ messages in thread From: Diego Novillo @ 2003-02-13 15:55 UTC (permalink / raw) To: Daniel Berlin; +Cc: Jeff Law, gcc On Wed, 2003-02-12 at 02:17, Daniel Berlin wrote: > On a random note, do either of you plan on making calls to may_alias_p > from outside of tree-dfa.c anymore? > Yes, of course. I keep wondering why you removed the call to PTA from tree-dfa.c. Ultimately I want to be able to use PTA. The only reason why we have a type-based analyzer was because we don't yet have a fully working PTA one. > I had moved the freeing of PTA memory to SSA deletion time when Diego > was doing alias tags, and using may_alias_p in tree-ssa.c, but if it's > not going to be called during SSA building anymore, i'll move the > freeing of PTA memory back to where it used to be. > Please put it back. I thought I had neatly separated what we do when -ftree-points-to is given and when it's not. I don't know why you removed it. If it's because tree-dfa.c doesn't use it properly, let's fix that. I want to be able to do both. Type-based should be *very* fast and rather stupid. PTA can be a bit slower (not necessarily, only if it wants to be), but more precise. Diego. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tree-ssa] alias analysis 2003-02-13 15:55 ` Diego Novillo @ 2003-02-13 16:00 ` Daniel Berlin 2003-02-13 16:14 ` Diego Novillo 2003-02-13 16:00 ` Daniel Berlin 1 sibling, 1 reply; 23+ messages in thread From: Daniel Berlin @ 2003-02-13 16:00 UTC (permalink / raw) To: Diego Novillo; +Cc: Jeff Law, gcc On Thursday, February 13, 2003, at 10:27 AM, Diego Novillo wrote: > On Wed, 2003-02-12 at 02:17, Daniel Berlin wrote: > >> On a random note, do either of you plan on making calls to may_alias_p >> from outside of tree-dfa.c anymore? >> > Yes, of course. Errr, you removed them all and made may_alias_p static, and told me you wanted to make it private. That's why I asked. This is a completely opposite answer than what you said a few days ago. > I keep wondering why you removed the call to PTA from > tree-dfa.c. I didn't, look again. > Ultimately I want to be able to use PTA. The only reason > why we have a type-based analyzer was because we don't yet have a fully > working PTA one. We shouldn't be having an analyzer that *only* uses PTA or *only* uses TBAA. That's silly. They can work in concert, as they do now. PTA is still used to disambiguate the aliases. >> I had moved the freeing of PTA memory to SSA deletion time when Diego >> was doing alias tags, and using may_alias_p in tree-ssa.c, but if it's >> not going to be called during SSA building anymore, i'll move the >> freeing of PTA memory back to where it used to be. >> > Please put it back. Put what back? I'm not sure you quite understand what I asked above. Before (as in before we had alias leaders, months ago) we had create pta info do may-aliasing delete pta info create ssa optimize delete ssa When we had alias leaders (months ago) we had to do create pta info do may-aliasing create ssa optimize delete ssa delete pta info because you had calls to may_alias_p (which used pta info) from the "create ssa" steps. Now we still have create pta info do-may-aliasing create ssa optimize delete ssa delete pta info but you've removed all the calls to may_alias_p and made it static. If this is how it will stay, we can move back to doing create pta info do may-aliasing delete pta info create ssa optimize delete ssa again. That's all i've asked. > I thought I had neatly separated what we do when > -ftree-points-to is given and when it's not. I don't know why you > removed it. Because it's broken that way (Jeff's changes broke it, AFAICT), and because it's too memory intensive if you create explicit may-alias sets because of global var aliasing. It's not usable that way. I've got figures, but they aren't pretty. It's not a fixable problem. We'll have to create some kind of set to reduce the memory requirements. > If it's because tree-dfa.c doesn't use it properly, let's fix that. > I > want to be able to do both. We still use PTA to disambiguate the sets created now if it's available. We just don't create explicit may aliases with no sets anymore. The quality of the information actually hasn't changed with what we do now, or else i wouldn't have done it. > Type-based should be *very* fast and rather > stupid. PTA can be a bit slower (not necessarily, only if it wants to > be), but more precise. It still is 99.99% as precise as it was before. You seem to think we don't use PTA anymore. This is not the case at all. I've only removed broken code that won't do something we can use anyway, and can't be made to do something we can use. > > Diego. > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tree-ssa] alias analysis 2003-02-13 16:00 ` Daniel Berlin @ 2003-02-13 16:14 ` Diego Novillo 0 siblings, 0 replies; 23+ messages in thread From: Diego Novillo @ 2003-02-13 16:14 UTC (permalink / raw) To: Daniel Berlin; +Cc: Jeff Law, gcc On Thu, 2003-02-13 at 10:58, Daniel Berlin wrote: > > On Thursday, February 13, 2003, at 10:27 AM, Diego Novillo wrote: > > > On Wed, 2003-02-12 at 02:17, Daniel Berlin wrote: > > > >> On a random note, do either of you plan on making calls to may_alias_p > >> from outside of tree-dfa.c anymore? > >> > > Yes, of course. > > Errr, you removed them all and made may_alias_p static, and told me you > wanted to make it private. That's why I asked. > This is a completely opposite answer than what you said a few days ago. > Eh? Oh, right. Sorry, I just realized that I was talking nonsense. I didn't read your message right. Please disregard *everything* I said :) Diego. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tree-ssa] alias analysis 2003-02-13 15:55 ` Diego Novillo 2003-02-13 16:00 ` Daniel Berlin @ 2003-02-13 16:00 ` Daniel Berlin 2003-02-13 16:23 ` Diego Novillo 1 sibling, 1 reply; 23+ messages in thread From: Daniel Berlin @ 2003-02-13 16:00 UTC (permalink / raw) To: Diego Novillo; +Cc: Jeff Law, gcc On Thursday, February 13, 2003, at 10:27 AM, Diego Novillo wrote: > On Wed, 2003-02-12 at 02:17, Daniel Berlin wrote: > >> On a random note, do either of you plan on making calls to may_alias_p >> from outside of tree-dfa.c anymore? >> > Yes, of course. Errr, you removed them all and made may_alias_p static, and told me you wanted to make it private. That's why I asked. This is a completely opposite answer than what you said a few days ago. > I keep wondering why you removed the call to PTA from > tree-dfa.c. I didn't, look again. > Ultimately I want to be able to use PTA. The only reason > why we have a type-based analyzer was because we don't yet have a fully > working PTA one. We shouldn't be having an analyzer that *only* uses PTA or *only* uses TBAA. That's silly. They can work in concert, as they do now. PTA is still used to disambiguate the aliases. >> I had moved the freeing of PTA memory to SSA deletion time when Diego >> was doing alias tags, and using may_alias_p in tree-ssa.c, but if it's >> not going to be called during SSA building anymore, i'll move the >> freeing of PTA memory back to where it used to be. >> > Please put it back. Put what back? I'm not sure you quite understand what I asked above. Before (as in before we had alias leaders, months ago) we had create pta info do may-aliasing delete pta info create ssa optimize delete ssa When we had alias leaders (months ago) we had to do create pta info do may-aliasing create ssa optimize delete ssa delete pta info because you had calls to may_alias_p (which used pta info) from the "create ssa" steps. Now we still have create pta info do-may-aliasing create ssa optimize delete ssa delete pta info but you've removed all the calls to may_alias_p and made it static. If this is how it will stay, we can move back to doing create pta info do may-aliasing delete pta info create ssa optimize delete ssa again. That's all i've asked. > I thought I had neatly separated what we do when > -ftree-points-to is given and when it's not. I don't know why you > removed it. Because it's broken that way (Jeff's changes broke it, AFAICT), and because it's too memory intensive if you create explicit may-alias sets because of global var aliasing. It's not usable that way. I've got figures, but they aren't pretty. It's not a fixable problem. We'll have to create some kind of set to reduce the memory requirements. > If it's because tree-dfa.c doesn't use it properly, let's fix that. > I > want to be able to do both. We still use PTA to disambiguate the sets created now if it's available. We just don't create explicit may aliases with no sets anymore. The quality of the information actually hasn't changed with what we do now, or else i wouldn't have done it. > Type-based should be *very* fast and rather > stupid. PTA can be a bit slower (not necessarily, only if it wants to > be), but more precise. It still is 99.99% as precise as it was before. You seem to think we don't use PTA anymore. This is not the case at all. I've only removed broken code that won't do something we can use anyway, and can't be made to do something we can use. > > Diego. > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tree-ssa] alias analysis 2003-02-13 16:00 ` Daniel Berlin @ 2003-02-13 16:23 ` Diego Novillo 2003-02-13 17:35 ` Daniel Berlin 0 siblings, 1 reply; 23+ messages in thread From: Diego Novillo @ 2003-02-13 16:23 UTC (permalink / raw) To: Daniel Berlin; +Cc: Jeff Law, gcc On Thu, 2003-02-13 at 10:59, Daniel Berlin wrote: > We shouldn't be having an analyzer that *only* uses PTA or *only* uses > TBAA. > That's silly. > They can work in concert, as they do now. > PTA is still used to disambiguate the aliases. > OK, so what I didn't understand is the model we use for PTA then. We don't actively call may_alias_p() from the optimizers. What we do is collect alias information before hand so that the optimizers can call 'may_aliases (var)' and get a list of all the variables that may alias with 'var'. Perhaps that's not a good model? When I suggested making may_alias_p static I had that model in mind. We pre-compute all the alias relationships before building SSA and then the optimizers can traverse 'may_aliases (var)'. If this is not a good model for doing PTA, then let's change it. > > I thought I had neatly separated what we do when > > -ftree-points-to is given and when it's not. I don't know why you > > removed it. > > Because it's broken that way (Jeff's changes broke it, AFAICT), and > because it's too memory intensive if you create explicit may-alias sets > because of global var aliasing. It's not usable that way. I've got > figures, but they aren't pretty. > It's not a fixable problem. We'll have to create some kind of set to > reduce the memory requirements. > OK, this one is different. I'm starting to believe that we should not model call-clobbering with a forced alias to *.GLOBAL_VAR. It seemed like a neat trick, but it creates problems: (1) Since *.GLOBAL_VAR aliases every type, functions that have call-clobbered variables also have a single alias set where everything aliases everything. Too pessimistic. (2) As you point out, *.GLOBAL_VAR blows up together with PTA. I've been thinking that instead of having this silly aliasing model, we should actually insert a VDEF for every variable that may be clobbered at a call site. > You seem to think we don't use PTA anymore. > This is not the case at all. > Right, sorry about this one. This is just me reading every other line of a message. Thoughts? Diego. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tree-ssa] alias analysis 2003-02-13 16:23 ` Diego Novillo @ 2003-02-13 17:35 ` Daniel Berlin 0 siblings, 0 replies; 23+ messages in thread From: Daniel Berlin @ 2003-02-13 17:35 UTC (permalink / raw) To: Diego Novillo; +Cc: Jeff Law, gcc On Thursday, February 13, 2003, at 11:21 AM, Diego Novillo wrote: > On Thu, 2003-02-13 at 10:59, Daniel Berlin wrote: > >> We shouldn't be having an analyzer that *only* uses PTA or *only* uses >> TBAA. >> That's silly. >> They can work in concert, as they do now. >> PTA is still used to disambiguate the aliases. >> > OK, so what I didn't understand is the model we use for PTA then. We > don't actively call may_alias_p() from the optimizers. What we do is > collect alias information before hand so that the optimizers can call > 'may_aliases (var)' and get a list of all the variables that may alias > with 'var'. I've not yet figured out whether this model is going to be usable in terms of speed/space. Tell ya what. Let me commit the SSAPRE stuff, and while i'm waiting for the insertion framework (which it needs for me to continue debugging bootstraps), i'll work on something that uses it so i can tell if it's usable this way. > > Perhaps that's not a good model? When I suggested making may_alias_p > static I had that model in mind. We pre-compute all the alias > relationships before building SSA and then the optimizers can traverse > 'may_aliases (var)'. If this is not a good model for doing PTA, then > let's change it. The thing is that i'm not sure. PTA info is currently cheap to both query (because it caches it properly) and update (because it's flow insensitive, and thus, we can just throw new and changed statements at it regardless of where they actually appear. Of course, doing it this way makes it more conservative, so after some large number of updates we could recompute it fully if it makes sense or something). That said, right now (well, in my tree with the hash table lookups and other overhead removed), in computing alias sets, we ask PTA to disambiguate aliases (in 20001226-1.c) 260 million times. That's right. 260,160,770 times. ( 18.03 35.62 35.62 260160770 0.00 0.00 ptr_may_alias_var) It takes 35 seconds. I can't make ptr_may_alias_var any faster, it already takes less than a millisecond. Quick math shows it takes 1.3691533787422801e-07 seconds per call, which is good. :). This means TBAA, addressable determination, etc, is failing to determine things don't alias 260 million times (since PTA is a last resort call). From a different perspective, this also means if our optimizers are going to make less than 260 million PTA queries, it's a win to have *them* query the PTA info when they want to, even though this means an uncleaner interface. (IE they would walk may_aliases, which wouldn't be PTA disambiguated, and if they see some variable alias they really don't want, they call PTA to try to disambiguate it). Without some optimizers that use the info, it's impossible for me to really even guess which makes sense, since I have no figures on "how often an optimizer runs into an alias it really doesn't want to exist (IE is preventing an optimization) that PTA can disambiguate". From an interface perspective, a PTA query isn't very difficult or unclean or anything like that. Personally, I don't think code that looks like for (i = 0; i < VARRAY_ACTIVE_SIZE (may_aliases (var)); i++) { tree alias = VARRAY_TREE (may_aliases (var), i); if (I think this alias prevents me from doing what i want) { if (!ptr_may_alias_var (ptr, var)) <but it really doesn't> else <it really does> } } is going to be horrifically ugly compared to: for (i = 0; i < VARRAY_ACTIVE_SIZE (may_aliases (var)); i++) { tree alias = VARRAY_TREE (may_aliases (var), i); if (this alias prevents me from doing what i want) { <it really does> } } if it doesn't occur too often. But if it's everywhere that an optimizer needs to do this, ... Trying to draw on guidance from other compilers, i'll say i have yet to see a single one where the PTA info isn't integrated with the other alias set info (IE they effectively have just a single may_aliases array per variable). But those that use PTA info all do some combination of 1. Have alias info backed by a file, so that they aren't keeping it all in memory at once if they don't have to (granularity varies). 2. Use some kind of indexing and bitmaps so that their queries are bit testing and their memory usage is smaller. IE In one case they assign each variable an index, and use a sparse triangular bitmatrix to tell what numbers alias. If they want to determine the entire list of aliases (not often) for a variable, they just walk the bitmatrix row for that index, and use a reverse mapping to map the numbers back into variables. > >>> I thought I had neatly separated what we do when >>> -ftree-points-to is given and when it's not. I don't know why you >>> removed it. >> >> Because it's broken that way (Jeff's changes broke it, AFAICT), and >> because it's too memory intensive if you create explicit may-alias >> sets >> because of global var aliasing. It's not usable that way. I've got >> figures, but they aren't pretty. >> It's not a fixable problem. We'll have to create some kind of set to >> reduce the memory requirements. >> > OK, this one is different. I'm starting to believe that we should not > model call-clobbering with a forced alias to *.GLOBAL_VAR. It seemed > like a neat trick, but it creates problems: > > (1) Since *.GLOBAL_VAR aliases every type, functions that have > call-clobbered variables also have a single alias set where everything > aliases everything. Too pessimistic. > > (2) As you point out, *.GLOBAL_VAR blows up together with PTA. Blows up is an understatement. It's more like a nuclear holocaust. > > I've been thinking that instead of having this silly aliasing model, we > should actually insert a VDEF for every variable that may be clobbered > at a call site. > >> You seem to think we don't use PTA anymore. >> This is not the case at all. >> > Right, sorry about this one. This is just me reading every other line > of a message. > > > Thoughts? Diego. > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tree-ssa] alias analysis 2003-02-12 4:37 ` [tree-ssa] alias analysis law 2003-02-12 7:33 ` Daniel Berlin @ 2003-02-13 15:32 ` Diego Novillo 2003-02-15 0:20 ` law 2003-02-20 1:43 ` law 1 sibling, 2 replies; 23+ messages in thread From: Diego Novillo @ 2003-02-13 15:32 UTC (permalink / raw) To: Jeff Law; +Cc: gcc On Tue, 2003-02-11 at 23:41, law@redhat.com wrote: > In message <20030211203337.GA6202@tornado.toronto.redhat.com>, Diego Novillo wr > ites: > >In this case we would want to split alias sets whose members > >cannot alias each other. But this could quickly take us back to > >quadratic behaviour. Say we have this initial alias set for X > >and Y: > No more so than the current algorithm. We can make the current algorithm > blow up by having a different type for every variable and indirection > for each of those types. Suddenly you're in the "oh god this sucks" > situation again. > To get into that situation you would need a program with tens of thousands of *different* types and tens of thousands of different variables of those types. If you got that kind of program, I bet you you will lose no matter what you do. > The reason my code tends to produce faster overall alias analysis is > because we avoid computing any may aliasing information for load-load > situations. > Faster than what we have now? At this point I will need to see the patch :) Right now we are so fast mainly because we bag most addressables and dereferences together. Your patch would refine this by separating load-load aliases. How can that be faster? You're adding more checks to the aliasing code :) But I agree with you that it can be comparable in complexity. If so, I'm all for it. What I would like to avoid is getting into the situation where we start to implement various additional heuristics to the type-based analyzer instead of relying on the PTA code. I think I'd rather have a good PTA implementation. The type-based analyzer was something to get by in the meantime, really. > Yes, splitting the sets would work too, but that seems to be avoiding > the real issue -- creating too many useless alias relationships to begin > with. > Yeah, good point. Diego. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tree-ssa] alias analysis 2003-02-13 15:32 ` Diego Novillo @ 2003-02-15 0:20 ` law 2003-02-15 2:01 ` Daniel Berlin 2003-02-19 1:04 ` Diego Novillo 2003-02-20 1:43 ` law 1 sibling, 2 replies; 23+ messages in thread From: law @ 2003-02-15 0:20 UTC (permalink / raw) To: Diego Novillo; +Cc: gcc In message <1045149809.10501.18.camel@frodo>, Diego Novillo writes: >On Tue, 2003-02-11 at 23:41, law@redhat.com wrote: >> In message <20030211203337.GA6202@tornado.toronto.redhat.com>, Diego Novill >o wr >> ites: >> >In this case we would want to split alias sets whose members >> >cannot alias each other. But this could quickly take us back to >> >quadratic behaviour. Say we have this initial alias set for X >> >and Y: >> No more so than the current algorithm. We can make the current algorithm >> blow up by having a different type for every variable and indirection >> for each of those types. Suddenly you're in the "oh god this sucks" >> situation again. >> >To get into that situation you would need a program with tens of >thousands of *different* types and tens of thousands of different >variables of those types. If you got that kind of program, I bet you >you will lose no matter what you do. Precisely. >> The reason my code tends to produce faster overall alias analysis is >> because we avoid computing any may aliasing information for load-load >> situations. >> >Faster than what we have now? At this point I will need to see the >patch :) Right now we are so fast mainly because we bag most >addressables and dereferences together. Yup. My scheme is faster than what we have now. FWIW, after my various improvements to CCP, alias analysis has become the clear CPU hog again as far as the tree optimizers are concerned (with gimplification running a close second). And I know how to make mine even faster and probably use less memory as well :-) >What I would like to avoid is getting into the situation where we start >to implement various additional heuristics to the type-based analyzer >instead of relying on the PTA code. I think I'd rather have a good PTA >implementation. The type-based analyzer was something to get by in the >meantime, really. Conceptually you should think of PTA as a way to prune the aliases found by type analysis. In all likelihood they're going to have to work together. jeff ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tree-ssa] alias analysis 2003-02-15 0:20 ` law @ 2003-02-15 2:01 ` Daniel Berlin 2003-02-19 1:04 ` Diego Novillo 1 sibling, 0 replies; 23+ messages in thread From: Daniel Berlin @ 2003-02-15 2:01 UTC (permalink / raw) To: law; +Cc: Diego Novillo, gcc On Friday, February 14, 2003, at 07:12 PM, law@redhat.com wrote: > In message <1045149809.10501.18.camel@frodo>, Diego Novillo writes: >> On Tue, 2003-02-11 at 23:41, law@redhat.com wrote: >>> In message <20030211203337.GA6202@tornado.toronto.redhat.com>, Diego >>> Novill >> o wr >>> ites: >>>> In this case we would want to split alias sets whose members >>>> cannot alias each other. But this could quickly take us back to >>>> quadratic behaviour. Say we have this initial alias set for X >>>> and Y: >>> No more so than the current algorithm. We can make the current >>> algorithm >>> blow up by having a different type for every variable and indirection >>> for each of those types. Suddenly you're in the "oh god this sucks" >>> situation again. >>> >> To get into that situation you would need a program with tens of >> thousands of *different* types and tens of thousands of different >> variables of those types. If you got that kind of program, I bet you >> you will lose no matter what you do. > Precisely. > > >>> The reason my code tends to produce faster overall alias analysis is >>> because we avoid computing any may aliasing information for load-load >>> situations. >>> >> Faster than what we have now? At this point I will need to see the >> patch :) Right now we are so fast mainly because we bag most >> addressables and dereferences together. > Yup. My scheme is faster than what we have now. > > FWIW, after my various improvements to CCP, alias analysis has become > the > clear CPU hog again as far as the tree optimizers are concerned > (with gimplification running a close second). And I know how to make > mine even faster and probably use less memory as well :-) > > >> What I would like to avoid is getting into the situation where we >> start >> to implement various additional heuristics to the type-based analyzer >> instead of relying on the PTA code. I think I'd rather have a good >> PTA >> implementation. The type-based analyzer was something to get by in >> the >> meantime, really. > Conceptually you should think of PTA as a way to prune the aliases > found by type analysis. In all likelihood they're going to have to > work together. > Yes, but it's likely going to need to be the other way around for speed reasons. PTA, without TBAA, generates orders of magnitudes less aliases that need to be pruned, in just as much time as TBAA takes. Thus, you'd want to use PTA and disambiguate the results using TBAA, as it would be faster. Otherwise, you end up asking PTA about aliases that TBAA can't figure out, 206 million times, like we do now, rather than using the results, and asking TBAA 100k times about the remaining aliases (or whatever). Keep that in mind as you are spending time optimizing the type-based analyzer. > > jeff > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tree-ssa] alias analysis 2003-02-15 0:20 ` law 2003-02-15 2:01 ` Daniel Berlin @ 2003-02-19 1:04 ` Diego Novillo 2003-02-19 2:54 ` law 1 sibling, 1 reply; 23+ messages in thread From: Diego Novillo @ 2003-02-19 1:04 UTC (permalink / raw) To: Jeff Law; +Cc: gcc On Fri, 2003-02-14 at 19:12, law@redhat.com wrote: > >Faster than what we have now? At this point I will need to see the > >patch :) Right now we are so fast mainly because we bag most > >addressables and dereferences together. > Yup. My scheme is faster than what we have now. > Faster to compute aliasing, or faster when combined with the SSA builder? If you are trying to improve times that are in the fractions of a second for a multi-second compile, I think you're wasting your time. We have much bigger performance problems in the branch, don't you think? > FWIW, after my various improvements to CCP, alias analysis has become the > clear CPU hog again as far as the tree optimizers are concerned > (with gimplification running a close second). And I know how to make > mine even faster and probably use less memory as well :-) > Now you lost me again. Why are you so interested in making alias analysis a CPU hog? We were trying to do the opposite! > >What I would like to avoid is getting into the situation where we start > >to implement various additional heuristics to the type-based analyzer > >instead of relying on the PTA code. I think I'd rather have a good PTA > >implementation. The type-based analyzer was something to get by in the > >meantime, really. > Conceptually you should think of PTA as a way to prune the aliases > found by type analysis. In all likelihood they're going to have to > work together. > I always thought that TBA was a way of pruning PTA. You first do a quick TBA query, if that fails, you do PTA. Diego. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tree-ssa] alias analysis 2003-02-19 1:04 ` Diego Novillo @ 2003-02-19 2:54 ` law 0 siblings, 0 replies; 23+ messages in thread From: law @ 2003-02-19 2:54 UTC (permalink / raw) To: Diego Novillo; +Cc: gcc In message <1045613408.1867.19.camel@frodo>, Diego Novillo writes: >On Fri, 2003-02-14 at 19:12, law@redhat.com wrote: > >> >Faster than what we have now? At this point I will need to see the >> >patch :) Right now we are so fast mainly because we bag most >> >addressables and dereferences together. >> Yup. My scheme is faster than what we have now. >> >Faster to compute aliasing, or faster when combined with the SSA >builder? If you are trying to improve times that are in the fractions >of a second for a multi-second compile, I think you're wasting your >time. We have much bigger performance problems in the branch, don't you >think? Faster overall. ie, any extra time spent in the SSA builder is recovered by significantly faster alias analysis. While there are other issues that can and should be addressed, the aliasing code (once my remaining CPP fixes are installed) is the slowest part of the tree-ssa path as a whole and is taking a significant amount of CPU time. To put it in perspective, over 2% of our time building cc1 is spent in the tree alias analysis code. That's a hell of a lot better than we used to be, but 2% frankly is way too much. IMHO if any part of the SSA path is taking more than 1% of the total compilation time, then it needs to be looked at very carefully. While it's great to have the ability to do more complex analysis and optimizations using tree-ssa, we have to be very aware of introducing more slow code into the compiler. It's very important to use good algorithms and implementations, otherwise we're going to be introducing a lot of crap that nobody will ever use. >> FWIW, after my various improvements to CCP, alias analysis has become the >> clear CPU hog again as far as the tree optimizers are concerned >> (with gimplification running a close second). And I know how to make >> mine even faster and probably use less memory as well :-) >> >Now you lost me again. Why are you so interested in making alias >analysis a CPU hog? We were trying to do the opposite! Before my changes to CCP it was the slowest part of the SSA path, taking on the order of 20 seconds for my components of cc1 test (out of a total of around 700 seconds). Now CCP is down to around 4 seconds for that same test. Alias analysis takes 13 seconds for that test and gimplification takes around 15 seconds. Neither are affected by my changes. But both clearly need to be worked on further as each accounts for about 2% of our total compilation time and are the biggest cpu hogs in the SSA path. By no means am I looking to slow things down -- what's happening is merely a matter of taking the worst offenders, making them fast, then continuing to iterate. With my CCP changes, CCP gets under the radar again. With CCP off the radar screen, it's time to look at another hog -- which happens to be our aliasing code. >> >What I would like to avoid is getting into the situation where we start >> >to implement various additional heuristics to the type-based analyzer >> >instead of relying on the PTA code. I think I'd rather have a good PTA >> >implementation. The type-based analyzer was something to get by in the >> >meantime, really. >> Conceptually you should think of PTA as a way to prune the aliases >> found by type analysis. In all likelihood they're going to have to >> work together. >> >I always thought that TBA was a way of pruning PTA. You first do a >quick TBA query, if that fails, you do PTA. Either way, our TBA is too bloody slow, even with your nice changes from a couple weeks ago. The amount of useless work we do in TBA needs to be reduced. jeff ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tree-ssa] alias analysis 2003-02-13 15:32 ` Diego Novillo 2003-02-15 0:20 ` law @ 2003-02-20 1:43 ` law 2003-02-20 1:53 ` Diego Novillo 1 sibling, 1 reply; 23+ messages in thread From: law @ 2003-02-20 1:43 UTC (permalink / raw) To: Diego Novillo; +Cc: gcc In message <1045149809.10501.18.camel@frodo>, Diego Novillo writes: > >> The reason my code tends to produce faster overall alias analysis is >> because we avoid computing any may aliasing information for load-load >> situations. >> >Faster than what we have now? At this point I will need to see the >patch :) Right now we are so fast mainly because we bag most >addressables and dereferences together. > >Your patch would refine this by separating load-load aliases. How can >that be faster? You're adding more checks to the aliasing code :) >But I agree with you that it can be comparable in complexity. If so, >I'm all for it. The first (and most important) thing to realize is that most functions have more loads than stores. And in fact, many functions have just loads and no stores. To show the effect, let's (just for the sake of argument) look at a naive alias analysis. foreach indirect I { foreach indirect J check for conflicts between I & J foreach addressable K check for conflicts between I & K } Which performs I * J + I * K checks. Which is just I * (J + K) We know that I & J are related from which we get I * (I/2 + K) checks This algorithm is clearly dominated by I -- the number of indirect references. I includes indirects created by both stores and loads. If you split loads & stores you get something like this: foreach store S { foreach store T check for conflicts between S & T foreach load L check for conflicts between S & L } Which is S * (S/2 + L). Which looks the same until you realize that typically S will be much smaller than I. Which is good -- when we can drastically decrease dominating variable, we get much better behavior. The same concepts apply to the scheme currently on the branch. We register alias sets for all the stores, then check the stores & loads for conflicts with the registered alias sets. We have to register far far fewer alias sets because we only register those used by store instructions. A secondary effect is that our may alias sets get smaller, meaning that our cost to manage those sets can go down significantly. ie, we get far fewer calls to add_may_alias. The proof is in the pudding so they say. Separating loads from stores cuts alias time by nearly 70%. That's nothing to sneeze at. >What I would like to avoid is getting into the situation where we start >to implement various additional heuristics to the type-based analyzer >instead of relying on the PTA code. I think I'd rather have a good PTA >implementation. The type-based analyzer was something to get by in the >meantime, really. No heuristics needed. And remember, you're still going to need type based alias analysis. jeff ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tree-ssa] alias analysis 2003-02-20 1:43 ` law @ 2003-02-20 1:53 ` Diego Novillo 2003-02-20 18:18 ` law 0 siblings, 1 reply; 23+ messages in thread From: Diego Novillo @ 2003-02-20 1:53 UTC (permalink / raw) To: law; +Cc: gcc On Wed, 19 Feb 2003, Jeff Law wrote: > The proof is in the pudding so they say. Separating loads from stores > cuts alias time by nearly 70%. That's nothing to sneeze at. > Sold. Let's do it then. Of course, by having better aliasing info we are going to make the compiler want to insert lots more PHI nodes, but you knew that already :) Diego. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tree-ssa] alias analysis 2003-02-20 1:53 ` Diego Novillo @ 2003-02-20 18:18 ` law 0 siblings, 0 replies; 23+ messages in thread From: law @ 2003-02-20 18:18 UTC (permalink / raw) To: Diego Novillo; +Cc: gcc In message <20030220002936.GA29249@tornado.toronto.redhat.com>, Diego Novillo w rites: >On Wed, 19 Feb 2003, Jeff Law wrote: > >> The proof is in the pudding so they say. Separating loads from stores >> cuts alias time by nearly 70%. That's nothing to sneeze at. >> >Sold. Let's do it then. I'm testing what I hope is the final version right now. >Of course, by having better aliasing >info we are going to make the compiler want to insert lots more >PHI nodes, but you knew that already :) Yup. Typically this isn't a problem. But there are of course exceptions such as 20001226-1.c. I'll probably end up pushing a revisit of the global life code on my todo stack. Jeff ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tree-ssa] alias analysis 2003-02-11 20:35 ` Diego Novillo ` (2 preceding siblings ...) 2003-02-12 4:37 ` [tree-ssa] alias analysis law @ 2003-02-12 6:30 ` law 3 siblings, 0 replies; 23+ messages in thread From: law @ 2003-02-12 6:30 UTC (permalink / raw) To: Diego Novillo; +Cc: gcc In message <20030211203337.GA6202@tornado.toronto.redhat.com>, Diego Novillo wr ites: >On Tue, 11 Feb 2003, Jeff Law wrote: > >> >> int x = 5; >> >> int y = 10; >> >> >> >> foo (&x, &y); >> >> >> >> >> >> We'd have distinct tags for x & y, even though they have the same alias >> >> set. >> >> >> >Now you lost me. I thought you were separating load from store >> >aliases? Aren't x and y store aliases here? Or is it load >> >aliases? I think I need a step-by-step description again :) >> They are stores, but they are stores to distinct VAR_DECLs. There's no >> way they can actually alias each other as the underlying objects will >> always be distinct. Ie, there is no way that x = foo will every modify y. So, just to provide a little more color on the two ways I think we can improve on the initial generation of may alias code. As I stated before, both are based on the notion that separating loads from stores can both speed up the alias analysis phase and provide more accurate information. I've already shown data which supports the belief that we can speed up the compiler. In the first scheme we keep separate tags for addressable VAR_DECLs even if they have the same alias set number. This requires us to walk all the store tags when checking to see if a load is aliased by any of the stores. Note that if we encounter a store via an INDIRECT_REF we glob all the stores via addressable VAR_DECLs with the same alias set number to use the tag for the INDIRECT_REF. The second scheme is simpler in that we always glob addressable VAR_DECLs with the same alias set into a single tag. This can lead to slightly worse alias information than the first scheme. Interestingly enough it also had slightly worse overall performance than the first scheme (but was still better than the code on the branch). First I instrumented find_alias_for. For the current branch sources it records how often an alias was found or not found when walking the indirect_refs and how often an alias was found or not found when walking the addresssables. Calls for indirect_refs which found an alias : 79896 Calls for indirect_refs which did not find an alias : 8 Totals calls for indirect_refs: : 79904 Calls for addressable_vars which found an alias : 16382 Calls for addressable_vars which did not find an alias: 469 Total calls for addressable_vars: : 16851 Total calls: : 96755 As you can see the vast majority of calls resulted in finding an alias. Then I instrumented the compiler using the two schemes noted above. Because the arrays are tracking totally different concepts than the current branch the numbers below aren't a direct comparison, but they do give you something to think about: Calls for stores which found a store alias : 31665 (31659) Calls for stores which did not find a store alias : 0 Total calls for stores : 31665 Calls for loads which found a store alias : 69780 (69932) Calls for loads which did not find a store alias : 4801 Total calls for loads : 74581 Total calls : 106246 The number in parenthesis is how many aliases were found -- remember we can't stop when we find the first alias. As you can see it was pretty rare to find a second alias. For the scheme which separates loads from stores, but always globs addressable VAR_DECLs we get: Calls for stores which found an alias : 31659 Calls for stores which did not find an alias: : 0 Total calls for stores : 31659 Calls for loads which found an alias : 69808 Calls for loads which did not find an alias : 4773 Total calls for loads : 74581 Total calls : 106240 Note that since this scheme globs, we can stop when we find the first alias. So what does all this mean? First, my scheme surprisingly does 10.5% more calls into find_alias_for. I say surprising because the compilers with my scheme are both around 1% faster than the current branch. I suspect, but have not confirmed that the increase in calls can be accounted for by the fact that my schemes generate a call for both a load and a store through the same INDIRECT_REF, but the current sources would generate a single call. The other difference is the number of calls which resulted in finding no may aliases. On the current branch only 0.5% of the calls resulted in finding no aliases. Contrast this to my code in which 4.4% of the calls resulted in not finding an alias (this is a good thing). Presumably this results in fewer vdef/vuses that need to be created, rewritten, etc etc. But I haven't verified this. [ In a worst case scenario the 4000 disambiguated references could come from the 10k extra calls. I really doubt that's the case, but it's definitely possible. ] Anyway, we could use some flag bits to reduce the redundant calls, which might give us results more comparable to the current branch sources. Anyway, I'm going to table this until Diego gets back. While I don't think there's an affect on the SSA builder and I think that doing something in this space is wise, it's clear that there's stuff to hash through. Jeff ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2003-02-20 18:11 UTC | newest] Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2003-02-11 17:29 [tree-ssa] alias analysis law 2003-02-11 18:44 ` Diego Novillo 2003-02-11 18:51 ` law 2003-02-11 20:35 ` Diego Novillo 2003-02-11 20:35 ` Diego Novillo 2003-02-11 22:39 ` [tree-ssa]: Always compute sets (Was Re: [tree-ssa] alias analysis) Daniel Berlin 2003-02-12 4:37 ` [tree-ssa] alias analysis law 2003-02-12 7:33 ` Daniel Berlin 2003-02-13 15:55 ` Diego Novillo 2003-02-13 16:00 ` Daniel Berlin 2003-02-13 16:14 ` Diego Novillo 2003-02-13 16:00 ` Daniel Berlin 2003-02-13 16:23 ` Diego Novillo 2003-02-13 17:35 ` Daniel Berlin 2003-02-13 15:32 ` Diego Novillo 2003-02-15 0:20 ` law 2003-02-15 2:01 ` Daniel Berlin 2003-02-19 1:04 ` Diego Novillo 2003-02-19 2:54 ` law 2003-02-20 1:43 ` law 2003-02-20 1:53 ` Diego Novillo 2003-02-20 18:18 ` law 2003-02-12 6:30 ` law
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).