On 23 May 2017 at 19:10, Prathamesh Kulkarni wrote:
> On 19 May 2017 at 19:02, Jan Hubicka wrote:
>>>
>>> * LTO and memory management
>>> This is a general question about LTO and memory management.
>>> IIUC the following sequence takes place during normal LTO:
>>> LGEN: generate_summary, write_summary
>>> WPA: read_summary, execute ipa passes, write_opt_summary
>>>
>>> So I assumed it was OK in LGEN to allocate return_callees_map in
>>> generate_summary and free it in write_summary and during WPA, allocate
>>> return_callees_map in read_summary and free it after execute (since
>>> write_opt_summary does not require return_callees_map).
>>>
>>> However with fat LTO, it seems the sequence changes for LGEN with
>>> execute phase takes place after write_summary. However since
>>> return_callees_map is freed in pure_const_write_summary and
>>> propagate_malloc() accesses it in execute stage, it results in
>>> segmentation fault.
>>>
>>> To work around this, I am using the following hack in pure_const_write_summary:
>>> // FIXME: Do not free if -ffat-lto-objects is enabled.
>>> if (!global_options.x_flag_fat_lto_objects)
>>>   free_return_callees_map ();
>>> Is there a better approach for handling this ?
>>
>> I think most passes just do not free summaries with -flto. We probably want
>> to fix it to make it possible to compile multiple units i.e. from plugin by
>> adding release_summaries method...
>> So I would say it is OK to do the same as others do and leak it with -flto.
>>> diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
>>> index e457166ea39..724c26e03f6 100644
>>> --- a/gcc/ipa-pure-const.c
>>> +++ b/gcc/ipa-pure-const.c
>>> @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3. If not see
>>>  #include "tree-scalar-evolution.h"
>>>  #include "intl.h"
>>>  #include "opts.h"
>>> +#include "ssa.h"
>>>
>>>  /* Lattice values for const and pure functions. Everything starts out
>>>  being const, then may drop to pure and then neither depending on
>>> @@ -69,6 +70,15 @@ enum pure_const_state_e
>>>
>>>  const char *pure_const_names[3] = {"const", "pure", "neither"};
>>>
>>> +enum malloc_state_e
>>> +{
>>> +  PURE_CONST_MALLOC_TOP,
>>> +  PURE_CONST_MALLOC,
>>> +  PURE_CONST_MALLOC_BOTTOM
>>> +};
>>
>> It took me a while to work out what PURE_CONST means here :)
>> I would just call it something like STATE_MALLOC_TOP... or so.
>> ipa_pure_const is outdated name from the time pass was doing only
>> those two.
>>> @@ -109,6 +121,10 @@ typedef struct funct_state_d * funct_state;
>>>
>>>  static vec funct_state_vec;
>>>
>>> +/* A map from node to subset of callees. The subset contains those callees
>>> + * whose return-value is returned by the node. */
>>> +static hash_map< cgraph_node *, vec* > *return_callees_map;
>>> +
>>
>> Hehe, a special case of return jump function. We ought to support those more generally.
>> How do you keep it up to date over callgraph changes?
>>> @@ -921,6 +1055,23 @@ end:
>>>  if (TREE_NOTHROW (decl))
>>>    l->can_throw = false;
>>>
>>> +  if (ipa)
>>> +    {
>>> +      vec v = vNULL;
>>> +      l->malloc_state = PURE_CONST_MALLOC_BOTTOM;
>>> +      if (DECL_IS_MALLOC (decl))
>>> +        l->malloc_state = PURE_CONST_MALLOC;
>>> +      else if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), v))
>>> +        {
>>> +          l->malloc_state = PURE_CONST_MALLOC_TOP;
>>> +          vec *callees_p = new vec (vNULL);
>>> +          for (unsigned i = 0; i < v.length (); ++i)
>>> +            callees_p->safe_push (v[i]);
>>> +          return_callees_map->put (fn, callees_p);
>>> +        }
>>> +      v.release ();
>>> +    }
>>> +
>>
>> I would do non-ipa variant, too. I think most attributes can be detected that way
>> as well.
>>
>> The patch generally makes sense to me. It would be nice to make it easier to write such
>> a basic propagators across callgraph (perhaps adding a template doing the basic
>> propagation logic).
>> Also I think you need to solve the problem with keeping your
>> summaries up to date across callgraph node removal and duplications.
> Thanks for the suggestions, I will try to address them in a follow-up patch.
> IIUC, I would need to modify ipa-pure-const cgraph hooks -
> add_new_function, remove_node_data, duplicate_node_data
> to keep return_callees_map up-to-date across callgraph node insertions
> and removal ?
>
> Also, if instead of having a separate data-structure like return_callees_map,
> should we rather have a flag within cgraph_edge, which marks that the
> caller may return the value of the callee ?

Hi,
Sorry for the very late response. I have attached an updated version of
the prototype patch, which adds a non-ipa variant and keeps
return_callees_map up-to-date across callgraph node insertions and
removal. For the non-ipa variant, malloc_candidate_p() additionally
checks that all the "return callees" have DECL_IS_MALLOC set to true.
Bootstrapped+tested and LTO bootstrapped+tested on
x86_64-unknown-linux-gnu.
Does it look OK so far?

Sorry for this silly question, but I don't really understand how
indirect call propagation works in ipa-pure-const. For example, consider
propagation of the nothrow attribute in the following test-case:

__attribute__((noinline, noclone, nothrow))
int f1(int k)
{
  return k;
}

__attribute__((noinline, noclone))
static int foo(int (*p)(int))
{
  return p(10);
}

__attribute__((noinline, noclone))
int bar(void)
{
  return foo(f1);
}

Shouldn't foo and bar also be marked as nothrow, since foo indirectly
calls f1, which is nothrow, and bar only calls foo?
The local-pure-const2 dump shows "function is locally throwing" for
"foo" and "bar".

I was also wondering how to get "points-to" analysis for function
pointers, so as to obtain the list of callees that may be called
indirectly through a given function pointer. In the patch I just set the
node to bottom if it contains indirect calls, which is far from ideal :(
I would be very grateful for suggestions on how to handle indirect calls.

Thanks!

Regards,
Prathamesh
>
> Thanks,
> Prathamesh
>>
>> Honza
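
To make the indirect-call handling described above concrete, here is a
minimal standalone sketch of a three-state malloc lattice propagated over
"return callees". It is not code from the patch and does not use GCC's
data structures: Node, MallocState, has_indirect_call and
propagate_malloc_state are hypothetical names, and the fixed-point loop is
only an assumption about how such a propagation could be organised. A
candidate node drops to bottom as soon as it contains an indirect call or
returns the value of a callee that is itself bottom.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Three-point lattice, modelled on malloc_state_e from the quoted patch.
enum MallocState { MALLOC_TOP, MALLOC, MALLOC_BOTTOM };

// Hypothetical stand-in for a callgraph node (not GCC's cgraph_node).
struct Node {
  MallocState state;                // current lattice value
  bool has_indirect_call;           // callees unknown, so be conservative
  std::vector<int> return_callees;  // indices of callees whose value is returned
};

// Assumed propagation scheme: iterate until nothing changes. A TOP node
// drops to BOTTOM if it has an indirect call or if any of its return
// callees is BOTTOM. Nodes still at TOP afterwards are promoted to MALLOC.
void propagate_malloc_state(std::vector<Node> &nodes) {
  bool changed = true;
  while (changed) {
    changed = false;
    for (Node &n : nodes) {
      if (n.state != MALLOC_TOP)
        continue;
      bool drop = n.has_indirect_call;
      for (int c : n.return_callees) {
        assert(c >= 0 && static_cast<std::size_t>(c) < nodes.size());
        if (nodes[c].state == MALLOC_BOTTOM)
          drop = true;
      }
      if (drop) {
        n.state = MALLOC_BOTTOM;
        changed = true;
      }
    }
  }
  for (Node &n : nodes)
    if (n.state == MALLOC_TOP)
      n.state = MALLOC;
}
```

In this toy model, a node that only returns the result of a known
malloc-like function ends up as MALLOC, while a node with an indirect
call, and every caller that returns its value, ends up as BOTTOM. That is
the conservative behaviour the message describes; resolving the possible
targets of the function pointer would allow fewer nodes to be dropped.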