Date: Fri, 23 May 2014 14:50:00 -0000
From: Martin Jambor
To: Richard Biener
Cc: GCC Patches, Jan Hubicka
Subject: Re: [PATCH 3/7] IPA-CP escape and clobber analysis
Message-ID: <20140523145037.GB13095@virgil.suse>
References: <20140521131634.178838544@virgil.suse.cz> <20140521131634.400660424@virgil.suse.cz>

Hi,

sorry, I should have added a better description of the overall
algorithm; I will try to do that now and hopefully at least clarify
what each stage does.

At summary generation time, we process one function after another,
looking at their bodies.  There are three new things in the generated
summaries:

1) How actual arguments of calls relate to the formal parameters and
   how they relate to each other.
   If an argument in one call refers to the same object as an argument
   in another call or as a formal parameter, we want to know that they
   are the same memory object.  This is captured in jump functions;
   the jump function kinds which did not already carry an index get an
   integer one.

2) Which pointer formal parameters escape in this function and which
   pointer actual arguments of all calls escape in this function.
   This is stored as one bit per formal parameter and per each other
   (locally unescaped) memory object passed as an actual argument in
   one or more calls.

3) Whether we modify the memory pointed to by the formal and actual
   arguments in this function.  Again just a bit per memory object.

When doing the local analysis, we allocate an ipa_escape structure for
each SSA name and each addressable local declaration and, after
examining SSA definitions, store how they relate to each other (i.e.
which point to the same thing and at what offsets, which is useful for
subsequent patches).  We also look at uses of SSA names to see what
objects escape locally.

These structures are also used when building jump functions.  When we
know that passed data escapes locally, we mark that directly in the
jump function.  If it does not, we store into the jump function an
index which identifies the memory object - an index into a vector of
ipa_ref_descriptor structures.  These are allocated with one element
for each formal parameter - locally escaped and non-pointer ones are
marked as escaped - and one for every other locally unescaped memory
object which is passed to a called function.  The ipa_escapes and
associated data are then deallocated and we move on to the next
function.

During WPA, we basically propagate escape and clobber flags across the
call graph.
Escape flags are propagated more or less in both directions; this is
perhaps best described by figure 4 of
http://sysrun.haifa.il.ibm.com/hrl/greps2007/papers/ipa-agg-no_copyright.pdf
(I called them unusable flags in that paper some seven years ago).
Modified flags are propagated only from callees to callers, of course.

On Wed, May 21, 2014 at 04:50:33PM +0200, Richard Biener wrote:
> On Wed, May 21, 2014 at 3:16 PM, Martin Jambor wrote:
> > Hi,
> >
> > this patch is rather big but not overly complicated.  Its goal is to
> > figure out whether data passed to a function by reference escapes
> > (somewhere, not necessarily in that particular function) and is
> > potentially clobbered (in that one function or its callees).
> >
> > The result is stored into the call graph node global structure, at
> > least for now, because it is supposed to live longer than IPA-CP
> > optimization info and be available for PTA later in the pipeline.
> > Before that, however, quite a lot of intermediate results are stored
> > in a number of places.  First of all, there is a vector describing
> > all SSA names and address-taken local aggregates which is used to
> > figure out relations between them and do the local escape and
> > clobber analysis (I am aware that a local aggregate might
> > incorrectly pass as non-clobbered; that is fixed by the fifth patch,
> > this one is big enough as it is and it does not really matter here).
> >
> > We then store the local results describing formal parameters and
> > so-far-presumed-unescaped aggregates and malloced data that is
> > passed as actual arguments to other functions into a new vector
> > ref_descs.  I did not store this into the existing descriptors
> > vector because there are often more elements.  Also, I had to extend
> > the UNKNOWN, KNOWN_TYPE and CONSTANT jump functions with an index
> > into this new vector (PASS_THROUGH and ANCESTOR reuse the index into
> > parameters), so there are quite a lot of new getter and setter
> > methods.
> >
> > This information is used by a simple queue-based interprocedural
> > propagation.  Eventually, the information is stored into the call
> > graph node, as described above.  After propagation, data in
> > ref_descs and in the call graph are the same, only the call graph
> > can live much longer.  One set of flags that is not copied to call
> > graph nodes are callee_clobbered flags, which only IPA-CP uses in a
> > subsequent patch (and which would require maintenance during
> > inlining).
> >
> > There are more uses of the flags introduced by subsequent patches.
> > In this one, the only one is that the IPA-CP modification phase is
> > able to use the results instead of querying AA and is capable of
> > doing more replacements of aggregate values when the aggregate is
> > unescaped and not clobbered.
> >
> > The following table summarizes what the pass can discover now.  All
> > compilations are with -Ofast -flto.  (I should have counted only
> > pointer-typed parameters but well, that thought occurred to me too
> > late.  All non-pointer ones are automatically considered clobbered.)
> > Please note that in Fortran benchmarks, this information is often
> > already available through fnspec flags.  But we can discover a few
> > more (see the last patch for some more information).
> > > > | | | | | | | Callee | | > > | Test | Params | Noescape | % | Noclobber | % | noclobber | % | > > | | | | | | | | | > > |--------------------+--------+----------+-------+-----------+-------+-----------+-------+ > > | FF libxul.so | 462725 | 10422 | 2.25 | 4954 | 1.07 | 8872 | 1.92 | > > | Tramp 3D | 6344 | 1019 | 16.06 | 985 | 15.53 | 1005 | 15.84 | > > |--------------------+--------+----------+-------+-----------+-------+-----------+-------+ > > | perlbench | 2550 | 87 | 3.41 | 10 | 0.39 | 61 | 2.39 | > > | bzip | 194 | 28 | 14.43 | 1 | 0.52 | 13 | 6.70 | > > | gcc | 10725 | 179 | 1.67 | 18 | 0.17 | 147 | 1.37 | > > | mcf | 57 | 4 | 7.02 | 0 | 0.00 | 4 | 7.02 | > > | gobmk | 8873 | 132 | 1.49 | 3 | 0.03 | 85 | 0.96 | > > | hmmer | 643 | 71 | 11.04 | 8 | 1.24 | 64 | 9.95 | > > | sjeng | 161 | 5 | 3.11 | 0 | 0.00 | 5 | 3.11 | > > | libquantum | 187 | 48 | 25.67 | 6 | 3.21 | 14 | 7.49 | > > | h264ref | 1092 | 48 | 4.40 | 4 | 0.37 | 47 | 4.30 | > > | astar | 217 | 28 | 12.90 | 3 | 1.38 | 15 | 6.91 | > > | xalancbmk | 28861 | 737 | 2.55 | 536 | 1.86 | 712 | 2.47 | > > |--------------------+--------+----------+-------+-----------+-------+-----------+-------+ > > | bwaves | 74 | 35 | 47.30 | 25 | 33.78 | 35 | 47.30 | > > | gamess | 26059 | 3693 | 14.17 | 2796 | 10.73 | 3572 | 13.71 | > > | milc | 429 | 22 | 5.13 | 11 | 2.56 | 22 | 5.13 | > > | zeusmp | 284 | 31 | 10.92 | 2 | 0.70 | 31 | 10.92 | > > | gromacs | 5514 | 230 | 4.17 | 54 | 0.98 | 202 | 3.66 | > > | cactusADM | 2354 | 49 | 2.08 | 13 | 0.55 | 44 | 1.87 | > > | leslie3d | 18 | 0 | 0.00 | 0 | 0.00 | 0 | 0.00 | > > | namd | 163 | 0 | 0.00 | 0 | 0.00 | 0 | 0.00 | > > | soplex | 2341 | 80 | 3.42 | 10 | 0.43 | 55 | 2.35 | > > | povray | 4046 | 244 | 6.03 | 51 | 1.26 | 201 | 4.97 | > > | calculix | 6260 | 1109 | 17.72 | 672 | 10.73 | 933 | 14.90 | > > | GemsFDTD | 289 | 41 | 14.19 | 27 | 9.34 | 32 | 11.07 | > > | tonto | 7255 | 1361 | 18.76 | 1178 | 16.24 | 1329 | 18.32 | > > | lbm | 27 | 4 | 14.81 | 3 | 11.11 
| 4 | 14.81 | > > | wrf | 14212 | 4375 | 30.78 | 3358 | 23.63 | 4120 | 28.99 | > > | sphinx3 | 770 | 16 | 2.08 | 1 | 0.13 | 15 | 1.95 | > > |--------------------+--------+----------+-------+-----------+-------+-----------+-------+ > > | ac.f90 | 21 | 14 | 66.67 | 7 | 33.33 | 14 | 66.67 | > > | aermod.f90 | 600 | 134 | 22.33 | 59 | 9.83 | 124 | 20.67 | > > | air.f90 | 85 | 41 | 48.24 | 14 | 16.47 | 41 | 48.24 | > > | capacita.f90 | 42 | 18 | 42.86 | 16 | 38.10 | 18 | 42.86 | > > | channel2.f90 | 12 | 4 | 33.33 | 4 | 33.33 | 4 | 33.33 | > > | doduc.f90 | 132 | 68 | 51.52 | 39 | 29.55 | 68 | 51.52 | > > | fatigue2.f90 | 65 | 43 | 66.15 | 20 | 30.77 | 43 | 66.15 | > > | gas_dyn2.f90 | 97 | 22 | 22.68 | 6 | 6.19 | 21 | 21.65 | > > | induct2.f90 | 121 | 41 | 33.88 | 24 | 19.83 | 41 | 33.88 | > > | linpk.f90 | 42 | 10 | 23.81 | 7 | 16.67 | 10 | 23.81 | > > | mdbx.f90 | 51 | 26 | 50.98 | 9 | 17.65 | 26 | 50.98 | > > | mp_prop_design.f90 | 2 | 0 | 0.00 | 0 | 0.00 | 0 | 0.00 | > > | nf.f90 | 41 | 8 | 19.51 | 8 | 19.51 | 8 | 19.51 | > > | protein.f90 | 116 | 40 | 34.48 | 25 | 21.55 | 35 | 30.17 | > > | rnflow.f90 | 212 | 54 | 25.47 | 37 | 17.45 | 51 | 24.06 | > > | test_fpu2.f90 | 160 | 22 | 13.75 | 14 | 8.75 | 18 | 11.25 | > > | tfft2.f90 | 7 | 3 | 42.86 | 0 | 0.00 | 3 | 42.86 | > > > > I hope to improve the results for example by propagating malloc > > attribute to callers. > > > > I have bootstrapped and tested this on x86_64, additionally I also > > checked it passes an LTO-bootstrap and LTO-built Firefox. I assume > > there will be many comments but after I address them, I'd like to > > commit this to trunk. > > > > Thanks, > > > > Martin > > > > > > 2014-04-30 Martin Jambor > > > > * cgraph.h (cgraph_global_info): New fields noescape_parameters > > and noclobber_parameters. > > (cgraph_param_noescape_p): Declare. > > (cgraph_set_param_noescape): Likewise. > > (cgraph_param_noclobber_p): Likewise. > > (cgraph_set_param_noclobber): Likewise. 
> >     * ipa-prop.h (ipa_unknown_data): New type.
> >     (ipa_known_type_data): New fields escape_ref_valid and
> >     escape_ref_index.
> >     (ipa_constant_data): Likewise.
> >     (jump_func_value): New field unknown.
> >     (ipa_get_jf_unknown_esc_ref_valid): New function.
> >     (ipa_get_jf_unknown_esc_ref_index): Likewise.
> >     (ipa_get_jf_known_type_esc_ref_valid): Likewise.
> >     (ipa_get_jf_known_type_esc_ref_index): Likewise.
> >     (ipa_get_jf_constant_esc_ref_valid): Likewise.
> >     (ipa_get_jf_constant_esc_ref_index): Likewise.
> >     (ipa_ref_descriptor): New type.
> >     (ipa_node_params): New fields ref_descs and node_up_enqueued.
> >     (ipa_is_ref_escaped): New function.
> >     (ipa_is_ref_clobbered): Likewise.
> >     (ipa_is_ref_callee_clobbered): Likewise.
> >     (ipa_is_param_ref_safely_constant): Likewise.
> >     (ipa_spread_escapes): Declare.
> >     * ipa-prop.c: Include stringpool.h, tree-ssanames.h and
> >     pointer-set.h.
> >     (ipa_escape): New type.
> >     (valid_escape_result_index): New function.
> >     (func_body_info): New fields func, escapes and decl_escapes.
> >     (ipa_print_node_jump_functions_for_edge): Dump new fields.
> >     (ipa_set_jf_unknown): New function.  Use it instead of directly
> >     setting a jump function's type elsewhere.
> >     (ipa_set_jf_unknown_copy): New function.
> >     (ipa_set_jf_unknown_ref_index): Likewise.
> >     (ipa_set_jf_known_type_copy): Likewise.
> >     (ipa_set_jf_known_type): Initialize new fields.
> >     (ipa_set_jf_known_type_ref_index): New function.
> >     (ipa_set_jf_constant): Initialize new fields.
> >     (ipa_set_jf_constant_ref_index): New function.
> >     (ipa_get_tracked_refs_count): Likewise.
> >     (ipa_set_ref_escaped): Likewise.
> >     (ipa_set_ref_clobbered): Likewise.
> >     (ipa_set_ref_callee_clobbered): Likewise.
> >     (ipa_load_from_parm_agg_1): Use const_ref parameter flag.
> >     (get_escape_for_ref): New function.
> >     (get_escape_for_value): Likewise.
> >     (ipa_compute_jump_functions_for_edge): Add reference info to
> >     jump functions.  Wrapped comments to 80 columns, added a
> >     checking assert that all jump functions start with no
> >     information.
> >     (visit_ref_for_mod_analysis): Renamed to visit_ref_mark_it_used.
> >     Simplified comment.
> >     (ipa_analyze_params_uses_in_bb): Renamed to
> >     ipa_analyze_bb_statements.  Simplified comment.
> >     (analyze_phi_escapes): New function.
> >     (analyze_ssa_escape): Likewise.
> >     (analyze_all_ssa_escapes): Likewise.
> >     (create_escape_structures): Likewise.
> >     (free_escape_structures): Likewise.
> >     (pick_escapes_from_call): Likewise.
> >     (gather_picked_escapes): Likewise.
> >     (ipa_analyze_node): Initialize and deinitialize new fbi fields
> >     and escape structures, call create_escape_structures,
> >     analyze_all_ssa_escapes and pick_escapes_from_call, assign ref
> >     indices to formal parameters.
> >     (escape_spreading_data): New type.
> >     (enque_to_propagate_escapes_up): New function.
> >     (enque_to_propagate_escapes_down): Likewise.
> >     (escape_origin_from_jfunc): Likewise.
> >     (spread_escapes_up_from_one_alias): Likewise.
> >     (spread_escapes_up): Likewise.
> >     (spread_escapes_down): Likewise.
> >     (ipa_spread_escapes): Likewise.
> >     (make_unknown_jf_from_known_type_jf): Likewise.
> >     (combine_known_type_and_ancestor_jfs): Also update ref index
> >     fields.  Switch arguments for consistency, changed the one
> >     caller.
> >     (update_jump_functions_after_inlining): Also update ref index
> >     fields, make use of unescaped info.
> >     (update_indirect_edges_after_inlining): Make use of unescaped
> >     info.
> >     (ipa_free_node_params_substructures): Free also ref_descs
> >     vector.
> >     (ipa_node_duplication_hook): Also copy reference descriptor
> >     vector and const_refs.
> >     (ipa_print_node_params): Also print reference flags.
> >     (ipa_write_jump_function): Stream new fields.
> >     (ipa_read_jump_function): Likewise.
> >     (ipa_write_node_info): Stream reference description.
> >     (ipa_read_node_info): Likewise, also clear new flag
> >     node_up_enqueued.
> >     (read_agg_replacement_chain): Whitespace fix.
> >     (adjust_agg_replacement_values): Also assign const_refs in
> >     descriptors from those in transformation data.
> >     (ipcp_transform_function): Initialize new fields of fbi.
> >     * ipa-cp.c (agg_pass_through_permissible_p): Make use of the new
> >     escape information.  Accept caller_info as a parameter, updated
> >     all callers.
> >     (propagate_aggs_accross_jump_function): Make use of the new
> >     escape information.
> >     (intersect_aggregates_with_edge): Bail out early if a
> >     pass_through jump function does not allow passing aggregates.
> >     Make use of the new escape information.  Allow NULL values in
> >     aggregate jump functions.
> >     (ipcp_driver): Call spread_escapes.
> >     * ipa-inline.c (ipa_inline): Call spread_escapes if necessary.
> >     * cgraph.c (cgraph_param_noescape_p): New function.
> >     (cgraph_set_param_noescape): Likewise.
> >     (cgraph_param_noclobber_p): Likewise.
> >     (cgraph_set_param_noclobber): Likewise.
> >     * cgraphclones.c (duplicate_thunk_for_node): Assert that
> >     noclobber and noescape bitmaps are NULL.
> >     (copy_noescape_noclobber_bitmaps): New function.
> >     (cgraph_clone_node): Copy noescape and noclobber bitmaps.
> >     (cgraph_copy_node_for_versioning): Likewise.
> >     * lto-cgraph.c (output_param_bitmap): New function.
> >     (output_node_opt_summary): Use it to stream args_to_skip,
> >     combined_args_to_skip, noescape_parameters and
> >     noclobber_parameters bitmaps.
> >     (input_param_bitmap): New function.
> >     (input_node_opt_summary): Use it to stream args_to_skip,
> >     combined_args_to_skip, noescape_parameters and
> >     noclobber_parameters bitmaps.
> >     * tree-inline.c (update_noescape_noclobber_bitmaps): New
> >     function.
> >     (tree_function_versioning): Call it.
> >
> >     testsuite/
> >     * gcc.dg/ipa/ipcp-agg-10.c: New test.
> >
> > Index: src/gcc/ipa-prop.c
> > ===================================================================
> > --- src.orig/gcc/ipa-prop.c
> > +++ src/gcc/ipa-prop.c
> > @@ -43,6 +43,8 @@ along with GCC; see the file COPYING3.
> >  #include "gimple-ssa.h"
> >  #include "tree-cfg.h"
> >  #include "tree-phinodes.h"
> > +#include "stringpool.h"
> > +#include "tree-ssanames.h"
> >  #include "ssa-iterators.h"
> >  #include "tree-into-ssa.h"
> >  #include "tree-dfa.h"
> > @@ -60,6 +62,7 @@ along with GCC; see the file COPYING3.
> >  #include "stringpool.h"
> >  #include "tree-ssanames.h"
> >  #include "domwalk.h"
> > +#include "pointer-set.h"
> >
> >  /* Intermediate information that we get from alias analysis about a particular
> >     parameter in a particular basic_block.  When a parameter or the memory it
> > @@ -91,11 +94,64 @@ struct ipa_bb_info
> >    vec param_aa_statuses;
> >  };
> >
> > +/* Structure used for intra-procedural escape analysis (and associated
> > +   memory-write detection).  When analyzing a function body, we have one for
> > +   each SSA name and for all address-taken local declarations.  */
>
> And for all functions at the same time?  It asks to space optimize
> this ...

No, after analyzing one function, the (so far overly optimistic)
results are copied to a vector of ipa_ref_descriptor structures, which
is kept for all functions, and everything accessible only from
func_body_info is deallocated.  I try to be very careful with the data
I keep across the whole analysis.  I was not so strict with the data
that live only while looking at one function, allowed myself some
convenience, and it certainly might be compacted a bit.

>
> > +struct ipa_escape
> > +{
> > +  /* If target is non-NULL, this is the offset relative to the reference
> > +     described by target.  */
> > +  HOST_WIDE_INT offset;
> > +
> > +  /* If this describes (a part of) data described by other ipa_escape
> > +     structure, target is non-NULL.  In that case, that structure should be
> > +     used instead of this one and, unless explicitly noted, other fields are
> > +     meaningless.  */
> > +  struct ipa_escape *target;
> > +
> > +  /* The last seen edge that had a reference to this data among its
> > +     parameters.  Used to make sure we do not pass the same data in two
> > +     different arguments.  */
> > +  struct cgraph_edge *last_seen_cs;
> > +
> > +  /* Index of the bool slot where the analyzed flag is going to end up plus
> > +     one.  Zero means this structure will remain unused.  */
> > +  int result_index;
> > +
> > +  /* True if we have already dealt with this SSA name.  Valid even if target
> > +     is non-NULL.  */
> > +  bool analyzed;
> > +
> > +  /* Could the address of the data have escaped?  */
> > +  bool escaped;
> > +
> > +  /* Flag set when an SSA name has been used as a base for a memory write.
> > +     Only valid when the SSA name is not considered escaped, otherwise it
> > +     might be incorrectly clear.  */
> > +  bool write_base;
> > +};
> > +
> > +/* If ESC has a valid (i.e. non-zero) result_index, return true and store the
> > +   directly usable (i.e. decremented) index to *INDEX.  */
> > +
> > +static inline bool
> > +valid_escape_result_index (struct ipa_escape *esc, int *index)
> > +{
> > +  if (esc->result_index == 0)
> > +    return false;
> > +  *index = esc->result_index - 1;
> > +  return true;
> > +}
> > +
> >  /* Structure with global information that is only used when looking at function
> >     body.  */
> >
> >  struct func_body_info
> >  {
> > +  /* Struct function of the function that is being analyzed.  */
> > +  struct function *func;
> > +
> >    /* The node that is being analyzed.  */
> >    cgraph_node *node;
>
> DECL_STRUCT_FUNCTION (node->decl) == func?

Yes, this is the convenience I was writing about above.  I can remove
it but it is really only one pointer.

> > @@ -105,6 +161,13 @@ struct func_body_info
> >    /* Information about individual BBs.
*/
> >    vec bb_infos;
> >
> > +  /* Escape analysis information for SSA names and local addressable
> > +     declarations.  */
> > +  vec escapes;
> > +
> > +  /* Mapping from VAR_DECLs to escape information.  */
> > +  pointer_map *decl_escapes;
> > +
>
> You can map from DECL_UID to an index in escapes which would
> be more space efficient?

OK, I will make the change.

>
> >    /* Number of parameters.  */
> >    int param_count;
> >
> > @@ -282,7 +345,14 @@ ipa_print_node_jump_functions_for_edge (
> >
> >        fprintf (f, "   param %d: ", i);
> >        if (type == IPA_JF_UNKNOWN)
> > -        fprintf (f, "UNKNOWN\n");
> > +        {
> > +          fprintf (f, "UNKNOWN");
> > +          if (ipa_get_jf_unknown_esc_ref_valid (jump_func))
> > +            fprintf (f, ", escape ref: %i\n",
> > +                     ipa_get_jf_unknown_esc_ref_index (jump_func));
> > +          else
> > +            fprintf (f, "\n");
> > +        }
> >        else if (type == IPA_JF_KNOWN_TYPE)
> >          {
> >            fprintf (f, "KNOWN TYPE: base ");
> > @@ -290,6 +360,9 @@ ipa_print_node_jump_functions_for_edge (
> >            fprintf (f, ", offset "HOST_WIDE_INT_PRINT_DEC", component ",
> >                     jump_func->value.known_type.offset);
> >            print_generic_expr (f, jump_func->value.known_type.component_type, 0);
> > +          if (ipa_get_jf_known_type_esc_ref_valid (jump_func))
> > +            fprintf (f, ", escape ref: %i",
> > +                     ipa_get_jf_known_type_esc_ref_index (jump_func));
> >            fprintf (f, "\n");
> >          }
> >        else if (type == IPA_JF_CONST)
> > @@ -304,6 +377,9 @@ ipa_print_node_jump_functions_for_edge (
> >                print_generic_expr (f, DECL_INITIAL (TREE_OPERAND (val, 0)),
> >                                    0);
> >              }
> > +          if (ipa_get_jf_constant_esc_ref_valid (jump_func))
> > +            fprintf (f, ", escape ref: %i",
> > +                     ipa_get_jf_constant_esc_ref_index (jump_func));
> >            fprintf (f, "\n");
> >          }
> >        else if (type == IPA_JF_PASS_THROUGH)
> > @@ -430,6 +506,39 @@ ipa_print_all_jump_functions (FILE *f)
> >      }
> >  }
> >
> > +/* Set JFUNC to be a jump function with invalid reference index.
*/
> > +
> > +static void
> > +ipa_set_jf_unknown (struct ipa_jump_func *jfunc)
> > +{
> > +  jfunc->type = IPA_JF_UNKNOWN;
> > +  jfunc->value.unknown.escape_ref_valid = false;
> > +}
> > +
> > +/* Set DST to be a copy of another unknown jump function SRC.  */
> > +
> > +static void
> > +ipa_set_jf_unknown_copy (struct ipa_jump_func *dst,
> > +                         struct ipa_jump_func *src)
> > +
> > +{
> > +  gcc_checking_assert (src->type == IPA_JF_UNKNOWN);
> > +  dst->type = IPA_JF_UNKNOWN;
> > +  dst->value.unknown = src->value.unknown;
> > +}
> > +
> > +/* Set reference description of unknown JFUNC to be valid and referring to
> > +   INDEX.  */
> > +
> > +static void
> > +ipa_set_jf_unknown_ref_index (struct ipa_jump_func *jfunc, int index)
> > +{
> > +  gcc_checking_assert (jfunc->type == IPA_JF_UNKNOWN);
> > +  gcc_checking_assert (index >= 0);
> > +  jfunc->value.unknown.escape_ref_valid = true;
> > +  jfunc->value.unknown.escape_ref_index = index;
> > +}
> > +
> >  /* Set JFUNC to be a known type jump function.  */
> >
> >  static void
> > @@ -445,11 +554,37 @@ ipa_set_jf_known_type (struct ipa_jump_f
> >    jfunc->value.known_type.offset = offset,
> >    jfunc->value.known_type.base_type = base_type;
> >    jfunc->value.known_type.component_type = component_type;
> > +  jfunc->value.known_type.escape_ref_valid = false;
> > +  jfunc->value.known_type.escape_ref_index = 0;
> >    gcc_assert (component_type);
> >  }
> >
> > -/* Set JFUNC to be a copy of another jmp (to be used by jump function
> > -   combination code).  The two functions will share their rdesc.  */
> > +/* Set reference description of known_type JFUNC to be valid and referring to
> > +   INDEX.
*/ > > + > > +static void > > +ipa_set_jf_known_type_ref_index (struct ipa_jump_func *jfunc, int index) > > +{ > > + gcc_checking_assert (jfunc->type == IPA_JF_KNOWN_TYPE); > > + gcc_checking_assert (index >= 0); > > + jfunc->value.known_type.escape_ref_valid = true; > > + jfunc->value.known_type.escape_ref_index = index; > > +} > > + > > +/* Set DST to be a copy of another known type jump function SRC. */ > > + > > +static void > > +ipa_set_jf_known_type_copy (struct ipa_jump_func *dst, > > + struct ipa_jump_func *src) > > + > > +{ > > + gcc_checking_assert (src->type == IPA_JF_KNOWN_TYPE); > > + dst->type = IPA_JF_KNOWN_TYPE; > > + dst->value.known_type = src->value.known_type; > > +} > > + > > +/* Set DST to be a copy of another constant jump function SRC. The two > > + functions will share their rdesc. */ > > > > static void > > ipa_set_jf_cst_copy (struct ipa_jump_func *dst, > > @@ -472,6 +607,8 @@ ipa_set_jf_constant (struct ipa_jump_fun > > SET_EXPR_LOCATION (constant, UNKNOWN_LOCATION); > > jfunc->type = IPA_JF_CONST; > > jfunc->value.constant.value = unshare_expr_without_location (constant); > > + jfunc->value.constant.escape_ref_valid = false; > > + jfunc->value.constant.escape_ref_index = 0; > > > > if (TREE_CODE (constant) == ADDR_EXPR > > && TREE_CODE (TREE_OPERAND (constant, 0)) == FUNCTION_DECL) > > @@ -491,6 +628,19 @@ ipa_set_jf_constant (struct ipa_jump_fun > > jfunc->value.constant.rdesc = NULL; > > } > > > > +/* Set reference description of constant JFUNC to be valid and referring to > > + INDEX. */ > > + > > +static void > > +ipa_set_jf_constant_ref_index (struct ipa_jump_func *jfunc, int index) > > +{ > > + gcc_checking_assert (jfunc->type == IPA_JF_CONST); > > + gcc_checking_assert (index >= 0); > > + jfunc->value.constant.escape_ref_valid = true; > > + jfunc->value.constant.escape_ref_index = index; > > +} > > + > > + > > /* Set JFUNC to be a simple pass-through jump function. 
*/
> >  static void
> >  ipa_set_jf_simple_pass_through (struct ipa_jump_func *jfunc, int formal_id,
> > @@ -539,6 +689,41 @@ ipa_set_ancestor_jf (struct ipa_jump_fun
> >    jfunc->value.ancestor.type_preserved = type_preserved;
> >  }
> >
> > +/* Return the number of references tracked for escape analysis in INFO.  */
> > +
> > +static inline int
> > +ipa_get_tracked_refs_count (struct ipa_node_params *info)
> > +{
> > +  return info->ref_descs.length ();
> > +}
> > +
> > +/* Set the escaped flag corresponding to the Ith tracked reference of the
> > +   function associated with INFO to VAL.  */
> > +
> > +static inline void
> > +ipa_set_ref_escaped (struct ipa_node_params *info, int i, bool val)
> > +{
> > +  info->ref_descs[i].escaped = val;
> > +}
> > +
> > +/* Set the clobbered flag corresponding to the Ith tracked reference of the
> > +   function associated with INFO to VAL.  */
> > +
> > +static inline void
> > +ipa_set_ref_clobbered (struct ipa_node_params *info, int i, bool val)
> > +{
> > +  info->ref_descs[i].clobbered = val;
> > +}
> > +
> > +/* Set the callee_clobbered flag corresponding to the Ith tracked reference of
> > +   the function associated with INFO to VAL.  */
> > +
> > +static inline void
> > +ipa_set_ref_callee_clobbered (struct ipa_node_params *info, int i, bool val)
> > +{
> > +  info->ref_descs[i].callee_clobbered = val;
> > +}
> > +
> >  /* Extract the actual BINFO being described by JFUNC which must be a known type
*/ > > > > @@ -784,7 +969,7 @@ detect_type_change (tree arg, tree base, > > if (!tci.known_current_type > > || tci.multiple_types_encountered > > || offset != 0) > > - jfunc->type = IPA_JF_UNKNOWN; > > + ipa_set_jf_unknown (jfunc); > > else > > ipa_set_jf_known_type (jfunc, 0, tci.known_current_type, comp_type); > > > > @@ -1090,7 +1275,8 @@ ipa_load_from_parm_agg_1 (struct func_bo > > } > > > > if (index >= 0 > > - && parm_ref_data_preserved_p (fbi, index, stmt, op)) > > + && ((fbi && cgraph_param_noclobber_p (fbi->node, index)) > > + || parm_ref_data_preserved_p (fbi, index, stmt, op))) > > { > > *index_p = index; > > *by_ref_p = true; > > @@ -1725,6 +1911,86 @@ ipa_get_callee_param_type (struct cgraph > > return NULL; > > } > > > > +static void > > +analyze_ssa_escape (struct func_body_info *fbi, tree ssa, > > + struct ipa_escape *esc); > > + > > +/* Return the ipa_escape structure suitable for REFERENCE, if it is a > > + declaration or a MEM_REF. Return NULL if there is no structure describing > > + REFERENCE. If a non-NULL result is returned, put the offset of the > > + REFERENCE relative to the start of data described by the result into > > + *OFFSET, and size and max_size as returned by get_ref_base_and_extent to > > + *SIZE and *MAX_SIZE respectively. 
*/
> > +
> > +static struct ipa_escape *
> > +get_escape_for_ref (struct func_body_info *fbi, tree reference,
> > +                    HOST_WIDE_INT *offset, HOST_WIDE_INT *size,
> > +                    HOST_WIDE_INT *max_size)
> > +{
> > +  struct ipa_escape *res;
> > +  tree base = get_ref_base_and_extent (reference, offset, size, max_size);
> > +
> > +  if (DECL_P (base))
> > +    {
> > +      ipa_escape **d_esc = fbi->decl_escapes->contains (base);
> > +      if (!d_esc)
> > +        return NULL;
> > +      res = *d_esc;
> > +    }
> > +  else if (TREE_CODE (base) == MEM_REF
> > +           && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME)
> > +    {
> > +      tree ssa = TREE_OPERAND (base, 0);
> > +      res = &fbi->escapes[SSA_NAME_VERSION (ssa)];
> > +      if (!res->analyzed)
> > +        analyze_ssa_escape (fbi, ssa, res);
> > +    }
> > +  else
> > +    return NULL;
> > +
> > +  if (res->target)
> > +    {
> > +      *offset += res->offset;
> > +      res = res->target;
> > +    }
> > +  return res;
> > +}
> > +
> > +/* Return the ipa_escape structure suitable for T, if it is an ssa_name or an
> > +   ADDR_EXPR.  Return NULL if there is no structure for T.  If a non-NULL
> > +   result is returned, put the offset of the value T relative to the start of
> > +   data described by the result into *OFFSET.
*/ > > + > > +static struct ipa_escape * > > +get_escape_for_value (struct func_body_info *fbi, tree t, > > + HOST_WIDE_INT *offset) > > +{ > > + if (TREE_CODE (t) == SSA_NAME) > > + { > > + struct ipa_escape *res; > > + *offset = 0; > > + res = &fbi->escapes[SSA_NAME_VERSION (t)]; > > + if (!res->analyzed) > > + analyze_ssa_escape (fbi, t, res); > > + > > + if (res->target) > > + { > > + *offset += res->offset; > > + res = res->target; > > + } > > + > > + return res; > > + } > > + else if (TREE_CODE (t) == ADDR_EXPR) > > + { > > + HOST_WIDE_INT dummy_size, dummy_max_size; > > + return get_escape_for_ref (fbi, TREE_OPERAND (t, 0), offset, &dummy_size, > > + &dummy_max_size); > > + } > > + else > > + return NULL; > > +} > > + > > /* Compute jump function for all arguments of callsite CS and insert the > > information in the jump_functions array in the ipa_edge_args corresponding > > to this callsite. */ > > @@ -1753,6 +2019,8 @@ ipa_compute_jump_functions_for_edge (str > > tree arg = gimple_call_arg (call, n); > > tree param_type = ipa_get_callee_param_type (cs, n); > > > > + gcc_checking_assert (jfunc->type == IPA_JF_UNKNOWN > > + && !ipa_get_jf_unknown_esc_ref_valid (jfunc)); > > if (is_gimple_ip_invariant (arg)) > > ipa_set_jf_constant (jfunc, arg, cs); > > else if (!is_gimple_reg_type (TREE_TYPE (arg)) > > @@ -1807,19 +2075,42 @@ ipa_compute_jump_functions_for_edge (str > > ? TREE_TYPE (param_type) > > : NULL); > > > > - /* If ARG is pointer, we can not use its type to determine the type of aggregate > > - passed (because type conversions are ignored in gimple). Usually we can > > - safely get type from function declaration, but in case of K&R prototypes or > > - variadic functions we can try our luck with type of the pointer passed. > > - TODO: Since we look for actual initialization of the memory object, we may better > > - work out the type based on the memory stores we find. 
*/ > > + /* If ARG is pointer, we can not use its type to determine the type of > > + aggregate passed (because type conversions are ignored in gimple). > > + Usually we can safely get type from function declaration, but in case > > + of K&R prototypes or variadic functions we can try our luck with type > > + of the pointer passed. > > + TODO: Since we look for actual initialization of the memory object, we > > + may better work out the type based on the memory stores we find. */ > > if (!param_type) > > param_type = TREE_TYPE (arg); > > > > - if ((jfunc->type != IPA_JF_PASS_THROUGH > > - || !ipa_get_jf_pass_through_agg_preserved (jfunc)) > > - && (jfunc->type != IPA_JF_ANCESTOR > > - || !ipa_get_jf_ancestor_agg_preserved (jfunc)) > > + HOST_WIDE_INT dummy_offset; > > + struct ipa_escape *esc = get_escape_for_value (fbi, arg, &dummy_offset); > > + int ref_index; > > + if (esc && valid_escape_result_index (esc, &ref_index)) > > + { > > + if (jfunc->type == IPA_JF_UNKNOWN) > > + ipa_set_jf_unknown_ref_index (jfunc, ref_index); > > + else if (jfunc->type == IPA_JF_KNOWN_TYPE) > > + ipa_set_jf_known_type_ref_index (jfunc, ref_index); > > + else if (jfunc->type == IPA_JF_CONST) > > + ipa_set_jf_constant_ref_index (jfunc, ref_index); > > + else > > + { > > + gcc_checking_assert > > + (jfunc->type != IPA_JF_PASS_THROUGH > > + || ipa_get_jf_pass_through_formal_id (jfunc) == ref_index); > > + gcc_checking_assert > > + (jfunc->type != IPA_JF_ANCESTOR > > + || ipa_get_jf_ancestor_formal_id (jfunc) == ref_index); > > + } > > + } > > + > > + /* TODO: We should allow aggregate jump functions even for these types of > > + jump functions but we need to be able to combine them first. 
> > +  */
> > +      if (jfunc->type != IPA_JF_PASS_THROUGH
> > +	  && jfunc->type != IPA_JF_ANCESTOR
> >  	  && (AGGREGATE_TYPE_P (TREE_TYPE (arg))
> >  	      || POINTER_TYPE_P (param_type)))
> >  	determine_known_aggregate_parts (call, arg, param_type, jfunc);
> > @@ -2223,12 +2514,11 @@ ipa_analyze_stmt_uses (struct func_body_
> >      ipa_analyze_call_uses (fbi, stmt);
> >  }
> >
> > -/* Callback of walk_stmt_load_store_addr_ops for the visit_load.
> > -   If OP is a parameter declaration, mark it as used in the info structure
> > -   passed in DATA.  */
> > +/* Callback of walk_stmt_load_store_addr_ops.  If OP is a parameter
> > +   declaration, mark it as used in the info structure passed in DATA.  */
> >
> >  static bool
> > -visit_ref_for_mod_analysis (gimple, tree op, tree, void *data)
> > +visit_ref_mark_it_used (gimple, tree op, tree, void *data)
> >  {
> >    struct ipa_node_params *info = (struct ipa_node_params *) data;
> >
> > @@ -2244,13 +2534,12 @@ visit_ref_for_mod_analysis (gimple, tree
> >    return false;
> >  }
> >
> > -/* Scan the statements in BB and inspect the uses of formal parameters.  Store
> > -   the findings in various structures of the associated ipa_node_params
> > -   structure, such as parameter flags, notes etc.  FBI holds various data about
> > -   the function being analyzed.  */
> > +/* Scan the statements in BB and inspect the uses of formal parameters, escape
> > +   analysis and so on.  FBI holds various data about the function being
> > +   analyzed.
> > +  */
> >
> >  static void
> > -ipa_analyze_params_uses_in_bb (struct func_body_info *fbi, basic_block bb)
> > +ipa_analyze_bb_statements (struct func_body_info *fbi, basic_block bb)
> >  {
> >    gimple_stmt_iterator gsi;
> >    for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > @@ -2262,15 +2551,15 @@ ipa_analyze_params_uses_in_bb (struct fu
> >
> >        ipa_analyze_stmt_uses (fbi, stmt);
> >        walk_stmt_load_store_addr_ops (stmt, fbi->info,
> > -				     visit_ref_for_mod_analysis,
> > -				     visit_ref_for_mod_analysis,
> > -				     visit_ref_for_mod_analysis);
> > +				     visit_ref_mark_it_used,
> > +				     visit_ref_mark_it_used,
> > +				     visit_ref_mark_it_used);
> >      }
> >    for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> >      walk_stmt_load_store_addr_ops (gsi_stmt (gsi), fbi->info,
> > -				   visit_ref_for_mod_analysis,
> > -				   visit_ref_for_mod_analysis,
> > -				   visit_ref_for_mod_analysis);
> > +				   visit_ref_mark_it_used,
> > +				   visit_ref_mark_it_used,
> > +				   visit_ref_mark_it_used);
> >  }
> >
> >  /* Calculate controlled uses of parameters of NODE.  */
> > @@ -2344,10 +2633,284 @@ private:
> >  void
> >  analysis_dom_walker::before_dom_children (basic_block bb)
> >  {
> > -  ipa_analyze_params_uses_in_bb (m_fbi, bb);
> > +  ipa_analyze_bb_statements (m_fbi, bb);
> >    ipa_compute_jump_functions_for_bb (m_fbi, bb);
> >  }
> >
> > +/* Look at operands of PHI and if any of them is an address of a declaration,
> > +   mark that declaration escaped.  */
> > +
> > +void
> > +analyze_phi_escapes (gimple phi, struct func_body_info *fbi)
> > +{
> > +  for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> > +    {
> > +      tree op = gimple_phi_arg_def (phi, i);
> > +      if (TREE_CODE (op) != ADDR_EXPR)
> > +	continue;
> > +
> > +      tree base = get_base_address (TREE_OPERAND (op, 0));
> > +      if (!DECL_P (base))
> > +	continue;
>
> So this means that 'a' escapes in
>
>   tem_1 = &a[i_2];
>
> ?

Only if that appears in a phi node (my understanding is that with a
constant index it can).
I do treat phi nodes very pessimistically.

> > +      ipa_escape **d_esc = fbi->decl_escapes->contains (base);
> > +      if (!d_esc)
> > +	continue;
> > +      (*d_esc)->escaped = true;
> > +    }
> > +}
> > +
> > +/* Check definition and uses of SSA and update ESC (and potentially escape
> > +   structures associated with other SSA names) accordingly.  */
> > +
> > +static void
> > +analyze_ssa_escape (struct func_body_info *fbi, tree ssa,
> > +		    struct ipa_escape *esc)
> > +{
> > +  esc->analyzed = true;
> > +  if (!POINTER_TYPE_P (TREE_TYPE (ssa)))
> > +    {
> > +      esc->escaped = true;
> > +      return;
> > +    }
> > +
> > +  /* First we need to check the definition and figure out whether we can work
> > +     with it or whether this name actually refers to data described by another
> > +     structure.  */
> > +  if (!SSA_NAME_IS_DEFAULT_DEF (ssa))
> > +    {
> > +      gimple def = SSA_NAME_DEF_STMT (ssa);
> > +
> > +      if (gimple_assign_single_p (def))
> > +	{
> > +	  tree rhs = gimple_assign_rhs1 (def);
> > +	  HOST_WIDE_INT offset;
> > +	  struct ipa_escape *r_esc = get_escape_for_value (fbi, rhs, &offset);
> > +	  if (r_esc)
> > +	    {
> > +	      esc->offset = offset;
> > +	      esc->target = r_esc;
> > +	    }
> > +	  else
> > +	    {
> > +	      esc->escaped = true;
> > +	      return;
> > +	    }
> > +	}
> > +      else if (is_gimple_call (def))
> > +	{
> > +	  /* TODO: If only C++ new had malloc attribute.  */
> > +	  int flags = gimple_call_flags (def);
>
> How does ECF_MALLOC make something not escape?!

I am not sure I understand.  It is the other way round: the result of a
non-ECF_MALLOC function is considered escaped.  Of course the result of a
malloc can escape because of other statements, or even in another function.

> And why
> does the definition "escape" in any stmt?!

If this is overly pessimistic (and I see that I should handle at least
pointer arithmetic), I can address it as a followup.
> > +	  if ((flags & ECF_MALLOC) == 0)
> > +	    {
> > +	      esc->escaped = true;
> > +	      return;
> > +	    }
> > +	}
> > +      else
> > +	{
> > +	  if (gimple_code (def) == GIMPLE_PHI)
> > +	    /* Any SSA defined by a PHI is doomed but it is a convenient place
> > +	       to check every pointer phi.  */
> > +	    analyze_phi_escapes (def, fbi);
> > +
> > +	  esc->escaped = true;
> > +	  return;
> > +	}
> > +    }
> > +
> > +  if (esc->target)
> > +    esc = esc->target;
> > +  if (esc->escaped)
> > +    return;
> > +
> > +  /* If the definition is fine, we need to check the uses.  */
> > +
> > +  imm_use_iterator imm_iter;
> > +  use_operand_p use;
> > +  FOR_EACH_IMM_USE_FAST (use, imm_iter, ssa)
> > +    {
> > +      gimple stmt = USE_STMT (use);
> > +      if (is_gimple_debug (stmt))
> > +	continue;
> > +
> > +      switch (gimple_code (stmt))
> > +	{
> > +	case GIMPLE_ASSIGN:
> > +	  {
> > +	    if (!gimple_assign_single_p (stmt))
> > +	      {
> > +		esc->escaped = true;
>
> Does that make SSA escape in
>
>   tem_1 = ssa p+ 1;
>
> ?

Yes, and this is an omission that I forgot to come back to and fix.  I
will do that.

> > +		return;
> > +	      }
> > +
> > +	    tree lhs = gimple_assign_lhs (stmt);
> > +	    /* Statements assigning to another SSA are OK, we check all of
> > +	       them.  */
> > +	    if (TREE_CODE (lhs) != SSA_NAME
> > +		/* If LHS is not an SSA_NAME, RHS cannot be an ADDR_EXPR, and
> > +		   must be either a naked SSA_NAME or a load or an invariant.
> > +		   We only care if it is the SSA name we are after.  It can
> > +		   be a different SSA name if the use was on the LHS in a
> > +		   MEM_REF.  */
> > +		&& gimple_assign_rhs1 (stmt) == ssa)
> > +	      {
> > +		esc->escaped = true;
> > +		return;
> > +	      }
> > +
> > +	    while (handled_component_p (lhs))
> > +	      lhs = TREE_OPERAND (lhs, 0);
> > +	    if (TREE_CODE (lhs) == MEM_REF
> > +		&& TREE_OPERAND (lhs, 0) == ssa)
> > +	      esc->write_base = true;
> > +	  }
> > +	  break;
> > +
> > +	case GIMPLE_CALL:
> > +	  /* Calls will be dealt with when constructing jump functions.
> > +	     However, indirect calls mean that all values escape (we do IPA
> > +	     escape propagation before any devirtualization) and when not in
> > +	     LTO, even calls to functions in other compilation units are dark
> > +	     holes.  On the other hand, builtin free is whitelisted.  */
> > +	  if (!gimple_call_builtin_p (stmt, BUILT_IN_FREE))
> > +	    {
> > +	      struct cgraph_edge *cs = cgraph_edge (fbi->node, stmt);
> > +	      if (!cs || !cs->callee || (!cs->callee->definition && !flag_lto))
> > +		{
> > +		  esc->escaped = true;
> > +		  return;
> > +		}
> > +	    }
> > +	  break;
> > +
> > +	case GIMPLE_SWITCH:
> > +	case GIMPLE_COND:
> > +	  /* These are harmless.  */
> > +	  break;
> > +
> > +	default:
> > +	  esc->escaped = true;
> > +	  return;
> > +	}
> > +    }
> > +}
> > +
> > +/* Examine escapes of all SSA names.  */
> > +
> > +static void
> > +analyze_all_ssa_escapes (struct func_body_info *fbi)
> > +{
> > +  for (unsigned i = 1; i < fbi->func->gimple_df->ssa_names->length (); ++i)
>
>   SSANAMES (fbi->func)->length ();

Changed.

> > +    {
> > +      tree ssa = ssa_name (i);
> > +      if (!ssa)
> > +	continue;
> > +      struct ipa_escape *esc = &fbi->escapes[SSA_NAME_VERSION (ssa)];
> > +      if (esc->analyzed)
> > +	return;
> > +      analyze_ssa_escape (fbi, ssa, esc);
>
> I think it's more cache friendly to walk all stmts instead of all
> SSA names and their immediate uses.  But maybe not.

This is just more convenient.  However, it can be done.

> > +    }
> > +}
> > +
> > +/* Initialize escape analysis structures in the FBI corresponding to FUNC.
> > +  */
> > +
> > +static void
> > +create_escape_structures (struct func_body_info *fbi)
> > +{
> > +  tree var, parm;
> > +  unsigned int i, var_idx, var_count = 0;
> > +
> > +  for (parm = DECL_ARGUMENTS (fbi->node->decl);
> > +       parm;
> > +       parm = DECL_CHAIN (parm))
> > +    if (TREE_ADDRESSABLE (parm))
> > +      var_count++;
> > +
> > +  FOR_EACH_LOCAL_DECL (fbi->func, i, var)
> > +    if (TREE_CODE (var) == VAR_DECL && TREE_ADDRESSABLE (var))
> > +      var_count++;
> > +
> > +  fbi->escapes = vNULL;
> > +  fbi->escapes.safe_grow_cleared (SSANAMES (fbi->func)->length () + var_count);
>
> you want to use reserve_exact first and then grow.

OK

> I miss an overall comment about the used algorithm and its caveats.

I hope the few paragraphs at the top of this email will help to clarify
the algorithm some more.  I will try to add an explanatory comment to the
top of the source code file as well.

> I see the following
>
> 1) it's not flow or context sensitive
> 2) it doesn't handle local memory as non-escape sites
> 3) it doesn't handle escapes through return (it handles them pessimistically)

Handling returns for the sake of computing the escape flag would probably
be easy.  But it would cause problems for the jump function building that
is based on this patch.  We would discover that what we optimistically
considered independent pointers are actually aliases only at WPA stage and
would have to discard all the associated jump functions anyway.  The jump
function building can however have its own flag (in ipa-cp structures,
living only until WPA end).  The situation can be similar when it comes to
one reference passed in two arguments to the same call.

Thinking about it more, perhaps the analysis should not even be called
escape analysis; it really is an automatic detection of restrict pointer
parameters.  Code building and using jump functions wants to know that the
data can only be accessed through pointers it has under control and knows
about.
Using that result in PTA as an escape is possible, but the calculated
property is stricter than escape.

> IPA-PTA does 2) and 3), of course not 1).
>
> For what it does this seems to be quite heavy-weight?

I would not say it is heavy-weight.  It does add the scan of SSA
definitions and uses; the latter can be made part of the body sweep.

> What parts are actually carried out at WPA time?  Not escape sites
> it seems but only the actual objects (not) escaping, right?

If I understand correctly then yes, only the escape propagation is
performed at WPA, but it is also performed downwards so...

> How would your algorithm compare to one running at local_pure_const
> time looking at a similar simple set of escapes of its function arguments
> (but using computed fnspec attributes to handle callees already processed)?

...this would not propagate escapes to callees.

> What would be the complication of handling return functions?
>
> Maybe I missed them sofar, but do you have testcases that show
> actual optimizations that are possible with this?  Do you have
> numbers for SPEC for optimizations that are triggered?

Testcases are mostly in the subsequent patches; this patch has only one.
I will have a look at the benchmarks some more.

> At least it seems that all this convolutes the existing (already
> convoluted) IPA-PROP engine even more.

I will try to make the ipa-prop.c file more comprehensible and better
divided and documented.

Thanks,

Martin